Digital Event Horizon
Revolutionizing Language Accessibility: UK-LLM Brings AI to Welsh and Other Celtic Languages
The UK's latest innovation in Artificial Intelligence aims to empower speakers of Welsh and other Celtic languages, enabling high-quality AI reasoning and opening doors to new public services, research, and cultural opportunities.
NVIDIA has collaborated with University College London (UCL) and Bangor University to develop an AI model that can reason in both English and Welsh. The project, UK-LLM, aims to improve language accessibility for public services, research, and cultural institutions. The model is designed to capture the nuances of Welsh, one of the UK's oldest living languages. The collaboration uses NVIDIA Nemotron, an open-source framework for building reasoning models tailored to virtually any language. The project supports the goal of reaching a million Welsh speakers by 2050 by providing high-quality AI reasoning in Welsh for public services and research. The team has accelerated translation and training workloads using NVIDIA's DGX Cloud Lepton platform and Isambard-AI, and has created a new dataset to train and test the model, supplementing existing Welsh data. The collaboration also aims to develop AI models for other languages spoken across the UK and internationally.
NVIDIA's latest collaboration with University College London (UCL) and Bangor University has produced an AI model designed to reason in both English and Welsh. The project, dubbed UK-LLM, aims to improve language accessibility for public services, research, and cultural institutions.
At the heart of this innovation lies NVIDIA Nemotron, an open-source framework that enables developers to build reasoning models tailored to virtually any language, domain, and workflow. By leveraging this technology, UCL and Bangor University have created a cutting-edge AI model capable of capturing the nuances of Welsh, one of the UK's oldest living languages.
Welsh, with approximately 850,000 speakers in Wales, has long been a language in need of support. The Welsh Government's Cymraeg 2050 strategy aims to boost the active use of the language, with the goal of reaching a million speakers by 2050. UK-LLM is a significant step towards this goal, providing a platform for high-quality AI reasoning in Welsh that can be used to develop public services, research, and cultural resources.
The project brings together two institutions with complementary strengths: UCL, with its Centre for Artificial Intelligence, and Bangor University, home to the Language Technologies Unit at Canolfan Bedwyr.
Using NVIDIA's DGX Cloud Lepton platform and the Isambard-AI supercomputer, the UK-LLM team has accelerated its translation and training workloads. Together, these resources provide a GPU cluster and hundreds of NVIDIA GH200 Grace Hopper Superchips, making them well suited to large-scale AI model development.
The project's new Welsh dataset supplements existing Welsh data from previous efforts, providing a comprehensive foundation for training and testing the AI model. Gruffudd Prys, senior terminologist and head of the Language Technologies Unit at Bangor University, has played a crucial role in verifying the accuracy of both the machine-translated training data and the manually translated evaluation data.
Prys' experience with language technology for Welsh spans over two decades, bringing invaluable expertise to the collaboration. His team at Canolfan Bedwyr has helped assess how the model handles nuances of Welsh that AI typically struggles with, such as consonant changes based on neighboring words.
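To make the "consonant changes" concrete, here is a minimal Python sketch of one such pattern, the Welsh soft mutation, in which the first consonant of a word changes after certain preceding words such as "ei" (his). It is an illustration only and not part of UK-LLM: the function names are hypothetical, the input is assumed to be a lowercase noun, and real Welsh has further mutation types, many more triggers, and exceptions that this sketch ignores.

```python
# Illustrative sketch of Welsh soft mutation (treiglad meddal).
# Assumptions: lowercase input, standard initial-consonant changes only,
# no exceptions handled. Names here are hypothetical, not from UK-LLM.

# Digraphs come first so "ll" and "rh" are matched before single letters.
SOFT_MUTATION = [
    ("ll", "l"),
    ("rh", "r"),
    ("p", "b"),
    ("t", "d"),
    ("c", "g"),
    ("b", "f"),
    ("d", "dd"),
    ("g", ""),   # initial "g" is dropped
    ("m", "f"),
]

# Digraphs that are single Welsh letters and do not soft-mutate.
UNCHANGED = ("ch", "dd", "ff", "ng", "ph", "th")


def soft_mutate(word: str) -> str:
    """Return the soft-mutated form of a lowercase Welsh word."""
    if word.startswith(UNCHANGED):
        return word
    for initial, mutated in SOFT_MUTATION:
        if word.startswith(initial):
            return mutated + word[len(initial):]
    return word  # vowels and other consonants are unaffected


def his_noun(noun: str) -> str:
    """Build "his <noun>": the possessive "ei" (his) triggers soft mutation."""
    return "ei " + soft_mutate(noun)


if __name__ == "__main__":
    for noun in ["cath", "pen", "gardd", "llyfr", "afal"]:
        print(noun, "->", his_noun(noun))
```

The same noun therefore surfaces in different forms depending on what precedes it ("cath" on its own, "ei gath" for "his cat"), which is the kind of context-dependent spelling a model has to get right.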
The UK-LLM project's success is a testament to NVIDIA's commitment to empowering developers and researchers through its open-source models, data, and recipes. The Nemotron framework is designed to be cost-effective and run anywhere, from laptop to cloud, making it accessible to enterprises across Europe.
Beyond Welsh, the UK-LLM team aims to develop AI models for other languages spoken across the UK, including Cornish, Irish, Scots, and Scottish Gaelic. They also plan to collaborate with international partners to build models for languages from Africa and Southeast Asia.
The collaboration between NVIDIA, UCL, and Bangor University has enabled the creation of a new Welsh training dataset, which was used to post-train the 49-billion-parameter Llama Nemotron Super model and the 9-billion-parameter Nemotron Nano model. This approach has accelerated the project's goal to build the best-ever language model for Welsh.
The UK-LLM project is poised to unlock new opportunities for public services, research, and cultural institutions across the UK. By enabling high-quality AI reasoning in Welsh, the initiative aims to make public services more accessible to everyone, regardless of their linguistic background.
The collaboration between NVIDIA, UCL, and Bangor University shows what open-source innovation and partnership can achieve, offering new possibilities for language accessibility and cultural preservation.
Related Information:
https://www.digitaleventhorizon.com/articles/Empowering-the-Celtic-Languages-UK-LLM-Pioneers-AI-Revolution-for-Welsh-and-Beyond-deh.shtml
Published: Sat Sep 13 21:44:24 2025 by llama3.2 3B Q4_K_M