
UK-LLM Unveils Welsh AI Model with NVIDIA Nemotron to Preserve Minority Languages and Boost Public Services

Reaching Across the Isles: UK-LLM Brings AI to UK Languages With NVIDIA Nemotron

Celtic languages, including Cornish, Irish, Scottish Gaelic, and Welsh, are the U.K.’s oldest living languages. To support their speakers, the UK-LLM sovereign AI initiative is developing an AI model based on NVIDIA Nemotron that can reason in both English and Welsh, a language spoken by approximately 850,000 people in Wales today. High-quality AI reasoning in Welsh will help deliver public services such as healthcare, education, and legal resources in the language, broadening access and inclusivity.

“I want every corner of the U.K. to be able to harness the benefits of artificial intelligence. By enabling AI to reason in Welsh, we’re making sure that public services, from healthcare to education, are accessible to everyone, in the language they live by,” said U.K. Prime Minister Keir Starmer. “This is a powerful example of how the latest AI technology, trained on the U.K.’s most advanced AI supercomputer in Bristol, can serve the public good, protect cultural heritage, and unlock opportunity across the country.”

The UK-LLM project, originally launched in 2023 as BritLLM and led by University College London, has previously released two models for U.K. languages. Its latest model for Welsh, developed in collaboration with Bangor University and NVIDIA, aligns with the Welsh government’s Cymraeg 2050 initiative, which aims to reach one million Welsh speakers by 2050. U.K.-based AI cloud provider Nscale will make the new model available to developers through its application programming interface (API).

“The aim is to ensure that Welsh remains a living, breathing language that continues to develop with the times,” said Gruffudd Prys, senior terminologist and head of the Language Technologies Unit at Canolfan Bedwyr, Bangor University’s center for Welsh language services, research, and technology.
“AI shows enormous potential to help with second-language acquisition of Welsh as well as for enabling native speakers to improve their language skills.”

The new model could also improve access to Welsh-language resources by allowing public institutions and businesses in Wales to translate content or offer bilingual chatbot services. This would help healthcare providers, educators, broadcasters, retailers, and restaurants deliver services in both Welsh and English.

Beyond Welsh, the UK-LLM team plans to apply the same methodology to develop AI models for other U.K. languages, including Cornish, Irish, Scots, and Scottish Gaelic, and to collaborate internationally on models for African and Southeast Asian languages.

“This collaboration with NVIDIA and Bangor University enabled us to create new training data and train a new model in record time, accelerating our goal to build the best-ever language model for Welsh,” said Pontus Stenetorp, professor of natural language processing and deputy director of the Centre for Artificial Intelligence at University College London. “Our aim is to take the insights gained from the Welsh model and apply them to other minority languages, in the U.K. and across the globe.”

Leveraging Sovereign AI Infrastructure for Model Development

The new Welsh model is built on NVIDIA Nemotron, a family of open-source models featuring open weights, datasets, and training recipes. The UK-LLM team fine-tuned the 49-billion-parameter Llama Nemotron Super model and the 9-billion-parameter Nemotron Nano model on Welsh-language data. Because existing Welsh training data is limited, the team translated over 30 million entries from English to Welsh with NVIDIA NIM microservices running models such as gpt-oss-120b and DeepSeek-R1.
The team then trained the model on a GPU cluster accessed through the NVIDIA DGX Cloud Lepton platform and used hundreds of NVIDIA GH200 Grace Hopper Superchips on Isambard-AI, the U.K.’s most powerful supercomputer, which is backed by £225 million in government funding and hosted at the University of Bristol. The new machine-translated dataset supplements existing Welsh-language resources from previous UK-LLM efforts.

Capturing Linguistic Nuances With Careful Evaluation

Bangor University, located in Gwynedd, the county with the highest proportion of Welsh speakers, is providing essential linguistic and cultural expertise. Gruffudd Prys and his team are verifying the accuracy of the machine-translated training data, manually translating evaluation sets, and assessing how the model handles complex features of Welsh such as initial consonant mutations.

The model, along with the Welsh training and evaluation datasets, will be made available for public and enterprise use, supporting further research, model development, and application innovation.

“It’s one thing to have this AI capability exist in Welsh, but it’s another to make it open and accessible for everyone,” Prys said. “That subtle distinction can be the difference between this technology being used or not being used.”

Deploy Sovereign AI Models With NVIDIA Nemotron, NIM Microservices

The framework used to develop the Welsh model can serve as a blueprint for multilingual AI development worldwide. NVIDIA Nemotron models, data, and training recipes are publicly available, enabling developers to build reasoning models tailored to any language, domain, or workflow. Packaged as NVIDIA NIM microservices, these models are optimized for cost-effective compute and can run anywhere, from laptops to cloud environments.

European enterprises will soon be able to run open, sovereign models on the Perplexity AI-powered search engine. Get started with NVIDIA Nemotron.