Why Rwanda’s AI Innovators Are Struggling to Scale

AI Quick Summary
- Rwanda's AI innovators face major challenges in developing localized AI systems, primarily due to a severe lack of quality, structured, and legally accessible local data.
- Companies like Yali Labs struggle with building Kinyarwanda Large Language Models (LLMs) because insufficient data leads to inaccurate predictions and high training costs.
- The financial burden on startups is worsened by the need for expensive external GPU infrastructure, since no local facilities are available.
- Other developers note that public documents are often in non-machine-readable formats (e.g., PDFs, scanned images), hindering AI data preparation.
- The Rwandan Ministry of ICT and Innovation is addressing these issues with a National Data Sharing Policy (approved in 2025), a Data Governance Unit, and a Centralized Data Sharing Platform currently under development.
- The Centralized Data Sharing Platform is expected to be fully operational by 2026, and Yali Labs launched its ALTA Foundry in January 2026 to help convert raw data into high-quality training data for Kinyarwanda AI models.
Rwanda’s growing community of artificial intelligence (AI) innovators is raising concerns about challenges that are slowing progress on locally relevant AI systems. At the center of the problem is data — the essential ingredient AI needs to learn and perform tasks like language understanding and prediction.
The Challenge of Localized Models
For Philbert Murwanashyaka, co‑founder of Yali Labs, the issue has been both technical and costly. His team began by developing what he calls Rwanda’s first Kinyarwanda AI tokenizer, a tool that helps break down text into pieces AI can understand. The aim was to build on that foundation and create a Kinyarwanda Large Language Model (LLM), an AI system capable of human‑level conversation in the local language.
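To illustrate what a tokenizer does, the toy sketch below splits a word into subword pieces using greedy longest-match lookup. This is purely illustrative: the vocabulary is hypothetical, and production tokenizers (such as BPE or SentencePiece models like the one Yali Labs built) learn their vocabularies from large corpora rather than using a hand-written list.

```python
# Illustrative only: a toy greedy longest-match subword tokenizer.
# Real tokenizers learn their vocabulary from data; this small
# hand-written vocabulary is a hypothetical example.
def tokenize(text, vocab):
    """Split each word into the longest known pieces, left to right."""
    tokens = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Try the longest remaining substring first,
            # falling back to a single character if nothing matches.
            for j in range(len(word), i, -1):
                piece = word[i:j]
                if piece in vocab or j == i + 1:
                    tokens.append(piece)
                    i = j
                    break
    return tokens

# Hypothetical subword vocabulary for a Kinyarwanda greeting.
vocab = {"mura", "ho", "umu", "ntu"}
print(tokenize("Muraho", vocab))  # ['mura', 'ho']
```

An LLM never sees raw text; it sees sequences of such pieces, which is why a tokenizer tuned to a language's morphology is a necessary first step before training a model for that language.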
But progress has been difficult.
“We were spending a lot of money training the model with less data, and at some point, we could fail,” Murwanashyaka said. “Because of the lack of quality and enough data, the model could predict things that did not make sense.”
Why Data Matters for AI
AI systems like LLMs require large volumes of clean, structured, and legally accessible data to learn patterns in language, grammar, meaning, and context. When data is scattered across different sources, poorly formatted, or locked behind copyright restrictions, the AI struggles to learn correctly.
Murwanashyaka explained that while some public institutions initially offered to share linguistic resources, the process often stalled before data could be released. In other cases, developers were told to rely on publicly available web content.
High Costs and Technical Limits
The financial impact has been heavy. Yali Labs has spent close to half a million US dollars training its AI model, a significant amount for a startup. Training LLMs requires GPU (Graphics Processing Unit) infrastructure, which accelerates the complex calculations necessary for AI learning. Because they lack local GPUs, the team had to partner with external providers, increasing training costs.
“Every training cycle came with a cost. We don’t have our own GPUs here,” Murwanashyaka said. “We count it as a loss because we never got the accurate model we wanted.”
Broader Views from the Tech Community
Other AI developers in Rwanda share similar concerns. Audace Niyonkuru, CEO of Digital Umuganda, pointed out that many public documents are stored as PDFs or scanned images, which are not easily readable by AI. Developers need machine‑readable formats that can be cleaned, labeled, and standardized before training can begin.
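The gap Niyonkuru describes sits between text extraction and training: even after text is pulled out of a PDF or scanned image, it still has to be cleaned and standardized. The sketch below shows one minimal cleanup pass; the specific rules (collapsing whitespace, dropping page-number lines) are assumptions for demonstration, not any organization's actual pipeline.

```python
import re

# Illustrative sketch: normalizing raw text extracted from a PDF so it
# can be labeled and standardized for training. The cleanup rules are
# assumptions for demonstration, not a production pipeline.
def clean_extracted_text(raw: str) -> list[str]:
    lines = []
    for line in raw.splitlines():
        line = re.sub(r"\s+", " ", line).strip()   # collapse whitespace
        if not line:
            continue                                # drop empty lines
        if re.fullmatch(r"(page\s*)?\d+", line, re.IGNORECASE):
            continue                                # drop page-number lines
        lines.append(line)
    return lines

raw = "Inyandiko   ya Leta\n\nPage 3\nUmutwe   wa mbere"
print(clean_extracted_text(raw))
# ['Inyandiko ya Leta', 'Umutwe wa mbere']
```

Scanned images add a further step, optical character recognition (OCR), before any such cleanup can even begin, which is why publishing documents in machine-readable formats in the first place matters so much to developers.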
Without structured data, even the most skilled engineers cannot build reliable AI systems, a challenge that echoes across many African countries trying to develop local AI solutions.
Government Response and Future Outlook
The Ministry of ICT and Innovation recognizes the issue and says steps are being taken to overcome it. A National Data Sharing Policy approved in 2025 now provides a legal framework for data exchange between public institutions. A Data Governance Unit has also been established to standardize and manage datasets, while ensuring compliance with Rwanda’s data protection laws.
Officials say a Centralized Data Sharing Platform is in the final stages of development. Once launched, it will allow startups, researchers, and companies to discover and request high‑quality public datasets more easily through a unified interface. The ministry is also collaborating with the Rwanda Cultural Heritage Academy to lay the groundwork for richer Kinyarwanda datasets.
What This Means for Rwanda
AI tailored to local languages and challenges could have real benefits in education, public services, health, and business. But to reach that potential, Rwanda must continue building data infrastructure, legal clarity, and open collaboration between public institutions and private innovators.
By addressing these barriers, Rwanda can empower its tech community to develop homegrown AI solutions that reflect local needs, culture, and language, positioning the country as a leader in responsible and inclusive AI development on the continent.
Source: The New Times

