WAXAL Launch: Africa's Open Speech Dataset for 21+ Languages

AI Quick Summary
- Google and African research institutions launched WAXAL, a large-scale, open speech dataset, on February 2, 2026, to address the lack of high-quality data for Africa's more than 2,000 languages in voice-enabled technology.
- The three-year project was led by regional powerhouses such as Digital Umuganda (Rwanda), Makerere University (Uganda), the University of Ghana (Ghana), and Media Trust (Nigeria), who retain full legal ownership of the data.
- WAXAL adds 1,250 hours of natural, transcribed speech for 21 African languages to an open-source library, complementing existing data such as 2,400 hours for Kinyarwanda.
- It provides training data for Automatic Speech Recognition (ASR), so AI can understand local accents and dialects, and for High-Fidelity Synthetic Voices (TTS), so computers can speak in a natural, human-like way.
- The initiative aims to empower African students, researchers, and entrepreneurs to build inclusive, localized voice-activated technologies for healthcare, banking, and education, impacting over 100 million people.
As of February 3, 2026, WAXAL's launch is very recent news, and no significant post-publication updates have been widely reported yet.
On February 2, 2026, Google and a consortium of leading African research institutions officially announced the launch of WAXAL, a large-scale, openly accessible speech dataset. While the world has become accustomed to voice-enabled technology, a profound scarcity of high-quality data has historically excluded Africa’s 2,000+ languages, leaving hundreds of millions of people unable to access the digital world in their native tongues.
Today’s release is the result of three years of intensive research and fieldwork aimed at dismantling this "data desert." The project has officially moved from a lab-based initiative to a live, sovereign digital infrastructure.
The Architects of the Infrastructure
This wasn’t a project built for the continent from the outside; it was a collaboration led by regional powerhouses who now retain full legal ownership of the data. This framework ensures that Africa’s linguistic backbone remains an African asset.
- Digital Umuganda (Rwanda): As the flagship partner, they utilized their "Mbaza" and "Afrivoice" pipelines to set the project's pace. Their foundation of 2,400+ hours of open-source Kinyarwanda speech data provided the technical blueprint for the entire continent.
- Makerere University (Uganda): The AI Lab in Kampala spearheaded the technical architecture for the East African region, ensuring the recordings met global machine-learning standards.
- University of Ghana (Ghana) and Media Trust (Nigeria): These institutions led the large-scale data collection pipelines across West Africa for languages such as Yoruba and Hausa.
The Linguistic Map: 21+ Languages
The WAXAL release adds 1,250 hours of natural, transcribed speech to the open-source library. These 21 languages now sit alongside Kinyarwanda, which had already set the pace for the continent with 2,400 hours of open-source speech data.
The dataset covers the following languages: Acholi, Akan, Dagaare, Dagbani, Dholuo, Ewe, Fante, Fulani (Fula), Hausa, Igbo, Ikposo (Kposo), Kikuyu, Lingala, Luganda, Malagasy, Masaaba, Nyankole, Rukiga, Shona, Soga (Lusoga), Swahili, and Yoruba.
The Tech Breakdown
To help developers build inclusive tools, WAXAL provides data for the two critical layers of voice technology:
- Automatic Speech Recognition (ASR): This is the computer’s "ear." By capturing natural speech in real-world environments like markets and streets, WAXAL allows developers to build AI that understands authentic local accents and dialects.
- High-Fidelity Synthetic Voices (TTS): This is the computer’s "mouth." With over 20 hours of studio-quality recordings, WAXAL enables devices to talk back in a natural, human-like voice, which is essential for reaching populations with lower literacy levels.
Impact and the Future
The launch of WAXAL marks a definitive shift in the digital landscape. As Aisha Walcott-Bryant, Head of Google Research Africa, noted:
“The ultimate impact of WAXAL is the empowerment of people in Africa. This dataset provides the critical foundation for students, researchers, and entrepreneurs to build technology on their own terms, in their own languages, finally reaching over 100 million people.”
This is going to flip the script for African users and developers alike. By moving this foundation into local hands, WAXAL has officially become the new backbone for a continent-wide digital economy. Local innovators can now build voice-activated healthcare, banking, and education tools without relying on foreign gatekeepers. As of today, the conversation in African tech is finally being led by the people who live it, ensuring that a more inclusive digital future is built by Africa, for Africa.
Download the dataset and learn more: WAXAL on Hugging Face.
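For developers who want to try the data, the download step can be sketched in Python with the Hugging Face `datasets` library. This is a minimal sketch, not an official recipe: the repository id `google/WaxalNLP`, the `<language>_<task>` config naming, and the helper names `config_name` and `load_waxal` are assumptions for illustration and are not confirmed by this article. Check the dataset page on Hugging Face for the actual identifiers.

```python
# Assumption: placeholder repository id; verify it on the Hugging Face dataset page.
REPO_ID = "google/WaxalNLP"


def config_name(lang_code: str, task: str) -> str:
    """Build a hypothetical config name such as 'lug_asr' (Luganda, ASR)."""
    task = task.lower()
    if task not in {"asr", "tts"}:
        raise ValueError("task must be 'asr' or 'tts'")
    return f"{lang_code.lower()}_{task}"


def load_waxal(lang_code: str, task: str = "asr", split: str = "train",
               repo_id: str = REPO_ID):
    """Stream one language/task subset without downloading everything up front."""
    # Imported here so the helper above works without the library installed.
    from datasets import load_dataset  # pip install datasets

    return load_dataset(
        repo_id,
        config_name(lang_code, task),  # assumed config naming scheme
        split=split,
        streaming=True,  # iterate lazily instead of fetching all audio first
    )
```

Under those assumptions, `load_waxal("lug", "asr")` would return an iterable of examples for one language. Streaming is used here because speech corpora are large, and it lets you inspect a few samples before committing to a full download.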


