In November 2024, while working on his project Naijaweb, a dataset of 230 million GPT-2 tokens sourced from Nairaland, Saheed Azeez shrugged off the complexity with casual ease. “It’s just web scraping,” he said, as if building one of Nigeria’s largest GPT-2 datasets was a weekend hobby. But behind that simplicity was a mind constantly reaching for the next challenge.
Fast forward to his latest passion project, YarnGPT, and a different story emerges: one of grit, failures, and breakthroughs. Unlike Naijaweb, YarnGPT isn’t just about data. It’s a text-to-speech AI model designed to read text aloud in authentic Nigerian accents.
Why YarnGPT Matters
At first glance, a Nigerian-accented text-to-speech model might not seem revolutionary. After all, AI can generate lifelike voices in seconds these days. But when you consider two key facts, YarnGPT becomes extraordinary:
- Azeez is a university student in Nigeria, juggling classes, limited resources, and tight budgets.
- Capturing the rich, nuanced diversity of Nigerian accents is technically challenging.
From complex mathematical algorithms to the painstaking process of tokenizing audio data, building YarnGPT was anything but simple. Even Azeez, known for his modesty, admitted, “It was quite tasking, especially gathering the data needed to make this happen.”
How YarnGPT Came to Life
The spark for YarnGPT was lit by Naijaweb’s unexpected success. “The amount of conversations and interest people had in Naijaweb was a great motivation. Imagine getting featured on Techpoint Africa; it pushed me to do more,” Azeez shared.
But motivation also came from failure. After a disappointing job interview with a Nigerian AI company, Azeez realized he needed to sharpen his skills. YarnGPT became both a project and a personal proving ground.
The first hurdle? Data.
Building an AI model that sounds Nigerian required vast audio datasets. Azeez turned to Nollywood, extracting audio and subtitles from online movies. Nollywood churns out over 2,500 films annually, yet quality data was scarce.
“The problem with building in Nigeria is data. Replicating what’s been built overseas isn’t that hard, but data always gets in the way,” he explained.
Subtitles were often inaccurate, and audio quality varied wildly. To bridge the gap, Azeez leaned on Hugging Face, a platform that hosts open-source machine learning models and datasets. He mixed Nigerian movie audio with high-quality datasets from Hugging Face to train his model.
The Cost of Ambition
Without a personal GPU, Azeez relied on cloud computing platforms like Google Colab. The first attempt cost him $50 (about ₦80,000)—a hefty sum for a university student. Worse still, the model didn’t work as expected.
“The $50 cloud credit was burnt just like that. It was painful,” he admitted.
But setbacks didn’t deter him. Azeez discovered Oute AI, a platform with an autoregressive text-to-speech model.
“Basically, you give the model a piece of text, and it predicts one word at a time, adding each word back to the text to predict the next. It’s like how ChatGPT completes sentences,” he explained.
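That loop is easy to sketch in code. The snippet below is purely illustrative: it assumes a Hugging Face causal language model, and the GPT-2 checkpoint and sample prompt are stand-ins, not anything from YarnGPT itself.

```python
# A minimal sketch of autoregressive generation: predict one token, append it,
# and predict again. The model and prompt here are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("How far, my people", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):
        logits = model(ids).logits                                # scores for every candidate next token
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)   # greedily pick the most likely one
        ids = torch.cat([ids, next_id], dim=-1)                   # feed it back in and go again
print(tokenizer.decode(ids[0]))
```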
The Hard Part: Tokenizing Sound
While Oute AI provided a framework, Azeez still had to build his own model. Using SmolLM2-360M from Hugging Face, he added speech functionality—a process that required algorithmic wizardry and another $50 for training.
Training took three days.
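The article doesn’t spell out exactly how the speech side was wired in, but a common recipe for this family of models is to extend a text LLM’s vocabulary with discrete audio tokens so the same next-token machinery can emit sound. Here is a hedged sketch of that idea using the SmolLM2-360M checkpoint mentioned above; the `<audio_*>` token names and the codebook size of 1,024 are assumptions, not YarnGPT’s actual configuration.

```python
# A rough sketch of one common way to add speech to a text LLM: grow the
# vocabulary with placeholder audio tokens so the model can learn to predict
# them like words. This is not necessarily YarnGPT's exact recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-360M")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-360M")

audio_vocab = [f"<audio_{i}>" for i in range(1024)]   # codebook size is an assumption
tokenizer.add_tokens(audio_vocab)
model.resize_token_embeddings(len(tokenizer))         # make room for the new tokens

# Training would then pair text prompts with sequences of <audio_*> tokens,
# which a separate decoder turns back into a waveform.
```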
Tokenization was the backbone of YarnGPT. Large language models (LLMs) process numbers, not words.
“If we tokenized the word CALCULATED, we could split it into four tokens: CAL-CU-LA-TED. A number is assigned to each token,” Azeez explained.
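In code, the idea looks something like this, with the GPT-2 tokenizer as a stand-in; the exact splits and the numbers assigned vary from tokenizer to tokenizer, so treat the output as an example rather than YarnGPT’s actual behaviour.

```python
# Split a word into sub-word pieces and map each piece to a number.
# The exact splits depend on which tokenizer is used.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
pieces = tok.tokenize("CALCULATED")        # sub-word chunks of the word
ids = tok.convert_tokens_to_ids(pieces)    # each chunk gets its own number
print(list(zip(pieces, ids)))
```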
But tokenizing audio is a different beast.
“Audio doesn’t have natural breaks like text. So, we break continuous sound waves into smaller pieces the model can understand. It’s like turning a long speech into tiny puzzle pieces,” he said.
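As a toy illustration of those “puzzle pieces,” the sketch below chops a waveform into 20-millisecond frames and assigns each one a crude id based on its energy. A real text-to-speech system would use a learned neural audio codec instead, and the file name and frame size here are made up.

```python
# Toy illustration only: slice a continuous waveform into 20 ms pieces and
# give each piece a crude discrete id. Real TTS models use a learned codec.
import numpy as np
import soundfile as sf

wave, sr = sf.read("clip.wav")              # continuous sound wave (file name assumed)
frame = int(0.02 * sr)                      # 20 ms per piece
pieces = [wave[i:i + frame] for i in range(0, len(wave) - frame + 1, frame)]

energies = np.array([float(np.mean(p ** 2)) for p in pieces])
tokens = np.digitize(energies, np.quantile(energies, [0.25, 0.5, 0.75]))
print(f"{len(pieces)} pieces -> first token ids: {tokens[:10]}")
```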
With resources from Hugging Face, Oute AI, and Nigerian repositories, YarnGPT was born.
Going Public: The Power of a Video
Despite his technical prowess, Azeez isn’t just a behind-the-scenes coder. To showcase YarnGPT, he shot a simple two-minute video with friends.
“I called my friend Aremu, and we found someone with an unused camera. We shot the video in another friend’s living room, rearranged the whole place, and even used their TV as the backdrop. His mum wasn’t too happy when she got back,” Azeez laughed.
The video went viral, racking up 138,000 views on X (formerly Twitter) and catching the attention of industry leaders like Timi Ajiboye, Co-founder of Helicarrier (formerly BuyCoins).
Why YarnGPT Is a Game Changer
YarnGPT isn’t just about accents. It can read Nigerian languages like Hausa, Igbo, and Yoruba with impressive accuracy.
Potential applications are vast:
- Voice-overs for content creators
- Navigation aids for apps like Google Maps in Nigerian languages
- Accessibility tools for non-English speakers
Nigeria in the Global AI Race
While talents like Azeez and Ijemma Onwuzulike (creator of Igbo Speech) are pushing boundaries, Nigeria lags behind in the global AI race. The U.S. has announced AI infrastructure investments of up to $500 billion. Meanwhile, innovations like DeepSeek are disrupting markets, shaking even tech giants like Nvidia.
“Honestly, we’re way off. We’re not even in the race,” Azeez admitted. “But there’s hope. Instead of building from scratch, we can localize AI for our needs—adapting existing models for Nigerian languages and cultures.”
Nigeria’s Minister of Communications, Innovation and Digital Economy, Bosun Tijani, is vocal about positioning the country as a key AI player. With brilliant minds like Azeez, maybe we’re closer than we think.
YarnGPT isn’t just an AI model; it’s a testament to what’s possible when passion meets persistence. And for Azeez, this is just the beginning.