Data catalogue

African Language Speech Datasets

Commercially licensed Twi, Wolof, and Fon speech data, validated and shipping today, with volume still scaling. Every African audio dataset is speaker-verified, AI quality-filtered at capture, and delivered under a commercial license with an SLA.

Available now

Twi, Wolof & Fon speech datasets shipping today

Twi (Akan) Speech Dataset

Ghana · ISO 639: ak

Available · scaling

Natural, code-switched Akan Twi speech for Ghanaian voice and ASR systems.

For: Ghanaian ASR training data & local-language IVR

Modality
Conversational & image-prompted speech
Format
WAV · 16 kHz + timestamped transcripts (JSON)
Licensing
Commercial license + SLA
Quality
>80% inter-annotator agreement
Sourcing
Fair-Trade, native speakers
Delivery
API or S3

Part of our 128-hour, speaker-verified Twi, Wolof & Fon corpus, fully annotated and growing.

Wolof Speech Dataset

Senegal · ISO 639: wo

Available · scaling

Everyday Wolof speech, including French code-switching, for CPaaS platforms serving Senegal.

For: Wolof IVR Senegal & West African ASR training data

Modality
Conversational & image-prompted speech
Format
WAV · 16 kHz + timestamped transcripts (JSON)
Licensing
Commercial license + SLA
Quality
>80% inter-annotator agreement
Sourcing
Fair-Trade, native speakers
Delivery
API or S3

Part of our 128-hour, speaker-verified Twi, Wolof & Fon corpus, fully annotated and growing.

Fon Speech Dataset

Benin · ISO 639: fon

Available · scaling

Natural, tonal Fon speech for voice and ASR systems serving Benin and the wider Gbe-speaking belt.

For: Beninese ASR training data & local-language IVR

Modality
Conversational & image-prompted speech
Format
WAV · 16 kHz + timestamped transcripts (JSON)
Licensing
Commercial license + SLA
Quality
>80% inter-annotator agreement
Sourcing
Fair-Trade, native speakers
Delivery
API or S3

Part of our 128-hour, speaker-verified Twi, Wolof & Fon corpus, fully annotated and growing.
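
For teams planning ingestion, the format listed above (WAV at 16 kHz with timestamped JSON transcripts) can be sanity-checked on arrival with a few lines of Python. The sketch below is illustrative only: the transcript layout (a "segments" list with "start", "end", and "text" fields, times in seconds) and all function names are assumptions made for this example, not the documented delivery schema, so adjust them to the files you actually receive.

```python
import io
import json
import wave

EXPECTED_RATE_HZ = 16_000  # sample rate stated in the catalogue


def inspect_wav(source):
    """Return (sample_rate_hz, duration_seconds) for a WAV path or binary file object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate


def segments_within(transcript, duration):
    """Check that every segment satisfies 0 <= start < end <= duration.

    Assumed (hypothetical) schema:
    {"segments": [{"start": float, "end": float, "text": str}, ...]}
    """
    return all(
        0 <= seg["start"] < seg["end"] <= duration
        for seg in transcript["segments"]
    )


def make_silent_wav(seconds, rate=EXPECTED_RATE_HZ):
    """Build a mono, 16-bit silent WAV in memory, used here only as stand-in data."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    audio = make_silent_wav(1.0)
    rate, duration = inspect_wav(io.BytesIO(audio))
    # Placeholder transcript in the assumed schema, not real dataset content.
    transcript = json.loads(
        '{"segments": [{"start": 0.0, "end": 0.8, "text": "placeholder"}]}'
    )
    print("sample rate ok:", rate == EXPECTED_RATE_HZ)
    print("timestamps ok:", segments_within(transcript, duration))
```

Running the same two checks on each delivered file pair catches resampled audio and transcript timestamps that run past the end of the recording before they reach a training pipeline.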

Mission-Based Collection

More African Languages Coming

Need a language that isn't in our catalogue yet? We scope and staff a custom collection mission with our native-speaker network, applying the same image-prompted elicitation, AI quality filtering, and >80% agreement standard as our shipping datasets.

  • Swahili
    Tanzania
  • Hausa
    Nigeria
  • Amharic
    Ethiopia
  • Yoruba
    Nigeria
  • Zulu
    South Africa
  • Fulfulde
    Senegal
  • Ewe
    Togo
  • Shona
    Zimbabwe
  • Lingala
    DR Congo
  • Igbo
    Nigeria
  • Oromo
    Ethiopia
  • Tigrinya
    Ethiopia
  • Somali
    Somalia
  • Xhosa
    South Africa
  • Luganda
    Uganda
  • Kikuyu
    Kenya
  • Krio
    Sierra Leone
  • Bambara
    Mali
  • Kinyarwanda
    Rwanda

Typical lead time: 4–8 weeks from kickoff to first delivery.

Don't see the dataset you need?

Email us at contact@afriklang.com
