Top 25 Strategic Languages Conversational Audio Pack 174308 Hours

Verbalscripts Transcription LLC

Licensed LLM Data Provider

£950

Name: Top 25 Strategic Languages Conversational Audio Pack 174308 Hours
Creator: Verbalscripts Transcription LLC
Published: 2026-06-14T19:54:50.686Z
License: https://docs.opendatabay.com/ai-training-and-model-development-licenses/commercial-ai-training-and-fine-tuning-data-license

Audio, Speech & Acoustic Datasets

Tags and Keywords

Multilingual

ASR

Training

Global

Voice

Audio

Transcription

Conversation

AI

Metadata

Recognition

Speech-to-text

About

Overview

The African MENA Conversational Audio Dataset provides access to 75,807 available audio hours across key African, Middle Eastern, Francophone, Lusophone, and regional language groups. It is designed for AI companies building speech recognition, speech-to-text, multilingual voice AI, call-center automation, conversational AI, and low-resource language systems for African and MENA markets.

This listing represents a catalog-based commercial dataset. The uploaded sample file is only a preview catalog. The listed price is for a starter licensed delivery package and does not cover the full 75,807 hours. Final pricing depends on the selected languages, volume, enrichment files, delivery format, and licensing scope.

Dataset Contents

Available data may include conversational audio, mixed-down audio, speaker-separated stems where available, transcript and diarization files, language metadata, regional metadata, gender detection metadata, conversation summaries, sentiment analysis, and other structured enrichment fields depending on the licensed subset.

Delivery formats may include WAV, MP3, JSON, CSV, TXT, ZIP, or a custom buyer-ready structure.

Coverage

This dataset covers key African and MENA-related language groups including Arabic, French, Portuguese, Lingala, Yoruba, Hausa, Swahili, Shona, Wolof, Amharic, Malagasy, Afrikaans, and Somali.

Geographic coverage includes Africa, the Middle East, North Africa, West Africa, East Africa, Central Africa, Southern Africa, the Horn of Africa, Francophone Africa, Lusophone Africa, and the Indian Ocean region. Exact coverage varies by selected language and licensed subset.

Source and Collection Method

The data is sourced through vetted contributors, approved data collection partners, rights-cleared supplier relationships, and regional speech-data sourcing programs. Collection methods may include contributor-recorded conversational speech, language-specific audio collection projects, licensed conversational audio sources, and approved regional data collection arrangements.

Collection periods vary by language, region, and source partner. Specific collection-period details can be provided during buyer due diligence for the selected subset.

Consent, Legal Basis, and Rights Chain

The dataset is intended for approved commercial AI and machine learning use cases where the relevant consent, legal basis, supplier authorization, or licensing arrangement is in place. Depending on the specific subset, rights may be supported through contributor consent, rights-holder authorization, supplier license agreements, or project-specific data collection terms.

Subset-level documentation covering provenance, consent or collection basis, geography, permitted use, supplier chain, and licensing rights can be made available during qualified buyer due diligence or upon request, subject to confidentiality and the selected subset.

AI Training Use

The licensed data may be used for African language ASR, MENA speech recognition, speech-to-text model development, voice AI, call-center automation, language identification, speaker diarization, speech analytics, and multilingual LLM speech workflows, subject to the final license agreement.

The data itself may not be resold, redistributed, published, sublicensed, or shared outside the licensed environment unless expressly permitted in writing.

Delivery

Full delivery is handled through secure custom delivery. Delivery size depends on selected languages, selected hours, file formats, and enrichment layers. The uploaded sample file is only a catalog preview and is not the full dataset.

Pricing and Delivery Clarification

The listed price of GBP 950 is a starter licensed delivery package covering up to 25 selected audio hours from this catalog, subject to language availability, licensing scope, enrichment requirements, and delivery format.
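
As a rough illustration of what the starter package implies per hour, the arithmetic can be sketched as below. This is only a sketch under stated assumptions: the function name and the validation rule are hypothetical, the package is treated as a flat GBP 950 regardless of how many of the up-to-25 hours are selected, and any enrichment or licensing surcharges (which the listing says are quoted separately) are ignored.

```python
# Figures taken from the listing text; everything else is an illustrative assumption.
STARTER_PRICE_GBP = 950.0   # listed starter package price
STARTER_MAX_HOURS = 25.0    # "up to 25 selected audio hours"


def effective_rate_per_hour(hours_selected: float,
                            package_price_gbp: float = STARTER_PRICE_GBP,
                            max_hours: float = STARTER_MAX_HOURS) -> float:
    """Return the effective GBP-per-hour cost of the starter package.

    Assumes a flat package price, so selecting fewer hours raises the
    effective hourly rate. Raises ValueError if the selection is not
    within the package limit.
    """
    if not 0 < hours_selected <= max_hours:
        raise ValueError(
            f"hours_selected must be in (0, {max_hours}]; got {hours_selected}"
        )
    return package_price_gbp / hours_selected


if __name__ == "__main__":
    # Using the full 25-hour allowance gives the lowest effective rate.
    print(effective_rate_per_hour(25.0))  # 38.0
    # Selecting only 10 hours under the same flat price costs more per hour.
    print(effective_rate_per_hour(10.0))  # 95.0
```

Under these assumptions, the starter package works out to no less than GBP 38 per audio hour. Actual pricing for any order is set by the seller's quote and the final license agreement.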

This price does not cover any of the full collections shown in the catalog, namely the 515,849-hour, 174,308-hour, 75,807-hour, and 423,981-hour collections.

Larger orders, full-language packages, multi-language packages, and bulk licensing are quoted separately based on selected language, number of audio hours, transcript and diarization requirements, metadata enrichment, audio format, delivery format, and permitted AI/ML usage rights.

The uploaded catalog file is a preview of available coverage only. Full audio data is delivered through secure custom delivery after buyer confirmation, licensing review, and purchase.

Listing Stats

Delivery: Custom, S3
Listed: 14/06/2026
Updated: 16/06/2026
Region: Global
Quality: 5 / 5
