AI TRAINING DATA

Licensed AI training datasets for machine learning, LLM fine-tuning, NLP models, generative AI, and data-driven applications.

High-quality datasets for AI model training, LLM fine-tuning, NLP systems, RAG pipelines, computer vision, predictive analytics, and generative AI applications. Includes labeled datasets, benchmarking data, and privacy-safe training data designed to improve model accuracy and accelerate AI development.

Marie DeVox

Female Monologue Dataset: Tier 3 | Audio + Transcript Bundle

BEST FOR: Enterprise AI Research Labs & Data Engineers who require a multi-seat department licens...

Number of records: 32
Size: 219.7 MB

Maria Radio Magyarorszag

The Franciscan deal

This dataset consists of 40 original MP3 audio files from my motivational radio show focused on sta...

Number of records: 40
Size: 105.0 MB

Marie DeVox

SaaS Corporate English Vocal Dataset

PROFESSIONAL AI VOICE DATASET - SAAS CORPORATE SERIES Format: LJ Speech Standard Compliance | 24-b...

Number of records: 80
Size: 20.1 MB

Verbalscripts Transcription LLC

Global Conversational Audio Dataset 515,849 Hours Across 100 Languages

Overview The Global Conversational Audio Dataset from Verbalscripts Transcription LLC provides acce...

Number of records: 515.8K
Size: 6.0 GB

Verbalscripts Transcription LLC

African MENA Conversational Audio Dataset 75807 Hours

Overview The African MENA Conversational Audio Dataset provides access to 75,807 available audio ho...

Number of records: 75.8K
Size: 3.0 GB

Verbalscripts Transcription LLC

Top 25 Strategic Languages Conversational Audio Pack 174308 Hours

Overview The Top 25 Strategic Languages Conversational Audio Pack provides access to 174,308 availa...

Number of records: 174.3K
Size: 4.0 GB

Verbalscripts Transcription LLC

Long Tail Multilingual Speech Dataset 423981 Hours 92 Languages

Overview The Long Tail Multilingual Speech Dataset provides access to 423,981 available audio hours...

Number of records: 424K
Size: 2.0 GB

Verbalscripts Transcription LLC

Custom Conversational Audio Collection And Enrichment For AI Training

Overview Verbalscripts Transcription LLC provides custom conversational audio collection, transcrip...

Number of records: 1
Size: 5.8 GB

OTL DATA S.R.L.

Real Industrial Video Dataset for Computer - wooden_window_factory_01

Overview ORION WWF1, short for wooden_window_factory_01, is a certified industrial AI sample datas...

Number of records: 620
Size: 16.1 GB

Day By Day Recovery Resources

Pre AA Addiction Trajectory Dataset 100 JSONL Records for AI Training

This dataset provides 100 structured, scene-level behavioral records modeling the developmental traj...

Number of records: 100
Size: 354.0 KB

SimAIHub

Simaihub Expert Navigation Foundation Pack

Simaihub Expert Navigation Foundation Pack(V1) Product: Reinforcement-learning-ready expert naviga...

Number of records: 1K
Size: 10.6 GB

Krampfstadt Studio

Sport Cars 2 sound recordings Audio Dataset for ML AI training

Shift into high gear with Sport Cars 2, a premium collection featuring 10 iconic sports cars and per...

Number of records: 2.2K
Size: 26.7 GB
