Multilingual Clean Talking-Head Video Dataset

Synthetic Images & Vision Datasets

Tags and Keywords

Talking_head_dataset, Lip_audio_recognition, Speech_synthesis_training, Semantic_driven_video_generation, Multilingual_video

Multilingual Clean Talking-Head Video Dataset on Opendatabay data marketplace

£99

About

Description

This is a synthetic dataset of front-facing talking-head videos that simulate real people speaking, with no background noise.
Each sample is a single-speaker video with clear lip movements and accurate audio-visual synchronization. The current release is in English; the provider states it can be extended to other languages, including low-resource languages.
Video compositions include full-body, half-body, and close-up views. All samples contain no background noise, no watermarks, and no subtitles, ensuring clean and precise alignment between speech and lip movements.
The dataset is suitable for multimodal AI tasks such as speech recognition, lip reading, and lip-driven video generation.

Keywords

Phrases: talking head dataset, lip audio recognition, speech synthesis training, semantic driven video generation, multilingual video

Application Scenarios

  • Intelligent speech synthesis system training
  • Lip-driven video generation model training
  • Audio-visual synchronization and alignment algorithm validation
  • Research and development for multimodal interaction tasks (speech + video)
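
As an illustration of the synchronization-validation scenario, the sketch below estimates the audio-visual offset of a clip by cross-correlating two per-frame signals. It is a minimal sketch, not the provider's method: the function name `best_lag` and both input signals (per-frame audio energy and a per-frame mouth-opening measure) are hypothetical, and extracting them from a video is left to whatever audio and landmark tooling you already use.

```python
def best_lag(audio_energy, mouth_opening, max_lag):
    """Return the lag in frames at which mouth_opening best matches audio_energy.

    A positive result means the mouth signal trails the audio.
    Both inputs are hypothetical per-frame measurements of equal rate.
    """
    def centered(values):
        mean = sum(values) / len(values)
        return [v - mean for v in values]

    a = centered(audio_energy)
    m = centered(mouth_opening)
    n = min(len(a), len(m))

    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        total, count = 0.0, 0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += a[i] * m[j]
                count += 1
        if count == 0:
            continue
        score = total / count  # mean product over the overlapping frames
        if score > best_score:
            best, best_score = lag, score
    return best


# Toy signals: the mouth signal is the audio signal delayed by two frames.
audio = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
mouth = [0, 0] + audio[:-2]

lag_frames = best_lag(audio, mouth, max_lag=5)
offset_ms = lag_frames / 30 * 1000  # 30 FPS, per the Data Specifications
print(lag_frames, round(offset_ms, 1))
```

For a dataset advertised as accurately synchronized, the estimated lag on real clips should stay close to zero; a consistent non-zero lag would be worth raising with the provider.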

Collection Conditions and Sample Structure

  • Composition Types: single speaker, front-facing; full-body, half-body, and close-up views
  • Camera Motion: slight viewpoint variation following speaker movement (simulated head motion)
  • Environment Types: indoor and outdoor scenes with natural lighting
  • Language Types: English (extensible to Spanish, Malay, Vietnamese, Indonesian, Burmese, and others)
  • Background Conditions: no background noise, no subtitles, no watermarks

Data Specifications

  • Resolution: 1920 × 1080
  • Frame Rate: 30 FPS
  • Video Format: MP4 container, H.264 codec
  • Color Space: RGB
  • Capture Method: high-definition camera simulation
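
Buyers may want to confirm that delivered files match these specifications. The sketch below compares the JSON that `ffprobe` prints for a file's first video stream against the listed values. It assumes the JSON was produced with a command such as `ffprobe -v error -select_streams v:0 -show_entries stream=codec_name,width,height,r_frame_rate -of json clip.mp4`; the file name `clip.mp4`, the helper names, and the sample JSON string are illustrative, not part of the dataset.

```python
import json

# Expected values taken from the Data Specifications list.
EXPECTED = {"codec_name": "h264", "width": 1920, "height": 1080, "fps": 30.0}


def parse_rate(rate):
    """Convert an ffprobe rate string such as '30/1' to a float."""
    num, _, den = rate.partition("/")
    return float(num) / float(den or 1)


def check_stream(ffprobe_json):
    """Return a list of mismatches between the first video stream and EXPECTED."""
    stream = json.loads(ffprobe_json)["streams"][0]
    actual = {
        "codec_name": stream["codec_name"],
        "width": stream["width"],
        "height": stream["height"],
        "fps": round(parse_rate(stream["r_frame_rate"]), 2),
    }
    return [
        f"{key}: expected {want}, got {actual[key]}"
        for key, want in EXPECTED.items()
        if actual[key] != want
    ]


# Illustrative ffprobe output for a conforming clip (not real dataset output).
sample = '{"streams": [{"codec_name": "h264", "width": 1920, "height": 1080, "r_frame_rate": "30/1"}]}'
print(check_stream(sample))
```

An empty list means the stream matches the listing. Color space is left out of the check because H.264 files normally store YUV pixel formats, so the listed RGB value cannot be compared directly.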

Dataset Statistics

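  • Total Videos: 841 clips (over ~3.5 hours this averages about 240 clips per hour, or roughly 15 seconds per clip; the listing's stated rate of 150 clips per hour would give only about 525 clips)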
  • Total Duration: ~3.5 hours
  • Data Volume: ~20 GB
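
The statistics above can be cross-checked with simple arithmetic. The sketch below derives the average clip rate, clip length, and bitrate from the listed totals. It assumes decimal gigabytes (1 GB = 8000 megabits) and treats the approximate figures as exact, so the results are rough estimates only.

```python
TOTAL_CLIPS = 841
TOTAL_HOURS = 3.5      # listed as approximate
TOTAL_GB = 20          # listed as approximate; decimal GB assumed
STATED_RATE = 150      # clips per hour, as quoted in the listing


def clips_per_hour(clips, hours):
    return clips / hours


def mean_clip_seconds(clips, hours):
    return hours * 3600 / clips


def mean_bitrate_mbps(gigabytes, hours):
    # 1 decimal GB = 8000 megabits
    return gigabytes * 8000 / (hours * 3600)


print(f"clips per hour: {clips_per_hour(TOTAL_CLIPS, TOTAL_HOURS):.1f}")
print(f"mean clip length: {mean_clip_seconds(TOTAL_CLIPS, TOTAL_HOURS):.1f} s")
print(f"mean bitrate: {mean_bitrate_mbps(TOTAL_GB, TOTAL_HOURS):.1f} Mbit/s")
print(f"clips implied by stated rate: {STATED_RATE * TOTAL_HOURS:.0f}")
```

The clip count implied by the stated rate (525) does not match the listed total of 841, so it is worth confirming the actual clip count with the provider before purchase.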

Additional Notes & Services

  • Instant Access: After purchase, you will receive a Google Drive download link for immediate access.
  • Usage Policy: Please adhere to all ethical standards and privacy regulations. Preprocessing may be required.
  • Actively Maintained: This dataset is continuously updated. Contact us for the latest version.
  • Full Customization Available: We can tailor image formats, annotations, and other specs to your project needs.
  • Flexible Delivery: We offer split packages and delivery via private server or cloud storage.
  • Free Sample Package: Available for qualified buyers to verify data quality.
  • Contact Us: For inquiries, customization, or samples, email us at contact4data-project@join-intelligence.com
  • Explore All Datasets: Visit our Notion Collection
  • Official Website: https://join-intelligence.com/

Listing Stats

  • Views: 3
  • Downloads: 0
  • Listed: 26/12/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
