Multilingual Clean Talking-Head Video Dataset
Synthetic Images & Vision Datasets
£99
About
Description
This is a synthetic dataset of front-facing talking-head videos that simulate real speakers, with no background noise.
The samples are single-speaker videos with clear lip movements and accurate audio-visual synchronization. The current release is in English and can be extended to other languages, including low-resource languages.
Video compositions include full-body, half-body, and close-up views. All samples contain no background noise, no watermarks, and no subtitles, ensuring clean and precise alignment between speech and lip movements.
The dataset is suitable for multimodal AI tasks such as speech recognition, lip reading, and lip-driven video generation.
Keywords
Phrases: talking head dataset, lip audio recognition, speech synthesis training, semantic driven video generation, multilingual video
Underscored: talking_head_dataset lip_audio_recognition speech_synthesis_training semantic_driven_video_generation multilingual_video
Application Scenarios
- Intelligent speech synthesis system training
- Lip-driven video generation model training
- Audio-visual synchronization and alignment algorithm validation
- Research and development for multimodal interaction tasks (speech + video)
Collection Conditions and Sample Structure
- Composition Types: single speaker, front-facing; full-body, half-body, and close-up views
- Camera Motion: slight viewpoint variation following speaker movement (simulated head motion)
- Environment Types: indoor and outdoor scenes with natural lighting
- Language Types: English (extensible to Spanish, Malay, Vietnamese, Indonesian, Burmese, and others)
- Background Conditions: no background noise, no subtitles, no watermarks
Data Specifications
- Resolution: 1920 × 1080
- Frame Rate: 30 FPS
- Video Format: MP4 container, H.264 codec
- Color Space: RGB
- Capture Method: high-definition camera simulation
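A buyer may want to confirm that a delivered clip matches the specifications listed above. The following is a minimal sketch, not the vendor's tooling: it assumes the clip has been probed with `ffprobe` and its JSON output captured as a string. The function name `check_video_stream` and the file name `clip.mp4` are hypothetical.

```python
import json
from fractions import Fraction

# Expected values copied from the Data Specifications list.
EXPECTED = {"codec_name": "h264", "width": 1920, "height": 1080, "fps": Fraction(30)}


def check_video_stream(ffprobe_json: str) -> list[str]:
    """Return the mismatches between a clip's first video stream and the listed specs."""
    stream = json.loads(ffprobe_json)["streams"][0]
    actual = {
        "codec_name": stream["codec_name"],
        "width": int(stream["width"]),
        "height": int(stream["height"]),
        # ffprobe reports frame rates as a ratio string such as "30/1".
        "fps": Fraction(stream["r_frame_rate"]),
    }
    return [
        f"{key}: expected {want}, got {actual[key]}"
        for key, want in EXPECTED.items()
        if actual[key] != want
    ]


# Hand-written sample in the shape produced by, for example:
#   ffprobe -v error -select_streams v:0 \
#     -show_entries stream=codec_name,width,height,r_frame_rate -of json clip.mp4
sample = '{"streams": [{"codec_name": "h264", "width": 1920, "height": 1080, "r_frame_rate": "30/1"}]}'
print(check_video_stream(sample))  # an empty list means the clip matches the specs
```

Comparing the frame rate as a fraction avoids floating-point rounding issues with rates such as 30000/1001.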
Dataset Statistics
- Total Videos: 841 clips (about 240 clips per hour of footage, averaging roughly 15 seconds per clip)
- Total Duration: ~3.5 hours
- Data Volume: ~20 GB
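The clip count, total duration, and data volume together imply rough per-clip averages, which can help when budgeting storage or download time. The short calculation below is a sketch that uses only those three listed figures. It assumes "GB" means decimal gigabytes, and because the duration and volume are approximate, the results are estimates.

```python
# Figures taken from the Dataset Statistics list.
TOTAL_CLIPS = 841
TOTAL_HOURS = 3.5  # approximate
TOTAL_GB = 20      # approximate; assumed to be decimal gigabytes (1 GB = 1000 MB)

total_seconds = TOTAL_HOURS * 3600

avg_clip_seconds = total_seconds / TOTAL_CLIPS      # average clip length
clips_per_hour = TOTAL_CLIPS / TOTAL_HOURS          # clips per hour of footage
avg_clip_mb = TOTAL_GB * 1000 / TOTAL_CLIPS         # average file size per clip
avg_bitrate_mbps = TOTAL_GB * 8000 / total_seconds  # average bitrate in Mbit/s

print(f"Average clip length: {avg_clip_seconds:.1f} s")
print(f"Clips per hour:      {clips_per_hour:.0f}")
print(f"Average clip size:   {avg_clip_mb:.1f} MB")
print(f"Average bitrate:     {avg_bitrate_mbps:.1f} Mbit/s")
```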
Additional Notes & Services
- Instant Access: After purchase, you will receive a Google Drive download link for immediate access.
- Usage Policy: Please adhere to all ethical standards and privacy regulations. Preprocessing may be required.
- Actively Maintained: This dataset is continuously updated. Contact us for the latest version.
- Full Customization Available: We can tailor video formats, annotations, and other specs to your project needs.
- Flexible Delivery: We offer split packages and delivery via private server or cloud storage.
- Free Sample Package: Available for qualified buyers to verify data quality.
- Contact Us: For inquiries, customization, or samples, email us at contact4data-project@join-intelligence.com
- Explore All Datasets: Visit our Notion Collection
- Official Website: https://join-intelligence.com/
