POV Robotic Behavior Video Dataset
Synthetic Images & Vision Datasets
£30,000
About
Description:
This dataset consists of first-person (POV) generated video content covering various daily life scenarios such as desk organization, household cleaning, and general home activities. The videos simulate human-perspective visual input, providing diverse embodied interaction data for Vision-Language-Action (VLA) model training and multimodal perception research.
It captures realistic household contexts with diverse participants and environments, emphasizing spatial awareness, hand-object interactions, and motion continuity. The dataset supports a wide range of embodied intelligence and interactive AI applications.
This dataset contains over 20,000 high-quality annotated images. The estimated price is approximately €8,000 and can be adjusted depending on delivery format, licensing scope, and customization requirements.
Keywords: POV, first-person view, daily activity, model training, VLA
Application Scenarios:
- Multimodal video analysis
- Action recognition and human-object interaction modeling
- Interactive AI and embodied learning model training
- Video understanding and scene reasoning tasks
Resolution: 1920×1080
Frame Rate (FPS): 30 FPS
Video Format / Codec: MP4 (H.264)
Total Videos: 1,000,000+ short clips (6–8 seconds each; ~500 clips per hour of footage; ~2,000 hours in total)
Total Duration: 2,000+ hours
Data Volume: 2 TB+
Unique Individuals / Scenes: diverse participants and multiple household behavior categories
Annotation Status: annotated (multimodal)
- Video descriptions (action category + interactive object + spatial context)
Label Categories / Distribution: fitness, cooking, teaching, makeup, walking, observation activities
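As a rough sanity check on how the specification fields relate, the sketch below recomputes total footage, frames per clip, and the implied average bitrate from the listed figures. It assumes an average clip length of 7 seconds (the midpoint of 6–8 s) and decimal terabytes (1 TB = 10^12 bytes); the function names are hypothetical helpers, not part of any tooling shipped with the dataset.

```python
# Back-of-envelope consistency check of the listed specs.
# Figures are copied from the listing; "+" values are treated as lower bounds.
CLIPS = 1_000_000          # "1,000,000+ short clips"
CLIP_SECONDS = (6, 8)      # each clip is 6-8 seconds long
FPS = 30                   # listed frame rate
DATA_BYTES = 2 * 10**12    # "2 TB+"; ASSUMPTION: decimal terabytes
LISTED_HOURS = 2000        # "2,000+ hours"


def total_hours(clips: int, seconds_per_clip: float) -> float:
    """Total footage in hours for `clips` clips of the given length."""
    return clips * seconds_per_clip / 3600


def frames_per_clip(seconds: float, fps: int) -> int:
    """Number of frames in a clip of the given duration at the given FPS."""
    return round(seconds * fps)


def avg_bitrate_mbps(data_bytes: int, hours: float) -> float:
    """Average bitrate in Mbit/s implied by total data size and total duration."""
    return data_bytes * 8 / (hours * 3600) / 1e6


if __name__ == "__main__":
    lo, hi = CLIP_SECONDS
    mid = (lo + hi) / 2  # ASSUMPTION: 7 s average clip length
    print(f"total footage: {total_hours(CLIPS, lo):.0f}-{total_hours(CLIPS, hi):.0f} h "
          f"(~{total_hours(CLIPS, mid):.0f} h at {mid:.0f} s/clip)")
    print(f"clips per hour of footage at {mid:.0f} s/clip: {3600 / mid:.0f}")
    print(f"frames per clip: {frames_per_clip(lo, FPS)}-{frames_per_clip(hi, FPS)}")
    print(f"implied average bitrate: {avg_bitrate_mbps(DATA_BYTES, LISTED_HOURS):.2f} Mbit/s")
```

At 6–8 s per clip, 1,000,000 clips correspond to roughly 1,667–2,222 hours, which brackets the stated 2,000+ hours; a 7-second average gives about 514 clips per hour of footage, close to the listed ~500. Each clip holds 180–240 frames at 30 FPS, and 2 TB spread over 2,000 hours implies an average H.264 bitrate of about 2.2 Mbit/s.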