TikTok Video Classification Dataset
Social Media and Posts
Free
About
This dataset presents TikTok video data, including user engagement metrics, designed for classifying claims and opinions in short-form mobile videos. It supports TikTok's mission to cultivate an inclusive, joyful, and authentic content environment in which users can safely discover, create, and connect. The dataset helps distinguish personal beliefs (opinions) from unsourced or unverified information (claims) published on the platform.
Columns
- # (Integer): A unique identification number assigned by TikTok to each video that contains a claim or an opinion.
- claim_status (Object): Categorises the content of a published video as either an “opinion” (representing a personal belief or thought) or a “claim” (indicating information that is unsourced or from an unverified origin).
- video_id (Integer): A random identifying number assigned to a video upon its publication on the TikTok platform.
- video_duration_sec (Integer): Measures the length of the published video in seconds.
- video_transcription_text (Object): Provides the transcribed textual content of the words spoken within the published video.
- verified_status (Object): Denotes the verification status of the TikTok user who published the video, either “verified” or “not verified”.
- author_ban_status (Object): Indicates the current permissions status of the user who published the video, categorised as “active”, “under scrutiny”, or “banned”.
- video_view_count (Float): The total number of times the published video has been viewed by other users.
- video_like_count (Float): The total number of times the published video has received likes from other users.
- video_share_count (Float): The total number of times the published video has been shared by other users.
- video_download_count (Float): The total number of times the published video has been downloaded by other users.
- video_comment_count (Float): The total number of comments posted on the published video.
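The column list above can be turned into a small typed reader. The following is a minimal sketch using only the Python standard library. The column names and types come from the list; the helper names (`parse_row`, `load_rows`, `missing_counts`), the rule that empty cells become `None`, and the sample rows are assumptions made for illustration, not part of the dataset.

```python
import csv
import io
from collections import Counter

# Column names and types as listed above; the parsing rules are assumptions.
INT_COLUMNS = ["#", "video_id", "video_duration_sec"]
FLOAT_COLUMNS = [
    "video_view_count", "video_like_count", "video_share_count",
    "video_download_count", "video_comment_count",
]
TEXT_COLUMNS = [
    "claim_status", "video_transcription_text",
    "verified_status", "author_ban_status",
]


def parse_row(raw):
    """Convert one csv.DictReader row to typed values; empty cells become None."""
    row = {}
    for name in INT_COLUMNS:
        row[name] = int(raw[name]) if raw.get(name) else None
    for name in FLOAT_COLUMNS:
        row[name] = float(raw[name]) if raw.get(name) else None
    for name in TEXT_COLUMNS:
        row[name] = raw.get(name) or None
    return row


def load_rows(handle):
    """Read typed rows from an open text file or any iterable of CSV lines."""
    return [parse_row(raw) for raw in csv.DictReader(handle)]


def missing_counts(rows):
    """Count None values per column."""
    counts = Counter()
    for row in rows:
        for name, value in row.items():
            if value is None:
                counts[name] += 1
    return dict(counts)


# Tiny made-up sample standing in for the real file (values are illustrative only).
SAMPLE = io.StringIO(
    "#,claim_status,video_id,video_duration_sec,video_transcription_text,"
    "verified_status,author_ban_status,video_view_count,video_like_count,"
    "video_share_count,video_download_count,video_comment_count\n"
    "1,claim,1001,30,example text,not verified,active,100.0,10.0,2.0,1.0,0.0\n"
    "2,,1002,15,,verified,banned,,,,,\n"
)
rows = load_rows(SAMPLE)
print(missing_counts(rows))
```

To run this against the actual file, pass an open file handle to `load_rows` instead of `SAMPLE`, for example one obtained from `open("tiktok_dataset.csv", newline="", encoding="utf-8")`. The encoding is an assumption and may need adjusting.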
Distribution
The dataset is supplied as a CSV file named tiktok_dataset.csv, with a file size of 3.08 MB. It comprises 12 distinct columns and contains approximately 19,400 records. Most columns have valid data for around 19,400 entries, while some have valid data for only about 19,100 because of missing values.
Usage
This dataset is ideally suited for analysing user engagement patterns and content characteristics on TikTok. It can be utilised for:
- Content moderation and classification: Developing systems to identify and categorise claims versus opinions in video content.
- Social media analytics: Understanding how different content types impact user interaction and virality.
- Machine learning model development: Training and testing algorithms for text transcription analysis, user behaviour prediction, and content credibility assessment.
- Research into online information: Studying the spread of unsourced or unverified information on short-form video platforms.
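As one illustration of the analytics use case above, the sketch below averages an engagement metric per `claim_status` value, skipping rows where either field is missing. The function name `mean_by_status` and the sample rows are hypothetical; only the column names come from the dataset description.

```python
from collections import defaultdict
from statistics import mean


def mean_by_status(rows, metric):
    """Average a numeric column per claim_status, skipping rows with missing values."""
    groups = defaultdict(list)
    for row in rows:
        status, value = row.get("claim_status"), row.get(metric)
        if status is not None and value is not None:
            groups[status].append(value)
    return {status: mean(values) for status, values in groups.items()}


# Made-up rows for illustration; real rows would come from tiktok_dataset.csv.
sample_rows = [
    {"claim_status": "claim", "video_view_count": 1000.0},
    {"claim_status": "claim", "video_view_count": 3000.0},
    {"claim_status": "opinion", "video_view_count": 200.0},
    {"claim_status": "opinion", "video_view_count": None},
]
print(mean_by_status(sample_rows, "video_view_count"))
# {'claim': 2000.0, 'opinion': 200.0}
```

The same function works for any of the count columns, such as `video_like_count` or `video_share_count`. A difference in group averages is descriptive only and does not by itself show that content type causes the difference in engagement.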
Coverage
The dataset focuses on video claims and user engagement metrics on TikTok. Specific details regarding geographic scope, time range, or demographic coverage are not provided within the available information.
License
CC0: Public Domain
Who Can Use It
This dataset is highly beneficial for:
- Data Scientists and AI/ML Practitioners: Those looking to build and refine models for natural language processing, content classification, and predictive analytics.
- Social Media Researchers: Academics and analysts investigating platform dynamics, user behaviour, and information dissemination.
- Content Policy Developers: Individuals and teams responsible for establishing and enforcing content guidelines on digital platforms.
- Business Intelligence Analysts: Those seeking insights into audience engagement and content performance on TikTok.
Dataset Name Suggestions
- TikTok Content Claims & Engagement Analytics
- TikTok Video Classification Dataset
- Short-Form Video Claims and User Interaction Data
- TikTok User Engagement Metrics for Content Analysis
- Social Video Content Moderation Dataset
Attributes
Original Data Source Link: TikTok Video Classification Dataset