TikTok Moderation Report Analysis

Social Media and Posts

Tags and Keywords

TikTok, Claims, Prediction, Moderation, Social, Media


Free

About

This dataset contains information on user reports of TikTok videos and comments that contain user claims. These reports flag content for moderator review, producing a substantial volume of content that needs timely attention. A predictive model is under development to determine whether a video features a claim or an opinion. A successful model is expected to reduce the backlog of user reports and allow reviews to be prioritised more efficiently. The data is suitable for exploratory data analysis, statistical analysis, and predictive modelling. It was created for pedagogical purposes, to support learning and research in data analysis and machine learning.

Columns

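  • claim: A numerical field ranging from 1 to 19,382, with an average value of 9,690. The range and mean are consistent with a sequential row number rather than a measured quantity.
  • claim_status: A categorical variable indicating whether the content is a 'claim' (approximately 50% of entries) or an 'opinion' (around 49% of entries). The remaining entries (about 2%; the percentages are rounded) are listed as 'Other', which likely corresponds to the missing values noted under Distribution.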
  • video_id: A unique identifier for each video, with values spanning from 1.23 billion to 10.00 billion.
  • video_duration_sec: The length of the video in seconds, ranging from 5 to 60 seconds, with an average duration of 32.4 seconds.
  • video_transcription_text: The text transcription of the video's content. About 2% of entries in this column are null.
  • verified_status: Indicates whether the author's account is 'not verified' (approximately 94%) or 'verified' (about 6%).
  • author_ban_status: Details the author's ban status, categorised as 'active' (around 81%), 'under review' (about 11%), or 'other' (roughly 8%).
  • video_view_count: The number of times a video has been viewed, with counts from 20 to 1,000,000, averaging 255,000 views.
  • video_like_count: The total number of likes a video has received, ranging from 0 to 658,000, with an average of 84,300 likes.
  • video_share_count: The number of times a video has been shared, from 0 to 256,000, averaging 16,700 shares.
  • video_download_count: The number of times a video has been downloaded, ranging from 0 to 15,000, with an average of 1,050 downloads.
  • video_comment_count: The number of comments posted on a video, from 0 to 9,599, averaging 349 comments.
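
Before analysis, it can help to confirm that the file's header matches the columns described above. The following is a minimal sketch using only the Python standard library. The names in EXPECTED_COLUMNS are copied from this listing and are an assumption about the actual CSV header; check_header is a hypothetical helper, and the sample header (including row_id) is synthetic.

```python
import csv
import io

# Column names as given in this listing; the real CSV header may differ.
EXPECTED_COLUMNS = [
    "claim", "claim_status", "video_id", "video_duration_sec",
    "video_transcription_text", "verified_status", "author_ban_status",
    "video_view_count", "video_like_count", "video_share_count",
    "video_download_count", "video_comment_count",
]

def check_header(csv_file, expected=EXPECTED_COLUMNS):
    """Return (missing, unexpected) column names for an open CSV file."""
    header = next(csv.reader(csv_file), [])
    missing = [c for c in expected if c not in header]
    unexpected = [c for c in header if c not in expected]
    return missing, unexpected

# Synthetic header for illustration only; not taken from the real file.
sample = io.StringIO("row_id,claim_status,video_id\n1,claim,1234567890\n")
missing, unexpected = check_header(sample)
print("missing:", missing)
print("unexpected:", unexpected)
```

To run the same check against the downloaded file, open it with open("tiktok_dataset.csv", newline="", encoding="utf-8") and pass the file object to check_header. The UTF-8 encoding is an assumption.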

Distribution

The data is provided as a single tabular CSV file, tiktok_dataset.csv, with a file size of 3.08 MB. It consists of 12 columns and approximately 19,400 records. Five columns have 100% valid entries. The other seven (claim_status, video_transcription_text, video_view_count, video_like_count, video_share_count, video_download_count, and video_comment_count) show about 98% valid entries, with the remaining 2% either missing or mismatched.
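
The share of missing entries per column can be checked with a short script. This is a sketch under stated assumptions: missing_rates is a hypothetical helper, it counts only blank or absent values (not the "mismatched" entries mentioned above), and the sample rows are synthetic.

```python
import csv
import io

def missing_rates(rows, columns):
    """Fraction of rows whose value is absent or blank, per column."""
    rows = list(rows)
    total = len(rows)
    rates = {}
    for col in columns:
        blank = sum(1 for r in rows if not (r.get(col) or "").strip())
        rates[col] = blank / total if total else 0.0
    return rates

# Synthetic rows for illustration only; not taken from the real file.
sample_csv = io.StringIO(
    "claim_status,video_view_count\n"
    "claim,1000\n"
    "opinion,\n"
    ",250\n"
    "claim,40\n"
)
columns = ["claim_status", "video_view_count"]
rates = missing_rates(csv.DictReader(sample_csv), columns)
print(rates)
```

On the real file, a rate near 0.02 for the seven affected columns would agree with the figures quoted in this section.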

Usage

This data is suitable for:
  • Exploratory Data Analysis (EDA): For discovering patterns and characteristics within TikTok user reports.
  • Statistical Analysis: To quantify relationships between various video attributes and their claim status.
  • Predictive Modelling: For building machine learning models that classify videos as containing claims or opinions.
  • Social Media Analytics: To gain insights into content trends and user engagement related to claims on social platforms.
  • Improving Content Moderation: To develop tools that assist platforms like TikTok in the efficient prioritisation and processing of user-flagged content.
  • Academic Research: To facilitate studies in machine learning, natural language processing, and social media dynamics.
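
As a starting point for the statistical analysis described above, engagement metrics can be compared between claims and opinions. The sketch below computes the mean of one numeric column per claim_status value, skipping rows where either field is missing. mean_by_status is a hypothetical helper and the sample rows are synthetic; it is a simple descriptive summary, not a predictive model.

```python
from collections import defaultdict

def mean_by_status(rows, metric):
    """Mean of a numeric column per claim_status, skipping incomplete rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        status = (row.get("claim_status") or "").strip()
        raw = (row.get(metric) or "").strip()
        if not status or not raw:
            continue  # skip rows with a missing label or value
        try:
            value = float(raw)
        except ValueError:
            continue  # skip values that are not numeric
        totals[status] += value
        counts[status] += 1
    return {s: totals[s] / counts[s] for s in counts}

# Synthetic rows for illustration only; not taken from the real file.
sample_rows = [
    {"claim_status": "claim", "video_view_count": "1000"},
    {"claim_status": "claim", "video_view_count": "3000"},
    {"claim_status": "opinion", "video_view_count": "50"},
    {"claim_status": "opinion", "video_view_count": ""},
    {"claim_status": "", "video_view_count": "700"},
]
means = mean_by_status(sample_rows, "video_view_count")
print(means)
```

The same function accepts the rows produced by csv.DictReader, so it can be applied to any of the count columns in the file.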

Coverage

The data originates from TikTok, focusing on user reports concerning video and comment content. While specific geographic or demographic scopes are not detailed, the data reflects activity and content types characteristic of the TikTok platform. No specific time range for data collection is provided.

License

CC0: Public Domain

Who Can Use It

  • Data Scientists and Analysts: For conducting deep dives into social media data, identifying trends, and performing statistical assessments.
  • Machine Learning Engineers: For training and evaluating models for content classification and claim prediction tasks.
  • Researchers and Academics: To engage in studies related to online content, social media behaviour, and automated moderation systems.
  • Students: To utilise the data for educational projects in data science, statistics, and artificial intelligence.
  • Social Media Platforms and Moderation Teams: To develop and refine internal tools for more effective content review and user report management.

Dataset Name Suggestions

  • TikTok Content Claim Prediction
  • Social Media Video Opinion Classifier
  • TikTok Moderation Report Analysis
  • User Claim Detection on TikTok
  • Video Content Classification Dataset

Attributes

Original Data Source: TikTok Moderation Report Analysis

Listing Stats

VIEWS

0

DOWNLOADS

0

LISTED

08/09/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0