Opendatabay APP

Toxic Comment Classification Dataset

Fraud Detection & Risk Management

Tags and Keywords

Toxic Comments

Natural Language Processing

Sentiment Analysis

Machine Learning

Content Moderation

Dataset Analysis

Free

About

This dataset comprises comments, each labelled to indicate whether it contains toxic content. The primary purpose is to facilitate the development and evaluation of models aimed at detecting and mitigating online toxicity, thereby promoting healthier online interactions.

Dataset Features

  • TC_ID: A unique identifier assigned to each comment.
  • comment_text: The actual text of the comment extracted from Wikipedia's talk pages.
  • toxic: A binary label where '1' denotes a toxic comment and '0' indicates a non-toxic comment.
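The three columns above can be read and sanity-checked with the Python standard library. The following is a minimal sketch, not an official loader: the `Comment` record, the `read_comments` helper, the sample rows, and the `TC_1`-style identifier format are all assumptions made for illustration, since the listing does not show actual rows.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    tc_id: str
    text: str
    toxic: bool


def read_comments(handle):
    """Parse rows with TC_ID, comment_text, and toxic columns into Comment records."""
    reader = csv.DictReader(handle)
    expected = {"TC_ID", "comment_text", "toxic"}
    missing = expected - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    seen_ids = set()
    comments = []
    for row in reader:
        # TC_ID is described as unique, so a repeat signals a corrupted file.
        if row["TC_ID"] in seen_ids:
            raise ValueError(f"duplicate TC_ID: {row['TC_ID']}")
        # The label is described as binary: '1' toxic, '0' non-toxic.
        if row["toxic"] not in ("0", "1"):
            raise ValueError(f"unexpected label: {row['toxic']!r}")
        seen_ids.add(row["TC_ID"])
        comments.append(Comment(row["TC_ID"], row["comment_text"], row["toxic"] == "1"))
    return comments


# Illustrative rows only; these are not taken from the dataset.
SAMPLE_CSV = (
    "TC_ID,comment_text,toxic\n"
    'TC_1,"Thanks for fixing the citation, looks good.",0\n'
    'TC_2,"You are an idiot and should stop editing.",1\n'
)

comments = read_comments(io.StringIO(SAMPLE_CSV))
print(len(comments), sum(c.toxic for c in comments))
```

To run the same check on a downloaded file, pass a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the file was saved. The `newline=""` argument lets the `csv` module handle comments that contain line breaks inside quoted fields.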

Distribution

  • Data Volume: The provided sample contains 70,157 rows and 3 columns.
  • Format: Structured in a tabular format with columns representing unique identifiers, comment texts, and toxicity labels.
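One way to confirm the stated volume after download, and to see how the two labels are balanced, is a single pass over the file. The sketch below assumes a CSV with a header row; the `summarize` helper and the sample rows are hypothetical and only stand in for the real file.

```python
import csv
import io
from collections import Counter


def summarize(handle):
    """Return (row_count, column_count, label_counts) for a CSV with a 'toxic' column."""
    reader = csv.reader(handle)
    header = next(reader)
    label_index = header.index("toxic")
    row_count = 0
    label_counts = Counter()
    for record in reader:
        row_count += 1
        label_counts[record[label_index]] += 1
    return row_count, len(header), label_counts


# Illustrative rows only; these are not taken from the dataset.
SAMPLE_CSV = (
    "TC_ID,comment_text,toxic\n"
    'TC_1,"Good catch, thanks.",0\n'
    'TC_2,"Please cite a source for this claim.",0\n'
    'TC_3,"Nobody wants you here, moron.",1\n'
)

rows, columns, labels = summarize(io.StringIO(SAMPLE_CSV))
toxic_share = labels["1"] / rows
print(rows, columns, dict(labels), round(toxic_share, 3))
```

On the provided sample, `rows` and `columns` should match the figures listed above. The listing does not state the share of toxic comments, so it is worth checking `toxic_share` before choosing evaluation metrics.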

Usage

This dataset is ideal for a variety of applications:
  • Toxicity Detection: Training machine learning models to identify and filter toxic comments on online platforms.
  • Sentiment Analysis: Analyzing the sentiment of user interactions to understand community dynamics.
  • Natural Language Processing (NLP): Developing and testing NLP algorithms focused on content moderation and abusive language detection.
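As a starting point for the toxicity detection use case, a simple baseline helps put later models in context. The sketch below is a small multinomial Naive Bayes classifier over word counts, written with the standard library only. It is one possible baseline chosen for illustration, not a method recommended by the listing; the `NaiveBayes` class, the tokenizer, and the training sentences are all hypothetical.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return TOKEN.findall(text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over word counts."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        if not (self.doc_counts[0] and self.doc_counts[1]):
            raise ValueError("training data must contain both labels")
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def _log_score(self, tokens, label):
        total_docs = sum(self.doc_counts.values())
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / total_docs)
        for token in tokens:
            if token in self.vocab:  # ignore words never seen in training
                count = self.word_counts[label][token] + 1
                score += math.log(count / (total_words + len(self.vocab)))
        return score

    def predict(self, text):
        """Return 1 if the toxic class scores higher, otherwise 0."""
        tokens = tokenize(text)
        return int(self._log_score(tokens, 1) > self._log_score(tokens, 0))


# Illustrative training sentences only; these are not taken from the dataset.
train_texts = [
    "thanks for the helpful edit",
    "good source, please add the reference",
    "you are a stupid idiot",
    "shut up you worthless idiot",
]
train_labels = [0, 0, 1, 1]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("what an idiot"))
print(model.predict("thanks, please add the source"))
```

In practice the `comment_text` and `toxic` columns would replace the toy lists, with part of the data held out for evaluation. A word-count baseline like this misses context and sarcasm, so it is best treated as a floor to compare stronger models against.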

Coverage

  • Geographic Coverage: Global, encompassing comments from Wikipedia users worldwide.
  • Time Range: The dataset includes comments from various periods, reflecting the diverse history of Wikipedia's discussions.
  • Demographics: Covers a wide range of contributors, including editors, administrators, and general users, without specific demographic distinctions.

License

CC0 (Public Domain)

Who Can Use It

  • Data Scientists: For developing and refining algorithms to detect toxic language.
  • Researchers: For studying online behavior, discourse analysis, and the effectiveness of moderation strategies.
  • Businesses: For implementing content moderation systems and enhancing user experience on their platforms.

Listing Stats

VIEWS

11

DOWNLOADS

1

LISTED

24/01/2025

REGION

GLOBAL

UDQS QUALITY (Universal Data Quality Score)

5 / 5

VERSION

1.0
