Opendatabay APP

Twitter Hate Speech Detection Data

Social Media and Posts

Tags and Keywords

Twitter

Racism

Sexism

NLP

Sentiment

Free

About

This dataset consists of labelled social media posts intended for developing text classification methods. It is primarily used to train machine learning models that classify online content as either acceptable or toxic, where toxic here means racist or sexist. It is aimed at researchers in Natural Language Processing (NLP) and at teams improving safety measures on digital platforms.

Columns

  • id: The unique identifier associated with each social media post.
  • label: A binary classification field. A value of '1' indicates the post is classified as racist or sexist, while '0' indicates the post is classified as neither racist nor sexist.
  • tweet: The actual text content retrieved from the social media platform.
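A minimal sketch of reading this three-column layout with Python's standard library. The listing gives no file name, so the example parses an in-memory sample instead of a real file; the sample rows and the helper name load_rows are invented for illustration and are not taken from the dataset.

```python
import csv
import io

# Hypothetical three-row sample in the listing's column layout (id, label, tweet).
# The rows are invented for illustration; they are not records from the dataset.
SAMPLE_CSV = """id,label,tweet
1,0,"good morning, everyone"
2,1,"placeholder for a post labelled toxic"
3,0,"good morning, everyone"
"""

EXPECTED_COLUMNS = ["id", "label", "tweet"]


def load_rows(handle):
    """Read rows from a CSV file object, checking the columns and label values."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for record in reader:
        label = int(record["label"])
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label}")
        # The id is kept as a string; its exact type is not stated in the listing.
        rows.append({"id": record["id"], "label": label, "tweet": record["tweet"]})
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
unique_tweets = {row["tweet"] for row in rows}
print(len(rows), "rows,", len(unique_tweets), "unique tweets")
```

To read the downloaded file instead, pass an opened file object, for example open(path, newline="", encoding="utf-8"), where path is wherever the CSV was saved. The UTF-8 encoding is an assumption.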

Distribution

The data is structured in a tabular format and totals 3.1 MB, comprising 3 columns and roughly 32,000 valid records, of which 29,530 are unique text entries. The label distribution is heavily imbalanced: 29,720 records are non-toxic (label 0) and 2,242 records are toxic (label 1). These two counts sum to 31,962, so the 32,000 figure is a rounded total. The information is expected to be refreshed on an annual basis.
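The label counts above imply a strong class imbalance, which usually needs handling during training. The sketch below works through the arithmetic using the two counts stated in the listing. The inverse-frequency weighting shown, total / (number of classes × class count), is one common heuristic and is an assumption here; the listing does not prescribe any weighting scheme.

```python
# Label counts as stated in the listing.
LABEL_COUNTS = {0: 29_720, 1: 2_242}


def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * count) for each class."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


total = sum(LABEL_COUNTS.values())
toxic_share = LABEL_COUNTS[1] / total
weights = balanced_class_weights(LABEL_COUNTS)

print(f"total={total}, toxic share={toxic_share:.2%}")
print({label: round(weight, 3) for label, weight in weights.items()})
```

With these counts the toxic class makes up about 7% of the labelled records, so its weight comes out roughly thirteen times larger than the weight of the non-toxic class.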

Usage

This collection is ideally applied in the development and calibration of algorithms for automated content moderation. It serves as fundamental training data for sentiment analysis and text classification tasks within the domain of social network data. It supports the building of models that actively filter out harmful language and improve digital community health.
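When calibrating a moderation model on data where toxic posts are a small minority, accuracy alone can mislead, so precision, recall and F1 on the toxic class are commonly reported alongside it. The sketch below is an illustration under that assumption, not a method taken from the listing; the function name and the toy labels are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the positive (toxic) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy labels (invented): nine acceptable posts and one toxic post.
y_true = [0] * 9 + [1]
# A degenerate "model" that never flags anything.
always_zero = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, always_zero)) / len(y_true)
print("accuracy:", accuracy)
print("precision, recall, f1:", precision_recall_f1(y_true, always_zero))
```

The never-flag baseline scores high accuracy on the toy labels while its recall on the toxic class is zero, which is why the per-class metrics are the more informative ones for this kind of data.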

Coverage

The data captures general social media text content from a major platform. The scope is focused purely on the linguistic markers of toxicity. The listing does not specify the geographic origin of the posts, the demographics of the users, or the time period covered.

License

CC0: Public Domain

Who Can Use It

  • Data scientists and machine learning engineers who specialise in text classification.
  • Researchers studying online behaviour, harassment, or the propagation of malicious sentiment.
  • Developers constructing AI-driven tools for filtering or safety within social media applications.

Dataset Name Suggestions

  • Twitter Hate Speech Detection Data
  • Online Toxicity Classification Project
  • Social Network Sentiment Analysis Resource

Attributes

Listing Stats

VIEWS

7

DOWNLOADS

1

LISTED

13/12/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0

Download Dataset in CSV Format