FREE DATASET LIBRARY

Verified Data Provider

£0

Twitter Hate Speech Detection Data

Social Media and Posts

Tags and Keywords

Twitter

Racism

Sexism

NLP

Sentiment

Twitter Hate Speech Detection Data Dataset on Opendatabay data marketplace

Free

About

This dataset consists of labelled social media posts for developing and evaluating text classification methods. It is primarily used to train machine learning models that classify online content as either acceptable or toxic, where toxic here means racist or sexist. It is aimed at researchers in Natural Language Processing (NLP) and at teams improving safety measures on digital platforms.

Columns

id: The unique identifier associated with each social media post.
label: A binary classification field. A value of '1' indicates the post is classified as racist or sexist, while '0' indicates the post is classified as neither.
tweet: The actual text content retrieved from the social media platform.
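As a minimal sketch of reading this three-column layout, the snippet below parses a CSV with Python's standard csv module and checks that every label is 0 or 1. The function name load_rows and the sample rows are hypothetical, and the assumption that the file has a header row with exactly these column names comes from the column list above rather than from an inspected file.

```python
import csv
import io

# Column names as documented in the listing (assumed to match the CSV header).
EXPECTED_COLUMNS = {"id", "label", "tweet"}


def load_rows(fileobj):
    """Parse the CSV into (id, label, tweet) tuples, validating the labels."""
    reader = csv.DictReader(fileobj)
    if set(reader.fieldnames or []) != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for record in reader:
        label = int(record["label"])
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label}")
        # The id is kept as a string because its type is not documented.
        rows.append((record["id"], label, record["tweet"]))
    return rows


# Hypothetical placeholder rows in the documented layout, not real data.
sample = io.StringIO(
    "id,label,tweet\n"
    "1,0,placeholder text for an acceptable post\n"
    "2,1,placeholder text for a flagged post\n"
)
rows = load_rows(sample)
print(len(rows), "rows loaded")
```

For the real file, the same function could be called on an open file handle, for example one opened with newline="" and encoding="utf-8"; the actual file name and encoding are not stated in the listing.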

Distribution

The data is structured in a tabular format and totals 3.1 MB, comprising 3 columns and roughly 32,000 valid records, of which 29,530 are unique text entries. The label distribution is heavily imbalanced: 29,720 records are non-toxic (label 0) and 2,242 records are toxic (label 1), which is about 7% of the 31,962 labelled records. The information is expected to be refreshed on an annual basis.
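The label counts quoted in this section imply a strong class imbalance, which the following sketch works through. It computes each class's share, the accuracy of a trivial model that always predicts label 0, and per-class weights using the common "balanced" heuristic (total / (number of classes × class count)). The function name class_summary is hypothetical, and the weighting scheme is one reasonable choice, not something the listing prescribes.

```python
def class_summary(counts):
    """Return total, per-class shares, and 'balanced' per-class weights."""
    total = sum(counts.values())
    n_classes = len(counts)
    shares = {label: n / total for label, n in counts.items()}
    # Balanced heuristic: rarer classes receive proportionally larger weights.
    weights = {label: total / (n_classes * n) for label, n in counts.items()}
    return total, shares, weights


# Label counts as stated in the listing.
LABEL_COUNTS = {0: 29_720, 1: 2_242}

total, shares, weights = class_summary(LABEL_COUNTS)
# Accuracy of always predicting the majority class (label 0).
majority_baseline_accuracy = max(shares.values())

print(f"total labelled records: {total}")
print(f"share of label 1: {shares[1]:.3f}")
print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.3f}")
print(f"class weights: {weights}")
```

Because always predicting label 0 already scores about 93% accuracy on these counts, precision, recall, or F1 on label 1 is likely a more informative metric than raw accuracy.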

Usage

This collection is ideally applied in the development and calibration of algorithms for automated content moderation. It serves as fundamental training data for sentiment analysis and text classification tasks within the domain of social network data. It supports the building of models that actively filter out harmful language and improve digital community health.
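When preparing this kind of data for training, two steps are often useful: dropping repeated texts (the listing reports fewer unique text entries than records) and splitting by label so that the small toxic class appears in both the training and test sets. The sketch below illustrates both with the standard library only. The helper names dedupe_by_text and stratified_split, the 20% test fraction, and the sample rows are all hypothetical; keeping the first occurrence of a duplicated text is a simplification that ignores possible label conflicts.

```python
import random
from collections import defaultdict


def dedupe_by_text(rows):
    """Keep the first (id, label, tweet) row for each distinct tweet text."""
    seen = set()
    unique = []
    for row in rows:
        text = row[2]
        if text not in seen:
            seen.add(text)
            unique.append(row)
    return unique


def stratified_split(rows, test_fraction=0.2, seed=0):
    """Split rows into train and test sets with a similar label ratio."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical rows: 90 acceptable, 10 flagged, plus one repeated text.
rows = [(str(i), 0, f"ok post {i}") for i in range(90)]
rows += [(str(100 + i), 1, f"flagged post {i}") for i in range(10)]
rows.append(("999", 0, "ok post 0"))  # same text as the first row

unique_rows = dedupe_by_text(rows)
train, test = stratified_split(unique_rows, test_fraction=0.2, seed=0)
print(len(unique_rows), len(train), len(test))
```

Deduplicating before splitting keeps identical texts from landing in both sets, which would otherwise inflate test scores.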

Coverage

The data captures general social media text content from a major platform (Twitter, per the dataset name). The scope is focused purely on the linguistic markers of toxicity. The listing does not specify geographic origins, user demographics, or the time period covered.

License

CC0: Public Domain

Who Can Use It

Data scientists and machine learning engineers who specialise in text classification.
Researchers studying online behaviour, harassment, or the propagation of malicious sentiment.
Developers constructing AI-driven tools for filtering or safety within social media applications.

Dataset Name Suggestions

Twitter Hate Speech Detection Data
Online Toxicity Classification Project
Social Network Sentiment Analysis Resource

Attributes

Original Data Source: Twitter Hate Speech Detection Data

Listing Stats

LISTED

13/12/2025

REGION

GLOBAL

QUALITY

5 / 5

VERSION

1.0
