Twitter Hate Speech Detection Data
Social Media and Posts
Free
About
This dataset consists of labelled social media posts for developing language classification methods. It is primarily used to train machine learning models that classify online content as either acceptable or racist/sexist. It is aimed at researchers in Natural Language Processing (NLP) and at teams working to improve safety on digital platforms.
Columns
- id: The unique identifier associated with each social media post.
- label: A binary classification field. A value of '1' indicates the post is classified as racist or sexist, while '0' indicates it is classified as neither racist nor sexist.
- tweet: The actual text content retrieved from the social media platform.
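As an illustration of the column layout above, here is a minimal sketch of reading the data with Python's standard `csv` module. It assumes the file is a CSV whose header row is `id,label,tweet` in that order, which the listing does not confirm. The function name `load_rows` and the sample rows are hypothetical.

```python
import csv
import io

# Assumed header order, taken from the Columns list; not confirmed by the source.
EXPECTED_COLUMNS = ["id", "label", "tweet"]


def load_rows(fileobj):
    """Parse rows into dicts, checking the header and the binary label."""
    reader = csv.DictReader(fileobj)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for raw in reader:
        label = int(raw["label"])
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label}")
        rows.append({"id": int(raw["id"]), "label": label, "tweet": raw["tweet"]})
    return rows


# Hypothetical sample rows, for illustration only (not taken from the dataset).
SAMPLE_CSV = (
    "id,label,tweet\n"
    '1,0,"good morning, everyone"\n'
    '2,1,"<offensive text placeholder>"\n'
)

rows = load_rows(io.StringIO(SAMPLE_CSV))
print(rows[0])
```

To read a real file, pass an open file handle instead of the `io.StringIO` sample, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the download was saved.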
Distribution
The data is in tabular format, totals 3.1 MB, and comprises 3 columns and roughly 32,000 valid records, of which 29,530 have unique text. The label is heavily imbalanced: 29,720 records are non-toxic (label 0) and 2,242 are toxic (label 1), which sums to 31,962 labelled records with about 7% in the toxic class. The data is expected to be refreshed annually.
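The label counts quoted above can be turned into a class share and a set of class weights. The sketch below uses only the two counts stated in this section. The inverse-frequency formula `total / (n_classes * count)` is one common weighting heuristic, offered here as an assumption rather than a recommendation from the dataset's authors.

```python
# Label counts as stated in the Distribution section.
LABEL_COUNTS = {0: 29_720, 1: 2_242}


def class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * count) per label."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


total = sum(LABEL_COUNTS.values())
toxic_share = LABEL_COUNTS[1] / total
weights = class_weights(LABEL_COUNTS)

print(f"total labelled records: {total}")
print(f"toxic share: {toxic_share:.1%}")
print(f"class weights: {weights}")
```

With these weights, each class contributes the same total weight to a weighted loss, so the rare toxic class is not drowned out by the majority class.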
Usage
This collection is suited to developing and calibrating algorithms for automated content moderation. It can serve as training data for sentiment analysis and text classification tasks on social network data, and it supports building models that filter out harmful language and improve the health of digital communities.
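When calibrating a moderation model on this data, accuracy alone can mislead because of the label imbalance. The sketch below is an illustration, not a method prescribed by the source: it scores a trivial baseline that always predicts label 0, using the label counts given in the Distribution section, and reports precision, recall, and F1 for the toxic class. The function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Labels rebuilt from the counts in the Distribution section.
y_true = [0] * 29_720 + [1] * 2_242
# Trivial baseline: predict "not toxic" for every record.
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"baseline accuracy: {accuracy:.3f}")
print("baseline precision, recall, F1:", precision_recall_f1(y_true, y_pred))
```

The baseline scores high on accuracy while detecting no toxic posts at all, which is why per-class metrics are a more useful yardstick for models trained on this data.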
Coverage
The data captures general text content from Twitter posts (tweets). The scope is limited to the linguistic markers of toxicity. The source does not specify geographic origin, user demographics, or the time period covered.
License
CC0: Public Domain
Who Can Use It
- Data scientists and machine learning engineers who specialise in text classification.
- Researchers studying online behaviour, harassment, or the propagation of malicious sentiment.
- Developers constructing AI-driven tools for filtering or safety within social media applications.
Dataset Name Suggestions
- Twitter Hate Speech Detection Data
- Online Toxicity Classification Project
- Social Network Sentiment Analysis Resource
Attributes
Original Data Source: Twitter Hate Speech Detection Data
