Twitter Emotion Extraction Raw and Cleaned Data

FREE DATASET LIBRARY

Verified Data Provider

£0

Social Media and Posts

Tags and Keywords: Internet, Twitter, Emotion, Text, NLP

About

This dataset provides both raw and cleaned text extracted from Twitter, designed for projects on emotion extraction from text. It was compiled and cleaned for an Honours academic project. Each record offers the unprocessed text scraped directly from the platform, the corresponding preprocessed text, and the associated emotion category.

Columns

The dataset consists of 3 columns, each 100% valid across 917,000 records:

Emotion: The category name of the extracted emotion. There are three unique categories; the two most common are 'disappointed' (34%) and 'happy' (33%).
Content: The preprocessed and cleaned text data, derived from the raw Twitter content. This column contains 848,487 unique values.
Original Content: The raw, unfiltered text content originally scraped from Twitter without any preprocessing. This column contains 912,495 unique values.
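The column statistics above (label shares and unique-value counts) can be recomputed with a short script. The following is a minimal sketch, assuming the CSV header uses the column names exactly as listed here; the sample rows and the 'angry' label are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Column names as given in the listing; the real CSV header may differ.
EMOTION, CONTENT, ORIGINAL = "Emotion", "Content", "Original Content"


def summarize(csv_file):
    """Return (label_shares, unique_content, unique_original) for a CSV stream."""
    labels = Counter()
    unique_content, unique_original = set(), set()
    for row in csv.DictReader(csv_file):
        labels[row[EMOTION]] += 1
        unique_content.add(row[CONTENT])
        unique_original.add(row[ORIGINAL])
    total = sum(labels.values())
    shares = {label: count / total for label, count in labels.items()}
    return shares, len(unique_content), len(unique_original)


# Tiny made-up sample standing in for the real file.
SAMPLE = """Emotion,Content,Original Content
happy,great day,What a GREAT day!!! :)
happy,great day,great day...
disappointed,missed the bus,Missed the bus AGAIN @transit
angry,so rude,So rude!! #fail
"""

shares, n_content, n_original = summarize(io.StringIO(SAMPLE))
print(shares, n_content, n_original)
```

To run it on the real file, pass an open file handle instead of the in-memory sample, for example `open("dataset(clean).csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption.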

Distribution

The data is distributed as a single CSV file, dataset(clean).csv, with a file size of 168.33 MB. It contains 917,000 valid records, with zero missing or mismatched entries in any column. The dataset is static; no future updates are expected.
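The record count and the absence of missing entries can be checked after download without loading the whole file into memory. This is a sketch under stated assumptions: the column names are taken from the listing, `validate` is a hypothetical helper, and the sample rows are invented.

```python
import csv
import io


def validate(csv_file, expected_columns):
    """Stream a CSV and return (row_count, {column: empty_cell_count})."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    absent = [c for c in expected_columns if c not in header]
    if absent:
        raise ValueError(f"columns not found in header: {absent}")
    empty = {c: 0 for c in expected_columns}
    rows = 0
    for row in reader:
        rows += 1
        for column in expected_columns:
            # A short row yields None for its missing cells.
            if not (row[column] or "").strip():
                empty[column] += 1
    return rows, empty


COLUMNS = ["Emotion", "Content", "Original Content"]

# Made-up sample: the second row has an empty Content cell.
SAMPLE = """Emotion,Content,Original Content
happy,great day,What a GREAT day!
disappointed,,@user ...
"""

row_count, empty_cells = validate(io.StringIO(SAMPLE), COLUMNS)
print(row_count, empty_cells)
```

On the real file, the listing's figures would correspond to a row count of 917,000 and zero empty cells in every column.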

Usage

This resource is suited to machine learning practitioners and academics working on Natural Language Processing (NLP) and text classification. It can be used to develop and test models for emotion extraction from short-form text, and to compare model performance on the raw text against performance on the preprocessed text.
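The raw-versus-preprocessed comparison described above could be set up along the following lines. This is a minimal sketch, not the method used in the original project: it swaps in a small multinomial Naive Bayes classifier written with the standard library, uses naive whitespace tokenization, and trains on invented example rows rather than the real data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


def train_nb(texts, labels):
    """Fit a multinomial Naive Bayes model (counts only)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest add-one-smoothed log probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # skip words never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, texts, labels):
    hits = sum(predict_nb(model, t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Invented rows: (Emotion, Content, Original Content).
TRAIN = [
    ("happy", "love this sunny day", "LOVE this sunny day!!! :)"),
    ("happy", "love my new phone", "Love my new phone <3"),
    ("disappointed", "missed my train again", "Missed my train AGAIN... ugh"),
    ("disappointed", "phone broke again", "phone broke again :("),
]
TEST = [
    ("happy", "love this phone", "LOVE this phone!!!"),
    ("disappointed", "train broke again", "train broke again ugh"),
]

train_y = [row[0] for row in TRAIN]
test_y = [row[0] for row in TEST]

# Same labels, two text variants: column 1 is cleaned, column 2 is raw.
clean_model = train_nb([row[1] for row in TRAIN], train_y)
raw_model = train_nb([row[2] for row in TRAIN], train_y)

clean_acc = accuracy(clean_model, [row[1] for row in TEST], test_y)
raw_acc = accuracy(raw_model, [row[2] for row in TEST], test_y)
print(f"cleaned: {clean_acc:.2f}  raw: {raw_acc:.2f}")
```

With the real dataset, the same split would be applied to both text columns so that any difference in accuracy reflects the preprocessing rather than the choice of held-out rows.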

Coverage

The dataset covers text from tweets collected on Twitter. The content is suited to multiclass classification of human emotion.

License

CC0: Public Domain

Who Can Use It

The dataset is intended for users applying machine learning to textual data, including students, researchers, and data scientists working on NLP. It is particularly suitable for multiclass classification tasks. The dataset has a usability rating of 10.00, the maximum.

Dataset Name Suggestions

Twitter Emotion Extraction Raw and Cleaned Data
Social Media Text Data for Emotion Classification
Twitter NLP Emotion Dataset

Attributes

Original Data Source: Twitter Emotion Extraction Raw and Cleaned Data

Listing Stats

Listed: 17/12/2025
Region: Global
Quality: 5 / 5
Version: 1.0
