Twitter Emotion Extraction Raw and Cleaned Data
Social Media and Posts
Free
About
This dataset provides both raw and cleaned text extracted from Twitter, designed for projects on emotion extraction from text. It was compiled and cleaned for an Honours academic project and offers researchers the unprocessed text scraped directly from the platform, the corresponding preprocessed text, and the associated emotion category.
Columns
The dataset consists of three columns, each 100% valid across 917,000 records:
- Emotion: The category name of the extracted emotion. There are three unique categories; the most common are 'disappointed' (34%) and 'happy' (33%).
- Content: The preprocessed and cleaned text data, derived from the raw Twitter content. This column contains 848,487 unique values.
- Original Content: The raw, unfiltered text content originally scraped from Twitter without any preprocessing. This column contains 912,495 unique values.
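The column statistics above can be recomputed with a short script. The following is a minimal sketch using only the Python standard library. It assumes the CSV header uses exactly the names Emotion, Content, and Original Content, and that the file is UTF-8 encoded; neither is confirmed by the listing. The function names summarize and summarize_csv are illustrative.

```python
import csv
from collections import Counter

# Column names as listed above; the exact header spelling in the CSV is an assumption.
EMOTION, CONTENT, ORIGINAL = "Emotion", "Content", "Original Content"


def summarize(rows):
    """Return row count, emotion shares, and unique-value counts per text column."""
    emotions = Counter()
    unique = {CONTENT: set(), ORIGINAL: set()}
    total = 0
    for row in rows:
        total += 1
        emotions[row[EMOTION]] += 1
        for column, seen in unique.items():
            seen.add(row[column])
    shares = {label: count / total for label, count in emotions.items()}
    return {
        "rows": total,
        "emotion_share": shares,
        "unique": {column: len(seen) for column, seen in unique.items()},
    }


def summarize_csv(path):
    """Summarize a CSV file on disk (UTF-8 encoding is assumed, not confirmed)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return summarize(csv.DictReader(handle))
```

Calling summarize_csv("dataset(clean).csv") should, if the assumptions hold, return a row count and unique-value counts that can be checked against the figures listed above.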
Distribution
The data is available as a single CSV file, dataset(clean).csv, with a file size of 168.33 MB. It contains 917,000 valid records, and every column is 100% valid, with zero missing or mismatched entries. The dataset is static and is not expected to receive updates.
Usage
This resource is well suited to machine learning practitioners and academics working on Natural Language Processing (NLP) and text classification. It can be used to develop and test models for emotion extraction from short-form text, and to compare model performance on the raw text against the preprocessed text.
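One way to run the raw-versus-preprocessed comparison is to train the same classifier on each text column using an identical train/test split. The sketch below uses a small multinomial Naive Bayes baseline written with the standard library only. This is a stand-in chosen for brevity, not the method used in the original project. The column names, the whitespace tokenizer, the 80/20 split, and the helper names train_nb, predict_nb, and compare_columns are all assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(texts, labels):
    """Fit a multinomial Naive Bayes model on lowercased whitespace tokens."""
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    totals = {label: sum(counts.values()) for label, counts in word_counts.items()}
    return word_counts, Counter(labels), totals, vocab


def predict_nb(model, text):
    """Return the most probable label, using add-one (Laplace) smoothing."""
    word_counts, label_counts, totals, vocab = model
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / n_docs)
        denom = totals.get(label, 0) + len(vocab)
        for token in text.lower().split():
            if token in vocab:  # tokens unseen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def compare_columns(rows, columns=("Content", "Original Content"),
                    label="Emotion", test_share=0.2, seed=0):
    """Train and score one model per text column on the same shuffled split."""
    rows = list(rows)
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = int(len(order) * (1 - test_share))
    train_idx, test_idx = order[:cut], order[cut:]
    if not train_idx or not test_idx:
        raise ValueError("need enough rows for both a train and a test split")
    results = {}
    for column in columns:
        model = train_nb([rows[i][column] for i in train_idx],
                         [rows[i][label] for i in train_idx])
        hits = sum(predict_nb(model, rows[i][column]) == rows[i][label]
                   for i in test_idx)
        results[column] = hits / len(test_idx)
    return results
```

The function accepts any iterable of dict-like rows, for example the output of csv.DictReader, and returns one accuracy value per column. Sharing the split across both columns keeps the comparison fair, since each model sees the same tweets in training and testing.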
Coverage
The dataset covers text from tweets collected on Twitter. The content is suited to multiclass classification tasks on human emotion.
License
CC0: Public Domain
Who Can Use It
The dataset is intended for users interested in applying machine learning to textual data, including students, researchers, and data scientists working on NLP. It is highly suitable for those focusing on multiclass classification tasks. The material holds a maximum usability rating of 10.00.
Dataset Name Suggestions
- Twitter Emotion Extraction Raw and Cleaned Data
- Social Media Text Data for Emotion Classification
- Twitter NLP Emotion Dataset
Attributes
Original Data Source: Twitter Emotion Extraction Raw and Cleaned Data
