Twitter Emotion Classification Dataset
Social Media and Posts
Free
About
This dataset is designed for emotion recognition tasks on English Twitter messages [3]. It provides a collection of tweets labelled with six basic human emotions: anger, fear, joy, love, sadness, and surprise [3]. The data was originally collected with an extended set of eight emotions (including anticipation, disgust, and trust) and was then preprocessed following the methodology described in the accompanying research paper [3]. The dataset aims to provide linguistic building blocks for understanding and modelling how emotions are conveyed through text, which is central to research on contextualised affect representations [4].
Columns
- text: A string feature representing the original tweet content [5]. This column contains 2000 unique values, all of which are valid [5]. An example entry is: "im feeling quite sad and sorry for myself but ill snap out of it soon" [6].
- label: A classification label representing one of the six basic emotions [7]. The integer values correspond to: sadness (0), joy (1), love (2), anger (3), fear (4), and surprise (5; this value is implied by the ordering rather than listed explicitly in the source breakdown) [7]. This column also contains 2000 valid entries, with a mean label value of 1.53 and a standard deviation of 1.47 [7].
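The column layout above can be illustrated with a short Python sketch that decodes the integer labels and computes summary statistics. This is a minimal example under stated assumptions: the sample rows are invented for illustration and are not taken from the dataset, the names `LABEL_NAMES` and `load_rows` are hypothetical, and the value 5 for surprise follows the implied ordering. The source does not say whether its reported standard deviation is the population or sample form, so the population form is used here as an assumption.

```python
import csv
import io
import statistics

# Integer-to-emotion mapping from the column description (5 = surprise is implied).
LABEL_NAMES = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}

# Hypothetical stand-in for a CSV file with "text" and "label" columns.
# These rows are invented for illustration only.
SAMPLE_CSV = """text,label
i feel so alone tonight,0
i am thrilled about the weekend,1
i feel nervous about the exam,4
"""


def load_rows(handle):
    """Read rows from a CSV handle and attach the emotion name for each label."""
    rows = []
    for record in csv.DictReader(handle):
        label = int(record["label"])
        rows.append({"text": record["text"], "label": label, "emotion": LABEL_NAMES[label]})
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
labels = [row["label"] for row in rows]

# Summary statistics comparable in kind to the mean and standard deviation quoted above.
label_mean = statistics.mean(labels)
label_std = statistics.pstdev(labels)  # population form; an assumption, see lead-in

for row in rows:
    print(f'{row["emotion"]:>8}: {row["text"]}')
```

To run this against a real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle, for example `open("test.csv", newline="", encoding="utf-8")`, assuming the file uses the same two column names.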
Distribution
The dataset is primarily structured around preprocessed English Twitter messages [3]. While the exact file format for distribution is not explicitly stated in the provided text, a test.csv file is referenced, suggesting a CSV format [5]. The size of the downloaded dataset files is 3.95 MB, and the size of the generated dataset is 4.16 MB, leading to a total disk usage of 8.11 MB [6]. Both the text and label columns contain 2000 records [5, 7].
Usage
This dataset is ideal for a variety of applications in Natural Language Processing (NLP) and machine learning, particularly:
- Emotion recognition and detection in textual data [3, 4].
- Developing and evaluating sentiment analysis models [4].
- Text classification tasks related to emotional states [4].
- Contextualised affect representation research [4].
- Building AI systems capable of understanding nuanced human emotions from text [4].
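As a rough illustration of the text classification use case listed above, the following Python sketch trains a tiny multinomial Naive Bayes classifier with add-one smoothing using only the standard library. It is not the method of the accompanying paper. The training sentences are invented for illustration, and the helper names (`tokenize`, `train_naive_bayes`, `predict`) are hypothetical. A practical baseline would train on the dataset's own rows and would likely use an established library.

```python
import math
from collections import Counter, defaultdict

# Integer-to-emotion mapping from the column description (5 = surprise is implied).
LABEL_NAMES = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}

# Invented training examples for illustration; not taken from the dataset.
TRAIN = [
    ("i feel so sad and lonely", 0),
    ("i feel sad and empty today", 0),
    ("i am so happy and excited", 1),
    ("i feel happy and cheerful today", 1),
    ("i am scared and nervous", 4),
    ("i feel afraid and nervous tonight", 4),
]


def tokenize(text):
    """Lowercase and split on whitespace; real tweets may need more careful handling."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_counts, word_counts, vocabulary


def predict(text, model):
    """Return the label with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen in training
            smoothed = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
predicted = predict("i am nervous and scared", model)
print(LABEL_NAMES[predicted])
```

Naive Bayes is used here only because it fits in a few lines and needs no third-party dependencies. The same train and predict structure carries over to stronger models.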
Coverage
The dataset consists exclusively of English Twitter messages [3]. There is no specific geographic or detailed time range coverage mentioned beyond the source being the Twitter API [3]. The data focuses on general emotional expressions within tweets and does not specify demographic group coverage [3, 4].
License
CC0: Public Domain
Who Can Use It
This dataset is suitable for:
- AI and Machine Learning Researchers: For developing and testing new algorithms for emotion recognition and sentiment analysis [4].
- Data Scientists: To build predictive models for understanding emotional content in social media data [4].
- NLP Practitioners: For training and fine-tuning language models on emotional expression [4].
- Students and Academics: As a valuable resource for projects and studies in computational linguistics and artificial intelligence [3].
Dataset Name Suggestions
- Emotion Tweets for NLP
- Twitter Emotion Classification Dataset
- CARER Emotion Dataset
- English Tweet Emotion Data
- Social Media Emotion Recognition Corpus
Attributes
Original Data Source: Twitter Emotion Classification Dataset