Text Emotion Classification Dataset
Data Science and Analytics
Free
About
This dataset supports research in Natural Language Processing (NLP) and emotion analysis, specifically the task of detecting emotions from text [1]. It contains short text fragments, such as pieces of sentences or tweets, each labelled with the emotion it conveys [1, 2]. The goal is to facilitate building efficient models for identifying emotions in textual data [1]. This is particularly useful because labelled data rarely represents the wide range of human emotions evenly, which often leads to class imbalance [1].
Columns
- text: These are fragments of text, typically pieces of sentences or tweets, intended for sentiment and emotion analysis [2].
- Emotion: This column classifies the emotion detected from the corresponding text fragment [2]. The emotions covered include empty, sadness, enthusiasm, neutral, worry, surprise, love, fun, hate, happiness, boredom, relief, and anger [2].
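A minimal sketch of reading the two columns and checking labels against the list above. It assumes a CSV with a header row naming the columns `text` and `Emotion`. The inline sample rows are invented for illustration, and lower-casing the labels before comparison is an assumption about how they are stored.

```python
import csv
import io

# The 13 labels listed in the Emotion column description.
EXPECTED_EMOTIONS = {
    "empty", "sadness", "enthusiasm", "neutral", "worry", "surprise",
    "love", "fun", "hate", "happiness", "boredom", "relief", "anger",
}


def read_rows(handle):
    """Yield (text, emotion) pairs, skipping empty text or unexpected labels."""
    reader = csv.DictReader(handle)
    for row in reader:
        text = (row.get("text") or "").strip()
        # Assumption: labels may vary in case, so normalise before comparing.
        emotion = (row.get("Emotion") or "").strip().lower()
        if text and emotion in EXPECTED_EMOTIONS:
            yield text, emotion


# Invented sample standing in for the real file (not rows from the dataset).
sample = io.StringIO(
    "text,Emotion\n"
    "cannot wait for the weekend,enthusiasm\n"
    "waiting for the bus,neutral\n"
    "some text,unknown_label\n"
)
rows = list(read_rows(sample))
```

To read a real file, the same function could be given a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the data file was saved.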
Distribution
The dataset is expected to be provided as a data file, typically CSV [3]. It contains approximately 840,000 records, with 393,822 unique text values [4]. By label, 'neutral' accounts for 80% of records, 'love' for 5%, and all other emotions together for the remaining 15%, which corresponds to 125,464 records [4].
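The shares quoted above can be recomputed from the label column with a few lines of code. The sketch below uses toy labels that mirror the stated 80 / 5 / 15 split rather than real data. It also checks the stated figures against each other: if 125,464 records are 15% of the dataset, the implied total is about 836,000, in line with the approximate 840,000 given.

```python
from collections import Counter


def label_shares(labels):
    """Return each label's share of the total as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Toy labels mirroring the stated split (not real data);
# "worry" stands in for all of the remaining emotions.
toy_labels = ["neutral"] * 80 + ["love"] * 5 + ["worry"] * 15
shares = label_shares(toy_labels)

# Consistency check on the listing's own figures:
# 125,464 records at 15% implies a total of 125,464 / 0.15.
implied_total = round(125_464 / 0.15)
```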
Usage
This dataset suits a range of NLP and sentiment analysis tasks [1, 2]. It can be used to:
- Develop and train machine learning models for emotion detection from text [1].
- Conduct research in the field of emotion analysis [1].
- Analyse sentiment in social media content like tweets [2].
- Address challenges related to multi-class classification and class imbalance in emotion recognition [1].
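With 'neutral' at 80% of records, a model that always predicts 'neutral' would already be 80% accurate, so imbalance needs explicit handling. One common option, shown here as an illustration and not as a method prescribed by the sources, is to weight each class inversely to its frequency. The helper name and the toy labels are hypothetical. The formula, n_samples / (n_classes * class_count), is the same heuristic scikit-learn applies for `class_weight="balanced"`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Toy labels mirroring the stated 80 / 5 / 15 split (not real data).
toy_labels = ["neutral"] * 80 + ["love"] * 5 + ["worry"] * 15
weights = balanced_class_weights(toy_labels)
baseline = majority_baseline_accuracy(toy_labels)
```

Rare classes receive the largest weights, so mistakes on them cost more during training. Because raw accuracy is inflated by the majority class, a per-class metric such as macro-averaged F1 is usually more informative for evaluation.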
Coverage
The dataset's regional coverage is global [1]. It consists of text data, specifically pieces of sentences or tweets [2]. The sources do not specify any narrower geographic scope or a time range.
License
CC0
Who Can Use It
The dataset is intended for use by:
- Data scientists working on text analysis [1].
- Researchers in Natural Language Processing (NLP) [1].
- Machine learning engineers developing emotion detection systems [1].
- Academics and students studying sentiment and emotion analysis [1].
Dataset Name Suggestions
- Text Emotion Classification Dataset
- Social Media Emotion Analysis Dataset
- Emotion Detection from Text
- NLP Emotion Recognition Data
Attributes
Original Data Source: emotion analysis based on text