Google GoEmotions Dataset
Reviews & Ratings
Tags and Keywords
Trusted By




"No reviews yet"
Free
About
The Google AI GoEmotions dataset contains comments from Reddit users, each labelled with its emotional colouring. This dataset is primarily designed to train neural networks for performing deep analysis of text tonality. Unlike many existing emotion classification datasets that often cover narrow areas like news headlines or movie subtitles, and typically use a limited scale of six basic emotions (anger, surprise, disgust, joy, fear, and sadness), GoEmotions offers a much broader emotional spectrum. This expansion enables the development of more sensitive chatbots, enhanced models for detecting hazardous online behaviour, and improved customer support services through a deeper understanding of textual emotion. The emotion categories were collaboratively identified by Google and psychologists, encompassing 12 positive, 11 negative, 4 ambiguous, and 1 neutral emotion, making it well-suited for tasks requiring fine-grained emotion differentiation.
Columns
- id: A unique identifier for each comment. This column contains approximately 58,011 unique values.
- text: The original text content from the Reddit comment. There are around 57,732 unique text entries.
- example_very_unclear: A boolean field indicating whether the text content is considered very unclear. Approximately 207,814 entries are marked as
false
, while 3,411 are marked astrue
. - admiration: An emotion label.
- amusement: An emotion label.
- anger: An emotion label.
- annoyance: An emotion label.
- approval: An emotion label.
- caring: An emotion label.
- confusion: An emotion label. (Note: The dataset includes additional columns for the remaining 20+ fine-grained emotion labels mentioned in the dataset's description.)
Distribution
This dataset is typically provided in a CSV data file format. It contains a substantial number of records, with the sum of
false
and true
values in the example_very_unclear
column suggesting over 210,000 individual comments or records. The structure is organised to facilitate direct use in machine learning and natural language processing tasks.Usage
This dataset is ideal for several applications, particularly for projects focused on emotion recognition and text analysis. Its primary use is for training neural networks to perform deep analysis of text tonality. This capability can be leveraged to develop more sensitive chatbots, create models for detecting dangerous online behaviour, and significantly improve customer support services by allowing systems to better understand the emotional nuances in user communications.
Coverage
The dataset comprises comments sourced from Reddit users, which implies a global geographic coverage. Specific details regarding the time range of the comments or the precise demographics of the Reddit users are not provided within the available information.
License
CCO
Who Can Use It
This dataset is particularly valuable for:
- AI and Machine Learning Researchers: For advancing the field of emotion recognition and fine-grained sentiment analysis.
- Natural Language Processing (NLP) Developers: To build applications that require the ability to discern and react to emotional states in text.
- Chatbot Developers: To design and implement conversational AI that exhibits higher emotional intelligence and provides more empathetic interactions.
- Data Scientists: Interested in exploring and modelling human emotions expressed through social media text.
Dataset Name Suggestions
- Google GoEmotions Dataset
- Reddit Emotion Comments Dataset
- Fine-Grained Emotion Text Dataset
Attributes
Original Data Source: Go Emotions: Google Emotions Dataset