Opendatabay APP

Google GoEmotions Dataset

Reviews & Ratings

Tags and Keywords

Internet

Text

Ratings

Nlp

Trusted By
Trusted by company1Trusted by company2Trusted by company3
Google GoEmotions Dataset Dataset on Opendatabay data marketplace

"No reviews yet"

Free

About

The Google AI GoEmotions dataset contains comments from Reddit users, each labelled with its emotional colouring. This dataset is primarily designed to train neural networks for performing deep analysis of text tonality. Unlike many existing emotion classification datasets that often cover narrow areas like news headlines or movie subtitles, and typically use a limited scale of six basic emotions (anger, surprise, disgust, joy, fear, and sadness), GoEmotions offers a much broader emotional spectrum. This expansion enables the development of more sensitive chatbots, enhanced models for detecting hazardous online behaviour, and improved customer support services through a deeper understanding of textual emotion. The emotion categories were collaboratively identified by Google and psychologists, encompassing 12 positive, 11 negative, 4 ambiguous, and 1 neutral emotion, making it well-suited for tasks requiring fine-grained emotion differentiation.

Columns

  • id: A unique identifier for each comment. This column contains approximately 58,011 unique values.
  • text: The original text content from the Reddit comment. There are around 57,732 unique text entries.
  • example_very_unclear: A boolean field indicating whether the text content is considered very unclear. Approximately 207,814 entries are marked as false, while 3,411 are marked as true.
  • admiration: An emotion label.
  • amusement: An emotion label.
  • anger: An emotion label.
  • annoyance: An emotion label.
  • approval: An emotion label.
  • caring: An emotion label.
  • confusion: An emotion label. (Note: The dataset includes additional columns for the remaining 20+ fine-grained emotion labels mentioned in the dataset's description.)

Distribution

This dataset is typically provided in a CSV data file format. It contains a substantial number of records, with the sum of false and true values in the example_very_unclear column suggesting over 210,000 individual comments or records. The structure is organised to facilitate direct use in machine learning and natural language processing tasks.

Usage

This dataset is ideal for several applications, particularly for projects focused on emotion recognition and text analysis. Its primary use is for training neural networks to perform deep analysis of text tonality. This capability can be leveraged to develop more sensitive chatbots, create models for detecting dangerous online behaviour, and significantly improve customer support services by allowing systems to better understand the emotional nuances in user communications.

Coverage

The dataset comprises comments sourced from Reddit users, which implies a global geographic coverage. Specific details regarding the time range of the comments or the precise demographics of the Reddit users are not provided within the available information.

License

CCO

Who Can Use It

This dataset is particularly valuable for:
  • AI and Machine Learning Researchers: For advancing the field of emotion recognition and fine-grained sentiment analysis.
  • Natural Language Processing (NLP) Developers: To build applications that require the ability to discern and react to emotional states in text.
  • Chatbot Developers: To design and implement conversational AI that exhibits higher emotional intelligence and provides more empathetic interactions.
  • Data Scientists: Interested in exploring and modelling human emotions expressed through social media text.

Dataset Name Suggestions

  • Google GoEmotions Dataset
  • Reddit Emotion Comments Dataset
  • Fine-Grained Emotion Text Dataset

Attributes

Listing Stats

VIEWS

1

DOWNLOADS

0

LISTED

05/06/2025

REGION

GLOBAL

Universal Data Quality Score Logo UDQSQUALITY

5 / 5

VERSION

1.0

Free