GoEmotions Text Emotion Dataset

Category: Data Science and Analytics

Tags and Keywords: Text, Exploratory, NLP, Statistical

Free

About

This dataset is a corpus of 58,009 Reddit comments, each annotated by human raters with one or more of 27 emotion categories or a neutral label. It is intended for multi-class and multi-label emotion classification and is suited to a range of natural language processing (NLP) applications.

Columns

data: The original textual content of the Reddit comment.
text: The textual content of the Reddit comment, which may be a processed or identical version of the data column.
id: A unique identifier for each individual Reddit comment.
author: The username of the Reddit account that posted the comment.
subreddit: The name of the Reddit community (subreddit) where the comment was published.
link_id: An identifier for the submission (post) to which the comment is linked.
parent_id: An identifier for the parent comment or the original submission, indicating its place within a conversation thread.
created_utc: The creation timestamp of the comment, presented in Unix epoch format.
rater_id: An identifier for the human annotator who provided the emotion label for the comment.
example_very_unclear: A boolean flag that indicates whether the example was deemed very unclear during the annotation process.
admiration: One of the 27 emotion columns, each typically holding a binary (0 or 1) value. The other emotion columns are amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, and surprise, plus a neutral column.
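
As an illustration of how the one-column-per-emotion layout can be read, the sketch below collapses the binary emotion columns of each row into a list of label names. It runs on a small made-up sample rather than the real file, and the exact spelling and casing of the column names are assumptions that should be checked against the CSV header.

```python
import csv
import io

# The 27 emotion columns named in the listing, plus the neutral label.
# Spelling and casing are assumptions; check them against the CSV header.
EMOTIONS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
]


def active_labels(row):
    """Return the names of the emotion columns set to 1 in one CSV row."""
    return [name for name in EMOTIONS if (row.get(name) or "0").strip() == "1"]


# Tiny made-up sample standing in for the real file (only a few columns shown).
SAMPLE = io.StringIO(
    "text,id,example_very_unclear,admiration,gratitude,neutral\n"
    "Thanks so much!,abc123,False,1,1,0\n"
    "It is a bus.,def456,False,0,0,1\n"
)

labels_by_id = {row["id"]: active_labels(row) for row in csv.DictReader(SAMPLE)}
print(labels_by_id)
```

To run this against the downloaded file, replace SAMPLE with an open file handle; columns missing from a row are treated as 0.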

Distribution

The dataset is provided as a CSV file containing 58,009 individual examples, with a file size of 42.74 MB. A version filtered by rater agreement is also included; it holds 54,263 examples in total, split into training, test, and validation sets:

Training dataset: 43,410 examples
Test dataset: 5,427 examples
Validation dataset: 5,426 examples
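
The split sizes above can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted in this listing; reading the gap between the two totals as examples dropped by the rater-agreement filter is an inference from those figures, not something the listing states.

```python
# Split sizes as quoted in the listing.
splits = {"train": 43_410, "test": 5_427, "validation": 5_426}
total_examples = 58_009  # size of the full corpus, per the listing

total_filtered = sum(splits.values())
shares = {name: count / total_filtered for name, count in splits.items()}

for name, share in shares.items():
    print(f"{name}: {splits[name]} examples ({share:.1%})")

# Gap between the full corpus and the filtered splits (assumed to be
# the examples removed by the rater-agreement filter).
print("not in any split:", total_examples - total_filtered)
```

The three splits come out at roughly 80%, 10%, and 10% of the filtered version.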

Usage

This dataset is ideal for:

Developing and evaluating emotion classification models.
Performing sentiment analysis on social media content.
Conducting research in natural language processing and understanding.
Facilitating exploratory data analysis of emotional expression on the Reddit platform.
Aiding the development of AI and large language model (LLM) applications that require emotion detection capabilities.
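
For model development, a common preparation step is to merge the annotations of several raters into one label set per comment. The sketch below is one possible approach, not the method used to build the filtered version: it assumes the file carries one row per rater per comment (suggested by the rater_id column but not confirmed here), skips rows flagged example_very_unclear, and keeps a label only when at least min_votes raters chose it. The function name, the threshold, and the sample rows are all hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_labels(rows, emotions, min_votes=2):
    """Merge per-rater rows into one sorted label list per comment id."""
    votes = defaultdict(Counter)
    for row in rows:
        # Skip annotations the rater marked as very unclear.
        if str(row.get("example_very_unclear", "")).strip().lower() == "true":
            continue
        counts = votes[row["id"]]  # registers the comment even with no labels
        for name in emotions:
            if str(row.get(name, "0")).strip() == "1":
                counts[name] += 1
    return {
        comment_id: sorted(n for n, c in counts.items() if c >= min_votes)
        for comment_id, counts in votes.items()
    }


# Made-up rows for illustration only.
rows = [
    {"id": "c1", "rater_id": "1", "example_very_unclear": "False", "joy": "1", "love": "0"},
    {"id": "c1", "rater_id": "2", "example_very_unclear": "False", "joy": "1", "love": "1"},
    {"id": "c1", "rater_id": "3", "example_very_unclear": "True", "joy": "0", "love": "0"},
    {"id": "c2", "rater_id": "1", "example_very_unclear": "False", "joy": "0", "love": "1"},
]

merged = aggregate_labels(rows, emotions=["joy", "love"])
print(merged)
```

Comments whose labels never reach the threshold end up with an empty list, so they can be dropped or mapped to a fallback class depending on the task.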

Coverage

Geographic Scope: The data's scope is global.
Time Range: Comments included in the dataset were created between approximately 1st January 2019 and 1st February 2019.
Demographic Scope: As the data originates from Reddit comments, it reflects the diverse range of user demographics present on the platform, although specific demographic breakdowns are not provided.
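
To check the stated time range against the data, the created_utc values can be converted from Unix epoch seconds to UTC datetimes. The sketch below is a minimal example; the sample timestamp is made up, and the window bounds are taken from the approximate dates given above.

```python
from datetime import datetime, timezone


def to_utc(created_utc):
    """Convert a Unix epoch value (number or numeric string) to an aware UTC datetime."""
    return datetime.fromtimestamp(float(created_utc), tz=timezone.utc)


# Approximate window stated in the listing.
RANGE_START = datetime(2019, 1, 1, tzinfo=timezone.utc)
RANGE_END = datetime(2019, 2, 1, tzinfo=timezone.utc)


def in_stated_range(created_utc):
    """True if the timestamp falls on or between the two stated dates."""
    return RANGE_START <= to_utc(created_utc) <= RANGE_END


sample_created_utc = "1547000000.0"  # made-up value for illustration
print(to_utc(sample_created_utc).isoformat(), in_stated_range(sample_created_utc))
```

Applying in_stated_range to every row and counting the False results gives a quick view of how many comments fall outside the stated window.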

License

CC BY-NC-SA.

Who Can Use It

Data scientists seeking to build and test machine learning models for emotion detection.
NLP researchers focused on advancements in emotion recognition and textual sentiment.
Academics engaged in linguistic or social science studies of online communication patterns.
Developers creating applications for social media monitoring or conversational AI systems.

Dataset Name Suggestions

GoEmotions Reddit Comments
Reddit Emotion Corpus
Social Media Emotion Labels Dataset
GoEmotions Text Emotion Dataset

Attributes

Original Data Source: GoEmotions

Listing Stats

Listed: 17/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0
