Annotated Social Media Sentiment Dataset

FREE DATASET LIBRARY

Verified Data Provider

£0

Social Media and Posts

Tags and Keywords: Sentiment, Twitter, NLP, Tweets, Polarity

About

This dataset provides a large volume of annotated social media data for sentiment analysis: 1.6 million tweets extracted using the Twitter API. Its primary purpose is to enable machine learning models to detect sentiment polarity. Each tweet is labelled with its sentiment, which supports text classification, Natural Language Processing (NLP), and the study of public opinion.

Columns

target: Represents the polarity of the tweet. Values are typically 0 for negative, 2 for neutral, and 4 for positive.
ids: The unique identifier assigned to the specific tweet.
date: The timestamp indicating when the tweet was posted.
flag: Originally used for query tracking; for this collection, it often defaults to NO_QUERY.
user: The username of the person who created the tweet.
text: The full content of the tweet itself.
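
As a minimal sketch of how these six columns might be read, the following uses Python's standard csv module. It assumes the file has no header row and that the fields appear in the order listed above; neither is confirmed by the listing. The names `read_tweets`, `COLUMNS`, and `POLARITY_LABELS` are hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io

# Column order as listed above; assumed to match the file layout.
COLUMNS = ["target", "ids", "date", "flag", "user", "text"]

# Polarity values described for the target column.
POLARITY_LABELS = {0: "negative", 2: "neutral", 4: "positive"}


def read_tweets(handle):
    """Yield one dict per row, adding a readable 'polarity' label.

    Assumes a headerless CSV with exactly six fields per row.
    """
    for row in csv.reader(handle):
        if len(row) != len(COLUMNS):
            continue  # skip malformed rows rather than guess at their layout
        record = dict(zip(COLUMNS, row))
        record["target"] = int(record["target"])
        record["polarity"] = POLARITY_LABELS.get(record["target"], "unknown")
        yield record


# Illustrative in-memory sample (made-up rows, not taken from the dataset).
sample = io.StringIO(
    '"0","1","example date","NO_QUERY","user_a","so tired today"\n'
    '"4","2","example date","NO_QUERY","user_b","what a great morning"\n'
)
rows = list(read_tweets(sample))
print([(r["user"], r["polarity"]) for r in rows])
```

For the real file, the same function could be given a handle from `open("tweets.csv", newline="")`. The file's text encoding is not stated in the listing, so an explicit `encoding` argument may be needed.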

Distribution

The data is presented in a tabular format, typically a CSV file (e.g., tweets.csv), with a total size of approximately 238.8 MB. It contains 6 distinct fields and 1.6 million valid records. Some contextual descriptions of the file refer to 1 million data points, but the validated record count is higher. The dataset is not expected to receive future updates.
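
The stated field and record counts can be checked in a single streaming pass, which avoids loading the whole file into memory. The sketch below assumes a headerless CSV whose first field is the target; `summarize` is a hypothetical helper and the demo rows are made up.

```python
import csv
import io
from collections import Counter

EXPECTED_FIELDS = 6  # field count stated in the listing


def summarize(handle, expected_fields=EXPECTED_FIELDS):
    """Stream a CSV once, returning (valid_rows, invalid_rows, target_counts)."""
    valid = invalid = 0
    targets = Counter()
    for row in csv.reader(handle):
        if len(row) == expected_fields:
            valid += 1
            targets[row[0]] += 1  # first field assumed to be the target
        else:
            invalid += 1
    return valid, invalid, targets


# Made-up demo rows; the last one is deliberately short.
demo = io.StringIO(
    "0,1,date_a,NO_QUERY,user_a,first text\n"
    "4,2,date_b,NO_QUERY,user_b,second text\n"
    "4,3,date_c,NO_QUERY\n"
)
valid, invalid, targets = summarize(demo)
print(valid, invalid, dict(targets))
```

Run against the full file, `valid` would be expected to come out near 1.6 million if the listing's figure holds.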

Usage

This data is ideally suited for training and evaluating models focused on sentiment detection and analysis. Key applications include:

Developing Deep Learning or Keras-based models for text classification.
Researching severity detection from short-form social media posts.
Projects focused on Natural Language Processing (NLP) and understanding public mood shifts.
Building predictive systems that gauge customer or public reactions to events or brands.
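
Before reaching for a deep learning model, a simple baseline can help confirm that the labels are learnable. The sketch below is not a Keras model; it is a small multinomial Naive Bayes classifier written with the standard library only, trained on made-up examples that follow the dataset's target convention (0 for negative, 4 for positive). The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; deliberately simple."""
    return text.lower().split()


def train(examples):
    """Fit word counts per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability under the model."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training examples, not taken from the dataset.
training = [
    ("i love this so much", 4),
    ("what a great day", 4),
    ("this is awful", 0),
    ("i hate waiting", 0),
]
model = train(training)
print(predict(model, "love this great day"))
print(predict(model, "awful waiting"))
```

A baseline like this gives a reference accuracy that a more complex model should beat; real tweets would also need better tokenisation than whitespace splitting.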

Coverage

The dataset captures the language and social interaction of global Twitter users. Sample records and summary statistics show tweets posted on various dates, with examples spanning from early 2009 up to 2023. No explicit geographic or demographic restrictions are noted, as the data extraction relied on the standard Twitter API.
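
The stated time range can be checked against the date column by counting rows per year. Because the listing does not give the timestamp format, this sketch only assumes that each value contains a four-digit year; `year_histogram` is a hypothetical helper and the date strings are made up.

```python
import re
from collections import Counter

# Matches a standalone four-digit year starting with 19 or 20.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def year_histogram(date_values):
    """Count rows per year, using the first four-digit year in each value.

    Values with no recognisable year are counted under the key None.
    """
    years = Counter()
    for value in date_values:
        match = YEAR_PATTERN.search(value)
        years[int(match.group()) if match else None] += 1
    return years


# Made-up date strings in two hypothetical formats, plus one bad value.
dates = ["Mon Apr 06 22:19:45 2009", "2009-05-17 10:00:00", "not a date"]
hist = year_histogram(dates)
print(hist)
```

The smallest and largest keys of the histogram (ignoring None) give the observed year range to compare with the range described above.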

License

CC0: Public Domain

Who Can Use It

Machine Learning Engineers: For training robust sentiment classification algorithms.
NLP Researchers: To study linguistic patterns associated with specific polarities.
Academics: For conducting studies on social media behaviour and public opinion metrics.
Data Scientists: To perform exploratory data analysis on large volumes of unstructured text data.

Dataset Name Suggestions

Twitter Sentiment Polarity Corpus (1.6M)
Annotated Social Media Sentiment Dataset
Large-Scale Tweet Sentiment Analysis Data

Attributes

Original Data Source: Annotated Social Media Sentiment Dataset

Listing Stats

Listed: 17/11/2025
Region: Global
Quality: 5 / 5
Version: 1.0
