Annotated Tweet Sentiment Dataset

Social Media and Posts

Tags and Keywords

Sentiment

Tweets

NLP

Analysis

Labels

Free

About

A large-scale sentiment dataset of tweets, each annotated with one of four categories: positive, negative, uncertainty, and litigious. It is billed as containing one million tweets; the file itself holds around 938,000 valid records. The dataset is designed for sentiment analysis, letting users detect and analyse public sentiment expressed on social media.

Columns

Language: Specifies the language of the tweet text. The dataset includes 72 unique languages, with English being the most prevalent at 93%.
Text: Contains the raw tweet content for analysis, with 929,544 unique text entries.
Label: The assigned sentiment category for each tweet, indicating whether it is positive, negative, uncertainty, or litigious. There are 4 unique labels, with positive and negative each accounting for 28% of the records.
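
The percentages quoted above can be checked once the file is loaded. The following is a minimal sketch using only the Python standard library. It assumes the CSV header uses the column names Language, Text, and Label exactly as listed here (the real header casing may differ), that language values are short codes such as "en", and that rows have already been read into dictionaries. `column_shares` is a hypothetical helper name, and the sample rows are made up for illustration.

```python
from collections import Counter

def column_shares(rows, column):
    """Return {value: fraction of rows} for one column of an iterable of dict rows."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}

# Made-up sample rows, not taken from the dataset.
# The "en"/"fr" language codes are an assumption about how languages are encoded.
sample_rows = [
    {"Language": "en", "Text": "great service", "Label": "positive"},
    {"Language": "en", "Text": "terrible delay", "Label": "negative"},
    {"Language": "en", "Text": "might sue them", "Label": "litigious"},
    {"Language": "fr", "Text": "pas sûr", "Label": "uncertainty"},
]

label_shares = column_shares(sample_rows, "Label")
language_shares = column_shares(sample_rows, "Language")
```

Run against the full file, the Label shares should land near 28% each for positive and negative, and the English share near 93%, if the listing's figures hold.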

Distribution

The dataset is provided as a single CSV file (dataset.csv) of approximately 167.74 MB. It comprises around 938,000 valid records across 3 columns, although the listing describes it as containing 1 million tweets. A sample file will be uploaded to the platform separately.
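
Because the file is roughly 168 MB, it can be streamed row by row rather than loaded all at once. The sketch below counts valid records and unique texts with the standard `csv` module. It assumes a header row with Language, Text, and Label columns, and it treats a record as valid when all three fields are non-empty, which may not match how the listing's figure of around 938,000 was computed. `summarise_csv` is a hypothetical helper name.

```python
import csv
import io

REQUIRED = ("Language", "Text", "Label")  # assumed header names

def summarise_csv(handle):
    """Stream a CSV file object and count valid rows and unique texts."""
    reader = csv.DictReader(handle)
    valid = 0
    unique_texts = set()
    for row in reader:
        # A row counts as valid only if every required field is present and non-blank.
        if all((row.get(name) or "").strip() for name in REQUIRED):
            valid += 1
            unique_texts.add(row["Text"])
    return {"valid_records": valid, "unique_texts": len(unique_texts)}

# In-memory stand-in for dataset.csv, with made-up rows.
demo = io.StringIO(
    "Language,Text,Label\n"
    "en,great service,positive\n"
    "en,great service,positive\n"
    "en,,negative\n"
    "fr,pas mal,uncertainty\n"
)
summary = summarise_csv(demo)
```

For the real file, the handle would come from something like `open("dataset.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption, since the listing does not state it.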

Usage

The dataset is suited to sentiment analysis tasks and to developing models that identify emotional tone in text. It fits Data Analytics, Exploratory Data Analysis, Natural Language Processing (NLP), and Deep Learning projects, and can be used with libraries such as NLTK.
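
As a rough illustration of the kind of baseline the four labels support, here is a small multinomial Naive Bayes classifier written with the standard library only. It is a sketch, not NLTK's implementation and not a method endorsed by the listing. The tokenizer is English-oriented, the training sentences are made up, and the function names (`tokenize`, `train`, `predict`) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and keep runs of ASCII letters, digits, and apostrophes (English-oriented)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def train(examples):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return label_counts, word_counts, vocabulary

def predict(model, text):
    """Return the label with the highest log-probability under add-one smoothing."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokenize(text):
            if token in vocabulary:  # ignore words never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training sentences, one per label used in the dataset.
model = train([
    ("love this great product", "positive"),
    ("awful terrible experience", "negative"),
    ("we may possibly see delays", "uncertainty"),
    ("the lawsuit alleges fraud", "litigious"),
])
guess = predict(model, "terrible awful service")
```

In practice the model would be trained on a held-out split of the Text and Label columns, and the non-English rows would need a tokenizer that handles their scripts.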

Coverage

The dataset covers tweets in 72 languages but is dominated by English (93%), so its scope is global yet English-heavy. No geographic or demographic information is provided beyond the language distribution. The dataset is static, with an expected update frequency of "Never": it is a fixed snapshot, and no time range is specified for the tweets themselves.

License

CC0: Public Domain

Who Can Use It

Data Scientists and Machine Learning Engineers: For training and evaluating sentiment classification models.
Researchers: Studying social media trends, public opinion, and linguistic patterns related to sentiment.
Academics: Utilising a real-world, pre-labelled dataset for educational purposes in NLP and data science courses.
Developers: Integrating sentiment detection capabilities into applications.

Dataset Name Suggestions

Million Tweet Sentiment Data
Twitter Sentiment Analysis Dataset
Large-Scale Tweet Sentiment Corpus
Public Domain Tweet Sentiment Data
Annotated Tweet Sentiment Dataset

Attributes

Original Data Source: Annotated Tweet Sentiment Dataset

Listing Stats

Listed: 22/08/2025
Region: Global
Quality: 5 / 5
Version: 1.0
