
Annotated Tweet Sentiment Dataset

Social Media and Posts

Tags and Keywords

Sentiment

Tweets

Nlp

Analysis

Labels

Annotated Tweet Sentiment Dataset on the Opendatabay data marketplace


Free

About

A large-scale sentiment dataset of roughly one million tweets, each expertly annotated with one of four categories: positive, negative, uncertainty, or litigious. It is designed for sentiment analysis, enabling users to detect and analyse public sentiment expressed on social media.

Columns

  • Language: The language of the tweet text. The dataset covers 72 unique languages; English is the most prevalent, at 93% of records.
  • Text: The raw tweet content, with 929,544 unique values. Because there are around 938,000 records, some tweet texts appear more than once.
  • Label: The sentiment category assigned to each tweet: positive, negative, uncertainty, or litigious. Positive and negative each account for about 28% of the records, so uncertainty and litigious together cover the remaining ~44%.
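The column statistics above (unique languages, unique texts, label shares) can be recomputed with Python's standard `csv` module. The sketch below is a minimal illustration only. It assumes the CSV header row uses the column names exactly as listed (`Language`, `Text`, `Label`) and that language values are short codes such as `en`; both are assumptions to check against `dataset.csv`. The function name `summarize` is hypothetical.

```python
import csv
import io
from collections import Counter


def summarize(rows):
    """Summarize dict rows with the assumed keys Language, Text, Label."""
    languages = Counter()
    labels = Counter()
    texts = set()  # holds every distinct tweet text seen
    total = 0
    for row in rows:
        total += 1
        languages[row["Language"]] += 1
        labels[row["Label"]] += 1
        texts.add(row["Text"])
    return {
        "records": total,
        "unique_languages": len(languages),
        "unique_texts": len(texts),
        # Fraction of all records carrying each label.
        "label_share": {k: v / total for k, v in labels.items()} if total else {},
        # (language, count) pair for the most frequent language.
        "top_language": languages.most_common(1)[0] if languages else None,
    }


if __name__ == "__main__":
    # Tiny in-memory sample; header names and values are illustrative assumptions.
    sample = io.StringIO(
        "Language,Text,Label\n"
        "en,Great quarter for the company,positive\n"
        "en,Lawsuit filed against the firm,litigious\n"
        "en,Great quarter for the company,positive\n"
        "es,Resultados inciertos,uncertainty\n"
    )
    print(summarize(csv.DictReader(sample)))
```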

Distribution

The dataset is provided as a single CSV file (dataset.csv) of approximately 167.74 MB. Although it is described as containing one million tweets, it comprises around 938,000 valid records across its 3 columns. A sample file is to be uploaded to the platform separately.
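Because the file is around 168 MB, reading it row by row avoids loading everything into memory at once. The listing does not say how "valid" records were determined, so the filter below uses an assumed criterion (all three fields non-empty and a label among the four documented categories), not the marketplace's own rule. `iter_valid_records`, the header names, and the lowercase label strings are all assumptions.

```python
import csv
import io
from typing import Dict, Iterator, TextIO

# Assumed header names and label values; verify against the real dataset.csv.
EXPECTED_COLUMNS = ("Language", "Text", "Label")
KNOWN_LABELS = {"positive", "negative", "uncertainty", "litigious"}


def iter_valid_records(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Stream rows that have all three fields populated and a known label."""
    for row in csv.DictReader(handle):
        values = [row.get(col) for col in EXPECTED_COLUMNS]
        # DictReader fills missing trailing fields with None.
        if any(v is None or not v.strip() for v in values):
            continue
        label = row["Label"].strip().lower()
        if label not in KNOWN_LABELS:
            continue
        yield {
            "Language": row["Language"].strip(),
            "Text": row["Text"].strip(),
            "Label": label,  # normalized to lowercase
        }


if __name__ == "__main__":
    sample = io.StringIO(
        "Language,Text,Label\n"
        "en,Court dismisses the claim,litigious\n"
        "en,,positive\n"               # empty Text: skipped
        "en,Unknown label here,joy\n"  # label outside the four categories: skipped
    )
    for record in iter_valid_records(sample):
        print(record)
    # For the real file (UTF-8 encoding is an assumption):
    # with open("dataset.csv", newline="", encoding="utf-8") as f:
    #     print(sum(1 for _ in iter_valid_records(f)))
```

Counting the rows this yields on the real file gives a figure to compare against the ~938,000 quoted above; a mismatch would suggest the listing used a different validity rule.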

Usage

Ideal for sentiment analysis tasks and developing models to understand emotional tone in text. It is suitable for Data Analytics, Exploratory Data Analysis, Natural Language Processing (NLP), and Deep Learning projects. It can also be utilised with libraries such as NLTK.
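As a starting point for sentiment modelling, multinomial Naive Bayes is a common baseline. The sketch below is a dependency-free version written for illustration; it is not part of the dataset or of NLTK (NLTK provides its own `nltk.NaiveBayesClassifier`, which works on feature dictionaries rather than raw text). The class name `NaiveBayes`, the lowercase regex tokenizer, and add-one smoothing are assumptions, and the tokenizer is only reasonable for English text.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> List[str]:
    # Naive lowercase word tokenizer; adequate only for a rough English baseline.
    return TOKEN_RE.findall(text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples: Iterable[Tuple[str, str]]) -> "NaiveBayes":
        self.label_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for text, label in examples:
            self.label_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        self.total_words = {lbl: sum(c.values()) for lbl, c in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.label_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            score = math.log(doc_count / n_docs)  # log prior P(label)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (self.total_words.get(label, 0) + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Toy (text, label) pairs standing in for (row["Text"], row["Label"]).
    train = [
        ("great results, love it", "positive"),
        ("terrible loss, awful", "negative"),
        ("lawsuit filed in court", "litigious"),
        ("outcome may be unclear", "uncertainty"),
        ("love the great news", "positive"),
    ]
    model = NaiveBayes().fit(train)
    print(model.predict("court lawsuit"))  # → litigious
```

On the real data, the model would be fitted on `(Text, Label)` pairs and scored on a held-out split; per-class precision and recall are worth reporting alongside accuracy, since the four labels are not evenly balanced.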

Coverage

The dataset features tweets in a wide range of languages, primarily English (93%), suggesting a global, albeit English-dominant, scope. There is no specific geographic or demographic information beyond the language distribution. The dataset is static and has an expected update frequency of "Never", meaning it represents a fixed snapshot in time, with no specified time range for the tweets themselves.

License

CC0: Public Domain

Who Can Use It

  • Data Scientists and Machine Learning Engineers: For training and evaluating sentiment classification models.
  • Researchers: Studying social media trends, public opinion, and linguistic patterns related to sentiment.
  • Academics: Utilising a real-world, pre-labelled dataset for educational purposes in NLP and data science courses.
  • Developers: Integrating sentiment detection capabilities into applications.
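For training and evaluating classifiers, a stratified split keeps each label's share roughly equal in the training and test sets. The Text column has fewer unique values (929,544) than there are records, so this sketch also drops repeated texts before splitting, which stops the same tweet from landing on both sides and inflating test scores. The function name, its defaults (`test_fraction=0.2`, `seed=42`) and the keep-first deduplication rule are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def stratified_split(
    rows: Sequence[Row],
    test_fraction: float = 0.2,
    seed: int = 42,
    dedupe_text: bool = True,
) -> Tuple[List[Row], List[Row]]:
    """Split rows per label so train and test keep similar label proportions."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    by_label: Dict[str, List[Row]] = defaultdict(list)
    seen = set()
    for row in rows:
        if dedupe_text:
            if row["Text"] in seen:
                continue  # keep only the first occurrence of each text
            seen.add(row["Text"])
        by_label[row["Label"]].append(row)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train: List[Row] = []
    test: List[Row] = []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


if __name__ == "__main__":
    demo = [{"Text": f"tweet {i}", "Label": "positive" if i % 2 else "negative"} for i in range(10)]
    train, test = stratified_split(demo, test_fraction=0.2)
    print(len(train), len(test))
```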

Dataset Name Suggestions

  • Million Tweet Sentiment Data
  • Twitter Sentiment Analysis Dataset
  • Large-Scale Tweet Sentiment Corpus
  • Public Domain Tweet Sentiment Data
  • Annotated Tweet Sentiment Dataset

Attributes

Original Data Source: Annotated Tweet Sentiment Dataset

Listing Stats

VIEWS

0

DOWNLOADS

0

LISTED

22/08/2025

REGION

GLOBAL

UDQS QUALITY (Universal Data Quality Score)

5 / 5

VERSION

1.0
