
Annotated Social Media Sentiment Dataset

Social Media and Posts

Tags and Keywords

  • Sentiment
  • Twitter
  • NLP
  • Tweets
  • Polarity

Annotated Social Media Sentiment Dataset on the Opendatabay data marketplace

Free

About

This dataset provides a large volume of annotated social media data for sentiment analysis: 1.6 million tweets extracted using the Twitter API. Its primary purpose is to enable machine learning models to detect sentiment polarity. Each tweet is labelled with its sentiment, which supports text classification, Natural Language Processing (NLP), and the study of public opinion.

Columns

  • target: Represents the polarity of the tweet. Values are typically 0 for negative, 2 for neutral, and 4 for positive.
  • ids: The unique identifier assigned to the specific tweet.
  • date: The timestamp indicating when the tweet was posted.
  • flag: Originally used for query tracking; for this collection, it often defaults to NO_QUERY.
  • user: The username of the person who created the tweet.
  • text: The full content of the tweet itself.
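The column layout above can be sketched as a small typed reader. This is a minimal illustration using only the Python standard library, and it rests on assumptions the listing does not confirm: that the CSV has no header row and that fields appear in the order listed. The `Tweet` class, the `read_tweets` helper, and the sample rows (including their date format) are hypothetical and made up for the example.

```python
import csv
import io
from dataclasses import dataclass

# Column order as listed above; assumes the file has no header row.
COLUMNS = ["target", "ids", "date", "flag", "user", "text"]

# Polarity codes described for the target column.
POLARITY = {0: "negative", 2: "neutral", 4: "positive"}


@dataclass
class Tweet:
    target: int
    ids: int
    date: str
    flag: str
    user: str
    text: str

    @property
    def polarity(self) -> str:
        # Fall back to "unknown" for any code outside 0/2/4.
        return POLARITY.get(self.target, "unknown")


def read_tweets(handle):
    """Yield one Tweet per CSV row from an open text handle."""
    for row in csv.reader(handle):
        if len(row) != len(COLUMNS):
            continue  # skip malformed rows rather than guess at their layout
        target, ids, date, flag, user, text = row
        yield Tweet(int(target), int(ids), date, flag, user, text)


# Made-up sample rows for illustration only; not taken from the dataset.
SAMPLE = (
    '"0","1001","2009-04-06","NO_QUERY","user_a","missed the bus again"\n'
    '"4","1002","2009-04-07","NO_QUERY","user_b","great coffee, great day"\n'
)

tweets = list(read_tweets(io.StringIO(SAMPLE)))
for t in tweets:
    print(t.ids, t.polarity, t.text)
```

Keeping `date` as a plain string is deliberate, since the listing does not state the timestamp format.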

Distribution

The data is presented in a tabular format, typically a CSV file (e.g., tweets.csv), with a total size of approximately 238.8 MB. It contains 6 distinct fields and 1.6 million valid records. Some contextual descriptions of the file refer to 1 million data points, but the validated record count is higher. The dataset is not expected to receive future updates.
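Because the file is fairly large, the field and record counts quoted above can be checked in a single streaming pass rather than by loading everything into memory. The sketch below assumes a header-less CSV whose first column is `target`; the `summarize` function and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def summarize(handle, expected_fields=6):
    """Count valid rows and tally the first column (target) in one pass."""
    valid = 0
    skipped = 0
    targets = Counter()
    for row in csv.reader(handle):
        if len(row) != expected_fields:
            skipped += 1  # wrong number of fields: not a valid record
            continue
        valid += 1
        targets[row[0]] += 1
    return valid, skipped, targets


# Made-up rows for illustration only; the last one is deliberately malformed.
SAMPLE = (
    "0,1,d1,NO_QUERY,u1,bad day\n"
    "4,2,d2,NO_QUERY,u2,good day\n"
    "4,3,d3,NO_QUERY,u3,lovely weather\n"
    "broken,row\n"
)

valid, skipped, targets = summarize(io.StringIO(SAMPLE))
print(valid, skipped, dict(targets))
```

For the real file, the same function could be called on a handle opened with `open("tweets.csv", newline="")`; the correct `encoding` argument is not stated in the listing and may need to be set explicitly.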

Usage

This data is ideally suited for training and evaluating models focused on sentiment detection and analysis. Key applications include:
  • Developing Deep Learning or Keras-based models for text classification.
  • Researching severity detection from short-form social media posts.
  • Projects focused on Natural Language Processing (NLP) and understanding public mood shifts.
  • Building predictive systems that gauge customer or public reactions to events or brands.
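As a rough starting point for the classification use cases above, a bag-of-words baseline can be written with the standard library alone. Note that this is a multinomial naive Bayes classifier with add-one smoothing, swapped in here as a simple stand-in; it is not the deep learning or Keras approach the listing mentions. The `NaiveBayes` class, the `tokenize` helper, and the training sentences are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    def __init__(self):
        self.doc_counts = Counter()              # label -> number of training texts
        self.word_counts = defaultdict(Counter)  # label -> token -> count
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # Log prior plus log likelihoods, with add-one smoothing so that
            # unseen tokens do not zero out the score.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training sentences; labels reuse the 0 (negative) / 4 (positive) codes.
texts = ["i love this phone", "what a great day",
         "i hate waiting", "this is awful service"]
labels = [4, 4, 0, 0]

model = NaiveBayes().fit(texts, labels)
print(model.predict("great phone"))
print(model.predict("awful waiting"))
```

A baseline like this is mainly useful as a reference score to compare against before investing in a neural model.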

Coverage

The dataset captures the language and social interaction of global Twitter users. The listing's samples and statistics cover tweets posted across various dates, with examples spanning from early 2009 up to 2023. No explicit geographic or demographic restrictions are noted, as the data extraction relied on the standard Twitter API.

License

CC0: Public Domain

Who Can Use It

  • Machine Learning Engineers: For training robust sentiment classification algorithms.
  • NLP Researchers: To study linguistic patterns associated with specific polarities.
  • Academics: For conducting studies on social media behaviour and public opinion metrics.
  • Data Scientists: To perform exploratory data analysis on large volumes of unstructured text data.

Dataset Name Suggestions

  • Twitter Sentiment Polarity Corpus (1.6M)
  • Annotated Social Media Sentiment Dataset
  • Large-Scale Tweet Sentiment Analysis Data

Attributes

Listing Stats

  • Views: 4
  • Downloads: 0
  • Listed: 17/11/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
