Annotated Social Media Sentiment Dataset
Social Media and Posts
Free
About
This dataset provides a large volume of annotated social media data for sentiment analysis: 1.6 million tweets extracted using the Twitter API. Its primary purpose is to enable machine learning models to detect sentiment polarity. Each tweet is labelled with its sentiment, which supports text classification, Natural Language Processing, and the study of public opinion.
Columns
- target: Represents the polarity of the tweet. Values are typically 0 for negative, 2 for neutral, and 4 for positive.
- ids: The unique identifier assigned to the specific tweet.
- date: The timestamp indicating when the tweet was posted.
- flag: Originally used for query tracking; for this collection, it often defaults to NO_QUERY.
- user: The username of the person who created the tweet.
- text: The full content of the tweet itself.
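As a rough illustration of the column layout, the sketch below reads rows into named records and maps the target codes to labels. It assumes the CSV has no header row and that columns appear in the order listed above; neither is confirmed here. The helper name read_tweets and the two sample rows are invented for the example.

```python
import csv
import io
from collections import Counter

# Column order as listed above; assumes the CSV has no header row.
COLUMNS = ["target", "ids", "date", "flag", "user", "text"]

# Polarity codes described for the target column.
POLARITY = {0: "negative", 2: "neutral", 4: "positive"}


def read_tweets(handle):
    """Yield one dict per row, with target converted to int and a label added."""
    for row in csv.reader(handle):
        if len(row) != len(COLUMNS):
            continue  # skip malformed rows rather than guessing their layout
        record = dict(zip(COLUMNS, row))
        record["target"] = int(record["target"])
        record["label"] = POLARITY.get(record["target"], "unknown")
        yield record


# Two made-up rows stand in for the real file.
sample = io.StringIO(
    '"0","1001","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","missed the bus again"\n'
    '"4","1002","Mon Apr 06 22:20:10 PDT 2009","NO_QUERY","user_b","great coffee this morning"\n'
)
records = list(read_tweets(sample))
label_counts = Counter(r["label"] for r in records)
print(label_counts)
```

For the real file, the same function can be given a handle from open("tweets.csv", newline=""); the text encoding may need to be set explicitly, depending on how the file was saved.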
Distribution
The data is presented in a tabular format, typically a CSV file (e.g., tweets.csv), with a total size of approximately 238.8 MB. It contains 6 distinct fields and 1.6 million valid records. Although some descriptions of the file refer to 1 million data points, the validated record count is the higher figure of 1.6 million. The row counts are exact, and the dataset is not expected to receive future updates.
Usage
This data is ideally suited for training and evaluating models focused on sentiment detection and analysis. Key applications include:
- Developing Deep Learning or Keras-based models for text classification.
- Researching severity detection from short-form social media posts.
- Projects focused on Natural Language Processing (NLP) and understanding public mood shifts.
- Building predictive systems that gauge customer or public reactions to events or brands.
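To make the text classification use case concrete, here is a minimal baseline sketch: a multinomial Naive Bayes classifier written with the standard library only. It is a simple stand-in chosen for brevity, not the Keras or deep learning models mentioned above, and not a method prescribed by the dataset. The class name, tokenizer, and training sentences are all invented for illustration; only the polarity codes (0 and 4) come from the column description.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples using the polarity codes 0 (negative) and 4 (positive).
train_texts = ["i love this phone", "what a great day", "i hate waiting", "this is awful"]
train_labels = [4, 4, 0, 0]
model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("great phone")
print(prediction)
```

On the full dataset, a held-out evaluation split would be needed before drawing any conclusions about accuracy; the toy data here only shows the mechanics.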
Coverage
The dataset captures the language and social interactions of Twitter users worldwide. Samples and statistics cover tweets posted on various dates, with examples ranging from early 2009 up to 2023. No explicit geographic or demographic restrictions are noted, as the data was extracted through the standard Twitter API.
License
CC0: Public Domain
Who Can Use It
- Machine Learning Engineers: For training robust sentiment classification algorithms.
- NLP Researchers: To study linguistic patterns associated with specific polarities.
- Academics: For conducting studies on social media behaviour and public opinion metrics.
- Data Scientists: To perform exploratory data analysis on large volumes of unstructured text data.
Dataset Name Suggestions
- Twitter Sentiment Polarity Corpus (1.6M)
- Annotated Social Media Sentiment Dataset
- Large-Scale Tweet Sentiment Analysis Data
Attributes
Original Data Source: Annotated Social Media Sentiment Dataset