Annotated Tweet Sentiment Dataset
Social Media and Posts
Free
About
A large-scale sentiment dataset of nominally one million tweets (around 938,000 valid records), each annotated with one of four categories: positive, negative, uncertainty, or litigious. It is designed for sentiment analysis, enabling users to detect and analyse public sentiment expressed on social media.
Columns
- Language: Specifies the language of the tweet text. The dataset includes 72 unique languages, with English being the most prevalent at 93%.
- Text: Contains the raw tweet content for analysis, with 929,544 unique text entries.
- Label: The sentiment category assigned to each tweet: positive, negative, uncertainty, or litigious. There are 4 unique labels, with positive and negative each accounting for 28% of the records.
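As a rough check of the label balance described above, the following sketch computes the share of each label using only the Python standard library. The header names `Language`, `Text`, and `Label` follow the column list above, but their exact spelling in the file is an assumption, and the inline rows are invented placeholders rather than records from the dataset.

```python
import csv
import io
from collections import Counter


def label_shares(rows):
    """Return {label: fraction of rows} for an iterable of dict rows."""
    counts = Counter(row["Label"] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Invented sample standing in for dataset.csv (header names are assumed).
SAMPLE = io.StringIO(
    "Language,Text,Label\n"
    "en,great day,positive\n"
    "en,terrible service,negative\n"
    "en,might rain later,uncertainty\n"
    "en,they will sue,litigious\n"
)

shares = label_shares(csv.DictReader(SAMPLE))
for label, share in sorted(shares.items()):
    print(f"{label}: {share:.0%}")
```

To run it against the real file, pass `csv.DictReader(open("dataset.csv", newline="", encoding="utf-8"))` instead of the sample; the UTF-8 encoding is also an assumption.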
Distribution
The dataset is provided as a single CSV file (dataset.csv) of approximately 167.74 MB. It comprises around 938,000 valid records across 3 columns, somewhat fewer than the 1 million tweets its description refers to. A sample file will be uploaded to the platform separately.
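The record counts above can be verified by streaming the file row by row instead of loading it all at once. The sketch below is one possible approach: it treats a row as valid when both the text and the label are non-empty, which is an assumption about what "valid" means here, and the header names and sample rows are likewise illustrative.

```python
import csv
import io


def summarize(fileobj):
    """Stream a CSV and return (valid_rows, unique_texts)."""
    seen = set()
    valid = 0
    for row in csv.DictReader(fileobj):
        text = (row.get("Text") or "").strip()
        label = (row.get("Label") or "").strip()
        if not text or not label:
            # Assumed validity rule: skip rows missing text or label.
            continue
        valid += 1
        seen.add(text)
    return valid, len(seen)


# Invented sample: one duplicate text and one row with no label.
SAMPLE = io.StringIO(
    "Language,Text,Label\n"
    "en,hello,positive\n"
    "en,hello,positive\n"
    "en,bye,negative\n"
    "en,orphan,\n"
)

valid_rows, unique_texts = summarize(SAMPLE)
print(valid_rows, unique_texts)
```

On the full file the set of seen texts holds every distinct tweet in memory; storing a hash of each text instead would reduce that footprint at the cost of a small collision risk.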
Usage
The dataset is well suited to sentiment analysis tasks and to developing models that capture emotional tone in text. It fits Data Analytics, Exploratory Data Analysis, Natural Language Processing (NLP), and Deep Learning projects, and can be used with libraries such as NLTK.
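To illustrate the kind of model the four labels support, here is a minimal multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. It is a sketch, not the method used to annotate the dataset and not an NLTK example; the tokenizer is English-centric by assumption, and the training sentences are invented.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into simple word tokens (English-centric)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter(labels)      # label -> document count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training examples, one per label.
train_texts = [
    "love this great phone",
    "awful terrible battery",
    "maybe it could possibly work",
    "lawsuit filed in court",
]
train_labels = ["positive", "negative", "uncertainty", "litigious"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("great phone"))
print(model.predict("court lawsuit"))
```

With the real data, the `Text` and `Label` columns would replace the toy lists, and a held-out split would be needed to measure accuracy.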
Coverage
The dataset features tweets in a wide range of languages but is predominantly English (93%), suggesting a global, albeit English-dominant, scope. It carries no geographic or demographic information beyond the language distribution. The dataset is static, with an expected update frequency of "Never": it represents a fixed snapshot, and no time range is specified for the tweets themselves.
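Because the data is English-dominant, many workflows will start by restricting rows to one language. The sketch below shows one way to do that; the `Language` header name and the use of short codes such as `en` are assumptions about how the column is encoded, and the rows are invented.

```python
def keep_language(rows, code="en"):
    """Return only the rows whose Language value matches the given code."""
    wanted = code.strip().lower()
    return [r for r in rows if r["Language"].strip().lower() == wanted]


def language_share(rows, code="en"):
    """Return the fraction of rows in the given language (0.0 if empty)."""
    rows = list(rows)
    if not rows:
        return 0.0
    return len(keep_language(rows, code)) / len(rows)


# Invented rows; the language codes are assumed, not taken from the file.
rows = [
    {"Language": "en", "Text": "good morning", "Label": "positive"},
    {"Language": "EN", "Text": "bad traffic", "Label": "negative"},
    {"Language": "es", "Text": "tal vez", "Label": "uncertainty"},
    {"Language": "fr", "Text": "au tribunal", "Label": "litigious"},
]

english = keep_language(rows)
print(len(english), language_share(rows))
```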
License
CC0: Public Domain
Who Can Use It
- Data Scientists and Machine Learning Engineers: For training and evaluating sentiment classification models.
- Researchers: Studying social media trends, public opinion, and linguistic patterns related to sentiment.
- Academics: Utilising a real-world, pre-labelled dataset for educational purposes in NLP and data science courses.
- Developers: Integrating sentiment detection capabilities into applications.
Dataset Name Suggestions
- Million Tweet Sentiment Data
- Twitter Sentiment Analysis Dataset
- Large-Scale Tweet Sentiment Corpus
- Public Domain Tweet Sentiment Data
- Annotated Tweet Sentiment Dataset
Attributes
Original Data Source: Annotated Tweet Sentiment Dataset