Twitter Sentiment Analysis Training Data
Free
About
A collection of 1,600,000 tweets extracted via the Twitter API, designed for sentiment analysis tasks. Each tweet has been annotated with a polarity label, designating it as either negative or positive, making it a valuable resource for training and evaluating models that detect sentiment in text.
Columns
- target: The polarity of the tweet, where '0' signifies a negative sentiment and '4' indicates a positive sentiment.
- ids: A unique identification number for each tweet.
- date: The specific date and time the tweet was posted.
- flag: The query term used to collect the tweet. If no specific query was used, this field is marked as 'NO_QUERY'.
- user: The username of the account that posted the tweet.
- text: The complete text content of the tweet.
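As a rough illustration of the column layout above, the following sketch reads rows with Python's standard csv module and adds a readable sentiment label. It assumes the file has no header row and that the columns appear in the order listed; neither is stated in this listing. The names COLUMNS, POLARITY and read_tweets are hypothetical, and the two sample rows are made up for the example.

```python
import csv
import io

# Column order as listed above; ASSUMES the CSV has no header row.
COLUMNS = ["target", "ids", "date", "flag", "user", "text"]

# The listing documents only two polarity codes: 0 (negative) and 4 (positive).
POLARITY = {"0": "negative", "4": "positive"}


def read_tweets(handle):
    """Yield one dict per row, with a readable 'sentiment' field added."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS)
    for row in reader:
        # Raises KeyError on any code other than "0" or "4" (e.g. a header row).
        row["sentiment"] = POLARITY[row["target"]]
        yield row


# Two made-up rows standing in for the real file (illustrative values only).
sample = io.StringIO(
    '"0","1","Mon May 04 10:00:00 2009","NO_QUERY","user_a","my phone died again"\n'
    '"4","2","Mon May 04 10:05:00 2009","NO_QUERY","user_b","loving this weather"\n'
)
rows = list(read_tweets(sample))
print([(r["user"], r["sentiment"]) for r in rows])
```

To read the actual file, the same function could be given a handle opened with open("tweets.csv", newline="", encoding=...). The encoding is not stated in this listing, so it would need to be checked against the downloaded copy.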
Distribution
The data is provided in a single CSV file named tweets.csv, approximately 145.63 MB in size. It contains 1,600,000 records across 6 columns.
Usage
This data is ideal for developing sentiment analysis models, classification algorithms, and tools for social media monitoring. Researchers can use it for projects related to sentence similarity, survey analysis, and understanding public opinion on social networks.
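To make the classification use case concrete, here is a minimal sketch of a bag-of-words Naive Bayes baseline written with the standard library only. It is one possible starting point, not a method prescribed by the dataset. The functions tokenize, train and predict are hypothetical helpers, and the four training sentences are invented; only the 0 and 4 label codes come from the column description.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenisation; real tweets would need more care.
    return text.lower().split()


def train(examples):
    """examples: iterable of (text, label). Returns the counts used by predict."""
    word_counts = {}        # label -> Counter of tokens seen under that label
    doc_counts = Counter()  # label -> number of training texts
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in doc_counts.items():
        counts = word_counts[label]
        total = sum(counts.values())
        score = math.log(n_docs / total_docs)  # class prior
        for tok in tokenize(text):
            if tok in vocab:  # skip words never seen in training
                # Add-one (Laplace) smoothing over the vocabulary
                score += math.log((counts[tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy examples using the dataset's label codes: 0 negative, 4 positive.
toy = [
    ("i love this song", 4),
    ("what a great day", 4),
    ("i hate waiting in line", 0),
    ("this is a terrible day", 0),
]
model = train(toy)
print(predict(model, "great song"))     # 4
print(predict(model, "terrible line"))  # 0
```

In practice the (text, target) pairs would come from the CSV, with part of the data held out for evaluation.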
Coverage
The data was collected via the Twitter API. No collection time range is given, although sample dates include May 2009. The geographic and demographic scope is global, reflecting the diverse user base of Twitter. The source gives no breakdown of coverage by user group or by year.
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
- Data Scientists: For training and validating machine learning models for sentiment classification.
- Machine Learning Engineers: To build applications that can automatically detect the emotional tone of text.
- Academic Researchers: For studies in computational linguistics, social science, and online communication patterns.
- Market Researchers: To analyse public sentiment towards brands, products, or events.
Dataset Name Suggestions
- Sentiment140 Twitter Polarity Analysis
- Annotated Tweet Sentiment Corpus
- Positive and Negative Tweet Dataset for NLP
- Twitter Sentiment Analysis Training Data
Attributes
Original Data Source: Twitter Sentiment Analysis Training Data