Twitter Sentiment Classification Data
Social Media and Networking
Free
About
This dataset provides a collection of tweets, each categorised by its sentiment. It is designed to assist in developing and evaluating machine learning models, particularly for natural language processing tasks. The primary aim is to distinguish between different sentiments expressed in tweets, helping to address issues like harmful content by enabling the creation of robust classifier models. Each entry includes the tweet text and its corresponding sentiment label, with a specific focus on identifying the exact word or phrase within the tweet that encapsulates that sentiment.
Columns
- textID: A unique identifier for each tweet entry.
- text: The full content of the tweet.
- selected_text: The specific part of the tweet that best represents the given sentiment.
- sentiment: The overall sentiment expressed in the tweet, categorised as neutral, positive, or other.
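A minimal sketch of reading these four columns with Python's standard csv module is shown here. It also strips leading and trailing quotation marks from the text fields, as the Distribution notes recommend, and looks up where selected_text starts inside text. The sample rows are made up for illustration and are not taken from the dataset; the helper names clean_field and load_rows are hypothetical, and the sketch assumes the file has a header row with exactly the column names listed above.

```python
import csv
import io

# Made-up sample rows for illustration only; they are not taken from the dataset.
SAMPLE_CSV = '''textID,text,selected_text,sentiment
a1,"""what a lovely morning""","""lovely""",positive
b2, going to the shop later ,going to the shop later,neutral
'''

# Column names as given in the listing (assumed to match the CSV header).
EXPECTED_COLUMNS = ["textID", "text", "selected_text", "sentiment"]


def clean_field(value):
    """Trim whitespace, then drop one leading and one trailing double quote."""
    value = value.strip()
    if value.startswith('"'):
        value = value[1:]
    if value.endswith('"'):
        value = value[:-1]
    return value.strip()


def load_rows(handle):
    """Read rows from an open CSV handle and clean the two text fields."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        row["text"] = clean_field(row["text"] or "")
        row["selected_text"] = clean_field(row["selected_text"] or "")
        # Character offset of the selected span inside the tweet, or -1 if absent.
        row["span_start"] = row["text"].find(row["selected_text"])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["textID"], row["sentiment"], repr(row["selected_text"]), row["span_start"])
```

To read a real file, pass an open file handle instead of the in-memory sample, for example open(path, newline="", encoding="utf-8"), where path is wherever the CSV was saved.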
Distribution
The dataset contains approximately 27,500 tweets. It is typically provided in a CSV file format. The textID and text columns each contain 27,481 unique values, while the selected_text column has 22,464 unique values. The sentiment distribution is as follows: 40% are neutral, 31% are positive, and 28% fall into other sentiment categories. When processing the data from the CSV, it is important to remove any beginning or ending quotation marks from the text fields.
Usage
This dataset is suited to sentiment analysis and text classification. It can be used to build and train classification models that predict the sentiment of tweets, and to compare classification algorithms by their performance metrics on that task. It is particularly useful for developing NLP-based classifiers that identify and categorise tweets by sentiment.
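As a rough illustration of comparing classifiers by performance metrics, the sketch below computes accuracy and macro-averaged F1 in plain Python and scores a majority-class baseline next to a hypothetical model. The label lists are toy values invented for this example, and the function names are assumptions rather than part of the dataset or any library. With roughly 40% of tweets labelled neutral, a predictor that always answers neutral would be expected to reach about 40% accuracy on data with that distribution, which makes it a useful floor for any trained model.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes seen in y_true."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted positives contributes 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def majority_baseline(train_labels, n):
    """Predict the most frequent training label for every one of n items."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return [most_common_label] * n


# Toy labels, invented for illustration; real labels come from the sentiment column.
y_true = ["neutral", "neutral", "neutral", "positive", "positive", "other"]
y_model = ["neutral", "neutral", "positive", "positive", "neutral", "other"]
y_base = majority_baseline(y_true, len(y_true))

for name, preds in [("baseline", y_base), ("model", y_model)]:
    print(f"{name}: accuracy={accuracy(y_true, preds):.2f} "
          f"macro_f1={macro_f1(y_true, preds):.2f}")
```

Macro F1 is reported alongside accuracy because it weights the three sentiment classes equally, so a model that ignores the smaller classes is penalised even when its accuracy looks acceptable.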
Coverage
The data originates from a global platform, Twitter, and the sentiment analysis is applicable across a wide range of content. The dataset's structure allows for analysis of sentiments in tweets, covering various topics and expressions globally. No specific time range or demographic scope is detailed beyond its global applicability.
License
CC0
Who Can Use It
This dataset is suitable for a diverse range of users, including beginners in data science and machine learning. It is especially beneficial for those interested in social network analysis, text classification, and natural language processing. Intended users include data scientists, researchers, and developers looking to build and test models for predicting social media sentiments or for applications like content moderation.
Dataset Name Suggestions
- Twitter Tweet Sentiment Dataset
- Tweet Sentiment Analysis Dataset
- Social Media Sentiment Prediction Data
- Twitter Sentiment Classification Data
Attributes
Original Data Source: Twitter Tweets Sentiment Dataset