
Twitter Sentiment Classification Data

Social Media and Networking

Tags and Keywords

Beginner

Social

Classification

NLP

Free

About

This dataset provides a collection of tweets, each categorised by its sentiment. It is designed to assist in developing and evaluating machine learning models, particularly for natural language processing tasks. The primary aim is to distinguish between different sentiments expressed in tweets, helping to address issues like harmful content by enabling the creation of robust classifier models. Each entry includes the tweet text and its corresponding sentiment label, with a specific focus on identifying the exact word or phrase within the tweet that encapsulates that sentiment.

Columns

  • textID: A unique identifier for each tweet entry.
  • text: The full content of the tweet.
  • selected_text: The specific part of the tweet that best represents the given sentiment.
  • sentiment: The overall sentiment expressed in the tweet, categorised as neutral, positive, or other.
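The four columns above can be read with Python's standard csv module. The following is a minimal sketch, not an official loader: the sample rows are invented for illustration, and the names TweetRow, read_rows, and span_in_text are hypothetical helpers. It also assumes that selected_text is normally a verbatim substring of text, which the column description suggests but does not guarantee.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical sample rows, invented for illustration (not taken from the dataset).
SAMPLE_CSV = """textID,text,selected_text,sentiment
a1,"I love this sunny morning","love",positive
b2,"Heading to the office now","Heading to the office now",neutral
"""


@dataclass
class TweetRow:
    text_id: str
    text: str
    selected_text: str
    sentiment: str


def read_rows(handle):
    """Parse CSV rows into TweetRow objects using the four listed columns."""
    reader = csv.DictReader(handle)
    return [
        TweetRow(r["textID"], r["text"], r["selected_text"], r["sentiment"])
        for r in reader
    ]


def span_in_text(row):
    """Return True when selected_text appears verbatim inside text (an assumption)."""
    return row.selected_text.strip() in row.text


rows = read_rows(io.StringIO(SAMPLE_CSV))
checks = [span_in_text(r) for r in rows]
```

To run this against the real file, replace the io.StringIO object with an open file handle, for example open(path, newline="", encoding="utf-8"), where path points to the downloaded CSV.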

Distribution

The dataset contains approximately 27,500 tweets and is typically provided as a CSV file. The textID and text columns each contain 27,481 unique values, while the selected_text column has 22,464 unique values. The sentiment distribution is 40% neutral, 31% positive, and 28% other sentiment categories; these figures sum to 99%, presumably because of rounding. When processing the CSV, remove any leading or trailing quotation marks from the text fields.
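The quote-stripping step and the sentiment breakdown can be sketched as follows. This is one possible approach under stated assumptions: strip_outer_quotes and sentiment_shares are hypothetical helper names, and the label list passed in is expected to come from the sentiment column. Note that stripping also removes a quotation mark that legitimately ends a tweet, so it may be worth checking a sample of rows before applying it everywhere.

```python
from collections import Counter


def strip_outer_quotes(value):
    """Remove surrounding whitespace and any leading/trailing quote characters.

    Quotes inside the text are left untouched.
    """
    return value.strip().strip("\"'").strip()


def sentiment_shares(labels):
    """Return each label's share of the total as a whole-number percentage."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}
```

A short usage example with made-up values: strip_outer_quotes('"hello world"') gives 'hello world', and sentiment_shares(["neutral", "neutral", "positive", "other"]) gives {"neutral": 50, "positive": 25, "other": 25}.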

Usage

This dataset is suited to sentiment analysis and text classification tasks. It can be used to build and train classification models that predict the sentiment of tweets, and to compare classification algorithms by their performance metrics on that prediction task.
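Comparing classifiers by a performance metric can be illustrated with two deliberately simple baselines scored by accuracy. This is a sketch only: the texts, labels, and keyword set are invented, and majority_baseline and keyword_classifier are hypothetical stand-ins for whatever trained models would actually be compared.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(train_labels):
    """Build a classifier that always predicts the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda text: most_common


def keyword_classifier(positive_words):
    """Build a classifier that predicts 'positive' if any keyword occurs, else 'neutral'."""
    def predict(text):
        tokens = set(text.lower().split())
        return "positive" if tokens & positive_words else "neutral"
    return predict


# Invented examples for illustration (not taken from the dataset).
train_labels = ["neutral", "neutral", "positive"]
test_texts = ["I love it", "going home", "what a great day"]
test_labels = ["positive", "neutral", "positive"]

models = {
    "majority": majority_baseline(train_labels),
    "keyword": keyword_classifier({"love", "great"}),
}
scores = {
    name: accuracy(test_labels, [model(t) for t in test_texts])
    for name, model in models.items()
}
```

Any model that maps a tweet's text to a label can be added to the models dictionary and scored the same way, which keeps the comparison consistent across algorithms.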

Coverage

The data originates from Twitter, a global platform, and the tweets cover a wide range of topics and expressions. No specific time range or demographic scope is stated beyond this global applicability.

License

CC0

Who Can Use It

This dataset is suitable for a diverse range of users, including beginners in data science and machine learning. It is especially beneficial for those interested in social network analysis, text classification, and natural language processing. Intended users include data scientists, researchers, and developers looking to build and test models for predicting social media sentiments or for applications like content moderation.

Dataset Name Suggestions

  • Twitter Tweet Sentiment Dataset
  • Tweet Sentiment Analysis Dataset
  • Social Media Sentiment Prediction Data
  • Twitter Sentiment Classification Data

Attributes

Original Data Source: Twitter Tweets Sentiment Dataset

Listing Stats

VIEWS

4

DOWNLOADS

0

LISTED

05/06/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0
