Tweet Sentiment Extraction Pseudo Labels

Social Media and Networking

Tags and Keywords

News, Social, Networks, Email, Messaging, NLP

Free

About

This dataset contains pseudo-labelled tweets curated for the Tweet Sentiment Extraction competition. It provides both the pre-processed and the original tweet texts, together with their sentiment labels and the extracted pseudo-labels. It can be used to develop and evaluate sentiment analysis models for social media content.

Columns

textID: Unique identifiers for competition tweet texts.
sentiment: The assigned sentiment of the tweet (e.g., positive, negative, neutral).
author: The Twitter handle of the tweet's author.
text: The pre-processed version of the tweet.
old_text: The original tweet.
aux_id: Auxiliary identifiers for competition tweet texts.
new_sentiment: An additional sentiment label.
selected_text: The pseudo-label representing the portion of the text that conveys the sentiment.
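
As a minimal sketch of working with these columns, the snippet below reads a CSV with Python's standard `csv` module and checks that the eight listed columns are present. The `load_rows` helper and the sample row are invented for illustration and are not part of the dataset; the real file name and encoding are not stated in the listing.

```python
import csv
import io

# Column names as given in the Columns list above.
EXPECTED_COLUMNS = [
    "textID", "sentiment", "author", "text",
    "old_text", "aux_id", "new_sentiment", "selected_text",
]


def load_rows(handle):
    """Read rows from an open CSV handle and check the listed columns exist."""
    reader = csv.DictReader(handle)
    present = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in present]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Invented sample row for illustration only; not taken from the dataset.
SAMPLE_CSV = (
    "textID,sentiment,author,text,old_text,aux_id,new_sentiment,selected_text\n"
    "abc123,positive,@example,i love this song,I LOVE this song!!,1,,love this song\n"
)

rows = load_rows(io.StringIO(SAMPLE_CSV))
print(rows[0]["selected_text"])
```

For the downloaded file, the same helper could be called on a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` and the UTF-8 encoding are assumptions.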

Distribution

The dataset is provided as a CSV file with approximately 12,520 records. There are around 12,520 unique textID entries, so each record appears to have its own identifier, and 11,599 unique authors. The sentiment column is roughly 35% negative and 34% neutral, with the remaining 32% or so in other sentiment values (the rounded figures sum to slightly over 100%). In the new_sentiment column, 69% of values are null and 12% are neutral, with the rest spread across labels such as 'happy' and 'good'. The file size is not specified.
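
Percentages like these can be recomputed from the rows themselves. The sketch below is one way to do it with `collections.Counter`, treating empty cells as null; the `share_by_value` helper and the four sample rows are invented for illustration and do not reproduce the real distribution.

```python
from collections import Counter


def share_by_value(rows, column):
    """Return {value: percentage of rows}; empty or missing cells count as None."""
    counts = Counter((row.get(column) or None) for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: 100 * n / total for value, n in counts.items()}


# Invented rows for illustration only; not taken from the dataset.
sample = [
    {"sentiment": "negative", "new_sentiment": ""},
    {"sentiment": "neutral", "new_sentiment": "neutral"},
    {"sentiment": "positive", "new_sentiment": ""},
    {"sentiment": "negative", "new_sentiment": "happy"},
]

sentiment_share = share_by_value(sample, "sentiment")
new_sentiment_share = share_by_value(sample, "new_sentiment")
print(sentiment_share)
print(new_sentiment_share)
```

Counting empty cells as `None` keeps the null share visible, which matters for a column such as new_sentiment where most values are missing.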

Usage

This dataset is suited to natural language processing (NLP) tasks, particularly sentiment analysis and emotion detection in text. It can be used to train machine learning models that classify sentiment, to train text extraction models that locate the sentiment-bearing span of a tweet, and to research social media communication patterns. Potential applications include customer feedback analysis, brand monitoring, and public opinion tracking.
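
For text extraction tasks, a common preparation step is to turn each pseudo-label into character offsets within the tweet. The sketch below assumes selected_text is an exact substring of the text column, which the listing does not confirm; the `find_span` helper and the example strings are invented for illustration.

```python
def find_span(text, selected_text):
    """Return (start, end) character offsets of selected_text in text, or None."""
    if not selected_text:
        return None
    start = text.find(selected_text)
    if start == -1:
        # The pseudo-label is not an exact substring of this text.
        return None
    return start, start + len(selected_text)


# Invented example strings; not taken from the dataset.
text = "i love this song"
span = find_span(text, "love this song")
print(span, text[span[0]:span[1]])
```

Rows for which the helper returns `None` may need to be matched against old_text instead, or set aside for inspection.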

Coverage

The listing gives the dataset's region as global. It was listed on 24 June 2025, but the time range over which the tweets were collected is not specified. No demographic scope is stated.

License

CC0

Who Can Use It

This dataset is aimed at data scientists, machine learning engineers, and researchers working on NLP. It is particularly useful for those taking part in sentiment extraction competitions or training models to understand and extract sentiment from social media posts. Developers building applications that interpret user sentiment or perform content moderation may also find it useful.

Dataset Name Suggestions

Tweet Sentiment Extraction Pseudo Labels
Social Media Sentiment Dataset
Twitter Sentiment Analysis Data
Pseudo-Labelled Tweet Corpus
Tweet Emotion Extraction Dataset

Attributes

Original Data Source: tweet-sentiment-extraction-2020-complete-pseudo

Listing Stats

Listed: 24/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0
