FREE DATASET LIBRARY

Verified Data Provider

£0

NLP Preprocessed Sentiment Dataset

Data Science and Analytics

Tags and Keywords

Text

NLP

Deep

LSTM

NLTK

Sentiment

Comments

Analysis

NLP Preprocessed Sentiment Dataset on the Opendatabay data marketplace

About

This dataset is a substantial collection of over 241,000 English-language comments gathered from various online platforms. Each comment has been annotated with a sentiment label: 0 for negative, 1 for neutral, and 2 for positive. Its primary aim is to support the training and evaluation of multi-class sentiment analysis models that work on real-world text. The comments have already been preprocessed: they are lowercased and stripped of punctuation, URLs, numbers, and stopwords, so they can be fed directly into Natural Language Processing (NLP) pipelines.
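The listing does not publish its cleaning code, so any new text you classify with a model trained on this data should be cleaned the same way. The sketch below approximates the steps described above and is an assumption throughout: the stopword set is a small hypothetical list (the tags mention NLTK, whose English stopword list is a plausible but unconfirmed candidate), and URLs are removed before punctuation so that patterns such as `https://` are still recognisable.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list; the listing does not say which
# list was actually used (NLTK's English list is only a guess).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "this", "that", "it",
             "and", "or", "to", "of", "for", "i"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NUMBER_RE = re.compile(r"\d+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(comment: str) -> str:
    """Approximate the described cleaning: lowercase, then drop URLs,
    numbers, punctuation and stopwords."""
    text = comment.lower()
    text = URL_RE.sub(" ", text)        # strip URLs first, while "://" is intact
    text = NUMBER_RE.sub(" ", text)     # strip runs of digits
    text = text.translate(PUNCT_TABLE)  # strip ASCII punctuation
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    print(preprocess("This is GREAT!!! See https://example.com for 10 more reasons."))
    # great see more reasons
```

If the real pipeline used a different stopword list, swap it in so that training and inference text stay consistent.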

Columns

Comment: This column contains the user-generated text content.
Sentiment: This column provides the corresponding sentiment label for each comment, where 0 denotes Negative, 1 denotes Neutral, and 2 denotes Positive.
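Because the labels are stored as integers, a small mapping is handy for reporting results and checking class balance. This sketch uses only the encoding documented above; the function names are hypothetical.

```python
from collections import Counter

# Label encoding as documented in the Columns section.
LABEL_NAMES = {0: "Negative", 1: "Neutral", 2: "Positive"}


def decode_label(raw) -> str:
    """Map a Sentiment value (int or numeric string such as "2") to its name."""
    label = int(raw)
    if label not in LABEL_NAMES:
        raise ValueError(f"unexpected sentiment label: {raw!r}")
    return LABEL_NAMES[label]


def label_distribution(labels) -> dict:
    """Count how many rows fall into each named class (zero if absent)."""
    counts = Counter(decode_label(x) for x in labels)
    return {name: counts.get(name, 0) for name in LABEL_NAMES.values()}


print(label_distribution(["0", "2", "2", "1", 2]))
# {'Negative': 1, 'Neutral': 1, 'Positive': 3}
```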

Distribution

The dataset comprises over 241,000 records in the two columns described above, making it suitable for direct integration into machine learning workflows. The description does not state a file format, but the listing offers the download in CSV format.
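A minimal loader using Python's standard `csv` module, assuming the file has a header row with columns named exactly `Comment` and `Sentiment` (check this against the downloaded file). Rows with missing text or labels are skipped, since a comment made up only of stopwords would be empty after cleaning. The function name is hypothetical.

```python
import csv
import io
from typing import Iterable, List, Tuple


def load_rows(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Read (comment, label) pairs from CSV text, skipping incomplete rows."""
    rows = []
    for record in csv.DictReader(lines):
        # Assumed column names; .get() returns None if a row is short.
        text = (record.get("Comment") or "").strip()
        raw_label = (record.get("Sentiment") or "").strip()
        if not text or not raw_label:
            continue
        rows.append((text, int(raw_label)))
    return rows


sample = io.StringIO("Comment,Sentiment\ngreat product love,2\n,1\nworst service ever,0\n")
print(load_rows(sample))
# [('great product love', 2), ('worst service ever', 0)]
```

To read the real file, pass an open handle, e.g. `with open("sentiment.csv", newline="", encoding="utf-8") as f: rows = load_rows(f)`, where `sentiment.csv` is a placeholder name.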

Usage

This dataset is ideally suited for a variety of applications and use cases, including:

Training sentiment classifiers utilising advanced models such as LSTM, BiLSTM, CNN, BERT, or RoBERTa.
Evaluating the efficacy of different preprocessing and tokenisation strategies for text data.
Benchmarking NLP models on multi-class classification tasks to assess their performance.
Supporting educational projects and research initiatives in the fields of opinion mining or text classification.
Fine-tuning transformer models on a large and diverse collection of sentiment-annotated text.
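Before training an LSTM or fine-tuning BERT, a cheap reference score makes benchmarking more meaningful. The sketch below is a swapped-in baseline that the listing does not mention: a multinomial Naive Bayes classifier with add-one smoothing over whitespace tokens (reasonable here because the text is already cleaned), scored with macro-averaged F1 so each of the three classes counts equally regardless of class balance, which the listing does not report. Class and function names are hypothetical, and the toy data is illustrative only.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Sequence


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts: Sequence[str], labels: Sequence[int]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.word_counts: Dict[int, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        class_counts = Counter(labels)
        n = len(labels)
        self.log_prior = {c: math.log(class_counts[c] / n) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, text: str) -> int:
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for tok in text.split():
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], classes=(0, 1, 2)) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


toy_texts = ["love great product", "great service love", "bad awful service",
             "awful worst product", "order arrived today", "delivery arrived monday"]
toy_labels = [2, 2, 0, 0, 1, 1]
model = NaiveBayes().fit(toy_texts, toy_labels)
preds = [model.predict_one(t) for t in ["love great", "awful bad", "arrived today"]]
print(preds)                          # [2, 0, 1]
print(macro_f1([2, 0, 1], preds))     # 1.0
```

On the real data, hold out a stratified test split and report macro-F1 for this baseline alongside the neural models.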

Coverage

Coverage is global, limited to English-language comments. The content is general user-generated text, with no demographic information provided. The dataset is listed as version 1.0.

License

CC0

Who Can Use It

This dataset is suitable for individuals and organisations involved in data science and analytics. Intended users include:

Data Scientists and Machine Learning Engineers for developing and deploying sentiment analysis models.
Researchers and Academics for studies in NLP, text classification, and opinion mining.
Students undertaking educational projects in artificial intelligence and machine learning.

Dataset Name Suggestions

Multi-class Comment Sentiment Data
User Text Sentiment Collection
Online Comment Sentiment Analysis Dataset
English Sentiment Labelled Comments
Preprocessed Sentiment Dataset

Attributes

Original Data Source: Sentiment Analysis Dataset

Listing Stats

Listed: 05/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0
