Opendatabay APP

NLP Preprocessed Sentiment Dataset

Data Science and Analytics

Tags and Keywords

Text, NLP, Deep, LSTM, NLTK, Sentiment, Comments, Analysis

Free

About

This dataset contains over 241,000 English-language comments gathered from various online platforms. Each comment is annotated with a sentiment label: 0 for negative, 1 for neutral, and 2 for positive. It is intended for training and evaluating multi-class sentiment analysis models on real-world text. The comments have already been preprocessed: they are lowercased and stripped of punctuation, URLs, numbers, and stopwords, so they can be fed directly into Natural Language Processing (NLP) pipelines.
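The listing does not document the exact preprocessing code, so the following is only a minimal sketch of the steps named above (lowercasing, then removal of URLs, numbers, punctuation, and stopwords). The function name `preprocess`, the regular expressions, and the small `STOPWORDS` set are assumptions made for illustration; the stopword list actually used for the dataset is not stated.

```python
import re
import string

# Illustrative stopword subset (assumption): the list actually used
# to build the dataset is not documented in the listing.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "this", "that",
             "it", "to", "of", "and", "in", "at"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
DIGIT_RE = re.compile(r"\d+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(comment: str) -> str:
    """Lowercase a comment and strip URLs, numbers, punctuation, and stopwords."""
    text = comment.lower()
    # Remove URLs first, while their punctuation still marks them as URLs.
    text = URL_RE.sub(" ", text)
    text = DIGIT_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


print(preprocess("This movie is GREAT!!! Watch it at https://example.com 10/10"))
# prints: movie great watch
```

Running new, raw comments through a function like this before inference keeps them in roughly the same form as the training text, although an exact match would require the original stopword list.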

Columns

  • Comment: This column contains the user-generated text content.
  • Sentiment: This column provides the corresponding sentiment label for each comment, where 0 denotes Negative, 1 denotes Neutral, and 2 denotes Positive.

Distribution

The dataset comprises over 241,000 records in the two columns described above. The listing does not specify the file format; datasets of this kind are typically distributed in tabular form, often as a CSV file, which suits direct integration into machine learning workflows.
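Assuming the data does arrive as a CSV file with a header row naming the `Comment` and `Sentiment` columns (the listing does not confirm this), a loader might look like the sketch below. The helper `load_comments` and the in-memory `SAMPLE_CSV` rows are hypothetical stand-ins, not part of the dataset.

```python
import csv
import io
from collections import Counter

# Label mapping as described in the Columns section.
LABEL_NAMES = {0: "negative", 1: "neutral", 2: "positive"}


def load_comments(handle):
    """Read (comment, label) pairs from a CSV with Comment and Sentiment columns."""
    rows = []
    for record in csv.DictReader(handle):
        label = int(record["Sentiment"])
        if label not in LABEL_NAMES:
            raise ValueError(f"unexpected sentiment label: {label}")
        rows.append((record["Comment"], label))
    return rows


# Tiny in-memory stand-in for the real file (hypothetical rows).
SAMPLE_CSV = (
    "Comment,Sentiment\n"
    "love phone,2\n"
    "battery died fast,0\n"
    "arrived tuesday,1\n"
    "great value,2\n"
)

rows = load_comments(io.StringIO(SAMPLE_CSV))
counts = Counter(LABEL_NAMES[label] for _, label in rows)
print(counts)
```

For a file on disk, the same function can be given a handle opened with `open(path, newline="", encoding="utf-8")`. Checking the class counts first is worthwhile because the listing does not say whether the three labels are balanced.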

Usage

This dataset is suited to a range of use cases, including:
  • Training sentiment classifiers utilising advanced models such as LSTM, BiLSTM, CNN, BERT, or RoBERTa.
  • Evaluating the efficacy of different preprocessing and tokenisation strategies for text data.
  • Benchmarking NLP models on multi-class classification tasks to assess their performance.
  • Supporting educational projects and research initiatives in the fields of opinion mining or text classification.
  • Fine-tuning transformer models on a large and diverse collection of sentiment-annotated text.
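For the benchmarking use case, one common metric for three-class problems is macro-averaged F1, which weights the negative, neutral, and positive classes equally regardless of their frequency. The listing does not prescribe a metric, so this choice, the function name `macro_f1`, and the example predictions are assumptions for illustration only.

```python
def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-class F1 scores over the given labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted examples scores 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold labels and model predictions.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(round(macro_f1(y_true, y_pred), 3))
# prints: 0.656
```

Reporting macro F1 alongside accuracy makes it easier to see when a model does well on one class at the expense of another.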

Coverage

The dataset is global in scope and consists of English-language comments. It covers general user-generated text, with no demographic information recorded. The listed version is 1.0.

License

CC0

Who Can Use It

This dataset is suitable for individuals and organisations involved in data science and analytics. Intended users include:
  • Data Scientists and Machine Learning Engineers for developing and deploying sentiment analysis models.
  • Researchers and Academics for studies in NLP, text classification, and opinion mining.
  • Students undertaking educational projects in artificial intelligence and machine learning.

Dataset Name Suggestions

  • Multi-class Comment Sentiment Data
  • User Text Sentiment Collection
  • Online Comment Sentiment Analysis Dataset
  • English Sentiment Labelled Comments
  • Preprocessed Sentiment Dataset

Attributes

Original Data Source: Sentiment Analysis Dataset

Listing Stats

Views: 5
Downloads: 1
Listed: 05/06/2025
Region: Global
UDQS (Universal Data Quality Score): 5 / 5
Version: 1.0
