Opendatabay APP

Portuguese Word Sentiment Dataset

Social Media and Networking

Tags and Keywords

Social

NLP

Languages

Sentiment

Portuguese

Lexicon

Twitter

Reviews

Free

About

This dataset provides a curated list of Portuguese words with human-annotated sentiment labels. Each word carries a numeric score of -1 (negative), 0 (neutral), or +1 (positive). The lexicon draws on both Twitter and Buscapé reviews, so it supports comparing sentiment across the two sources. It is intended for tasks such as mining social media conversations and analysing customer feedback.

Columns

  • Word: The Portuguese word from the lexicon.
  • Sentiment_Score: The numerical sentiment label assigned to the word: -1 for negative, 0 for neutral, and +1 for positive.
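
The two-column layout can be read and checked with a short script. The following is a minimal sketch, assuming the CSV has a header row with exactly the column names listed above and that scores are stored as integers (optionally with a leading "+"). The function name parse_lexicon and the sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io

VALID_SCORES = {-1, 0, 1}  # negative, neutral, positive


def parse_lexicon(stream):
    """Read Word/Sentiment_Score rows into a dict, rejecting unknown labels."""
    lexicon = {}
    for row in csv.DictReader(stream):
        word = row["Word"].strip().lower()
        score = int(row["Sentiment_Score"])  # int() accepts "+1" and "-1"
        if score not in VALID_SCORES:
            raise ValueError(f"unexpected score {score!r} for {word!r}")
        lexicon[word] = score
    return lexicon


# Hypothetical sample rows, for illustration only.
SAMPLE = "Word,Sentiment_Score\nbom,1\nruim,-1\nproduto,0\n"
sample_lexicon = parse_lexicon(io.StringIO(SAMPLE))
print(sample_lexicon)
```

Lower-casing the words is a design choice made here so that later lookups are case-insensitive; drop it if the lexicon distinguishes case.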

Distribution

The dataset is provided as a single CSV file named portuguese_lexicon.csv. Its lexicon comprises 114 unique words, each with an associated sentiment score, derived from 3,457 tweets and 476 Buscapé reviews. Any environment that can read CSV files containing both text and numeric columns is sufficient to use it.
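
As a quick sanity check after downloading, the word count and label distribution can be compared against the figures in the listing. This is a sketch only: it assumes the file is UTF-8 encoded with a header row, and summarise_lexicon is a hypothetical helper name.

```python
import csv
from collections import Counter
from pathlib import Path


def summarise_lexicon(path):
    """Return (number of unique words, Counter of sentiment labels)."""
    words = set()
    labels = Counter()
    # Assumption: UTF-8 text; utf-8-sig also tolerates a byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        for row in csv.DictReader(handle):
            word = row["Word"].strip().lower()
            if word not in words:  # count each word once
                words.add(word)
                labels[int(row["Sentiment_Score"])] += 1
    return len(words), labels


csv_path = Path("portuguese_lexicon.csv")  # file name given in the listing
if csv_path.exists():
    unique_words, label_counts = summarise_lexicon(csv_path)
    # The listing reports 114 unique words; a different number may point to
    # an encoding or parsing problem.
    print(unique_words, dict(label_counts))
```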

Usage

This dataset is ideal for various applications in natural language processing (NLP) and sentiment analysis, including:
  • Supplying word-level sentiment features to machine learning models for sentiment analysis, text classification, and automated opinion summarisation.
  • Comparing words or phrases within texts or across different datasets to understand expressed opinions.
  • Identifying trends in customer opinions over time by comparing sentiment from Twitter and Buscapé reviews.
  • Understanding how customer review sentiment compares across different varieties and dialects of Portuguese.
  • Utilising customer feedback for analytics purposes and gaining insights into public opinion on products based on textual expressions.
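
One simple way to apply a word-level lexicon to the uses above is to sum the scores of the words found in a text. The sketch below shows that baseline; it is not described in the listing as the dataset author's method. The LEXICON entries, function names, and example sentences are hypothetical, and the approach ignores negation and word order.

```python
import re
import unicodedata

# Hypothetical entries for illustration; in practice load them from the CSV.
LEXICON = {"bom": 1, "ótimo": 1, "ruim": -1, "péssimo": -1, "produto": 0}


def score_text(text, lexicon):
    """Sum the scores of known words; unknown words contribute nothing."""
    # Normalise so accented letters compare equal however they were encoded.
    normalised = unicodedata.normalize("NFC", text).lower()
    tokens = re.findall(r"\w+", normalised)
    return sum(lexicon.get(token, 0) for token in tokens)


def label_text(text, lexicon):
    """Map the summed score to 'positive', 'negative', or 'neutral'."""
    total = score_text(text, lexicon)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


print(label_text("Produto ótimo, muito bom!", LEXICON))
print(label_text("Atendimento péssimo e produto ruim.", LEXICON))
```

With only 114 words in the lexicon, many texts will contain no scored words and fall back to "neutral", so coverage is worth measuring before relying on the labels.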

Coverage

The dataset covers Portuguese-language text from Twitter and Buscapé reviews, originating from Portuguese-speaking areas. The listing marks its region as global. No time range or demographic scope beyond "Portuguese-speaking areas" is specified.

License

CC0

Who Can Use It

This dataset is suitable for:
  • Data scientists and machine learning engineers working on NLP tasks.
  • Researchers interested in social media analysis and cross-platform sentiment comparison.
  • Businesses and analysts aiming to mine social media conversations and analyse customer feedback for decision-making.
  • Anyone requiring a sentiment-labelled word list for Portuguese text analysis.

Dataset Name Suggestions

  • Portuguese Sentiment Lexicon
  • Portuguese Social Media Sentiment Corpus
  • Portuguese Word Sentiment Dataset
  • Twitter Buscapé Portuguese Sentiment Data

Attributes

Listing Stats

VIEWS

1

DOWNLOADS

0

LISTED

11/06/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0
