Portuguese Word Sentiment Dataset
Social Media and Networking
Free
About
This dataset provides a curated list of Portuguese words with their corresponding sentiment labels. It supports comparative sentiment analysis of content sourced from Twitter and from Buscapé reviews. Each word carries a human-annotated numeric sentiment score (-1 for negative, 0 for neutral, +1 for positive), so words can be categorised and compared directly. It is a useful resource for tasks such as mining social media conversations and analysing customer feedback.
Columns
- Word: The Portuguese word from the lexicon.
- Sentiment_Score: The numerical sentiment label assigned to the word. Labels include -1 for negative, 0 for neutral, and +1 for positive sentiments.
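As a minimal sketch of how the two columns might be read, the snippet below parses rows into a word-to-score mapping and rejects any score outside -1, 0, +1. It assumes the CSV header uses exactly the column names listed above (Word, Sentiment_Score); the function name load_lexicon and the sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io

# Scores the listing describes: -1 negative, 0 neutral, +1 positive.
VALID_SCORES = {-1, 0, 1}


def load_lexicon(stream):
    """Read Word/Sentiment_Score rows into a dict, rejecting unexpected scores.

    Assumes the header row uses the column names from the listing.
    """
    lexicon = {}
    for row in csv.DictReader(stream):
        score = int(row["Sentiment_Score"])  # int() also accepts "+1"
        if score not in VALID_SCORES:
            raise ValueError(f"unexpected score {score!r} for {row['Word']!r}")
        lexicon[row["Word"].strip().lower()] = score
    return lexicon


# Hypothetical sample rows for illustration only; not copied from the dataset.
sample = io.StringIO("Word,Sentiment_Score\nbom,1\nruim,-1\nmesa,0\n")
lexicon = load_lexicon(sample)
print(lexicon)
```

With the real file, the same function could be called on open("portuguese_lexicon.csv", encoding="utf-8", newline=""), assuming the file is UTF-8 encoded, which the listing does not state.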
Distribution
The dataset is provided as a CSV file named portuguese_lexicon.csv. It comprises a total of 114 unique words in its lexicon, each with an associated sentiment score. The dataset is derived from 3,457 tweets and 476 Buscapé reviews. Users will need an environment capable of reading CSV files that contain both text and numerical data to utilise this resource effectively.
Usage
This dataset is ideal for various applications in natural language processing (NLP) and sentiment analysis, including:
- Training or augmenting machine learning models for sentiment analysis, text classification, and automated opinion summarisation.
- Comparing words or phrases within texts or across different datasets to understand expressed opinions.
- Identifying trends in customer opinions over time by comparing sentiment from Twitter and Buscapé reviews.
- Understanding how customer review sentiment compares across different varieties and dialects of Portuguese.
- Utilising customer feedback for analytics purposes and gaining insights into public opinion on products based on textual expressions.
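One simple way to apply a word-level lexicon like this to the uses above is to sum the scores of the lexicon words that appear in a text. The sketch below illustrates that idea only; it is not a method described by the dataset's authors. The mini-lexicon, the example sentence, and the names score_text and label are hypothetical.

```python
import re

# Hypothetical mini-lexicon for illustration; the real entries come from the CSV.
LEXICON = {"bom": 1, "excelente": 1, "ruim": -1, "lento": -1, "produto": 0}


def score_text(text, lexicon):
    """Return (sum of scores, number of lexicon hits) for the words in text."""
    tokens = re.findall(r"\w+", text.lower())
    hits = [lexicon[token] for token in tokens if token in lexicon]
    return sum(hits), len(hits)


def label(total):
    """Map a summed score onto a coarse sentiment label."""
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


total, matched = score_text("Produto excelente, mas o suporte foi ruim e lento!", LEXICON)
print(total, matched, label(total))
```

Summing scores ignores negation and word order, and with only 114 words in the lexicon many texts will have few or no matches, so the hit count is returned alongside the total to make low coverage visible.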
Coverage
The dataset covers Portuguese-language text from Twitter and from Buscapé reviews, originating from Portuguese-speaking areas. Its regional relevance is listed as global. No specific time range or demographic scope beyond "Portuguese-speaking areas" is detailed in the sources.
License
CC0
Who Can Use It
This dataset is suitable for:
- Data scientists and machine learning engineers working on NLP tasks.
- Researchers interested in social media analysis and cross-platform sentiment comparison.
- Businesses and analysts aiming to mine social media conversations and analyse customer feedback for decision-making.
- Anyone requiring a sentiment-labelled word list for Portuguese text analysis.
Dataset Name Suggestions
- Portuguese Sentiment Lexicon
- Portuguese Social Media Sentiment Corpus
- Portuguese Word Sentiment Dataset
- Twitter Buscapé Portuguese Sentiment Data
Attributes
Original Data Source: Portuguese Sentiment Corpus for Twitter and