Lusophone Market News Sentiment Analysis Dataset
Stock & Market Data
Free
About
This dataset provides a benchmark for adapting language models to financial sentiment in Brazilian Portuguese and to the nuances of the Lusophone economic market. It builds upon the established Financial Phrase Bank, offering a set of economic texts manually annotated for semantic orientation. By bridging the gap between English-centric datasets and the linguistic requirements of the Portuguese-speaking financial world, it supports the development of automated tools for market monitoring and investment sentiment tracking.
Columns
- y: The sentiment label assigned by a consensus of human annotators, categorised as neutral (accounting for 59% of records), positive (28%), or negative (the remainder, roughly 13%).
- text: The raw financial news snippet in its original English language as found in the source dataset.
- text_pt: The manually validated translation of the news snippet into Brazilian Portuguese, designed for local sentiment analysis tasks.
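The column layout above can be checked with a short script. The following is a minimal sketch using only the Python standard library; it assumes the labels are stored as the literal strings neutral, positive and negative, and it runs on a tiny made-up sample rather than the real file. The helper names read_records and label_shares are hypothetical.

```python
import csv
import io
from collections import Counter

EXPECTED_COLUMNS = ("y", "text", "text_pt")  # column names as listed above
EXPECTED_LABELS = {"neutral", "positive", "negative"}  # assumed literal label strings


def read_records(handle):
    """Read rows from an open CSV handle and check the expected columns exist."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def label_shares(records):
    """Return each label's share of the records as a fraction between 0 and 1."""
    counts = Counter(r["y"].strip().lower() for r in records)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Tiny made-up sample so the sketch runs without the real file.
SAMPLE = io.StringIO(
    "y,text,text_pt\n"
    "neutral,Sample sentence one.,Frase de exemplo um.\n"
    "positive,Sample sentence two.,Frase de exemplo dois.\n"
    "neutral,Sample sentence three.,Frase de exemplo três.\n"
    "negative,Sample sentence four.,Frase de exemplo quatro.\n"
)
records = read_records(SAMPLE)
shares = label_shares(records)
unexpected = set(shares) - EXPECTED_LABELS  # labels outside the assumed set
print(shares, unexpected)
```

To run it against the real file, replace SAMPLE with a handle opened via open("financial_phrase_bank_pt_br.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption and may need adjusting.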
Distribution
The data is provided in a CSV file titled financial_phrase_bank_pt_br.csv with a size of approximately 1.37 MB. It contains 4,845 valid records across three columns. The dataset demonstrates high integrity with 100% validity for the sentiment labels and text fields, with no missing or mismatched entries. This is a static archive, and no future updates are expected.
Usage
This collection is specifically designed for fine-tuning natural language processing (NLP) models on sentiment analysis tasks within the finance domain. It serves as a valuable resource for training multi-class classification algorithms, evaluating machine translation quality for technical economic texts, and developing text-mining pipelines for Portuguese-speaking stock markets.
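Because the three classes are imbalanced, a multi-class experiment usually benefits from a split that preserves label proportions and from a trivial baseline to compare against. The sketch below is one possible way to do this with the standard library only; stratified_split and majority_baseline_accuracy are hypothetical helper names, and the records are made-up placeholders that merely mimic the rough class balance.

```python
import random
from collections import Counter, defaultdict


def stratified_split(records, label_key="y", test_fraction=0.2, seed=0):
    """Split records into train/test lists, preserving each label's share."""
    by_label = defaultdict(list)
    for record in records:
        by_label[record[label_key]].append(record)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def majority_baseline_accuracy(train, test, label_key="y"):
    """Accuracy of always predicting the most frequent training label."""
    majority_label = Counter(r[label_key] for r in train).most_common(1)[0][0]
    hits = sum(1 for r in test if r[label_key] == majority_label)
    return hits / len(test)


# Made-up records: 60 neutral, 30 positive, 10 negative.
demo = (
    [{"y": "neutral", "text_pt": f"frase neutra {i}"} for i in range(60)]
    + [{"y": "positive", "text_pt": f"frase positiva {i}"} for i in range(30)]
    + [{"y": "negative", "text_pt": f"frase negativa {i}"} for i in range(10)]
)
train, test = stratified_split(demo)
baseline = majority_baseline_accuracy(train, test)
print(len(train), len(test), baseline)
```

Any fine-tuned classifier trained on text_pt should beat this majority-class baseline by a clear margin before its accuracy is considered meaningful.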
Coverage
The dataset covers 4,845 manually annotated financial news records. While the original English texts often reflect international or Finnish economic contexts, the translated content makes them directly usable for sentiment detection in Brazilian Portuguese. Labels are based on a consensus of annotators, giving a reliable standard for training machine learning models with frameworks such as PyTorch.
License
CC BY-SA 3.0
Who Can Use It
Data scientists can use these records to benchmark Portuguese sentiment analysis models or practice text classification techniques. Financial analysts can employ the dataset to build tools that monitor news trends for investment signals in the Brazilian market. Additionally, academic researchers in computational linguistics may utilise the parallel English-Portuguese texts to study the semantic nuances of economic discourse.
Dataset Name Suggestions
- Brazilian Portuguese Financial Sentiment Benchmark
- Portuguese Financial Phrase Bank Translation
- Manually Validated Brazilian Economic Sentiment Records
- Lusophone Market News Sentiment Analysis Dataset
- Portuguese-English Parallel Financial Sentiment Corpus
Attributes
Original Data Source: Lusophone Market News Sentiment Analysis Dataset
