Opendatabay APP

Vigo University Spanish Fake News Detection Data

News & Media Articles

Tags and Keywords

  • Spanish
  • News
  • Politics
  • Fake
  • Classification

Free

About

This collection of news articles targets the detection of misinformation in Spanish political news. Developed as part of a thesis at the University of Vigo, it supports training Transformer models to distinguish authentic reporting from fabricated content. The genuine articles were obtained by web scraping reputable Spanish publications such as 'Público', 'La Marea', and 'El Común'; the deceptive entries were created by manually altering data or by generating text with artificial intelligence.

Columns

The data is structured using a semicolon (;) delimiter and contains the following fields:
  • ID: A unique numerical identifier assigned to each individual news item within the collection.
  • Label: A binary class label, where '1' denotes verified news and '0' denotes fake news; it serves as the target for classification models.
  • Titulo: The headline of the news item, giving a short summary of the report.
  • Descripcion: A longer text giving context and further detail on the political events or topics mentioned.
  • Fecha: The publication date of the article, recorded in a day/month/year format (e.g. 19/04/2023).
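
As an illustration of the layout described above, the following minimal sketch reads semicolon-delimited rows with Python's standard csv module and parses the Fecha field as day/month/year. The header spelling, the helper names (parse_date, read_records) and the sample rows are assumptions made for this example; the rows are invented and are not taken from the dataset.

```python
import csv
import io
from datetime import datetime


def parse_date(text):
    """Parse a day/month/year string; return None if it does not match."""
    try:
        return datetime.strptime(text.strip(), "%d/%m/%Y").date()
    except (ValueError, AttributeError):
        # ValueError: wrong format; AttributeError: the field is missing (None).
        return None


def read_records(handle):
    """Yield one dict per row from a semicolon-delimited file object."""
    # Assumes the header row uses the column names listed in this section.
    for row in csv.DictReader(handle, delimiter=";"):
        yield {
            "id": row.get("ID"),
            "label": row.get("Label"),
            "title": row.get("Titulo"),
            "description": row.get("Descripcion"),
            "date": parse_date(row.get("Fecha")),
        }


# Invented sample rows, for illustration only.
SAMPLE = (
    "ID;Label;Titulo;Descripcion;Fecha\n"
    "1;1;Titular de ejemplo;Texto de ejemplo;19/04/2023\n"
    "2;0;Otro titular;;fecha desconocida\n"
)

records = list(read_records(io.StringIO(SAMPLE)))
print(records[0]["date"])  # 2023-04-19
print(records[1]["date"])  # None
```

To read the real file, the same function could be given a handle from open("D57000_complete.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption and may need adjusting.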

Distribution

The material is delivered as a single CSV file, D57000_complete.csv, with a total size of 19.96 MB and approximately 57,200 records. The ID field is 100% valid, with unique identifiers, but several other fields show large proportions of missing or mismatched data in this version: the headline field is approximately 48% valid, the detailed description roughly 26% valid, and the publication date approximately 13% valid. No updates to this resource are expected.
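
Because so many values are missing or mismatched, it may help to measure field validity directly before using the data. The sketch below computes the share of rows that pass a simple check per column. The listing does not define "valid", so the rules used here (label is 0 or 1, text fields are non-empty, date parses as day/month/year) are assumptions, as are the function names and the invented sample rows.

```python
import csv
import io
from datetime import datetime


def is_valid_date(text):
    """True if the text parses as day/month/year."""
    try:
        datetime.strptime(text.strip(), "%d/%m/%Y")
        return True
    except ValueError:
        return False


# Assumed validity rules; the listing does not state how validity is measured.
CHECKS = {
    "Label": lambda v: v.strip() in {"0", "1"},
    "Titulo": lambda v: bool(v.strip()),
    "Descripcion": lambda v: bool(v.strip()),
    "Fecha": is_valid_date,
}


def validity_report(handle):
    """Return the fraction of rows passing each column check."""
    passed = {name: 0 for name in CHECKS}
    total = 0
    for row in csv.DictReader(handle, delimiter=";"):
        total += 1
        for name, check in CHECKS.items():
            value = row.get(name) or ""  # treat an absent field as empty
            if check(value):
                passed[name] += 1
    return {name: (n / total if total else 0.0) for name, n in passed.items()}


# Invented sample rows, for illustration only.
SAMPLE = (
    "ID;Label;Titulo;Descripcion;Fecha\n"
    "1;1;Titular A;Texto A;19/04/2023\n"
    "2;0;Titular B;;2023-04-19\n"
    "3;1;;;\n"
    "4;0;Titular D;Texto D;\n"
)

report = validity_report(io.StringIO(SAMPLE))
print(report)
```

On the four invented rows this reports 1.0 for Label, 0.75 for Titulo, 0.5 for Descripcion and 0.25 for Fecha. Results on the real file depend on the validity rules chosen and need not match the percentages quoted above.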

Usage

This resource is designed for the development and evaluation of fake news detection algorithms, specifically for natural language processing tasks in the Spanish language. It is suitable for training Transformer models, conducting linguistic analysis of political misinformation, and exploring the differences between human-generated reporting and AI-generated deceptive text. Researchers can also use it to study the patterns of data alteration used to create political disinformation.
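
Before training or evaluating a classifier on this data, the labelled examples are typically divided into training and test sets. The following is a minimal sketch of a stratified split that keeps the proportion of real and fake items similar in both sets. The function name, the 20% test fraction, the fixed seed and the example texts are all assumptions chosen for illustration; this is not a procedure described by the dataset authors.

```python
import random
from collections import defaultdict


def stratified_split(examples, test_fraction=0.2, seed=0):
    """Split (text, label) pairs into train and test, preserving label balance."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Invented examples: ten items labelled 0 and ten labelled 1.
examples = [(f"texto {i}", i % 2) for i in range(20)]
train, test = stratified_split(examples)
print(len(train), len(test))  # 16 4
```

Given how many headline and description values are missing, it is probably sensible to drop rows without usable text or a valid label before splitting.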

Coverage

The scope is focused on the political domain within Spain. The timeframe of the collected articles spans from April 2017 to June 2023. The material covers both authentic journalism from established digital newspapers and various forms of disinformation, providing a snapshot of the Spanish media landscape during this period.
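
The stated timeframe can serve as a sanity check on the Fecha field. The sketch below tests whether a day/month/year string falls between April 2017 and June 2023. The listing gives only months, so the exact bounds (the first day of April 2017 and the last day of June 2023) are assumptions, as is the function name.

```python
from datetime import date, datetime

# Bounds derived from the stated coverage; the exact first and last days are assumed.
START = date(2017, 4, 1)
END = date(2023, 6, 30)


def in_coverage(text):
    """True if text is a day/month/year date inside the stated timeframe."""
    try:
        parsed = datetime.strptime(text.strip(), "%d/%m/%Y").date()
    except ValueError:
        return False  # unparseable values are treated as out of range
    return START <= parsed <= END


print(in_coverage("19/04/2023"))  # True
print(in_coverage("01/07/2023"))  # False
```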

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

Academic researchers and students investigating media integrity or linguistic patterns in politics can utilise this for thesis projects and large-scale studies. Data scientists and AI engineers can use the corpus to refine and test text classification models. Political analysts may find the data useful for observing the evolution of disinformation tactics and the influence of AI on political narratives over time.

Dataset Name Suggestions

  • Spanish Political Misinformation Corpus
  • Vigo University Spanish Fake News Detection Data
  • Transformer Training Set: Spanish Politics
  • Spanish News Authenticity and AI-Generated Content

Attributes

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 19/12/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
