Opendatabay APP

Slovak News Article Classification Dataset

Fraud Detection & Risk Management

Tags and Keywords

News

Text

Classification

NLP

Slovak


Free

About

This dataset was developed as part of a bachelor's thesis to address the scarcity of publicly available Slovak-language data for text classification. It can be used to test how well natural language processing models hold up on a language other than English. Although far smaller than comparable English datasets, it was compiled and labelled by hand with the aim of keeping it objective and relevant, and it is suited to training and evaluating machine learning models, particularly for fake news detection.

Columns

  • id: A unique row number for each entry.
  • date: The publication date of the news article.
  • title: The title of the news article.
  • text: The full text content of the news article.
  • src: The source from which the article was obtained.
  • check: A placeholder for verification status, currently marked as 'to be determined'.
  • label: The classification label, where '0' indicates a fake article and '1' indicates a true article.
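As a minimal sketch of how the columns above might be read and checked, the following uses only Python's standard library. The function name `load_articles`, the column order, the date format, and the sample rows are assumptions made for illustration; the rows are placeholders, not content from the dataset.

```python
import csv
import io

# Column names as listed in the Columns section; the order is an assumption.
EXPECTED_COLUMNS = ["id", "date", "title", "text", "src", "check", "label"]


def load_articles(handle):
    """Read rows from an open CSV handle and validate the expected schema."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        # The listing documents only '0' (fake) and '1' (true) as labels.
        if row["label"] not in {"0", "1"}:
            raise ValueError(f"unexpected label {row['label']!r} in row {row['id']}")
        row["label"] = int(row["label"])
        rows.append(row)
    return rows


# Placeholder rows for illustration only; NOT real dataset content.
SAMPLE_CSV = (
    "id,date,title,text,src,check,label\n"
    "1,2023-01-15,Placeholder title A,Placeholder text A,example-source-a,tbd,0\n"
    "2,2023-02-03,Placeholder title B,Placeholder text B,example-source-b,tbd,1\n"
)

rows = load_articles(io.StringIO(SAMPLE_CSV))
print(len(rows), [r["label"] for r in rows])
```

To read the downloaded file instead, the same function could be passed a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.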

Distribution

The dataset is provided as a CSV file. It contains 100 individually labelled Slovak news articles, primarily from early 2023. The classes are balanced: 50 articles are labelled 'fake' (0) and 50 are labelled 'true' (1).
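The stated 50/50 split can be confirmed after loading with a short check such as the sketch below. The helper name `is_balanced` is hypothetical, and the `labels` list is a stand-in for the values of the label column rather than data read from the file.

```python
from collections import Counter


def is_balanced(labels, expected_per_class=50):
    """Return True if labels contain exactly classes 0 and 1, each with the expected count."""
    counts = Counter(labels)
    return set(counts) == {0, 1} and all(
        n == expected_per_class for n in counts.values()
    )


# Stand-in for the label column: 50 'fake' (0) and 50 'true' (1).
labels = [0] * 50 + [1] * 50
print(Counter(labels))
print(is_balanced(labels))
```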

Usage

This dataset is ideal for a range of applications, including:
  • Training and evaluating text classification models for identifying fake news in the Slovak language.
  • Research into natural language processing (NLP) in low-resource languages.
  • Demonstrating cross-lingual model robustness.
  • Developing solutions for fraud detection and risk management related to information authenticity.
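For the training and evaluation use case, a dataset of only 100 articles benefits from a split that preserves the class balance in both parts. The sketch below shows one way to do a stratified split with the standard library; the function name `stratified_split`, the 20% test fraction, and the synthetic rows are assumptions for illustration, not part of the dataset or its original thesis.

```python
import random


def stratified_split(rows, test_fraction=0.2, seed=0):
    """Split rows into train and test sets, keeping each label's share equal in both."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row["label"], []).append(row)

    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]  # copy so the caller's list is not reordered
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Synthetic stand-in rows: 50 with label 0 and 50 with label 1.
rows = [{"id": i, "label": i % 2} for i in range(100)]
train, test = stratified_split(rows)
print(len(train), len(test))
```

With so few examples, repeating the split with several seeds, or using cross-validation, would give a steadier performance estimate than a single held-out set.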

Coverage

The dataset covers 100 Slovak-language news articles from Slovakia or Slovak-speaking regions, primarily published in early 2023. No demographic information is provided beyond the focus on Slovak-language content.

License

CC0

Who Can Use It

This dataset is intended for a variety of users, including:
  • Students and Researchers: For academic projects and research focusing on NLP, text classification, or fake news detection.
  • Data Scientists and AI Developers: For building and training machine learning models for language-specific content analysis.
  • Organisations: For media analysis, content moderation, or risk assessment of online information.

Dataset Name Suggestions

  • Dezinfo SK - Fake News Dataset
  • Slovak Fake News Articles
  • Slovak News Article Classification Dataset
  • Slovak Text Classification Dataset

Attributes

Original Data Source: Dezinfo SK - Fake News Dataset

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 27/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
