Slovak News Article Classification Dataset
Fraud Detection & Risk Management
Free
About
This dataset was developed as part of a bachelor's thesis to address the scarcity of publicly available data for text classification in Slovak. It can be used to test how well natural language processing models hold up in a language other than English. Although much smaller than comparable English datasets, it was compiled and labelled manually to keep it objective and relevant, and it is suitable for training machine learning models, particularly for fake news detection.
Columns
- id: A unique row number for each entry.
- date: The publication date of the news article.
- title: The title of the news article.
- text: The full text content of the news article.
- src: The source from which the article was obtained.
- check: A placeholder for verification status, currently marked as 'to be determined'.
- label: The classification label, where '0' indicates a fake article and '1' indicates a true article.
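The column layout above can be checked when the file is loaded. The following is a minimal sketch using only the Python standard library. The function name `load_articles`, the sample rows, and the date format in the sample are illustrative assumptions, not taken from the dataset itself.

```python
import csv
import io

# Column names as listed in the Columns section.
EXPECTED_COLUMNS = ["id", "date", "title", "text", "src", "check", "label"]


def load_articles(handle):
    """Read rows from an open CSV handle and validate the documented schema."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        if row["label"] not in {"0", "1"}:
            raise ValueError(f"unexpected label {row['label']!r} in row {row['id']}")
        row["label"] = int(row["label"])  # 0 = fake, 1 = true
        rows.append(row)
    return rows


# Made-up placeholder rows standing in for the real file (assumption:
# the actual date format and source names may differ).
SAMPLE_CSV = (
    "id,date,title,text,src,check,label\n"
    "1,2023-01-15,Placeholder title A,Placeholder text A,example-source-a,to be determined,0\n"
    "2,2023-02-03,Placeholder title B,Placeholder text B,example-source-b,to be determined,1\n"
)

sample_rows = load_articles(io.StringIO(SAMPLE_CSV))
print(len(sample_rows), [r["label"] for r in sample_rows])
```

For the real file, the same function could be called on a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.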
Distribution
The dataset is provided as a CSV file. It contains 100 individually labelled Slovak news articles, primarily from early 2023. The classes are balanced: 50 articles are labelled 'fake' (0) and 50 are labelled 'true' (1). No further breakdown of the records is available.
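The stated 50/50 split is easy to confirm once the labels are loaded. The sketch below assumes the labels have already been parsed into a list of integers. The helper name `is_balanced` is hypothetical, and the list here is synthetic rather than read from the file.

```python
from collections import Counter


def is_balanced(labels, expected_per_class=50):
    """Return True if both classes (0 and 1) appear exactly expected_per_class times."""
    counts = Counter(labels)
    return set(counts) == {0, 1} and all(
        n == expected_per_class for n in counts.values()
    )


# Synthetic labels mirroring the documented distribution: 50 fake, 50 true.
labels = [0] * 50 + [1] * 50
print(Counter(labels))
print(is_balanced(labels))
```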
Usage
This dataset is ideal for a range of applications, including:
- Training and evaluating text classification models for identifying fake news in the Slovak language.
- Research into natural language processing (NLP) in low-resource languages.
- Demonstrating cross-lingual model robustness.
- Developing solutions for fraud detection and risk management related to information authenticity.
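With only 100 articles, a single train/test split gives a noisy estimate of model quality, so cross-validation with balanced folds is one reasonable way to evaluate a classifier on this data. The following is a minimal sketch of stratified k-fold index generation in plain Python. The function name `stratified_folds`, the choice of five folds, and the synthetic labels are assumptions for illustration, not part of the dataset or the original thesis.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split row indices into k folds, keeping the class ratio equal in each fold."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Synthetic labels mirroring the documented distribution: 50 fake, 50 true.
labels = [0] * 50 + [1] * 50
folds = stratified_folds(labels, k=5)

for held_out in folds:
    held = set(held_out)
    train = [i for i in range(len(labels)) if i not in held]
    # A model would be fitted on `train` and scored on `held_out` here.
    print(len(train), len(held_out))
```

Each fold then holds 20 articles, 10 per class, and every article is used for evaluation exactly once.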
Coverage
The dataset covers Slovak-language news articles, representing content from Slovakia or Slovak-speaking regions. The articles date primarily from early 2023. No demographic information is recorded beyond the focus on Slovak-language content. The dataset includes 100 articles.
License
CC0
Who Can Use It
This dataset is intended for a variety of users, including:
- Students and Researchers: For academic projects and research focusing on NLP, text classification, or fake news detection.
- Data Scientists and AI Developers: For building and training machine learning models for language-specific content analysis.
- Organisations: For media analysis, content moderation, or risk assessment of online information.
Dataset Name Suggestions
- Dezinfo SK - Fake News Dataset
- Slovak Fake News Articles
- Slovak News Article Classification Dataset
- Slovak Text Classification Dataset
Attributes
Original Data Source: Dezinfo SK - Fake News Dataset