Daryo Uz News Classification Data
News & Media Articles
About
This tabular dataset compiles 175,217 news stories scraped from the Daryo.uz news portal. Each record captures the headline, the full article text, and the article's category. It is intended as training data for Natural Language Processing (NLP) models and offers insight into regional news coverage and language patterns. The dataset is expected to receive regular updates.
Columns
The structure consists of three primary fields:
- title: The headline of the news story (Uzbek label: Sarlavhasi, "headline").
- content: The full body text of the news article (Uzbek label: Yangilik matni, "news text"). Note that 596 records currently have missing values in this field.
- target: The classification category of the news item (Uzbek label: Yangilik toifasi, "news category").
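Because some records lack a content value, it can help to separate complete rows from incomplete ones before training. The following is a minimal sketch using only the Python standard library. It assumes the CSV has a header row with exactly the column names listed above and is UTF-8 encoded; neither is confirmed by this listing. The helper names read_rows and split_by_content are hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io

EXPECTED_COLUMNS = ("title", "content", "target")


def read_rows(handle):
    """Yield one dict per CSV record, checking the expected columns first."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for row in reader:
        yield row


def split_by_content(rows):
    """Separate rows with article text from rows whose content is empty."""
    complete, incomplete = [], []
    for row in rows:
        text = (row["content"] or "").strip()
        (complete if text else incomplete).append(row)
    return complete, incomplete


# Tiny in-memory stand-in for daryo_data.csv (made-up rows, illustration only).
# For the real file one would open it instead (encoding is an assumption):
#   with open("daryo_data.csv", newline="", encoding="utf-8") as f: ...
sample = io.StringIO(
    "title,content,target\n"
    "Headline A,Body text A,mahalliy\n"
    "Headline B,,dunyo\n"
    "Headline C,Body text C,dunyo\n"
)
complete, incomplete = split_by_content(read_rows(sample))
print(len(complete), "complete rows,", len(incomplete), "without content")
```

Rows without content still carry a title and a target, so they could be kept for headline-only experiments rather than discarded.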
Distribution
The dataset is provided as a single CSV file, daryo_data.csv, approximately 271.94 MB in size. It contains 175,217 records across 3 columns. The target column has seven unique categories. The category distribution is uneven: 'mahalliy' (local) is the most common at 42%, followed by 'dunyo' (world) at 27%.
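An uneven category distribution like this is often handled by weighting classes inversely to their frequency. The sketch below computes per-class shares and the common "balanced" weight heuristic, total / (number of classes × class count). The label list is a made-up toy sample, and 'other' is a placeholder rather than a real category name from the dataset.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of records belonging to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def balanced_weights(labels):
    """Weight each class by total / (n_classes * class_count)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


# Toy labels for illustration only; 'other' is a hypothetical placeholder.
labels = ["mahalliy"] * 6 + ["dunyo"] * 3 + ["other"] * 1
shares = class_shares(labels)
weights = balanced_weights(labels)

for label in shares:
    print(f"{label}: share={shares[label]:.2f}, weight={weights[label]:.2f}")
```

Rare classes receive larger weights, which can then be passed to a classifier's loss function or used to guide resampling.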
Usage
This data is ideally suited for:
- Developing and evaluating text classification algorithms.
- Training NLP models for topic modeling or, with additional labeling, sentiment analysis of regional news.
- Studying linguistic features and vocabulary common in contemporary news media.
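As a starting point for the text classification use case, the sketch below implements a small multinomial Naive Bayes baseline with add-one smoothing, using only the standard library. It is an illustrative baseline under stated assumptions, not a method prescribed by the dataset authors. The NaiveBayes class and tokenize helper are hypothetical names, and the training texts are made-up English stand-ins for real Daryo.uz articles.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN.findall(text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.totals = {
            label: sum(self.word_counts[label].values())
            for label in self.class_counts
        }
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (self.totals[label] + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training examples; real input would be the content and target columns.
train_texts = [
    "city council opens new school in the region",
    "local market prices rise in the city",
    "summit of foreign leaders held abroad",
    "foreign ministers meet at international summit",
]
train_labels = ["mahalliy", "mahalliy", "dunyo", "dunyo"]

model = NaiveBayes().fit(train_texts, train_labels)
local_guess = model.predict("new school opens in the city")
world_guess = model.predict("international summit abroad")
print(local_guess, world_guess)
```

On the full corpus, a held-out test split and per-class metrics would be needed to judge such a baseline, given the uneven category distribution.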
Coverage
The scope is limited to articles published on the Daryo.uz website. As the categories 'mahalliy' (local) and 'dunyo' (world) suggest, the data covers both local events and global affairs, all reported by a single source based in Asia. The data is intended to be updated weekly so that it stays topically relevant.
License
CC0: Public Domain
Who Can Use It
Intended users include data scientists who need large text corpora for deep learning, academic researchers studying media trends or language use, and intermediate to advanced data professionals looking for a text classification task.
Dataset Name Suggestions
- Daryo Uz News Classification Data
- Asian News Article Corpus
- Tabular News Text for NLP
- Weekly Daryo Uz News Archive
Attributes
Original Data Source: Daryo Uz News Classification Data
