Uzbek Language Text Classification Data

Category: Data Science and Analytics
Tags and Keywords: News, Uzbek, Text, Classification, Articles

Free

About

This collection features news articles gathered via web scraping from the Kun.uz news site. It serves as a valuable resource for text analysis and classification tasks, containing a large volume of Uzbek-language journalistic content. The data spans various topics, including domestic affairs, global events, economics, and culture, making it ideal for training machine learning models to identify and sort news categories.

Columns

The data structure includes four key fields:

ID: A unique numerical identifier for each news record.
title (Yangilik sarlavhasi): The headline or title of the news article.
content (Yangilik matni): The full text body of the news story.
target (Toifasi): The assigned category or type of news article (e.g., business, sport, or world news).
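
As a rough illustration of how the four fields might be read, the sketch below streams records with Python's standard csv module and checks that the expected columns are present. The header spellings (ID, title, content, target) are taken from the list above and are an assumption about the real file; read_records is a hypothetical helper name, and the sample row is made up.

```python
import csv
import io

# Column names as given in the listing; the exact header spelling in the
# real file is an assumption and may need adjusting.
EXPECTED_COLUMNS = ("ID", "title", "content", "target")


def read_records(handle):
    """Yield one dict per news record from an open CSV text stream."""
    # Article bodies can be long, so raise the csv module's per-field limit.
    csv.field_size_limit(10_000_000)
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    for row in reader:
        # Keep only the four documented fields.
        yield {name: row[name] for name in EXPECTED_COLUMNS}


# Tiny made-up sample standing in for final_kun_uz_dataset.csv.
sample = io.StringIO(
    "ID,title,content,target\n"
    '1,"Sarlavha","Matn, vergul bilan",Sport\n'
)
records = list(read_records(sample))
print(records[0]["target"])
```

For the real file, the same function could be given a handle from open("final_kun_uz_dataset.csv", newline="", encoding="utf-8"); the UTF-8 encoding is likewise an assumption.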

Distribution

The data is available as a CSV file named final_kun_uz_dataset.csv, approximately 357.48 MB in size. It comprises 172,349 individual news records in tabular form, ready for processing.

Usage

This news archive suits text-based machine learning work. Typical uses include developing and evaluating natural language processing (NLP) models, training text classifiers to automate news categorization, performing linguistic analysis of contemporary Uzbek media, and tracking trends across news sectors.
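
To make the classifier use case concrete, here is a minimal baseline sketch: a multinomial Naive Bayes model with add-one smoothing, written with the standard library only. The listing does not name any particular model, so this technique is chosen purely for illustration. NaiveBayesClassifier and tokenize are hypothetical names, and the training sentences are invented stand-ins for the content and target columns.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase and split on non-word characters (a deliberately naive tokenizer)."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)        # documents per category
        self.word_counts = defaultdict(Counter)  # token counts per category
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        tokens = [tok for tok in tokenize(text) if tok in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(n_docs / self.total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples; real training data would come from the CSV file.
train_texts = [
    "futbol jamoasi finalda yutdi",
    "terma jamoa musobaqada yutdi",
    "bank kredit foizlarini oshirdi",
    "valyuta kursi va bank bozori",
]
train_labels = ["Sport", "Sport", "Iqtisodiyot", "Iqtisodiyot"]

model = NaiveBayesClassifier().fit(train_texts, train_labels)
print(model.predict("futbol jamoa yutdi"))
```

On the full 172,349 records, a held-out test split and a stronger tokenizer would be needed before any accuracy figure means much; this sketch only shows the shape of the task.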

Coverage

The scope primarily covers news pertaining to Uzbekistan (which makes up 38% of the records) and world events (24% of the records). The article categories are diverse, encompassing domains such as Society (Jamiyat), Sport, Business (Biznes), Science and Technology (Fan va texnika), and Economics (Iqtisodiyot). The expected refresh rate for new data is weekly.
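
The category percentages quoted above could be checked with a short tally over the target column. The sketch below is one way to do that; category_shares is a hypothetical helper, and the sample labels are made up rather than drawn from the file.

```python
from collections import Counter


def category_shares(labels):
    """Return {category: percentage of records}, largest share first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cat: 100.0 * n / total for cat, n in counts.most_common()}


# Made-up labels standing in for the target column.
sample_labels = ["Jamiyat", "Jamiyat", "Sport", "Biznes"]
shares = category_shares(sample_labels)
for category, percent in shares.items():
    print(f"{category}: {percent:.1f}%")
```

A skewed distribution like the one described (two categories covering roughly 62% of records) is worth knowing before training, since it affects how a classifier should be evaluated.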

License

CC0: Public Domain

Who Can Use It

This material is beneficial for NLP practitioners focusing on low-resource languages, researchers interested in Uzbek media consumption, data scientists seeking substantial labelled datasets for classification model development, and educational institutions studying machine learning applications in news analysis.

Dataset Name Suggestions

Kun.uz News Article Text Repository
Uzbek Language Text Classification Data
Kun.uz Scraped News Archive (172k Records)
Uzbek News Categorisation Corpus

Attributes

Original Data Source: Uzbek Language Text Classification Data

Listing Stats

Listed: 15/10/2025
Region: Global
Quality: 5 / 5
Version: 1.0
