Uzbek Language Text Classification Data

Data Science and Analytics

Tags and Keywords

News, Uzbek, Text, Classification, Articles

Uzbek Language Text Classification Data Dataset on Opendatabay data marketplace

Free

About

This collection contains news articles scraped from the Kun.uz news site. It provides a large volume of Uzbek-language journalistic content for text analysis and classification tasks. The articles span domestic affairs, global events, economics, and culture, which makes the collection suitable for training machine learning models that assign news stories to categories.

Columns

The data structure includes four key fields:
  • ID: A unique numerical identifier for each news record.
  • title (Yangilik sarlavhasi): The headline or title of the news article.
  • content (Yangilik matni): The full text body of the news story.
  • target (Toifasi): The assigned category or type of news article (e.g., business, sport, or world news).
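
As a minimal sketch of how the four fields might be read, the snippet below checks the header and yields one dictionary per record using only the Python standard library. It assumes the CSV header uses exactly the names ID, title, content, and target; the helper name read_records and the sample row are hypothetical and exist only for illustration.

```python
import csv
import io

# Assumed header names, taken from the column list above.
EXPECTED_COLUMNS = ["ID", "title", "content", "target"]


def read_records(handle):
    """Yield one dict per news record after checking the header."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for row in reader:
        yield row


# Hypothetical one-row sample standing in for the real file.
sample = io.StringIO(
    "ID,title,content,target\n"
    '1,"Sarlavha","Matn, vergul bilan",Sport\n'
)
records = list(read_records(sample))
print(records[0]["title"], "->", records[0]["target"])
```

The quoted content field shows why a CSV parser is preferable to splitting on commas: article text will usually contain commas and line breaks of its own.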

Distribution

The data is distributed as a single CSV file, final_kun_uz_dataset.csv, approximately 357.48 MB in size. It contains 172,349 news records in tabular form, ready for processing.
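
Because the file is several hundred megabytes, it may be worth streaming it rather than loading it whole. The sketch below counts data rows one at a time so the result can be compared with the stated 172,349 records. The function name count_records and the in-memory demo are hypothetical; the encoding of the real file is not stated in the listing.

```python
import csv
import io

EXPECTED_RECORDS = 172_349  # record count stated in the listing


def count_records(handle):
    """Count data rows (header excluded) without holding the file in memory."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row, if there is one
    return sum(1 for _ in reader)


# Tiny in-memory stand-in for the real file, for illustration only.
demo = io.StringIO("ID,title,content,target\n1,a,b,Sport\n2,c,d,Biznes\n")
demo_count = count_records(demo)
print(demo_count)
```

For the real file, the handle would come from something like open("final_kun_uz_dataset.csv", newline="", encoding="utf-8"), where UTF-8 is an assumption. If very long article bodies trigger a field size error, csv.field_size_limit can raise the limit.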

Usage

This news archive is suited to text-based machine learning work. Typical uses include developing and evaluating natural language processing (NLP) models, training text classifiers to automate news categorization, performing linguistic analysis of contemporary Uzbek media, and studying trends across news sectors.
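
To make the news categorization use case concrete, here is a small multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. It is an illustrative sketch, not a recommended pipeline: the whitespace tokenizer is naive, the training sentences are invented toy strings rather than rows from the dataset, and in practice a library classifier would normally take its place.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive lowercase whitespace split; real Uzbek text likely needs more care.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples; the labels mirror category names from the listing.
train_texts = [
    "futbol jamoa gol",
    "jamoa futbol chempionat",
    "bank kredit foiz",
    "kompaniya bank aksiya",
]
train_labels = ["Sport", "Sport", "Biznes", "Biznes"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("futbol gol"))
print(model.predict("bank foiz"))
```

With the real data, the content column would supply the texts and the target column the labels, with a held-out split for evaluation.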

Coverage

The records primarily cover news about Uzbekistan (38% of the records) and world events (24% of the records). Article categories include Society (Jamiyat), Sport, Business (Biznes), Science and Technology (Fan va texnika), and Economics (Iqtisodiyot). New data is expected to be added weekly.
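
Percentages like the ones above can be recomputed from the target column. The following sketch turns a list of category labels into percentage shares; the function name category_shares and the toy label list are hypothetical, and the exact label strings in the real file may differ from the ones used here.

```python
from collections import Counter


def category_shares(labels):
    """Return each label's share of the total, in percent, most common first."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.most_common()}


# Toy labels for illustration; not drawn from the dataset.
toy_labels = ["Sport", "Jamiyat", "Sport", "Biznes"]
shares = category_shares(toy_labels)
print(shares)
```

Checking these shares before training is useful because a skewed label distribution affects how a classifier should be evaluated.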

License

CC0: Public Domain

Who Can Use It

This material is beneficial for NLP practitioners focusing on low-resource languages, researchers interested in Uzbek media consumption, data scientists seeking substantial labelled datasets for classification model development, and educational institutions studying machine learning applications in news analysis.

Dataset Name Suggestions

  • Kun.uz News Article Text Repository
  • Uzbek Language Text Classification Data
  • Kun.uz Scraped News Archive (172k Records)
  • Uzbek News Categorisation Corpus

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 15/10/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
