Uzbek Language Text Classification Data
Free
About
This collection contains news articles scraped from the Kun.uz news site. It provides a large volume of Uzbek-language journalistic text for analysis and classification tasks. The articles span domestic affairs, global events, economics, and culture, which makes the collection well suited to training machine learning models that assign news categories.
Columns
The data structure includes four key fields:
- ID: A unique numerical identifier for each news record.
- title (Yangilik sarlavhasi): The headline or title of the news article.
- content (Yangilik matni): The full text body of the news story.
- target (Toifasi): The assigned category or type of news article (e.g., business, sport, or world news).
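As a rough illustration of the four-field layout, the sketch below reads records with Python's standard csv module and checks that the expected columns are present. The header spellings in EXPECTED_COLUMNS are assumed to match the field names listed above exactly, and the sample rows are invented placeholders, not data from the file.

```python
import csv
import io

# Assumed header spellings, taken from the field list above.
EXPECTED_COLUMNS = ("ID", "title", "content", "target")


def read_records(handle):
    """Yield one dict per news record after checking the header."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for row in reader:
        # Keep only the four documented fields.
        yield {name: row[name] for name in EXPECTED_COLUMNS}


# Invented in-memory sample standing in for the real file.
SAMPLE = io.StringIO(
    "ID,title,content,target\n"
    "1,Sarlavha A,Matn A,Sport\n"
    "2,Sarlavha B,Matn B,Jamiyat\n"
)
records = list(read_records(SAMPLE))
```

If the real header differs (for example in capitalisation), the ValueError names the columns that need to be remapped.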
Distribution
The data is distributed as a single CSV file, final_kun_uz_dataset.csv, of approximately 357.48 MB. It contains 172,349 individual news records in tabular form, ready for processing.
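A file of this size can be streamed rather than loaded whole. The sketch below counts records with the standard csv module so the result can be compared with the stated 172,349. The UTF-8 encoding and the presence of a header row are assumptions, and count_file is a hypothetical helper name.

```python
import csv
import io

EXPECTED_RECORDS = 172_349  # record count stated in the description


def count_records(handle):
    """Count data records (not physical lines) in an open CSV handle."""
    # Article bodies may exceed the csv module's default field limit
    # of 131072 characters, so raise it before reading.
    csv.field_size_limit(10**9)
    reader = csv.reader(handle)
    next(reader, None)  # skip the (assumed) header row
    return sum(1 for _ in reader)


def count_file(path, encoding="utf-8"):  # UTF-8 is an assumption
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(path, newline="", encoding=encoding) as handle:
        return count_records(handle)


# Invented two-record sample; the second body contains a line break.
sample = io.StringIO(
    'ID,title,content,target\n1,A,bir,Sport\n2,B,"ikki\nuch",Jamiyat\n'
)
n = count_records(sample)
```

Counting parsed records instead of lines matters here because an article body can span several physical lines inside one quoted field. On the real file, comparing count_file("final_kun_uz_dataset.csv") with EXPECTED_RECORDS is a quick integrity check.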
Usage
This news archive suits text-based machine learning work. Typical uses include developing and evaluating natural language processing (NLP) models, training text classifiers that automate news categorization, analysing the language of contemporary Uzbek media, and tracking trends across news sectors.
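To make the news-categorization use case concrete, here is a minimal baseline sketch: a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. It is one simple option and not a method prescribed by the dataset. The training sentences are invented, the labels reuse category names mentioned in this description, and the tokenizer is deliberately crude (ASCII apostrophes in Uzbek Latin spellings would split words).

```python
import math
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Lowercase and split on non-word characters (a simplification)."""
    return TOKEN.findall(text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts, add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / n_docs)  # log prior
            for token in tokenize(text):
                if token in self.vocab:  # ignore unseen words
                    count = self.word_counts[label][token]
                    score += math.log((count + 1) / (self.totals[label] + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training examples; real input would be the content and target columns.
train_texts = [
    "futbol jamoasi finalda yutdi",
    "terma jamoa futbol musobaqasi",
    "bank kredit foizlari oshdi",
    "dollar kursi va bank",
]
train_labels = ["Sport", "Sport", "Iqtisodiyot", "Iqtisodiyot"]
model = NaiveBayes().fit(train_texts, train_labels)
```

Calling model.predict on a new headline returns the most probable label. On the full archive a held-out split would be needed to measure accuracy, and a library implementation would likely be faster than this sketch.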
Coverage
The archive primarily covers news about Uzbekistan (38% of the records) and world events (24% of the records). Article categories include Society (Jamiyat), Sport, Business (Biznes), Science and Technology (Fan va texnika), and Economics (Iqtisodiyot). New data is expected to be added weekly.
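The category shares quoted above can be recomputed from the target column. The sketch below shows the arithmetic on an invented list of labels. The function name category_shares is hypothetical, and the exact label strings in the file are not confirmed by this description.

```python
from collections import Counter


def category_shares(labels):
    """Return each label's share of the records as a percentage."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders labels from most to least frequent.
    return {label: round(100 * n / total, 1) for label, n in counts.most_common()}


# Invented sample; real input would be the target value of every record.
sample_labels = ["Jamiyat", "Sport", "Jamiyat", "Biznes"]
shares = category_shares(sample_labels)
```

Uneven shares like the ones reported here suggest checking per-category metrics when evaluating a classifier, since overall accuracy can be dominated by the largest categories.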
License
CC0: Public Domain
Who Can Use It
This dataset is useful to NLP practitioners working on low-resource languages, researchers studying Uzbek news media, data scientists who need a substantial labelled dataset for developing classification models, and educational institutions teaching machine learning applications in news analysis.
Dataset Name Suggestions
- Kun.uz News Article Text Repository
- Uzbek Language Text Classification Data
- Kun.uz Scraped News Archive (172k Records)
- Uzbek News Categorisation Corpus
Attributes
Original Data Source: Uzbek Language Text Classification Data
