Yektanet Persian Web Text Classification Dataset

Price: Free (£0)
Provider: Verified Data Provider
Category: Data Science and Analytics
Tags and Keywords: Earth, Travel, NLP, Text, Classification, Machine, Learning

About

The Yektanet Dataset is a collection of real Persian web data, gathered and refined by the Yektanet platform. It is intended as an industrial case study for applying machine learning to Natural Language Processing (NLP) [1]. It supports building models that predict a document's topic category from its text features, such as the title, description, and full text content, and serves as a resource for training and evaluating such models in document categorisation and topic prediction [1].

Columns

Each record in the dataset describes one web document through a set of features [1]. The target variable is the category column, which gives the topic of the content [1]. The remaining columns are:

Description: This column provides a description of the document [1].
Text_content: This column holds the complete text content of the document [1].
Title: This column represents the title of the document [1].
h1 and h2: These columns contain content found within the HTML tags h1 and h2, respectively [1].
URL: This column specifies the link address associated with the document [1].
Domain: This column indicates the domain or website from which the document originates [1].
Id: This column represents the unique identifier for each link [1].
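
As a minimal sketch of how these columns might be read, the Python snippet below loads rows with the standard csv module and joins the title, description, and full text into one string per document. The lowercase column names are an assumption (the real CSV header may use different casing or names), and the sample row is made up for illustration.

```python
import csv
import io

# Assumed column names; the real CSV header may differ in casing or spelling.
TEXT_COLUMNS = ("title", "description", "text_content")
LABEL_COLUMN = "category"


def read_documents(csv_file):
    """Return a list of (text, label) pairs from an open CSV file object."""
    documents = []
    for row in csv.DictReader(csv_file):
        # Join the text features, skipping empty or missing cells.
        parts = [(row.get(col) or "").strip() for col in TEXT_COLUMNS]
        text = " ".join(part for part in parts if part)
        label = (row.get(LABEL_COLUMN) or "").strip()
        documents.append((text, label))
    return documents


# Toy row made up for illustration; it is not taken from the dataset.
sample_csv = io.StringIO(
    "id,title,description,text_content,category\n"
    "1,عنوان نمونه,,متن نمونه,سلامت\n"
)
docs = read_documents(sample_csv)
```

For a downloaded file, the same function could be given a handle opened with encoding="utf-8" and newline="", which is the usual way to open CSV files containing Persian text.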

Distribution

The Yektanet dataset contains approximately 5206 records, a figure derived from the distribution of category labels [1, 2]. Reported unique-value counts are 4786 for ID, 4720 for text content, 4614 for title, and 4399 for description [3]. Data files are typically provided in CSV format [4].
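
Counts like these can be checked against a downloaded file. The sketch below is one possible way to do so with the Python standard library: it counts rows and distinct non-empty values per column. The column names and toy rows are assumptions for illustration, not values from the dataset.

```python
import csv
import io


def unique_counts(csv_file, columns):
    """Count rows and distinct non-empty values for each named column."""
    seen = {col: set() for col in columns}
    total = 0
    for row in csv.DictReader(csv_file):
        total += 1
        for col in columns:
            value = (row.get(col) or "").strip()
            if value:
                seen[col].add(value)
    return total, {col: len(values) for col, values in seen.items()}


# Toy rows made up for illustration: three rows, two distinct titles.
sample_csv = io.StringIO("id,title\n1,a\n2,a\n3,b\n")
total_rows, distinct = unique_counts(sample_csv, ["id", "title"])
```

A gap between the row count and the number of distinct values in a column suggests repeated or empty entries, which may be worth inspecting before training.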

Usage

This dataset is suited to developing and evaluating machine learning models for document categorisation and topic prediction [1]. Typical Natural Language Processing (NLP) applications include:

Training machine learning models to predict document topics [1].
Developing text classification systems [1].
Research into real-world web data analysis [1].
Exploring feature engineering for NLP tasks [1].
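
To make the topic-prediction use case concrete, here is a small multinomial Naive Bayes classifier with add-one smoothing, written with the Python standard library only. The source does not prescribe any model, so this is an illustrative choice rather than the provider's method. The whitespace tokenizer, the class name NaiveBayes, and the toy training texts are all assumptions; real Persian text would usually need normalization and a dedicated tokenizer.

```python
import math
from collections import Counter, defaultdict

HEALTH = "سلامت"  # category label mentioned in the listing
SPORTS = "ورزش"   # category label mentioned in the listing


def tokenize(text):
    # Whitespace splitting is a simplification made for this sketch.
    return text.split()


class NaiveBayes:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_docs = Counter()              # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_docs[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training texts made up for illustration; not taken from the dataset.
model = NaiveBayes().fit(
    ["پزشک بیمارستان دارو", "فوتبال تیم بازی"],
    [HEALTH, SPORTS],
)
prediction = model.predict("تیم فوتبال")
```

On real data, the input texts could be the concatenated title, description, and text content of each record, with a held-out split used to evaluate accuracy.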

Coverage

The Yektanet dataset is a real Persian web data collection [1]. Its listed region of coverage is global [5]. Content spans various topics; the most common categories are 'سلامت' (health) at 13% and 'ورزش' (sports) at 11% [3]. The collection is not restricted to particular groups or years; it is described only as a current web data collection [1].
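
Category shares such as those quoted above could be recomputed from the category column. The following sketch shows one way to do it in Python; the function name category_shares and the toy label list are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter


def category_shares(labels):
    """Return {category: percentage of all labels}, most frequent first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.most_common()}


# Toy labels made up for illustration; not the dataset's real distribution.
shares = category_shares(["سلامت", "سلامت", "ورزش", "سایر"])
```

In this toy list, two of the four labels are 'سلامت', so its share is 50%.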

License

CC BY

Who Can Use It

The dataset is primarily intended for researchers and practitioners in the fields of machine learning and Natural Language Processing (NLP) [1]. Ideal users include data scientists, AI/ML engineers, academics, and anyone interested in document classification, topic modelling, or working with Persian text data [1].

Dataset Name Suggestions

Yektanet Persian Web Text Classification Dataset
Persian Document Topic Prediction Data
Yektanet NLP Classification Corpus
Web Text Categorisation Dataset (Persian)
Yektanet Machine Learning Text Dataset

Attributes

Original Data Source: Yektanet (Dataset for Text Classification)

Listing Stats

Listed: 17/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0
