Opendatabay APP

TasnimNews Farsi Classification Data

Data Science and Analytics

Tags and Keywords

News

Farsi

Persian

Classification

Text

Free

About

This dataset contains news articles crawled from TasnimNews, a Persian (Farsi) news agency. It is intended for text classification: articles are distributed equally across news categories, which makes it suitable for training and evaluating Farsi natural language processing models.

Columns

  • category: The news category, used as the classification label. Examples include 'سیاسی' (Political) and 'رسانه ها' (Media); 'سیاسی' is reported as the most common value.
  • title: The headline or subject of the news article. Most values are unique.
  • abstract: A concise summary of the news content. Most values are unique.
  • body: The full text of the news article. The phrase 'انتهای پیام/' (End of message/) appears in many entries.
  • time: The publication timestamp of the news item, written in Persian digits with Solar Hijri dates. A frequent value is '۲۰ دی ۱۴۰۰ - ۰۹:۳۸' (10 January 2022 - 09:38).
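
Because the 'time' values use Persian digits and Solar Hijri (Jalali) month names, they usually need normalising before they can be sorted or filtered. The following is a minimal sketch, assuming every value follows the 'day month year - HH:MM' pattern of the example above. The name parse_time and the month-name spellings are assumptions, not part of the dataset, and the result stays in the Jalali calendar rather than being converted to Gregorian.

```python
import re

# Map Persian digits (U+06F0..U+06F9) to ASCII digits.
PERSIAN_DIGITS = {0x06F0 + i: str(i) for i in range(10)}

# Solar Hijri month names in calendar order (assumed spellings).
JALALI_MONTHS = [
    "فروردین", "اردیبهشت", "خرداد", "تیر", "مرداد", "شهریور",
    "مهر", "آبان", "آذر", "دی", "بهمن", "اسفند",
]

TIME_PATTERN = re.compile(r"(\d{1,2})\s+(\S+)\s+(\d{4})\s*-\s*(\d{1,2}):(\d{2})")


def parse_time(value):
    """Return (year, month, day, hour, minute) in the Jalali calendar,
    or None when the value does not match the assumed pattern."""
    text = value.translate(PERSIAN_DIGITS).strip()
    match = TIME_PATTERN.fullmatch(text)
    if match is None:
        return None
    day, month_name, year, hour, minute = match.groups()
    if month_name not in JALALI_MONTHS:
        return None
    month = JALALI_MONTHS.index(month_name) + 1
    return (int(year), month, int(day), int(hour), int(minute))


example = parse_time("۲۰ دی ۱۴۰۰ - ۰۹:۳۸")  # (1400, 10, 20, 9, 38)
```

Returning a plain tuple keeps the sketch free of third-party calendar libraries. The tuples sort chronologically as they are, and values that do not fit the pattern come back as None so they can be inspected separately.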

Distribution

The dataset is provided as a single CSV file of 335.49 MB containing approximately 63,500 valid records. News articles are distributed equally across the categories, so the dataset is balanced for classification.
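
The balance claim can be checked directly after download. The following is a minimal sketch using only the Python standard library; it assumes the CSV has a header row containing the 'category' column described above. The helper names category_counts and is_balanced are illustrative, and the in-memory sample stands in for the real file.

```python
import csv
import io
from collections import Counter


def category_counts(csv_file):
    """Count rows per value of the 'category' column in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row["category"] for row in reader)


def is_balanced(counts):
    """True when every category has exactly the same number of rows."""
    return len(set(counts.values())) <= 1


# Tiny in-memory stand-in for the real file, used only to show the calls.
sample = io.StringIO(
    "category,title\n"
    "سیاسی,t1\n"
    "رسانه ها,t2\n"
    "سیاسی,t3\n"
    "رسانه ها,t4\n"
)
sample_counts = category_counts(sample)
sample_balanced = is_balanced(sample_counts)
```

For the downloaded file, the same function can be called on open(path, encoding="utf-8", newline=""), where path is wherever the CSV was saved (the listing does not state a file name). If very long 'body' values trigger a field-size error, csv.field_size_limit may need to be raised first.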

Usage

This dataset supports developing and testing text classification models in Farsi. Typical tasks include categorising news articles, topic modelling, and exploratory sentiment analysis of Persian media. Its fixed column structure suits both academic research and applied NLP work.
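
When evaluating a classifier on this data, a split that keeps the per-category proportions preserves the balance in both the training and the test portion. The following is a minimal sketch of such a stratified split, assuming rows are dictionaries with a 'category' key (for example, as produced by csv.DictReader). The function name, the 20% test fraction, and the toy rows are illustrative choices, not part of the dataset.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="category", test_fraction=0.2, seed=0):
    """Split rows into (train, test), sampling the same fraction per label."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy rows: two hypothetical categories with ten articles each.
rows = [{"category": "a", "title": f"a{i}"} for i in range(10)]
rows += [{"category": "b", "title": f"b{i}"} for i in range(10)]
train_rows, test_rows = stratified_split(rows)
```

With the toy rows, each category contributes two articles to the test portion and eight to the training portion.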

Coverage

The dataset covers Farsi (Persian) language news content sourced from TasnimNews. No overall time range is specified, but frequent values in the 'time' column suggest coverage around early 2022. Data collection aimed for an equal number of articles per category, so each news topic is equally represented.

License

CC0: Public Domain

Who Can Use It

Researchers and practitioners in Natural Language Processing (NLP) and machine learning can utilise this dataset for building and evaluating Farsi text classification systems. Data scientists interested in Persian language data for various analytical tasks, such as content categorisation or information retrieval, would also find it beneficial.

Dataset Name Suggestions

  • TasnimNews Farsi Text Classification Corpus
  • Persian News Articles for NLP
  • Farsi News Category Dataset
  • Iranian Tasnim News Dataset
  • TasnimNews Farsi Classification Data

Attributes

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 31/08/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
