TasnimNews Farsi Classification Data
About
This dataset comprises news articles crawled from TasnimNews, a Persian (Farsi) news agency. It is specifically designed for text classification tasks, featuring a balanced distribution of news across various categories. The dataset offers a valuable resource for training and evaluating models in Farsi natural language processing.
Columns
- category: The news category, used as the classification label. Examples include 'سیاسی' (Political) and 'رسانه ها' (Media). 'سیاسی' is listed as the most frequent value, although the balanced design means categories should have equal or near-equal counts.
- title: The headline of the news article; values are largely unique.
- abstract: A short summary of the article's content; values are also largely unique.
- body: The full text of the article. The phrase 'انتهای پیام/' (End of message/) appears frequently, typically as a closing line.
- time: The publication timestamp, written in the Solar Hijri (Persian) calendar with Persian numerals. '۲۰ دی ۱۴۰۰ - ۰۹:۳۸' (20 Dey 1400, i.e. 10 January 2022, 09:38) is a frequent value.
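Because many bodies end with 'انتهای پیام/', it may be worth stripping that phrase before training so a classifier does not learn from a repeated sign-off. The following is a minimal sketch that assumes the phrase sits at the end of the body. It also assumes the text may mix Arabic and Persian forms of yeh and kaf, a common issue in Persian web text, and normalises them. `normalize_farsi` and `clean_body` are hypothetical helper names, not part of the dataset.

```python
import re


def normalize_farsi(text: str) -> str:
    # Map Arabic yeh (U+064A) and Arabic kaf (U+0643) to their Persian forms
    # (U+06CC, U+06A9); assumes the corpus may mix both encodings.
    return text.replace("\u064a", "\u06cc").replace("\u0643", "\u06a9")


# Trailing sign-off 'انتهای پیام/'; the gap between the two words may be a
# space or a zero-width non-joiner (U+200C), and the slash is optional.
SIGN_OFF_RE = re.compile(
    r"\s*" + normalize_farsi("انتهای") + r"[\s\u200c]*"
    + normalize_farsi("پیام") + r"\s*/?\s*$"
)


def clean_body(body: str) -> str:
    """Normalise characters and drop the closing sign-off, if present."""
    text = normalize_farsi(body)
    text = SIGN_OFF_RE.sub("", text)
    return text.strip()
```

For example, `clean_body("متن خبر\nانتهای پیام/")` returns `"متن خبر"`, while bodies without the sign-off are only normalised and trimmed.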
Distribution
The dataset is provided in CSV format, with a file size of 335.49 MB, and contains approximately 63,500 valid records. A key characteristic is that each category contains an equal number of news articles, giving a balanced dataset for classification.
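To keep that balance when holding out evaluation data, a stratified split samples the same fraction from every category. Below is a minimal standard-library sketch. It assumes each row is a dict keyed by the CSV column names, as produced by `csv.DictReader`. The helpers `is_balanced` and `stratified_split` are hypothetical names used for illustration.

```python
import random
from collections import Counter, defaultdict


def is_balanced(labels, tolerance=0):
    """True if the largest and smallest class counts differ by at most `tolerance`."""
    counts = Counter(labels)
    if not counts:
        return True
    return max(counts.values()) - min(counts.values()) <= tolerance


def stratified_split(rows, label_key="category", test_fraction=0.2, seed=0):
    """Split rows class by class so each class keeps the same train/test proportion."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for label in sorted(by_label):  # sorted for a deterministic iteration order
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

In practice, `rows` could be `list(csv.DictReader(f))` with the CSV file opened using `encoding="utf-8"` (the encoding is an assumption). Running `is_balanced` on the `category` values is a quick check that the balance described above holds in the copy you downloaded.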
Usage
This dataset is well suited to developing and testing Farsi text classification models. Beyond categorising news articles, it can support topic modelling and, with additional annotation (it carries no sentiment labels), sentiment analysis of Persian media content. Its structured format makes it useful for both academic research and applied NLP.
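A simple baseline helps gauge how hard the classification task is before trying larger models. The following sketch is a multinomial naive Bayes classifier with Laplace smoothing, written in plain Python. It is an illustrative choice and not a method associated with the dataset. Its regex tokenizer (`\w+`) is a crude assumption: it splits words at zero-width non-joiners and ignores Persian morphology. `NaiveBayesTextClassifier` and `tokenize` are hypothetical names.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Crude assumption: runs of Unicode word characters; splits at ZWNJ and punctuation.
    return re.findall(r"\w+", text)


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with additive (Laplace) smoothing; alpha must be > 0."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_doc_counts = Counter(labels)
        self.n_docs = len(labels)
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict(self, text):
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_doc_counts.items():
            score = math.log(doc_count / self.n_docs)  # log prior
            denom = self.total_words[label] + self.alpha * vocab_size
            for token in tokenize(text):
                if token in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[label][token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Typical usage is `model = NaiveBayesTextClassifier().fit(titles, categories)` followed by `model.predict(new_title)`. A stronger tokenizer or a dedicated Persian NLP toolkit would likely improve accuracy.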
Coverage
The dataset's scope is primarily Farsi (Persian) language news content sourced from TasnimNews. While a precise overall time range is not specified, common timestamps observed within the 'time' column suggest coverage around early 2022. The data collection focused on providing an equal number of articles per category, ensuring a balanced representation across different news topics.
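Because the time range has to be inferred from the 'time' column, one option is to parse each value into Solar Hijri components and take the minimum and maximum. The sketch below assumes every value follows the 'day month year - HH:MM' pattern of the example '۲۰ دی ۱۴۰۰ - ۰۹:۳۸'. Tuples of (year, month, day, hour, minute) then sort chronologically. Converting the result to Gregorian dates is left to a dedicated Jalali calendar library (not shown). `parse_tasnim_time` and `observed_range` are hypothetical names.

```python
import re

# Persian (U+06F0..U+06F9) and Arabic-Indic (U+0660..U+0669) digits -> ASCII.
DIGITS = str.maketrans("۰۱۲۳۴۵۶۷۸۹٠١٢٣٤٥٦٧٨٩", "01234567890123456789")


def _normalize(text):
    # ASCII digits plus Persian yeh/kaf, so month names compare reliably.
    return text.translate(DIGITS).replace("\u064a", "\u06cc").replace("\u0643", "\u06a9")


MONTH_NAMES = ["فروردین", "اردیبهشت", "خرداد", "تیر", "مرداد", "شهریور",
               "مهر", "آبان", "آذر", "دی", "بهمن", "اسفند"]
MONTHS = {_normalize(name): i for i, name in enumerate(MONTH_NAMES, start=1)}

# Assumed layout: day, month name, year, dash, HH:MM.
TIME_RE = re.compile(r"(\d{1,2})\s+(\S+)\s+(\d{4})\s*[-\u2013]\s*(\d{1,2}):(\d{2})")


def parse_tasnim_time(value):
    """Return (year, month, day, hour, minute) in the Solar Hijri calendar."""
    match = TIME_RE.search(_normalize(value))
    if match is None:
        raise ValueError(f"unrecognised timestamp: {value!r}")
    day, month_name, year, hour, minute = match.groups()
    if month_name not in MONTHS:
        raise ValueError(f"unknown month name: {month_name!r}")
    return (int(year), MONTHS[month_name], int(day), int(hour), int(minute))


def observed_range(values):
    """Earliest and latest parsed timestamps."""
    parsed = [parse_tasnim_time(v) for v in values]
    return min(parsed), max(parsed)
```

For the example above, `parse_tasnim_time("۲۰ دی ۱۴۰۰ - ۰۹:۳۸")` returns `(1400, 10, 20, 9, 38)`, since Dey is the tenth month.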
License
CC0: Public Domain
Who Can Use It
Researchers and practitioners in Natural Language Processing (NLP) and machine learning can utilise this dataset for building and evaluating Farsi text classification systems. Data scientists interested in Persian language data for various analytical tasks, such as content categorisation or information retrieval, would also find it beneficial.
Dataset Name Suggestions
- TasnimNews Farsi Text Classification Corpus
- Persian News Articles for NLP
- Farsi News Category Dataset
- Iranian Tasnim News Dataset
- TasnimNews Farsi Classification Data
Attributes
Original Data Source: TasnimNews Farsi Classification Data