Opendatabay APP

Multi-Class Arabic Text Dataset

Data Science and Analytics

Tags and Keywords

  • NLP
  • Arabic_dataset_classifiction
  • Tabular
  • Beginner
  • Intermediate

Free

About

This dataset is a collection of Arabic texts designed for classification tasks. It captures modern Arabic as it appears in newspaper articles, including alphabetic, numeric, and symbolic words. Its structure supports evaluating the efficiency and robustness of Arabic text classification and document indexing systems.
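
As a rough illustration of the three word types mentioned above, the sketch below labels whitespace-delimited tokens as alphabetic, numeric, or symbolic. The rules it uses are an assumption made for this example (the listing does not define the three types), and `token_kind` and `count_token_kinds` are hypothetical helper names.

```python
import unicodedata


def token_kind(token: str) -> str:
    """Label a token as 'alphabetic', 'numeric', or 'symbolic' (assumed rules)."""
    # Drop combining marks (Arabic diacritics such as fatha or shadda)
    # so that vocalised words still count as alphabetic.
    core = [ch for ch in token if unicodedata.category(ch) != "Mn"]
    if core and all(ch.isalpha() for ch in core):
        return "alphabetic"
    if core and all(ch.isdigit() for ch in core):
        return "numeric"
    # Everything else (punctuation, mixed tokens) is treated as symbolic.
    return "symbolic"


def count_token_kinds(text: str) -> dict[str, int]:
    """Count tokens of each kind in a whitespace-tokenised text."""
    counts = {"alphabetic": 0, "numeric": 0, "symbolic": 0}
    for token in text.split():
        counts[token_kind(token)] += 1
    return counts


# Made-up sample sentence, not taken from the dataset.
sample = "فاز الفريق 3 - 1"
print(count_token_kinds(sample))
```

Whitespace splitting is the simplest possible tokeniser; punctuation attached to a word (for example a trailing comma) makes the whole token count as symbolic under these rules.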

Columns

While specific column names are not explicitly provided, a typical structure for a classification dataset like this would include:
  • text: The actual Arabic news article content.
  • category: The assigned classification label for each article (e.g., sport, politic, culture, economy, diverse).
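
A minimal loading sketch under the structure described above might look as follows. The column names `text` and `category` are assumptions (the listing does not confirm them), `load_rows` is a hypothetical helper, and the in-memory CSV stands in for the real file.

```python
import csv
import io

# The five class labels named in the listing.
EXPECTED_LABELS = {"sport", "politic", "culture", "economy", "diverse"}


def load_rows(handle, text_col="text", label_col="category"):
    """Yield (text, label) pairs from a CSV file object.

    Rows with empty text or an unexpected label are skipped.
    The column names are assumptions; adjust them to the actual file.
    """
    reader = csv.DictReader(handle)
    missing = {text_col, label_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing expected columns: {sorted(missing)}")
    for row in reader:
        text = (row[text_col] or "").strip()
        label = (row[label_col] or "").strip().lower()
        if text and label in EXPECTED_LABELS:
            yield text, label


# Made-up stand-in data; for the real file use something like
# open(path, encoding="utf-8", newline="") instead of io.StringIO.
demo = io.StringIO(
    "text,category\n"
    "مباراة اليوم,sport\n"
    "سعر النفط,economy\n"
    ",culture\n"
)
rows = list(load_rows(demo))
print(len(rows), "usable rows")
```

Checking the header first fails fast if the real column names differ from the assumed ones, which is the most likely mismatch here.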

Distribution

The dataset comprises 111,728 documents containing a total of 319,254,124 words. The documents are structured as text files and are typically distributed in CSV format. Each document belongs to one of five classes (sport, politic, culture, economy, and diverse), and the number of documents and words varies from class to class.
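
The totals quoted above imply an average of roughly 2,857 words per document. The sketch below reproduces that arithmetic and shows one way the per-class document and word counts could be tallied. `class_distribution` is a hypothetical helper, words are counted by whitespace splitting (an assumption), and the sample pairs are made up.

```python
from collections import Counter

# Totals quoted in the listing.
TOTAL_DOCS = 111_728
TOTAL_WORDS = 319_254_124

avg_words = TOTAL_WORDS / TOTAL_DOCS
print(f"average words per document: {avg_words:.1f}")


def class_distribution(pairs):
    """Return (documents per class, words per class) for (text, label) pairs."""
    docs, words = Counter(), Counter()
    for text, label in pairs:
        docs[label] += 1
        words[label] += len(text.split())  # whitespace word count
    return docs, words


# Made-up sample pairs, not taken from the dataset.
sample_pairs = [
    ("مباراة كرة القدم", "sport"),
    ("سعر النفط يرتفع اليوم", "economy"),
    ("فريق يفوز", "sport"),
]
docs, words = class_distribution(sample_pairs)
print(dict(docs))
print(dict(words))
```

Comparing the per-class counts before training is a quick way to see how unbalanced the five classes are.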

Usage

This dataset is ideal for a range of applications, including:
  • Developing and testing Arabic text classification models.
  • Building robust Arabic document indexing systems.
  • Research into modern Arabic language processing.
  • Training machine learning models for news categorisation.
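
As a starting point for the classification use cases listed above, here is a minimal baseline sketch: a multinomial Naive Bayes classifier with add-one smoothing over whitespace tokens. It is a swapped-in textbook technique chosen for illustration, not a method prescribed by the dataset, and the class name `NaiveBayes` and the training sentences are made up.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over whitespace tokens."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)    # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(n_docs / total_docs)  # log prior
            for token in text.split():
                if token in self.vocab:            # ignore unseen words
                    score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training sentences, not taken from the dataset.
train_texts = [
    "مباراة كرة القدم",
    "فريق يفوز في مباراة",
    "سعر النفط يرتفع",
    "البنك يرفع سعر الفائدة",
]
train_labels = ["sport", "sport", "economy", "economy"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("مباراة فريق"))
print(model.predict("سعر النفط"))
```

On the full dataset a baseline like this would normally be evaluated on a held-out split, and Arabic-specific normalisation (for example removing diacritics) may be worth trying before tokenising.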

Coverage

The dataset focuses on modern Arabic language, sourced from news articles published by three prominent Arabic online newspapers: Assabah, Hespress, and Akhbarona. The content covers five main categories: sport, politic, culture, economy, and diverse.

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

This dataset is suitable for:
  • NLP Practitioners: For developing and refining Arabic language models.
  • Researchers: Studying text classification, natural language understanding, and Arabic linguistics.
  • Beginner, Intermediate, and Advanced Data Scientists: Engaged in text mining and machine learning projects.
  • Developers: Building applications that require automated categorisation of Arabic news content.

Dataset Name Suggestions

  • Modern Arabic News Text Classification
  • Arabic News Article Categories
  • Multi-Class Arabic Text Dataset
  • Arabic Newspaper Content for NLP
  • Assabah Hespress Akhbarona Dataset

Attributes

Original Data Source: Multi-Class Arabic Text Dataset

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 08/09/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
