FREE DATASET LIBRARY

Verified Data Provider

£0

Ethiopian Media Classification Data

Data Science and Analytics

Tags and Keywords

Amharic

Classification

News

NLP

Ethiopia

Ethiopian Media Classification Data Dataset on Opendatabay data marketplace

Free

About

This dataset contains classified news articles written in Amharic and is intended as a resource for natural language processing research and machine learning model development. The collection includes raw article text, the associated headlines, classification categories, and metadata. It has previously been used to set a reference standard (baseline) for Amharic news text classification models.

Columns

The data includes six primary columns:

headline: The main title or headline corresponding to the news piece (containing over 50,000 unique titles).
category: The predefined classification label assigned to the article. There are six unique categories, with ‘ሀገር አቀፍ ዜና’ representing the most frequent category.
date: The publication date of the news item, spanning approximately ten years.
views: The recorded view count for the article (mean of 778 views, with high variance and some missing values).
article: The full body text of the news story itself.
link: A URL pointing back to the original source location of the news article.
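
The column layout above can be checked with a short script before any modelling. The following is a minimal sketch using only the Python standard library. It assumes the CSV header uses exactly the six column names listed above and that missing view counts appear as empty or non-numeric cells; neither is confirmed by this listing. The in-memory sample rows, including their dates and URLs, are invented placeholders, not real records.

```python
import csv
import io
from collections import Counter

# Column names taken from the listing; the exact header text is an assumption.
EXPECTED_COLUMNS = ["headline", "category", "date", "views", "article", "link"]


def summarise(handle):
    """Return (category_counts, mean_views, missing_views) for an open CSV handle."""
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    categories = Counter()
    view_values = []
    missing_views = 0
    for row in reader:
        categories[row["category"]] += 1
        raw = (row["views"] or "").strip()
        try:
            view_values.append(float(raw))
        except ValueError:
            # Empty or non-numeric cell: count it as a missing view count.
            missing_views += 1

    mean_views = sum(view_values) / len(view_values) if view_values else None
    return categories, mean_views, missing_views


# Tiny in-memory stand-in for the real file (placeholder rows only).
sample = io.StringIO(
    "headline,category,date,views,article,link\n"
    "h1,ሀገር አቀፍ ዜና,2020-01-01,10,text one,http://example.com/1\n"
    "h2,ሀገር አቀፍ ዜና,2020-01-02,,text two,http://example.com/2\n"
    "h3,other,2020-01-03,30,text three,http://example.com/3\n"
)
counts, mean_views, missing_views = summarise(sample)
print(counts.most_common(1), mean_views, missing_views)
```

For the real file, the handle would come from something like `open("Amharic News Dataset.csv", encoding="utf-8", newline="")`; the UTF-8 encoding is an assumption.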

Distribution

The collection is supplied as a single CSV file, Amharic News Dataset.csv, of approximately 191 MB. It contains approximately 51,500 valid records (rows).

Usage

This collection suits developing, training, and testing text classification algorithms for low-resource languages such as Amharic. It can also support machine learning benchmarks, topic modeling, or sentiment analysis experiments on Amharic-language media content. The data file is distributed compressed and must be extracted before running any code against it.
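
For benchmark use, a reproducible train/test split that preserves the category proportions is a common first step. The sketch below is one possible approach using only the standard library; it is not a split prescribed by the dataset's authors. The function name `stratified_split`, the 20% test fraction, and the seed are illustrative choices, and the toy labels stand in for the `category` column.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), splitting each label group separately."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for label in sorted(by_label, key=str):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy labels standing in for the category column.
labels = ["a"] * 10 + ["b"] * 5
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting per category keeps rare categories represented in both partitions, which matters here because one category is reported as much more frequent than the others.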

Coverage

The data covers news articles published between 31 July 2011 and 23 January 2021. Content focuses on Amharic news sources, and the classification categories cover broad topics such as Politics and Government.
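
The stated coverage window can serve as a sanity check on the `date` column. The listing does not say how dates are formatted in the file, so the candidate formats below are assumptions to adjust after inspecting the data. Only the two boundary dates come from the text above.

```python
from datetime import date, datetime

# Boundaries taken from the coverage statement.
COVERAGE_START = date(2011, 7, 31)
COVERAGE_END = date(2021, 1, 23)

# Assumed formats; the real file may use a different one.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Return a date for the first matching format, or None if nothing matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def in_coverage(text):
    """True if the text parses and falls inside the stated coverage window."""
    parsed = parse_date(text)
    return parsed is not None and COVERAGE_START <= parsed <= COVERAGE_END


print(in_coverage("2011-07-31"), in_coverage("2021-01-24"))
```

Rows that fail to parse or fall outside the window are worth inspecting by hand rather than dropping silently.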

License

CC BY-NC-SA 4.0

Who Can Use It

Researchers and students specialising in NLP for African languages.
Data scientists aiming to develop highly accurate classification and topic modeling systems.
Academics seeking to analyse historical media trends and discourse in Ethiopia or Amharic-speaking regions.

Dataset Name Suggestions

Amharic News Text Classification Corpus
Ethiopian Media Classification Data
Amharic NLP Baseline Dataset

Attributes

Original Data Source: Ethiopian Media Classification Data

Listing Stats

Listed: 16/11/2025
Region: Global
Quality: 5 / 5
Version: 1.0