Opendatabay APP

Ethiopian Media Classification Data

Data Science and Analytics

Tags and Keywords

Amharic

Classification

News

NLP

Ethiopia

Free

About

This dataset contains classified news articles written in Amharic, making it a useful resource for natural language processing research and machine learning model development. The collection includes the raw article text, the associated headlines, the classification categories, and supporting metadata. It has previously been used to set a reference standard for Amharic news text classification models.

Columns

The data includes six primary columns:
  • headline: The main title or headline corresponding to the news piece (containing over 50,000 unique titles).
  • category: The predefined classification label assigned to the article. There are six unique categories, with ‘ሀገር አቀፍ ዜና’ representing the most frequent category.
  • date: The publication date of the news item, spanning approximately ten years.
  • views: The recorded view count for the article (the mean is 778 views, but the values vary widely and some are missing).
  • article: The full body text of the news story itself.
  • link: A URL pointing back to the original source location of the news article.
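
As a quick sanity check on the six columns listed above, the following minimal sketch reads the CSV with Python's standard library, confirms the expected header names, and tallies the category labels and missing view counts. The function name `summarise` is illustrative, and the raised field size limit is a precaution for long article bodies rather than a documented requirement.

```python
import csv
import io
from collections import Counter

# Column names as given in the listing.
EXPECTED_COLUMNS = ["headline", "category", "date", "views", "article", "link"]

# Article bodies may exceed the csv module's default field size limit.
csv.field_size_limit(10_000_000)


def summarise(handle):
    """Return (row_count, category_counts, rows_missing_views) for an open CSV file."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")

    row_count = 0
    category_counts = Counter()
    rows_missing_views = 0
    for row in reader:
        row_count += 1
        category_counts[(row["category"] or "").strip()] += 1
        if not (row["views"] or "").strip():
            rows_missing_views += 1
    return row_count, category_counts, rows_missing_views
```

To run it on the real file, open `Amharic News Dataset.csv` with `open(path, encoding="utf-8", newline="")` and pass the handle to `summarise`. The UTF-8 encoding is an assumption and may need adjusting.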

Distribution

The collection is supplied as a single CSV file named Amharic News Dataset.csv, approximately 191 MB in size. It contains approximately 51,500 valid records (rows).

Usage

This collection is well suited to developing, training, and testing text classification algorithms tuned for low-resource languages. It can also be used to create machine learning benchmarks, conduct topic modeling, or run sentiment analysis experiments on Amharic-language media content. The compressed data file must be extracted before running any code against it.
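
For classification experiments, one common first step is a train/test split that preserves the share of each category. The sketch below shows one way to do this with the standard library only. The function name, the 20% test fraction, and the fixed seed are illustrative assumptions, not part of the dataset's documentation.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="category", test_fraction=0.2, seed=0):
    """Split rows into (train, test), sampling the same fraction from each label."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Here `rows` is any list of dictionaries with a `category` key, such as the rows produced by `csv.DictReader`. Because the categories are unevenly represented, splitting per label keeps rare categories present in both parts.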

Coverage

The data covers news articles published between 31 July 2011 and 23 January 2021, a span of nearly ten years. Content is drawn from Amharic news sources, and the classification categories include broad topics such as Politics and Government.
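
The stated coverage window can be used to flag rows whose date falls outside it or cannot be parsed. The listing does not specify how the date column is formatted, so the candidate formats in this sketch are assumptions to adjust after inspecting the file. The function names are illustrative.

```python
from datetime import date, datetime

# Coverage window as stated in the listing.
COVERAGE_START = date(2011, 7, 31)
COVERAGE_END = date(2021, 1, 23)

# Assumed formats; the actual format of the date column is not documented.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d/%m/%Y")


def parse_date(text):
    """Return a date if text matches one of the candidate formats, else None."""
    text = (text or "").strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def in_coverage(text):
    """True if text parses to a date inside the stated coverage window."""
    parsed = parse_date(text)
    return parsed is not None and COVERAGE_START <= parsed <= COVERAGE_END
```

Rows for which `in_coverage` returns False are worth inspecting by hand before they are dropped, since an unexpected date format would fail the same way as a genuinely out-of-range date.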

License

CC BY-NC-SA 4.0

Who Can Use It

  • Researchers and students specialising in NLP for African languages.
  • Data scientists aiming to develop highly accurate classification and topic modeling systems.
  • Academics seeking to analyse historical media trends and discourse in Ethiopia or Amharic-speaking regions.

Dataset Name Suggestions

  • Amharic News Text Classification Corpus
  • Ethiopian Media Classification Data
  • Amharic NLP Baseline Dataset

Attributes

Listing Stats

VIEWS

0

DOWNLOADS

0

LISTED

16/11/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0
