East Africa Media Content Classifier

Tags and Keywords

Computer Science, News, NLP, Data Cleaning

East Africa Media Content Classifier Dataset on Opendatabay data marketplace

Free

About

This dataset contains classified news content from media outlets across East Africa. Researchers and data scientists can use it to track news stories from several countries in the region and to study topics such as racial tensions and social shifts. Practical applications include examining how culture shapes press reporting, how media outlets portray world events, and how information flows within an interconnected global system. It is also suited to building machine learning models that classify news content automatically, categorising stories into topics like politics, economics, health, sports, environment, and entertainment.

Columns

  • text: The full article content of each news item. (String)
  • label: The category or topic assigned to the article, such as 'kitaifa' (Swahili for "national", 46%), 'michezo' (Swahili for "sports", 27%), and 'Other' (27%). There are 7338 unique label values. (String)
  • content: The full article content of each news item. (Text)
  • category: Categories or topics assigned to the article. (Categorical)

The articles are pre-labeled by human annotators, and the dataset is intended specifically for classifying Swahili texts. None of the columns contains date values.
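
As a minimal sketch of how the columns above might be read, the snippet below accepts either naming pair (text/label or content/category) and computes each label's share of the rows. The listing does not say which file uses which pair, so that handling is an assumption. The helper names `read_articles` and `label_shares` are hypothetical, and the sample rows are invented for illustration rather than taken from the dataset.

```python
import csv
import io
from collections import Counter

# Assumed column-name pairs, taken from the Columns list; which file uses
# which pair is not stated in the listing.
TEXT_COLUMNS = ("text", "content")
LABEL_COLUMNS = ("label", "category")


def read_articles(handle):
    """Return a list of (text, label) pairs from an open CSV file handle."""
    reader = csv.DictReader(handle)
    fields = reader.fieldnames or []
    text_col = next((c for c in TEXT_COLUMNS if c in fields), None)
    label_col = next((c for c in LABEL_COLUMNS if c in fields), None)
    if text_col is None or label_col is None:
        raise ValueError(f"unexpected columns: {fields}")
    return [(row[text_col], row[label_col]) for row in reader]


def label_shares(rows):
    """Map each label to its share of the rows, as a fraction in [0, 1]."""
    counts = Counter(label for _, label in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Tiny in-memory stand-in for a real file such as train.csv (made-up rows).
sample_csv = io.StringIO(
    "content,category\n"
    "Rais ametoa hotuba leo,kitaifa\n"
    "Timu imeshinda mechi,michezo\n"
    "Bunge limepitisha bajeti,kitaifa\n"
    "Simba yaichapa Yanga,michezo\n"
)
rows = read_articles(sample_csv)
shares = label_shares(rows)
```

For a real file, the same functions could be called with `open("train.csv", newline="", encoding="utf-8")` in place of the in-memory sample; the UTF-8 encoding is an assumption.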

Distribution

The data files are typically in CSV format, for example train_v0.2.csv and train.csv. They contain labeled text data suitable for training machine learning models. The listing does not state the number of rows or records.
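
Because only training files are named here, a held-out validation set may need to be carved out before training a model. The sketch below shows one possible approach, a per-label (stratified) split, so that each label keeps roughly the same share in both parts. The function name `stratified_split`, the 20% holdout and the fixed seed are illustrative assumptions, not something the listing prescribes.

```python
import random
from collections import defaultdict


def stratified_split(rows, holdout=0.2, seed=0):
    """Split (text, label) rows into train and validation lists per label."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, valid = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Hold out at least one row per label, unless the label has one row.
        n_valid = max(1, round(len(group) * holdout)) if len(group) > 1 else 0
        valid.extend(group[:n_valid])
        train.extend(group[n_valid:])
    return train, valid


# Placeholder rows standing in for real (text, label) pairs.
example_rows = [(f"a{i}", "kitaifa") for i in range(10)]
example_rows += [(f"b{i}", "michezo") for i in range(5)]
train_rows, valid_rows = stratified_split(example_rows)
```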

Usage

This dataset is suited to anyone looking to build a machine learning model to classify news content across East Africa.
  • Automated News Classification: Create classifiers that can automatically identify and categorise news stories into specific topics like politics, economics, health, sports, environment, and entertainment.
  • Trend Prediction: Predict trending news topics by identifying the categories that occur most frequently over given time periods. Because the dataset has no date values, this requires timestamps from another source.
  • Bias Detection: Identify and flag potential bias in news coverage across East Africa by analysing the prevalence of certain labels or topics to discover potential trends in reporting style.
  • Visibility Prediction: Develop a predictive model to determine which topic or category will have higher visibility based on the amount of related content published in different regions.
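
To make the first use case concrete, here is a minimal sketch of an automated news classifier. It uses multinomial Naive Bayes with add-one smoothing, chosen here only because it fits in a few lines of standard-library Python; the listing does not name any particular model. The class `NaiveBayes`, the `tokenize` helper and the four training sentences are all hypothetical, and the sentences are invented rather than drawn from the dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)          # documents per label
        self.word_counts = defaultdict(Counter)    # token counts per label
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        # Tokens never seen in training carry no evidence and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training examples using two labels mentioned in the listing.
texts = [
    "Rais ametoa hotuba bungeni",
    "Bunge limepitisha bajeti ya serikali",
    "Timu imeshinda mechi ya ligi",
    "Mchezaji amefunga bao katika mechi",
]
labels = ["kitaifa", "kitaifa", "michezo", "michezo"]
model = NaiveBayes().fit(texts, labels)
sports_guess = model.predict("Mechi ya ligi kuu")
national_guess = model.predict("Hotuba ya rais")
```

On real data, a model like this would be trained on the text and label columns and evaluated on held-out rows; with thousands of distinct label values, rare labels may need to be grouped first.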

Coverage

The dataset focuses on news content from East Africa. While no specific time range is indicated due to the absence of date values, it covers news stories that allow for insights into racial tensions and social shifts within the region. The data is based on Swahili texts.

License

CC0

Who Can Use It

This dataset is intended for researchers and data scientists. It is suitable for anyone aiming to build machine learning models to classify news content. Users can leverage it to:
  • Track classified news content from different countries in the East African region.
  • Understand how culture shapes press reporting.
  • Track information flows within an interconnected global system.
  • Preprocess text data for various machine learning algorithms.

Dataset Name Suggestions

  • East African News Classification
  • Swahili News Classification Dataset
  • East Africa Media Content Classifier
  • African News Topic Data
  • Regional News Categorisation - East Africa

Attributes

Original Data Source: East African News Classification

Listing Stats

VIEWS: 0
DOWNLOADS: 0
LISTED: 27/06/2025
REGION: GLOBAL
Universal Data Quality Score (UDQS) QUALITY: 5 / 5
VERSION: 1.0