Opendatabay APP

Bangla News Classification Dataset

Entertainment & Media Consumption

Tags and Keywords

Earth

NLP

Text

Bengali

Free

About

This dataset is a large collection of text articles in the Bengali (Bangla) language, primarily sourced from the Jamuna TV website. It contains over 11,000 rows and is specifically designed for machine learning and natural language processing (NLP) tasks. The dataset features a wide variety of news articles covering events, updates, and diverse topics, organised into five main categories: Sports, All-Bangladesh, International, Entertainment, and National.

Columns

The dataset includes the following columns for each article:
  • Title: The headline of the news article.
  • Published Date: The date and time when the article was published.
  • Reporter: The name of the reporter, if available.
  • Category: The specific category the article belongs to (e.g., Sports, Entertainment).
  • URL: The direct link to the full news article online.
  • Content: A brief summary or an excerpt of the article's main text.
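As a rough illustration of working with these columns, the sketch below reads the CSV with Python's standard `csv` module and checks that the six listed columns are present. It assumes the header row uses exactly the column names shown above, which the listing does not confirm. The sample rows are made-up placeholders, not records from the dataset, and the date format in them is likewise an assumption.

```python
import csv
import io

# Assumed header names, copied from the column list; the real CSV header may differ.
EXPECTED_COLUMNS = ["Title", "Published Date", "Reporter", "Category", "URL", "Content"]


def load_articles(handle):
    """Read articles from an open CSV file handle into a list of dicts."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing expected columns: {missing}")
    articles = []
    for row in reader:
        # Reporter is only filled in "if available", so map blank values to None.
        row["Reporter"] = (row.get("Reporter") or "").strip() or None
        articles.append(row)
    return articles


# Made-up placeholder rows standing in for the real file.
SAMPLE_CSV = (
    "Title,Published Date,Reporter,Category,URL,Content\n"
    "Example headline,2024-01-01 10:00,,Sports,https://example.com/a,Example text\n"
    "Another headline,2024-01-02 11:30,Example Reporter,National,https://example.com/b,More text\n"
)

articles = load_articles(io.StringIO(SAMPLE_CSV))
print(len(articles), "articles loaded")
```

For the downloaded file, the same function could be called on a handle opened with `encoding="utf-8"` and `newline=""`; the file name depends on the download and is not given in the listing.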

Distribution

The dataset is provided in CSV format and contains over 11,000 rows, each representing one news article. An exact row count is not stated.

Usage

This dataset is ideal for various applications and use cases in machine learning and natural language processing:
  • Text Classification: Developing and training models to automatically categorise news articles.
  • Sentiment Analysis: Assessing the sentiment or emotional tone expressed within the articles.
  • Information Retrieval: Creating systems that can efficiently find relevant articles based on user queries.
  • Language Modelling: Building and enhancing language models and tools specifically for the Bengali language.
  • Research: Serving as a valuable resource for NLP research focused on the Bengali language.
  • Education: Teaching machine learning and NLP concepts in educational settings.
  • Application Development: Assisting in the creation of applications that process Bengali text, such as news aggregators or recommendation systems.
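To make the text classification use case concrete, here is a minimal sketch of a multinomial naive Bayes classifier written with the Python standard library only. It is one possible baseline, not a method prescribed by the dataset. The class name `NaiveBayes`, the whitespace tokenizer, and the four training strings are all illustrative assumptions; the strings are made-up placeholders, and real training would use the Content and Category columns.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Whitespace splitting is a simplification; a proper Bengali tokenizer may work better.
    return text.split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per category
        self.word_counts = defaultdict(Counter)  # token counts per category
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore tokens never seen in training
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up placeholder texts, not rows from the dataset.
train_texts = ["ক্রিকেট ম্যাচ জয়", "ফুটবল ম্যাচ গোল", "সিনেমা মুক্তি গান", "গান অ্যালবাম শিল্পী"]
train_labels = ["Sports", "Sports", "Entertainment", "Entertainment"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict(train_texts[1]))
```

In practice a held-out test split would be needed to measure accuracy across the five categories, and a library implementation would likely be a better choice than this hand-rolled baseline.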

Coverage

The dataset is entirely in the Bengali (Bangla) language and is intended for NLP tasks specific to it. The articles are sourced from the Jamuna TV website, so the geographic scope is primarily Bangladesh. The time range covered by the articles is not stated.

License

CC-BY

Who Can Use It

This dataset is intended for a range of users and their specific needs:
  • Researchers: Those conducting NLP research related to the Bengali language will find it a useful resource.
  • Educators and Students: Ideal for use in educational settings for teaching machine learning and NLP principles and practices.
  • Application Developers: Suitable for developers looking to build applications that process Bengali text, such as news aggregation platforms or content recommendation systems.

Dataset Name Suggestions

  • Bangla News Classification Dataset
  • Jamuna TV Bengali News Corpus
  • Bengali NLP News Articles
  • Bangla Text Analysis Dataset

Attributes

Original Data Source: Over 11,500 Bangla News for NLP

Listing Stats

VIEWS

3

DOWNLOADS

0

LISTED

16/06/2025

REGION

ASIA

UDQS QUALITY (Universal Data Quality Score)

5 / 5

VERSION

1.0
