Bangla News Classification Dataset
Entertainment & Media Consumption
Free
About
This dataset is a large collection of text articles in the Bengali (Bangla) language, primarily sourced from the Jamuna TV website. It contains over 11,000 rows and is specifically designed for machine learning and natural language processing (NLP) tasks. The dataset features a wide variety of news articles covering events, updates, and diverse topics, organised into five main categories: Sports, All-Bangladesh, International, Entertainment, and National.
Columns
The dataset includes the following columns for each article:
- Title: The headline of the news article.
- Published Date: The date and time when the article was published.
- Reporter: The name of the reporter, if available.
- Category: The specific category the article belongs to (e.g., Sports, Entertainment).
- URL: The direct link to the full news article online.
- Content: A brief summary or an excerpt of the article's main text.
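As a minimal sketch of how the columns above might be read, the following uses only Python's standard library. It assumes the CSV header names match the column names listed above exactly, which the listing does not confirm. The sample rows are made-up placeholders, not records from the dataset.

```python
import csv
import io
from collections import Counter

# Column names as listed above; the exact header spelling in the CSV is an assumption.
EXPECTED_COLUMNS = ["Title", "Published Date", "Reporter", "Category", "URL", "Content"]


def read_articles(handle):
    """Read article rows from an open CSV handle and check the expected columns exist."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def category_counts(rows):
    """Count how many articles fall under each Category value."""
    return Counter(row["Category"].strip() for row in rows)


# Tiny in-memory sample standing in for the real file (placeholder rows, not real data).
SAMPLE_CSV = (
    "Title,Published Date,Reporter,Category,URL,Content\n"
    "Sample headline A,2024-01-01 10:00,,Sports,https://example.com/a,Sample text A\n"
    "Sample headline B,2024-01-02 11:30,Reporter X,National,https://example.com/b,Sample text B\n"
    "Sample headline C,2024-01-03 09:15,,Sports,https://example.com/c,Sample text C\n"
)

rows = read_articles(io.StringIO(SAMPLE_CSV))
counts = category_counts(rows)
print(counts)
```

With the real file, the same function could be called on `open("bangla_news.csv", encoding="utf-8", newline="")`, where the file name is hypothetical. An empty Reporter field is read as an empty string, which matches the "if available" note above.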
Distribution
The dataset is provided in CSV format and contains over 11,000 rows, each representing one news article. An exact row count is not stated, although the title of the original data source cites over 11,500 articles.
Usage
This dataset suits a range of machine learning and natural language processing applications:
- Text Classification: Developing and training models to automatically categorise news articles.
- Sentiment Analysis: Assessing the sentiment or emotional tone expressed within the articles.
- Information Retrieval: Creating systems that can efficiently find relevant articles based on user queries.
- Language Modelling: Building and enhancing language models and tools specifically for the Bengali language.
- Research: Serving as a valuable resource for NLP research focused on the Bengali language.
- Education: Teaching machine learning and NLP concepts in educational settings.
- Application Development: Assisting in the creation of applications that process Bengali text, such as news aggregators or recommendation systems.
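To illustrate the text classification use case, here is a small baseline sketch: a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. This is a swapped-in illustrative technique, not a method the dataset's authors prescribe. The whitespace tokeniser, the `NaiveBayes` class, and the toy training strings are all assumptions made for the example. In practice the `Content` and `Category` columns would supply the texts and labels.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Whitespace tokenisation only; real Bengali text would likely need
    # punctuation handling and normalisation on top of this.
    return text.split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training strings (not taken from the dataset).
train_texts = [
    "ক্রিকেট ম্যাচ জয়",
    "ফুটবল ম্যাচ আজ",
    "নতুন সিনেমা মুক্তি",
    "নতুন গান প্রকাশ",
]
train_labels = ["Sports", "Sports", "Entertainment", "Entertainment"]

model = NaiveBayes().fit(train_texts, train_labels)
sports_guess = model.predict("ফুটবল ম্যাচ")
entertainment_guess = model.predict("সিনেমা গান")
print(sports_guess, entertainment_guess)
```

A baseline like this is mainly useful as a sanity check before moving to stronger models. Any real evaluation would need a held-out split of the articles.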
Coverage
The dataset is entirely in the Bengali (Bangla) language. All articles are sourced from the Jamuna TV website, so the geographic scope is primarily Bangladesh. The time range covered by the articles is not specified.
License
CC-BY
Who Can Use It
This dataset is intended for the following users:
- Researchers: Those conducting NLP research related to the Bengali language will find it a useful resource.
- Educators and Students: Ideal for use in educational settings for teaching machine learning and NLP principles and practices.
- Application Developers: Suitable for developers looking to build applications that process Bengali text, such as news aggregation platforms or content recommendation systems.
Dataset Name Suggestions
- Bangla News Classification Dataset
- Jamuna TV Bengali News Corpus
- Bengali NLP News Articles
- Bangla Text Analysis Dataset
Attributes
Original Data Source: Over 11,500 Bangla News for NLP