Bengali Text Classification Dataset

FREE DATASET LIBRARY

Verified Data Provider

£0

Social Media and Networking

Tags and Keywords

Social

Beginner

Nlp

Text

Bengali Text Classification Dataset on the Opendatabay data marketplace

About

The Bengali Text Classification Dataset, sourced from the Bengali Hate Speech Detection Dataset, is a curated collection for studying offensive content in the Bengali language. Bengali is one of the world's most widely spoken languages, and its online communication presents distinct challenges and patterns. Researchers, linguists, and AI professionals can use the dataset to develop models that recognise and counter hate speech in Bengali digital content, helping to foster a safer and more inclusive online environment for Bengali speakers.

Please note: The dataset and associated lexicons contain content that may be perceived as racist, sexist, homophobic, and generally offensive. This collection was curated strictly for research purposes.

Columns

text: This column contains the Bengali text snippets, which are the primary data points for analysis.
label: This column provides the classification or category for each text snippet, indicating the nature of the content (e.g., Religious, Geopolitical, Neutral, Personal).
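
The two-column layout above can be read with Python's standard csv module. The sketch below is a minimal illustration, not the provider's loader: the listing does not name the file or publish sample rows, so SAMPLE_CSV and its placeholder sentences are invented, and load_rows is a hypothetical helper.

```python
import csv
import io

# Hypothetical stand-in for the real file. These rows are invented
# placeholders ("sample sentence one/two/three"), not dataset content.
SAMPLE_CSV = (
    "text,label\n"
    "নমুনা বাক্য এক,Neutral\n"
    "নমুনা বাক্য দুই,Religious\n"
    "নমুনা বাক্য তিন,Personal\n"
)

# Column names as described in the listing.
EXPECTED_COLUMNS = ["text", "label"]


def load_rows(handle):
    """Read (text, label) pairs from an open CSV file handle."""
    reader = csv.DictReader(handle)
    # Fail early if the header does not match the documented columns.
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [(row["text"].strip(), row["label"].strip()) for row in reader]


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), rows[0][1])
```

For a downloaded file, the same helper would take a handle opened with encoding="utf-8" and newline="", which keeps the Bengali text intact and lets the csv module handle line endings.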

Distribution

The dataset is provided in CSV format. The listing does not state the number of rows or records. A sample file will be uploaded to the platform separately.
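
Because the record count is not documented, it may help to measure it after download. The following is a rough sketch under assumptions: SAMPLE_CSV is an invented placeholder in the text/label layout, and summarise is a hypothetical helper that counts rows, rows per label, and rows with empty text.

```python
import csv
import io
from collections import Counter

# Invented placeholder rows in the text/label layout, not dataset content.
SAMPLE_CSV = (
    "text,label\n"
    "নমুনা এক,Neutral\n"
    "নমুনা দুই,Neutral\n"
    "নমুনা তিন,Geopolitical\n"
)


def summarise(handle):
    """Return (row count, rows per label, rows with empty text)."""
    label_counts = Counter()
    empty_text = 0
    for row in csv.DictReader(handle):
        label_counts[row["label"].strip()] += 1
        if not row["text"].strip():
            empty_text += 1
    total = sum(label_counts.values())
    return total, label_counts, empty_text


total, label_counts, empty_text = summarise(io.StringIO(SAMPLE_CSV))
for label, count in label_counts.most_common():
    # Share of each label, useful for spotting class imbalance.
    print(f"{label}: {count} ({count / total:.0%})")
```

The per-label shares give a first indication of class imbalance, which affects how a classifier trained on the data should be evaluated.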

Usage

This dataset suits the development of models that detect and mitigate hate speech in Bengali digital content. It is most directly useful for text classification, a natural language processing (NLP) task, where the goal is to identify and categorise forms of offensive language. Models built on it can contribute to a safer and more inclusive online environment.
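
As one illustration of the text classification task described here, the sketch below trains a small multinomial Naive Bayes baseline in plain Python. It is an assumption-laden example, not a method endorsed by the listing: the TRAIN rows are invented, harmless placeholders, whitespace tokenisation is a simplification for Bengali, and train and predict are hypothetical helper names.

```python
import math
from collections import Counter, defaultdict

# Invented placeholder rows in (text, label) form, not dataset content.
TRAIN = [
    ("আজ আবহাওয়া ভালো", "Neutral"),
    ("আজ বাজার খোলা", "Neutral"),
    ("নমুনা ধর্ম বিষয়", "Religious"),
    ("নমুনা ধর্ম আলোচনা", "Religious"),
]


def tokenize(text):
    # Whitespace split: a deliberate simplification for this sketch.
    return text.split()


def train(rows):
    """Count documents per label and tokens per label."""
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for text, label in rows:
        label_counts[label] += 1
        for tok in tokenize(text):
            token_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, token_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, token_counts, vocab = model
    n_docs = sum(label_counts.values())
    # Tokens never seen in training carry no evidence, so skip them.
    tokens = [t for t in tokenize(text) if t in vocab]
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        total = sum(token_counts[label].values())
        score = math.log(label_counts[label] / n_docs)  # class prior
        for tok in tokens:
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log((token_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict(model, "ধর্ম আলোচনা"))
```

A baseline like this mainly serves as a reference point. On the real data, character n-grams or a pretrained multilingual model would likely handle Bengali morphology and spelling variation better than whitespace tokens.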

Coverage

The dataset covers online communication in Bengali, a language spoken worldwide. The listing gives its regional scope as global.

License

CC BY 4.0

Who Can Use It

Researchers and Linguists: For in-depth studies on offensive content, language nuances, and social dynamics in Bengali online communication.
AI Professionals and Data Scientists: For developing and refining machine learning models for hate speech detection, text classification, and natural language understanding (NLU).
Social Scientists: For analysing patterns of online behaviour and content within Bengali social networks.
Organisations and Developers: For creating tools or platforms that promote digital safety and inclusivity for Bengali speakers.

Dataset Name Suggestions

Bengali Hate Speech Detection Dataset
Bengali Offensive Content Corpus
Bengali NLP Social Media Data
Bengali Text Classification Dataset

Attributes

Original Data Source: Bengali Hate Speech Detection Dataset - UCI

Listing Stats

Listed: 17/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0
