Bengali Text Classification Dataset

Social Media and Networking

Tags and Keywords

Social

Beginner

NLP

Text

Bengali Text Classification Dataset on the Opendatabay data marketplace

Free

About

The Bengali Hate Speech Detection Dataset is a curated collection of Bengali text for studying the subtleties of offensive content in the language. Bengali is one of the world's most widely spoken languages, and its online communication has its own challenges and patterns. The dataset is intended to help researchers, linguists, and AI professionals develop algorithms and models tailored to recognising and combating hate speech in Bengali digital content, contributing to a safer and more inclusive online environment for the large Bengali-speaking community.

Please note: the dataset and associated lexicons contain content that may be perceived as racist, sexist, homophobic, or otherwise offensive. The collection was curated strictly for research purposes.

Columns

  • text: This column contains the Bengali text snippets, which are the primary data points for analysis.
  • label: This column provides the classification or category for each text snippet, indicating the nature of the content (e.g., Religious, Geopolitical, Neutral, Personal).
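As a starting point, the two columns above can be read with Python's standard `csv` module. This is a minimal sketch, not an official loader: it assumes the CSV has a header row with columns named exactly `text` and `label` and is UTF-8 encoded. The names `Example`, `read_examples`, and `load_examples` are hypothetical, and the sample row is made up rather than taken from the dataset. The full set of label values is not published in the listing, so the code does not restrict labels to the examples given.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class Example(NamedTuple):
    text: str   # Bengali text snippet
    label: str  # category, e.g. "Religious" or "Neutral" (full label set not confirmed)


REQUIRED_COLUMNS = {"text", "label"}  # assumed header names, taken from the column list


def read_examples(stream: TextIO) -> List[Example]:
    """Parse CSV rows from a stream whose header includes `text` and `label`."""
    reader = csv.DictReader(stream)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing required columns: {sorted(missing)}")
    examples = []
    for row in reader:
        # Short rows yield None for absent fields, so fall back to "".
        text = (row["text"] or "").strip()
        label = (row["label"] or "").strip()
        if text and label:  # skip rows with an empty snippet or label
            examples.append(Example(text, label))
    return examples


def load_examples(path: str) -> List[Example]:
    # "utf-8-sig" reads plain UTF-8 and also drops a leading byte-order mark if present.
    with open(path, encoding="utf-8-sig", newline="") as f:
        return read_examples(f)


# Usage with a made-up in-memory sample (not rows from the dataset):
sample = io.StringIO("text,label\nএটি একটি উদাহরণ,Neutral\n")
print(read_examples(sample))
```

Validating the header up front makes a renamed or missing column fail loudly instead of producing silently empty data.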

Distribution

The dataset is typically provided as a CSV file. The number of rows or records is not stated in the listing. A sample file will be uploaded to the platform separately.
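Because the listing gives no row count, it can be worth counting rows and labels locally after downloading. The sketch below assumes a UTF-8 CSV with a header row that includes a `label` column; the function names are hypothetical and the sample rows are made up. It parses the file with the `csv` module instead of counting lines, since a quoted text snippet that contains a line break would make a plain line count overstate the number of rows.

```python
import csv
import io
from collections import Counter
from typing import TextIO, Tuple


def label_distribution(stream: TextIO) -> Tuple[int, Counter]:
    """Count data rows and rows per label in a CSV with a `label` column."""
    counts: Counter = Counter()
    for row in csv.DictReader(stream):
        # Rows with a missing or blank label are tallied under "".
        counts[(row.get("label") or "").strip()] += 1
    return sum(counts.values()), counts


def distribution_of_file(path: str) -> Tuple[int, Counter]:
    # newline="" lets the csv module handle quoted fields that contain line breaks.
    with open(path, encoding="utf-8-sig", newline="") as f:
        return label_distribution(f)


# Made-up sample: the quoted field containing a comma still counts as one row.
sample = io.StringIO('text,label\n"এক, দুই",Religious\nতিন,Neutral\n')
total, per_label = label_distribution(sample)
print(total, per_label.most_common())
```

The per-label counts also show whether the classes are imbalanced, which affects how a classifier trained on this data should be evaluated.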

Usage

This dataset is suited to developing algorithms and models that detect and mitigate hate speech in Bengali digital content. It is particularly useful for natural language processing (NLP) tasks, specifically text classification aimed at identifying and categorising different forms of offensive language. Models built on it can contribute to a safer and more inclusive online environment.
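To illustrate the text classification task, here is a minimal standard-library baseline: a multinomial Naive Bayes classifier with add-one smoothing. This is an illustrative choice, not a method associated with the dataset. The tokenizer is a simplifying assumption: it splits on whitespace and common punctuation, including the Bengali danda (।). The training pairs in the usage note are made up, with labels borrowed from the examples in the column description.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed tokenization: split on whitespace and punctuation, including the danda (।)
# and double danda (॥). Splitting on separators (rather than matching \w) keeps
# Bengali vowel signs attached to their words.
_SPLIT = re.compile(r"[\s।॥,.!?;:'\"()\[\]{}-]+")


def tokenize(text: str) -> List[str]:
    return [tok for tok in _SPLIT.split(text) if tok]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples: Iterable[Tuple[str, str]]) -> "NaiveBayes":
        self.doc_counts: Counter = Counter()  # documents per label
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)  # token counts per label
        for text, label in examples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = {tok for counts in self.word_counts.values() for tok in counts}
        self.total_docs = sum(self.doc_counts.values())
        return self

    def predict(self, text: str) -> Optional[str]:
        best_label, best_score = None, -math.inf
        tokens = tokenize(text)
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / self.total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Usage: `NaiveBayes().fit(pairs).predict(snippet)`, where `pairs` is an iterable of `(text, label)` tuples read from the CSV. A baseline like this is mainly useful as a reference point. Hold out part of the data, ideally stratified by label, to measure it before comparing against stronger approaches such as subword tokenization or pretrained multilingual models.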

Coverage

The dataset focuses on Bengali, a language widely spoken around the world. It covers online communication patterns and content in Bengali, and its regional scope is listed as global.

License

CC BY 4.0

Who Can Use It

  • Researchers and Linguists: For in-depth studies on offensive content, language nuances, and social dynamics in Bengali online communication.
  • AI Professionals and Data Scientists: For developing and refining machine learning models for hate speech detection, text classification, and natural language understanding (NLU).
  • Social Scientists: For analysing patterns of online behaviour and content within Bengali social networks.
  • Organisations and Developers: For building tools or platforms that promote digital safety and inclusivity for Bengali speakers.

Dataset Name Suggestions

  • Bengali Hate Speech Detection Dataset
  • Bengali Offensive Content Corpus
  • Bengali NLP Social Media Data
  • Bengali Text Classification Dataset

Attributes

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 17/06/2025
  • Region: Global
  • Quality (Universal Data Quality Score, UDQS): 5 / 5
  • Version: 1.0
