Bangla Social Media Emotion Corpus

Tags and Keywords

Computer, Universities, NLP, Deep, Multilabel

About

This dataset is specifically designed for detecting fine-grained emotions in Bangla textual data, particularly from social media comments. It addresses existing challenges in developing robust emotion detection models for the low-resourced Bangla language, focusing on improving both the size and cross-domain adaptability of available resources. The dataset features 22,698 manually annotated public comments collected from various social media platforms. These comments are labelled across six distinct emotion categories derived from the Junto Emotion Wheel, covering 12 diverse domains such as Personal, Politics, and Health. Significant effort was invested in preparing this data to maintain its rich linguistic nuances and to present a meaningful challenge for classification models. Initial experiments indicate that traditional hand-crafted features may outperform neural networks and pretrained language models for this specific task.

Columns

The dataset includes the following columns:

ID: A unique identifier for each entry.
Data: The original social media comment text.
Love: A binary label, '1' if the comment expresses Love, '0' otherwise.
Joy: A binary label, '1' if the comment expresses Joy, '0' otherwise.
Surprise: A binary label, '1' if the comment expresses Surprise, '0' otherwise.
Anger: A binary label, '1' if the comment expresses Anger, '0' otherwise.
Sadness: A binary label, '1' if the comment expresses Sadness, '0' otherwise.
Fear: A binary label, '1' if the comment expresses Fear, '0' otherwise.
Topic: The specific topic of the comment.
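Domain: The source social media platform from which the comment was collected, such as YouTube, Facebook, or Twitter. This column records the platform, not the 12 content domains (Personal, Politics, Health, and others) described elsewhere in this listing.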
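The column layout above can be read with Python's standard `csv` module. The following is a minimal sketch, assuming the CSV header uses exactly the column names listed here. The three rows are invented placeholders, not real entries from the dataset, and `read_rows` is a hypothetical helper name.

```python
import csv
import io

EMOTIONS = ["Love", "Joy", "Surprise", "Anger", "Sadness", "Fear"]

# Invented placeholder rows for illustration only (NOT taken from the dataset).
SAMPLE_CSV = """ID,Data,Love,Joy,Surprise,Anger,Sadness,Fear,Topic,Domain
1,placeholder comment A,1,1,0,0,0,0,Personal,YouTube
2,placeholder comment B,0,0,0,1,1,0,Politics,Facebook
3,placeholder comment C,0,0,0,0,0,1,Health,Twitter
"""

def read_rows(handle):
    """Yield one dict per comment, with the six emotion labels converted to ints."""
    for row in csv.DictReader(handle):
        labels = {name: int(row[name]) for name in EMOTIONS}
        yield {
            "id": row["ID"],
            "text": row["Data"],
            "labels": labels,
            # Names of the emotions whose label is 1 for this comment.
            "active": [name for name in EMOTIONS if labels[name] == 1],
            "topic": row["Topic"],
            "domain": row["Domain"],
        }

rows = list(read_rows(io.StringIO(SAMPLE_CSV)))
for r in rows:
    print(r["id"], r["active"])
```

For the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle, for example `open(path, encoding="utf-8", newline="")`, where `path` points to the downloaded CSV.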

Distribution

The dataset comprises 22,698 Bangla public comments, of which 2,272 are set aside as a dedicated test set. Data files are provided in CSV format. Each emotion category is a separate binary column (0 or 1) indicating whether that emotion is absent or present.
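As a quick check on the figures above, the following sketch works out how large the test set is relative to the whole corpus. Only the two counts come from this listing. How the non-test samples are divided (for example into training and validation sets) is not stated here, so the code does not assume any particular split.

```python
TOTAL_COMMENTS = 22_698  # total comments, as stated in the listing
TEST_SAMPLES = 2_272     # test-set size, as stated in the listing

# Samples outside the test set; the listing does not say how these are divided.
remaining = TOTAL_COMMENTS - TEST_SAMPLES
test_share = TEST_SAMPLES / TOTAL_COMMENTS

print(f"non-test samples: {remaining}")
print(f"test share: {test_share:.2%}")
```

The test set is therefore roughly one tenth of the corpus.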

Usage

This dataset is ideally suited for:

Developing and evaluating emotion detection models for Bangla text.
Research in Natural Language Processing (NLP) and Deep Learning, specifically for text classification tasks.
Creating multilabel classification systems to identify multiple emotions within a single text.
Analysing sentiment and emotional trends in social media content.
Building benchmark systems for analysing fine-grained emotions on noisy Bangla texts.
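For the multilabel use case above, one common way to score predictions is macro-averaged F1 across the six emotion labels. The sketch below is an illustrative choice of metric, not necessarily the one used by the dataset's authors. The label vectors are invented toy values, and treating F1 as 0.0 when a label has no true or predicted positives is an assumption of this sketch.

```python
EMOTIONS = ["Love", "Joy", "Surprise", "Anger", "Sadness", "Fear"]

def f1_for_label(y_true, y_pred, index):
    """F1 for one label position across all comments."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t[index] == 1 and p[index] == 1)
    fp = sum(1 for t, p in pairs if t[index] == 0 and p[index] == 1)
    fn = sum(1 for t, p in pairs if t[index] == 1 and p[index] == 0)
    denom = 2 * tp + fp + fn
    # Assumption: score 0.0 when the label never occurs in truth or prediction.
    return 0.0 if denom == 0 else 2 * tp / denom

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    scores = [f1_for_label(y_true, y_pred, i) for i in range(len(EMOTIONS))]
    return sum(scores) / len(scores)

# Toy vectors in the order of EMOTIONS (invented, not from the dataset).
y_true = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
y_pred = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 1, 0, 0, 1],
]

score = macro_f1(y_true, y_pred)
print(f"macro F1: {score:.4f}")
```

Macro averaging gives every emotion equal weight regardless of how often it occurs, which matters when some labels are much rarer than others.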

Coverage

The dataset's scope includes:

Geographic Focus: The comments are predominantly in Bangla, which makes the dataset most relevant to Bangla-speaking regions such as Bangladesh and West Bengal, India.
Data Source: Public comments extracted from major social media sites, including YouTube (76%), Facebook (22%), and other platforms (2%).
Content Domains: The comments span 12 different domains, with significant proportions from Personal (25%) and Politics (17%), alongside a broad category of other topics (57%).
Emotion Categories: Labels are provided for six fine-grained emotions: Love, Joy, Surprise, Anger, Sadness, and Fear.
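Percentage breakdowns like those above can be recomputed from any categorical column once the file is loaded. The following is a minimal sketch using `collections.Counter`. The input list is an invented toy example, and `percentage_shares` is a hypothetical helper name.

```python
from collections import Counter

def percentage_shares(values):
    """Return each distinct value's share of the total, as a percentage."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: round(100 * n / total, 1) for key, n in counts.items()}

# Toy platform values (invented); in practice, pass the file's Domain or Topic column.
platforms = ["YouTube", "YouTube", "YouTube", "Facebook"]
shares = percentage_shares(platforms)
print(shares)
```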

License

CC-BY

Who Can Use It

This dataset is valuable for:

Universities and Colleges: For academic research, projects, and educational purposes in NLP and AI.
Researchers and Data Scientists: Focused on emotion analysis, sentiment detection, or working with low-resource languages.
Machine Learning Engineers: Developing and training models for text classification and social media content analysis.
Organisations: Interested in monitoring public sentiment or content moderation in Bangla.

Dataset Name Suggestions

Bangla Emotion Dataset
EmoNoBa
Bangla Social Media Emotion Corpus
Fine-Grained Bangla Text Emotion Analysis Dataset

Attributes

Original Data Source: Bangla Emotion Dataset

Listing Stats

Listed: 16/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0
