Bangla Social Media Emotion Corpus
Free
About
This dataset is designed for detecting fine-grained emotions in Bangla text, particularly social media comments. It addresses two limitations of existing resources for emotion detection in Bangla, a low-resource language: their small size and their limited cross-domain adaptability. The dataset contains 22,698 manually annotated public comments collected from several social media platforms. Each comment is labelled for six emotion categories derived from the Junto Emotion Wheel, and the comments cover 12 diverse domains such as Personal, Politics, and Health. The data was prepared so that it retains its linguistic nuances and remains a meaningful challenge for classification models. Initial experiments indicate that traditional hand-crafted features may outperform neural networks and pretrained language models on this task.
Columns
The dataset includes the following columns:
- ID: A unique identifier for each entry.
- Data: The original social media comment text.
- Love: A binary label, '1' if the comment expresses Love, '0' otherwise.
- Joy: A binary label, '1' if the comment expresses Joy, '0' otherwise.
- Surprise: A binary label, '1' if the comment expresses Surprise, '0' otherwise.
- Anger: A binary label, '1' if the comment expresses Anger, '0' otherwise.
- Sadness: A binary label, '1' if the comment expresses Sadness, '0' otherwise.
- Fear: A binary label, '1' if the comment expresses Fear, '0' otherwise.
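- Topic: The subject area of the comment. This appears to correspond to the content domains described under Coverage, such as Personal, Politics, and Health.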
- Domain: The source social media platform from which the comment was collected, including YouTube, Facebook, and Twitter.
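As a rough illustration of the column layout above, the following sketch reads rows with Python's standard `csv` module and turns the six emotion columns into a list of integers. The two sample rows are made up for this example and are not taken from the dataset; the function name `read_rows` is likewise hypothetical, and the real files may differ in details such as column order or encoding.

```python
import csv
import io

# Emotion columns in the order listed above.
EMOTIONS = ["Love", "Joy", "Surprise", "Anger", "Sadness", "Fear"]

# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE_CSV = """ID,Data,Love,Joy,Surprise,Anger,Sadness,Fear,Topic,Domain
1,আমি খুব খুশি,0,1,0,0,0,0,Personal,YouTube
2,এটা দেখে ভয় লাগছে,0,0,1,0,0,1,Health,Facebook
"""

def read_rows(handle):
    """Yield (id, text, labels, topic, domain); labels is a list of 0/1 ints."""
    for row in csv.DictReader(handle):
        labels = [int(row[name]) for name in EMOTIONS]
        # Each emotion column is described as binary, so reject anything else.
        if any(value not in (0, 1) for value in labels):
            raise ValueError(f"non-binary label in row {row['ID']}")
        yield row["ID"], row["Data"], labels, row["Topic"], row["Domain"]

rows = list(read_rows(io.StringIO(SAMPLE_CSV)))
for comment_id, text, labels, topic, domain in rows:
    active = [name for name, value in zip(EMOTIONS, labels) if value]
    print(comment_id, active, topic, domain)
```

To read an actual file, `io.StringIO(SAMPLE_CSV)` could be replaced with `open(path, encoding="utf-8", newline="")`, where `path` points to one of the CSV files.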
Distribution
The dataset comprises 22,698 Bangla public comments, of which 2,272 form a dedicated test set. Data files are typically provided in CSV format. Each emotion category is represented by a binary label (0 or 1) indicating the absence or presence of that emotion.
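The figures above can be checked with a little arithmetic, and the binary labels summarised per emotion. The sketch below computes the test set's share of the full dataset from the two stated counts, then counts labels on a few made-up label rows. The helper name `label_summary` and the toy rows are assumptions for illustration and say nothing about the real label distribution.

```python
EMOTIONS = ["Love", "Joy", "Surprise", "Anger", "Sadness", "Fear"]

TOTAL_COMMENTS = 22698  # stated size of the dataset
TEST_COMMENTS = 2272    # stated size of the test set

# Share of the dataset held out as the test set, in percent.
test_share = round(100 * TEST_COMMENTS / TOTAL_COMMENTS, 1)

def label_summary(label_rows):
    """Return (per-emotion counts, mean number of emotions per comment)."""
    counts = {name: 0 for name in EMOTIONS}
    active_total = 0
    for labels in label_rows:
        for name, value in zip(EMOTIONS, labels):
            counts[name] += value
        active_total += sum(labels)
    mean_active = active_total / len(label_rows) if label_rows else 0.0
    return counts, mean_active

# Made-up label rows in EMOTIONS order, for illustration only.
toy_labels = [
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1],
]
counts, mean_active = label_summary(toy_labels)
print(f"test share: {test_share}%")
print(counts, mean_active)
```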
Usage
This dataset is ideally suited for:
- Developing and evaluating emotion detection models for Bangla text.
- Research in Natural Language Processing (NLP) and Deep Learning, specifically for text classification tasks.
- Creating multilabel classification systems to identify multiple emotions within a single text.
- Analysing sentiment and emotional trends in social media content.
- Building benchmark systems for analysing fine-grained emotions on noisy Bangla texts.
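Because a single comment can carry several emotions, evaluating a multilabel system usually means scoring each emotion separately and then averaging. The sketch below computes per-label F1 and their unweighted mean (macro F1) in plain Python. The choice of macro F1 is an assumption made for this example, since the listing does not say which metric the original experiments used, and the toy predictions are made up.

```python
def f1_per_label(y_true, y_pred):
    """Return one F1 score per label column for binary multilabel data."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # By convention here, a label with no positives and no predictions scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    scores = f1_per_label(y_true, y_pred)
    return sum(scores) / len(scores)

# Toy example with two labels; rows are comments, columns are emotions.
y_true = [[1, 0], [0, 1], [1, 1]]
y_pred = [[1, 0], [0, 0], [1, 1]]
per_label = f1_per_label(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
print(per_label, overall)
```

Macro averaging gives rare emotions the same weight as frequent ones, which can matter when some categories have few examples.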
Coverage
The dataset's scope includes:
- Geographic Focus: The comments are in Bangla, so the dataset is most relevant to Bangla-speaking regions such as Bangladesh and West Bengal, India.
- Data Source: Public comments extracted from major social media sites, including YouTube (76%), Facebook (22%), and other platforms (2%).
- Content Domains: The comments span 12 different domains, with significant proportions from Personal (25%) and Politics (17%), alongside a broad category of other topics (57%).
- Emotion Categories: Labels are provided for six fine-grained emotions: Love, Joy, Surprise, Anger, Sadness, and Fear.
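Percentage breakdowns like the ones above can be recomputed from the Domain or Topic column with a simple frequency count. The sketch below uses `collections.Counter` on a made-up list of platform names; the function name `percentage_shares` and the toy values are assumptions for illustration and do not reproduce the real proportions.

```python
from collections import Counter

def percentage_shares(values):
    """Map each distinct value to its share of the total, in percent."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: round(100 * count / total, 1) for key, count in counts.items()}

# Made-up column values for illustration only.
domains = ["YouTube", "YouTube", "YouTube", "Facebook"]
shares = percentage_shares(domains)
print(shares)
```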
License
CC-BY
Who Can Use It
This dataset is valuable for:
- Universities and Colleges: For academic research, projects, and educational purposes in NLP and AI.
- Researchers and Data Scientists: Working on emotion analysis, sentiment detection, or low-resource languages.
- Machine Learning Engineers: Developing and training models for text classification and social media content analysis.
- Organisations: Interested in monitoring public sentiment or moderating content in Bangla.
Dataset Name Suggestions
- Bangla Emotion Dataset
- EmoNoBa
- Bangla Social Media Emotion Corpus
- Fine-Grained Bangla Text Emotion Analysis Dataset
Attributes
Original Data Source: Bangla Emotion Dataset