Medical Specialist Bengali Data
Health Information Systems & Technology
Free
About
This dataset is designed for medical specialist classification and Bengali Named Entity Recognition (NER), two tasks with a range of applications in the medical sector. It is an NLP dataset of approximately six hundred patient primary statements, totalling around eight thousand words, written entirely in Bengali. The data was created from statements collected from medical outdoor receptionists and from doctors' insights into the common health problems that patients present with. In this setting, patients first describe their problems and the receptionist then guides them to the appropriate specialist. All data has been verified by an authorised MBBS doctor. The dataset reflects common, general health issues, including primary symptoms of more serious problems, such as headache, bone pain, and itching. It is particularly relevant because health complexes, health centres, and hospitals worldwide are increasingly busy, and tools that help route patients to the right specialist can ease that pressure.
Columns
- Patient ID: A unique identifier for each patient.
- Problem's token: Represents a specific word or token from the patient's problem description.
- Tag: An annotation label indicating the type of entity, such as 'O' (Other), 'V' (Verb), 'T' (Time), 'A' (Adverb), 'Bp' (Body part), or 'S' (Symptom).
- Gazetteers: A boolean flag, usually recorded as 'Y' or 'N', indicating whether the token appears in a predefined list (gazetteer).
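Because each row holds a single token, most NER tooling first needs the rows regrouped into one sequence per patient statement. The Python sketch below does this with the standard library only. It assumes the data is available as a CSV file whose headers match the column names above exactly, and that a blank Patient ID cell continues the previous patient; both are assumptions, since the file layout is not documented here. The helper names `group_by_patient` and `load_sentences` are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional, TextIO, Tuple

# Assumed header names, copied from the column list; adjust if the file differs.
ID_COL = "Patient ID"
TOKEN_COL = "Problem's token"
TAG_COL = "Tag"
GAZ_COL = "Gazetteers"

# (token, tag, in_gazetteer)
Token = Tuple[str, str, bool]


def group_by_patient(rows: Iterable[Dict[str, str]]) -> Dict[str, List[Token]]:
    """Collect token-level rows into one ordered token sequence per patient."""
    sentences: Dict[str, List[Token]] = {}
    current: Optional[str] = None
    for row in rows:
        pid = (row.get(ID_COL) or "").strip()
        # Assumption: a blank Patient ID continues the previous patient's statement.
        if pid:
            current = pid
        elif current is None:
            raise ValueError("first row has no Patient ID")
        token = row[TOKEN_COL].strip()
        tag = row[TAG_COL].strip()
        # Only 'Y' (any case) counts as a gazetteer hit; anything else is treated as 'N'.
        in_gaz = (row.get(GAZ_COL) or "").strip().upper() == "Y"
        sentences.setdefault(current, []).append((token, tag, in_gaz))
    return sentences


def load_sentences(fileobj: TextIO) -> Dict[str, List[Token]]:
    """Parse a CSV file object with the assumed headers and group it by patient."""
    return group_by_patient(csv.DictReader(fileobj))


if __name__ == "__main__":
    # Toy rows for illustration only; not taken from the dataset.
    sample = io.StringIO(
        "Patient ID,Problem's token,Tag,Gazetteers\n"
        "1,আমার,O,N\n"
        "1,মাথা,Bp,Y\n"
        "1,ব্যথা,S,Y\n"
    )
    for pid, tokens in load_sentences(sample).items():
        print(pid, [t for t, _, _ in tokens], [tag for _, tag, _ in tokens])
```

For a real file, open it with `open(path, encoding="utf-8", newline="")` so that the Bengali text and CSV line endings are read correctly; `path` stands for wherever the hypothetical CSV export lives.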
Distribution
The dataset contains around six hundred patient primary statements, comprising approximately eight thousand words, and is manually partitioned into two parts. The first part is annotated at the token level with labels for patient symptoms, body parts, colours, body fluids, blood, times, values, directions, effluents, and adverbs. The second part is labelled with the relevant medical specialist, including medicine specialist, cardiologist, dentist, and gynaecologist.
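For the second part, a multinomial Naive Bayes classifier over whitespace-separated tokens is one reasonable first baseline before trying neural models. The sketch below is a self-contained illustration, not the dataset authors' method. It assumes each statement is already paired with a specialist label (the storage format of that part is not described here), the class name `NaiveBayesSpecialist` is hypothetical, and the Bengali training examples are invented toy sentences ("pain in the tooth", "pain in the chest", "fever and headache").

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, List, Optional, Set, Tuple


class NaiveBayesSpecialist:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over whitespace tokens."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()  # number of statements per specialist
        self.token_counts: DefaultDict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def fit(self, data: List[Tuple[str, str]]) -> "NaiveBayesSpecialist":
        """Count tokens per label from (statement, specialist) pairs."""
        for text, label in data:
            tokens = text.split()
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> Optional[str]:
        """Return the label with the highest log-posterior; None if untrained."""
        total = sum(self.class_counts.values())
        best_label: Optional[str] = None
        best_score = float("-inf")
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total)  # log prior
            for tok in text.split():
                if tok in self.vocab:  # skip tokens never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Invented toy examples, not rows from the dataset.
    train = [
        ("দাঁতে ব্যথা", "dentist"),
        ("বুকে ব্যথা", "cardiologist"),
        ("জ্বর আর মাথা ব্যথা", "medicine specialist"),
    ]
    model = NaiveBayesSpecialist().fit(train)
    print(model.predict("দাঁতে খুব ব্যথা"))  # → dentist
```

Add-one smoothing keeps a token that never co-occurred with a specialist from driving that class's score to zero. With only about six hundred statements, a held-out split or cross-validation is advisable before trusting any accuracy figure.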
Usage
This dataset is ideal for developing and training models for medical specialist classification and Bengali Named Entity Recognition. It can be applied in systems designed to assist healthcare facilities, such as health complexes, health centres, and hospitals, in managing patient flow and specialist recommendations, especially during busy periods.
Coverage
The dataset's geographic coverage is Asia. It represents common, general health issues and the primary symptoms people typically present with, derived from actual patient statements written in Bengali. The current version is 1.0, listed on 17/06/2025.
License
CC BY 4.0
Who Can Use It
- AI and ML Engineers: For training and validating machine learning models in healthcare.
- Natural Language Processing (NLP) Researchers: For advancing Bengali language processing in medical contexts.
- Healthcare Technology Developers: For creating innovative health information systems and patient triage tools.
- Data Scientists: For exploratory data analysis within the medical domain.
- Hospitals and Clinics: For implementing automated patient guidance and specialist referral systems.
Dataset Name Suggestions
- Bengali Medical Dataset
- Bengali Clinical NER Data
- Patient Symptom Bengali
- Bengali Healthcare NLP
- Medical Specialist Bengali Data
Attributes
Original Data Source: Bengali Medical Dataset