Opendatabay APP

Medical Specialist Bengali Data

Health Information Systems & Technology

Tags and Keywords

Classification

Exploratory

NLP

NLTK

Free

About

This dataset supports medical specialist classification and Bengali Named Entity Recognition (NER). It contains approximately six hundred primary patient statements, totalling around eight thousand words, written entirely in Bengali. The statements were created from accounts given by hospital outdoor (outpatient) receptionists and from doctors' insights into the common health problems patients present: patients first describe their issues, and the receptionist then guides them to the appropriate specialist. All data has been verified by an authorised MBBS doctor. The dataset covers general and common health issues, including primary symptoms such as headache, bone pain, and itching that can point to more significant problems. It is intended to help health complexes, health centres, and hospitals route patients to specialists as their workloads grow.

Columns

  • Patient ID: A unique identifier for each patient.
  • Problem's token: A single word (token) from the patient's problem description.
  • Tag: The annotation label for the token, such as 'O' (Other), 'V' (Verb), 'T' (Time), 'A' (Adverb), 'Bp' (Body part), or 'S' (Symptom).
  • Gazetteers: A flag, typically 'Y' or 'N', indicating whether the token appears in a predefined word list (gazetteer).
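Because each row holds one token, a patient statement has to be rebuilt by grouping rows on Patient ID. The following is a minimal sketch of that grouping, assuming the data is delivered as a CSV file whose header names match the column list above. The file format, the exact header spelling, and the sample rows are all assumptions made for illustration; the rows are invented and not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows only: invented for this sketch, not taken from the dataset.
# Header names mirror the column list above; the real file may differ.
SAMPLE_CSV = """Patient ID,Problem's token,Tag,Gazetteers
1,আমার,O,N
1,মাথা,Bp,Y
1,ব্যথা,S,Y
1,করছে,V,N
2,গতকাল,T,N
2,থেকে,O,N
2,জ্বর,S,Y
"""


def load_statements(handle):
    """Group token rows into one list of (token, tag, in_gazetteer) per patient."""
    statements = defaultdict(list)
    for row in csv.DictReader(handle):
        statements[row["Patient ID"]].append(
            (row["Problem's token"], row["Tag"], row["Gazetteers"] == "Y")
        )
    return dict(statements)


def tokens_with_tag(statement, tag):
    """Return the tokens in one statement that carry the given tag."""
    return [token for token, token_tag, _ in statement if token_tag == tag]


statements = load_statements(io.StringIO(SAMPLE_CSV))
for patient_id, statement in statements.items():
    sentence = " ".join(token for token, _, _ in statement)
    print(patient_id, sentence, tokens_with_tag(statement, "S"))
```

To read a real file, the `io.StringIO(SAMPLE_CSV)` argument would be replaced with a file opened using `encoding="utf-8"` and `newline=""`.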

Distribution

The dataset contains around six hundred primary patient statements, comprising approximately eight thousand words, and is manually partitioned into two parts. The first part is annotated for NER with labels covering patient symptoms, body parts, colours, body fluids, blood, times, values, directions, effluents, and adverbs. The second part is labelled with the relevant medical specialist, including medicine specialist, cardiologist, dentist, and gynaecologist.

Usage

This dataset is ideal for developing and training models for medical specialist classification and Bengali Named Entity Recognition. It can be applied in systems designed to assist healthcare facilities, such as health complexes, health centres, and hospitals, in managing patient flow and specialist recommendations, especially during busy periods.
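As a rough illustration of the specialist classification task, the sketch below trains a small multinomial Naive Bayes model with add-one smoothing on tokenised statements. This is a generic baseline chosen for the example, not the method used by the dataset's authors. The training examples are invented, and the English label strings are assumptions; the labels in the actual data may be written differently.

```python
import math
from collections import Counter, defaultdict

# Invented toy examples of (tokens, specialist label); not from the dataset.
TRAIN = [
    (["বুকে", "ব্যথা"], "cardiologist"),
    (["বুকে", "চাপ"], "cardiologist"),
    (["দাঁতে", "ব্যথা"], "dentist"),
    (["মাড়ি", "ফুলে", "গেছে"], "dentist"),
]


def train_naive_bayes(examples):
    """Count statements per label and tokens per label."""
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        label_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, token_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log-probability for the tokens."""
    label_counts, token_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(token_counts[label].values()) + len(vocab)
        for token in tokens:
            if token in vocab:  # tokens never seen in training are skipped
                score += math.log((token_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
print(predict(model, TRAIN[2][0]))
```

With roughly six hundred statements, a simple count-based model like this is a reasonable first baseline before trying anything heavier, and it should be evaluated on held-out statements rather than on the training examples as done here.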

Coverage

The dataset's geographic coverage is Asia. It represents general and common health issues and the primary symptoms people typically report, expressed as patient statements written entirely in Bengali. The dataset version is 1.0, and it was listed on 17 June 2025.

License

CC BY 4.0

Who Can Use It

  • AI and ML Engineers: For training and validating machine learning models in healthcare.
  • Natural Language Processing (NLP) Researchers: For advancing Bengali language processing in medical contexts.
  • Healthcare Technology Developers: For creating innovative health information systems and patient triage tools.
  • Data Scientists: For exploratory data analysis within the medical domain.
  • Hospitals and Clinics: For implementing automated patient guidance and specialist referral systems.

Dataset Name Suggestions

  • Bengali Medical Dataset
  • Bengali Clinical NER Data
  • Patient Symptom Bengali
  • Bengali Healthcare NLP
  • Medical Specialist Bengali Data

Attributes

Original Data Source: Bengali Medical Dataset

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 17 June 2025
  • Region: Asia
  • UDQS (Universal Data Quality Score): 5 / 5
  • Version: 1.0