Opendatabay APP

Disease Symptom Classifier Dataset

Healthcare Insurance & Costs

Tags and Keywords

Health

Classification

NLP

Deep

Diseases

Free

About

This dataset provides a curated collection of disease labels paired with natural language descriptions of symptoms. Its primary purpose is to support the development of language models that predict likely diseases from user-provided symptom descriptions. Such models could help with early disease identification, prompting individuals to seek medical attention and treatment sooner. The dataset also supports applications for remote diagnosis and treatment recommendations, which are particularly useful where in-person consultations are not feasible or desirable.

Columns

The dataset consists of two main columns:
  • label: This column contains the specific disease labels associated with each symptom description.
  • text: This column provides the natural language descriptions of the symptoms experienced.
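As a minimal sketch of how the two columns might be read, the snippet below parses a CSV with a `label` and a `text` column using Python's standard library. The sample rows are invented for illustration and are not taken from the dataset, and the helper name `load_rows` is hypothetical.

```python
import csv
import io

# Hypothetical two-row sample standing in for the real CSV file; the
# column names `label` and `text` come from the listing, the rows do not.
SAMPLE_CSV = """label,text
Migraine,"I have a throbbing headache and bright light makes it worse."
Acne,"Small red bumps have appeared on my forehead and cheeks."
"""

def load_rows(handle):
    """Read (label, text) pairs from an open CSV file handle."""
    reader = csv.DictReader(handle)
    # Fail early if either expected column is absent from the header.
    missing = {"label", "text"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")
    return [(row["label"].strip(), row["text"].strip()) for row in reader]

rows = load_rows(io.StringIO(SAMPLE_CSV))
for label, text in rows:
    print(f"{label}: {text}")
```

For a downloaded file, the same function could be called on a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved. The encoding is an assumption.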

Distribution

The dataset is typically provided as a CSV file. It contains 1,200 data points covering 24 distinct diseases, with 50 symptom descriptions per disease (24 × 50 = 1,200).
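The stated shape (24 diseases with 50 descriptions each) can be checked after loading. The sketch below counts labels and reports any deviation. The function name `check_balance` is hypothetical, and the labels here are synthetic placeholders rather than real dataset values.

```python
from collections import Counter

def check_balance(labels, expected_classes=24, expected_per_class=50):
    """Return a list of human-readable problems; empty means the counts match."""
    counts = Counter(labels)
    problems = []
    if len(counts) != expected_classes:
        problems.append(f"expected {expected_classes} classes, found {len(counts)}")
    for label, n in sorted(counts.items()):
        if n != expected_per_class:
            problems.append(f"{label!r}: expected {expected_per_class} rows, found {n}")
    return problems

# Synthetic labels with the stated shape: 24 classes x 50 rows = 1200.
labels = [f"disease_{i}" for i in range(24) for _ in range(50)]
print(len(labels))            # 1200
print(check_balance(labels))  # []
```

A perfectly balanced dataset like this one makes plain accuracy a reasonable first metric, since no class dominates.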

Usage

This dataset is ideal for various applications and use cases, including:
  • Developing and training natural language processing (NLP) models for disease prediction.
  • Creating AI-powered tools for early identification of health conditions.
  • Building virtual assistants or telemedicine platforms that offer remote diagnostic support.
  • Researching classification algorithms in the medical and healthcare domain.
  • Analysing disease patterns and symptom correlations.
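To illustrate the first use case, here is a small baseline sketch: a multinomial naive Bayes classifier over word counts, written with the standard library only. It is one possible starting point, not the method used by the dataset's authors. The training sentences are invented toy examples, and the names `tokenize` and `NaiveBayes` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no information and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy examples; these are NOT rows from the dataset.
train_texts = [
    "pounding headache with nausea and sensitivity to light",
    "one sided headache and blurred vision",
    "red pimples and blackheads on my face",
    "oily skin with pimples on my cheeks",
]
train_labels = ["Migraine", "Migraine", "Acne", "Acne"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("I have pimples on my skin"))    # Acne
print(model.predict("severe headache and nausea"))   # Migraine
```

On the full dataset, a held-out test split would be needed before reading anything into the predictions. A model like this is a baseline for comparison, not a diagnostic tool.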

Coverage

The listing describes the dataset's coverage as global, with no regional restriction. It includes 24 diseases: Psoriasis, Varicose Veins, Typhoid, Chicken pox, Impetigo, Dengue, Fungal infection, Common Cold, Pneumonia, Dimorphic Hemorrhoids, Arthritis, Acne, Bronchial Asthma, Hypertension, Migraine, Cervical spondylosis, Jaundice, Malaria, urinary tract infection, allergy, gastroesophageal reflux disease, drug reaction, peptic ulcer disease, and diabetes. No time range or demographic scope is specified.
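Because the disease names above mix capitalisation styles, a classifier pipeline may benefit from a case-insensitive mapping between names and integer class ids. The sketch below builds one from the list as written in this listing. Whether the CSV uses exactly this spelling and casing is an assumption, and the helper names (`normalize`, `encode`, `decode`) are hypothetical.

```python
# The 24 disease names exactly as written in the listing above.
DISEASES = [
    "Psoriasis", "Varicose Veins", "Typhoid", "Chicken pox", "Impetigo",
    "Dengue", "Fungal infection", "Common Cold", "Pneumonia",
    "Dimorphic Hemorrhoids", "Arthritis", "Acne", "Bronchial Asthma",
    "Hypertension", "Migraine", "Cervical spondylosis", "Jaundice",
    "Malaria", "urinary tract infection", "allergy",
    "gastroesophageal reflux disease", "drug reaction",
    "peptic ulcer disease", "diabetes",
]

def normalize(name):
    """Collapse whitespace and ignore case so 'Common cold' matches 'Common Cold'."""
    return " ".join(name.split()).casefold()

LABEL_TO_ID = {normalize(name): i for i, name in enumerate(DISEASES)}

def encode(name):
    """Map a disease name to its integer id; raises KeyError for unknown names."""
    return LABEL_TO_ID[normalize(name)]

def decode(index):
    """Map an integer id back to the name as written in the listing."""
    return DISEASES[index]

print(len(LABEL_TO_ID))                       # 24
print(encode("  Urinary Tract Infection "))   # 18
print(decode(encode("DIABETES")))             # diabetes
```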

License

CC0

Who Can Use It

This dataset is intended for a variety of users, including:
  • Data Scientists and Machine Learning Engineers: To build and refine models for medical diagnostics and NLP tasks.
  • Healthcare Technology Developers: To integrate symptom analysis capabilities into healthcare applications and platforms.
  • Researchers: To conduct studies on disease prediction, language understanding in a medical context, and the application of deep learning to health data.
  • Students: As a valuable resource for learning and practicing data science and AI skills within the healthcare domain.

Dataset Name Suggestions

  • Symptom2Disease Dataset
  • Disease Symptom Classifier
  • Medical Symptom Description Data
  • Healthcare NLP Diagnostic Dataset

Attributes

Original Data Source: Symptom2Disease

Listing Stats

Views: 1
Downloads: 3
Listed: 05/06/2025
Region: Global
Universal Data Quality Score (UDQS): 5 / 5
Version: 1.0
