Disease Symptom Classifier Dataset
Healthcare Insurance & Costs
Free
About
This dataset pairs disease labels with natural language descriptions of symptoms. Its primary purpose is to support the development of language models that predict a likely disease from a user's own description of their symptoms. Such models could help with early disease identification, prompting individuals to seek medical attention and treatment sooner. The dataset also supports applications for remote diagnosis and treatment recommendations, which are particularly useful where in-person consultations are not feasible or desirable.
Columns
The dataset consists of two main columns:
- label: This column contains the specific disease labels associated with each symptom description.
- text: This column provides the natural language descriptions of the symptoms experienced.
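As an illustration of the two-column layout, the following minimal sketch reads `label` and `text` pairs with Python's standard `csv` module. The helper name `load_rows` and the sample rows are hypothetical: the rows are placeholders that only mimic the layout and are not taken from the dataset.

```python
import csv
import io

# Placeholder rows that only mimic the two-column layout.
# They are NOT taken from the dataset.
SAMPLE_CSV = """label,text
Migraine,"placeholder symptom description one"
Acne,"placeholder symptom description two"
"""


def load_rows(handle):
    """Read (label, text) pairs from an open CSV file handle.

    Extra columns (for example an index column) are ignored.
    """
    reader = csv.DictReader(handle)
    missing = {"label", "text"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")
    return [(row["label"].strip(), row["text"].strip()) for row in reader]


rows = load_rows(io.StringIO(SAMPLE_CSV))
for label, text in rows:
    print(f"{label}: {text}")
```

For a real file, the same function can be passed a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved; the actual file name is not given in this listing.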
Distribution
The dataset is typically provided as a CSV file. It contains 1,200 data points covering 24 distinct diseases, with 50 symptom descriptions per disease (24 × 50 = 1,200).
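The stated distribution (24 diseases, 50 descriptions each) can be checked after loading. The sketch below is one possible sanity check; `check_balance` is a hypothetical helper, and the labels fed to it here are synthetic stand-ins rather than real dataset values.

```python
from collections import Counter


def check_balance(labels, expected_classes=24, expected_per_class=50):
    """Return a list of human-readable problems; an empty list means balanced."""
    counts = Counter(labels)
    problems = []
    if len(counts) != expected_classes:
        problems.append(
            f"expected {expected_classes} classes, found {len(counts)}"
        )
    for label, n in sorted(counts.items()):
        if n != expected_per_class:
            problems.append(
                f"{label!r}: expected {expected_per_class} rows, found {n}"
            )
    return problems


# Synthetic labels standing in for the real `label` column.
synthetic = [f"disease_{i}" for i in range(24) for _ in range(50)]
print(len(synthetic))            # 1200
print(check_balance(synthetic))  # []
```

Because the classes are described as perfectly balanced, plain accuracy is a reasonable first metric for models trained on this data.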
Usage
This dataset is ideal for various applications and use cases, including:
- Developing and training natural language processing (NLP) models for disease prediction.
- Creating AI-powered tools for early identification of health conditions.
- Building virtual assistants or telemedicine platforms that offer remote diagnostic support.
- Researching classification algorithms in the medical and healthcare domain.
- Analysing disease patterns and symptom correlations.
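To make the disease-prediction use case concrete, here is a minimal baseline sketch: a bag-of-words multinomial Naive Bayes classifier written with the standard library only. This is an illustrative assumption, not a method prescribed by the dataset; the class name `NaiveBayes`, the tokenizer, and the four training sentences are all invented for the example and do not come from the dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no information and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(n_docs / self.total_docs)  # log prior
            for t in tokens:
                score += math.log((counts[t] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples, NOT rows from the dataset.
train_texts = [
    "throbbing headache with sensitivity to light",
    "severe headache and nausea",
    "red pimples on my face",
    "oily skin with pimples and blackheads",
]
train_labels = ["Migraine", "Migraine", "Acne", "Acne"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("I have a headache and nausea"))  # Migraine
```

With the real data, `train_texts` and `train_labels` would come from the `text` and `label` columns, and a held-out split would be needed to measure accuracy. A model like this is a baseline for experimentation only and is not a substitute for clinical judgement.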
Coverage
The dataset is not tied to any particular region. It covers 24 diseases: Psoriasis, Varicose Veins, Typhoid, Chicken pox, Impetigo, Dengue, Fungal infection, Common Cold, Pneumonia, Dimorphic Hemorrhoids, Arthritis, Acne, Bronchial Asthma, Hypertension, Migraine, Cervical spondylosis, Jaundice, Malaria, urinary tract infection, allergy, gastroesophageal reflux disease, drug reaction, peptic ulcer disease, and diabetes. The provided details give no information on time range or demographic scope.
License
CC0
Who Can Use It
This dataset is intended for a variety of users, including:
- Data Scientists and Machine Learning Engineers: To build and refine models for medical diagnostics and NLP tasks.
- Healthcare Technology Developers: To integrate symptom analysis capabilities into healthcare applications and platforms.
- Researchers: To conduct studies on disease prediction, language understanding in a medical context, and the application of deep learning to health data.
- Students: As a valuable resource for learning and practicing data science and AI skills within the healthcare domain.
Dataset Name Suggestions
- Symptom2Disease Dataset
- Disease Symptom Classifier
- Medical Symptom Description Data
- Healthcare NLP Diagnostic Dataset
Attributes
Original Data Source: Symptom2Disease