Disease Symptom Predictor Data
Patient Health Records & Digital Health
Free
About
This dataset supports the construction and evaluation of statistical models that predict disease diagnoses. It contains a collection of symptoms and their associated clinical outcomes (prognoses). The structure suits data scientists and researchers who apply machine learning to disease diagnosis and health outcome forecasting.
Columns
The dataset includes 133 columns in total: 132 binary symptom features and one target variable. Symptom columns use binary encoding (1 for presence, 0 for absence).
- Prognosis: The final target variable, detailing the predicted disease (e.g., Fungal infection, Allergy).
- itching: Indicates the presence (1) or absence (0) of itching.
- skin_rash: Indicates the presence (1) or absence (0) of a skin rash.
- nodal_skin_eruptions: Indicates the presence (1) or absence (0) of nodal skin eruptions.
- continuous_sneezing: Indicates the presence (1) or absence (0) of continuous sneezing.
- chills: Indicates the presence (1) or absence (0) of chills.
- joint_pain: Indicates the presence (1) or absence (0) of joint pain.
- (Additional symptoms include stomach_pain, acidity, ulcers_on_tongue, high_fever, muscle_wasting, and many others, totalling 132 symptom indicators.)
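The column layout above can be read into symptom vectors and disease labels with a few lines of standard-library Python. This is a minimal sketch, not the dataset's official loader: the lowercase header name `prognosis`, the helper `load_symptom_table`, and the inline sample rows are assumptions made for illustration.

```python
import csv
import io

# Made-up rows in the documented layout (illustrative only, not real records).
SAMPLE_CSV = """itching,skin_rash,continuous_sneezing,chills,prognosis
1,1,0,0,Fungal infection
0,0,1,1,Allergy
1,0,0,0,Fungal infection
"""

def load_symptom_table(handle, target="prognosis"):
    """Split CSV rows into binary symptom vectors and disease labels.

    `target` is the assumed header of the label column; every other
    column is treated as a 0/1 symptom indicator.
    """
    reader = csv.DictReader(handle)
    symptom_names = [name for name in reader.fieldnames if name != target]
    features, labels = [], []
    for row in reader:
        features.append([int(row[name]) for name in symptom_names])
        labels.append(row[target].strip())
    return symptom_names, features, labels

symptom_names, features, labels = load_symptom_table(io.StringIO(SAMPLE_CSV))
print(len(symptom_names), "symptom columns,", len(features), "records")
```

With the full file, the same function would be called on an opened CSV handle, and `symptom_names` would be expected to hold 132 entries.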
Distribution
The dataset is intended to be large enough to support robust analysis and is typically shared as a CSV file. Sample analysis shows 100% validity across the analysed symptom features. The sample file size is 13.78 kB. No future updates to this dataset are currently expected.
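One way to reproduce a validity figure on a local copy is to measure, per symptom column, the share of cells that hold exactly 0 or 1. The sketch below assumes that definition of validity, which the listing does not spell out. The helper `validity_by_column`, the `prognosis` header name, and the sample rows (which contain two deliberately invalid cells) are hypothetical.

```python
import csv
import io

# Made-up rows for illustration; the blank cell and the "2" are deliberately invalid.
SAMPLE_CSV = """itching,skin_rash,chills,prognosis
1,0,0,Fungal infection
0,,1,Allergy
1,2,0,Fungal infection
0,1,1,Allergy
"""

def validity_by_column(handle, target="prognosis"):
    """Return the fraction of cells in each symptom column that are exactly 0 or 1."""
    reader = csv.DictReader(handle)
    symptom_names = [name for name in reader.fieldnames if name != target]
    valid = dict.fromkeys(symptom_names, 0)
    total = 0
    for row in reader:
        total += 1
        for name in symptom_names:
            if (row[name] or "").strip() in ("0", "1"):
                valid[name] += 1
    if total == 0:
        return {}
    return {name: valid[name] / total for name in symptom_names}

report = validity_by_column(io.StringIO(SAMPLE_CSV))
for name, share in report.items():
    print(f"{name}: {share:.0%} valid")
```

A column that matches the listing's description would report 100% here.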
Usage
- Training and testing machine learning classifiers for disease prediction.
- Developing predictive diagnostic tools that map symptom patterns to diseases.
- Statistical analysis of symptom co-occurrence and disease manifestation.
- Educational purposes focused on classification problems using high-dimensional, sparse binary data.
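As an illustration of the classifier use case, binary symptom vectors pair naturally with a Bernoulli naive Bayes model. The sketch below is a small pure-Python version with Laplace smoothing, chosen here only as an example technique; the listing does not prescribe any model. The function names and the four toy records are hypothetical, and a real evaluation would hold out part of the data for testing.

```python
import math
from collections import Counter, defaultdict

def train_bernoulli_nb(features, labels, alpha=1.0):
    """Fit per-class symptom probabilities with Laplace smoothing (alpha > 0)."""
    class_counts = Counter(labels)
    n_features = len(features[0])
    present = defaultdict(lambda: [0] * n_features)
    for vector, label in zip(features, labels):
        for j, value in enumerate(vector):
            present[label][j] += value
    model = {}
    for label, count in class_counts.items():
        probs = [
            (present[label][j] + alpha) / (count + 2 * alpha)
            for j in range(n_features)
        ]
        model[label] = (math.log(count / len(labels)), probs)
    return model

def predict(model, vector):
    """Return the label with the highest log posterior for one symptom vector."""
    def score(item):
        log_prior, probs = item[1]
        return log_prior + sum(
            math.log(p if x else 1.0 - p) for p, x in zip(probs, vector)
        )
    return max(model.items(), key=score)[0]

# Toy records (hypothetical): columns are itching, skin_rash, continuous_sneezing, chills.
X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]]
y = ["Fungal infection", "Fungal infection", "Allergy", "Allergy"]

model = train_bernoulli_nb(X, y)
print(predict(model, [1, 1, 0, 0]))
```

Smoothing keeps every probability strictly between 0 and 1, so a symptom never seen with a disease in training lowers its score without ruling it out entirely.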
Coverage
Specific geographic range, time period, or demographic information regarding the collection of these symptom records is not provided.
License
CC0: Public Domain
Who Can Use It
- Data Scientists: Utilising the binary features to build high-accuracy predictive models for health applications.
- Health Informaticians: Studying the relationship between reported symptoms and established prognoses.
- Academic Researchers: Validating new methodologies for feature selection and classification in medical contexts.
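For the symptom and prognosis relationships mentioned above, two simple summaries are often a reasonable starting point: how often pairs of symptoms are reported together, and how often each symptom appears within each prognosis. The following is a sketch under assumed names (`symptom_cooccurrence`, `symptom_rate_by_prognosis`) and made-up records, not an analysis of the actual data.

```python
from collections import Counter
from itertools import combinations

def symptom_cooccurrence(symptom_names, features):
    """Count how many records report each pair of symptoms together."""
    pair_counts = Counter()
    for vector in features:
        active = [name for name, value in zip(symptom_names, vector) if value == 1]
        pair_counts.update(combinations(active, 2))
    return pair_counts

def symptom_rate_by_prognosis(symptom_names, features, labels):
    """For each prognosis, return the share of its records reporting each symptom."""
    totals = Counter(labels)
    sums = {label: [0] * len(symptom_names) for label in totals}
    for vector, label in zip(features, labels):
        for j, value in enumerate(vector):
            sums[label][j] += value
    return {
        label: {
            name: sums[label][j] / totals[label]
            for j, name in enumerate(symptom_names)
        }
        for label in totals
    }

# Toy records (hypothetical), using three of the documented column names.
names = ["itching", "skin_rash", "chills"]
X = [[1, 1, 0], [1, 1, 1], [0, 0, 1]]
y = ["Fungal infection", "Fungal infection", "Allergy"]

pairs = symptom_cooccurrence(names, X)
rates = symptom_rate_by_prognosis(names, X, y)
print(pairs.most_common(1))
print(rates["Fungal infection"])
```

Pair keys follow column order, so each unordered pair of symptoms is counted under a single key.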
Dataset Name Suggestions
- Disease Symptom Predictor Data
- Clinical Prognosis Symptom Features
- Binary Health Condition Model Input
- Predicting Diseases Based on Symptoms
Attributes
Original Data Source: Disease Symptom Predictor Data
