ME/CFS vs Depression Differential Diagnosis Data
Mental Health & Wellness
About
A synthetic dataset designed for differential diagnosis between Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) and Depression. It supports classification tasks by presenting behavioural, clinical, and symptom-related features associated with both chronic illness and mental health. The primary objective is to predict whether a patient has ME/CFS, Depression, or both conditions. The data is generated using clinical-like heuristics, which makes it useful for training and prototyping machine learning models intended to support real-world diagnostic decisions.
Columns
The dataset contains 16 columns (15 features and the target variable) detailing patient characteristics and health measures:
- age: The patient's age in years, ranging from 18 to 70.
- gender: Categorical variable (Male or Female).
- fatigue_severity_scale_score: The score derived from the Fatigue Severity Scale (FSS), ranging from 0 to 10.
- depression_phq9_score: The PHQ-9 depression score, ranging from 0 to 27.
- pem_present: A binary indicator (Yes/No or 1/0) of whether Post-Exertional Malaise (PEM) is present.
- pem_duration_hours: The duration of PEM measured in hours.
- sleep_quality_index: Sleep quality measured on a scale of 1 to 10.
- brain_fog_level: Level of brain fog experienced, measured on a scale of 1 to 10.
- physical_pain_score: Intensity of physical pain, measured on a scale of 1 to 10.
- stress_level: The patient's reported stress level, measured on a scale of 1 to 10.
- work_status: Categorical variable describing work status (Working, Partially working, or Not working).
- social_activity_level: Social activity level, ranging from Very low to Very high.
- exercise_frequency: Reported frequency of exercise, ranging from Never to Daily.
- meditation_or_mindfulness: Boolean indicating whether the patient practises mindfulness or meditation (Yes/No).
- hours_of_sleep_per_night: The average duration of sleep per night in hours.
- diagnosis: The target variable, indicating the classified condition (ME/CFS, Depression, or Both).
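As a quick sanity check after download, the column list above can be compared with the header of the CSV file. The following is a minimal sketch using only the Python standard library; the `check_header` helper and the inline sample row are illustrative assumptions (the row is made up, not taken from the dataset), and in practice the `io.StringIO` stand-in would be replaced by the opened file.

```python
import csv
import io

# Column names as listed in the Columns section; the last one is the target.
EXPECTED_COLUMNS = [
    "age", "gender", "fatigue_severity_scale_score", "depression_phq9_score",
    "pem_present", "pem_duration_hours", "sleep_quality_index",
    "brain_fog_level", "physical_pain_score", "stress_level", "work_status",
    "social_activity_level", "exercise_frequency", "meditation_or_mindfulness",
    "hours_of_sleep_per_night", "diagnosis",
]


def check_header(fieldnames):
    """Return (missing, unexpected) column names relative to EXPECTED_COLUMNS."""
    found = list(fieldnames or [])
    missing = [c for c in EXPECTED_COLUMNS if c not in found]
    unexpected = [c for c in found if c not in EXPECTED_COLUMNS]
    return missing, unexpected


# Stand-in for the real file: the expected header plus one made-up row.
sample = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "34,Female,6.5,9,1,24,3,7,5,6,Partially working,Very low,Never,No,6.5,ME/CFS\n"
)
reader = csv.DictReader(sample)
missing, unexpected = check_header(reader.fieldnames)
rows = list(reader)

print("missing columns:", missing)
print("unexpected columns:", unexpected)
print("target of first row:", rows[0]["diagnosis"])
```

If either list is non-empty, the file's header differs from the description above (for example in spelling or casing) and the names should be reconciled before modelling.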
Distribution
The data is provided as a UTF-8 encoded CSV file containing approximately 1,000 records. To simulate real-world challenges, most features include missing values (NaN), typically affecting between 1% and 5% of records. Controlled noise has also been added to all numeric features to prevent easy class separation.
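The missing-value rate described above can be verified per column. The sketch below uses only the standard library and assumes that a missing cell appears in the CSV either as an empty field or as a token such as `NaN`; the `MISSING_TOKENS` set, the `missing_fraction` helper, and the four sample rows are illustrative assumptions rather than part of the dataset.

```python
import csv
import io

# Assumed spellings of a missing cell; adjust to match the actual file.
MISSING_TOKENS = {"", "nan", "na", "null"}


def missing_fraction(rows, columns):
    """Return {column: fraction of rows whose value counts as missing}."""
    total = len(rows)
    result = {}
    for col in columns:
        n_missing = sum(
            1 for row in rows
            if (row.get(col) or "").strip().lower() in MISSING_TOKENS
        )
        result[col] = n_missing / total if total else 0.0
    return result


# Made-up rows standing in for the real file.
sample = io.StringIO(
    "age,stress_level,diagnosis\n"
    "34,6,ME/CFS\n"
    "51,,Depression\n"
    "27,NaN,Both\n"
    "45,8,Depression\n"
)
reader = csv.DictReader(sample)
rows = list(reader)
fractions = missing_fraction(rows, reader.fieldnames)

for col, frac in fractions.items():
    print(f"{col}: {frac:.1%} missing")
```

On the full file, the reported fractions for most features would be expected to fall roughly in the 1% to 5% range stated above.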
Usage
Ideal applications for this dataset include machine learning tasks focused on complex health conditions, such as:
- Performing binary classification between ME/CFS and Depression.
- Implementing multiclass classification to distinguish between ME/CFS, Depression, and Both conditions.
- Practising exploratory data analysis (EDA) and feature engineering techniques.
- Testing missing data imputation methodologies.
- Developing and interpreting medical machine learning models.
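Two of the tasks above, binary classification and missing-data imputation, share a small amount of preparation. The sketch below shows one possible approach using only the standard library: it keeps the rows labelled ME/CFS or Depression for the binary task and fills missing numeric values with the column median. The helpers `to_float`, `impute_median`, and `binary_subset` are hypothetical names, median imputation is just one simple baseline, and the sample rows are made up.

```python
from statistics import median


def to_float(value):
    """Parse a cell as float; return None for empty, non-numeric, or NaN cells."""
    try:
        x = float(value)
    except (TypeError, ValueError):
        return None
    return None if x != x else x  # NaN is the only float not equal to itself


def impute_median(rows, column):
    """Return copies of rows with missing values in `column` set to the median.

    Raises statistics.StatisticsError if the column has no observed values.
    """
    parsed = [to_float(row.get(column)) for row in rows]
    fill = median(v for v in parsed if v is not None)
    return [
        dict(row, **{column: fill if value is None else value})
        for row, value in zip(rows, parsed)
    ]


def binary_subset(rows):
    """Keep only rows labelled ME/CFS or Depression (drops the Both class)."""
    return [row for row in rows if row["diagnosis"] in {"ME/CFS", "Depression"}]


# Made-up rows for illustration only.
rows = [
    {"depression_phq9_score": "4", "diagnosis": "ME/CFS"},
    {"depression_phq9_score": "", "diagnosis": "Depression"},
    {"depression_phq9_score": "18", "diagnosis": "Depression"},
    {"depression_phq9_score": "NaN", "diagnosis": "Both"},
    {"depression_phq9_score": "12", "diagnosis": "Both"},
]

imputed = impute_median(rows, "depression_phq9_score")
binary_rows = binary_subset(imputed)
print([row["depression_phq9_score"] for row in imputed])
print(len(binary_rows), "rows kept for the binary task")
```

When evaluating a model, the median would normally be computed on the training split only and then applied to the test split, so that no information leaks from held-out records.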
Coverage
Patient ages range from 18 to 70 years, and the gender distribution is nearly balanced. As a synthetic dataset, it focuses entirely on health parameters associated with chronic fatigue and mental health symptoms and has no specific geographic or temporal coverage.
License
CC BY-NC-SA 4.0
Who Can Use It
This dataset is particularly valuable for machine learning beginners and academic researchers. It is a useful resource for anyone exploring classification in mental and chronic health fields, especially those seeking practice with simulated noisy and incomplete data.
Dataset Name Suggestions
- ME/CFS vs Depression Differential Diagnosis Data
- Chronic Fatigue Syndrome & Mental Health Classification
- Synthetic Differential Diagnosis Data
Attributes
Original Data Source: ME/CFS vs Depression Differential Diagnosis Data
