Pima Female Diabetes Health Data
Patient Health Records & Digital Health
Free
About
This dataset supports predicting diabetes from diagnostic measurements. It originates from the National Institute of Diabetes and Digestive and Kidney Diseases and covers female patients aged 21 or older of Pima Indian heritage. It serves as a resource for developing and benchmarking predictive models for diabetes.
Columns
- Pregnancies: The number of times pregnant.
- Glucose: Plasma glucose concentration measured 2 hours into an oral glucose tolerance test.
- BloodPressure: Diastolic blood pressure, recorded in mm Hg.
- SkinThickness: Triceps skin fold thickness, in mm.
- Insulin: 2-hour serum insulin, in μU/mL.
- BMI: Body mass index, calculated as weight in kg/(height in m)^2.
- DiabetesPedigreeFunction: A function that quantifies the likelihood of diabetes based on family history.
- Age: The patient's age in years.
- Outcome: The class variable, indicating whether the patient has diabetes (1) or not (0).
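As a minimal sketch of how the columns above might be read and checked, the following uses only the Python standard library. It assumes the CSV header uses exactly the column names listed here, which the listing does not confirm. The helper names (`read_records`, `count_zeros`) and the two sample rows are hypothetical and are not taken from diabetes.csv. The zero count is included because a value of 0 for measurements such as Glucose or BMI is physiologically implausible and may stand in for an unrecorded measurement. That interpretation is an assumption, not something the listing states.

```python
import csv
import io

# Column names as given in the listing; the real file's header is assumed to match.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]

# Columns where a zero is physiologically implausible (assumption, see lead-in).
ZERO_SUSPECT_COLUMNS = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def read_records(handle):
    """Parse CSV text from an open handle into a list of dicts of floats."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    records = []
    for row in reader:
        record = {name: float(row[name]) for name in EXPECTED_COLUMNS}
        if record["Outcome"] not in (0.0, 1.0):
            raise ValueError(f"Outcome must be 0 or 1, got {row['Outcome']}")
        records.append(record)
    return records


def count_zeros(records, columns):
    """Count, per column, how many records hold the value 0."""
    return {col: sum(1 for r in records if r[col] == 0) for col in columns}


# Two made-up rows for illustration only.
sample = io.StringIO(
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
    "DiabetesPedigreeFunction,Age,Outcome\n"
    "2,120,70,30,100,28.5,0.4,35,1\n"
    "0,95,64,0,0,22.1,0.2,24,0\n"
)
records = read_records(sample)
zero_counts = count_zeros(records, ZERO_SUSPECT_COLUMNS)
```

To run this against the real file, the same function can be called as `with open("diabetes.csv", newline="") as f: records = read_records(f)`.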
Distribution
The dataset is provided in CSV format as diabetes.csv, with a file size of 23.87 kB. It contains 768 instances (rows) and 9 columns, consisting of 8 diagnostic attributes and one class variable. All attribute values are numeric. No missing values are present in the provided column details. The class distribution shows 500 instances for Outcome 0 (no diabetes) and 268 instances for Outcome 1 (diabetes).
Usage
This dataset is ideal for various applications, including:
- Developing and evaluating machine learning models for diabetes prediction.
- Research into diagnostic measurements and their correlation with diabetes.
- Educational purposes in data science and medical informatics.
- Projects involving deep learning techniques for health outcome prediction.
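When evaluating models on this data, it can help to know the accuracy a trivial classifier would reach. The sketch below computes the majority-class baseline from the class counts reported under Distribution (500 for Outcome 0, 268 for Outcome 1). The function name `majority_baseline_accuracy` is hypothetical, and the label list is reconstructed from those counts rather than read from the file.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


# Labels rebuilt from the reported class distribution: 500 zeros, 268 ones.
labels = [0] * 500 + [1] * 268
baseline = majority_baseline_accuracy(labels)
print(f"{baseline:.3f}")  # 0.651
```

A model that always predicts Outcome 0 is therefore correct on about 65.1% of the rows, so accuracy figures are only informative when compared against that floor. Because the classes are imbalanced, metrics such as recall on Outcome 1 may be more telling than accuracy alone.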
Coverage
The dataset's demographic scope is specific to females aged 21 years or older of Pima Indian heritage. The data was received on 9 May 1990. No geographic coverage is explicitly detailed, and the dataset is not expected to be updated.
License
CC0: Public Domain
Who Can Use It
This dataset is suitable for:
- Data scientists and machine learning engineers building predictive models for health conditions.
- Medical researchers exploring diabetes risk factors and diagnostic indicators.
- Students and academics learning about classification algorithms and health analytics.
Dataset Name Suggestions
- Pima Indian Diabetes Prediction Dataset
- Diabetes Disease Prediction Dataset
- Pima Female Diabetes Health Data
- Indigenous Diabetes Risk Factors
Attributes
Original Data Source: Pima Female Diabetes Health Data