Opendatabay APP

Diabetes and Digestive Health Patient Data

Patient Health Records & Digital Health

Tags and Keywords

Diabetes, Predictive, Health, Classification, Pima

Free

About

This collection of clinical data is intended for machine learning and predictive analytics on the diagnosis of diabetes. The primary objective is to predict whether a patient has diabetes from eight diagnostic measurements; a ninth column records the outcome. The data originates from the National Institute of Diabetes and Digestive and Kidney Diseases and covers a narrowly defined cohort (see Coverage).

Columns

The dataset has nine columns: eight diagnostic predictors and one outcome label.
  • Pregnancies: Records the total number of times the patient has been pregnant.
  • Glucose: Plasma glucose concentration measured at 2 hours during an oral glucose tolerance test.
  • BloodPressure: Diastolic blood pressure, recorded in mm Hg.
  • SkinThickness: Triceps skin fold thickness, measured in mm.
  • Insulin: Serum insulin levels measured at 2 hours, recorded in µU/mL.
  • BMI: Body mass index, calculated as weight in kg divided by height in metres squared.
  • DiabetesPedigreeFunction: A function that scores the risk of diabetes based on family history.
  • Age: The patient’s age in years.
  • Outcome: The class variable, coded as 0 (no diabetes) or 1 (diabetes).
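
The column list above can be turned into a small typed reader. The following is a minimal sketch using only the Python standard library; the names SCHEMA and parse_rows are hypothetical, the int/float split is an assumption inferred from the column descriptions, and the two sample records are made up for illustration rather than taken from the dataset.

```python
import csv
import io

# Column names as listed above. The int/float choice per column is an
# assumption based on the descriptions, not something the listing specifies.
SCHEMA = {
    "Pregnancies": int,
    "Glucose": float,
    "BloodPressure": float,
    "SkinThickness": float,
    "Insulin": float,
    "BMI": float,
    "DiabetesPedigreeFunction": float,
    "Age": int,
    "Outcome": int,
}


def parse_rows(handle):
    """Yield one typed dict per CSV record, checking the header first."""
    reader = csv.DictReader(handle)
    missing = set(SCHEMA) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for raw in reader:
        row = {name: cast(raw[name]) for name, cast in SCHEMA.items()}
        # Outcome is documented as 0 (no diabetes) or 1 (diabetes).
        if row["Outcome"] not in (0, 1):
            raise ValueError(f"unexpected Outcome value: {row['Outcome']}")
        yield row


# Two made-up records for illustration only (not real dataset rows).
SAMPLE = io.StringIO(
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
    "DiabetesPedigreeFunction,Age,Outcome\n"
    "2,120,70,30,100,28.5,0.45,35,0\n"
    "5,160,80,0,0,33.1,0.62,50,1\n"
)
rows = list(parse_rows(SAMPLE))
print(rows[0]["BMI"], rows[1]["Outcome"])
```

To read the real file instead of the sample, pass an open file handle, for example open("diabetes.csv", newline=""), to parse_rows.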

Distribution

The data file, diabetes.csv, is a 26.68 kB CSV with 9 columns and 768 records. The listing reports no missing or mismatched values in any column. In commonly circulated copies of this dataset, however, zeros in columns such as Glucose, BloodPressure, SkinThickness, Insulin and BMI stand in for unrecorded measurements, so it is worth checking for them before modelling. The dataset is static: its expected update frequency is listed as 'Never'.
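
The figures quoted above (9 columns, 768 records, no missing or mismatched values) can be checked against a downloaded copy. The sketch below uses only the standard library; summarize and ZERO_SUSPECT are hypothetical names. It also counts literal zeros in columns where zero is not a plausible measurement, because a file can have no empty cells and still encode unrecorded readings as 0; whether that applies to this file is an assumption to verify. The three sample records are made up for illustration.

```python
import csv
import io

# Columns where a literal 0 is physiologically implausible. Treating such
# zeros as "possibly missing" is an assumption, not a claim from the listing.
ZERO_SUSPECT = ("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")


def summarize(handle):
    """Count records, columns, empty cells, ragged rows and suspect zeros."""
    reader = csv.reader(handle)
    header = next(reader)
    records = empty_cells = mismatched = 0
    zeros = {name: 0 for name in ZERO_SUSPECT if name in header}
    for fields in reader:
        if not fields:  # skip completely blank lines
            continue
        records += 1
        if len(fields) != len(header):
            mismatched += 1  # row has the wrong number of fields
            continue
        row = dict(zip(header, fields))
        empty_cells += sum(1 for value in fields if not value.strip())
        for name in zeros:
            if row[name].strip() and float(row[name]) == 0.0:
                zeros[name] += 1
    return {
        "records": records,
        "columns": len(header),
        "empty_cells": empty_cells,
        "mismatched_rows": mismatched,
        "zeros": zeros,
    }


# Three made-up records: one has an empty Glucose cell, one has zero
# SkinThickness and Insulin.
SAMPLE = io.StringIO(
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
    "DiabetesPedigreeFunction,Age,Outcome\n"
    "2,120,70,30,100,28.5,0.45,35,0\n"
    "5,160,80,0,0,33.1,0.62,50,1\n"
    "1,,60,20,80,25.0,0.30,22,0\n"
)
report = summarize(SAMPLE)
print(report)
```

Run against the real file, the report would be expected to show 768 records and 9 columns if the listing's figures hold.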

Usage

This resource suits analytical tasks such as:
  • Developing and testing machine learning classification models to predict diabetes onset.
  • Conducting exploratory data analysis and data cleaning exercises.
  • Comparing models, particularly methods such as Support Vector Machines (SVM).
  • Demonstrating binary classification problems in health informatics teaching.
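
The model comparison task listed above might look like the following sketch. It assumes NumPy and scikit-learn are installed; compare_models is a hypothetical helper, logistic regression is chosen here only as an illustrative baseline next to the SVM, and the data is a synthetic stand-in with eight features so the example runs without the CSV file.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, folds=5):
    """Return mean cross-validated accuracy for each candidate classifier."""
    candidates = {
        # Scaling sits inside each pipeline so it is refit on every
        # training fold and never sees the held-out fold.
        "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
        "logistic": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
    }
    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in candidates.items()
    }


# Synthetic stand-in data (NOT the real dataset): 200 rows, 8 features,
# with a label driven by two of the features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 8))
y_demo = (X_demo[:, 1] + 0.5 * X_demo[:, 5] > 0).astype(int)

scores = compare_models(X_demo, y_demo)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

With the real file, X would hold the eight predictor columns and y the Outcome column. Accuracy on the synthetic data says nothing about performance on the actual records.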

Coverage

The records were selected under strict constraints: all subjects are females of Pima Indian heritage aged at least 21, with ages ranging up to 81 years. The dataset has no temporal dimension and is not time-series data.

License

CC0: Public Domain

Who Can Use It

Intended users include data scientists building predictive health models, students learning classification techniques, and medical researchers investigating specific demographic risk factors associated with diabetes. Academics can utilise this as a well-defined case study for classification algorithm testing.

Dataset Name Suggestions

  • Pima Indian Diabetes Prediction Data
  • NIDDK Diabetes Diagnostic Measurements
  • Diabetes and Digestive Health Patient Data

Attributes

Listing Stats

  • Views: 6
  • Downloads: 1
  • Listed: 13/11/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
