
Pima Female Diabetes Dataset

Patient Health Records & Digital Health

Tags and Keywords

Diabetes

Prediction

Pima

Health

Medical

Pima Female Diabetes Dataset on the Opendatabay data marketplace


Free

About

This dataset is used to predict diabetes from diagnostic measurements. It originates from the National Institute of Diabetes and Digestive and Kidney Diseases. A key constraint is that all included patients are females of Pima Indian heritage, at least 21 years old, living near Phoenix, Arizona, USA. The objective is to determine whether a patient shows signs of diabetes according to World Health Organization criteria, that is, whether their 2-hour post-load plasma glucose was at least 200 mg/dl.

Columns

  • Pregnancies: The number of times a patient has been pregnant.
  • Glucose: Plasma glucose concentration measured 2 hours into an oral glucose tolerance test.
  • BloodPressure: Diastolic blood pressure, recorded in mm Hg.
  • SkinThickness: Triceps skin fold thickness, measured in mm.
  • Insulin: 2-hour serum insulin, expressed in μU/ml.
  • BMI: Body mass index, calculated as weight in kg divided by height in metres squared.
  • DiabetesPedigreeFunction: A score that estimates the likelihood of diabetes from the patient's family history.
  • Age: The patient's age in years.
  • Outcome: The class variable, binary (0 or 1), where 1 indicates the patient tested positive for diabetes.
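All attributes are numeric, and some instances have missing attribute values. In the commonly distributed CSV these typically appear as zeros (for example a Glucose, BloodPressure or BMI of 0) rather than as empty cells.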
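A minimal loading sketch in Python (standard library only) follows. It assumes the CSV header uses exactly the column names listed above, and it treats a 0 in Glucose, BloodPressure, SkinThickness, Insulin or BMI as a missing value, which is a common convention for this dataset. `load_rows`, `COLUMNS`, `ZERO_MEANS_MISSING` and the two sample rows are illustrative, not part of the dataset's distribution.

```python
import csv
import io
import math

# Column names as listed in the description; assumed to match the CSV header exactly.
COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin",
    "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]

# Assumption: in these columns a value of 0 is physiologically implausible
# and is treated as a missing measurement.
ZERO_MEANS_MISSING = {"Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"}


def load_rows(text):
    """Parse CSV text into a list of dicts of floats, with missing values as NaN."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for raw in reader:
        row = {}
        for name in COLUMNS:
            value = float(raw[name])
            if name in ZERO_MEANS_MISSING and value == 0:
                value = math.nan  # mark the implausible zero as missing
            row[name] = value
        rows.append(row)
    return rows


# Illustrative rows laid out like the CSV (header first).
SAMPLE = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
"""

rows = load_rows(SAMPLE)
missing_insulin = sum(math.isnan(r["Insulin"]) for r in rows)
print(missing_insulin)  # 2
```

Using NaN rather than dropping rows keeps every instance available, so the choice between imputation and exclusion can be made per column later.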

Distribution

The dataset contains 768 instances, each with 8 numeric attributes plus the class variable (9 columns in total). The data file is typically distributed as a CSV of 23.87 kB. An Outcome value of 1 indicates a positive test for diabetes; in the standard version of the data, 268 instances (about 35%) are positive and 500 are negative. The attributes Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI and DiabetesPedigreeFunction each cover their own range of values, from small counts (Pregnancies) to continuous measurements that cluster around their means.
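Class balance matters when judging a classifier on this data, because a model that always predicts the majority class already scores reasonably well. The sketch below counts Outcome values and computes that majority-class baseline. The 500/268 split in the example is the commonly reported one for the 768-instance version; `class_distribution` and `majority_baseline_accuracy` are hypothetical helpers.

```python
from collections import Counter


def class_distribution(outcomes):
    """Return {label: (count, fraction)} for a sequence of 0/1 Outcome values."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in sorted(counts.items())}


def majority_baseline_accuracy(outcomes):
    """Accuracy of a classifier that always predicts the most frequent class."""
    counts = Counter(outcomes)
    return max(counts.values()) / sum(counts.values())


# Assumed split (500 negative, 268 positive); replace with the real Outcome column.
outcomes = [0] * 500 + [1] * 268
print(class_distribution(outcomes))
print(round(majority_baseline_accuracy(outcomes), 3))  # 0.651
```

With that split, always predicting 0 gives about 65% accuracy, so reported model accuracies are best read against that figure (or complemented by metrics such as recall on the positive class).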

Usage

This dataset is well suited to developing and evaluating machine learning models that predict the onset of diabetes. It has a long history in research, for example in applying adaptive learning algorithms to forecast diabetes mellitus. Typical uses include classification, pattern recognition, and building predictive analytics solutions in the healthcare domain.
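As a starting point for the classification task, here is a minimal logistic-regression baseline in pure Python, trained by batch gradient descent. Logistic regression is chosen only as a simple, common baseline; it is not the adaptive learning algorithm used in the historical research. To keep the block self-contained it runs on synthetic two-feature data. On the real file you would use the eight attribute columns as features and Outcome as the label, and impute or drop missing values first, since NaN inputs would propagate through the gradient. All function names and hyperparameters (`lr`, `epochs`, the 150/50 split) are illustrative assumptions.

```python
import math
import random


def fit_scaler(X):
    """Per-column mean and standard deviation, computed on training rows only."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
            for j in range(d)]  # "or 1.0" guards against constant columns
    return means, stds


def scale(X, means, stds):
    """Z-score rows using previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of log-loss w.r.t. the linear score is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X, threshold=0.5):
    """Hard 0/1 predictions from predicted probabilities."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
            for xi in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic stand-in data (hypothetical): two features, two overlapping classes.
rng = random.Random(0)
y = [0, 1] * 100
X = [[rng.gauss(1.0 if c else -1.0, 1.0), rng.gauss(0.5 if c else -0.5, 1.0)] for c in y]

X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]
means, stds = fit_scaler(X_train)
w, b = fit_logistic(scale(X_train, means, stds), y_train)
train_acc = accuracy(y_train, predict(w, b, scale(X_train, means, stds)))
test_acc = accuracy(y_test, predict(w, b, scale(X_test, means, stds)))
print(f"train={train_acc:.2f} test={test_acc:.2f}")
```

Scaling statistics are fitted on the training rows only and reused for the test rows, so no information from the test set leaks into preprocessing.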

Coverage

The data cover females of Pima Indian heritage, aged 21 years and older, living near Phoenix, Arizona, USA. The dataset was received on 9 May 1990, and published research using it dates back to 1988, which indicates that the records predate 1988.

License

Public Domain (CC0)

Who Can Use It

This dataset is valuable for:
  • Researchers and data scientists focusing on medical diagnostics, public health, and predictive modelling of diseases like diabetes.
  • Machine learning practitioners looking for a well-known benchmark dataset for classification algorithms.
  • Academics and students studying healthcare analytics, statistical modelling, or specific health outcomes within defined demographics.

Dataset Name Suggestions

  • Pima Diabetes Prediction
  • Pima Indian Diabetes Health Study
  • Diabetes Patient Diagnostic Data
  • Arizona Diabetes Outcomes
  • Pima Female Diabetes Dataset

Attributes

Original Data Source: Pima Female Diabetes Dataset

Listing Stats

VIEWS

0

DOWNLOADS

0

LISTED

08/07/2025

REGION

GLOBAL

UDQS QUALITY (Universal Data Quality Score)

5 / 5

VERSION

1.0
