Opendatabay APP

Healthcare No Shows Appointments Dataset

Mental Health & Wellness

Tags and Keywords

Health

Appointment

Noshow

Prediction

Medical

Free

About

The dataset records patient characteristics and scheduling factors for modelling appointment non-attendance. With 107,000 rows and 15 columns, it supports predictive analytics in public health and operational healthcare management. The target variable is Showed_up.

Columns

The dataset includes the following 15 variables:
  • PatientId: A unique identifier for the patient.
  • AppointmentID: A unique identifier assigned to the specific appointment.
  • Gender: Indicates the patient's sex (Female is the most common, accounting for 66% of records).
  • ScheduledDay: The date the appointment was scheduled.
  • AppointmentDay: The actual date of the scheduled appointment.
  • Age: The patient's age, ranging from 1 to 115 years, with a mean of 38.3.
  • Neighbourhood: The location of the appointment (81 unique values, with JARDIM CAMBURI being the most frequent at 7%).
  • Scholarship: Boolean indicating if the patient is enrolled in the Brazilian welfare program (True for 10% of records).
  • Hipertension: Boolean indicating if the patient suffers from hypertension (True for 20% of records).
  • Diabetes: Boolean indicating if the patient suffers from diabetes (True for 7% of records).
  • Alcoholism: Boolean indicating if the patient is an alcoholic (True for 3% of records).
  • Handcap: Boolean indicating if the patient has a physical disability (True for 2% of records).
  • SMS_received: Boolean indicating if the patient received a reminder SMS (True for 32% of records).
  • Showed_up: The target variable, indicating if the patient attended the appointment (True for 80%, False for 20%).
  • Date.diff: The number of days between the scheduled date and the appointment date, ranging from -6 to 179; negative values indicate records where the appointment date precedes the scheduling date.
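
The Date.diff column can in principle be reproduced from ScheduledDay and AppointmentDay. The following is a minimal sketch, assuming both dates are stored as ISO-style YYYY-MM-DD strings (the listing does not state the format) and that the difference is taken as appointment date minus scheduled date. The helper name date_diff_days is hypothetical, and the example dates are made up for illustration.

```python
from datetime import date


def date_diff_days(scheduled_day: str, appointment_day: str) -> int:
    """Days from scheduling to appointment (hypothetical helper).

    Assumes ISO 'YYYY-MM-DD' strings; the real file may use another format.
    """
    scheduled = date.fromisoformat(scheduled_day)
    appointment = date.fromisoformat(appointment_day)
    return (appointment - scheduled).days


# Illustrative dates only, not taken from the dataset.
print(date_diff_days("2016-04-29", "2016-05-04"))  # 5
print(date_diff_days("2016-05-04", "2016-04-29"))  # -5
```

A negative result, as in the second call, corresponds to the kind of record that gives the column its minimum of -6.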

Distribution

The data consists of 107,000 records across 15 columns. The data file is available as healthcare_noshows_appt.csv and is approximately 11.64 MB in size. All fields across all records are verified as valid with zero missing values or mismatches.
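
The shape and completeness claims above can be checked after download. This is a rough sketch using only the Python standard library; the function name summarize_csv is hypothetical, and the sample rows (including how values such as True/False are encoded) are invented for illustration rather than copied from the file.

```python
import csv
import io


def is_missing(value):
    """Treat absent fields and blank strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def summarize_csv(handle):
    """Return (row_count, column_names, missing_cell_count) for a CSV stream."""
    reader = csv.DictReader(handle)
    rows = 0
    missing = 0
    for record in reader:
        rows += 1
        missing += sum(1 for value in record.values() if is_missing(value))
    return rows, reader.fieldnames, missing


# Invented sample with one blank Age cell.
sample = io.StringIO(
    "PatientId,Gender,Age,Showed_up\n"
    "1,F,62,True\n"
    "2,M,,False\n"
)
rows, columns, missing = summarize_csv(sample)
print(rows, len(columns), missing)  # 2 4 1
```

For the real file, the same function could be called on open("healthcare_noshows_appt.csv", newline=""); if the listing's description holds, it should report 15 columns and a missing count of zero.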

Usage

This resource is ideally suited for tasks involving binary classification, specifically machine learning model training aimed at forecasting appointment attendance. It enables healthcare providers and analysts to identify the key factors contributing to patient non-attendance, allowing for targeted intervention strategies.
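
Because Showed_up is True for about 80% of records, a model that always predicts attendance already reaches roughly 80% accuracy, so any classifier should be compared against that baseline. The sketch below computes the majority-class baseline and a per-group no-show rate. The function names are hypothetical and the five records are toy values that say nothing about the real data.

```python
from collections import Counter, defaultdict


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def no_show_rate_by(records, key):
    """Share of records with Showed_up == False for each value of `key`."""
    totals = defaultdict(int)
    no_shows = defaultdict(int)
    for record in records:
        totals[record[key]] += 1
        if not record["Showed_up"]:
            no_shows[record[key]] += 1
    return {group: no_shows[group] / totals[group] for group in totals}


# Toy records, invented for illustration.
records = [
    {"SMS_received": True, "Showed_up": True},
    {"SMS_received": True, "Showed_up": False},
    {"SMS_received": False, "Showed_up": True},
    {"SMS_received": False, "Showed_up": True},
    {"SMS_received": False, "Showed_up": True},
]
print(majority_baseline_accuracy([r["Showed_up"] for r in records]))  # 0.8
print(no_show_rate_by(records, "SMS_received"))  # {True: 0.5, False: 0.0}
```

Given the class imbalance, metrics such as recall on the no-show class are likely to be more informative than raw accuracy.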

Coverage

The records cover a specific time span: Scheduled dates range from 10 November 2015 to 8 June 2016, while appointment dates range from 29 April 2016 to 8 June 2016. Demographic scope includes detailed age distribution (1 to 115) and binary health indicators (e.g., Hipertension, Diabetes). Geographic scope is based on 81 unique neighbourhood categories.

License

CC0: Public Domain

Who Can Use It

  • Data Scientists/Machine Learning Engineers: To build, evaluate, and deploy predictive models for no-show risk assessment.
  • Healthcare Administrators: To analyse operational efficiency and determine resource allocation based on predicted attendance rates.
  • Public Health Researchers: To study patient behaviour, socio-economic barriers (e.g., Scholarship status), and health factors impacting access to care.

Dataset Name Suggestions

  • Healthcare No Shows Appointments Dataset
  • Healthcare Appointment No Shows Dataset
  • Medical Appointment Attendance Predictor
  • Patient No-Show Prediction Data

Attributes

Listing Stats

LISTED

10/10/2025

REGION

GLOBAL

Universal Data Quality Score (UDQS)

5 / 5

VERSION

1.0
