AIDS Healthcare Statistics Dataset
Patient Health Records & Digital Health
Free
About
This dataset is intended for predicting AIDS virus infection in patients. It combines healthcare statistics with categorical information about individuals who have been diagnosed with AIDS. First published in 1996, it supports classifying patients by their attributes to determine whether they are infected [1].
Columns
The dataset includes 23 columns, each detailing specific patient information:
- time: Time to failure or censoring [1, 2].
- trt: Treatment indicator, specifying the type of therapy received: 0 = ZDV only, 1 = ZDV + ddI, 2 = ZDV + Zal, 3 = ddI only [1-3].
- age: Age of the patient in years at baseline [1, 3, 4].
- wtkg: Patient's weight in kilograms at baseline [1, 4].
- hemo: Hemophilia status, where 0 = no and 1 = yes [1, 4, 5].
- homo: Homosexual activity indicator, 0 = no and 1 = yes [5, 6].
- drugs: History of intravenous (IV) drug use, 0 = no and 1 = yes [5, 6].
- karnof: Karnofsky score, indicating performance status on a scale of 0 to 100 [6, 7].
- oprior: Non-ZDV antiretroviral therapy received before study 175 (pre-175), 0 = no and 1 = yes [6, 7].
- z30: ZDV (zidovudine) therapy received in the 30 days before study 175, 0 = no and 1 = yes [6-8].
- preanti: Number of days of antiretroviral therapy received before study 175 [6, 8, 9].
- race: Patient's race, 0 = White and 1 = non-white [6, 9].
- gender: Patient's gender, 0 = Female and 1 = Male [6, 9].
- str2: Antiretroviral history, indicating if the patient is 0 = naive or 1 = experienced [6, 10].
- strat: Antiretroviral history stratification: 1 = 'Antiretroviral Naive', 2 = '> 1 but <= 52 weeks of prior antiretroviral therapy', 3 = '> 52 weeks' [6, 10].
- symptom: Symptomatic indicator, 0 = asymptomatic and 1 = symptomatic [11, 12].
- treat: General treatment indicator, 0 = ZDV only and 1 = others [11, 12].
- offtrt: Indicator of being off-treatment before 96+/-5 weeks, 0 = no and 1 = yes [11-13].
- cd40: CD4 cell count at baseline [11, 13].
- cd420: CD4 cell count at 20+/-5 weeks [11, 13, 14].
- cd80: CD8 cell count at baseline [11, 14, 15].
- cd820: CD8 cell count at 20+/-5 weeks [11, 15, 16].
- infected: The target variable, indicating if the patient is infected with AIDS, 0 = No and 1 = Yes [11, 16].
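The column descriptions above translate directly into a schema check. The following is a minimal sketch using only the Python standard library; the function name `check_schema` is hypothetical, and it assumes the file has a header row whose names match the listing exactly and that coded columns are stored as plain numbers.

```python
import csv

# The 23 column names as given in the listing above.
EXPECTED_COLUMNS = [
    "time", "trt", "age", "wtkg", "hemo", "homo", "drugs", "karnof",
    "oprior", "z30", "preanti", "race", "gender", "str2", "strat",
    "symptom", "treat", "offtrt", "cd40", "cd420", "cd80", "cd820",
    "infected",
]

# Allowed codes for the categorical columns, taken from the descriptions above.
ALLOWED_CODES = {
    "trt": {0, 1, 2, 3},
    "hemo": {0, 1}, "homo": {0, 1}, "drugs": {0, 1}, "oprior": {0, 1},
    "z30": {0, 1}, "race": {0, 1}, "gender": {0, 1}, "str2": {0, 1},
    "strat": {1, 2, 3},
    "symptom": {0, 1}, "treat": {0, 1}, "offtrt": {0, 1},
    "infected": {0, 1},
}


def check_schema(lines):
    """Return a list of problems found in CSV text.

    `lines` is an open text file or any iterable of CSV lines.
    """
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    problems = []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        problems.append(f"missing columns: {missing}")
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column, allowed in ALLOWED_CODES.items():
            raw = row.get(column)
            if raw is None:
                continue  # column absent; already reported above
            try:
                value = float(raw)
            except ValueError:
                problems.append(f"line {line_no}: {column} is not numeric: {raw!r}")
                continue
            if value not in allowed:
                problems.append(f"line {line_no}: {column} has unexpected code {raw!r}")
    return problems
```

With the file on disk, `with open("AIDS_Classification.csv", newline="") as f: problems = check_schema(f)` should leave `problems` empty if the file matches the listing.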
Distribution
The dataset is provided as a CSV file (AIDS_Classification.csv) [2]. It contains 2139 valid records across all 23 columns, with no mismatched or missing values reported for any attribute [2-5, 7-10, 12-16]. The file size is 142.8 kB [2].
Usage
This dataset is ideal for various applications, including:
- Binary classification tasks to predict AIDS virus infection [1, 17].
- Developing machine learning models for disease prediction in healthcare [17].
- Statistical analysis of patient demographics and medical history related to AIDS [1, 11].
- Research into the effectiveness of different antiretroviral treatments [1, 6, 11].
- Data visualization to explore patterns and relationships within healthcare statistics [17].
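As a starting point for the classification and treatment-comparison uses listed above, the sketch below tallies the share of `infected` = 1 per treatment arm and computes the accuracy of a majority-class baseline that any classifier should beat. The function names are hypothetical, and the code assumes `trt` and `infected` are stored as integer text. It is a descriptive tally only; it ignores the `time` column and so is not a survival analysis.

```python
import csv
from collections import defaultdict

# Labels copied from the `trt` column description.
TRT_LABELS = {0: "ZDV only", 1: "ZDV + ddI", 2: "ZDV + Zal", 3: "ddI only"}


def infected_rate_by_arm(lines):
    """Return {trt code: (number of patients, share with infected == 1)}."""
    counts = defaultdict(lambda: [0, 0])  # arm -> [patients, positives]
    for row in csv.DictReader(lines):
        arm = int(row["trt"])
        counts[arm][0] += 1
        counts[arm][1] += int(row["infected"])
    return {arm: (n, pos / n) for arm, (n, pos) in sorted(counts.items())}


def majority_baseline_accuracy(lines):
    """Accuracy obtained by always predicting the more common label."""
    labels = [int(row["infected"]) for row in csv.DictReader(lines)]
    if not labels:
        raise ValueError("no records found")
    positives = sum(labels)
    return max(positives, len(labels) - positives) / len(labels)
```

Both functions accept an open text file or any iterable of CSV lines, for example `with open("AIDS_Classification.csv", newline="") as f: rates = infected_rate_by_arm(f)`. The keys of the result can be mapped to readable names through `TRT_LABELS`.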
Coverage
The dataset covers patient information drawn from AIDS diagnosis and clinical trial settings and was initially published in 1996 [1]. It includes demographic details such as age (12 to 70 years), gender (Female and Male), and race (White and non-white) [1, 3, 4, 6, 9]. Medical history covers hemophilia status and history of IV drug use [1, 5, 6]. Treatment history covers the ZDV-based therapies and prior antiretroviral use [1, 6, 11]. Lab results include CD4 and CD8 counts at baseline and at 20+/-5 weeks [11]. The dataset does not specify a geographic scope.
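The figures quoted in the Distribution and Coverage sections (2139 records, 23 columns, no missing values, ages 12 to 70) can be checked against a downloaded copy. The sketch below is one way to do so with the standard library; the function name `summarize` and the shape of its result are assumptions made for illustration.

```python
import csv


def summarize(lines):
    """Count records and empty cells; report the age range and gender codes.

    `lines` is an open text file or any iterable of CSV lines.
    """
    reader = csv.DictReader(lines)
    fieldnames = reader.fieldnames or []
    n_records = 0
    n_empty = 0
    ages = []
    genders = set()
    for row in reader:
        n_records += 1
        for name in fieldnames:
            cell = row.get(name)
            if cell is None or cell.strip() == "":
                n_empty += 1
        age = (row.get("age") or "").strip()
        if age:
            ages.append(float(age))
        gender = (row.get("gender") or "").strip()
        if gender:
            genders.add(gender)
    return {
        "records": n_records,
        "columns": len(fieldnames),
        "empty_cells": n_empty,
        "age_range": (min(ages), max(ages)) if ages else None,
        "gender_codes": sorted(genders),
    }
```

If the file matches the description, `summarize` should report 2139 records, 23 columns, zero empty cells, an age range of 12 to 70, and the gender codes 0 and 1.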
License
CC0: Public Domain
Who Can Use It
- Data scientists and machine learning engineers: For building and evaluating classification models to predict AIDS infection [1, 17].
- Healthcare researchers: To study patient characteristics, treatment efficacy, and disease progression related to AIDS [1, 11].
- Students and beginners in data science: As an accessible dataset for learning binary classification and data analysis, tagged as 'Beginner' [17].
- Public health analysts: For understanding population-level health statistics related to AIDS.
Dataset Name Suggestions
- AIDS Patient Infection Predictor
- HIV/AIDS Clinical Trial Study 175 Data
- Patient AIDS Status Classification
- AIDS Healthcare Statistics Dataset
- ZDV Treatment Outcome Data
Attributes
Original Data Source: AIDS Healthcare Statistics Dataset