Alpha Thalassemia Carrier Status Classification
Public Health & Epidemiology
About
A collection of clinical and haematological measurements compiled to support the prediction of alpha thalassemia carrier status. The data is intended for developing machine learning models that serve as decision-support tools for carrier screening programs, particularly in resource-constrained settings. The dataset is curated for a binary classification task: distinguishing normal individuals from alpha thalassemia carriers.
Columns
The dataset contains 16 variables: 15 independent variables (continuous or categorical) and one target variable.
- Sex: Categorical variable detailing 'male' or 'female'.
- hb: Hemoglobin concentration, measured in grams per decilitre (g/dL).
- pcv: Packed cell volume (hematocrit), presented as a percentage (%).
- rbc: Red blood cell count, reported in 10^12/L.
- mcv: Mean cell volume, measured in femtolitres (fL).
- mch: Mean corpuscular hemoglobin, measured in picograms (pg).
- mchc: Mean corpuscular hemoglobin concentration, measured in g/dL.
- rdw: Red blood cell distribution width, presented as a percentage (%).
- wbc: Total white blood cell count, reported in 10^6/L.
- neut: Percentage of neutrophil white blood cell type.
- lymph: Percentage of lymphocyte white blood cell type.
- plt: Total platelet count, reported in 10^6/L.
- hba, hba2, hbf: Hemoglobin A, A2, and F percentages, derived from High-Performance Liquid Chromatography (HPLC) testing.
- phenotype (Target): The binary categorical target variable, classifying individuals as 'alpha carrier' or 'normal'.
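The column list above can be turned into a quick header check before any modelling. The following is a minimal sketch using only the Python standard library; it assumes the file header uses the names exactly as listed here (compared case-insensitively), which has not been verified against the actual file. `check_header` and `read_header` are hypothetical helper names.

```python
import csv
import io

# Column names as listed above. The exact spelling and case used in the
# real file header is an assumption.
FEATURES = ["Sex", "hb", "pcv", "rbc", "mcv", "mch", "mchc", "rdw",
            "wbc", "neut", "lymph", "plt", "hba", "hba2", "hbf"]
TARGET = "phenotype"
EXPECTED = FEATURES + [TARGET]


def read_header(text_stream):
    """Return the first row of a CSV text stream as a list of column names."""
    return next(csv.reader(text_stream))


def check_header(header):
    """Return (missing, unexpected) column names, compared case-insensitively."""
    seen = {name.strip().lower() for name in header}
    wanted = {name.lower() for name in EXPECTED}
    return sorted(wanted - seen), sorted(seen - wanted)


# Usage on an in-memory header line (stand-in for the real file).
header = read_header(io.StringIO(",".join(EXPECTED) + "\n"))
missing, unexpected = check_header(header)
```

With a real file, `read_header(open(path, newline=""))` would supply the header in the same way.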
Distribution
The data is provided in a standard file format, typically CSV. The alphanorm.csv file comprises 16 fields and 203 records. All variables other than Sex and phenotype are continuous floating-point values. One record is missing a value for rbc and two records are missing a value for mch, so those two variables have 202 and 201 valid values respectively.
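Because a few rbc and mch values are missing, it can help to count the gaps per column before deciding whether to drop or impute them. The sketch below uses only the standard library. The rows in `SAMPLE` are synthetic and for illustration only (they are not taken from alphanorm.csv), and the set of tokens treated as missing is an assumption about how the file encodes gaps.

```python
import csv
import io

# Tokens treated as "missing"; how the real file marks gaps is an assumption.
MISSING_TOKENS = {"", "na", "nan", "null"}


def is_missing(value):
    """True if a raw CSV cell is absent or holds a missing-value token."""
    return value is None or value.strip().lower() in MISSING_TOKENS


def count_missing(rows, columns):
    """Count missing cells per column across a list of dict rows."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            if is_missing(row.get(col)):
                counts[col] += 1
    return counts


def drop_incomplete(rows, columns):
    """Keep only the rows with a value in every listed column."""
    return [row for row in rows
            if not any(is_missing(row.get(col)) for col in columns)]


# Synthetic rows for illustration only; not taken from alphanorm.csv.
SAMPLE = """Sex,hb,rbc,mch,phenotype
female,10.8,5.1,,alpha carrier
male,14.2,,27.5,normal
female,12.9,4.6,28.1,normal
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
numeric = ["hb", "rbc", "mch"]
missing = count_missing(rows, numeric)
complete = drop_incomplete(rows, numeric)
```

Dropping incomplete rows is the simplest option with so few gaps; imputation (for example with a column median) is an alternative when every record needs to be kept.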
Usage
The data suits several analytical and development purposes, including:
- Developing and training Machine Learning algorithms for diagnostic pipelines that differentiate Thalassemia Carrier states.
- Creating easy-to-deploy decision-support systems for clinical diagnosis based on full blood count parameters.
- Statistical modelling of the relationship between haematological indicators and Alpha Thalassemia carrier status.
- Evaluating the efficacy and accessibility of Thalassemia screening methodologies.
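As one way to approach the evaluation use case above, the sketch below computes sensitivity and specificity for a screening rule against the phenotype labels. It is a minimal illustration, not a validated method: `mcv_rule` is a hypothetical single-variable rule, its 80 fL cutoff is an illustrative placeholder rather than a value taken from this dataset, and the example inputs are synthetic.

```python
POSITIVE = "alpha carrier"
NEGATIVE = "normal"


def mcv_rule(mcv, cutoff=80.0):
    """Hypothetical screening rule: flag a carrier when mcv is below the cutoff.

    The default cutoff is an illustrative placeholder, not a recommendation.
    """
    return POSITIVE if mcv < cutoff else NEGATIVE


def confusion_counts(y_true, y_pred, positive=POSITIVE):
    """Return (tp, fp, tn, fn) for a binary labelling."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, y_pred, positive=POSITIVE):
    """Return (sensitivity, specificity); NaN where a denominator is zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Synthetic example values, for illustration only.
y_true = [POSITIVE, POSITIVE, NEGATIVE, NEGATIVE]
mcv_values = [72.0, 83.0, 88.0, 76.0]
y_pred = [mcv_rule(v) for v in mcv_values]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

The same two functions can score the predictions of any trained classifier, which makes it straightforward to compare a model against a simple cutoff rule on held-out records.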
Coverage
The data was collected at the Human Genetics Unit (HGU) of the Faculty of Medicine in Colombo, Sri Lanka. Cases were screened from 2016 to 2020. The subjects were children who are alpha thalassemia carriers and their family members.
License
Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
Who Can Use It
- Medical Researchers: For studying the prevalence and diagnostic biomarkers of Alpha Thalassemia.
- Data Scientists and ML Practitioners: For building and validating robust classification models for health conditions.
- Public Health Authorities: To design improved, cost-effective screening strategies.
- Biomedical Engineers: For integrating diagnostic models into healthcare technology.
Dataset Name Suggestions
- Alpha Thalassemia Carrier Status Classification
- Haematology and Thalassemia Screening Data
- Colombo HGU Alpha Thalassemia Cases
- ML Model Training for Thalassemia Diagnosis
Attributes
Original Data Source: Alpha Thalassemia Carrier Status Classification
