Alpha Thalassemia Carrier Status Classification

Public Health & Epidemiology

Tags and Keywords

Thalassemia, Carrier, Hematology, Screening, Classification

Alpha Thalassemia Carrier Status Classification Dataset on Opendatabay data marketplace

About

A collection of clinical and haematological measurements compiled to support the prediction of alpha thalassemia carrier status. The data is intended for developing machine learning models that serve as decision-support tools in carrier screening programs, particularly in resource-constrained settings. The dataset is curated for a binary classification task: distinguishing alpha thalassemia carriers from normal individuals.

Columns

The dataset contains 16 variables: 15 independent variables (continuous or categorical) and one target variable:
  • Sex: Categorical variable detailing 'male' or 'female'.
  • hb: Hemoglobin concentration, measured in grams per decilitre (g/dL).
  • pcv: Packed cell volume (hematocrit), presented as a percentage (%).
  • rbc: Red blood cell count, reported in 10^12/L.
  • mcv: Mean cell volume, measured in femtolitres (fL).
  • mch: Mean corpuscular hemoglobin, measured in picograms (pg).
  • mchc: Mean corpuscular hemoglobin concentration, measured in g/dL.
  • rdw: Red blood cell distribution width, presented as a percentage (%).
  • wbc: Total white blood cell count, reported in 10^6/L.
  • neut: Percentage of neutrophil white blood cell type.
  • lymph: Percentage of lymphocyte white blood cell type.
  • plt: Total platelet count, reported in 10^6/L.
  • hba, hba2, hbf: Hemoglobin A, A2, and F percentages, derived from High-Performance Liquid Chromatography (HPLC) testing.
  • phenotype (Target): The binary categorical target variable, classifying individuals as 'alpha carrier' or 'normal'.
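
The column list above can be turned into a small schema check before any modelling. The following is a minimal sketch using only the Python standard library. The lower-cased column names, the `read_records` helper, and the two sample rows are assumptions made for illustration; the exact header spelling in the file should be verified, and the sample values are made up rather than taken from the dataset.

```python
import csv
import io

# Column names as listed above; the exact header spelling and case in the
# CSV file are an assumption and should be checked against the real file.
NUMERIC_COLUMNS = [
    "hb", "pcv", "rbc", "mcv", "mch", "mchc", "rdw",
    "wbc", "neut", "lymph", "plt", "hba", "hba2", "hbf",
]
EXPECTED_COLUMNS = ["sex"] + NUMERIC_COLUMNS + ["phenotype"]
ALLOWED_LABELS = {"alpha carrier", "normal"}


def read_records(text):
    """Parse CSV text into dicts with lower-cased keys and typed values."""
    reader = csv.DictReader(io.StringIO(text))
    header = [name.strip().lower() for name in reader.fieldnames]
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    records = []
    for raw in reader:
        row = {
            k.strip().lower(): (v or "").strip()
            for k, v in raw.items()
            if k is not None
        }
        record = {
            "sex": row["sex"].lower(),
            "phenotype": row["phenotype"].lower(),
        }
        if record["phenotype"] not in ALLOWED_LABELS:
            raise ValueError(f"unexpected label: {record['phenotype']!r}")
        for col in NUMERIC_COLUMNS:
            # Empty cells become None so missing values stay visible.
            record[col] = float(row[col]) if row[col] else None
        records.append(record)
    return records


# Made-up rows for illustration only; these are not values from the dataset.
SAMPLE = """Sex,hb,pcv,rbc,mcv,mch,mchc,rdw,wbc,neut,lymph,plt,hba,hba2,hbf,phenotype
female,11.0,35.0,5.0,70.0,22.0,31.4,14.0,8.0,50,40,300,97.0,2.5,0.5,alpha carrier
male,14.0,42.0,,84.0,28.0,33.3,13.0,7.0,55,35,250,97.2,2.6,0.2,normal
"""

records = read_records(SAMPLE)
print(len(records), records[1]["rbc"])
```

Keeping empty cells as `None` rather than coercing them to zero means the missing values stay visible to later steps.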

Distribution

The data is provided as a CSV file, alphanorm.csv, which comprises 16 fields and 203 records. All variables other than Sex and phenotype are continuous floating-point values. Most variables have valid values in all 203 records; one record is missing a value for rbc, and two records are missing a value for mch.
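
One possible way to handle the incomplete records is to derive the missing values from the standard red cell index definitions, MCV = PCV × 10 / RBC and MCH = Hb × 10 / RBC, using the units given in the column list. The sketch below assumes records are dictionaries with `None` for missing cells; the helper names and the sample values are hypothetical. Values reported by an analyser may differ slightly from computed ones, so dropping the affected records is a simpler alternative.

```python
def count_missing(records, columns):
    """Return the number of records with no value for each column."""
    return {c: sum(1 for r in records if r.get(c) is None) for c in columns}


def fill_from_indices(record):
    """Return a copy with rbc and mch derived from other indices if absent.

    Standard red cell index definitions, with units as in the column list:
      mcv (fL) = pcv (%) * 10 / rbc (10^12/L)
      mch (pg) = hb (g/dL) * 10 / rbc (10^12/L)
    """
    filled = dict(record)
    if (filled.get("rbc") is None
            and filled.get("pcv") is not None
            and filled.get("mcv")):
        filled["rbc"] = filled["pcv"] * 10.0 / filled["mcv"]
    if (filled.get("mch") is None
            and filled.get("hb") is not None
            and filled.get("rbc")):
        filled["mch"] = filled["hb"] * 10.0 / filled["rbc"]
    return filled


# Made-up values for illustration only; not taken from the dataset.
rows = [
    {"hb": 14.0, "pcv": 42.0, "rbc": None, "mcv": 84.0, "mch": 28.0},
    {"hb": 11.0, "pcv": 35.0, "rbc": 5.0, "mcv": 70.0, "mch": None},
]

before = count_missing(rows, ["rbc", "mch"])
completed = [fill_from_indices(r) for r in rows]
after = count_missing(completed, ["rbc", "mch"])
print(before, after)
```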

Usage

This data is ideal for several analytical and development purposes, including:
  • Developing and training Machine Learning algorithms for diagnostic pipelines that differentiate Thalassemia Carrier states.
  • Creating easy-to-deploy decision-support systems for clinical diagnosis based on full blood count parameters.
  • Statistical modelling of the relationship between haematological indicators and Alpha Thalassemia carrier status.
  • Evaluating the efficacy and accessibility of Thalassemia screening methodologies.
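
When evaluating a classifier or screening rule built on this data, sensitivity and specificity are common summary measures. The sketch below is one minimal way to compute them from true and predicted labels, treating 'alpha carrier' as the positive class. The function name and the example label lists are hypothetical and do not represent results on the dataset.

```python
def screening_metrics(y_true, y_pred, positive="alpha carrier"):
    """Compute confusion counts, sensitivity and specificity."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against empty classes instead of dividing by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return {
        "tp": tp, "fn": fn, "tn": tn, "fp": fp,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Made-up labels for illustration only.
y_true = ["alpha carrier", "alpha carrier", "alpha carrier", "normal", "normal"]
y_pred = ["alpha carrier", "alpha carrier", "normal", "normal", "alpha carrier"]

metrics = screening_metrics(y_true, y_pred)
print(metrics)
```

With only 203 records, estimates of either measure will carry wide uncertainty, so reporting them over cross-validation folds rather than a single split is a reasonable design choice.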

Coverage

The data was collected at the Human Genetics Unit (HGU) of the Faculty of Medicine in Colombo, Sri Lanka. Cases were screened from 2016 to 2020. The subjects were children who are alpha thalassemia carriers and their family members.

License

Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

Who Can Use It

  • Medical Researchers: For studying the prevalence and diagnostic biomarkers of Alpha Thalassemia.
  • Data Scientists and ML Practitioners: For building and validating robust classification models for health conditions.
  • Public Health Authorities: To design improved, cost-effective screening strategies.
  • Biomedical Engineers: For integrating diagnostic models into healthcare technology.

Dataset Name Suggestions

  • Alpha Thalassemia Carrier Status Classification
  • Haematology and Thalassemia Screening Data
  • Colombo HGU Alpha Thalassemia Cases
  • ML Model Training for Thalassemia Diagnosis

Attributes

Listing Stats

  • Views: 3
  • Downloads: 0
  • Listed: 18/11/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
