TCGA Brain Tumour Grading Molecular Data
About
This resource focuses on the classification of gliomas, the most common primary malignant tumours of the brain. The data supports distinguishing Lower-Grade Gliomas (LGG) from Glioblastoma Multiforme (GBM), a grading decision that draws on histological, imaging, clinical, and molecular factors. Its primary purpose is to support the development of predictive models that accurately determine whether a patient has LGG or GBM. The data can also be used to identify the optimal subset of mutation genes and clinical features for accurate glioma grading; because a smaller gene panel is cheaper to test, this may ultimately help reduce the cost of molecular tests.
Columns
The data includes 24 fields: the Grade class label, 20 molecular features (each coded as either mutated or not_mutated/wildtype), and 3 clinical features describing patient demographics.
- Grade: The classification variable indicating the glioma grade (0 represents LGG; 1 represents GBM).
- Gender: Patient gender (0 is male; 1 is female).
- Age_at_diagnosis: The patient's age in years at the time of diagnosis.
- Race: Patient race information (0 = white; 1 = black or African American; 2 = Asian; 3 = American Indian or Alaska Native).
- Molecular Features (Examples): Features such as IDH1, TP53, ATRX, PTEN, and EGFR are included, representing frequently mutated genes. All molecular features are binary, indicating whether the gene is MUTATED (1) or NOT_MUTATED (0).
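The coding scheme above can be checked mechanically when the file is loaded. The following is a minimal sketch using only the Python standard library. It assumes the clinical columns carry exactly the names listed above (Grade, Gender, Age_at_diagnosis, Race) and that every other column is a 0/1 gene indicator; the function name `load_and_check` and the toy rows in the demo are hypothetical and not part of the dataset.

```python
import csv
import io

# Codebook taken from the column descriptions above (assumed to be complete).
GRADE = {0: "LGG", 1: "GBM"}
GENDER = {0: "male", 1: "female"}
RACE = {0: "white", 1: "black or African American", 2: "Asian",
        3: "American Indian or Alaska Native"}
CLINICAL = ("Grade", "Gender", "Age_at_diagnosis", "Race")


def load_and_check(text, expected_columns=24):
    """Parse CSV text, validate the coding scheme, and return decoded rows.

    Hypothetical helper: an unknown code raises KeyError, an empty cell fails
    int()/float() with ValueError, and a non-binary gene value raises ValueError.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    if len(header) != expected_columns:
        raise ValueError(f"expected {expected_columns} columns, got {len(header)}")
    missing = [c for c in CLINICAL if c not in header]
    if missing:
        raise ValueError(f"missing clinical columns: {missing}")
    # Assumption: every non-clinical column is a binary mutation indicator.
    genes = [c for c in header if c not in CLINICAL]

    rows = []
    for line_no, raw in enumerate(reader, start=2):  # line 1 is the header
        row = {
            "Grade": GRADE[int(raw["Grade"])],
            "Gender": GENDER[int(raw["Gender"])],
            "Age_at_diagnosis": float(raw["Age_at_diagnosis"]),
            "Race": RACE[int(raw["Race"])],
        }
        for gene in genes:
            value = int(raw[gene])
            if value not in (0, 1):
                raise ValueError(f"line {line_no}: {gene}={value} is not binary")
            row[gene] = "MUTATED" if value else "NOT_MUTATED"
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Two made-up rows with only two gene columns, so the column check uses 6.
    toy_csv = (
        "Grade,Gender,Age_at_diagnosis,Race,IDH1,TP53\n"
        "0,0,51.3,0,1,0\n"
        "1,1,70.2,2,0,1\n"
    )
    for record in load_and_check(toy_csv, expected_columns=6):
        print(record)
```

For the full file, a call such as `load_and_check(open("clinical_glioma_grading.csv").read())` should return 839 decoded rows if the file matches the description in this listing.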
Distribution
The data is provided as a CSV file named clinical_glioma_grading.csv, 43.7 kB in size. It contains 839 validated patient records (instances) across 24 columns. All records are valid, with no missing or mismatched values reported for the key features.
Usage
Ideal applications include developing binary classification models that predict tumour grade (LGG vs GBM). The data suits research that aims to improve diagnostic accuracy by combining clinical and genomic data, and it can be used to select the most informative set of clinical and mutation features for diagnostic prediction.
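One simple way to rank candidate features for selection is a filter score such as the mutual information between each feature and Grade. The sketch below uses mutual information purely as an illustrative criterion; the dataset does not prescribe a selection method, and `rank_features` is a hypothetical helper. It assumes rows are dictionaries of discrete codes, such as the raw 0/1 values read from the CSV. Age_at_diagnosis is continuous, so it is excluded by default; binning it (for instance into decades) would be one way to include it.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        # Term p(x, y) * log2(p(x, y) / (p(x) * p(y))); zero-count cells never appear.
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_features(rows, target="Grade", exclude=("Age_at_diagnosis",)):
    """Return (feature, MI with target) pairs, highest score first.

    Continuous columns listed in `exclude` are skipped, because raw floats make
    almost every value unique and inflate the score.
    """
    features = [k for k in rows[0] if k != target and k not in exclude]
    ys = [row[target] for row in rows]
    scores = {f: mutual_information([row[f] for row in rows], ys) for f in features}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy rows invented for illustration: IDH1 separates the grades, EGFR does not.
    toy = [
        {"Grade": 0, "IDH1": 1, "EGFR": 0},
        {"Grade": 0, "IDH1": 1, "EGFR": 1},
        {"Grade": 1, "IDH1": 0, "EGFR": 0},
        {"Grade": 1, "IDH1": 0, "EGFR": 1},
    ]
    for name, score in rank_features(toy):
        print(f"{name}: {score:.3f} bits")
```

On real data, computing the ranking on a training split only keeps the chosen subset from being tuned on the records later used to evaluate the classifier. Because mutual information scores each feature independently, it ignores redundancy between features; a wrapper approach such as greedy forward selection may settle on a different subset.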
Coverage
The data instances are patient records drawn from the TCGA-LGG and TCGA-GBM brain glioma projects. Demographic coverage includes gender, age at diagnosis (approximately 14.4 to 89.3 years), and race. The original data were generated by The Cancer Genome Atlas (TCGA) project, supported by the National Cancer Institute (NCI).
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
This resource is valuable for data scientists and machine learning engineers focused on biomedical applications, researchers in oncology and genomics seeking to correlate molecular features with tumour progression, and bioinformatics specialists developing clinical decision support tools for brain tumours.
Dataset Name Suggestions
- TCGA Brain Tumour Grading Molecular Data
- Clinical Genomics for Glioma Classification
- LGG vs GBM Prediction Dataset
