TCGA Brain Tumour Grading Molecular Data

Patient Health Records & Digital Health

Tags and Keywords

Glioma

Cancer

Molecular

Brain

TCGA

Free

About

Gliomas are the most common primary brain tumours. This dataset supports grading them as either Lower-Grade Glioma (LGG) or Glioblastoma Multiforme (GBM), a classification that draws on histological, imaging, clinical, and molecular factors. Its main purpose is to support predictive models that determine whether a patient presents with LGG or GBM. It can also be used to identify the optimal subset of mutation genes and clinical features for accurate glioma grading, which may help reduce the cost of molecular testing.

Columns

The data includes 24 fields: the target variable (Grade), 3 clinical features describing patient demographics, and 20 molecular features, each coded as mutated or not mutated (wildtype).
  • Grade: The classification variable indicating the glioma grade (0 = LGG; 1 = GBM).
  • Gender: Patient gender (0 = male; 1 = female).
  • Age_at_diagnosis: The patient's age in years at the time of diagnosis.
  • Race: Patient race (0 = white; 1 = black or African American; 2 = Asian; 3 = American Indian or Alaska Native).
  • Molecular Features (Examples): Frequently mutated genes such as IDH1, TP53, ATRX, PTEN, and EGFR. All molecular features are binary (1 = MUTATED; 0 = NOT_MUTATED).
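The coding scheme above can be turned back into readable labels with a few lookup tables. The following is a minimal sketch using only the Python standard library. The sample rows are made up for illustration, only three of the 20 gene columns are shown, and the code assumes every column other than the four clinical ones is a binary gene flag and that Age_at_diagnosis is stored as a plain number.

```python
import csv
import io

# Code-to-label mappings taken from the column descriptions in this listing.
GRADE = {"0": "LGG", "1": "GBM"}
GENDER = {"0": "male", "1": "female"}
RACE = {
    "0": "white",
    "1": "black or African American",
    "2": "Asian",
    "3": "American Indian or Alaska Native",
}
MUTATION = {"0": "NOT_MUTATED", "1": "MUTATED"}
CLINICAL = {"Grade": GRADE, "Gender": GENDER, "Race": RACE}


def decode_row(row):
    """Return a copy of one CSV row with integer codes replaced by labels."""
    decoded = {}
    for column, value in row.items():
        if column == "Age_at_diagnosis":
            # Assumption: age is stored as a decimal number of years.
            decoded[column] = float(value)
        elif column in CLINICAL:
            decoded[column] = CLINICAL[column][value]
        else:
            # Assumption: every remaining column is a binary gene flag.
            decoded[column] = MUTATION[value]
    return decoded


# Made-up sample rows; the real file has 20 gene columns.
SAMPLE = """Grade,Gender,Age_at_diagnosis,Race,IDH1,TP53,EGFR
0,0,38.5,0,1,1,0
1,1,61.2,1,0,0,1
"""

decoded_rows = [decode_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
for decoded in decoded_rows:
    print(decoded)
```

A value outside the documented codes raises a KeyError, which is a cheap way to notice records that do not follow the scheme described here.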

Distribution

The data is provided as a single CSV file, clinical_glioma_grading.csv (43.7 kB). It contains 839 patient records (instances) across 24 columns. All records are reported as valid, with no missing or mismatched values in the key features.
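The stated shape and completeness can be checked after download. The sketch below is one possible check, not an official validation tool: it counts rows, columns, empty cells, and rows whose length differs from the header, then compares the result with the figures quoted in this listing. The function names are hypothetical, and the demo runs on a tiny made-up sample rather than the real file.

```python
import csv
import io

EXPECTED_COLUMNS = 24  # figure quoted in this listing
EXPECTED_ROWS = 839    # figure quoted in this listing


def summarise(handle):
    """Count rows, columns, empty cells, and ragged rows in an open CSV."""
    reader = csv.reader(handle)
    header = next(reader)
    rows = empty_cells = ragged_rows = 0
    for record in reader:
        rows += 1
        if len(record) != len(header):
            ragged_rows += 1
        empty_cells += sum(1 for cell in record if cell.strip() == "")
    return {
        "columns": len(header),
        "rows": rows,
        "empty_cells": empty_cells,
        "ragged_rows": ragged_rows,
    }


def matches_listing(summary):
    """True if the summary agrees with the shape described in the listing."""
    return (
        summary["columns"] == EXPECTED_COLUMNS
        and summary["rows"] == EXPECTED_ROWS
        and summary["empty_cells"] == 0
        and summary["ragged_rows"] == 0
    )


# Demo on a made-up three-column sample with one empty cell.
demo = summarise(io.StringIO("Grade,Gender,IDH1\n0,1,1\n1,0,\n"))
print(demo, matches_listing(demo))

# With the downloaded file in the working directory:
# with open("clinical_glioma_grading.csv", newline="") as fh:
#     print(matches_listing(summarise(fh)))
```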

Usage

The data is well suited to building binary classification models that predict tumour grade (LGG vs GBM). It supports research that aims to improve diagnostic accuracy by combining clinical and genomic data, and it can be used to select the most informative set of clinical and mutation features for diagnostic prediction.
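As a first look at which mutation features separate the two grades, genes can be ranked by how much their mutation rate differs between LGG and GBM records. This is a simple screening heuristic chosen for illustration, not a method prescribed by the dataset or a substitute for a fitted classifier. The sample rows are made up, and the helper name rank_genes is hypothetical.

```python
import csv
import io

# Columns that are not gene flags, per the column descriptions.
NON_GENE = {"Grade", "Gender", "Age_at_diagnosis", "Race"}


def rank_genes(rows):
    """Rank gene columns by |P(mutated | GBM) - P(mutated | LGG)|."""
    rows = list(rows)
    lgg = [r for r in rows if r["Grade"] == "0"]
    gbm = [r for r in rows if r["Grade"] == "1"]
    if not lgg or not gbm:
        raise ValueError("need at least one record of each grade")
    genes = [column for column in rows[0] if column not in NON_GENE]
    scores = {}
    for gene in genes:
        rate_lgg = sum(int(r[gene]) for r in lgg) / len(lgg)
        rate_gbm = sum(int(r[gene]) for r in gbm) / len(gbm)
        scores[gene] = abs(rate_gbm - rate_lgg)
    # Largest difference first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up sample rows; the real file has 839 records and 20 gene columns.
SAMPLE = """Grade,Gender,Age_at_diagnosis,Race,IDH1,TP53,EGFR
0,0,38.5,0,1,1,0
0,1,44.0,0,1,0,0
1,1,61.2,1,0,0,1
1,0,57.9,0,0,1,0
"""

ranking = rank_genes(csv.DictReader(io.StringIO(SAMPLE)))
for gene, score in ranking:
    print(f"{gene}: {score:.2f}")
```

A ranking like this only looks at one gene at a time. Any feature subset it suggests would still need to be evaluated with a proper classifier and held-out data before drawing conclusions about grading accuracy.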

Coverage

The records come from the TCGA-LGG and TCGA-GBM brain glioma projects. Demographic coverage includes gender, age at diagnosis (approximately 14.4 to 89.3 years), and race. The original resource was funded through The Cancer Genome Atlas (TCGA) Project, supported by the National Cancer Institute (NCI).

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

This resource is valuable for data scientists and machine learning engineers focused on biomedical applications, researchers in oncology and genomics seeking to correlate molecular features with tumour progression, and bioinformatics specialists developing clinical decision support tools for brain tumours.

Dataset Name Suggestions

  • TCGA Brain Tumour Grading Molecular Data
  • Clinical Genomics for Glioma Classification
  • LGG vs GBM Prediction Dataset

Attributes

Original Data Source:

Listing Stats

VIEWS

1

DOWNLOADS

0

LISTED

02/11/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0