ChEMBL Curated Kinase Bioactivity Data
Free
About
This registry contains bioactivity, toxicity, and drug-likeness information for small compounds that target tyrosine kinase proteins. The data set was curated from the ChEMBL database. It covers 28,314 unique small compounds and their interactions with 11 tyrosine kinases: ABL, EGFR, PDGFR, FGFR, MET, VEGFR, KIT, RET, JAK, ALK, and SRC. The resource is intended to support predictive machine learning models in drug discovery and molecular biology.
Columns
The data set comprises 9 columns covering chemical properties and bioactivity measurements:
- molecule_chembl_id: The unique identifier for the molecule, sourced from the ChEMBL database.
- canonical_smiles: Standard representation of the molecule’s structure using SMILES strings.
- standard_value: The measured IC50 values, expressed in nanomolar (nM).
- class: The bioactivity class of the compound, typically 'active' (approximately 61% of entries) or 'inactive' (approximately 24% of entries).
- MW: The molecular weight of the compound.
- LogP: The logarithm of the octanol-water partition coefficient, indicating hydrophobicity.
- NumHDonors: The calculated number of hydrogen bond donors present in the molecule.
- NumHAcceptors: The calculated number of hydrogen bond acceptors present in the molecule.
- pIC50: The calculated pIC50 values, conventionally defined as the negative base-10 logarithm of the IC50 expressed in molar units.
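The relationship between the standard_value and pIC50 columns can be sketched as follows. This is a minimal illustration of the conventional definition, pIC50 = -log10(IC50 in mol/L), which for a value in nanomolar reduces to 9 - log10(IC50 in nM). The function name pic50_from_nm is hypothetical, and the listing does not state whether the curators applied any capping or filtering before the conversion, so treat this as an assumption and not the data set's exact procedure.

```python
import math


def pic50_from_nm(ic50_nm):
    """Convert an IC50 given in nanomolar (nM) to pIC50.

    Assumes the conventional definition pIC50 = -log10(IC50 in mol/L).
    Since 1 nM = 1e-9 mol/L, this equals 9 - log10(IC50 in nM).
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nm)


# A tenfold decrease in IC50 raises pIC50 by one unit.
for ic50_nm in (1.0, 1000.0, 10000.0):
    print(f"{ic50_nm:>8.1f} nM -> pIC50 {pic50_from_nm(ic50_nm):.2f}")
```

Working on the pIC50 scale is common in modelling because it compresses IC50 values that span several orders of magnitude into a narrow, roughly linear range.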
Distribution
The primary data file, named Kinase_final_data.csv, occupies 2.44 MB and contains 9 columns of data. While the registry details 28,314 small compounds, the curated records provide approximately 21,400 valid entries used for classification and modelling. The structure is fixed and is not expected to receive future updates.
Usage
This registry is an ideal resource for computational chemistry and bioinformatics applications. Key uses include:
- Training machine learning and deep learning models to predict the bioactivity (IC50 or pIC50) of novel small compounds.
- Classifying whether a small molecule is likely to be 'active' or 'inactive' against specific tyrosine kinase proteins.
- Analysing structure-activity relationships (SAR) based on molecular properties like LogP and Molecular Weight.
- Developing tools for drug-likeness and toxicity assessment in early-stage drug discovery.
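As one possible starting point for the drug-likeness use case, the sketch below reads rows with the nine documented column names and counts violations of Lipinski's rule of five (MW over 500, LogP over 5, more than 5 hydrogen bond donors, more than 10 hydrogen bond acceptors). The rule of five is a widely used heuristic chosen here for illustration; the listing does not say it was applied to this data set. The two sample rows are made-up placeholders, not records from Kinase_final_data.csv, and their values are not chemically consistent with the placeholder SMILES.

```python
import csv
import io


def lipinski_violations(row):
    """Count rule-of-five violations using the descriptor columns of one row."""
    checks = (
        float(row["MW"]) > 500,            # molecular weight above 500
        float(row["LogP"]) > 5,            # LogP above 5
        float(row["NumHDonors"]) > 5,      # more than 5 H-bond donors
        float(row["NumHAcceptors"]) > 10,  # more than 10 H-bond acceptors
    )
    return sum(checks)


# Placeholder rows that mirror the documented header; the values are invented.
SAMPLE = """molecule_chembl_id,canonical_smiles,standard_value,class,MW,LogP,NumHDonors,NumHAcceptors,pIC50
PLACEHOLDER_1,CCO,100.0,active,350.4,3.2,2,5,7.0
PLACEHOLDER_2,CCN,50000.0,inactive,612.7,6.1,1,8,4.3
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
violations = {row["molecule_chembl_id"]: lipinski_violations(row) for row in rows}
print(violations)
```

To run the same check on the real file, the io.StringIO(SAMPLE) handle could be replaced with open("Kinase_final_data.csv", newline=""), assuming the file uses the header shown and has no missing descriptor values.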
Coverage
The data covers small compounds curated from the ChEMBL database. The scope is defined by the interaction with 11 major tyrosine kinases: ABL (1970 compounds), EGFR (9508 compounds), PDGFR (1750 compounds), FGFR (3156 compounds), MET (3950 compounds), VEGFR (1211 compounds), KIT (1692 compounds), RET (896 compounds), JAK (4203 compounds), ALK (2087 compounds), and SRC (3846 compounds). The coverage is chemical and biological in nature, focusing solely on bioactivity data.
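The per-kinase counts above add up to more than the 28,314 unique compounds reported for the registry, which suggests that some compounds are recorded against more than one kinase. The short check below makes that arithmetic explicit; the counts are copied from the text, while the interpretation of the surplus as multi-target compounds is an assumption.

```python
# Per-kinase compound counts as stated in the Coverage text.
KINASE_COUNTS = {
    "ABL": 1970, "EGFR": 9508, "PDGFR": 1750, "FGFR": 3156,
    "MET": 3950, "VEGFR": 1211, "KIT": 1692, "RET": 896,
    "JAK": 4203, "ALK": 2087, "SRC": 3846,
}
UNIQUE_COMPOUNDS = 28314  # unique compounds stated in the About section

total_pairs = sum(KINASE_COUNTS.values())
surplus = total_pairs - UNIQUE_COMPOUNDS
largest = max(KINASE_COUNTS, key=KINASE_COUNTS.get)

print(f"compound-kinase pairs: {total_pairs}")
print(f"surplus over unique compounds: {surplus}")
print(f"largest target set: {largest} ({KINASE_COUNTS[largest]} compounds)")
```

The uneven target sizes (EGFR is roughly ten times larger than RET) are worth keeping in mind when splitting data for per-kinase models.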
License
CC BY-NC-SA 4.0
Who Can Use It
- Researchers and Academics: For biological research, particularly in cancer and signalling pathways.
- Data Scientists/Machine Learning Engineers: To build predictive models for drug property estimation.
- Chemists and Drug Developers: For analysing ligand efficiency and screening potential drug candidates.
Dataset Name Suggestions
- Tyrosine Kinase Ligand Bioactivity Database
- Small Molecule Kinase Inhibitor Registry
- ChEMBL Curated Kinase Bioactivity Data
- Predictive Tyrosine Kinase Bioactivity
Attributes
Original Data Source: ChEMBL Curated Kinase Bioactivity Data
