
Tox24 TTR Binding Prediction Data

Data Science and Analytics

Tags and Keywords: TTR, Chemical, Binding, Prediction, SMILES


About

This dataset comes from the Tox24 Challenge and focuses on predicting chemical binding activity against the target protein transthyretin (TTR). It is a practical, real-world example of using machine learning to forecast chemical activity against a specific biological target. The data include several processed representations of the SMILES notation for the 1512 competition chemicals, along with supplemental tables from the article accompanying the challenge: assay reaction components, a list of autofluorescent chemicals, and a list of chemicals excluded from the analysis due to interference.

Columns

The dataset contains eleven columns covering subset assignment, chemical identity, the activity label, and structural representations derived from SMILES:
  • dataset: Defines the designated subset for modelling purposes (e.g., training, blind test, or other).
  • Chemical: The systematic name of the chemical compound (with 1512 unique values).
  • activity: The primary label, representing the median percentage activity recorded against TTR.
  • pubchem_smiles: The SMILES notation retrieved directly from PubChem.
  • alogps_smiles: The initial SMILES representation.
  • pubchem_smiles_cleaned: The cleaned version of the PubChem SMILES.
  • alogps_smiles_cleaned: The cleaned version of the initial SMILES.
  • pubchem_smiles_no_iso_atoms: Cleaned SMILES with isolated atoms removed.
  • pubchem_smiles_no_salts: Cleaned SMILES with salts removed.
  • pubchem_smiles_no_iso_atoms_and_dup: Cleaned SMILES with isolated atoms and duplicate fragments removed.
  • alogps_smiles_no_salts: SMILES notation with salts removed.
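
The listing does not say how the cleaned columns were produced. As a rough illustration only, the sketch below shows one common way to handle multi-fragment SMILES: splitting on the "." separator, dropping repeated fragments, and keeping the longest fragment as a crude form of salt removal. The function names are hypothetical, and the length-based heuristic is an assumption rather than the procedure used to build this dataset.

```python
def split_fragments(smiles):
    """Split a SMILES string into its dot-separated fragments."""
    return [frag for frag in smiles.split(".") if frag]


def drop_duplicate_fragments(smiles):
    """Keep the first occurrence of each fragment, preserving order."""
    seen = set()
    kept = []
    for frag in split_fragments(smiles):
        if frag not in seen:
            seen.add(frag)
            kept.append(frag)
    return ".".join(kept)


def keep_largest_fragment(smiles):
    """Crude salt stripping: keep the longest fragment by string length.

    Assumption: string length is a stand-in for fragment size. A real
    pipeline would compare atom counts using a cheminformatics toolkit.
    """
    frags = split_fragments(smiles)
    return max(frags, key=len) if frags else ""


sodium_acetate = "CC(=O)[O-].[Na+]"
print(keep_largest_fragment(sodium_acetate))
print(drop_duplicate_fragments("[Na+].[Na+].[Cl-]"))
```

Comparing the output of helpers like these against the provided columns is one way to check what each cleaning step actually changed for a given chemical.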

Distribution

The core data is in the all_smiles_data.csv file (approximately 434.41 kB), a tabular dataset with 1512 valid records across 11 columns. The records are partitioned for model development: roughly 67% for training, 20% for the blind test set, and 13% classified as 'other' (200 records). Note that the target variable, activity, is missing for 300 records, about 20% of all observations.
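
The split proportions and the missing-label count can be checked with the standard library alone. The sketch below is a minimal example that runs on a tiny made-up sample. The subset labels ("training", "blind", "other") and all sample values are assumptions for illustration, so the actual strings in the dataset column may differ.

```python
import csv
import io
from collections import Counter


def summarize(rows):
    """Return (rows per subset, number of rows with an empty activity)."""
    subsets = Counter(row["dataset"] for row in rows)
    missing = sum(1 for row in rows if not row["activity"].strip())
    return subsets, missing


# Toy stand-in for all_smiles_data.csv; labels and values are made up.
sample = io.StringIO(
    "dataset,Chemical,activity\n"
    "training,compound A,12.5\n"
    "training,compound B,80.1\n"
    "blind,compound C,\n"
    "other,compound D,3.0\n"
)
rows = list(csv.DictReader(sample))

subsets, missing = summarize(rows)
for name, count in subsets.items():
    print(f"{name}: {count} ({count / len(rows):.0%})")
print(f"missing activity: {missing} ({missing / len(rows):.0%})")
```

To run this on the real file, replace the sample with open("all_smiles_data.csv", newline=""). On the full data, 300 missing values out of 1512 records is about 19.8%, and 200 'other' records is about 13.2%, consistent with the rounded figures above.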

Usage

This collection is suited to several scientific and technical applications, including:
  • Developing machine learning models to predict chemical binding activity, for example with gradient-boosting libraries such as XGBoost and LightGBM.
  • Supporting fundamental drug design research and the identification of lead compounds.
  • Conducting detailed studies on protein-ligand interactions.
  • Evaluating the predictive performance of different molecular descriptors derived from various SMILES representations.
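
As a starting point for comparing SMILES representations, the sketch below collects (SMILES, activity) pairs from the training subset for a chosen representation column, skipping rows with no label. The helper name build_training_pairs, the "training" label, and the toy rows are assumptions made for illustration. Featurisation and model fitting are deliberately left out.

```python
def build_training_pairs(rows, smiles_column, train_label="training"):
    """Collect (smiles, activity) pairs for labelled training rows.

    Assumption: rows are dicts keyed by column name, as returned by
    csv.DictReader, and missing values appear as empty strings.
    """
    pairs = []
    for row in rows:
        if row["dataset"] != train_label:
            continue
        smiles = row[smiles_column].strip()
        activity = row["activity"].strip()
        if not smiles or not activity:
            continue  # skip rows without a structure or a label
        pairs.append((smiles, float(activity)))
    return pairs


# Toy rows; the values are made up and are not taken from the dataset.
toy_rows = [
    {"dataset": "training", "activity": "42.0",
     "pubchem_smiles_cleaned": "CC(=O)[O-].[Na+]",
     "pubchem_smiles_no_salts": "CC(=O)[O-]"},
    {"dataset": "training", "activity": "",
     "pubchem_smiles_cleaned": "CCO",
     "pubchem_smiles_no_salts": "CCO"},
    {"dataset": "blind", "activity": "",
     "pubchem_smiles_cleaned": "c1ccccc1",
     "pubchem_smiles_no_salts": "c1ccccc1"},
]

for column in ("pubchem_smiles_cleaned", "pubchem_smiles_no_salts"):
    print(column, build_training_pairs(toy_rows, column))
```

Running the same downstream model on the pairs built from each column gives a like-for-like comparison, since only the input representation changes.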

Coverage

The scope of this data is strictly limited to the chemical compounds screened during the Tox24 Challenge and their measured binding responses. This includes chemicals screened using both single-concentration and concentration-response testing. The data covers only chemical properties and toxicological responses related to TTR binding, so it has no geographical or demographic dimensions. The data is static, with an expected update frequency listed as 'Never'.

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

  • Cheminformatics Researchers: For applying computational methods to analyse chemical structure data.
  • Toxicologists and Pharmacologists: For studying toxicity prediction and drug efficacy related to protein binding.
  • Data Scientists and Machine Learning Engineers: For building predictive models of biological activity and chemical properties.

Dataset Name Suggestions

  • Tox24 TTR Binding Prediction Data
  • Transthyretin Chemical Activity Dataset
  • SMILES Representations for TTR Modelling

Attributes

Original Data Source: Tox24 TTR Binding Prediction Data

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 11/11/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
