Opendatabay APP

Half-life versus Length Proteomics Data

Synthetic Biology & Genetic Engineering

Tags and Keywords

Protein

Half-life

Biology

Proteostasis

Length

Free

About

This dataset explores the relationship between protein half-lives and protein length. Understanding how cells maintain their proteins, a process known as proteostasis, matters for the study of health and disease, and this data supports that work. The central question is whether a meaningful correlation exists between the measured half-life of a protein and its length, defined as the total number of amino acids. The half-life measurements come from a published scientific study and have been enriched with verified protein lengths from the UniProt database.
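One way to test for such a correlation is sketched below. It is a minimal sketch, assuming Python's standard library only, rows read with `csv.DictReader` from data.csv, and missing values stored as blank cells or the text `NaN` (the file's encoding of missing values is not documented). It uses Spearman's rank correlation, chosen here because half-lives can span orders of magnitude and the relationship need not be linear; this is not necessarily the method used by the original authors. All function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def parse_cell(value: Optional[str]) -> Optional[float]:
    """Parse one CSV cell; blanks and 'NaN' are treated as missing (an assumption)."""
    if value is None or value.strip() == "":
        return None
    try:
        x = float(value)
    except ValueError:
        return None
    return None if math.isnan(x) else x


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # 1-based mean position of the tie block
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined: one variable is constant")
    return cov / (sx * sy)


def length_vs_half_life(rows: List[Dict[str, str]], half_life_col: str) -> Tuple[float, int]:
    """Return (rho, n) over proteins that have both a Length and a value in half_life_col."""
    pairs = []
    for row in rows:
        length = parse_cell(row.get("Length"))
        half_life = parse_cell(row.get(half_life_col))
        if length is not None and half_life is not None:
            pairs.append((length, half_life))
    if len(pairs) < 3:
        raise ValueError("need at least three complete pairs")
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    return spearman(xs, ys), len(pairs)
```

For example, after loading the rows with `csv.DictReader`, calling `length_vs_half_life(rows, "Bcells replicate 1 half_life")` returns the rank correlation and the number of proteins that had both values. Because roughly a third or more of each half-life column is empty, reporting `n` alongside rho is worthwhile.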

Columns

The data file contains 32 distinct columns.
  • gene_name: The identifier for the protein.
  • Length: The length of the protein, given as the total number of amino acids.
  • Half-life columns (e.g., Bcells replicate 1 half_life): The measured stability of the protein, expressed as a half-life in hours, recorded for each non-dividing cell type (Bcells, NK cells, Hepatocytes, Monocytes, and Mouse Neurons) and replicate.
  • dataQual columns (e.g., Bcells replicate 1 dataQual): Indicates the quality of the half-life measurement. A measurement is marked ‘good’ if the protein fold changes at three of the four time points were each based on at least three quantified peptides, and ‘weak’ if a fold change could be determined at three or more of the four time points.
  • R_sq columns (e.g., Bcells replicate 1 R_sq): The coefficient of determination (R²), calculated by the original authors between the four log10-transformed half-lives measured in the four different human cell types.
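The dataQual rule can be written as a small classifier. This is a minimal sketch, assuming per-time-point peptide counts were available (the CSV itself only ships the resulting label) and that a fold change counts as determined when at least one peptide is quantified; the function name and the input encoding are hypothetical.

```python
from typing import Optional, Sequence


def classify_quality(peptides_per_time_point: Sequence[int]) -> Optional[str]:
    """Apply the dataQual rule from the column description to one protein in one replicate.

    peptides_per_time_point: number of quantified peptides behind the fold change
    at each of the four time points; 0 means no fold change could be determined
    (an assumed encoding).
    """
    if len(peptides_per_time_point) != 4:
        raise ValueError("expected exactly four time points")
    # 'good': at least three time points whose fold change rests on >= 3 peptides.
    well_supported = sum(1 for n in peptides_per_time_point if n >= 3)
    if well_supported >= 3:
        return "good"
    # 'weak': a fold change exists (assumed: >= 1 peptide) at >= 3 time points.
    determined = sum(1 for n in peptides_per_time_point if n >= 1)
    if determined >= 3:
        return "weak"
    # The description does not cover this case; None marks it as unclassified.
    return None
```

Checking ‘good’ before ‘weak’ matters: any measurement that meets the ‘good’ condition also meets the ‘weak’ one, so the stricter label has to win.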

Distribution

The data is provided as a single CSV file (data.csv, 1.19 MB) containing 8,571 records. The gene_name and Length columns are fully populated (100% valid), but the half-life and quality columns contain many missing values. The share of missing data depends on the cell type and replicate, ranging from 29% (Mouse Neurons replicate 4) to 70% (NK cells replicate 1).
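The missing-value percentages quoted above can be recomputed column by column. A minimal sketch using only the standard library, assuming rows come from `csv.DictReader` and that a missing value is stored as an empty cell or the text `NaN` (the file's actual encoding is not documented); the helper names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

MISSING_TOKENS = {"", "nan"}  # assumed spellings of a missing value (case-insensitive)


def is_missing(value: Optional[str]) -> bool:
    return value is None or value.strip().lower() in MISSING_TOKENS


def missing_percentages(rows: List[Dict[str, Optional[str]]]) -> Dict[str, float]:
    """Percentage of rows with a missing value in each column."""
    if not rows:
        return {}
    columns = list(rows[0].keys())
    counts = dict.fromkeys(columns, 0)
    for row in rows:
        for col in columns:
            if is_missing(row.get(col)):
                counts[col] += 1
    return {col: 100.0 * counts[col] / len(rows) for col in columns}


def missing_range(
    pcts: Dict[str, float], suffix: str = "half_life"
) -> Tuple[Tuple[str, float], Tuple[str, float]]:
    """Least and most incomplete columns whose names end with suffix."""
    subset = {col: p for col, p in pcts.items() if col.endswith(suffix)}
    if not subset:
        raise ValueError(f"no columns end with {suffix!r}")
    lowest = min(subset, key=subset.__getitem__)
    highest = max(subset, key=subset.__getitem__)
    return (lowest, subset[lowest]), (highest, subset[highest])
```

If the encoding assumption holds, `missing_range(missing_percentages(rows))` over all rows of data.csv gives the least and most complete half-life columns, which can be checked against the figures above.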

Usage

The data suits quantitative statistical analysis testing whether protein length is connected to biological stability (half-life). It can be used for biological modelling, for testing hypotheses about the mechanisms that govern protein turnover, and for exploratory research in biophysics.
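A minimal sketch of one such analysis is an ordinary least-squares fit of log10(half-life) against log10(length). The log-log form is an assumption made here because it turns a power law into a straight line: the slope estimates the scaling exponent, and a slope near zero would suggest that half-life does not depend on length. Inputs are plain lists of paired values (for example, the complete pairs from one half-life column); `log_log_fit` is a hypothetical helper, and a formal significance test on the slope is not included.

```python
import math
from typing import Sequence, Tuple


def log_log_fit(lengths: Sequence[float], half_lives: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares of log10(half-life) on log10(length).

    Returns (slope, intercept, r_squared). A slope b corresponds to the
    power law half_life ~ 10**intercept * length**b.
    """
    # Logarithms need positive values, so non-positive pairs are dropped.
    pairs = [
        (math.log10(x), math.log10(y))
        for x, y in zip(lengths, half_lives)
        if x > 0 and y > 0
    ]
    if len(pairs) < 3:
        raise ValueError("need at least three positive pairs")
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("fit undefined: a variable is constant on the log scale")
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)  # equals Pearson r squared for one predictor
    return slope, intercept, r_squared
```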

Coverage

The data covers protein measurements in non-dividing cells: four human cell types (Bcells, Natural Killer (NK) cells, Hepatocytes, and Monocytes) and Mouse Neurons. The half-life values are derived from the latest publication on protein half-lives that was available when the dataset was compiled.

License

CC0: Public Domain

Who Can Use It

  • Biochemists: Investigating cellular mechanisms of protein degradation and stability (proteostasis).
  • Data Scientists/Statisticians: Performing regression analysis to correlate protein attributes across varied biological replicates.
  • Academics and Students: Utilizing foundational proteomics data for educational projects and research in life sciences.
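Comparing replicates is one concrete form of the regression analysis mentioned above. A minimal sketch, assuming two half-life column names are passed in and missing values are blank or `NaN`: it computes R² between the log10 half-lives of the proteins measured in both columns. `replicate_r_squared` is a hypothetical helper for exploring replicate agreement; it is not claimed to reproduce the R_sq values supplied by the original authors.

```python
import math
from typing import Dict, List, Optional, Tuple


def positive_float(value: Optional[str]) -> Optional[float]:
    """Parse a cell as a finite positive number, else None (blanks/NaN count as missing)."""
    try:
        x = float(value)
    except (TypeError, ValueError):
        return None
    return x if math.isfinite(x) and x > 0 else None


def replicate_r_squared(
    rows: List[Dict[str, Optional[str]]], col_a: str, col_b: str
) -> Tuple[float, int]:
    """R^2 between log10 half-lives in two columns, over proteins measured in both."""
    xs: List[float] = []
    ys: List[float] = []
    for row in rows:
        a = positive_float(row.get(col_a))
        b = positive_float(row.get(col_b))
        if a is not None and b is not None:
            xs.append(math.log10(a))
            ys.append(math.log10(b))
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three proteins measured in both columns")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 undefined: one column is constant")
    # For a single predictor with an intercept, R^2 equals the squared Pearson r.
    return sxy ** 2 / (sxx * syy), n
```

For example, `replicate_r_squared(rows, "Bcells replicate 1 half_life", "Bcells replicate 2 half_life")` would compare two replicates, assuming both columns exist under those names.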

Dataset Name Suggestions

  1. Protein Kinetic and Structural Metrics
  2. Human Cell Protein Turnover Metrics
  3. Half-life versus Length Proteomics Data

Listing Stats

  • Views: 3
  • Downloads: 0
  • Listed: 02/11/2025
  • Region: Global
  • Quality (Universal Data Quality Score, UDQS): 5 / 5
  • Version: 1.0
