Half-life versus Length Proteomics Data
Synthetic Biology & Genetic Engineering
Free
About
This dataset explores the relationship between protein half-lives and protein length. Understanding how proteins are maintained, a process known as proteostasis, matters in both health and disease, and this data is intended to support that study. The main goal is to determine whether a meaningful correlation exists between the measured half-life of a protein and its length, defined as its total number of amino acids. The half-life measurements are sourced from a scientific publication and have been enriched with protein length information taken from the UniProt database.
Columns
The data file contains 32 distinct columns.
- gene_name: The identifier for the protein.
- Length: The size of the protein, given as its total number of amino acids.
- Half-life columns (e.g., Bcells replicate 1 half_life): The measured stability of that protein, expressed as half-life in hours, recorded across various non-dividing cell types (including Bcells, NK cells, Hepatocytes, Monocytes, and Mouse Neurons) and their respective replicates.
- dataQual columns (e.g., Bcells replicate 1 dataQual): Indicates the quality of the half-life measurement. Quality is marked as ‘good’ if protein fold changes across three out of four time points were based on a minimum of three quantified peptides. It is marked as ‘weak’ if a fold change could be determined in at least three out of the four time points.
- R_sq columns (e.g., Bcells replicate 1 R_sq): The coefficient of determination (R squared), calculated by the original authors between the four log10-transformed half-lives measured in the four different human cell types.
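The column layout above can be sketched in code. The example below is a minimal, standard-library-only illustration of grouping the measurement columns by cell type and replicate, then keeping only half-lives whose quality flag is good. It assumes every measurement column follows the "<cell type> replicate <n> <suffix>" pattern of the examples in the list, that the quality values are the plain strings good and weak, and that missing cells are empty or non-numeric. The function names and the three sample rows are invented for illustration and are not taken from the dataset.

```python
import csv
import io
import math

# Suffixes taken from the column examples in the list above.
SUFFIXES = ("half_life", "dataQual", "R_sq")


def group_columns(header):
    """Map each prefix (e.g. 'Bcells replicate 1') to its measurement columns."""
    groups = {}
    for name in header:
        for suffix in SUFFIXES:
            if name.endswith(" " + suffix):
                prefix = name[: -(len(suffix) + 1)]
                groups.setdefault(prefix, {})[suffix] = name
    return groups


def good_half_lives(rows, groups, prefix):
    """Return (gene_name, half_life) pairs whose dataQual value is 'good'."""
    cols = groups[prefix]
    kept = []
    for row in rows:
        quality = (row.get(cols["dataQual"]) or "").strip().lower()
        if quality != "good":
            continue
        try:
            half_life = float(row.get(cols["half_life"]))
        except (TypeError, ValueError):
            continue  # assumption: missing values are empty or non-numeric
        if not math.isnan(half_life):
            kept.append((row["gene_name"], half_life))
    return kept


# Made-up rows, used only to show the expected shape of the file.
SAMPLE = """gene_name,Length,Bcells replicate 1 half_life,Bcells replicate 1 dataQual,Bcells replicate 1 R_sq
GENE_A,120,35.5,good,0.98
GENE_B,450,,,
GENE_C,300,80.0,weak,0.91
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
groups = group_columns(rows[0].keys())
good = good_half_lives(rows, groups, "Bcells replicate 1")
print(good)
```

To run this against the real file, replace the sample with rows read from data.csv, for example by passing an open file handle to csv.DictReader in place of the io.StringIO object.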
Distribution
The data is provided in a standard tabular format within a CSV file (data.csv), which is 1.19 MB in size. The dataset includes 8571 total records. While the gene_name and Length columns are fully populated (100% valid), the various half-life and quality measurement columns contain a high volume of missing values. The percentage of missing data varies depending on the specific cell replicate measured, ranging from a low of 29% (Mouse Neurons replicate 4) to a high of 70% (NK cells replicate 1).
Usage
The data is ideal for quantitative statistical analysis aimed at establishing the connection between molecular structure (length) and biological stability (half-life). It can be utilised for biological modelling, hypothesis testing related to the mechanisms governing protein turnover, and exploratory research in biophysics.
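As one possible starting point for the analysis described above, the sketch below computes a Spearman rank correlation between Length and a single half-life column, using only the Python standard library. Rank correlation is a choice made for this example because it does not assume a linear relationship; the dataset itself does not prescribe a method. Rows with a missing value in either column are skipped, and the number of rows actually used is returned alongside the coefficient, since many half-life cells are empty. The helper names and the demo rows are hypothetical.

```python
import math


def parse_float(text):
    """Return a float, or None for empty, non-numeric, or NaN values."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(value) else value


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; None if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return None
    return cov / (sx * sy)


def spearman_length_vs_half_life(rows, half_life_col):
    """Return (rho, n): Spearman's rho over rows where both values are present."""
    pairs = []
    for row in rows:
        length = parse_float(row.get("Length"))
        half_life = parse_float(row.get(half_life_col))
        if length is not None and half_life is not None:
            pairs.append((length, half_life))
    if len(pairs) < 3:
        return None, len(pairs)
    xs, ys = zip(*pairs)
    return pearson(ranks(xs), ranks(ys)), len(pairs)


# Made-up rows for illustration; the third row has a missing half-life.
COL = "Bcells replicate 1 half_life"
demo_rows = [
    {"Length": "100", COL: "10"},
    {"Length": "200", COL: "20"},
    {"Length": "300", COL: ""},
    {"Length": "400", COL: "15"},
    {"Length": "500", COL: "40"},
]
rho, n_used = spearman_length_vs_half_life(demo_rows, COL)
print(rho, n_used)
```

With the real file, the rows could come from csv.DictReader over data.csv, and the function can be called once per half-life column to compare cell types and replicates. Reporting the row count next to each coefficient helps show how much each estimate is affected by missing values.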
Coverage
The data focuses on protein measurements within specific non-dividing cell environments, including several human cell types (Bcells, Natural Killer (NK) cells, Hepatocytes, and Monocytes), as well as Mouse Neurons. The data is derived from the latest available publication concerning protein half-lives.
License
CC0: Public Domain
Who Can Use It
- Biochemists: Investigating cellular mechanisms of protein degradation and stability (proteostasis).
- Data Scientists/Statisticians: Performing regression analysis to correlate protein attributes across varied biological replicates.
- Academics and Students: Utilizing foundational proteomics data for educational projects and research in life sciences.
Dataset Name Suggestions
- Protein Kinetic and Structural Metrics
- Human Cell Protein Turnover Metrics
- Half-life versus Length Proteomics Data
Attributes
Original Data Source: Half-life versus Length Proteomics Data
