Ribonanza External Mapping Data

Synthetic Data Generation

Tags and Keywords

RNA, Biology, Mapping, Ribonanza, Chemical

Free

About

This collection contains 143,200 RNA chemical mapping profiles drawn from the RNA Mapping DataBase (RMDB) at rmdb.stanford.edu, processed and made available on 9 November 2023. The profiles use the standard Ribonanza .csv format, so they can serve as external data for competitions such as Stanford Ribonanza. Each profile provides one-dimensional chemical mapping data (per-position reactivity) for an RNA sequence.

Columns

The dataset contains 873 columns in total. The core metadata and measurement fields are:
  • sequence_id: An arbitrary hash identifier assigned to each RNA sequence (approximately 66.8k unique values).
  • sequence: The actual RNA sequence, featuring roughly 66.4k unique RNA strings.
  • experiment_type: Indicates the chemical probe used for generating the profile (e.g., 1M7, DMS_M2_seq). There are 12 unique types recorded, with 1M7 being the most frequent (51%).
  • dataset_name: The identifier for the original experimental dataset (e.g., ETERNA_R00_0002.rdat).
  • reads: The total number of Illumina NovaSeq sequencing reads associated with the mutational profile. This column is entirely null in the current file.
  • signal_to_noise: The mean of the signal divided by the mean of the statistical error, taken over probed positions. Values range from -21.4 to 640, with a mean of 4.52.
  • SN_filter: A binary flag indicating whether the signal-to-noise ratio is greater than 1.0.
  • reactivity_0001 to reactivity_N: The chemical reactivity observed at each sequence position (position 1, position 2, position 3, and so on). Approximately 8% of records have missing values in the reactivity columns.
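
As a minimal sketch of how the column layout above might be consumed, the following Python uses only the standard library to split one row into metadata and a per-position reactivity list, treating empty cells as missing. The sample row is made up, and the helper names `split_profile` and `signal_to_noise` are illustrative assumptions rather than part of the dataset. The error values passed to `signal_to_noise` are also hypothetical, since the listing does not name an error column.

```python
import csv
import io
import re
from statistics import mean

# Tiny in-memory sample in the layout described above (all values are made up).
SAMPLE = (
    "sequence_id,sequence,experiment_type,dataset_name,reads,"
    "signal_to_noise,SN_filter,reactivity_0001,reactivity_0002,reactivity_0003\n"
    "abc123,GGA,1M7,EXAMPLE.rdat,,4.5,1,0.10,,0.30\n"
)

REACTIVITY_COLUMN = re.compile(r"reactivity_(\d+)")


def split_profile(row):
    """Separate metadata fields from per-position reactivity values."""
    positions = {}
    meta = {}
    for name, value in row.items():
        match = REACTIVITY_COLUMN.fullmatch(name)
        if match:
            # Empty cells become None so missing positions stay explicit.
            positions[int(match.group(1))] = float(value) if value else None
        else:
            meta[name] = value
    # Order by position number rather than relying on column order.
    reactivity = [positions[i] for i in sorted(positions)]
    return meta, reactivity


def signal_to_noise(signal, errors):
    """Mean signal over mean error, using positions where both are present."""
    pairs = [(s, e) for s, e in zip(signal, errors)
             if s is not None and e is not None]
    if not pairs:
        return None
    mean_error = mean(e for _, e in pairs)
    if mean_error == 0:
        return None
    return mean(s for s, _ in pairs) / mean_error


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
meta, reactivity = split_profile(rows[0])
```

Matching column names with a full regular expression, rather than a simple prefix test, keeps any other column that happens to begin with `reactivity_` out of the position list.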

Distribution

The dataset is distributed as a single CSV file, rmdb_data.v1.3.0.csv (278.44 MB), containing 143,200 profiles (rows) and 873 columns.
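
Because the file is fairly large, one option is to stream it row by row instead of loading it whole. The sketch below, which assumes only the `experiment_type` column described in this listing, tallies profiles per probe type; the function name `count_experiment_types` and the in-memory sample are illustrative.

```python
import csv
import io
from collections import Counter


def count_experiment_types(handle):
    """Stream a Ribonanza-style CSV and tally rows per experiment_type."""
    counts = Counter()
    for row in csv.DictReader(handle):
        counts[row["experiment_type"]] += 1
    return counts


# Made-up sample standing in for the real file.
sample = io.StringIO(
    "sequence_id,experiment_type\n"
    "a1,1M7\n"
    "a2,DMS_M2_seq\n"
    "a3,1M7\n"
)
counts = count_experiment_types(sample)
total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

# With the downloaded file, the same function could be called as:
# with open("rmdb_data.v1.3.0.csv", newline="") as f:
#     counts = count_experiment_types(f)
```

Since `csv.DictReader` yields one row at a time, memory use stays small regardless of file size.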

Usage

This dataset is intended for biological and machine learning work on RNA structure. Typical applications include:
  • Training models for the Stanford Ribonanza competition.
  • Investigating RNA secondary structure prediction.
  • Developing tools for analysing chemical mapping profiles.
  • Biotechnology research requiring large-scale RNA chemical mapping data.
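
For model training, a common first step is to keep only profiles that pass the signal-to-noise filter. The sketch below is one hedged way to do that. It assumes `SN_filter` is stored as `1`/`0` or `True`/`False` text, which the listing does not confirm, and falls back to the documented `signal_to_noise > 1.0` rule when the flag is empty. The function name `passes_sn_filter` and the sample rows are hypothetical.

```python
def passes_sn_filter(row):
    """Return True when a row should be kept under the SN_filter rule."""
    # Assumed encodings of the binary flag; adjust to match the actual file.
    flag = (row.get("SN_filter") or "").strip().lower()
    if flag in {"1", "1.0", "true"}:
        return True
    if flag in {"0", "0.0", "false"}:
        return False
    # Flag absent or unrecognised: fall back to the documented threshold.
    try:
        return float(row.get("signal_to_noise") or "") > 1.0
    except ValueError:
        return False


# Made-up rows as csv.DictReader would yield them (all values are strings).
rows = [
    {"sequence_id": "a1", "SN_filter": "1", "signal_to_noise": "4.5"},
    {"sequence_id": "a2", "SN_filter": "0", "signal_to_noise": "0.3"},
    {"sequence_id": "a3", "SN_filter": "", "signal_to_noise": "2.0"},
    {"sequence_id": "a4", "SN_filter": "", "signal_to_noise": ""},
]
kept = [r["sequence_id"] for r in rows if passes_sn_filter(r)]
```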

Coverage

The profiles were drawn from RMDB and processed on 9 November 2023. The dataset covers only RNA sequences and their chemical mapping measurements, so geographic and demographic scope do not apply.

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

  • Bioinformaticians: For pipeline development and large-scale data analysis on RNA profiles.
  • Machine Learning Engineers: To train predictive models, especially those related to the Ribonanza structure prediction challenge.
  • Academic Researchers: For studying RNA biology and chemical probing methodology.

Dataset Name Suggestions

  • RMDB 2023 RNA Chemical Profiles
  • Ribonanza External Mapping Data
  • RNA Mapping DataBase Profiles v1.3.0
  • Stanford RMDB Sequencing Profiles

Attributes

Original Data Source: Ribonanza External Mapping Data

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 26/10/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
