
Extended Crab Age Regression Data

Synthetic Data Generation

Tags and Keywords

Crab

Age

Prediction

Regression

Tabular



Free

About

This dataset contains 200,000 synthetically generated observations that extend the Crab Age Prediction dataset. The tabular data is intended primarily for regression and deep learning models that predict a crab's age from its physical measurements. It was created for a Kaggle Playground competition (Season 3, Episode 16) and is recommended for training rather than validation, since validating on synthetic rows can produce overly optimistic scores. Because the data is synthetic, it may contain noise, such as numerical values in the 'Sex' column or zero values for height.
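The noise described above can be screened out before training. The following is a minimal sketch using only the Python standard library. It assumes the CSV headers match the column names listed on this page and that the valid 'Sex' codes are M, F, and I; the sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io

# Assumption: the valid category codes are M, F and I.
VALID_SEX = {"M", "F", "I"}


def is_clean(row):
    """Return True if the row has a recognised Sex code and a positive Height."""
    if row["Sex"] not in VALID_SEX:
        return False
    try:
        return float(row["Height"]) > 0
    except ValueError:
        # A Height that cannot be parsed as a number is treated as noise.
        return False


def load_clean_rows(handle):
    """Read CSV rows from an open text handle and keep only the clean ones."""
    return [row for row in csv.DictReader(handle) if is_clean(row)]


# Made-up sample: row 1 has a numeric Sex value, row 2 has zero Height.
sample = io.StringIO(
    "id,Sex,Length,Diameter,Height,Weight,Shucked Weight,"
    "Viscera Weight,Shell Weight,Age\n"
    "0,M,1.5,1.2,0.4,25.0,11.0,5.5,7.0,9\n"
    "1,0.025,1.4,1.1,0.3,20.0,9.0,4.0,6.0,8\n"
    "2,F,1.3,1.0,0.0,18.0,8.0,3.5,5.0,7\n"
)
clean = load_clean_rows(sample)
print([row["id"] for row in clean])
```

To run this against the real file, the same function could be called with `open("train_extended.csv", newline="")` in place of the in-memory sample. Whether to drop or repair noisy rows is a modelling choice; dropping is simply the shortest option to show.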

Columns

  • id: A unique identifier for each crab observation.
  • Sex: The sex of the crab, categorised as Male (M), Female (F), or Indeterminate (I).
  • Length: The length of the crab, measured in feet.
  • Diameter: The diameter of the crab, measured in feet.
  • Height: The height of the crab, measured in feet.
  • Weight: The total weight of the crab, measured in ounces.
  • Shucked Weight: The weight of the crab's meat after being removed from the shell, measured in ounces.
  • Viscera Weight: The weight of the internal organs in the abdominal cavity, measured in ounces.
  • Shell Weight: The weight of the crab's shell, measured in ounces.
  • Age: The age of the crab, measured in months.
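As a rough illustration of how these columns might be turned into model inputs, the sketch below one-hot encodes 'Sex', converts the seven measurements to floats, and separates 'Age' as the target. The names `to_features`, `NUMERIC_COLUMNS`, and `SEX_CODES` are hypothetical, the 'F' code for Female is an assumption, and the example row is made up. The 'id' column is left out because it is an identifier rather than a measurement.

```python
# Assumption: CSV headers match the column names listed above.
NUMERIC_COLUMNS = [
    "Length",
    "Diameter",
    "Height",
    "Weight",
    "Shucked Weight",
    "Viscera Weight",
    "Shell Weight",
]
# Assumption: the category codes are M, F and I.
SEX_CODES = ["M", "F", "I"]


def to_features(row):
    """Convert one CSV row (a dict of strings) into (feature list, target age)."""
    # One-hot encode Sex in the fixed order given by SEX_CODES.
    one_hot = [1.0 if row["Sex"] == code else 0.0 for code in SEX_CODES]
    # Parse the seven physical measurements as floats.
    numeric = [float(row[name]) for name in NUMERIC_COLUMNS]
    return one_hot + numeric, float(row["Age"])


# Made-up example row, for illustration only.
example_row = {
    "id": "0",
    "Sex": "M",
    "Length": "1.5",
    "Diameter": "1.2",
    "Height": "0.4",
    "Weight": "25.0",
    "Shucked Weight": "11.0",
    "Viscera Weight": "5.5",
    "Shell Weight": "7.0",
    "Age": "9",
}
features, age = to_features(example_row)
print(len(features), age)
```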

Distribution

The dataset is provided in a single CSV file (train_extended.csv) with a size of 14.58 MB. It consists of 200,000 rows and 10 columns. The data is structured in a standard tabular format.

Usage

  • Predictive Modelling: Ideal for building and training regression models to predict the age of crabs from their physical characteristics.
  • Machine Learning Competitions: Can be used to supplement training data for machine learning challenges, specifically those related to age prediction.
  • Data Augmentation: Can add training rows to existing crab datasets, which may improve model performance.
  • Deep Learning: Suitable for developing and testing deep learning architectures for tabular data regression tasks.
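A very small regression baseline can be sketched as follows. It fits a one-variable least-squares line (shell weight against age) in closed form and scores it with mean absolute error. Both the choice of feature and the choice of metric are assumptions made for illustration, and the numbers are made up rather than drawn from the dataset. A real model would use all columns and a held-out set of non-synthetic rows for validation.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Made-up, exactly linear toy data: age = 2 * shell_weight + 1.
shell_weight = [2.0, 4.0, 6.0, 8.0]
age = [5.0, 9.0, 13.0, 17.0]

slope, intercept = fit_line(shell_weight, age)
predictions = [slope * x + intercept for x in shell_weight]
error = mean_absolute_error(age, predictions)
print(slope, intercept, error)
```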

Coverage

This dataset does not have a specific geographic or time-based coverage as it is synthetically generated. It includes male, female, and indeterminate crabs, but demographic details beyond these are not applicable.

License

CC0: Public Domain

Who Can Use It

  • Data Scientists: Can use this data to develop and fine-tune machine learning models for age prediction.
  • Students and Researchers: Can use this dataset for academic projects and research in biology, ecology, and data science.
  • Machine Learning Engineers: Can leverage this dataset for benchmarking regression algorithms and data augmentation techniques.

Dataset Name Suggestions

  • Synthetic Crab Physical Measurements for Age Prediction
  • Extended Crab Age Regression Data
  • Crab Biometrics and Age Dataset (Synthetic)

Attributes

Original Data Source: Extended Crab Age Regression Data

Listing Stats

VIEWS

1

DOWNLOADS

0

LISTED

17/09/2025

REGION

GLOBAL

UDQS (UNIVERSAL DATA QUALITY SCORE)

5 / 5

VERSION

1.0
