Simulated Mushroom Edibility Prediction Data
Synthetic Data Generation
Free
About
This resource contains simulated data designed for binary classification of fungi. It includes 61,069 hypothetical records, each labelled as either definitely edible or definitely poisonous/not recommended. The data was created using a Python module that randomizes both nominal and metrical (numeric) variables, expanding upon features derived from a smaller, primary mushroom dataset. The simulation is useful for training and testing machine learning models that predict edibility.
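The generator is described only as randomizing nominal and metrical variables. As a minimal sketch of that idea, the code below assumes a hypothetical per-species profile (a value range for each metrical column and a list of allowed values for each nominal column), draws metrical values from a normal distribution clipped to the range, and picks nominal values uniformly. The profile format, the distributions, and the function names are illustrative assumptions, not the original module's API.

```python
import random

# Hypothetical species profile (assumption): metrical columns map to (min, max)
# ranges, nominal columns map to lists of allowed category codes.
SPECIES_PROFILE = {
    "class": "p",
    "metrical": {"cap-diameter": (5.0, 10.0), "stem-height": (4.0, 9.0), "stem-width": (10.0, 20.0)},
    "nominal": {"cap-shape": ["x", "f"], "cap-color": ["o", "n", "y"]},
}

def simulate_mushroom(profile, rng):
    """Draw one simulated record from a species profile (illustrative sketch)."""
    record = {"class": profile["class"]}
    for col, (low, high) in profile["metrical"].items():
        # Assumption: normal draw centred on the range midpoint, clipped to the range.
        mean = (low + high) / 2
        sd = (high - low) / 4
        value = min(max(rng.gauss(mean, sd), low), high)
        record[col] = round(value, 2)
    for col, choices in profile["nominal"].items():
        # Assumption: every allowed value is equally likely.
        record[col] = rng.choice(choices)
    return record

def simulate_species(profile, n, seed=0):
    """Generate n records for one species with a reproducible random stream."""
    rng = random.Random(seed)
    return [simulate_mushroom(profile, rng) for _ in range(n)]

records = simulate_species(SPECIES_PROFILE, n=353)
print(len(records))  # 353
```

Using a seeded `random.Random` instance rather than the global generator keeps each species' draws reproducible and independent of other code.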
Columns
The dataset has 23 columns: the target variable, two identifying columns (family and name), and 20 attributes describing the physical characteristics of the simulated fungi. These attributes cover the morphological traits commonly used in classification:
- family: The taxonomic family of the mushroom.
- name: The specific name of the fungus.
- class: The target variable, indicating edibility (edible or poisonous).
- cap-diameter, cap-shape, cap-surface, cap-color: Metrics and descriptions related to the mushroom's cap structure.
- does-bruise-or-bleed: Whether the mushroom bruises or bleeds.
- gill-attachment, gill-spacing, gill-color: Variables describing the gills.
- stem-height, stem-width, stem-root, stem-surface, stem-color: Detailed attributes of the stem.
- veil-type, veil-color, has-ring, ring-type, spore-print-color, habitat, season: Other key features often used in fungal identification.
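A minimal loading sketch using only the standard library. It assumes that the three size measurements (cap-diameter, stem-height, stem-width) are the metrical columns, that missing values appear as empty fields, and that the files are comma-delimited; the delimiter is exposed as a parameter in case that assumption does not hold. `read_records` is a hypothetical helper, not something shipped with the dataset, and the sample rows contain made-up values.

```python
import csv
import io

# Assumption: these are the metrical (numeric) columns; all others except the
# target are treated as nominal strings.
METRICAL = ["cap-diameter", "stem-height", "stem-width"]
TARGET = "class"

def read_records(text, delimiter=","):
    """Parse CSV text into (features, label) pairs.

    Metrical columns are converted to float; empty fields become None.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for row in reader:
        label = row.pop(TARGET)
        for col in METRICAL:
            if col in row:
                row[col] = float(row[col]) if row[col] != "" else None
        rows.append((row, label))
    return rows

# Made-up sample rows for illustration only.
sample = (
    "class,cap-diameter,cap-shape,stem-height,stem-width,habitat\n"
    "p,15.26,x,16.95,17.09,d\n"
    "e,4.1,f,5.2,6.3,g\n"
)
records = read_records(sample)
features, label = records[0]
```

For a real file, the text could come from `open("secondary_data_shuffled.csv", newline="").read()`; opening with `newline=""` is what the `csv` module documentation recommends.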
Distribution
This collection consists of 61,069 records of simulated mushrooms, generated from 173 species with 353 mushrooms per species (173 × 353 = 61,069). The files are provided in CSV format in two versions: one ordered by species (secondary_data_generated.csv) and one randomly shuffled (secondary_data_shuffled.csv). The dataset is static, and no updates are expected.
Usage
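The ordered file described under Distribution groups all 353 records of each species together, so row order matters when splitting it into training and test sets. The sketch below, assuming the `name` column identifies the species, checks that every species has the stated number of records and shuffles a copy of the rows before cutting off a test set. `check_balance` and `shuffled_split` are hypothetical helpers; the 80/20 split and the seed are arbitrary choices.

```python
import random
from collections import Counter

SPECIES = 173
PER_SPECIES = 353
assert SPECIES * PER_SPECIES == 61_069  # matches the stated record count

def check_balance(species_names, expected_per_species=PER_SPECIES):
    """Return {species: count} for every species whose count differs from the expected value."""
    counts = Counter(species_names)
    return {name: n for name, n in counts.items() if n != expected_per_species}

def shuffled_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then return (train, test).

    Taking the last rows of the species-ordered file without shuffling would
    put only the last species in file order into the test set.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]
```

With rows loaded as dicts, `check_balance(row["name"] for row in rows)` should return an empty dict if every species has exactly 353 records.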
Ideal applications for this data include:
- Developing and evaluating machine learning algorithms for binary classification, particularly for toxicity prediction.
- Studying the outcomes of using randomized nominal and metrical variables in data science projects.
- Educational training on data generation techniques and the process of expanding primary datasets.
- Exploratory data analysis focused on correlations between fungal morphology and edibility.
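As a starting point for both exploratory analysis and model evaluation, the sketch below cross-tabulates class labels against one nominal feature and scores a one-rule (OneR-style) baseline that predicts the majority class for each feature value. It assumes rows are dicts keyed by column name; the toy rows, including the `p`/`e` class codes, are made up for illustration and are not taken from the dataset.

```python
from collections import Counter, defaultdict

def edibility_by_value(rows, feature, target="class"):
    """Count class labels for each value of one nominal feature."""
    table = defaultdict(Counter)
    for row in rows:
        table[row[feature]][row[target]] += 1
    return table

def one_rule_accuracy(rows, feature, target="class"):
    """Accuracy of predicting the majority class per feature value (OneR-style baseline).

    Scored on the same rows it was fitted on, so the estimate is optimistic.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    table = edibility_by_value(rows, feature, target)
    correct = sum(counts.most_common(1)[0][1] for counts in table.values())
    return correct / len(rows)

# Made-up toy rows; "p" and "e" are assumed class codes.
rows = [
    {"cap-shape": "x", "class": "p"},
    {"cap-shape": "x", "class": "p"},
    {"cap-shape": "x", "class": "e"},
    {"cap-shape": "f", "class": "e"},
]
print(dict(edibility_by_value(rows, "cap-shape")["x"]))  # {'p': 2, 'e': 1}
print(one_rule_accuracy(rows, "cap-shape"))  # 0.75
```

A baseline like this gives a floor that more complex classifiers should beat; for an honest estimate, fit the table on a training split and score it on held-out rows.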
Coverage
The scope of this dataset is purely hypothetical: the records are simulated from characteristics of 173 real mushroom species. Because the data is artificially generated via randomization, it has no real-world geographical or temporal restrictions. Its morphological variety across these species provides enough structure for a non-trivial classification exercise.
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
- Machine Learning Practitioners: For training robust classification models without needing large, verified real-world samples.
- Students and Educators: For practical demonstrations of data simulation, feature randomization, and classification project workflow.
- Researchers: To explore how structural attributes correlate with edibility in a controlled, hypothetical environment.
Dataset Name Suggestions
- Simulated Mushroom Edibility Prediction Data
- Fungal Classification Data (Secondary Simulation)
- Hypothetical Fungi Edibility Schema
Attributes
Original Data Source: Simulated Mushroom Edibility Prediction Data
