Synthetic Mushroom Overload
Synthetic Data Generation
Free
About
The dataset provides detailed descriptive attributes of fungal morphology. It is a large, synthetically generated resource intended for machine learning and deep learning work, particularly tasks that need large volumes of data for training and testing. Its main purpose is to support research in mycological feature analysis and noise component modelling.
Columns
The dataset contains 21 features capturing physical characteristics of mushrooms. Key fields include:
- class: The target variable, indicating whether the mushroom is poisonous (p) or edible (e).
- cap-diameter: A measurement of the cap's width, ranging from 0.22 up to 66.9.
- cap-shape: Categorical descriptor of the cap's form.
- cap-color: Categorical descriptor of the cap's hue, with 12 unique values.
- does-bruise-or-bleed: A boolean indicating if bruising or bleeding occurs.
- gill-color: The colour of the mushroom's gills.
- stem-height: The height of the stalk, with a mean value of 6.7.
- stem-width: The width of the stalk, with values ranging up to 119.
- habitat: The typical environment where the mushroom is found.
- season: The time of year the mushroom data pertains to; the most common value is a season beginning with 'a'.
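The ranges and missing-value rates quoted for these fields can be checked directly against the file. The following is a minimal sketch using only the Python standard library; it assumes missing values are stored as empty cells (the listing does not say how they are encoded), and the function name `column_summary` and the small in-memory sample are illustrative, not part of the dataset.

```python
import csv
import io
from collections import Counter


def column_summary(csv_file, numeric_cols=("cap-diameter", "stem-height", "stem-width")):
    """Stream a CSV and return (row_count, missing_fraction, numeric_min_max).

    Assumption: a missing value is an empty (or whitespace-only) cell.
    """
    reader = csv.DictReader(csv_file)
    missing = Counter()   # column name -> number of empty cells
    ranges = {}           # numeric column -> (min, max) seen so far
    rows = 0
    for row in reader:
        rows += 1
        for name, value in row.items():
            if name is None:
                continue  # surplus cells beyond the header are ignored
            if value is None or value.strip() == "":
                missing[name] += 1
            elif name in numeric_cols:
                x = float(value)
                lo, hi = ranges.get(name, (x, x))
                ranges[name] = (min(lo, x), max(hi, x))
    if rows == 0:
        return 0, {}, {}
    fractions = {name: missing[name] / rows for name in reader.fieldnames}
    return rows, fractions, ranges


# Tiny made-up sample standing in for the real file.
sample = io.StringIO(
    "class,cap-diameter,stem-root,season\n"
    "p,5.5,,a\n"
    "e,12.0,b,w\n"
)
rows, fractions, ranges = column_summary(sample)
print(rows, fractions["stem-root"], ranges["cap-diameter"])
```

To run it on the real data, pass an open file handle instead of the sample, for example `open("mushroom_overload.csv", newline="")`. Because the file is read one row at a time, memory use stays small regardless of the number of records.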
Note that several fields, including stem-root, stem-surface, veil-type, veil-color, and spore-print-color, contain a high percentage of missing values.
Distribution
The data is distributed as a single CSV file, mushroom_overload.csv, with a file size of approximately 312.17 MB. It contains approximately 6.72 million records in a tabular structure with 21 columns. In the class column, 55% of the entries are labelled poisonous ('p').
Usage
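Before any modelling, a quick pass over the file can confirm the record count and class balance quoted above. This is a hedged sketch, not a prescribed workflow: `class_balance` is a hypothetical helper name, and the four-row sample is made up so the snippet runs without the real file.

```python
import csv
import io
from collections import Counter


def class_balance(csv_file, target="class"):
    """Return (record_count, {label: share}) by streaming the file once."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[target] for row in reader)
    total = sum(counts.values())
    if total == 0:
        return 0, {}
    shares = {label: n / total for label, n in counts.items()}
    return total, shares


# Made-up sample; with the real file, pass open("mushroom_overload.csv", newline="").
sample = io.StringIO("class,season\np,a\np,w\ne,a\np,s\n")
total, shares = class_balance(sample)
print(total, shares)
```

On the full file, the returned count should be close to 6.72 million and the share for 'p' close to 0.55 if the figures in the listing hold.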
The dataset is suited to data manipulation and model building, and can serve as a standalone resource for general experimentation. One primary use case is to fit an initial predictive model on the synthetic data. That model's predictions can then be used as an "ideal prediction" feature when working with datasets that combine synthetic data and generated noise (SD+GAN), which helps researchers isolate and model the noise component. Post-processing and additional feature engineering are advised to tailor the data to specific research needs.
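The "ideal prediction" idea can be illustrated with a deliberately simple stand-in model. The sketch below learns the poisonous rate per category of one column from the synthetic rows, then appends that rate as a new column to a second dataset. It is not the dataset author's method; the per-category rate model, the function names, the `ideal_prediction` column name, and the category codes in the samples are all assumptions chosen for illustration. In practice a stronger classifier would take the place of `fit_rate_model`.

```python
import csv
import io
from collections import defaultdict


def fit_rate_model(rows, feature, target="class", positive="p"):
    """Learn {category: share of positive labels} plus the overall share."""
    counts = defaultdict(lambda: [0, 0])  # category -> [positives, total]
    pos = total = 0
    for row in rows:
        is_pos = int(row[target] == positive)
        counts[row[feature]][0] += is_pos
        counts[row[feature]][1] += 1
        pos += is_pos
        total += 1
    if total == 0:
        raise ValueError("cannot fit on an empty dataset")
    rates = {cat: p / n for cat, (p, n) in counts.items()}
    return rates, pos / total


def add_ideal_prediction(rows, rates, fallback, feature, out_col="ideal_prediction"):
    """Yield copies of rows with the synthetic-data model's prediction appended."""
    for row in rows:
        enriched = dict(row)
        # Categories never seen in the synthetic data fall back to the overall rate.
        enriched[out_col] = rates.get(row[feature], fallback)
        yield enriched


# Made-up samples: 'synthetic' stands in for this dataset, 'noisy' for an SD+GAN set.
synthetic = list(csv.DictReader(io.StringIO("class,cap-shape\np,x\np,x\ne,x\ne,f\n")))
noisy = list(csv.DictReader(io.StringIO("class,cap-shape\ne,x\np,f\np,s\n")))

rates, overall = fit_rate_model(synthetic, feature="cap-shape")
enriched = list(add_ideal_prediction(noisy, rates, overall, feature="cap-shape"))
for row in enriched:
    print(row)
```

A downstream model trained on the enriched rows can then treat the gap between `ideal_prediction` and the observed label as a signal of the noise component.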
Coverage
Coverage is focused exclusively on simulated mycological attributes. As the data is synthetically generated, it does not possess specific geographic locations or a real-world time range. It models the features commonly used in classifying fungi, including morphology, colouration, and environmental context (habitat, season).
License
CC0: Public Domain
Who Can Use It
Intended users include data scientists and machine learning specialists who require high-volume, structured input for training classification algorithms. Researchers in biology, ecology, and mycology focusing on trait prediction will find this resource valuable. Furthermore, individuals involved in developing techniques for synthetic data analysis and noise modelling are ideal consumers.
Dataset Name Suggestions
- Synthetic Mushroom Overload
- 6.7 Million Fungal Records
- Mycological Data Augmentation Set
- SD Mushroom Feature Resource
- AI Fungi Classification Input
Attributes
Original Data Source: Synthetic Mushroom Overload
