Synthetic Mushroom Overload

Synthetic Data Generation

Tags and Keywords

Mushroom

Fungi

Synthetic

Research

Classification

Synthetic Mushroom Overload Dataset on Opendatabay data marketplace

Free

About

The dataset provides detailed descriptive attributes of fungal morphology. It is a large, artificially generated resource designed for machine learning and deep learning applications, particularly those that need extensive data for training and testing. Its main purpose is to support research in mycological feature analysis and noise component modelling.

Columns

The dataset contains 21 features capturing physical characteristics of mushrooms. Key fields include:

class: The target variable, indicating whether the mushroom is poisonous (p) or edible (e).
cap-diameter: A measurement of the cap's width, ranging from 0.22 up to 66.9.
cap-shape: Categorical descriptor of the cap's form.
cap-color: Categorical descriptor of the cap's hue, with 12 unique values.
does-bruise-or-bleed: A boolean indicating whether the mushroom bruises or bleeds.
gill-color: The colour of the mushroom's gills.
stem-height: The height of the stalk, with a mean value of 6.7.
stem-width: The width of the stalk, with values ranging up to 119.
habitat: The typical environment where the mushroom is found.
season: The time of year the mushroom data pertains to; the most common value begins with 'a'.

Note that several fields, including stem-root, stem-surface, veil-type, veil-color, and spore-print-color, contain a high percentage of missing values.
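The share of missing values per column can be checked before deciding which fields to keep. The following is a minimal sketch using only the Python standard library; the function name missing_fractions, the comma delimiter, and the tokens treated as "missing" are assumptions, since the listing does not say how missing values are encoded in the file.

```python
import csv
from collections import Counter


def missing_fractions(lines, delimiter=",", missing_tokens=("", "NA", "nan")):
    """Return {column: fraction of rows with a missing value}.

    `lines` is any iterable of CSV lines (an open file works), so the
    file is streamed row by row instead of being loaded into memory.
    `missing_tokens` is an assumption about how gaps are encoded.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    missing = Counter()
    total = 0
    for row in reader:
        total += 1
        for column, value in row.items():
            if column is None:
                continue  # surplus fields on a malformed row; ignore them
            # DictReader fills short rows with None, so count that as missing too
            if value is None or value.strip() in missing_tokens:
                missing[column] += 1
    if total == 0:
        return {}
    return {column: missing[column] / total for column in reader.fieldnames}


# Possible usage against the downloaded file:
# with open("mushroom_overload.csv", newline="") as handle:
#     for column, share in missing_fractions(handle).items():
#         print(f"{column}: {share:.1%} missing")
```

Columns with a very high missing share, such as those named in the note above, can then be dropped or given an explicit "missing" category, depending on the model.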

Distribution

The data is distributed as a single CSV file, mushroom_overload.csv, of about 312.17 MB. It contains roughly 6.72 million records in a tabular structure with 21 columns. In the class column, 55% of the entries are labelled poisonous ('p').
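The record count and class balance quoted here can be verified without loading the whole file into memory. This is a rough sketch under stated assumptions: the function name class_shares is hypothetical, and the file is assumed to be comma-delimited with a header row that includes the class column.

```python
import csv
from collections import Counter


def class_shares(lines, target="class", delimiter=","):
    """Stream CSV lines and return (record_count, {label: share})."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    counts = Counter(row[target] for row in reader)
    total = sum(counts.values())
    if total == 0:
        return 0, {}
    return total, {label: n / total for label, n in counts.items()}


# Possible usage against the downloaded file:
# with open("mushroom_overload.csv", newline="") as handle:
#     total, shares = class_shares(handle)
#     print(total, shares)
```

If the reported share of 'p' differs noticeably from 55%, the delimiter or header assumption is the first thing to check.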

Usage

This dataset is suited to advanced data manipulation and model building, and it can serve as a standalone resource for general experimentation. One primary use case is to fit an initial predictive model on the synthetic data. That model's predictions can then be used as an "ideal prediction" feature when working with datasets that combine synthetic data with generated noise (SD+GAN), which lets researchers isolate and model the noise component more effectively. Post-processing and additional feature engineering are advised to tailor the data to specific research needs.
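The "ideal prediction" workflow can be sketched as two steps: fit a model on the synthetic rows, then attach its predicted probability as an extra column on the rows of the noisy dataset. The example below is only an illustration under assumptions. It uses a small categorical naive Bayes classifier written with the standard library as a stand-in for whatever model a researcher would actually choose, and the function names, the ideal_prediction column name, and the choice of features are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(rows, features, target="class", alpha=1.0):
    """Fit a categorical naive Bayes model on rows given as dicts."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)  # (feature, label) -> value counts
    vocab = defaultdict(set)             # feature -> values seen in training
    for row in rows:
        label = row[target]
        class_counts[label] += 1
        for feature in features:
            value = row.get(feature) or "missing"  # empty cell = own category
            value_counts[(feature, label)][value] += 1
            vocab[feature].add(value)
    return {"classes": class_counts, "values": value_counts,
            "vocab": vocab, "features": list(features), "alpha": alpha}


def predict_proba_poisonous(model, row):
    """Return the model's estimate of P(class == 'p') for one row."""
    alpha = model["alpha"]
    total = sum(model["classes"].values())
    log_scores = {}
    for label, n in model["classes"].items():
        score = math.log(n / total)
        for feature in model["features"]:
            value = row.get(feature) or "missing"
            seen = model["values"][(feature, label)][value]
            k = len(model["vocab"][feature]) + 1  # one extra slot for unseen values
            score += math.log((seen + alpha) / (n + alpha * k))
        log_scores[label] = score
    # Normalise in a numerically stable way
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights.get("p", 0.0) / sum(weights.values())


def add_ideal_prediction(model, rows):
    """Return copies of rows with a hypothetical 'ideal_prediction' column."""
    return [{**row, "ideal_prediction": predict_proba_poisonous(model, row)}
            for row in rows]
```

A downstream model trained on the noisy dataset can then take ideal_prediction as one of its inputs, so that what it has left to learn is mostly the noise component. In practice a stronger model, such as gradient-boosted trees, would likely replace the naive Bayes stand-in.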

Coverage

Coverage is focused exclusively on simulated mycological attributes. As the data is synthetically generated, it does not possess specific geographic locations or a real-world time range. It models the features commonly used in classifying fungi, including morphology, colouration, and environmental context (habitat, season).

License

CC0: Public Domain

Who Can Use It

Intended users include data scientists and machine learning specialists who require high-volume, structured input for training classification algorithms. Researchers in biology, ecology, and mycology focusing on trait prediction will find this resource valuable. Furthermore, individuals involved in developing techniques for synthetic data analysis and noise modelling are ideal consumers.

Dataset Name Suggestions

Synthetic Mushroom Overload
6.7 Million Fungal Records
Mycological Data Augmentation Set
SD Mushroom Feature Resource
AI Fungi Classification Input

Attributes

Original Data Source: Synthetic Mushroom Overload

Listing Stats

Listed: 20/10/2025
Region: Global
Quality: 5 / 5
Version: 1.0