Orange vs. Grapefruit Classification Data
Food & Beverage Consumption
Free
About
This dataset is designed for binary classification: telling oranges from grapefruit [1, 2]. Humans can tell the two apart easily; the dataset poses the same task in a structured form suited to computational analysis [1]. It contains a wide range of generated values for diameter, weight, and colour, derived from the average characteristics of oranges and grapefruit [1], which makes it an engaging resource for teaching binary classification [2]. The data is mostly fictional: artificial samples were generated from a set of starting fruit measurements [1].
Columns
- name: This column is the label, indicating whether the fruit is an 'orange' or a 'grapefruit' [2]. It takes two unique values across 10,000 valid entries [2].
- diameter: Represents the diameter of the citrus fruit, measured in centimetres [2]. The values range from approximately 2.96 to 16.4 centimetres, with an average of 9.98 cm [3].
- weight: Indicates the weight of the citrus fruit in grams [3]. These values span from roughly 86.76 to 262 grams, with a mean of 175 grams [3, 4].
- red: Displays the average red reading from an RGB scan, with values ranging from 0 to 255 [4]. The mean red value is 154 [5].
- green: Shows the average green reading from an RGB scan, with values from 0 to 255 [5]. The mean green value is 76 [5].
- blue: Presents the average blue reading from an RGB scan, with values between 0 and 255 [6]. The mean blue value is 11.4 [6].
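The column descriptions above can be turned into a small typed parser. The following is a minimal sketch using only the Python standard library; the `Fruit` class, the `parse_row` helper, and the sample rows are illustrative assumptions rather than part of the dataset, and the code assumes the RGB columns are stored as whole numbers.

```python
import csv
import io
from dataclasses import dataclass

LABELS = {"orange", "grapefruit"}


@dataclass
class Fruit:
    name: str
    diameter: float  # centimetres
    weight: float    # grams
    red: int         # RGB channels, 0 to 255
    green: int
    blue: int


def parse_row(row: dict) -> Fruit:
    """Convert one CSV row (all strings) into a typed, range-checked record."""
    fruit = Fruit(
        name=row["name"],
        diameter=float(row["diameter"]),
        weight=float(row["weight"]),
        red=int(row["red"]),      # assumes integer-valued RGB readings
        green=int(row["green"]),
        blue=int(row["blue"]),
    )
    if fruit.name not in LABELS:
        raise ValueError(f"unexpected label: {fruit.name!r}")
    if fruit.diameter <= 0 or fruit.weight <= 0:
        raise ValueError("diameter and weight must be positive")
    for channel in (fruit.red, fruit.green, fruit.blue):
        if not 0 <= channel <= 255:
            raise ValueError(f"RGB value out of range: {channel}")
    return fruit


# Made-up sample rows for illustration only; not taken from the dataset.
SAMPLE_CSV = """name,diameter,weight,red,green,blue
orange,7.5,140.2,160,80,5
grapefruit,12.1,210.9,150,70,20
"""

fruits = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(fruits[0])
```

To read the real file instead, the `io.StringIO` object could be replaced with a file handle opened with `newline=""`, as the `csv` module documentation recommends.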
Distribution
The dataset is supplied as a CSV (comma-separated values) file [2, 7]. It contains 10,000 records (rows) [2-6]. All six columns are fully populated: 100% of values are valid, with no mismatched or missing entries [2-6].
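The completeness claim is easy to check after download. Below is a hedged sketch, again standard library only, that counts rows and empty cells per column; `completeness_report` is a hypothetical helper, and the sample text (with one deliberately missing weight) is made up to show what a gap would look like.

```python
import csv
import io

EXPECTED_COLUMNS = ["name", "diameter", "weight", "red", "green", "blue"]


def completeness_report(csv_text: str) -> dict:
    """Return the row count and the number of empty cells per expected column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing_columns = [c for c in EXPECTED_COLUMNS if c not in header]
    if missing_columns:
        raise ValueError(f"missing columns: {missing_columns}")

    rows = 0
    empty = {column: 0 for column in EXPECTED_COLUMNS}
    for row in reader:
        rows += 1
        for column in EXPECTED_COLUMNS:
            # DictReader fills short rows with None, so treat None as empty too.
            if not (row[column] or "").strip():
                empty[column] += 1
    return {"rows": rows, "empty_cells": empty}


# Made-up sample with one missing weight; not taken from the dataset.
SAMPLE_CSV = """name,diameter,weight,red,green,blue
orange,7.5,140.2,160,80,5
grapefruit,12.1,,150,70,20
orange,8.0,150.0,158,82,4
"""

report = completeness_report(SAMPLE_CSV)
print(report)
```

Run against the full file, a report matching the description above would show 10,000 rows and zero empty cells in every column.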
Usage
This dataset is well-suited for developing and testing binary classification algorithms [2]. Its primary use is in educational settings to teach machine learning principles, particularly supervised learning for classification problems [2]. Specific applications include:
- Building predictive models to distinguish between oranges and grapefruits based on their physical attributes and colour [1].
- Providing a clean dataset for practising data preprocessing, feature engineering, and model training.
- Exploring various data visualisation techniques for multi-dimensional data.
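As one illustration of the first application, here is a minimal nearest-centroid classifier written with the standard library only. It is a teaching sketch, not a recommended model: the technique is chosen for brevity, the `fit` and `predict` helpers are hypothetical names, and the training samples are made up rather than drawn from the dataset.

```python
from statistics import mean, pstdev

# Made-up samples: (diameter cm, weight g, red, green, blue), label.
TRAIN = [
    ((7.0, 130.0, 165, 85, 4), "orange"),
    ((8.0, 145.0, 160, 80, 6), "orange"),
    ((7.5, 138.0, 158, 82, 3), "orange"),
    ((12.0, 215.0, 150, 70, 18), "grapefruit"),
    ((13.0, 230.0, 148, 68, 22), "grapefruit"),
    ((12.5, 220.0, 152, 72, 20), "grapefruit"),
]


def fit(samples):
    """Learn per-feature scaling and one centroid per class."""
    columns = list(zip(*(features for features, _ in samples)))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 when a feature has zero spread, to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]

    def scale(features):
        return [(x - m) / s for x, m, s in zip(features, means, stds)]

    centroids = {}
    for label in sorted({lab for _, lab in samples}):
        scaled = [scale(f) for f, lab in samples if lab == label]
        centroids[label] = [mean(col) for col in zip(*scaled)]
    return scale, centroids


def predict(model, features):
    """Return the label whose centroid is closest in scaled feature space."""
    scale, centroids = model
    point = scale(features)

    def squared_distance(centroid):
        return sum((p - c) ** 2 for p, c in zip(point, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


model = fit(TRAIN)
print(predict(model, (7.2, 135.0, 162, 83, 5)))
print(predict(model, (12.8, 225.0, 149, 69, 21)))
```

Scaling each feature before measuring distance matters here because weight in grams would otherwise dominate diameter in centimetres. On the full dataset, a held-out test split would be needed to estimate accuracy honestly.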
Coverage
The dataset is predominantly fictional: its samples were generated artificially from the typical characteristics of oranges and grapefruit [1]. Consequently, it has no real-world geographic, temporal, or demographic coverage [1]. It is intended as a synthetic dataset for education and model development [1, 2].
License
CC0: Public Domain
Who Can Use It
- Students and educators: For learning and teaching machine learning concepts, especially binary classification [2].
- Data scientists and analysts: For prototyping and experimenting with classification algorithms without the need for real-world data collection.
- Researchers: As a straightforward, clean dataset for demonstrating new classification methodologies.
Dataset Name Suggestions
- Citrus Fruit Classifier Dataset
- Orange vs. Grapefruit Classification Data
- Fruit Attributes for ML
- Binary Citrus Identification
- Simulated Fruit Data
Attributes
Original Data Source: Orange vs. Grapefruit Classification Data