Red and White Wine Quality Prediction Data
Data Science and Analytics
Free
About
This data product contains 6,497 observations of the chemical properties and expert quality scores of Portuguese vinho verde wines. It is intended to support data mining and machine learning work on predicting wine quality from measurable attributes such as acidity, sugar content, and alcohol level. The combination of chemical data and a sensory quality score (ranging from 3 to 9) makes it a useful resource for research in enology and statistical modelling.
Columns
The dataset includes 14 distinct attributes, presented with their descriptive statistics:
- botella_id: Unique identifier assigned to each sampled bottle (Min: 1, Max: 6496).
- acidez fija: Fixed acidity (Mean: 7.22, Range: 3.8 to 15.9).
- acidez volatil: Volatile acidity (Mean: 0.34, Range: 0.08 to 1.58).
- acido citrico: Citric acid content (Mean: 0.32, Range: 0 to 1.66).
- azucar residual: Residual sugar (Mean: 5.44, Range: 0.6 to 65.8).
- cloruros: Chlorides content (Mean: 0.06, Range: 0.01 to 0.61).
- dioxido de azufre libre: Free sulphur dioxide (Mean: 30.5, Range: 1 to 289).
- dioxido de azufre total: Total sulphur dioxide (Mean: 116, Range: 6 to 440).
- densidad: Density measurement (Mean: 0.99, Range: 0.99 to 1.04).
- pH: pH level (Mean: 3.22, Range: 2.72 to 4.01).
- sulfatos: Sulphates content (Mean: 0.53, Range: 0.22 to 2).
- alcohol: Alcohol percentage by volume (Mean: 10.5, Range: 8 to 14.9).
- color: Categorical variable denoting wine type: 'blanco' (white) or 'rojo' (red).
- calidad: Quality score derived from sensory evaluation (Mean: 5.82, Range: 3 to 9).
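As a minimal sketch of how the column list above might be checked when reading the file, the following uses only Python's standard csv module. The header spellings in EXPECTED_COLUMNS are copied from the list above; whether the file uses exactly these spellings (spaces rather than underscores, no accents) is an assumption, and the helper names read_wine_rows and color_shares are hypothetical.

```python
import csv

# Column names as given in the listing; the exact header spelling in the
# CSV file is an assumption and may need adjusting.
EXPECTED_COLUMNS = [
    "botella_id", "acidez fija", "acidez volatil", "acido citrico",
    "azucar residual", "cloruros", "dioxido de azufre libre",
    "dioxido de azufre total", "densidad", "pH", "sulfatos",
    "alcohol", "color", "calidad",
]


def read_wine_rows(lines):
    """Parse CSV lines (an open file or any iterable of strings).

    Raises ValueError if any expected column is absent from the header.
    Values are returned as strings, one dict per record.
    """
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def color_shares(rows):
    """Return the fraction of records for each value of the 'color' column."""
    counts = {}
    for row in rows:
        counts[row["color"]] = counts.get(row["color"], 0) + 1
    return {value: n / len(rows) for value, n in counts.items()}


# Hypothetical usage, assuming the file sits in the working directory:
# with open("calidad_de_vino.csv", newline="", encoding="utf-8") as f:
#     rows = read_wine_rows(f)
# print(len(rows), color_shares(rows))
```

Numeric columns would still need converting from strings (for example with float) before any statistics are computed.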
Distribution
The data is delivered in a single CSV file, named calidad_de_vino.csv, with a file size of 452.92 kB. It contains 6,497 total valid records across 14 columns. Data validation indicates that there are zero missing or mismatched values. The dataset structure includes 75% white wine records ('blanco') and 25% red wine records ('rojo'). The expected update frequency for this static collection is "Never".
Usage
This collection is highly suitable for statistical investigation and is rated as excellent for usability (10.00). Ideal applications include developing machine learning models for regression (predicting the exact quality score) or classification (predicting high vs. low quality). It is well-suited for beginner-level projects in exploratory data analysis (EDA).
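The two framings mentioned above can be illustrated with simple baselines that any trained model should beat. This is a sketch under stated assumptions: the cut-off of 7 for "high" quality is hypothetical (the dataset does not define one), and the function names are illustrative only.

```python
from statistics import mean

# Hypothetical cut-off for "high" quality; the dataset does not define one.
HIGH_QUALITY_THRESHOLD = 7


def to_binary_label(quality, threshold=HIGH_QUALITY_THRESHOLD):
    """Map a quality score to 1 (high) or 0 (low) for classification."""
    return 1 if quality >= threshold else 0


def mean_baseline_mae(scores):
    """Regression baseline: mean absolute error of always predicting the mean."""
    centre = mean(scores)
    return mean(abs(score - centre) for score in scores)


def majority_baseline_accuracy(labels):
    """Classification baseline: accuracy of always predicting the majority label."""
    positives = sum(labels)
    return max(positives, len(labels) - positives) / len(labels)
```

For example, applying to_binary_label to each value of the calidad column yields the labels for majority_baseline_accuracy, while mean_baseline_mae takes the raw scores directly.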
Coverage
The data covers Portuguese vinho verde, incorporating observations on both white and red wine varieties. The scope is limited to the chemical and physical measurements taken on the final product and the corresponding sensory evaluation scores. The entire dataset has been translated into Spanish.
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
- Data Scientists: For training and evaluating classification and regression models based on physical properties.
- Academic Researchers: To study the influence of specific chemical variables (like acidity or residual sugar) on perceived quality.
- Students and Educators: As an accessible, high-quality resource for introductory data science projects.
- Vintners and Industry Professionals: To benchmark chemical parameters against expert quality ratings.
Dataset Name Suggestions
- Portuguese Wine Quality Analysis
- Vinho Verde Physicochemical Indicators
- Red and White Wine Quality Prediction Data
- Wine Quality Spanish Translation
Attributes
Original Data Source: Red and White Wine Quality Prediction Data
