Red Wine Quality Prediction Dataset
Free
About
This dataset covers red variants of the Portuguese "Vinho Verde" wine, recording a set of chemical properties alongside a quality score for each wine. It is a suitable foundation for classification or regression tasks that predict wine quality. The task looks straightforward, but the small number of samples and the strong imbalance across quality classes make it harder than it first appears. The primary objective is to build robust classification models that forecast wine quality, with room for hyperparameter tuning and for comparing performance across different classification algorithms.
Columns
The dataset comprises eleven input variables derived from physicochemical tests, one output variable (quality) based on sensory data, and a row identifier (Id).
- fixed acidity: Value representing the fixed acidity.
- volatile acidity: Value indicating the volatile acidity.
- citric acid: Value for citric acid content.
- residual sugar: Value for the residual sugar amount.
- chlorides: Value for chloride content.
- free sulfur dioxide: Value for free sulfur dioxide.
- total sulfur dioxide: Value for total sulfur dioxide.
- density: Value representing the wine's density.
- pH: Value indicating the pH level.
- sulphates: Value for sulphate content.
- alcohol: Value representing the alcohol percentage.
- quality: The target quality score, ranging between 0 and 10.
- Id: An identification number for each entry.
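The column list above can be turned into a simple schema check when loading the file. The following is a minimal sketch using only the Python standard library; the helper name `load_wine_rows` is hypothetical, the header spellings are assumed to match the list above exactly, and the two sample rows are synthetic values made up for illustration, not records from the dataset.

```python
import csv
import io

# Column names as listed above; assumed to match the CSV header exactly.
EXPECTED_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality", "Id",
]


def load_wine_rows(handle):
    """Read rows from an open CSV handle and convert every field to a number."""
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for record in reader:
        row = {name: float(record[name]) for name in EXPECTED_COLUMNS}
        # quality and Id are integer-valued, so store them as ints
        row["quality"] = int(row["quality"])
        row["Id"] = int(row["Id"])
        rows.append(row)
    return rows


# Synthetic two-row sample (made-up values, NOT taken from the dataset).
SAMPLE = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "7.0,0.5,0.1,2.0,0.08,10,30,0.997,3.3,0.6,10.0,5,0\n"
    "8.0,0.3,0.4,2.5,0.07,15,40,0.996,3.2,0.7,11.5,7,1\n"
)
rows = load_wine_rows(io.StringIO(SAMPLE))
print(len(rows), rows[0]["quality"])
```

To run the same check on the real file, pass an open handle instead of the in-memory sample, for example `open("WineQT.csv", newline="")`.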
Distribution
The dataset is provided as a single CSV file, WineQT.csv (78.06 kB), containing 13 columns and 1143 rows. The quality classes are ordered but not balanced: there are significantly more wines of 'normal' quality than of 'excellent' or 'poor' quality, so the dataset is considered highly imbalanced.
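One way to quantify the imbalance described here is to count the rows per quality score and compare the largest class with the smallest. The sketch below assumes the quality values have already been read into a list of integers; the function names are hypothetical and the scores shown are synthetic, chosen only to illustrate the calculation.

```python
from collections import Counter


def quality_distribution(qualities):
    """Return {score: (count, share_of_total)} ordered by score."""
    counts = Counter(qualities)
    total = sum(counts.values())
    return {score: (n, n / total) for score, n in sorted(counts.items())}


def imbalance_ratio(qualities):
    """Largest class count divided by smallest class count (needs >= 1 value)."""
    counts = Counter(qualities)
    return max(counts.values()) / min(counts.values())


# Synthetic scores for illustration only, not the real class counts.
scores = [5, 5, 5, 5, 6, 6, 6, 7, 4, 5]
for score, (n, share) in quality_distribution(scores).items():
    print(f"quality {score}: {n} rows ({share:.0%})")
print("imbalance ratio:", imbalance_ratio(scores))
```

A ratio close to 1 would indicate balanced classes; the further it is from 1, the more a model can score well simply by favouring the dominant class.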
Usage
This dataset is ideal for:
- Building and evaluating classification models to predict red wine quality.
- Conducting regression analysis to estimate wine quality scores.
- Exploring and fine-tuning the hyperparameters of various machine learning algorithms.
- Comparing the evaluation metrics of different classification approaches for quality prediction.
- Developing and testing predictive models for quality anticipation in food and beverage products.
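When comparing evaluation metrics on imbalanced classes, it helps to start from a trivial baseline. The sketch below, written with the standard library only, scores a majority-class predictor with both accuracy and macro-averaged F1. All function names are hypothetical, the labels are synthetic, and the macro average here is taken over the classes present in the true labels, which is an assumption of this sketch rather than a fixed convention.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    f1_scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def majority_baseline(y_train, n):
    """Predict the most frequent training label for each of n rows."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return [most_common] * n


# Synthetic labels for illustration only.
y_true = [5, 5, 5, 5, 5, 5, 6, 6, 7, 4]
y_pred = majority_baseline(y_true, len(y_true))
print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
```

On these synthetic labels the baseline's accuracy is much higher than its macro F1, because the rare classes are never predicted. A real model is worth keeping only if it improves on this baseline under the metric that matters for the task.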
Coverage
The dataset pertains specifically to red variants of the Portuguese "Vinho Verde" wine. No information on time range or demographic scope is available in the provided material. The quality classes are unevenly covered, with disproportionately many entries for average-quality wines compared to very high or very low quality wines.
License
CC0: Public Domain
Who Can Use It
This dataset is particularly suitable for:
- Beginner data scientists and machine learning enthusiasts looking for a challenging yet approachable project to practise classification and regression.
- Students undertaking projects related to data analysis, predictive modelling, or food science.
- Researchers interested in the chemical composition influencing wine quality or exploring methods for handling imbalanced datasets.
- Anyone keen to build predictive models for anticipating wine quality based on physicochemical properties.
Dataset Name Suggestions
- Red Wine Quality Prediction Dataset
- Vinho Verde Red Wine Physicochemical Quality
- Wine Quality Classification Challenge
- Portuguese Red Wine Quality Attributes
- Wine Physicochemical Properties and Quality
Attributes
Original Data Source: Red Wine Quality Prediction Dataset