Red Wine Quality Prediction Dataset

Tags and Keywords

Wine, Quality, Prediction, Classification, Alcohol

Free

About

This dataset covers red variants of the Portuguese "Vinho Verde" wine, recording their chemical properties alongside a quality score. It is a suitable foundation for classification or regression tasks that predict wine quality from those properties. The task is relatively straightforward, but the small number of samples and the strong imbalance across quality classes make it harder than it first appears. The primary objective is to build robust classification models of wine quality, tune their hyperparameters, and compare performance across classification algorithms.

Columns

The dataset comprises input variables derived from physicochemical tests and an output variable based on sensory data.
  • fixed acidity: Value representing the fixed acidity.
  • volatile acidity: Value indicating the volatile acidity.
  • citric acid: Value for citric acid content.
  • residual sugar: Value for the residual sugar amount.
  • chlorides: Value for chloride content.
  • free sulfur dioxide: Value for free sulfur dioxide.
  • total sulfur dioxide: Value for total sulfur dioxide.
  • density: Value representing the wine's density.
  • pH: Value indicating the pH level.
  • sulphates: Value for sulphate content.
  • alcohol: Value representing the alcohol percentage.
  • quality: The target quality score, ranging between 0 and 10.
  • Id: An identification number for each entry.
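
As a minimal sketch, the column list above can be turned into a header check when reading the file. The function name load_wine_rows is hypothetical, the exact header spelling inside WineQT.csv is assumed to match the names listed here, and the two sample records are invented for illustration rather than taken from the dataset.

```python
import csv
import io

# Column names as listed above; the exact header spelling in WineQT.csv is an assumption.
EXPECTED_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality", "Id",
]


def load_wine_rows(file_obj):
    """Read wine records from an open CSV file, checking the header first."""
    reader = csv.DictReader(file_obj)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for record in reader:
        # Every listed column is numeric, so parse each one as a float first.
        row = {name: float(record[name]) for name in EXPECTED_COLUMNS}
        # quality and Id are treated as integers (an assumption about the file).
        row["quality"] = int(row["quality"])
        row["Id"] = int(row["Id"])
        rows.append(row)
    return rows


# Tiny in-memory stand-in for WineQT.csv (values are made up for illustration).
sample_csv = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,0\n"
    "7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5,1\n"
)
sample_rows = load_wine_rows(sample_csv)
```

With the real file, the same function could be called as load_wine_rows(open("WineQT.csv", newline="")), assuming the file sits in the working directory.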

Distribution

The dataset is provided as a single CSV file, WineQT.csv (78.06 kB), containing 13 columns and 1143 records. The quality classes are ordered but highly imbalanced: far more wines are rated 'normal' than 'excellent' or 'poor'.
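
One way to inspect the imbalance is to group the quality scores into bands and compute each band's share. The sketch below is illustrative only: the listing does not say where 'poor', 'normal' and 'excellent' begin, so the cut-offs POOR_MAX and EXCELLENT_MIN are assumptions, and the example scores are invented rather than drawn from WineQT.csv.

```python
from collections import Counter

# Hypothetical cut-offs: the listing does not define the band boundaries,
# so these thresholds are assumptions chosen for illustration.
POOR_MAX = 4
EXCELLENT_MIN = 7


def quality_band(score):
    """Map a numeric quality score to one of three ordered bands."""
    if score <= POOR_MAX:
        return "poor"
    if score >= EXCELLENT_MIN:
        return "excellent"
    return "normal"


def band_distribution(scores):
    """Return each band's share of the records, as fractions summing to 1."""
    counts = Counter(quality_band(s) for s in scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no scores given")
    return {band: counts[band] / total for band in ("poor", "normal", "excellent")}


# Made-up scores for illustration, not taken from WineQT.csv.
example_scores = [5, 5, 6, 6, 5, 6, 7, 4, 5, 6]
example_shares = band_distribution(example_scores)
```

Running band_distribution on the real quality column would show how lopsided the classes are under whichever thresholds are chosen.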

Usage

This dataset is ideal for:
  • Building and evaluating classification models to predict red wine quality.
  • Conducting regression analysis to estimate wine quality scores.
  • Exploring and fine-tuning the hyperparameters of various machine learning algorithms.
  • Comparing the evaluation metrics of different classification approaches for quality prediction.
  • Developing and testing predictive models for quality anticipation in food and beverage products.
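
When comparing evaluation metrics on imbalanced classes, a majority-class baseline is a useful reference point. The sketch below, written with the standard library only, shows how such a baseline can score well on accuracy while scoring poorly on macro-averaged F1. The labels are invented for illustration, and this macro_f1 averages over the classes present in the true labels, which may differ from the convention a given library uses.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly contributes an F1 of 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up labels for illustration: eight 'normal' wines, one 'poor', one 'excellent'.
y_true = ["normal"] * 8 + ["poor", "excellent"]

# Baseline that always predicts the most frequent class.
majority = Counter(y_true).most_common(1)[0][0]
y_baseline = [majority] * len(y_true)

baseline_accuracy = accuracy(y_true, y_baseline)
baseline_macro_f1 = macro_f1(y_true, y_baseline)
```

A trained classifier is only informative if it beats this baseline on the metric that reflects the minority classes, which is why accuracy alone can mislead on this dataset.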

Coverage

The dataset covers only red variants of the Portuguese "Vinho Verde" wine. No time range or demographic scope is given in the source material. The class imbalance also limits coverage: average-quality wines are heavily over-represented relative to very high or very low quality wines.

License

CC0: Public Domain

Who Can Use It

This dataset is particularly suitable for:
  • Beginner data scientists and machine learning enthusiasts looking for a challenging yet approachable project to develop their classification and regression skills.
  • Students undertaking projects related to data analysis, predictive modelling, or food science.
  • Researchers interested in the chemical composition influencing wine quality or exploring methods for handling imbalanced datasets.
  • Anyone keen to build predictive models for anticipating wine quality based on physicochemical properties.

Dataset Name Suggestions

  • Red Wine Quality Prediction Dataset
  • Vinho Verde Red Wine Physicochemical Quality
  • Wine Quality Classification Challenge
  • Portuguese Red Wine Quality Attributes
  • Wine Physicochemical Properties and Quality

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 08/07/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
