
Wine Quality Prediction Dataset

Product Reviews & Feedback

Tags and Keywords

Wine, Quality, Alcohol, Acidity, Density


Free

About

This dataset is designed for modelling wine quality from physicochemical tests and sensory evaluations. It combines the red and white variants of Portuguese Vinho Verde wine: 1,599 red samples and 4,898 white samples. The input variables come from objective physicochemical tests, such as acidity levels, pH values, and alcohol by volume (ABV). The target output variable is a numerical quality score ranging from 0 (very bad) to 10 (very excellent), calculated as the median of at least three evaluations by wine experts. The dataset is intended to help identify the characteristics that distinguish good wine from poor wine, and to show how chemical properties relate to human perception. Due to privacy and logistical considerations, information on grape types, wine brands, and selling prices is not included.
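The scoring rule described above (the median of at least three expert grades on the 0 to 10 scale) can be sketched as follows. The function name `quality_score` and the example grades are hypothetical and are not taken from the dataset, which contains only the final score.

```python
from statistics import median


def quality_score(grades):
    """Return the median of expert grades on the 0-10 scale."""
    if len(grades) < 3:
        raise ValueError("expected at least three evaluations")
    if any(g < 0 or g > 10 for g in grades):
        raise ValueError("grades must lie between 0 and 10")
    return median(grades)


# Hypothetical grades from three assessors for a single wine.
print(quality_score([5, 6, 7]))
```

Using the median rather than the mean keeps a single unusually harsh or generous assessor from shifting the final score.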

Columns

  • type: A categorical variable indicating whether the wine is 'red' or 'white'. There are 6,497 valid entries, with white wine making up 75% of the samples.
  • fixed acidity: Measures the non-volatile acids naturally present in grapes, such as tartaric, malic, citric, or succinic acid. Expressed in grams per cubic decimetre (g/dm³), it ranges from 3.8 to 15.9, with a mean of 7.22.
  • volatile acidity: Represents acids that evaporate at low temperatures, predominantly acetic acid. High levels can lead to an unpleasant, vinegar-like taste. Values are in g/dm³, ranging from 0.08 to 1.58, with a mean of 0.34.
  • citric acid: Used as an acid supplement to boost acidity and add 'freshness' and flavour. Measured in g/dm³, values range from 0 to 1.66, with a mean of 0.32.
  • residual sugar: The amount of sugar remaining after fermentation ceases. Wines with less than 1 g/dm³ are rare, while over 45 g/dm³ indicates a sweet wine (1 g/dm³ equals 1 g/L). Values range from 0.6 to 65.8, with a mean of 5.44.
  • chlorides: The concentration of chloride salts, such as sodium chloride, in the wine. Expressed in g/dm³, values range from 0.01 to 0.61, with a mean of 0.06.
  • free sulfur dioxide: The dissolved gas form of SO2, crucial for preventing microbial growth and wine oxidation. Measured in milligrams per cubic decimetre (mg/dm³), it ranges from 1 to 289, with a mean of 30.5.
  • total sulfur dioxide: The sum of free and bound SO2 forms. At free SO2 concentrations over 50 ppm, SO2 becomes detectable in the wine's aroma and taste. Values are in mg/dm³, ranging from 6 to 440, with a mean of 116.
  • density: The wine's mass per unit volume, which depends on its alcohol and sugar content. Expressed in grams per cubic centimetre (g/cm³), values range from 0.99 to 1.04, with a mean of 0.99; at two decimal places the mean rounds to the same figure as the minimum.
  • pH: A measure of the wine's acidity, with most wines falling between 3 and 4 on the pH scale. Values range from 2.72 to 4.01, with a mean of 3.22.
  • sulphates: Potassium sulphate, an additive that contributes to SO2 levels and acts as an antimicrobial and antioxidant agent. Measured in g/dm³, values range from 0.22 to 2, with a mean of 0.53.
  • alcohol: The alcohol by volume (ABV) percentage of the wine. Wine generally contains between 5–15% alcohol. Values range from 8% to 14.9%, with a mean of 10.5%.
  • quality: The output variable, a numerical score from 0 (very bad) to 10 (very excellent) assigned by wine experts. Observed values range from 3 to 9, with a mean of 5.82. The classes are ordered but not balanced: mid-range wines far outnumber excellent or poor ones.
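Because the quality classes are unbalanced, it is worth inspecting the share of each score before modelling. The sketch below is one minimal way to do that; the helper name `class_shares` is hypothetical, and the sample scores are made up for illustration rather than drawn from the dataset.

```python
from collections import Counter


def class_shares(scores):
    """Map each quality score to its share of all samples, in ascending score order."""
    counts = Counter(scores)
    total = sum(counts.values())
    return {score: counts[score] / total for score in sorted(counts)}


# Made-up scores for illustration only; not taken from the dataset.
sample_scores = [5, 6, 6, 5, 7, 6, 4, 6, 5, 8]
for score, share in class_shares(sample_scores).items():
    print(f"quality {score}: {share:.0%}")
```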

Distribution

The dataset is provided as a CSV file named 'wine-quality-white-and-red.csv', with a file size of 390.51 KB. It contains 13 columns and 6,497 records (rows), representing 1,599 red wine samples and 4,898 white wine samples. All columns have 6,497 valid entries, indicating no missing values.
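The figures above (13 columns, 6,497 rows, no missing values) can be checked after download with a short standard-library script. This is a sketch under assumptions: the header names are assumed to match the column names listed in the Columns section exactly, `summarize` is a hypothetical helper, and the inline sample rows are illustrative values rather than rows copied from the file.

```python
import csv
import io

# Assumed header names, taken from the Columns section of this listing.
EXPECTED_COLUMNS = [
    "type", "fixed acidity", "volatile acidity", "citric acid",
    "residual sugar", "chlorides", "free sulfur dioxide",
    "total sulfur dioxide", "density", "pH", "sulphates",
    "alcohol", "quality",
]


def summarize(handle):
    """Count rows, columns, empty cells, and samples per wine type."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    rows, missing, by_type = 0, 0, {}
    for row in reader:
        rows += 1
        missing += sum(1 for name in header if (row.get(name) or "").strip() == "")
        kind = row.get("type", "")
        by_type[kind] = by_type.get(kind, 0) + 1
    return {
        "rows": rows,
        "columns": len(header),
        "missing": missing,
        "by_type": by_type,
        "unexpected": [name for name in header if name not in EXPECTED_COLUMNS],
    }


# Illustrative two-row sample; the second row has one empty cell.
sample = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "red,7.4,0.7,0,1.9,0.08,11,34,0.998,3.5,0.56,9.4,5\n"
    "white,7,0.3,,20,0.05,45,170,1,3,0.45,8.8,6\n"
)
print(summarize(sample))
```

For the real file, pass an open handle instead of the sample, for example `open('wine-quality-white-and-red.csv', newline='')`. If the listing is accurate, the result should report 6,497 rows, 13 columns, and 0 missing cells.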

Usage

This dataset is ideal for:
  • Modelling wine quality based on physicochemical properties.
  • Classification tasks to predict wine quality categories.
  • Regression tasks to predict a continuous wine quality score.
  • Outlier detection to identify unusually good or poor wines.
  • Feature selection methods to determine the most relevant input variables for quality prediction.
  • Gaining insights into the chemical factors influencing wine sensory perception.
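For the classification and regression uses listed above, a trivial baseline gives any model a number to beat. The sketch below bins scores into three categories and computes a majority-class accuracy and a mean-predictor error. The cut-offs (4 and below is "poor", 7 and above is "good") are illustrative assumptions and not part of the dataset; the function names and sample scores are likewise hypothetical.

```python
from collections import Counter
from statistics import mean


def to_category(score, low=4, high=7):
    """Bin a 0-10 score into a category; the cut-offs are illustrative assumptions."""
    if score <= low:
        return "poor"
    if score >= high:
        return "good"
    return "normal"


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def mean_baseline_mae(scores):
    """Mean absolute error obtained by always predicting the mean score."""
    centre = mean(scores)
    return mean(abs(s - centre) for s in scores)


# Made-up scores for illustration only; not taken from the dataset.
scores = [5, 6, 6, 5, 7, 6, 4, 6, 5, 8]
labels = [to_category(s) for s in scores]
print("majority-class accuracy:", majority_baseline_accuracy(labels))
print("mean-predictor MAE:", round(mean_baseline_mae(scores), 2))
```

With unbalanced classes, the majority-class baseline can already look strong on accuracy alone, so per-class metrics are usually more informative than a single accuracy figure.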

Coverage

The dataset covers Portuguese Vinho Verde wine only, and consists of physicochemical measurements plus an expert-assigned quality score. No time range for data collection is given; the source publication dates to 2009. The dataset contains no geographical detail beyond the wine region, and no information on grape types, wine brand, or selling price. The wine quality classes are ordered but not balanced.
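Since the quality classes are not balanced, a plain random train/test split can leave rare scores out of one part entirely. One common remedy is a stratified split, sketched below with the standard library only. The function name, the 20% test fraction, and the sample rows are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, test_fraction=0.2, seed=0):
    """Split rows so each class under `key` keeps roughly the same share in both parts."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(groups):
        members = groups[label][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Made-up rows for illustration: ten wines scored 5 and five scored 7.
rows = [{"quality": 5} for _ in range(10)] + [{"quality": 7} for _ in range(5)]
train, test = stratified_split(rows, key="quality")
print(len(train), len(test))
```

Very small classes can still round to zero test samples under this scheme, so the rarest scores may need to be merged with neighbouring ones or handled separately.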

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

This dataset is beneficial for:
  • Data scientists and machine learning practitioners aiming to build predictive models for wine quality.
  • Researchers and academics studying sensory science, food chemistry, or data mining applications in the food and beverage industry.
  • Winemakers and wine connoisseurs seeking a scientific basis for understanding quality factors.
  • Anyone interested in exploring the relationship between chemical composition and perceived quality in wines.

Dataset Name Suggestions

  • Wine Quality Prediction Dataset
  • Vinho Verde Wine Quality Analysis
  • Red and White Wine Physicochemical Properties
  • Wine Quality Sensory Evaluation
  • Wine Characteristics and Quality Score

Attributes

Original Data Source: Wine Quality Prediction Dataset

Listing Stats

Views: 0
Downloads: 0
Listed: 13/08/2025
Region: Global
UDQS (Universal Data Quality Score): 5 / 5
Version: 1.0
