Opendatabay APP

Wine Attribute and Quality Dataset

Data Science and Analytics

Tags and Keywords

Wine, Quality, Physicochemical, Portugal, Acidity

Free

About

This dataset contains over 6000 samples of red and white Vinho Verde wines from northern Portugal. Its primary purpose is to support modelling of wine quality from physicochemical test results. It comprises two separate sets, one for red wines and one for white wines, and can be used for classification or regression tasks. The quality classes are ordered and imbalanced: there are many more wines of normal quality than excellent or poor ones.

Columns

The dataset includes several input variables derived from physicochemical tests and one output variable representing sensory quality:
  • fixed acidity: A measure of the non-volatile acids in wine.
  • volatile acidity: The amount of acetic acid in wine, which can lead to an unpleasant, vinegar-like taste at high levels.
  • citric acid: A small amount can add freshness and flavour to wines.
  • residual sugar: The amount of sugar remaining after fermentation has ceased.
  • chlorides: The amount of salt in the wine.
  • free sulfur dioxide: The free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulphite ion; it prevents microbial growth and oxidation of wine.
  • total sulfur dioxide: The sum of free and bound forms of SO2; at low concentrations, SO2 is not detectable in wine, but at higher concentrations, it can impart an undesirable aroma.
  • density: The density of the wine, which is closely related to the alcohol and sugar content.
  • pH: A measure of how acidic or basic a solution is, on a scale of 0 to 14; most wines fall between 3 and 4.
  • sulphates: A wine additive which can contribute to sulfur dioxide levels.
  • alcohol: The percentage of alcohol by volume in the wine.
  • quality: The output variable, representing the wine's quality score, ranging from 0 to 10 based on sensory data.
For the 'winequality-red.csv' file, the listing reports 1599 values for most columns (e.g., volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol, quality); no further summary statistics are available for these samples.
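
As a minimal sketch of how the columns above might be read and checked, the following uses only the Python standard library. The helper name `load_wine_rows` is hypothetical, the header spellings are assumed to match the column list above, and the delimiter is not stated in this listing, so the code accepts either ';' or ','. The two sample rows are illustrative values, not records from the dataset.

```python
import csv
import io

# Column names as listed above; the header spelling in a real file may differ.
EXPECTED_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]


def load_wine_rows(handle):
    """Parse an open CSV text stream into a list of {column: float} dicts."""
    header = handle.readline()
    # Assumption: the delimiter is ';' or ','; use whichever the header contains.
    delimiter = ";" if ";" in header else ","
    names = next(csv.reader([header], delimiter=delimiter))
    missing = [c for c in EXPECTED_COLUMNS if c not in names]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    reader = csv.DictReader(handle, fieldnames=names, delimiter=delimiter)
    # Every listed column is numeric, so convert each field to float.
    return [{k: float(v) for k, v in row.items()} for row in reader]


# Illustrative rows only (not taken from the dataset).
SAMPLE = (
    ";".join(EXPECTED_COLUMNS) + "\n"
    "7.0;0.60;0.10;2.0;0.080;12;40;0.9970;3.40;0.55;9.5;5\n"
    "6.5;0.30;0.35;6.0;0.045;30;120;0.9940;3.20;0.45;10.8;6\n"
)

rows = load_wine_rows(io.StringIO(SAMPLE))
print(len(rows), "rows;", len(rows[0]), "columns")
```

With a real file, the same function could be called as `load_wine_rows(open("winequality-red.csv", newline=""))`, assuming the file sits in the working directory.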

Distribution

The dataset is provided as data files, typically in CSV format: one for red Vinho Verde wines and one for white Vinho Verde wines. Record counts are not stated for every file, but the red wine file ('winequality-red.csv') is approximately 84.2 kB in size, with 12 columns and 1599 records. The quality values are ordered, and the distribution of quality classes is not balanced.
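
The class imbalance noted above can be inspected with a short count of quality scores. This is a sketch only: `quality_distribution` is a hypothetical helper, and the example scores are illustrative rather than drawn from the dataset.

```python
from collections import Counter


def quality_distribution(scores):
    """Return {score: (count, share)} ordered from lowest to highest score."""
    counts = Counter(scores)
    total = sum(counts.values())
    return {s: (counts[s], counts[s] / total) for s in sorted(counts)}


# Illustrative scores only (not drawn from the dataset).
example_scores = [5, 6, 5, 6, 7, 5, 6, 4, 6, 8]

dist = quality_distribution(example_scores)
for score, (count, share) in dist.items():
    print(f"quality {score}: {count} samples ({share:.0%})")
```

Checking these shares before modelling helps decide whether stratified splitting or class weighting is worth considering.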

Usage

This dataset is ideally suited for various analytical and machine learning applications. It can be used for:
  • Developing and evaluating classification models to predict wine quality categories.
  • Building regression models to forecast a continuous wine quality score.
  • Implementing outlier detection algorithms to identify exceptional (excellent or poor) wines, given their lower frequency.
  • Exploring and applying feature selection methods to determine the most relevant physicochemical inputs influencing wine quality.
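
As one possible starting point for the feature selection use case, the inputs can be ranked by the absolute Pearson correlation of each column with the quality score. This is a simple filter-style heuristic chosen for illustration, not a method prescribed by the dataset. The helper names `pearson` and `rank_features` are hypothetical, and the rows are illustrative values only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def rank_features(rows, target="quality"):
    """Sort input columns by |correlation with target|, strongest first."""
    features = [name for name in rows[0] if name != target]
    ys = [row[target] for row in rows]
    scores = {f: pearson([row[f] for row in rows], ys) for f in features}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Illustrative rows only (not taken from the dataset).
example_rows = [
    {"alcohol": 9.0, "volatile acidity": 0.70, "quality": 5},
    {"alcohol": 10.0, "volatile acidity": 0.50, "quality": 6},
    {"alcohol": 11.0, "volatile acidity": 0.60, "quality": 7},
    {"alcohol": 12.0, "volatile acidity": 0.30, "quality": 8},
]

ranked = rank_features(example_rows)
for name, r in ranked:
    print(f"{name}: r = {r:+.3f}")
```

Correlation captures only linear, one-variable-at-a-time relationships, so a ranking like this is best treated as a first look before trying model-based selection.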

Coverage

The data encompasses samples of Vinho Verde wines originating from the north of Portugal. The dataset is focused purely on physicochemical and sensory variables. Due to privacy and logistical considerations, it does not include information such as grape types, wine brand, or wine selling price. No specific time range or demographic scope is detailed.

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

This dataset is particularly valuable for:
  • Data Scientists and Machine Learning Engineers: For building and refining predictive models for wine quality.
  • Researchers and Academics: Those studying the influence of chemical properties on sensory attributes of wine.
  • Statisticians: For exploring data distributions, correlations, and applying various statistical tests.
  • Wine Industry Professionals: To gain insights into quality factors, potentially informing production or quality control processes.
  • Students: As a practical dataset for learning about classification, regression, and data analysis techniques.

Dataset Name Suggestions

  • Vinho Verde Wine Quality Prediction
  • Portuguese Wine Quality Factors
  • Red and White Vinho Verde Physicochemical Analysis
  • Wine Quality Assessment Dataset
  • Wine Attribute and Quality Dataset

Listing Stats

  • Views: 1
  • Downloads: 1
  • Listed: 30/08/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
