Red and White Wine Quality Prediction Data

Data Science and Analytics

Tags and Keywords

Wine, Quality, Tabular, Acidity, Alcohol

Free

About

This data product contains 6,497 observations of the chemical properties and expert quality scores of Portuguese vinho verde wines. It is intended for data mining and machine learning work on predicting wine quality from measurable attributes such as acidity, sugar content, and alcohol level. Pairing the chemical measurements with a quality score from sensory evaluation (ranging from 3 to 9) makes it a useful resource for research in enology and statistical modelling.

Columns

The dataset includes 14 distinct attributes, presented with their descriptive statistics:
  • botella_id: Unique identifier assigned to each sampled bottle (Min: 1, Max: 6496).
  • acidez fija: Fixed acidity (Mean: 7.22, Range: 3.8 to 15.9).
  • acidez volatil: Volatile acidity (Mean: 0.34, Range: 0.08 to 1.58).
  • acido citrico: Citric acid content (Mean: 0.32, Range: 0 to 1.66).
  • azucar residual: Residual sugar (Mean: 5.44, Range: 0.6 to 65.8).
  • cloruros: Chlorides content (Mean: 0.06, Range: 0.01 to 0.61).
  • dioxido de azufre libre: Free sulphur dioxide (Mean: 30.5, Range: 1 to 289).
  • dioxido de azufre total: Total sulphur dioxide (Mean: 116, Range: 6 to 440).
  • densidad: Density measurement (Mean: 0.99, Range: 0.99 to 1.04).
  • pH: pH level (Mean: 3.22, Range: 2.72 to 4.01).
  • sulfatos: Sulphates content (Mean: 0.53, Range: 0.22 to 2).
  • alcohol: Alcohol percentage by volume (Mean: 10.5, Range: 8 to 14.9).
  • color: Categorical variable denoting wine type: 'blanco' (white) or 'rojo' (red).
  • calidad: Quality score derived from sensory evaluation (Mean: 5.82, Range: 3 to 9).
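
The column list above can be turned into a small typed reader. The following is a minimal sketch using only the Python standard library. It assumes the CSV header spells the column names exactly as they appear in this listing (spaces, no accents), which should be checked against the actual file. The function name read_wine_rows and the sample row are hypothetical and for illustration only.

```python
import csv
import io

# Column names as given in the listing; the exact header spelling in the
# actual file is an assumption and should be verified against the CSV.
NUMERIC_COLUMNS = [
    "acidez fija", "acidez volatil", "acido citrico", "azucar residual",
    "cloruros", "dioxido de azufre libre", "dioxido de azufre total",
    "densidad", "pH", "sulfatos", "alcohol",
]
EXPECTED_COLUMNS = ["botella_id", *NUMERIC_COLUMNS, "color", "calidad"]


def read_wine_rows(handle):
    """Parse rows from an open CSV handle, converting each column's type."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for raw in reader:
        row = {name: float(raw[name]) for name in NUMERIC_COLUMNS}
        row["botella_id"] = int(raw["botella_id"])
        row["color"] = raw["color"]          # 'blanco' or 'rojo'
        row["calidad"] = int(raw["calidad"])  # integer score, 3 to 9
        rows.append(row)
    return rows


# Illustrative, made-up row (not taken from the dataset).
sample = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "1,7.0,0.3,0.3,5.0,0.05,30,115,0.995,3.2,0.5,10.5,blanco,6\n"
)
rows = read_wine_rows(sample)
```

For the real file, the same function could be called with open("calidad_de_vino.csv", newline="", encoding="utf-8"); the encoding is likewise an assumption.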

Distribution

The data is delivered as a single CSV file, calidad_de_vino.csv (452.92 kB), containing 6,497 valid records across 14 columns. Data validation reports no missing or mismatched values. White wine ('blanco') accounts for 75% of the records and red wine ('rojo') for 25%. The collection is static: its expected update frequency is "Never".
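
The figures quoted above (record count, absence of missing values, white/red split) can be re-checked after download. The sketch below is one possible way to do so with the standard library. The summarize helper is hypothetical, and the demo rows are synthetic stand-ins for rows read with csv.DictReader from calidad_de_vino.csv.

```python
from collections import Counter


def summarize(rows, color_key="color"):
    """Return record count, number of empty cells, and share per wine type."""
    total = len(rows)
    empty_cells = sum(
        1 for row in rows for value in row.values() if value in ("", None)
    )
    counts = Counter(row[color_key] for row in rows)
    shares = {color: n / total for color, n in counts.items()} if total else {}
    return {"records": total, "empty_cells": empty_cells, "shares": shares}


# Synthetic rows for illustration only (not taken from the dataset).
demo = [{"color": "blanco", "calidad": "6"}] * 3 + [{"color": "rojo", "calidad": "5"}]
report = summarize(demo)
```

On the real file, the listing suggests the report should show 6,497 records, no empty cells, and shares close to 0.75 for 'blanco' and 0.25 for 'rojo'.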

Usage

This collection is well suited to statistical investigation and carries a usability rating of 10.00. Typical applications include regression models (predicting the exact quality score) and classification models (predicting high vs. low quality). It also suits beginner-level projects in exploratory data analysis (EDA).
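
The two framings mentioned above differ mainly in how the calidad column is used as a target. The sketch below illustrates both under stated assumptions: the cut-off of 7 for "high" quality is hypothetical (the listing does not define one), the mean-score predictor is only a naive regression baseline, and the scores are synthetic.

```python
def to_quality_class(calidad, threshold=7):
    """Map a 3-9 quality score to 'high' or 'low'.

    The default threshold is an assumption, not part of the dataset.
    """
    if not 3 <= calidad <= 9:
        raise ValueError(f"quality score out of range: {calidad}")
    return "high" if calidad >= threshold else "low"


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted scores."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic scores for illustration only (not taken from the dataset).
scores = [5, 6, 6, 7]

# Classification target: one label per score.
labels = [to_quality_class(s) for s in scores]

# Regression baseline: always predict the mean score.
baseline = sum(scores) / len(scores)
baseline_mae = mean_absolute_error(scores, [baseline] * len(scores))
```

Any trained regression model should improve on the baseline error; with a different threshold the class balance of the classification task changes, so the cut-off is worth choosing deliberately.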

Coverage

The data covers Portuguese vinho verde and includes observations for both white and red wines. The scope is limited to chemical and physical measurements taken on the final product and the corresponding sensory evaluation scores. The dataset has been translated into Spanish, so column names and category labels (for example 'blanco' and 'rojo') appear in Spanish.

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

  • Data Scientists: For training and evaluating classification and regression models based on physical properties.
  • Academic Researchers: To study the influence of specific chemical variables (like acidity or residual sugar) on perceived quality.
  • Students and Educators: As an accessible, high-quality resource for introductory data science projects.
  • Vintners and Industry Professionals: To benchmark chemical parameters against expert quality ratings.

Dataset Name Suggestions

  • Portuguese Wine Quality Analysis
  • Vinho Verde Physicochemical Indicators
  • Red and White Wine Quality Prediction Data
  • Wine Quality Spanish Translation

Listing Stats

  • Views: 2
  • Downloads: 0
  • Listed: 15/12/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0