Global Wine Tasting Notes Dataset

Finance & Banking Analytics

Tags and Keywords: Business, News, Text, NLP, Alcohol, Multiclass

Free

About

This dataset contains wine reviews and tasting notes written by various wine tasters. It is intended for data exploration and text analytics, in particular for identifying which taster wrote a given review from its writing style. It suits newcomers to Natural Language Processing (NLP) and illustrates core language-processing techniques, on the assumption that each taster has a distinctive way of describing wines.

Columns

Unnamed: An unlabelled column with no header, most likely a row index or ID.
country: The country where the wine was produced.
description: The taster's written review of the wine; this free-text field is the main input for NLP tasks.
designation: The wine's designation (for example, a vineyard or special bottling name).
points: The score the reviewer awarded the wine, on a 100-point scale (observed values range from 80 to 100).
price: The price of the wine (the currency is not stated).
province: The province or state where the wine was produced (e.g. California).
region_1: The primary wine-growing region within the province.
region_2: A more specific secondary region, where one is given; often blank.
taster_name: The name of the taster who wrote the review; missing for about 20% of entries.
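A minimal sketch of reading the file with Python's standard csv module. It assumes the header row uses the column names listed above, that the unlabelled index column has an empty header, and that the file is UTF-8; the helper names and the file name in the usage comment are hypothetical. Blank cells become None, and points and price are parsed as numbers.

```python
import csv
from typing import Iterable, Optional, Union

Cell = Optional[Union[str, int, float]]


def parse_cell(column: str, value: Optional[str]) -> Cell:
    """Convert one raw CSV cell: blanks become None, numeric columns are cast."""
    if value is None or not value.strip():
        return None
    value = value.strip()
    if column == "points":
        return int(float(value))  # tolerate "87" as well as "87.0"
    if column == "price":
        return float(value)
    return value


def load_reviews(lines: Iterable[str]) -> list[dict[str, Cell]]:
    """Parse review rows, dropping the blank-header index column."""
    reader = csv.DictReader(lines)
    return [
        # The filter runs before parse_cell, so the blank ("") header and any
        # overflow fields (stored by DictReader under the key None) are skipped.
        {col: parse_cell(col, val) for col, val in raw.items() if col and col.strip()}
        for raw in reader
    ]


# Example usage (the file name is a hypothetical placeholder):
# with open("wine_reviews.csv", newline="", encoding="utf-8") as handle:
#     reviews = load_reviews(handle)
```

Accepting any iterable of lines keeps the loader usable with an open file handle as well as with in-memory test data.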

Distribution

The dataset is distributed as a CSV file containing approximately 130,000 records.

Country Distribution: The United States accounts for 42% of entries and France for 17%; all other countries together make up the remaining 41%.
Points Distribution: Scores range from 80 to 100, with most falling between 86 and 89.2.
Price Distribution: Prices range from 4 to 3,300 and are heavily skewed toward the low end: over 110,000 records fall between 4.00 and 69.92.
Province Distribution: California is the most represented province at 28%, followed by Washington at 7%; all others account for 65%.
Taster Distribution: About 20% of entries have no taster name recorded; Roger Voss is the most frequent named taster, accounting for about 20% of all entries.
Missing Values: The 'designation', 'region_1', and 'region_2' columns contain notable shares of null values (29%, 16%, and 61%, respectively).
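Figures like those above can be recomputed along these lines. This sketch assumes each row is a dict keyed by column name (for example, as produced by csv.DictReader), with missing cells stored as None or an empty string; the helper names are hypothetical and this is not the provider's own method for producing the listing statistics.

```python
from collections import Counter
from typing import Any, Optional


def is_missing(value: Any) -> bool:
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def missing_share(rows: list[dict], column: str) -> float:
    """Fraction of rows with no value in `column`."""
    if not rows:
        return 0.0
    return sum(is_missing(r.get(column)) for r in rows) / len(rows)


def value_shares(rows: list[dict], column: str, top: int = 2) -> dict:
    """Shares of the `top` most common values plus an 'other' bucket.

    Missing values are counted under the key None.
    """
    counts = Counter(None if is_missing(r.get(column)) else r[column] for r in rows)
    total = sum(counts.values())
    shares = {value: n / total for value, n in counts.most_common(top)}
    shares["other"] = 1.0 - sum(shares.values())
    return shares


def numeric_range(rows: list[dict], column: str) -> Optional[tuple]:
    """(min, max) of the numeric values in `column`, ignoring missing cells."""
    values = [r[column] for r in rows if isinstance(r.get(column), (int, float))]
    return (min(values), max(values)) if values else None


def count_between(rows: list[dict], column: str, low: float, high: float) -> int:
    """Number of rows whose numeric value in `column` lies in [low, high]."""
    return sum(
        1
        for r in rows
        if isinstance(r.get(column), (int, float)) and low <= r[column] <= high
    )
```

If the downloaded file matches the listing, missing_share(rows, "region_2") should come out near 0.61 and count_between(rows, "price", 4.0, 69.92) above 110,000.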

Usage

This dataset is suited to the following applications:

Data exploration and initial data analysis.
Text analytics for understanding patterns in wine reviews.
Text classification to categorise or identify wine tasters based on their review content.
Demonstrating and learning Natural Language Processing (NLP) techniques.
Developing models for multiclass classification.
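The listing does not prescribe a method for identifying tasters from their reviews. As one simple baseline (a swapped-in technique, not the provider's), a multinomial naive Bayes classifier over word counts with add-one smoothing can be written in pure Python. The class and function names are hypothetical, and the sketch assumes descriptions and taster names have already been extracted into parallel lists.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> list[str]:
    """Lower-case a review and split it into alphabetic word tokens."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesTasterClassifier:
    """Multinomial naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, descriptions: list[str], tasters: list[str]) -> "NaiveBayesTasterClassifier":
        self.class_counts = Counter(tasters)      # documents per taster (for priors)
        self.word_counts = defaultdict(Counter)   # token counts per taster
        for text, taster in zip(descriptions, tasters):
            self.word_counts[taster].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {t: sum(c.values()) for t, c in self.word_counts.items()}
        self.n_docs = len(tasters)
        return self

    def _log_score(self, tokens: list[str], taster: str) -> float:
        # log P(taster) + sum over tokens of log P(token | taster), smoothed.
        score = math.log(self.class_counts[taster] / self.n_docs)
        denom = self.total_words[taster] + len(self.vocab)
        counts = self.word_counts[taster]
        for tok in tokens:
            score += math.log((counts[tok] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        """Return the taster with the highest posterior score for `text`."""
        # Tokens never seen in training carry no information, so they are dropped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        return max(self.class_counts, key=lambda t: self._log_score(tokens, t))
```

On the real data, rows with no taster_name (about 20% per the listing) would need to be dropped first, and a held-out test split is needed to measure accuracy. A TF-IDF representation with a linear classifier is a common stronger alternative.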

Coverage

The dataset has a global geographic scope, with the largest shares of wines coming from the United States (42%) and France (17%); California (28%) and Washington (7%) are the best-represented provinces. The listing does not state the time period the reviews cover. Apart from taster names, no information about the reviewers is included. Completeness varies by column, and several columns contain a considerable number of missing values.

License

CC0

Who Can Use It

Beginners in Natural Language Processing (NLP) looking for a practical dataset to apply text classification techniques.
Data scientists and analysts interested in exploratory data analysis of text data.
Researchers focusing on text analytics and authorship attribution (recognising commenters).
Anyone aiming to build and demonstrate models for language processing or multiclass classification.

Dataset Name Suggestions

Wine Reviews & Taster Comments
Global Wine Tasting Notes
Wine Review NLP Dataset
Wine Taster Classification Data
Wine Ratings and Descriptions

Attributes

Original Data Source: Winedata

Listing Stats

Listed: 17/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0
