Global Wine Tasting Notes Dataset
Finance & Banking Analytics
Free
About
This dataset contains a collection of wine reviews and comments provided by various wine tasters. Its primary purpose is to facilitate data exploration and text analytics, particularly for the recognition and classification of individual commenters based on their review styles. It is an excellent resource for those new to Natural Language Processing (NLP) and is designed to demonstrate key techniques involved in language processing, operating on the assumption that each taster possesses a unique descriptive style for wines.
Columns
- Unnamed: An unlabelled column, likely an index or ID.
- country: Specifies the country of origin for the wine, covering countries from around the world.
- description: Contains the detailed reviews and comments given for each wine variety.
- designation: Provides the wine's designation.
- points: Represents the score achieved by the wine variety, as awarded by the reviewer.
- price: Indicates the price of the wine.
- province: Denotes the province where the wine originates.
- region_1: Specifies the first region associated with the wine.
- region_2: Specifies the second region associated with the wine.
- taster_name: The name of the individual taster who provided the review.
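As a rough illustration of the column layout above, the following sketch parses a small inline sample with Python's standard csv module. The sample rows, the taster name "Jane Doe", and the exact header spelling (including the empty header for the unlabelled first column) are assumptions made for the example, not values taken from the dataset.

```python
import csv
import io

# Hypothetical two-row sample in the column layout listed above.
# The first header cell is empty, mirroring the unlabelled index column.
SAMPLE_CSV = """,country,description,designation,points,price,province,region_1,region_2,taster_name
0,US,"Ripe cherry and vanilla.",Reserve,90,35.0,California,Napa Valley,Napa,Jane Doe
1,France,"Crisp and mineral.",,87,,Burgundy,Chablis,,
"""


def load_reviews(handle):
    """Parse rows, mapping empty cells to None and converting numeric fields."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {key: (value if value != "" else None) for key, value in raw.items()}
        row["points"] = int(row["points"]) if row["points"] is not None else None
        row["price"] = float(row["price"]) if row["price"] is not None else None
        rows.append(row)
    return rows


reviews = load_reviews(io.StringIO(SAMPLE_CSV))
print(reviews[0]["country"], reviews[0]["points"], reviews[1]["price"])  # US 90 None
```

To read an actual file instead of the inline sample, the same function could be given a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV has been saved.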
Distribution
The dataset typically comes in CSV format and comprises approximately 130,000 records.
- Country Distribution: The United States accounts for 42% of entries, France 17%, with other countries making up 41%.
- Points Distribution: Scores range from 80 to 100, with the majority falling between 86.00 and 89.20.
- Price Distribution: Prices range from 4.00 to 3300, with most records (over 110,000) concentrated in the 4.00-69.92 range.
- Province Distribution: California is the most represented province at 28%, followed by Washington at 7%, and others at 65%.
- Taster Distribution: Approximately 20% of entries have no taster name recorded, while Roger Voss alone accounts for about 20% of entries.
- Missing Values: The 'designation', 'region_1', and 'region_2' columns contain a notable percentage of null values (29%, 16%, and 61% respectively).
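Figures like the ones above can be recomputed from parsed rows. The sketch below shows one way to do it, assuming each row is a dictionary keyed by column name and that missing cells appear as None or an empty string. The helper names `null_share` and `value_shares` and the toy rows are hypothetical.

```python
from collections import Counter


def null_share(rows, column):
    """Percentage of rows whose value for `column` is missing (None or empty)."""
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return 100.0 * missing / len(rows)


def value_shares(rows, column):
    """Percentage of all rows taken by each non-missing value, most common first."""
    counts = Counter(
        row[column] for row in rows if row.get(column) not in (None, "")
    )
    return [(value, 100.0 * n / len(rows)) for value, n in counts.most_common()]


# Toy rows for illustration only; these are not taken from the dataset.
toy_rows = [
    {"country": "US", "region_2": "Napa"},
    {"country": "US", "region_2": None},
    {"country": "France", "region_2": None},
    {"country": "Italy", "region_2": None},
]

print(null_share(toy_rows, "region_2"))   # 75.0
print(value_shares(toy_rows, "country"))  # [('US', 50.0), ('France', 25.0), ('Italy', 25.0)]
```

Shares are computed against all rows, including those with missing values, so that a null share and the value shares for the same column add up to 100%.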
Usage
This dataset is ideal for various applications and use cases, including:
- Data exploration and initial data analysis.
- Text analytics for understanding patterns in wine reviews.
- Text classification to categorise or identify wine tasters based on their review content.
- Demonstrating and learning Natural Language Processing (NLP) techniques.
- Developing models for multiclass classification.
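The taster classification use case can be sketched with a very small baseline. The example below is a minimal multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. The choice of model is an assumption for illustration (the dataset description does not prescribe one), and the class name, the training sentences, and the labels "Taster A" and "Taster B" are all made up.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTasterClassifier:
    """Multinomial Naive Bayes over word counts, with add-one smoothing."""

    def fit(self, descriptions, tasters):
        self.word_counts = defaultdict(Counter)  # taster -> word -> count
        self.doc_counts = Counter(tasters)       # taster -> number of reviews
        self.vocab = set()
        for text, taster in zip(descriptions, tasters):
            tokens = tokenize(text)
            self.word_counts[taster].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training carry no information, so drop them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for taster, n_docs in self.doc_counts.items():
            counts = self.word_counts[taster]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best, best_score = taster, score
        return best


# Made-up reviews and labels, for illustration only.
train_texts = [
    "Ripe cherry and plum with firm tannins",
    "Dense black fruit, firm tannins and oak",
    "Crisp citrus and green apple, bright acidity",
    "Zesty lemon with bright acidity and minerality",
]
train_labels = ["Taster A", "Taster A", "Taster B", "Taster B"]

model = NaiveBayesTasterClassifier().fit(train_texts, train_labels)
print(model.predict("firm tannins and ripe plum"))  # Taster A
print(model.predict("bright lemon acidity"))        # Taster B
```

On the real data, the 'description' column would supply the texts and 'taster_name' the labels, with rows lacking a taster name left out of training. A held-out split would be needed to judge how well any such model separates tasters.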
Coverage
The dataset has a global geographic scope, featuring wines from various countries, with a significant presence from the United States (42%) and France (17%). Key provinces such as California (28%) and Washington (7%) are well-represented. No specific time range for the reviews is provided in the available information. The demographic scope centres around the named wine tasters, though no detailed demographic information about them is included. Data availability varies by column, with some columns containing a considerable number of missing values.
License
CC0
Who Can Use It
- Beginners in Natural Language Processing (NLP) looking for a practical dataset to apply text classification techniques.
- Data scientists and analysts interested in performing exploratory data analysis on text data.
- Researchers focusing on text analytics and authorship attribution (recognising commenters).
- Anyone aiming to build and demonstrate models for language processing or multiclass classification.
Dataset Name Suggestions
- Wine Reviews & Taster Comments
- Global Wine Tasting Notes
- Wine Review NLP Dataset
- Wine Taster Classification Data
- Wine Ratings and Descriptions
Attributes
Original Data Source: Winedata