Goodreads Modern Classics Rating Metrics
About
This compilation contains details for the top 100 titles identified as modern classics from a 25-year period, covering books published between 1983 and 2008. The list follows a selection originally compiled by Entertainment Weekly and focuses on highly rated books within this timeframe. The underlying data was scraped from Goodreads, a major social cataloging website and community for readers worldwide. The resource is intended for literary enthusiasts and data analysts who want to explore the metrics associated with successful, widely read modern classics.
Columns
- Ranking: The assigned position of the book on the classic list (ranging from 0 to 99).
- title: The primary title of the book.
- language: The language of the specific edition catalogued, with English being the most prevalent.
- series: Identifies if the book belongs to a sequence of titles, such as a fictional saga.
- author: The primary writer credited for the work.
- pages: The total page count of the book edition listed.
- avg_rating: The mean user rating recorded on the Goodreads platform at the point of data acquisition.
- no_ratings: The overall count of ratings the book has received, with figures extending into the millions.
- description: A summary or brief background text provided on the Goodreads listing.
- awards: Details of notable accolades, prizes, or distinctions the book has achieved.
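The column list above can be turned into typed records with the standard library alone. The following is a minimal sketch, not the publisher's own loader: the header names are assumed to match the Columns list exactly, the helper names `to_number` and `read_books` are hypothetical, and the tolerance for thousands separators such as "1,234,567" is a guess about how the scraped counts might be formatted.

```python
import csv
from typing import Dict, Iterable, List, Optional

# Column names taken from the Columns list; the real CSV header may differ.
INT_COLUMNS = ("Ranking", "pages", "no_ratings")
FLOAT_COLUMNS = ("avg_rating",)


def to_number(text: Optional[str], as_int: bool = False):
    """Convert a cell to a number, or return None if it is blank or not numeric."""
    cleaned = (text or "").strip().replace(",", "")  # assumption: commas are separators
    if not cleaned:
        return None
    try:
        value = float(cleaned)
        return int(value) if as_int else value
    except (ValueError, OverflowError):
        return None


def read_books(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Parse CSV lines into dicts: numeric columns converted, blank cells as None."""
    books = []
    for row in csv.DictReader(lines):
        book = {}
        for key, value in row.items():
            if key is None:  # surplus cells beyond the header are ignored
                continue
            if key in INT_COLUMNS:
                book[key] = to_number(value, as_int=True)
            elif key in FLOAT_COLUMNS:
                book[key] = to_number(value)
            else:
                book[key] = (value or "").strip() or None
        books.append(book)
    return books
```

To use it on the file itself, pass an open file handle, for example `read_books(open("goodreads_dataset.csv", newline="", encoding="utf-8"))`; the UTF-8 encoding is also an assumption. Representing blank cells as None keeps the missing language, series, pages and awards values distinguishable from genuine zeros or empty strings.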
Distribution
The dataset is distributed as a single CSV file, goodreads_dataset.csv, with a file size of approximately 149.85 kB. It contains 10 columns and 100 records, one per book title. Most values are present, but some fields have gaps: 13% of records are missing language information, 59% have no series data, 14% are missing page counts, and 22% are missing award details.
Usage
This dataset is suitable for Exploratory Data Analysis (EDA) and statistical analysis focused on literary topics. It can be used to study correlations between metrics such as page length and rating volume. Ideal applications include creating visualizations using tools like Matplotlib, investigating trends in book popularity among readers, and serving as an accessible resource for beginners learning text and literature data science.
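As an illustration of the kind of analysis described above, the sketch below measures how much of a column is blank and computes a Pearson correlation between page count and rating volume, skipping records where either value is missing. It uses only the standard library; the function names `missing_share`, `parse_number` and `pearson` are hypothetical, and the column names `pages` and `no_ratings` are assumed to match the CSV header.

```python
import csv
import math
from typing import List, Optional, Sequence


def missing_share(rows: Sequence[dict], column: str) -> float:
    """Fraction of rows whose cell in `column` is absent or blank."""
    if not rows:
        return 0.0
    blank = sum(1 for row in rows if not (row.get(column) or "").strip())
    return blank / len(rows)


def parse_number(text: Optional[str]) -> Optional[float]:
    """Return the cell as a float, or None if it is blank or not numeric."""
    cleaned = (text or "").strip().replace(",", "")  # assumption: commas are separators
    try:
        return float(cleaned) if cleaned else None
    except ValueError:
        return None


def pearson(xs: Sequence[Optional[float]], ys: Sequence[Optional[float]]) -> Optional[float]:
    """Pearson correlation over pairs where both values are present.

    Returns None when fewer than two complete pairs exist or a variable is constant.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def load_rows(lines) -> List[dict]:
    """Read CSV lines (for example an open file handle) into a list of row dicts."""
    return list(csv.DictReader(lines))
```

With the rows loaded via `load_rows`, the two series can be built as `[parse_number(r.get("pages")) for r in rows]` and `[parse_number(r.get("no_ratings")) for r in rows]` and passed to `pearson`. Because rating counts range into the millions, they are likely to be heavily skewed, so a log transform or a rank-based correlation may be more informative than the raw Pearson value.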
Coverage
The dataset covers only books published within the 25-year interval from 1983 to 2008. Although the source platform, Goodreads, is an American entity, its community input reflects the reading preferences of a global audience. The linguistic scope is centred heavily on English, which accounts for 86% of the editions listed in the collection. The data is confined to the specific top titles selected by Entertainment Weekly.
License
CC0: Public Domain
Who Can Use It
- Data Scientists/Analysts: To practise statistical analysis and create models based on user rating patterns.
- Literary Critics/Researchers: To quantify and compare success factors (ratings, awards) for modern classic novels.
- Students and Beginners: To work with a well-structured text and literature dataset for educational projects.
- Book Publishers: To identify characteristics common to highly rated titles from the defined era.
Dataset Name Suggestions
- Goodreads Modern Classics Rating Metrics
- Top 100 Classics 1983–2008 Data
- Literary Popularity and Success Factors
- Entertainment Weekly Top Classics Scrape
Attributes
Original Data Source: Goodreads Modern Classics Rating Metrics
