Best Books of The 21st Century Dataset
Free
About
The data consists of over 9,000 different books selected from the Best Books of the 21st Century list on Goodreads. Goodreads is the world’s largest platform for readers and book recommendations, and its systems analyse reader data to provide tailored suggestions. The collection offers insight into modern literary tastes, recording the title, author, genres, ratings, review counts, and awards of books published this century.
Columns
The dataset includes 14 columns detailing various attributes of each book:
- id: The unique identification number assigned to the book.
- title: The specific name or title of the book.
- series: Indicates if the book belongs to a series (this value is often null if the book is a standalone title).
- author: The name of the person who wrote the book.
- book_link: The direct URL link to the book's page on Goodreads.
- genre: The listed genres of the book, which are ordered according to the number of user votes.
- date_published: The date the book was released; some records contain only the month or year of publication.
- publisher: The organisation responsible for publishing the book.
- num_of_page: The count of pages in the physical book (the mean page count is approximately 356).
- lang: The primary language of the book (English accounts for 91% of records).
- review_count: The total number of reviews left by users (mean is 3.9 thousand).
- rating_count: The total number of ratings provided by users (mean is 66.4 thousand).
- rate: The average user rating for the book (the mean rating is 3.99 out of 5).
- award: Lists any notable awards the book has received.
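A minimal loading sketch in Python using only the standard library. It assumes the file is a comma-separated CSV whose header row uses the column names listed above, and that numeric fields are plain numbers, possibly with comma thousands separators. The file name is not given here, so the example reads two hypothetical rows from an in-memory string. Empty cells, which are common in series and award, are normalised to None so that missing values can be counted.

```python
import csv
import io

# Numeric columns from the list above and the type each is parsed into.
NUMERIC_COLUMNS = {
    "num_of_page": int,
    "review_count": int,
    "rating_count": int,
    "rate": float,
}


def parse_number(raw, cast):
    """Convert a raw CSV string to int or float; return None if it is empty.

    Assumes thousands separators, if any, are commas (e.g. "66,400").
    """
    raw = (raw or "").strip().replace(",", "")
    if not raw:
        return None
    # int(float(...)) also accepts values written as "356.0".
    return int(float(raw)) if cast is int else cast(raw)


def load_books(handle):
    """Read book records from a CSV file handle into a list of dicts."""
    books = []
    for row in csv.DictReader(handle):
        for column, cast in NUMERIC_COLUMNS.items():
            row[column] = parse_number(row.get(column), cast)
        # Treat empty text cells (common in series and award) as missing.
        for column, value in list(row.items()):
            if value == "":
                row[column] = None
        books.append(row)
    return books


def missing_share(books, column):
    """Fraction of records whose value in `column` is missing (None)."""
    if not books:
        return 0.0
    return sum(1 for book in books if book.get(column) is None) / len(books)


# Two hypothetical rows; with the real file, use
# open(path, newline="", encoding="utf-8") instead of io.StringIO.
sample = io.StringIO(
    "id,title,series,author,book_link,genre,date_published,publisher,"
    "num_of_page,lang,review_count,rating_count,rate,award\n"
    "1,Book A,,Author A,https://example.org/a,Fiction,2005,Pub A,"
    "320,English,1200,45000,4.1,\n"
    "2,Book B,Series B,Author B,https://example.org/b,Fantasy,March 2010,"
    "Pub B,,English,800,9000,3.8,Award B\n"
)
books = load_books(sample)
print(missing_share(books, "series"))  # 0.5
print(books[0]["num_of_page"])  # 320
```

If the real file encodes missing values as empty cells, `missing_share` on the series and award columns should land close to the percentages reported for those fields.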
Distribution
The data is provided in CSV format, totalling 3.68 MB. It contains approximately 10,000 records across 14 columns, with 9,098 unique book titles. Most columns are well populated, but some have substantial gaps: the series column is missing in 57% of records, and the award column is missing in 60%.
Usage
- Developing recommendation systems tailored to modern literary preferences.
- Conducting market analysis to understand popular 21st-century literary trends and genre success.
- Performing natural language processing (NLP) tasks on titles and genres (the dataset includes review counts but not review text).
- Studying the correlation between publication variables (page count, language, date) and reader success metrics (rating and review counts).
Coverage
The dataset focuses on books identified as the "Best Books of the 21st Century" on the Goodreads platform. The temporal scope is limited to books published since the year 2000. Geographically and linguistically, the scope is heavily centred on English-language literature, which makes up 91% of the records. The data reflects reader engagement (ratings and reviews) on a global platform.
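Because date_published sometimes holds only a month or a year, checking or restricting the temporal scope needs a tolerant year parser. A minimal sketch, assuming the year always appears as a standalone four-digit number; the exact date formats in the file are not documented here, so the example strings are hypothetical. It also computes the share of records in a given language, which on the full data could be compared against the 91% English figure.

```python
import re

# A standalone four-digit number, e.g. the "2010" in "March 2010".
YEAR_PATTERN = re.compile(r"\b(\d{4})\b")


def publication_year(date_published):
    """Extract the year from a full or partial date string, or return None."""
    if not date_published:
        return None
    match = YEAR_PATTERN.search(date_published)
    return int(match.group(1)) if match else None


def published_since(books, first_year=2000):
    """Keep records whose parsed publication year is first_year or later."""
    kept = []
    for book in books:
        year = publication_year(book.get("date_published"))
        if year is not None and year >= first_year:
            kept.append(book)
    return kept


def language_share(books, language="English"):
    """Fraction of records whose lang value equals `language`."""
    if not books:
        return 0.0
    return sum(1 for book in books if book.get("lang") == language) / len(books)


books = [  # hypothetical records
    {"date_published": "2005", "lang": "English"},
    {"date_published": "March 2010", "lang": "English"},
    {"date_published": "September 14, 2008", "lang": "Spanish"},
    {"date_published": None, "lang": "English"},
]
print(len(published_since(books)))  # 3
print(language_share(books))  # 0.75
```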
License
CC0: Public Domain
Who Can Use It
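- Data Scientists and Machine Learning Engineers: Training and evaluating book recommendation engines on the dataset's ratings, genres, and engagement metrics (the roughly 10,000 records here are a curated sample, not the 20 billion data points Goodreads references for its own recommendations).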
- Literary Critics and Researchers: Analysing the critical reception and popular success (via ratings and awards) of contemporary literature.
- Publishing Houses: Gaining insights into which genres, authors, and book lengths achieve high user engagement and positive ratings.
- Book Enthusiasts: Exploring and filtering the top-rated and reviewed modern titles.
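When filtering for top-rated titles, a book with a handful of high ratings can outrank one rated by 100,000 readers. One common remedy, swapped in here rather than taken from the dataset or from Goodreads, is a Bayesian (weighted) average that pulls low-volume ratings toward the overall mean. The sketch assumes rate and rating_count have already been parsed into numbers; `weighted_rating`, `top_books` and the `min_votes` parameter are hypothetical names, and the default of 1,000 is an arbitrary choice.

```python
def weighted_rating(rate, rating_count, prior_mean, min_votes):
    """Bayesian average: WR = v/(v+m) * R + m/(v+m) * C.

    v = rating_count, m = min_votes, R = rate, C = prior_mean.
    Books with few ratings are pulled toward prior_mean.
    """
    v, m = rating_count, min_votes
    return v / (v + m) * rate + m / (v + m) * prior_mean


def top_books(books, n=10, min_votes=1000):
    """Rank by weighted rating, skipping records with no rate or no ratings."""
    rated = [b for b in books if b.get("rate") is not None and b.get("rating_count")]
    if not rated:
        return []
    # Prior C: the plain mean of `rate` over the rated records.
    prior = sum(b["rate"] for b in rated) / len(rated)
    return sorted(
        rated,
        key=lambda b: weighted_rating(b["rate"], b["rating_count"], prior, min_votes),
        reverse=True,
    )[:n]


books = [  # hypothetical records
    {"title": "Book A", "rate": 4.6, "rating_count": 50},
    {"title": "Book B", "rate": 4.3, "rating_count": 120000},
    {"title": "Book C", "rate": 3.7, "rating_count": 80000},
]
print([b["title"] for b in top_books(books, n=2)])  # ['Book B', 'Book A']
```

Sorting by rate alone would put Book A first on the strength of 50 ratings; the weighted score ranks Book B above it. Larger `min_votes` values shrink sparsely rated books harder toward the mean.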
Dataset Name Suggestions
- Goodreads 21st Century Book Metadata
- Modern Bestsellers Ratings and Reviews
- Top Rated Literature of the New Millennium
- Best Books of The 21st Century Dataset
Attributes
Original Data Source: Best Books of The 21st Century Dataset
