Historical Amazon Book Sales Data
Education & Learning Analytics
Free
About
This dataset offers a detailed collection of the top 50 bestselling fiction and non-fiction books on Amazon for each year from 2009 to 2021. It provides insights into popular literature, author success, pricing trends, and customer engagement metrics such as user ratings and review counts over more than a decade. The data was gathered from the amazon.com website and Kaggle, and was inspired by a similar dataset covering 2009-2020. It is a useful resource for studying how the online book market has changed over time.
Columns
- Name: The title of the book, for example "Publication Manual of the American Psychological Association, 6th Edition". There are 420 unique book titles recorded.
- Author: The individual or organisation credited with writing the book. Jeff Kinney appears frequently. There are 292 unique authors within the dataset.
- User Rating: A score ranging from 1 to 5, indicating how users rated the book. The average rating is approximately 4.64, with ratings generally falling between 3.3 and 4.9.
- Reviews: The total number of people who provided a score for a book. The average number of reviews is around 17,200, with numbers varying widely from 37 to 193,000.
- Price: The selling price of the book. The most common price observed is $8.00. The dataset includes 87 unique price points.
- Price_r: The selling price of the book, rounded up. The most common rounded price is $8. There are 41 unique rounded price points.
- Year: The year the book appeared on the bestselling list. The data spans from 2009 to 2021.
- Genre: Categorisation of the book as either Fiction or Non Fiction. Non Fiction titles represent 56% of the dataset, with Fiction making up the remaining 44%.
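The column list above maps naturally onto a typed record. The sketch below is one way to parse a row, assuming the CSV headers are spelled exactly as listed ("Name", "Author", "User Rating", "Reviews", "Price", "Price_r", "Year", "Genre"), that numeric fields are stored as plain numbers without thousands separators, and that the genre labels are literally "Fiction" and "Non Fiction". `BookRecord`, `parse_row`, and `load_records` are hypothetical names used only for illustration.

```python
import csv
from dataclasses import dataclass

# Assumed genre labels, taken from the Genre description above.
GENRES = {"Fiction", "Non Fiction"}


@dataclass(frozen=True)
class BookRecord:
    name: str
    author: str
    user_rating: float  # score from 1 to 5
    reviews: int        # number of people who rated the book
    price: float        # selling price, in dollars
    price_r: int        # rounded selling price
    year: int           # year the book appeared on the list
    genre: str          # "Fiction" or "Non Fiction" (assumed spelling)


def parse_row(row: dict) -> BookRecord:
    """Convert one csv.DictReader row (assumed header names) into a typed record."""
    record = BookRecord(
        name=row["Name"].strip(),
        author=row["Author"].strip(),
        user_rating=float(row["User Rating"]),
        reviews=int(row["Reviews"]),
        price=float(row["Price"]),
        price_r=int(float(row["Price_r"])),  # accepts "8" as well as "8.0"
        year=int(row["Year"]),
        genre=row["Genre"].strip(),
    )
    # Reject values outside the ranges described in the column list.
    if not 1.0 <= record.user_rating <= 5.0:
        raise ValueError(f"rating out of range: {record.user_rating}")
    if record.genre not in GENRES:
        raise ValueError(f"unexpected genre: {record.genre!r}")
    return record


def load_records(path: str) -> list[BookRecord]:
    """Parse every row of the CSV file at `path` (file name not given in the listing)."""
    with open(path, newline="", encoding="utf-8") as f:
        return [parse_row(row) for row in csv.DictReader(f)]
```

Raising on out-of-range values, rather than silently skipping rows, makes any mismatch between these assumptions and the real file visible immediately.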
Distribution
The dataset is provided in CSV format, available in both zipped and unzipped versions. It comprises 8 columns and 650 records, each representing one bestselling book in a given year. All columns are fully populated, with no missing or mismatched entries. The raw CSV file is approximately 66.87 kB.
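The completeness claims in the paragraph above (8 columns, 650 records, no missing or mismatched entries) can be checked directly. The following is a minimal sketch using only the standard `csv` module; `check_completeness` is a hypothetical helper, and it treats a field as "missing" when it is empty or whitespace, and a row as "mismatched" when its field count differs from the header's. Both definitions are assumptions.

```python
import csv
import io


def check_completeness(csv_text: str) -> dict:
    """Return shape and completeness counts for CSV text that has a header row."""
    # csv.reader handles quoted fields, so titles containing commas
    # (e.g. "..., 6th Edition") are kept as a single field.
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    # Rows whose number of fields differs from the header's.
    mismatched = sum(1 for r in rows if len(r) != len(header))
    # Individual fields that are empty or whitespace only.
    missing = sum(1 for r in rows for value in r if not value.strip())
    return {
        "columns": len(header),
        "rows": len(rows),
        "mismatched": mismatched,
        "missing": missing,
    }
```

Run against the full file (for example, the text read with `open(path, newline="", encoding="utf-8").read()`, where `path` is a placeholder), the description above implies a result of 8 columns, 650 rows, and zero for both `mismatched` and `missing`.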
Usage
This dataset is ideal for:
- Data analytics projects, particularly for beginners looking to practice their skills.
- Literature research, to identify trends in popular book titles, authors, and genres over time.
- Market analysis, to study pricing strategies and user engagement within the book publishing industry.
- Data cleaning exercises, offering a well-structured base for learning data preparation techniques.
- Educational purposes, providing real-world data for case studies in e-commerce and consumer behaviour.
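As an example of the trend analysis listed above, the sketch below computes each genre's share of the list per year. It assumes the rows have already been reduced to `(year, genre)` pairs; `genre_share_by_year` is a hypothetical name. The listing only states the overall split (56% Non Fiction, 44% Fiction), so any per-year figures would have to come from running this on the actual file.

```python
from collections import Counter, defaultdict


def genre_share_by_year(pairs):
    """Given (year, genre) pairs, return {year: {genre: share}}.

    Shares within each year sum to 1, and years are returned in ascending order.
    """
    by_year = defaultdict(Counter)
    for year, genre in pairs:
        by_year[year][genre] += 1
    shares = {}
    for year in sorted(by_year):
        total = sum(by_year[year].values())
        shares[year] = {genre: n / total for genre, n in by_year[year].items()}
    return shares
```

Working with shares rather than raw counts keeps years comparable even if a year were ever to have fewer than 50 entries.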
Coverage
The dataset covers Amazon's top 50 bestselling books each year over the 13-year period from 2009 to 2021. The listings come from amazon.com, and the stated scope is global. Each year is represented by 50 distinct titles, giving a consistent annual snapshot of popularity.
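The coverage statement implies 13 years × 50 titles = 650 records, which matches the record count given under Distribution. Because there are only 420 unique titles across those 650 records, titles recur across years, so distinctness has to be checked within each year rather than over the whole file. A minimal sketch, assuming the data has been reduced to `(year, name)` pairs; `coverage_problems` is a hypothetical helper:

```python
from collections import Counter

FIRST_YEAR, LAST_YEAR, TITLES_PER_YEAR = 2009, 2021, 50


def coverage_problems(pairs):
    """Given (year, name) pairs, return {year: distinct_title_count} for every
    year that does not have exactly TITLES_PER_YEAR distinct titles, plus any
    year outside FIRST_YEAR..LAST_YEAR. An empty dict means full coverage."""
    # Deduplicate (year, name) pairs so a title repeated within one year
    # counts once, while the same title in different years still counts.
    counts = Counter(year for year, _ in set(pairs))
    expected = range(FIRST_YEAR, LAST_YEAR + 1)
    # Counter returns 0 for absent years, so missing years are reported too.
    problems = {y: counts[y] for y in expected if counts[y] != TITLES_PER_YEAR}
    problems.update({y: n for y, n in counts.items() if y not in expected})
    return problems
```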
License
CC0: Public Domain
Who Can Use It
- Data analysts and scientists: To identify market trends, perform statistical analysis on book popularity, and build predictive models.
- Students and educators: For practical learning in data analytics, data cleaning, and data visualisation courses.
- Publishing industry professionals: To gain insights into bestselling patterns, evaluate pricing models, and understand consumer preferences.
- Authors and literary researchers: To explore successful genres, author prominence, and factors contributing to a book's success.
- E-commerce strategists: To analyse product performance metrics like ratings and reviews in a retail context.
Dataset Name Suggestions
- Amazon Bestselling Books 2009-2021
- Amazon Top Novels Data (2009-2021)
- Historical Amazon Book Sales Data
- Amazon Bestsellers Annual Review
- Bestselling Books on Amazon (2009-2021)
Attributes
Original Data Source: Historical Amazon Book Sales Data