Global Book Listings
Education & Learning Analytics
About
This dataset provides a curated collection of books listed on Goodreads, intended as a clean and reliable source of book information. It was created to address problems common in other available book datasets, such as missing columns and unclean data. Collected through the Goodreads API, it includes core bibliographic and reader-engagement fields, making it useful for understanding book details, popularity, and reader engagement. It is aimed at book enthusiasts and researchers alike, offering numerical measures of how readers perceive books rather than text reviews alone.
Columns
- bookID: A distinct identification number for each book.
- title: The name under which the book was published.
- authors: The names of the book's authors. Multiple authors are separated by a forward slash (/).
- average_rating: The overall average rating assigned to the book.
- isbn: The International Standard Book Number, another unique identifier for the book.
- isbn13: A 13-digit ISBN used to identify the book.
- language_code: Indicates the book's primary language (e.g., 'eng' for English).
- num_pages: The total number of pages in the book.
- ratings_count: The total number of ratings the book has received.
- text_reviews_count: The total count of written text reviews for the book.
- publication_date: The date when the book was first published.
- publisher: The name of the publishing entity.
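The column descriptions above map naturally onto a typed record. The following is a minimal Python sketch using only the standard library. It assumes the CSV header uses exactly the column names listed (after trimming whitespace), that the file is UTF-8 encoded, and that every numeric field parses cleanly. `Book`, `parse_row`, and `load_books` are hypothetical names chosen for illustration, and `publication_date` is left as a raw string because its format is not documented here.

```python
import csv
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Book:
    """One row of the dataset, typed according to the column descriptions."""
    book_id: int
    title: str
    authors: List[str]        # split from the "/"-separated authors column
    average_rating: float
    isbn: str                 # kept as text so leading zeros are preserved
    isbn13: str
    language_code: str
    num_pages: int
    ratings_count: int
    text_reviews_count: int
    publication_date: str     # raw string; the date format is not documented
    publisher: str


def parse_row(row: Dict[str, str]) -> Book:
    """Convert one csv.DictReader row (all values are strings) into a Book.

    Assumes the header names match the documented columns.
    """
    return Book(
        book_id=int(row["bookID"]),
        title=row["title"].strip(),
        # Multiple authors are separated by a forward slash (/).
        authors=[a.strip() for a in row["authors"].split("/") if a.strip()],
        average_rating=float(row["average_rating"]),
        isbn=row["isbn"].strip(),
        isbn13=row["isbn13"].strip(),
        language_code=row["language_code"].strip(),
        num_pages=int(row["num_pages"]),
        ratings_count=int(row["ratings_count"]),
        text_reviews_count=int(row["text_reviews_count"]),
        publication_date=row["publication_date"].strip(),
        publisher=row["publisher"].strip(),
    )


def load_books(path: str) -> List[Book]:
    """Read every row of the CSV at `path` into Book records (assumes UTF-8)."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Precaution: strip any stray whitespace around header names.
        reader.fieldnames = [name.strip() for name in reader.fieldnames or []]
        return [parse_row(row) for row in reader]
```

Calling `load_books("books.csv")` would return one `Book` per record. In this sketch a malformed numeric field raises `ValueError` instead of being silently skipped, which makes data problems visible early.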
Distribution
The dataset is provided as a single CSV file, books.csv, approximately 1.56 MB in size. It contains 12 columns and roughly 11,100 records of structured book data.
Usage
This dataset is ideal for various applications, including:
- Discovering new books to read based on ratings and details.
- Exploring specific details of books you have already read.
- Generating word clouds or other visualisations from book titles or themes.
- Developing content recommendation systems for books.
- Analysing trends in book publishing, author popularity, and reader preferences.
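For discovery and recommendation uses, ranking by raw average_rating tends to favour titles with only a handful of ratings. One common remedy, which the dataset itself does not prescribe, is a Bayesian-style weighted rating that pulls each book's average towards the global mean in proportion to how few ratings it has: WR = v/(v+m) * R + m/(v+m) * C, where R is average_rating, v is ratings_count, C is the mean average_rating across all books, and m is a minimum-ratings threshold chosen by the analyst. The sketch below assumes books are reduced to hypothetical (title, average_rating, ratings_count) tuples; the function names and the default m = 100 are illustrative only.

```python
from typing import List, Tuple

# (title, average_rating, ratings_count): a hypothetical, simplified record shape.
BookStat = Tuple[str, float, int]


def weighted_rating(r: float, v: int, c: float, m: int) -> float:
    """Shrink an average rating r (based on v ratings) towards the global mean c."""
    if v + m == 0:
        return c
    return (v / (v + m)) * r + (m / (v + m)) * c


def top_books(books: List[BookStat], n: int = 10, m: int = 100) -> List[Tuple[str, float]]:
    """Return the n highest-scoring (title, weighted rating) pairs."""
    if not books:
        return []
    # Global mean C: unweighted mean of every book's average rating.
    c = sum(rating for _, rating, _ in books) / len(books)
    scored = [(title, weighted_rating(rating, count, c, m)) for title, rating, count in books]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical data: a perfect score from 2 ratings vs. a strong score from 5,000.
sample = [("Niche Title", 5.0, 2), ("Popular Title", 4.5, 5000), ("Average Title", 3.8, 800)]
for title, score in top_books(sample, n=3):
    print(f"{title}: {score:.2f}")
```

With these invented numbers, the widely rated 4.5 title ranks above the title with two perfect ratings, because the latter's score is pulled almost entirely back to the global mean. Larger values of m make that pull stronger.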
Coverage
The dataset covers books listed on Goodreads and aims to include titles regardless of language or publication details, which suggests a broad, global scope. Data collection began on 25 May 2019; weekly updates were initially planned, but maintenance ceased on 8 December 2020. Publication dates in the dataset range from 1 January 1900 to 31 March 2020. No demographic breakdown is provided, as the dataset is intended for "all book-lovers".
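The documented publication-date range can double as a sanity check when analysing publishing trends over time. The sketch below parses publication_date values, discards anything outside 1 January 1900 to 31 March 2020, and counts books per decade. It assumes, without confirmation from this description, that dates are stored as month/day/year strings such as "6/15/2006"; `parse_pub_date` and `books_per_decade` are hypothetical helper names, and values that fail to parse are skipped rather than guessed.

```python
from collections import Counter
from datetime import date, datetime
from typing import Dict, Iterable, Optional

# Publication-date range stated in the dataset description.
RANGE_START = date(1900, 1, 1)
RANGE_END = date(2020, 3, 31)


def parse_pub_date(raw: str, fmt: str = "%m/%d/%Y") -> Optional[date]:
    """Parse a publication_date string; the default month/day/year format is an ASSUMPTION."""
    try:
        return datetime.strptime(raw.strip(), fmt).date()
    except ValueError:
        return None  # unparseable or impossible dates are skipped, not guessed


def books_per_decade(raw_dates: Iterable[str]) -> Dict[int, int]:
    """Count parseable, in-range publication dates by decade (e.g. 1990 covers 1990-1999)."""
    counts: Counter = Counter()
    for raw in raw_dates:
        d = parse_pub_date(raw)
        if d is None or not (RANGE_START <= d <= RANGE_END):
            continue
        counts[(d.year // 10) * 10] += 1
    return dict(sorted(counts.items()))


# Hypothetical publication_date values, including an impossible date and one out of range.
sample = ["6/15/2006", "1/1/1999", "11/31/2000", "3/1/2019", "7/4/1899"]
print(books_per_decade(sample))
```

Here "11/31/2000" is rejected because November has 30 days, and "7/4/1899" falls before the documented start of the range, so only three of the five sample values are counted.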
License
CC0: Public Domain
Who Can Use It
This dataset is suitable for:
- Book enthusiasts: To discover and explore book information.
- Bibliophiles: Individuals with a deep love for books, seeking detailed information or inspiration.
- Researchers: For academic studies on literature, reading habits, or publishing trends.
- Data scientists and developers: For building applications such as book recommendation engines or data visualisations.
Dataset Name Suggestions
- Goodreads Book Data
- Global Book Listings
- Literary Ratings Dataset
- Curated Books Data
Attributes
Original Data Source: Global Book Listings