Gutenberg Goodreads Book Dataset
Data Science and Analytics
Free
About
This dataset supports book recommendation tasks by combining content from the Goodreads and Gutenberg datasets. It contains key information for 5,750 English-language books and provides a foundation for systems that utilise both user ratings (from Goodreads) and book text (from Gutenberg) to generate recommendations. It is also useful for literary analysis projects.
Columns
The dataset has four primary columns:
- guten_bid: This identifies the book within the Gutenberg dataset.
- good_bid: This identifies the book within the Goodreads dataset.
- title: The title of the book.
- author: The author of the book.
The two book identifiers serve as lookup keys for retrieving additional attributes, such as ratings, reviews, genres, and full text, from the respective original datasets.
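As a rough illustration of how the identifiers might be used, the sketch below reads the four-column layout and attaches a rating looked up by good_bid. The sample rows, the `GOODREADS_RATINGS` lookup, and the `avg_rating` field name are all made-up placeholders, not values or names from the actual datasets; a real project would replace them with the file it downloaded and whatever ratings table it has.

```python
import csv
import io

# Placeholder rows in the four-column layout described above.
# These values are invented for illustration only.
MAPPING_CSV = """guten_bid,good_bid,title,author
101,9001,Example Title A,Author One
102,9002,Example Title B,Author Two
"""

# Hypothetical stand-in for ratings retrieved from the Goodreads source,
# keyed by good_bid. The real source layout may differ.
GOODREADS_RATINGS = {"9001": 4.1}


def load_mapping(handle):
    """Read the mapping file into a list of dicts, one per book."""
    return list(csv.DictReader(handle))


def attach_ratings(rows, ratings):
    """Return copies of the rows with an 'avg_rating' field added.

    Books with no entry in the ratings lookup get None.
    """
    enriched = []
    for row in rows:
        merged = dict(row)
        merged["avg_rating"] = ratings.get(row["good_bid"])
        enriched.append(merged)
    return enriched


rows = load_mapping(io.StringIO(MAPPING_CSV))
enriched = attach_ratings(rows, GOODREADS_RATINGS)
for book in enriched:
    print(book["title"], book["avg_rating"])
```

The same pattern would apply to guten_bid when fetching full text; only the lookup source changes.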
Distribution
The data file is typically provided in CSV format and contains 5,750 records, one per English-language book. The current version of the dataset is 1.0. The total file size is not specified.
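A quick sanity check after downloading could look like the following sketch. It assumes the header matches the four column names listed under Columns, in that order, and that every field should be non-blank; neither assumption is confirmed by the listing. The function name `check_mapping` and the sample text are hypothetical.

```python
import csv
import io

EXPECTED_COLUMNS = ["guten_bid", "good_bid", "title", "author"]
EXPECTED_ROWS = 5750  # record count stated in the listing


def check_mapping(handle, expected_rows=EXPECTED_ROWS):
    """Return a list of human-readable problems; empty means no issues found."""
    reader = csv.DictReader(handle)
    # Assumption: the header uses exactly these names in this order.
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"unexpected header: {reader.fieldnames}"]

    rows = list(reader)
    problems = []
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")

    # Assumption: no field is meant to be blank.
    for record_no, row in enumerate(rows, start=1):
        blank = [c for c in EXPECTED_COLUMNS if not (row[c] or "").strip()]
        if blank:
            problems.append(f"record {record_no}: blank {', '.join(blank)}")
    return problems


# Invented two-row sample; a real run would pass the opened CSV file instead.
SAMPLE = "guten_bid,good_bid,title,author\n1,2,T,A\n3,4,U,B\n"
print(check_mapping(io.StringIO(SAMPLE), expected_rows=2))
```

With the real file, calling `check_mapping(open(path, newline="", encoding="utf-8"))` would compare against the stated 5,750 records; the encoding is a guess.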
Usage
This dataset is suited to a range of use cases, including:
- Developing and testing book recommendation systems.
- Natural Language Processing (NLP) tasks, leveraging the book titles and authors.
- Text mining and content analysis related to literature.
- Building classification models based on book metadata.
- Enabling research into literary trends and reader engagement.
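For the recommendation use case, one simple starting point is item-to-item similarity over user ratings. The sketch below computes cosine similarity between books keyed by good_bid. It is one possible approach, not a method prescribed by the dataset, and the `RATINGS` structure and its values are invented placeholders for data that would have to be retrieved from the Goodreads source.

```python
import math

# Hypothetical ratings: {good_bid: {user_id: rating}}. Placeholder values only.
RATINGS = {
    "9001": {"u1": 5, "u2": 4},
    "9002": {"u1": 5, "u2": 4},
    "9003": {"u3": 2},
}


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors (dicts)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0  # no users in common, so no evidence of similarity
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def most_similar(book_id, ratings, top_n=3):
    """Rank the other books by similarity to book_id, highest first."""
    target = ratings[book_id]
    scores = [
        (other, cosine(target, vector))
        for other, vector in ratings.items()
        if other != book_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


print(most_similar("9001", RATINGS))
```

A content-based variant could swap the rating vectors for term-frequency vectors built from Gutenberg text fetched via guten_bid, leaving `cosine` unchanged.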
Coverage
The dataset is global in scope but limited to English-language books. Although it covers a substantial number of titles, no time range or demographic breakdown is documented, and there are no notes on data availability for particular groups or years.
License
CC0
Who Can Use It
This dataset is particularly beneficial for:
- Data Scientists developing and evaluating recommendation algorithms and text analysis methods.
- Machine Learning Engineers who build and deploy models for natural language processing or content classification.
- Researchers in fields such as literature, digital humanities, and library science, for exploring large-scale literary data.
- Developers creating applications that need book metadata as a basis for literary insights.
Dataset Name Suggestions
- Gutenberg Goodreads Book Data
- Literary Recommendation Dataset
- Unified Book Metadata Collection
- GG Books Dataset
Attributes
Original Data Source: GG Books Dataset