Gutenberg Goodreads Book Dataset

Data Science and Analytics

Tags and Keywords

  • NLP
  • Text Mining
  • Recommender Systems
  • Classification
  • Pre-processing

Free

About

This dataset supports book recommendation tasks by combining content from the Goodreads and Gutenberg datasets. It contains key information for 5,750 English-language books and links each one to both sources, so a system can draw on user ratings (from Goodreads) and book text (from Gutenberg) when generating recommendations. It is also suited to literary analysis projects.

Columns

The dataset features four primary columns:
  • guten_bid: The book's identifier in the Gutenberg dataset.
  • good_bid: The book's identifier in the Goodreads dataset.
  • title: The title of the book.
  • author: The author of the book.
These book identifiers enable the retrieval of additional attributes such as ratings, reviews, genres, and full text from their respective original datasets.
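
As a rough illustration of how the identifiers can be used, the sketch below reads the four columns and attaches a mean rating to each book through good_bid. The sample rows are placeholders rather than real records, and the ratings mapping is a hypothetical stand-in for whatever the Goodreads source actually provides; its real schema may differ.

```python
import csv
import io

# Placeholder rows standing in for the real CSV; these are not actual dataset records.
SAMPLE_CSV = """guten_bid,good_bid,title,author
101,9001,Example Title A,Author One
102,9002,Example Title B,Author Two
"""

# Hypothetical ratings keyed by Goodreads ID (assumed shape, not the real Goodreads schema).
SAMPLE_RATINGS = {"9001": [5, 4, 4], "9002": [3]}

EXPECTED_COLUMNS = ["guten_bid", "good_bid", "title", "author"]


def load_books(handle):
    """Read the mapping file and return its rows as a list of dicts."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def attach_mean_rating(books, ratings):
    """Return copies of the rows with a mean_rating field (None if unrated)."""
    enriched = []
    for book in books:
        scores = ratings.get(book["good_bid"], [])
        mean = sum(scores) / len(scores) if scores else None
        enriched.append({**book, "mean_rating": mean})
    return enriched


books = load_books(io.StringIO(SAMPLE_CSV))
enriched = attach_mean_rating(books, SAMPLE_RATINGS)
for row in enriched:
    print(row["title"], row["mean_rating"])
```

With the downloaded file, the same functions could be called on an open file handle instead of the in-memory sample. The identifiers are kept as strings here because csv.DictReader does not convert types.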

Distribution

The data file is provided in CSV format and contains 5,750 records, one per English-language book. The current version of the dataset is 1.0. The total file size is not stated.
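
A downloaded copy can be sanity-checked against the stated record count. The sketch below is one possible check, not part of the dataset's tooling: it compares the row count with 5750 and reports repeated identifiers. Whether identifiers are unique in the real file is an assumption; a repeated value may be legitimate (for example, several editions of one work), so treat the output as a prompt for inspection rather than an error.

```python
import csv
import io
from collections import Counter

EXPECTED_ROWS = 5750  # record count stated in the listing


def check_books(handle, expected_rows=EXPECTED_ROWS):
    """Return a list of human-readable findings for the CSV file."""
    rows = list(csv.DictReader(handle))
    problems = []
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")
    for column in ("guten_bid", "good_bid"):
        counts = Counter(row[column] for row in rows)
        dupes = sorted(value for value, n in counts.items() if n > 1)
        if dupes:
            problems.append(f"duplicate {column} values: {dupes}")
    return problems


# Placeholder data (not real records) with a deliberately repeated guten_bid.
sample = io.StringIO(
    "guten_bid,good_bid,title,author\n"
    "101,9001,Example Title A,Author One\n"
    "101,9002,Example Title B,Author Two\n"
)
problems = check_books(sample, expected_rows=2)
print(problems)
```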

Usage

This dataset suits a range of use cases, including:
  • Developing and testing book recommendation systems.
  • Natural Language Processing (NLP) tasks, leveraging the book titles and authors.
  • Text mining and content analysis related to literature.
  • Building classification models based on book metadata.
  • Enabling research into literary trends and reader engagement.
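
To make the first two use cases concrete, here is a minimal sketch of a title-based similarity lookup. It uses Jaccard overlap of title words, which is a deliberately simple stand-in chosen for illustration and not a method the dataset prescribes; a practical recommender would more likely combine Goodreads ratings with Gutenberg text. The function names and sample titles are hypothetical.

```python
import re


def tokens(text):
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_titles(query, books, top_n=3):
    """Rank books by word overlap between the query and each title."""
    q = tokens(query)
    scored = [(jaccard(q, tokens(book["title"])), book["title"]) for book in books]
    scored = [pair for pair in scored if pair[0] > 0]  # drop non-matches
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:top_n]


# Placeholder rows with the dataset's title and author columns (not real records).
books = [
    {"title": "Example Tales of the Sea", "author": "Author One"},
    {"title": "Example Tales of the Mountain", "author": "Author Two"},
    {"title": "A Different Book", "author": "Author Three"},
]
results = similar_titles("Tales of the Sea", books)
for score, title in results:
    print(f"{score:.2f}  {title}")
```

Titles alone carry little signal, so this is best read as a baseline to compare richer models against.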

Coverage

The dataset is global in scope and limited to English-language books. The listing does not state the time range covered, any demographic breakdown, or gaps in availability for particular groups or years.

License

CC0

Who Can Use It

This dataset is particularly beneficial for:
  • Data Scientists interested in developing and evaluating recommendation algorithms and text analysis.
  • Machine Learning Engineers who build and deploy models for natural language processing or content classification.
  • Researchers in fields such as literature, digital humanities, and library science, for exploring large-scale literary data.
  • Developers building applications that need book metadata and literary insights.

Dataset Name Suggestions

  • Gutenberg Goodreads Book Data
  • Literary Recommendation Dataset
  • Unified Book Metadata Collection
  • GG Books Dataset

Attributes

Original Data Source: GG Books Dataset

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 27/06/2025
  • Region: Global
  • UDQS quality score: 5 / 5
  • Version: 1.0
