TagMyBook Collection Dataset

Entertainment & Media Consumption

Tags and Keywords

Movies, TV Shows, Classification, Exploratory Data Analysis, NLP, Regression, Feature Engineering

TagMyBook Collection Dataset on the Opendatabay data marketplace

Free

About

This dataset focuses on book analysis, primarily for predicting a book's genre based on its synopsis [1]. It offers a resource for automating book classification, a task typically reliant on user votes on platforms like Goodreads [1]. Beyond genre prediction, the dataset facilitates the development of models to predict book ratings by examining attributes such as author followers and review counts [1]. It is also suitable for general data analysis, including exploring correlations between different features and creating word clouds for various genres [1]. The data was originally collected from the Goodreads website [1].
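As a rough illustration of the rating-prediction use case, the sketch below fits a single-feature ordinary least squares line of rating against log10(1 + count), where the count could be num_reviews or num_followers. The log transform (chosen because these counts are heavy-tailed), the use of a single predictor, and the function names fit_simple_ols, log_feature and predict are assumptions for illustration, not something the dataset prescribes.

```python
import math
from typing import Sequence, Tuple


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def log_feature(count: int) -> float:
    """Assumed transform: compress heavy-tailed counts with log10(1 + count)."""
    return math.log10(1 + count)


def predict(intercept: float, slope: float, count: int) -> float:
    """Predicted rating for a raw count (e.g. num_reviews) under the fitted line."""
    return intercept + slope * log_feature(count)


if __name__ == "__main__":
    # Perfectly linear toy data: y = 1 + 2x.
    print(fit_simple_ols([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```

On the real file, x would be [log_feature(c) for c in num_reviews] and y the rating column; extending to several predictors (reviews, followers, ratings count) would need multiple regression rather than this one-variable form.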

Columns

  • index column: A unique identifier for each record [2].
  • title: The name of the book [2].
  • rating: The average rating of the book, with a maximum rating of 5 [1, 2].
  • name: The name of the author [1, 2].
  • num_ratings: The number of users who have rated the book [1, 2].
  • num_reviews: The number of users who have reviewed the book [1, 2].
  • num_followers: The number of followers the author has [1, 2].
  • synopsis: A summary or description of the book [1, 2].
  • genre: The genre or type of the book [1, 2].
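A minimal sketch of reading data.csv into typed records with Python's standard csv module. It assumes the header row uses the column names listed above, with the index column literally named index (an assumption), and it strips thousands separators from the count columns in case they are stored as text such as "12,345" (also an assumption). The Book class, load_books function and the sample rows are hypothetical and made up for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass
class Book:
    """One row of data.csv; field names mirror the documented columns."""
    index: int
    title: str
    rating: float
    name: str  # author name
    num_ratings: int
    num_reviews: int
    num_followers: int
    synopsis: str
    genre: str


def _to_int(value: str) -> int:
    # Remove thousands separators, in case counts are stored like "1,234".
    return int(value.replace(",", "").strip())


def load_books(fh: TextIO) -> List[Book]:
    """Parse CSV rows into Book records (assumes the header matches the column list)."""
    books = []
    for row in csv.DictReader(fh):
        books.append(Book(
            index=_to_int(row["index"]),
            title=row["title"],
            rating=float(row["rating"]),
            name=row["name"],
            num_ratings=_to_int(row["num_ratings"]),
            num_reviews=_to_int(row["num_reviews"]),
            num_followers=_to_int(row["num_followers"]),
            synopsis=row["synopsis"],
            genre=row["genre"].strip().lower(),  # normalise genre labels
        ))
    return books


# Toy, made-up rows for illustration only (not taken from the dataset).
SAMPLE = '''index,title,rating,name,num_ratings,num_reviews,num_followers,synopsis,genre
0,Example Book A,4.10,Author One,"12,345",980,"5,000",A detective hunts a killer.,thriller
1,Example Book B,3.85,Author Two,2048,150,3100,A young mage discovers a hidden kingdom.,fantasy
'''

if __name__ == "__main__":
    books = load_books(io.StringIO(SAMPLE))
    print(books[0].num_ratings, books[1].genre)  # 12345 fantasy
```

With the real file, the same function would be called as load_books(open("data.csv", newline="", encoding="utf-8")), ideally inside a with block; the encoding is an assumption.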

Distribution

The dataset is provided as a single CSV file named data.csv [2]. It contains 1,539 records [3, 4]. Book ratings range from 2.78 to 4.75, with 561 records (about 36% of the total) falling between 3.96 and 4.16 [3]. The number of ratings (num_ratings) can be up to 1,538 [3]. Author follower counts vary widely, from approximately 2,281 to 8.74 million [5]. The number of reviews (num_reviews) ranges from 132 to over 186,000 [5]. The dataset includes various genres such as thriller, fantasy, romance, horror, history, psychology, travel, science, sports, and science fiction [2]. Thriller accounts for 31% of the dataset and fantasy for 23% [4].
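Summary figures of this kind (rating range, the number of records in a rating band, genre shares) can be recomputed once the file is loaded. A minimal sketch, assuming the rating and genre columns have already been read into Python lists; the helper names and the inclusive band boundaries are assumptions, and the toy values below are made up.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def rating_range(ratings: Sequence[float]) -> Tuple[float, float]:
    """Return (min, max) of the rating column."""
    return min(ratings), max(ratings)


def count_in_range(values: Sequence[float], lo: float, hi: float) -> int:
    """Count values with lo <= v <= hi (both ends inclusive, an assumption)."""
    return sum(1 for v in values if lo <= v <= hi)


def genre_shares(genres: Sequence[str]) -> Dict[str, float]:
    """Percentage of records per genre, rounded to one decimal place."""
    counts = Counter(genres)
    total = len(genres)
    return {g: round(100 * n / total, 1) for g, n in counts.most_common()}


if __name__ == "__main__":
    toy_ratings = [3.5, 4.0, 4.1, 4.3]
    toy_genres = ["thriller", "thriller", "fantasy", "romance"]
    print(rating_range(toy_ratings))  # (3.5, 4.3)
    print(count_in_range(toy_ratings, 3.96, 4.16))  # 2
    print(genre_shares(toy_genres))
```

Comparing count_in_range(ratings, 3.96, 4.16) with len(ratings) / 2 on the real file shows directly whether a band holds a majority of the records or only the largest share.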

Usage

This dataset is ideal for:
  • Natural Language Processing (NLP): Predicting the genre of a book using its synopsis [1].
  • Predictive Modelling: Forecasting book ratings based on factors like the number of ratings, reviews, and author followers [1].
  • Data Analysis: Investigating correlations between dataset attributes, generating word clouds for different genres, and exploring how review counts influence book ratings [1].
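The dataset card does not name a model for genre prediction. One common baseline for classifying short texts is multinomial Naive Bayes over bag-of-words tokens, sketched below with the standard library only. The regex tokenizer, the add-one (Laplace) smoothing, the class name NaiveBayesGenre and the toy synopses are all assumptions for illustration.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def tokenize(text: str) -> List[str]:
    """Lowercase and keep runs of letters/apostrophes (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesGenre:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, synopses: Sequence[str], genres: Sequence[str]) -> "NaiveBayesGenre":
        self.doc_counts = Counter(genres)  # documents per genre, for the prior
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, genre in zip(synopses, genres):
            self.word_counts[genre].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total_words = {g: sum(c.values()) for g, c in self.word_counts.items()}
        self.n_docs = len(genres)
        return self

    def log_posterior(self, text: str, genre: str) -> float:
        # log P(genre) + sum of log P(token | genre), up to a constant.
        score = math.log(self.doc_counts[genre] / self.n_docs)
        denom = self.total_words[genre] + len(self.vocab)
        for tok in tokenize(text):
            if tok in self.vocab:  # skip words never seen in training
                score += math.log((self.word_counts[genre][tok] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.doc_counts, key=lambda g: self.log_posterior(text, g))


# Toy, made-up synopses for illustration only (not from the dataset).
TOY_SYNOPSES = [
    "a detective hunts a serial killer in the city",
    "the killer leaves clues for the detective",
    "a young wizard learns magic in a hidden kingdom",
    "dragons and magic threaten the kingdom",
]
TOY_GENRES = ["thriller", "thriller", "fantasy", "fantasy"]

if __name__ == "__main__":
    model = NaiveBayesGenre().fit(TOY_SYNOPSES, TOY_GENRES)
    print(model.predict("the detective follows the killer"))  # thriller
```

On the real data one would hold out a test split and compare accuracy with the majority-class rate (thriller, at 31% of records); a stronger model would replace this baseline rather than extend it.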

Coverage

The dataset's content is global, covering a wide array of books [6]. It includes books from various genres, scraped from the Goodreads website [1]. No specific time range or demographic scope is indicated beyond the nature of book data from a general readership platform.

License

CC0

Who Can Use It

  • Data Scientists and Machine Learning Engineers: To build and train models for book genre classification and rating prediction [1].
  • NLP Researchers: To experiment with text analysis techniques on book synopses [1].
  • Data Analysts: For exploratory data analysis, identifying trends, and understanding the relationships between book attributes, author popularity, and reader engagement [1].
  • Publishing Industry Professionals: Potentially for market research or content categorisation [Not in sources. General knowledge].
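For the exploratory analysis described above, a typical first step is the Pearson correlation between each pair of numeric columns (rating, num_ratings, num_reviews, num_followers). A minimal standard-library sketch follows; the function names are hypothetical. Because the count columns span several orders of magnitude, log-transforming them first, or using a rank-based measure such as Spearman's, may give a more representative picture; that choice is an assumption left to the analyst.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a column has zero variance")
    return cov / (sx * sy)


def correlation_pairs(columns: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Pearson r for every unordered pair of named numeric columns."""
    return {(a, b): pearson(columns[a], columns[b]) for a, b in combinations(columns, 2)}
```

Passing a dict such as {"rating": ..., "num_reviews": ..., "num_followers": ...} returns one coefficient per column pair, which can then be tabulated or plotted as a heatmap.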

Dataset Name Suggestions

  • Book Genre & Rating Data
  • Goodreads Book Insights
  • Author Popularity Metrics
  • TagMyBook Collection

Attributes

Original Data Source: TagMyBook

Listing Stats

Views: 0
Downloads: 0
Listed: 24/06/2025
Region: Global
Universal Data Quality Score (UDQS): 5 / 5
Version: 1.0
