TagMyBook Collection Dataset
Entertainment & Media Consumption
Free
About
This dataset focuses on book analysis, primarily for predicting a book's genre based on its synopsis [1]. It offers a resource for automating book classification, a task typically reliant on user votes on platforms like Goodreads [1]. Beyond genre prediction, the dataset facilitates the development of models to predict book ratings by examining attributes such as author followers and review counts [1]. It is also suitable for general data analysis, including exploring correlations between different features and creating word clouds for various genres [1]. The data was originally collected from the Goodreads website [1].
Columns
- index column: A unique identifier for each record [2].
- title: The name of the book [2].
- rating: The average rating of the book, with a maximum rating of 5 [1, 2].
- name: The name of the author [1, 2].
- num_ratings: The number of users who have rated the book [1, 2].
- num_reviews: The number of users who have reviewed the book [1, 2].
- num_followers: The number of followers the author has [1, 2].
- synopsis: A summary or description of the book [1, 2].
- genre: The genre or type of the book [1, 2].
Distribution
The dataset is provided as a single CSV file named data.csv [2]. It contains 1,539 records [3, 4].
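A minimal sketch of how the file might be read with Python's standard library, using the column names listed under Columns. The exact header layout (an unnamed leading index column), the UTF-8 encoding, and the assumption that all numeric fields are stored as plain numbers are not confirmed by the listing, and the sample row is made up so the snippet runs without the file.

```python
import csv
import io

# Column names as listed in the Columns section. The index column is assumed
# to have an empty header in the CSV (an assumption, not confirmed).
EXPECTED_COLUMNS = {
    "title", "rating", "name", "num_ratings",
    "num_reviews", "num_followers", "synopsis", "genre",
}
NUMERIC_COLUMNS = ("rating", "num_ratings", "num_reviews", "num_followers")


def load_books(source):
    """Read book records from an open text file and convert numeric fields."""
    reader = csv.DictReader(source)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    books = []
    for row in reader:
        for column in NUMERIC_COLUMNS:
            # Assumes plain numeric strings; float() raises ValueError otherwise.
            row[column] = float(row[column])
        books.append(row)
    return books


# Made-up sample in the assumed layout, standing in for data.csv.
SAMPLE_CSV = io.StringIO(
    ",title,rating,name,num_ratings,num_reviews,num_followers,synopsis,genre\n"
    '0,Example Book,4.1,Jane Doe,1200,150,3000,"A detective hunts a killer.",thriller\n'
)
books = load_books(SAMPLE_CSV)
print(books[0]["title"], books[0]["rating"], books[0]["genre"])
```

For the real file, the same function could be called as `load_books(open("data.csv", newline="", encoding="utf-8"))`, ideally inside a `with` block so the file is closed afterwards.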
Book ratings range from 2.78 to 4.75, with the largest group (561 records) falling between 3.96 and 4.16 [3]. The number of ratings (num_ratings) can be up to 1,538 [3]. Author follower counts vary widely, from approximately 2,281 to 8.74 million [5]. The number of reviews (num_reviews) ranges from 132 to over 186,000 [5].
The dataset includes various genres such as thriller, fantasy, romance, horror, history, psychology, travel, science, sports, and science fiction [2]. Thriller accounts for 31% of the dataset, and fantasy for 23% [4].
Usage
This dataset is ideal for:
- Natural Language Processing (NLP): Predicting the genre of a book using its synopsis [1].
- Predictive Modelling: Forecasting book ratings based on factors like the number of ratings, reviews, and author followers [1].
- Data Analysis: Investigating correlations between dataset attributes, generating word clouds for different genres, and exploring how review counts influence book ratings [1].
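As an illustration of the genre-prediction use case above, the following is a minimal sketch of a multinomial naive Bayes classifier over synopsis words, written with the standard library only. It is one possible baseline, not the method used by the dataset's authors. The class name, the tokenizer, and the training sentences are all hypothetical, and a real experiment would train on the synopsis and genre columns with a held-out test split.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesGenreClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, synopses, genres):
        self.genre_counts = Counter(genres)
        self.word_counts = defaultdict(Counter)
        for synopsis, genre in zip(synopses, genres):
            self.word_counts[genre].update(tokenize(synopsis))
        self.vocabulary = {
            word for counts in self.word_counts.values() for word in counts
        }
        return self

    def predict(self, synopsis):
        total_docs = sum(self.genre_counts.values())
        vocab_size = len(self.vocabulary)
        # Words never seen in training carry no information, so skip them.
        tokens = [t for t in tokenize(synopsis) if t in self.vocabulary]
        best_genre, best_score = None, -math.inf
        for genre, doc_count in self.genre_counts.items():
            total_words = sum(self.word_counts[genre].values())
            score = math.log(doc_count / total_docs)  # log prior
            for token in tokens:
                count = self.word_counts[genre][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_genre, best_score = genre, score
        return best_genre


# Made-up training examples standing in for the synopsis and genre columns.
train_synopses = [
    "A detective races to stop a killer before the next murder.",
    "A spy uncovers a deadly conspiracy and a hidden murder plot.",
    "A young wizard battles a dragon to save the magic kingdom.",
    "An elf and a dwarf quest for an enchanted sword of magic.",
]
train_genres = ["thriller", "thriller", "fantasy", "fantasy"]

model = NaiveBayesGenreClassifier().fit(train_synopses, train_genres)
print(model.predict("The killer left a clue at the murder scene."))
print(model.predict("A dragon guards the enchanted kingdom."))
```

Because thriller and fantasy together make up more than half of the records, a classifier trained on the full data would likely need class weighting or per-genre evaluation rather than overall accuracy alone.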
Coverage
The dataset's content is global, covering a wide array of books [6]. It includes books from various genres, scraped from the Goodreads website [1]. No specific time range or demographic scope is indicated.
License
CC0
Who Can Use It
- Data Scientists and Machine Learning Engineers: To build and train models for book genre classification and rating prediction [1].
- NLP Researchers: To experiment with text analysis techniques on book synopses [1].
- Data Analysts: For exploratory data analysis, identifying trends, and understanding the relationships between book attributes, author popularity, and reader engagement [1].
- Publishing Industry Professionals: Potentially for market research or content categorisation.
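For the exploratory analysis mentioned above, a hedged sketch of how the relationship between engagement counts and ratings might be checked with a Pearson correlation. The records below are made up and only reuse the column names from this listing. Taking the logarithm of the counts is an assumption made here because review and follower counts span several orders of magnitude.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up records using the column names from this listing.
books = [
    {"rating": 3.9, "num_reviews": 200, "num_followers": 2500},
    {"rating": 4.0, "num_reviews": 1500, "num_followers": 40000},
    {"rating": 4.2, "num_reviews": 9000, "num_followers": 120000},
    {"rating": 4.4, "num_reviews": 30000, "num_followers": 900000},
]

ratings = [book["rating"] for book in books]
correlations = {}
for column in ("num_reviews", "num_followers"):
    # Counts are heavily skewed, so compare them on a log scale.
    logged = [math.log10(book[column]) for book in books]
    correlations[column] = pearson(logged, ratings)
    print(column, round(correlations[column], 3))
```

A coefficient near zero on the real data would suggest that popularity and average rating are largely unrelated, while a strong value in either direction would be worth checking against outliers before drawing conclusions.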
Dataset Name Suggestions
- Book Genre & Rating Data
- Goodreads Book Insights
- Author Popularity Metrics
- TagMyBook Collection
Attributes
Original Data Source: TagMyBook