Opendatabay APP

Book Genre Classification Dataset

Education & Learning Analytics

Tags and Keywords

Text, Intermediate, NLP, Synopsis, Books, Genre

Free

About

This dataset is designed for predicting the genre of a book from its synopsis alone. It is a resource for developing and evaluating Natural Language Processing (NLP) and Artificial Intelligence (AI) / Machine Learning (ML) models, and its primary purpose is to support research and development in book genre classification. The dataset is derived from a previously created collection, but it contains a larger number of books and is intended to be expanded in the future.

Columns

  • index: An identifier for each entry in the dataset.
  • title: The title of the respective book.
  • genre: The assigned genre of the book. Thriller accounts for 22% of entries, fantasy for 19%, and all other genres together for the remaining 59%.
  • summary: The synopsis or brief overview of the book's content, which is the key feature for genre prediction.

Distribution

The data is provided in a single CSV file named data.csv. It contains 4656 unique index values, one for each of the 4656 book records. The genre column is listed as having 4542 unique values. The dataset is offered as a free resource.
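As a quick sanity check on the file, the sketch below reads the CSV and computes the share of records per genre. It assumes that data.csv is UTF-8 encoded and has a header row with the four column names listed above; the function names and the lowercasing of genre labels are illustrative choices, not part of the dataset.

```python
import csv
from collections import Counter


def read_books(csv_file):
    """Read all rows from an open CSV file into a list of dicts.

    Assumes a header row with the columns: index, title, genre, summary.
    """
    return list(csv.DictReader(csv_file))


def load_books(path="data.csv"):
    """Open the CSV at `path` (assumed UTF-8) and return its rows."""
    with open(path, newline="", encoding="utf-8") as f:
        return read_books(f)


def genre_shares(rows):
    """Return {genre: fraction of records}, most common genre first.

    Genre labels are stripped and lowercased before counting; this
    normalisation is an assumption, not something the listing specifies.
    """
    counts = Counter(row["genre"].strip().lower() for row in rows)
    total = sum(counts.values())
    return {genre: n / total for genre, n in counts.most_common()}
```

Calling `genre_shares(load_books())` in the directory that holds data.csv gives the per-genre fractions, which can be compared against the percentages quoted in the column description.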

Usage

This dataset is ideal for a range of applications, particularly in Education & Learning Analytics and the broader AI & ML Data domain. Specific use cases include:
  • Training machine learning models to automatically classify book genres.
  • Developing NLP algorithms for text analysis and understanding book synopses.
  • Researching and experimenting with different text classification techniques.
  • Creating recommendation systems based on genre prediction.
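The first use case can be illustrated with a deliberately small baseline. The sketch below is a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only; it is one possible starting point rather than a method prescribed by the dataset, and the class name, tokenizer, and toy examples are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesGenreClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, summaries, genres):
        # Count documents per genre and word occurrences per genre.
        self.genre_counts = Counter(genres)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for summary, genre in zip(summaries, genres):
            tokens = tokenize(summary)
            self.word_counts[genre].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(genres)
        return self

    def predict(self, summary):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(summary) if t in self.vocab]
        best_genre, best_score = None, -math.inf
        for genre, n_docs in self.genre_counts.items():
            total_words = sum(self.word_counts[genre].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            for token in tokens:
                count = self.word_counts[genre][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_genre, best_score = genre, score
        return best_genre


# Toy data for illustration only; real training would use the
# summary and genre columns of data.csv.
toy_summaries = [
    "a detective hunts a killer in the city",
    "a wizard and a dragon guard the magic kingdom",
    "the killer stalks the detective",
    "the dragon burns the kingdom with magic fire",
]
toy_genres = ["thriller", "fantasy", "thriller", "fantasy"]
model = NaiveBayesGenreClassifier().fit(toy_summaries, toy_genres)
```

On real data, the summaries and genres would be split into training and held-out sets before fitting, so that accuracy is measured on books the model has not seen. A TF-IDF representation with a linear classifier is a common stronger baseline for the same task.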

Coverage

The dataset's coverage is global. This is Version 1.0, listed on 8 June 2025. Although it originates from an earlier dataset, this version focuses specifically on the book's synopsis for genre prediction and contains more books than its predecessor.

License

CC0

Who Can Use It

This dataset is particularly useful for:
  • Data scientists and machine learning engineers working on text classification problems.
  • Researchers in natural language processing and artificial intelligence.
  • Students and academics engaged in educational projects related to data analytics and NLP.
  • Anyone interested in building or testing models for book genre prediction.

Dataset Name Suggestions

  • Book Genre Classification Dataset
  • Literary Synopsis Genre Predictor
  • Book Text Genre Dataset
  • Book Summary Genre Predictor

Attributes

Original Data Source: Book Genre Prediction

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 08/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
