2022 Romance Book Review Dataset
About
This dataset collects reader reviews of the books on the Goodreads Best Romances of 2022 shortlist. It was built by web scraping with BeautifulSoup, originally as a project for learning data-collection techniques. Each record combines details about a book (title and author) with details about one review of it (date, reviewer, rating and review text), which makes it usable for both rating-based and text-based analysis.
Columns
- book_id: A numeric identifier for each book, ranging from 0 to 19.
- book_title: The title of the romance book.
- book_author: The author of the romance book.
- review_id: A numeric identifier for each individual review, ranging from 0 to 599.
- review_date: The date when the review was published.
- review_writer: The screen name of the person who wrote the review.
- rating: The rating given to the book by the reviewer, on a scale of 1 to 5.
- review_text: The textual content of the review, which has been cleaned and stripped.
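As a rough sketch of how the column descriptions above translate into checks, the function below validates a single row as returned by Python's `csv.DictReader`. The name `validate_row` is hypothetical, and so are two assumptions: that `book_id`, `review_id` and `rating` are stored as integer strings, and that "stripped" means the review text has no leading or trailing whitespace. `review_date` is not checked because its format is not documented.

```python
from typing import Dict, List

# Column names as listed in the card; values arrive as strings from csv.DictReader.
EXPECTED_COLUMNS = [
    "book_id", "book_title", "book_author", "review_id",
    "review_date", "review_writer", "rating", "review_text",
]

# Documented numeric ranges: book_id 0-19, review_id 0-599, rating 1-5.
INTEGER_RANGES = (("book_id", 0, 19), ("review_id", 0, 599), ("rating", 1, 5))


def validate_row(row: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one review row (empty if none).

    Hypothetical helper: assumes integer columns are stored as plain
    integer strings such as "4", not "4.0".
    """
    missing = [c for c in EXPECTED_COLUMNS if c not in row]
    if missing:
        return [f"missing columns: {missing}"]

    problems: List[str] = []
    for column, low, high in INTEGER_RANGES:
        try:
            value = int(row[column])
        except ValueError:
            problems.append(f"{column} is not an integer: {row[column]!r}")
            continue
        if not low <= value <= high:
            problems.append(f"{column} out of range: {value}")

    # Assumed meaning of "stripped": no surrounding whitespace remains.
    if row["review_text"] != row["review_text"].strip():
        problems.append("review_text has leading/trailing whitespace")
    return problems
```

A row with no star rating, or a rating stored in some other format, would surface as an "is not an integer" problem rather than raising, so the check can be run over a whole file and the results inspected afterwards.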
Distribution
The dataset is distributed as a CSV file. It contains 600 reviews (review_id 0 to 599): the first 30 reviews available for each of the 20 romance books (book_id 0 to 19).
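Given the stated layout of 20 books with 30 reviews each, a loaded file can be sanity-checked with the standard library alone. This is a minimal sketch: the function names are hypothetical, and it counts reviews per `book_id` rather than per title.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List


def load_reviews(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Read all review rows from a CSV source whose first line is the header."""
    return list(csv.DictReader(lines))


def reviews_per_book(rows: List[Dict[str, str]]) -> Counter:
    """Count how many reviews each book_id contributes."""
    return Counter(row["book_id"] for row in rows)


def matches_documented_layout(rows: List[Dict[str, str]],
                              n_books: int = 20, per_book: int = 30) -> bool:
    """True if there are exactly n_books books with per_book reviews each."""
    counts = reviews_per_book(rows)
    return len(counts) == n_books and all(c == per_book for c in counts.values())
```

For example, `with open("reviews.csv", newline="", encoding="utf-8") as f: rows = load_reviews(f)` followed by `matches_documented_layout(rows)`, where the filename and encoding are placeholders for however the file was actually downloaded.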
Usage
This dataset is suited to applications such as:
- Data Cleaning Projects: Practising data preprocessing and normalisation techniques.
- Sentiment Analysis: Analysing public sentiment towards romance novels and authors.
- Natural Language Processing (NLP): Developing models for text classification, entity recognition, or language understanding.
- Educational Purposes: A practical resource for students learning web scraping with BeautifulSoup and general data analysis.
- Market Research: Gaining insights into reader preferences and popular trends within the romance genre.
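For the sentiment-analysis and market-research uses listed above, one common starting point is to treat star ratings as weak sentiment labels and to aggregate ratings per author. The sketch below assumes a particular mapping (4 to 5 stars positive, 3 neutral, 1 to 2 negative); these thresholds are a convention chosen for illustration, not something defined by the dataset, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def rating_to_label(rating: int) -> str:
    """Map a 1-5 star rating to a coarse sentiment label (illustrative thresholds)."""
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return "neutral"


def weak_labels(rows: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Pair each review_text with a label derived from its star rating."""
    return [(row["review_text"], rating_to_label(int(row["rating"]))) for row in rows]


def mean_rating_by_author(rows: List[Dict[str, str]]) -> Dict[str, float]:
    """Average star rating per book_author across all of that author's reviews."""
    ratings: Dict[str, List[int]] = defaultdict(list)
    for row in rows:
        ratings[row["book_author"]].append(int(row["rating"]))
    return {author: float(mean(values)) for author, values in ratings.items()}
```

Rating-derived labels are noisy, since a reviewer's text and star rating do not always agree, so they suit quick baselines for text classification rather than final evaluation.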
Coverage
The geographic scope is global, as the data comes from publicly available Goodreads reviews. Review dates range from 29 February 2020 to 2 April 2023, and only books on the "Best Romances of 2022" shortlist are covered. No breakdown of availability by demographic group or by year is given beyond this date range.
License
CC0
Who Can Use It
- Data Scientists: For developing and testing machine learning models, particularly in NLP and sentiment analysis.
- Students and Educators: As a practical case study for courses on data science, web scraping, and text analytics.
- Researchers: In fields such as literature, digital humanities, and consumer behaviour to study reading trends and reception.
- Content Creators and Marketers: To understand reader engagement and identify popular themes or authors in the romance genre.
Dataset Name Suggestions
- Goodreads 2022 Romance Reviews
- Best Romance Books 2022 Review Data
- Goodreads Romance Novel Reviews
- 2022 Romance Book Review Dataset
- Goodreads Top Romance Reviews
Attributes
Original Data Source: Goodreads Best Romance 2022