Hotel Stay Prediction Dataset

Natural Language Processing

Tags and Keywords

Hotel

Reservation

Cancellation

Prediction

Customer

Hotel Stay Prediction Dataset on Opendatabay data marketplace

Free

About

This dataset is designed to help predict whether a customer will cancel a hotel reservation. In the modern hospitality industry, online booking platforms have revolutionised how customers make reservations. However, this convenience also leads to a significant number of cancellations or no-shows, often due to changes in plans or scheduling conflicts. While free or low-cost cancellation is beneficial for guests, it presents a challenge for hotels, potentially diminishing revenue. This dataset provides the necessary information to build models that can forecast cancellation likelihood, assisting hotels in managing their bookings and optimising revenue.

Columns

  • Booking_ID: A unique identifier for each hotel booking. (36,275 unique values)
  • no_of_adults: The number of adults included in the reservation. (Mean: 1.84, Std. Deviation: 0.52)
  • no_of_children: The number of children included in the reservation. (Mean: 0.11, Std. Deviation: 0.4)
  • no_of_weekend_nights: The number of weekend nights (Saturday or Sunday) the guest planned to stay. (Mean: 0.81, Std. Deviation: 0.87)
  • no_of_week_nights: The number of week nights (Monday to Friday) the guest planned to stay. (Mean: 2.2, Std. Deviation: 1.41)
  • type_of_meal_plan: Specifies the type of meal plan selected by the customer. (e.g., Meal Plan 1: 77%, Not Selected: 14%)
  • required_car_parking_space: Indicates if the customer requires a car parking space (0 for no, 1 for yes). (Mean: 0.03)
  • room_type_reserved: The specific room type reserved by the customer. (e.g., Room_Type 1: 78%, Room_Type 4: 17%)
  • lead_time: The number of days between the booking date and the arrival date. (Mean: 85.2, Std. Deviation: 85.9, Max: 443 days)
  • arrival_year: The year of the arrival date for the reservation. (Primarily 2017 and 2018)
  • arrival_month: The month of the arrival date for the reservation. (Mean: 7.42, Std. Deviation: 3.07)
  • arrival_date: The day of the month for the arrival date. (Mean: 15.6, Std. Deviation: 8.74)
  • market_segment_type: The market segment through which the booking was made. (e.g., Online: 64%, Offline: 29%)
  • repeated_guest: A flag indicating if the customer is a repeat guest (0 for no, 1 for yes). (Mean: 0.03)
  • no_of_previous_cancellations: The count of prior bookings cancelled by the customer. (Mean: 0.02)
  • no_of_previous_bookings_not_canceled: The count of prior bookings not cancelled by the customer. (Mean: 0.15)
  • avg_price_per_room: The average price per day for the reservation, considering dynamic pricing. (Mean: 103, Std. Deviation: 35.1, Max: 540)
  • no_of_special_requests: The total number of special requests made by the customer. (Mean: 0.62, Std. Deviation: 0.79, Max: 5 requests)
  • booking_status: The target variable, indicating if the booking was Canceled (33%) or Not_Canceled (67%).
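As an illustration of how the columns above fit together, the following minimal sketch derives a few combined fields from a single record. It assumes each record arrives as a dictionary of strings keyed by the column names listed above (as a CSV reader would produce); the function name `derive_features` and the example values are hypothetical and not part of the dataset.

```python
from datetime import date


def derive_features(row):
    """Derive combined fields from one booking record (a dict of strings)."""
    weekend = int(row["no_of_weekend_nights"])
    week = int(row["no_of_week_nights"])
    adults = int(row["no_of_adults"])
    children = int(row["no_of_children"])

    # arrival_year, arrival_month and arrival_date (day of month) are stored
    # separately; combine them, and fall back to None if the combination is
    # not a valid calendar date.
    try:
        arrival = date(
            int(row["arrival_year"]),
            int(row["arrival_month"]),
            int(row["arrival_date"]),
        )
    except ValueError:
        arrival = None

    return {
        "total_nights": weekend + week,
        "total_guests": adults + children,
        "arrival": arrival,
        # Encode the target as 1 for Canceled, 0 for Not_Canceled.
        "is_canceled": 1 if row["booking_status"] == "Canceled" else 0,
    }


# Hypothetical record, for illustration only.
example = {
    "no_of_weekend_nights": "1",
    "no_of_week_nights": "2",
    "no_of_adults": "2",
    "no_of_children": "0",
    "arrival_year": "2018",
    "arrival_month": "10",
    "arrival_date": "2",
    "booking_status": "Not_Canceled",
}
features = derive_features(example)
```

Whether these derived fields help a model is an open question; they are shown only to make the column semantics concrete.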

Distribution

The dataset is provided in CSV format. It contains 36,275 records and 19 columns, and the file size is 3.24 MB. Summary statistics are available for every column, and no column has missing values.
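The figures above can be checked after download. The sketch below, using only Python's standard `csv` module, counts records and blank cells and compares the header against the 19 column names listed earlier. The function name `validate_bookings` and the file name in the usage comment are assumptions; the listing does not state the actual file name.

```python
import csv
import io

# The 19 column names as given in the Columns section.
EXPECTED_COLUMNS = [
    "Booking_ID", "no_of_adults", "no_of_children", "no_of_weekend_nights",
    "no_of_week_nights", "type_of_meal_plan", "required_car_parking_space",
    "room_type_reserved", "lead_time", "arrival_year", "arrival_month",
    "arrival_date", "market_segment_type", "repeated_guest",
    "no_of_previous_cancellations", "no_of_previous_bookings_not_canceled",
    "avg_price_per_room", "no_of_special_requests", "booking_status",
]


def validate_bookings(handle):
    """Summarise a CSV opened in text mode: row count, column mismatches, blanks."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    absent = [c for c in EXPECTED_COLUMNS if c not in header]
    unexpected = [c for c in header if c not in EXPECTED_COLUMNS]

    rows = 0
    blank_cells = 0
    for record in reader:
        rows += 1
        # A cell counts as blank if it is missing or contains only whitespace.
        blank_cells += sum(
            1 for c in header if (record.get(c) or "").strip() == ""
        )

    return {
        "rows": rows,
        "absent_columns": absent,
        "unexpected_columns": unexpected,
        "blank_cells": blank_cells,
    }


# Usage with a hypothetical file name:
# with open("hotel_stay_prediction.csv", newline="") as f:
#     report = validate_bookings(f)
```

If the file matches the description above, the report should show 36,275 rows, no absent columns, and zero blank cells.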

Usage

This dataset is ideal for developing and evaluating predictive models, specifically for binary classification tasks. It can be used to predict the likelihood of a hotel reservation cancellation. Potential applications include:
  • Building machine learning models to forecast cancellation rates.
  • Identifying key factors influencing customer cancellation behaviour.
  • Optimising hotel revenue management strategies by predicting no-shows.
  • Developing dynamic pricing or overbooking strategies.
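Two simple reference points are useful before building any of the models above. Because 67% of bookings are Not_Canceled, a classifier that always predicts the majority class already reaches about 67% accuracy, so a useful model has to beat that figure. Per-category cancellation rates give a first look at which factors matter. The sketch below is one possible way to compute both with the standard library; the helper names and the toy records are hypothetical.

```python
from collections import Counter, defaultdict


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def cancellation_rate_by(records, column):
    """Share of Canceled bookings within each distinct value of `column`."""
    totals = defaultdict(int)
    canceled = defaultdict(int)
    for record in records:
        key = record[column]
        totals[key] += 1
        if record["booking_status"] == "Canceled":
            canceled[key] += 1
    return {key: canceled[key] / totals[key] for key in totals}


# Toy records for illustration only; these are not rows from the dataset.
toy = [
    {"market_segment_type": "Online", "booking_status": "Canceled"},
    {"market_segment_type": "Online", "booking_status": "Not_Canceled"},
    {"market_segment_type": "Offline", "booking_status": "Not_Canceled"},
]
baseline = majority_baseline_accuracy([r["booking_status"] for r in toy])
rates = cancellation_rate_by(toy, "market_segment_type")
```

The same `cancellation_rate_by` call can be repeated for other categorical columns such as `type_of_meal_plan` or `room_type_reserved`. Since the classes are imbalanced, accuracy alone may be a weak metric; precision and recall on the Canceled class are worth reporting alongside it.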

Coverage

The dataset covers hotel reservations with arrival dates in 2017 and 2018. It does not specify geographic location or guest demographics beyond the number of adults and children per booking.

License

Creative Commons Attribution 4.0 International (CC BY 4.0).

Who Can Use It

This dataset is suitable for a wide range of users interested in predictive analytics and hospitality. It is particularly valuable for:
  • Data Scientists and Machine Learning Engineers: For building and testing classification models to predict hotel booking cancellations.
  • Business Analysts: To gain insights into customer booking behaviour and cancellation patterns.
  • Hotel Management and Revenue Managers: To inform operational decisions, such as overbooking, staffing, and marketing strategies aimed at reducing cancellations.
  • Students and Researchers: For academic projects in data science, predictive modelling, and business analytics.

Dataset Name Suggestions

  • Hotel Booking Cancellation Prediction
  • Customer Reservation Status Forecast
  • Hotel Stay Prediction Dataset
  • Reservation Cancellation Analytics

Attributes

Original Data Source: Hotel Stay Prediction Dataset

Listing Stats

VIEWS

2

DOWNLOADS

0

LISTED

08/07/2025

REGION

GLOBAL

UDQS QUALITY (Universal Data Quality Score)

5 / 5

VERSION

1.0
