Opendatabay APP

Hotel Reservation Cancellation Risk Data

Data Science and Analytics

Tags and Keywords

Cancellation, Hotel, Reservation, Revenue, Tourism


Free

About

Reservation data provides detailed insight into customer booking behavior and the dynamics of cancellation within the hospitality industry. This resource is engineered to support financial and operational analysis, specifically addressing the risk associated with variable payment contracts linked to reservations. The analysis focuses on calculating the financial implications of non-materialized bookings and the high cost (estimated at 120 USD extra) incurred by last-minute cancellations. It allows analysts to determine key drivers of reservation retention and abandonment, aiding strategic decision-making in contract negotiation and marketing efforts.
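
As a rough illustration of the last-minute cancellation cost mentioned above, the following Python sketch counts canceled bookings whose status date falls within a few days of arrival and multiplies that count by 120 USD. The 3-day window, the month-name format of arrival_date_month, the ISO date format of reservation_status_date, the helper names, and the sample rows are all assumptions made for illustration; the listing does not define "last-minute".

```python
from datetime import date

EXTRA_COST_USD = 120  # per last-minute cancellation, the figure quoted in the listing

# Assumption: arrival_date_month holds English month names such as "July".
MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}


def arrival_date(row):
    """Build the intended arrival date from its three component columns."""
    return date(int(row["arrival_date_year"]),
                MONTHS[row["arrival_date_month"]],
                int(row["arrival_date_day_of_month"]))


def is_last_minute_cancellation(row, window_days=3):
    """True if the booking was canceled at most `window_days` before arrival.

    Assumption: for canceled bookings, reservation_status_date is the
    cancellation date in YYYY-MM-DD format.
    """
    if row["reservation_status"] != "Canceled":
        return False
    canceled_on = date.fromisoformat(row["reservation_status_date"])
    days_before_arrival = (arrival_date(row) - canceled_on).days
    return 0 <= days_before_arrival <= window_days


def last_minute_cost(rows, window_days=3, extra_cost=EXTRA_COST_USD):
    """Total extra cost attributed to last-minute cancellations."""
    count = sum(is_last_minute_cancellation(r, window_days) for r in rows)
    return extra_cost * count


# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE_BOOKINGS = [
    {"arrival_date_year": "2016", "arrival_date_month": "July",
     "arrival_date_day_of_month": "10", "reservation_status": "Canceled",
     "reservation_status_date": "2016-07-09"},
    {"arrival_date_year": "2016", "arrival_date_month": "July",
     "arrival_date_day_of_month": "10", "reservation_status": "Canceled",
     "reservation_status_date": "2016-05-01"},
    {"arrival_date_year": "2016", "arrival_date_month": "July",
     "arrival_date_day_of_month": "10", "reservation_status": "Check-Out",
     "reservation_status_date": "2016-07-12"},
]

print(last_minute_cost(SAMPLE_BOOKINGS))
```

Changing window_days shows how sensitive the estimate is to the definition of "last-minute", which is worth fixing with the business before reporting a figure.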

Columns

The dataset includes 31 attributes detailing the booking process and outcomes:
  • hotel: Type of hotel chain, identifying properties as either Resort or City Hotel.
  • is_canceled: Binary field indicating whether the booking was canceled (1) or not (0). About 37% of bookings were canceled (the mean of this field is 0.37).
  • lead_time: The duration, measured in days, between the reservation date and the scheduled arrival date.
  • arrival_date_year, arrival_date_month, arrival_date_week_number, arrival_date_day_of_month: Components detailing the intended arrival date.
  • stays_in_weekend_nights: The count of weekend nights (Saturday or Sunday) booked by the guest.
  • stays_in_week_nights: The count of weekday nights (Monday to Friday) booked by the guest.
  • adults, children, babies: Numerical counts of guests across different age categories for the booking.
  • meal: Specifies the type of meal package reserved (e.g., BB for bed and breakfast, HB for half board).
  • country: Guest's country of origin, given as three-letter ISO 3166-1 alpha-3 codes. Portugal (PRT) accounts for 41% of records.
  • market_segment: Designation of the market segment (e.g., Online Travel Agency, Offline TA/TO, where TA means travel agents and TO means tour operators).
  • distribution_channel: The channel through which the reservation was distributed.
  • is_repeated_guest: Indicates if the guest is a recurring customer (1) or a new customer (0).
  • previous_cancellations: The number of prior reservations canceled by the customer.
  • previous_bookings_not_canceled: The number of prior reservations by the customer that were not canceled.
  • reserved_room_type, assigned_room_type: Codes representing the type of room reserved and the type of room actually assigned.
  • booking_changes: The total number of modifications made to the reservation since its creation.
  • agent, company: Identification numbers for the travel agent or the company responsible for payment.
  • days_in_waiting_list: The duration, in days, the booking spent on a waiting list before being confirmed.
  • customer_type: Category of the reservation (e.g., Transient, Group, Contract).
  • adr: Average Daily Rate, calculated by dividing the sum of all accommodation transactions by the total number of nights stayed.
  • required_car_parking_spaces: The number of parking spaces requested by the client.
  • total_of_special_requests: The total number of special requests made (e.g., specific bed type or floor).
  • reservation_status: The final status of the booking, categorized as Canceled, Check-Out, or No-Show.
  • reservation_status_date: The date when the reservation status was last updated.
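
Several of the column definitions above combine naturally. The Python sketch below shows three such derivations: total nights as the sum of the two stay columns, an approximate lodging total as adr multiplied by nights (the inverse of the adr definition), and the cancellation rate as the mean of is_canceled. The function names and sample rows are hypothetical, and values are assumed to arrive as strings, as they would from a CSV reader.

```python
def total_nights(row):
    """Weekend nights plus weekday nights booked."""
    return int(row["stays_in_weekend_nights"]) + int(row["stays_in_week_nights"])


def estimated_lodging_total(row):
    """adr is lodging revenue divided by nights, so adr * nights recovers the total."""
    return float(row["adr"]) * total_nights(row)


def cancellation_rate(rows):
    """Mean of the binary is_canceled field."""
    rows = list(rows)
    return sum(int(r["is_canceled"]) for r in rows) / len(rows)


# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE_ROWS = [
    {"stays_in_weekend_nights": "2", "stays_in_week_nights": "3",
     "adr": "100.0", "is_canceled": "0"},
    {"stays_in_weekend_nights": "0", "stays_in_week_nights": "1",
     "adr": "80.0", "is_canceled": "1"},
    {"stays_in_weekend_nights": "1", "stays_in_week_nights": "0",
     "adr": "95.5", "is_canceled": "0"},
    {"stays_in_weekend_nights": "2", "stays_in_week_nights": "5",
     "adr": "60.0", "is_canceled": "0"},
]

print(total_nights(SAMPLE_ROWS[0]), estimated_lodging_total(SAMPLE_ROWS[0]))
print(cancellation_rate(SAMPLE_ROWS))
```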

Distribution

The data is contained within a CSV file named hotel_bookings.csv, with a file size of 15.39 MB. It consists of 31 fields and covers approximately 119,000 unique records. The data quality is high, with all columns being valid across the dataset.
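
The field count, record count, and completeness described above can be checked directly after download. The following Python sketch uses only the standard csv module; the function name summarize_csv and the set of tokens treated as missing are assumptions for illustration, and the inline sample stands in for the real file.

```python
import csv
import io

# Assumption: blank cells and these tokens mark missing values.
MISSING_TOKENS = {"", "NA", "NULL"}


def summarize_csv(fileobj):
    """Return field count, row count, and per-column counts of missing cells."""
    reader = csv.DictReader(fileobj)
    fields = reader.fieldnames or []
    missing = {name: 0 for name in fields}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name in fields:
            value = row.get(name)
            if value is None or value.strip() in MISSING_TOKENS:
                missing[name] += 1
    return {
        "n_fields": len(fields),
        "n_rows": n_rows,
        # Report only the columns that actually contain missing cells.
        "missing": {name: n for name, n in missing.items() if n},
    }


# Tiny made-up sample for illustration; not taken from the dataset.
SAMPLE_CSV = "hotel,is_canceled,country\nCity Hotel,1,PRT\nResort Hotel,0,NULL\n"
SUMMARY = summarize_csv(io.StringIO(SAMPLE_CSV))
print(SUMMARY)

# With the downloaded file, the call would look like:
# with open("hotel_bookings.csv", newline="") as f:
#     print(summarize_csv(f))
```

Running this on the real file is a quick way to confirm the 31-field and roughly 119,000-record figures before starting an analysis.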

Usage

This dataset is engineered for several key applications:
  • Financial Modelling: Calculating estimated annual savings if business contracts were renegotiated to link payments only to materialized (non-canceled) reservations.
  • Risk Assessment: Testing strategic hypotheses, such as whether longer lead times, lower Average Daily Rates (adr), or the absence of special requests correlate with a higher probability of cancellation.
  • Data Development: Practicing fundamental SQL commands (SELECT, FROM, WHERE, GROUP BY, COUNT, AVG) using database environments like Google BigQuery.
  • Business Intelligence: Creating visual reports and dashboards using tools like Power BI to communicate actionable findings to hotel management.
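
The risk-assessment and SQL-practice uses above can be combined in one small exercise. The listing mentions Google BigQuery; the sketch below instead uses Python's built-in sqlite3 module as a local stand-in, which is an assumption made so the example runs without any account or setup. The table name, the adr > 0 filter, and the sample rows are likewise illustrative. The query uses SELECT, FROM, WHERE, GROUP BY, COUNT, and AVG to compare cancellation rates for bookings with and without special requests.

```python
import sqlite3

QUERY = """
    SELECT total_of_special_requests > 0 AS has_requests,
           COUNT(*)                      AS bookings,
           AVG(is_canceled)              AS cancellation_rate,
           AVG(lead_time)                AS avg_lead_time
    FROM hotel_bookings
    WHERE adr > 0                         -- illustrative filter: skip zero-rate bookings
    GROUP BY total_of_special_requests > 0
    ORDER BY has_requests
"""


def cancellation_by_requests(rows):
    """Load (lead_time, adr, special_requests, is_canceled) tuples and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE hotel_bookings ("
            "lead_time INTEGER, adr REAL, "
            "total_of_special_requests INTEGER, is_canceled INTEGER)"
        )
        conn.executemany("INSERT INTO hotel_bookings VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


# Made-up rows for illustration only; they say nothing about the real data.
SAMPLE_ROWS_SQL = [
    (200, 80.0, 0, 1),
    (150, 90.0, 0, 1),
    (10, 120.0, 0, 0),
    (30, 110.0, 2, 0),
    (5, 100.0, 1, 0),
    (90, 95.0, 1, 1),
    (40, 0.0, 0, 1),   # excluded by the adr > 0 filter
]

RESULT = cancellation_by_requests(SAMPLE_ROWS_SQL)
for has_requests, bookings, rate, avg_lead in RESULT:
    print(has_requests, bookings, round(rate, 3), round(avg_lead, 1))
```

The same pattern extends to the other hypotheses: replacing the grouping expression with a lead-time or adr bucket gives the corresponding comparison.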

Coverage

The data covers hotel reservations with arrival dates in 2015, 2016, and 2017. Geographically, it includes guests originating from 178 unique countries worldwide. Portugal (PRT) is the most frequent country of origin, accounting for 41% of the reservations. Demographic scope includes detailed guest counts for adults, children, and babies.
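
The geographic figures above (number of distinct countries and the share of the most frequent one) can be recomputed from the country column. This is a minimal Python sketch using collections.Counter; the function name and the sample rows are hypothetical.

```python
from collections import Counter


def country_shares(rows):
    """Return (number of distinct countries, [(code, share), ...] by frequency)."""
    counts = Counter(row["country"] for row in rows)
    total = sum(counts.values())
    shares = [(code, n / total) for code, n in counts.most_common()]
    return len(counts), shares


# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE_COUNTRIES = [{"country": c} for c in ["PRT", "PRT", "GBR", "FRA", "PRT"]]

N_COUNTRIES, SHARES = country_shares(SAMPLE_COUNTRIES)
print(N_COUNTRIES, SHARES[0])
```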

License

CC0: Public Domain

Who Can Use It

  • Data Analysts and Scientists: For building predictive models focused on guest churn and booking reliability.
  • Business Executives and Financial Managers: To understand revenue risk, evaluate contract profitability, and derive actionable business decisions regarding operational strategy.
  • Students and Trainees: For learning core data management skills, including data cleaning, filtering, summarizing, and visualization using data analysis and BI tools.

Dataset Name Suggestions

  • Hotel Reservation Cancellation Risk Data
  • Global Hospitality Booking Analysis (2015–2017)
  • Predicting Reservation Reliability
  • Hotel Contract and Cancellation Metrics

Attributes

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 13/11/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
