Hotels Use Machine Learning to Predict Which Guests Will Stand Them Up

Big data is now applied in every industry, and the hotel industry is no exception. Making full use of it enables hotels to predict changes in market demand, support decisions with data analysis, and improve their operations.
Major OTA (Online Travel Agency) platforms have also made travel much easier: hotel rooms, attraction tickets and more can be booked with just a few clicks.

To attract more reservations, these platforms encourage merchants to offer more relaxed cancellation policies, such as free cancellation at any time or free cancellation within a limited period.
Booking.com, the world's largest online hotel booking site by room sales, is popular among travelers due to its free cancellation advantage.
However, while "free cancellation" is great for users, it is a headache for hotels. Last-minute cancellations usually cause hotels the following losses:
- The cancelled rooms cannot be resold in time, and the hotel loses revenue;
- The hotel cuts prices to resell the cancelled rooms, reducing profits;
- To rebook these rooms as quickly as possible, the hotel incurs additional costs for publicity and distribution channels.
Since users can stand the hotel up at any time, is there any way for the hotel to minimize its losses?
Manuel Banza, a Portuguese business analyst (BA, a position equivalent to a product manager in an IT company), has more than five years of experience in hotel management. Using publicly available data from European hotel booking platforms, he identified the characteristics of users who are more likely to cancel orders, helping hotels cut their losses in time.
Finding a Pattern in Nearly 120,000 Hotel Bookings
A data science enthusiast, Manuel Banza approached the problem with data science and machine learning.
He first turned to the "Hotel booking demand" dataset. It contains 32 dimensions of data for ordinary (city) hotels and resort hotels, including user nationality, booking time, length of stay, number of adults, children and infants, whether the order was ultimately cancelled, and the total number of orders the user had cancelled before this one.
Hotel Booking Demand
Publishing Agency: University of Lisbon, Portugal
Quantity included: 119,390 records, 32 dimensions
Data format: CSV
Data size: 16.9 MB (1.3 MB compressed)
Address: https://orion.hyper.ai/datasets/14866

The statistics Manuel Banza gathered show that a large share of hotel orders are cancelled every year.
In 2018, 49.8% of users on the OTA platform Booking.com cancelled their orders; on HRS Group, the proportion was as high as 66%. Across multiple platforms, the average booking cancellation rate in 2018 reached 39.6%.

Next, the author conducted an exploratory analysis of the data and found the following:
- Compared with resort hotels, reservations at ordinary (city) hotels are more likely to be cancelled by guests;
- The cancellation rate is higher in spring and summer, and lowest in winter;
- Among the various booking channels, OTA platforms receive the most orders, but they also see the most cancellations;
- The earlier the user makes a reservation, the greater the uncertainty and the greater the probability of cancellation.
The author notes that how far in advance a booking is made is one of the most important indicators when analyzing hotel revenue performance. The analysis shows that bookings made more than one year in advance have the highest cancellation probability, at 57.14%, while bookings made within a week of arrival have the lowest, at 7.73%.
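The lead-time comparison can be reproduced with a simple group-and-average step. The following is a minimal sketch rather than the author's code: it assumes each booking is a dict carrying the dataset's lead_time (days before arrival) and is_canceled (0 or 1) fields, and the bucket boundaries and sample rows are illustrative assumptions.

```python
from collections import defaultdict

# Illustrative bucket boundaries in days (an assumption, not the author's exact bins).
BUCKETS = [
    (0, 7, "within a week"),
    (8, 365, "8 to 365 days"),
    (366, None, "over a year"),
]


def bucket_label(lead_time):
    """Return the label of the bucket that contains lead_time (in days)."""
    for low, high, label in BUCKETS:
        if lead_time >= low and (high is None or lead_time <= high):
            return label
    raise ValueError(f"lead_time must be non-negative, got {lead_time}")


def cancellation_rate_by_bucket(bookings):
    """Map each bucket label to the share of its bookings that were cancelled."""
    totals = defaultdict(int)
    cancelled = defaultdict(int)
    for booking in bookings:
        label = bucket_label(booking["lead_time"])
        totals[label] += 1
        cancelled[label] += booking["is_canceled"]
    return {label: cancelled[label] / totals[label] for label in totals}


# Made-up rows for demonstration only; they are not taken from the dataset.
sample = [
    {"lead_time": 2, "is_canceled": 0},
    {"lead_time": 5, "is_canceled": 0},
    {"lead_time": 40, "is_canceled": 1},
    {"lead_time": 90, "is_canceled": 0},
    {"lead_time": 400, "is_canceled": 1},
]
rates = cancellation_rate_by_bucket(sample)
```

Run against the full dataset, the same grouping would yield per-bucket rates comparable to the 57.14% and 7.73% figures quoted above, provided the buckets match the author's.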

Machine Learning Model: Predicting Who Is Most Likely to Stand the Hotel Up
After a thorough analysis of the dataset, the author began building a model to predict order cancellations.
Step 1: Data Cleaning
First, the missing values in the dataset are handled. If a variable is numeric, its missing values are replaced with the mean of that feature; if it is categorical, they are replaced with a constant.
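The imputation rule described here can be sketched in a few lines. This is an illustration under assumptions, not the author's implementation: rows are plain dicts, missing values are represented as None, and the constant "missing" used for categorical columns is a hypothetical choice, since the source does not say which constant was used.

```python
from statistics import mean


def impute_missing(rows, numeric_cols, categorical_cols, fill_value="missing"):
    """Return copies of rows with None values filled in.

    Numeric columns get the mean of the observed values in that column;
    categorical columns get a constant placeholder.
    """
    # Column means are computed from the observed (non-missing) values only.
    means = {}
    for col in numeric_cols:
        observed = [row[col] for row in rows if row[col] is not None]
        means[col] = mean(observed) if observed else 0.0

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for col in numeric_cols:
            if new_row[col] is None:
                new_row[col] = means[col]
        for col in categorical_cols:
            if new_row[col] is None:
                new_row[col] = fill_value
        filled.append(new_row)
    return filled


# Made-up rows for demonstration only.
rows = [
    {"children": 0, "country": "PRT"},
    {"children": 2, "country": None},
    {"children": None, "country": "GBR"},
]
clean = impute_missing(rows, numeric_cols=["children"], categorical_cols=["country"])
```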
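Then remove reservation_status (the final reservation status, which directly reveals whether the order was cancelled) from the input features, because it gives away the value the machine learning model is meant to predict. In the dataset, that target is recorded in the is_canceled column (0 for not cancelled, 1 for cancelled).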
Step 2: Select the best model
Before comparing algorithms, the dataset is split in a ratio of 8:2: 80% of the data is used to train the model, and 20% is held out as a validation set.
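An 8:2 split of this kind might look like the sketch below. The source does not say how the split was drawn, so the random shuffle and the fixed seed are assumptions, and booking_id is a hypothetical field used only to make the sample rows distinguishable.

```python
import random


def train_validation_split(rows, validation_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into (train, validation) lists."""
    shuffled = list(rows)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Made-up bookings for demonstration only.
bookings = [{"booking_id": i, "is_canceled": int(i % 3 == 0)} for i in range(10)]
train, validation = train_validation_split(bookings)
```

Fixing the seed keeps the split identical across runs, so that every candidate model is trained and validated on the same rows.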
In data science terms, predicting order cancellations is a supervised classification problem, and more specifically a binary classification problem. The author therefore selected several existing binary classification tools such as LightGBM, CatBoost, XGBoost and H2O, trained and compared them, and finally chose the CatBoost model, which gave the best experimental results.
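The comparison step amounts to scoring every candidate on the same validation set and keeping the best one. The sketch below shows that procedure only. To stay runnable without third-party libraries, the trained LightGBM, CatBoost and XGBoost models are replaced by two hand-written stand-in rules, and the choice of F1 as the selection metric is an assumption, since the source does not state which metric was used.

```python
def evaluate(predict, rows):
    """Return accuracy, precision, recall and F1 of a binary classifier on rows."""
    tp = fp = tn = fn = 0
    for row in rows:
        predicted, actual = predict(row), row["is_canceled"]
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(rows),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def select_best(candidates, validation_rows, metric="f1"):
    """Score every candidate on the same rows and return (best_name, all_scores)."""
    scores = {name: evaluate(predict, validation_rows) for name, predict in candidates.items()}
    best_name = max(scores, key=lambda name: scores[name][metric])
    return best_name, scores


# Stand-in "models": simple rules in place of trained classifiers (hypothetical).
candidates = {
    "never_cancels": lambda row: 0,
    "long_lead_time": lambda row: int(row["lead_time"] > 100),
}

# Made-up validation rows for demonstration only.
validation_rows = [
    {"lead_time": 3, "is_canceled": 0},
    {"lead_time": 10, "is_canceled": 0},
    {"lead_time": 150, "is_canceled": 1},
    {"lead_time": 200, "is_canceled": 1},
    {"lead_time": 30, "is_canceled": 1},
    {"lead_time": 400, "is_canceled": 0},
]
best_name, scores = select_best(candidates, validation_rows)
```

With real models, each entry in candidates would wrap a trained classifier's prediction call instead of a hand-written rule; the selection logic stays the same.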
The CatBoost prediction results point to the following:
- If the user's nationality is Portuguese, the possibility of cancellation is high. However, for group bookings, hotels generally do not get everyone's nationality information in advance. If the order is cancelled, most hotels will default their nationality to the country where the hotel is located. Therefore, this information is only for reference and may not be accurate;
- Users who did not make any special requests were more likely to cancel their orders than those who made at least one special request;
- The lower the lead_time (the number of days between booking time and check-in time) value, the lower the likelihood that the booking will be canceled (this prediction result is consistent with the previous data analysis results).

[Figure: CatBoost model performance on the validation set]

[Figure: CatBoost model performance on the entire "Hotel booking demand" dataset]

Hotel: Before you cancel, let me save some money
Using this predictive model, hotels can know in advance which users are likely to cancel their orders and take timely remedial measures.
For example, hotels can contact in advance the users who are more likely to cancel and, through communication, encourage them to cancel as early as possible, leaving the hotel more time to resell the rooms.
Alternatively, hotels can contact users who are inclined to cancel, introduce the hotel's advantages, and offer some stay rewards to turn the tide and retain them.
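Turning model output into a contact list could be as simple as the sketch below. It assumes the model yields a cancellation probability per booking; the booking IDs, the 0.5 threshold and the function name are hypothetical and would need tuning against the hotel's own costs.

```python
def guests_to_contact(predictions, threshold=0.5, limit=None):
    """Return booking IDs at or above the risk threshold, highest risk first.

    predictions maps a booking ID to its predicted cancellation probability.
    """
    at_risk = [
        (probability, booking_id)
        for booking_id, probability in predictions.items()
        if probability >= threshold
    ]
    at_risk.sort(key=lambda pair: pair[0], reverse=True)
    booking_ids = [booking_id for _, booking_id in at_risk]
    return booking_ids if limit is None else booking_ids[:limit]


# Made-up probabilities for demonstration only.
predictions = {"B-101": 0.82, "B-102": 0.12, "B-103": 0.57, "B-104": 0.49}
contact_list = guests_to_contact(predictions, threshold=0.5)
```

A lower threshold reaches more potential cancellations at the cost of contacting more guests who would have stayed anyway; the optional limit caps the list to what staff can handle.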

News Source:
https://www.linkedin.com/pulse/u-hotel-booking-cancellations-using-machine-learning-manuel-banza