Application of Genetic Algorithms to Optimize a Truncated Mean k-Nearest Neighbours Regressor for Hotel Reservation Forecasting

Andrés Sanz-García, Julio Fernández-Ceniceros, Fernando Antoñanzas-Torres, and F. Javier Martínez-de-Pisón-Ascacibar

EDMANS Research Group, University of La Rioja, Logroño, Spain
http://www.mineriadatos.com

Abstract. Progress in information technologies and their broad social expansion have led to the appearance of websites specialized in online hotel booking. These sites offer new options to customers, who have demonstrated a strong tendency towards making last-minute reservations. This scenario has dramatically affected the task of predicting hotel bookings, making these estimations much more complex with traditional forecasting models. Given the importance of this estimation, it is crucial to find more accurate prediction models that take these new situations into account. This work aims to develop an application to predict hotel room reservations that tackles the consequences of last-minute reservations. Our proposal combines genetic algorithm optimization with a truncated mean k-nearest neighbours regressor. After the analysis, we conclude that our method shows a significant improvement when working with historical booking information, compared with classical models. The results of the study illustrate that our application will enable the development of useful demand prediction calendars.

Keywords: Genetic Algorithm, Optimization, k-Nearest Neighbours, Hotel Booking Forecasting, Decision Support System.

1 Introduction

In recent years, hotel guest-booking behaviour has evolved into a more complex pattern, primarily caused by a wider range of new online booking services. These services have made clear that hotel guests tend towards last-minute reservations [17]. Moreover, potential guests usually compare room prices for different hotels within the same online services in order to reduce costs. This novel behaviour has a significant impact on hotel pricing strategies, as well as on customers' choices, and it needs to be considered in the forecasting process [2]. For this reason, hotel managers need to forecast hotel arrivals more accurately in order to adequately estimate room charges. Accurate arrivals forecasting is necessary since hotel managers maximize profits by adjusting room prices according to the predicted demand. Nowadays, a popular trend in the hotel industry is to implement a data management system, called a revenue management (RM) system, to support decisions on room pricing policy. RM is a growing area within information technology development, which focuses on how companies should adjust prices to maximize their profitability. The usage of RM systems is increasing amongst hotel managers, who have a great interest in pricing and revenue optimization. It should be noted that a prediction model lies at the heart of the RM system, and the accuracy of this model is crucial to the forecast's success [8]. The availability of historical databases allows an RM system to be implemented using data management systems for improving hotel booking forecasting.

E. Corchado et al. (Eds.): HAIS 2012, Part I, LNCS 7208, pp. 79–90, 2012. © Springer-Verlag Berlin Heidelberg 2012

Only a few studies focus on hotel reservation forecasting analysis. Previously, this methodology has been successfully applied to solve similar problems in other fields, such as the airline industry [13]. Regarding hotel booking forecasting, Weatherford and Kimes [18] presented a detailed description of traditional methods for this problem in 2003. The first attempt at forecasting consisted of clustering raw data into disjoint customer groups under the assumption of mutual independence; a prediction model was then used to analyse each group. Other proposals for booking estimation employed the Holt-Winters method [15] and Monte Carlo simulations [21]. In addition, other studies in forecasting show that prediction accuracy can also be improved by dynamically updating the model as soon as new information is available [19]. In particular, Haensel et al. [8] adopted a method consisting of dimension reduction and a penalized least squares procedure for hotel booking prediction in order to reduce computation time. In this article we propose the use of a different methodology, based on soft computing (SC), instead of classical forecasting methods.
SC consists of the use of computational techniques and intelligent systems for solving inexact and complex problems [16]. It involves different computational techniques such as neural networks (NN), fuzzy sets and genetic algorithms (GAs). These approaches are stochastic and therefore well suited to many real-world problems [5]. In particular, GAs, a promising SC technique that has emerged in recent years, can provide efficient multivariable optimization compared with a classical exhaustive search. GAs are inspired by the laws of nature [3] and solve optimization problems through the principles of biological evolution [4], also known as survival of the fittest, sexual reproduction and mutation, among others [11]. Our method is mainly formed by an instance-based learning (IBL) algorithm, k-nearest neighbours (k-NN), whose parameter optimization and input selection are based on GAs. We studied the optimization of variable selection with GAs in combination with k-NN for hotel booking prediction over a complete year. To improve results, the k-NN algorithm uses a truncated mean in the regression analysis. k-NN is one of the simplest machine learning (ML) algorithms for regression: its core is assigning the truncated average of the values of its k nearest neighbours to each new


instance. Moreover, neighbours can be weighted so that the nearest elements contribute more to the weighted average. In our case, a specific prediction model for a hotel has been developed taking into account historical booking data and lists of local, regional and national festivities. We were able to work with real data from a Spanish hotel provided by Hoteloptimizer (http://www.hoteloptimizer.com), a new and dynamic company developing hotel RM systems. The experimental database was generated from the historical booking information of a Spanish hotel over a period of three years, taking into account relevant annual dates as well as other important information. Data were collected on a daily basis, providing useful information on characteristics such as day of the week, day of the month, season, and celebrations or festivals. Furthermore, experiments were conducted to evaluate the proposed model against four classical methods. Our results are presented as a demand calendar for six months.

2 Algorithms and Methods for Regression Tasks

2.1 Truncated Mean k-NN Method

k-NN is an IBL algorithm and for this reason it does not induce rules, decision trees, or other types of abstraction. k-NN incrementally derives its concept descriptions from a sequence of training instances, without training time [1]. Furthermore, this method has been described as very suitable for highly dimensional problems. The distance usually used as the similarity metric between objects is the Euclidean, although the method also permits others such as the Chebyshev, Manhattan and Mahalanobis distances [7]. In classification tasks, k-NN uses a majority vote among the classes of the k nearest objects and makes no assumptions on the distribution of the predicting variables during the learning process [6]. Similarly, in regression problems k-NN uses the weighted mean of the outcomes of the k nearest neighbours. In this work, a truncated mean is used to obtain robust estimations: the mean of the nearest values inside the inter-quantile range defined by q and 1 − q. The steps to predict a new instance using the proposed k-NN are:

1. Calculate the distance between the training samples and the new instance.
2. Sort the distances to find the k nearest neighbours and determine their outputs yk.
3. Compute the output of the new instance as the truncated mean of the k output values yk.

The main advantage of k-NN is its effectiveness when the training dataset is large and biased.
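As an illustration, the three steps above can be sketched in Python. This is a minimal sketch, not the authors' implementation; the function names and toy data are ours.

```python
import math

def truncated_mean(values, q):
    """Mean of the sorted values whose ranks fall inside the
    inter-quantile range defined by q and 1 - q."""
    vs = sorted(values)
    n = len(vs)
    lo, hi = int(math.floor(n * q)), int(math.ceil(n * (1 - q)))
    kept = vs[lo:hi] or vs  # assumption: keep everything if truncation empties the list
    return sum(kept) / len(kept)

def knn_predict(train_x, train_y, new_x, k, q):
    """Steps 1-3: distances, k nearest neighbours, truncated mean."""
    dist = [(math.dist(x, new_x), y) for x, y in zip(train_x, train_y)]
    dist.sort(key=lambda t: t[0])
    neighbours = [y for _, y in dist[:k]]
    return truncated_mean(neighbours, q)

# toy example: 1-D feature, with one outlier among the neighbours
X = [(0.0,), (0.1,), (0.2,), (0.3,), (0.9,)]
y = [10.0, 11.0, 12.0, 50.0, 99.0]
print(knn_predict(X, y, (0.15,), k=4, q=0.25))  # → 11.5
```

Note how the outlier 50.0, although among the 4 nearest neighbours, is discarded by the truncation, which is exactly the robustness argument made above.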

2.2 Other Machine Learning Algorithms

We collated several ML algorithms against which we compare our proposal:


[Figure 1: from the hotel booking dataset, the festivity matrices (local: N × mL, regional: N × mR, national: N × mS) are reduced by the binary selection arrays α1, α2, α3; together with the month, week of month, day of week and room booking (yn) arrays, they feed the weighted distance W(X) = [w1·d(X1,sel) + w2·d(X2,sel) + w3·d(X3,sel) + w4·d(X4) + w5·d(X5) + w6·d(X6)] / 6 and the truncated Q-quantile mean Q(yi, q) = ŷn]

Fig. 1. Scheme combining variable selection, k-NN and truncated Q-quantile mean

– M5P algorithm [14]. This algorithm uses a divide-and-conquer strategy to create a model tree in which each leaf is a linear regression model.
– Multilayer perceptron (MLP) [9]. The MLP model can be considered a highly adaptive nonlinear approximator. Its structure is an example of a feedforward neural network in which the activation of each hidden node (usually a sigmoid function) depends on a linear combination of its inputs. For numerical prediction, the output neurons are linear units. The number of hidden neurons is not fixed and defines the complexity of the model; by varying the number of hidden neurons or the input variables, the generalization performance of the MLP can be controlled.
– Linear regression (LINREG) [20]. Classical linear prediction model that uses a greedy method for variable selection based on the Akaike information criterion.
– Least median squared (LMSQ) linear regression [12]. Based on linear regression, the method first creates a set of least squares regression functions using random subsamples of the training data; the function with the lowest median squared error is then selected as the resulting model.
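As an illustration of the least median squared idea in the last bullet, here is a minimal pure-Python sketch; the function names, trial counts and toy data are our assumptions, not the implementation used in the paper.

```python
import random
import statistics

def fit_ls(points):
    """Ordinary least squares for y = a + b*x on a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b

def lmsq(points, n_trials=50, sample_size=5, rng=None):
    """Least median squared sketch: fit OLS on random subsamples and keep
    the fit whose median squared residual over ALL points is lowest."""
    rng = rng or random.Random(0)
    best, best_med = None, float("inf")
    for _ in range(n_trials):
        a, b = fit_ls(rng.sample(points, sample_size))
        med = statistics.median((y - (a + b * x)) ** 2 for x, y in points)
        if med < best_med:
            best, best_med = (a, b), med
    return best

# toy data: y = 2x + 1 plus one gross outlier that plain OLS would chase
pts = [(x, 2 * x + 1) for x in range(10)] + [(5, 100.0)]
a, b = lmsq(pts)
print(round(a, 2), round(b, 2))  # recovers the line despite the outlier
```

The design point is that the median of the squared residuals, unlike their mean, is unaffected by a single gross outlier, so a fit computed on a clean subsample wins the comparison.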

3 Optimizing k-NN Regression Model with GAs

We introduce a method that mainly utilizes the k-NN method for regression, preceded by variable selection, to forecast hotel guest bookings for each day. It is well known that k-NN itself is a good approximator for time series. In this article, the k nearest days are selected for each day in such a way that the bias is minimized.


As mentioned before, in order to generate an accurate output it is necessary to optimize the number of nearest neighbours k, select a subset of relevant variables and choose the distance metric d(p, q). In this article, an optimization method based on GAs is selected as an effective technique, due to its ability to define the best variables and their weight array wi. The method also provides an optimal k and q, which in turn improve accuracy. As a result (see Figure 1), the optimization scheme obtains the optimal values of the k-NN regressor through the following steps:

1. Normalize the variables between 0 and 1, so that Xi ≡ {xij,k} ∈ [0, 1], where j = 1, ..., mi (number of columns), k = 1, ..., N, and i = 1, ..., 6.
2. Select the most important variables:
(a) Matrix of local festivities in the city or in nearby cities, X1 (N × mL), where mL is the number of initial local festivities and N = 1095 days.
(b) Matrix of regional festivities (also in nearby regions), X2 (N × mR), where mR is the number of initial regional festivities.
(c) Matrix of national festivities (only in Spain), X3 (N × mS), where mS is the number of initial national festivities.
(d) Arrays of characteristics of each date in the calendar: month of the year X4 (N × 1), week of the month X5 (N × 1), and day of the week X6 (N × 1).
3. Apply the binary selection arrays, Xi,sel = Xi × (αi)t with i = 1, ..., 3, to reduce the festivity matrices selected in step 2.
4. Multiply all matrices and column arrays by their corresponding weights wi, i = 1, ..., 6.
5. Create a weighted Euclidean distance matrix W from the Euclidean distances d(p, q) between all dates:

W = [ Σi=1..3 wi · d(Xi,sel) + Σj=4..6 wj · d(Xj) ] / 6        (1)

6. Determine the similar days and average the results. The number of room bookings for the k nearest days is selected for each date from the Euclidean distance matrix. In order to obtain robust values, the average room booking is calculated as the truncated mean of the k values that fall within the range limited by the quantiles q and 1 − q.
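The weighted distance of Eq. (1) between two dates can be sketched as follows. This is illustrative only: the block names, the dictionary representation and the toy feature vectors are our assumptions; the vectors are taken to be already normalized and, for the festivity blocks, already reduced by the binary arrays αi.

```python
import math

def weighted_distance(day_a, day_b, weights):
    """Eq. (1): average of the six weighted block distances between two
    dates, each block compared with the Euclidean distance."""
    blocks = ["local", "regional", "national", "month", "week", "dayofweek"]
    total = sum(w * math.dist(day_a[b], day_b[b])  # Euclidean distance per block
                for w, b in zip(weights, blocks))
    return total / 6.0

day_a = {"local": [1, 0], "regional": [0], "national": [0],
         "month": [4 / 12], "week": [2 / 5], "dayofweek": [5 / 7]}
day_b = {"local": [0, 0], "regional": [0], "national": [1],
         "month": [4 / 12], "week": [3 / 5], "dayofweek": [6 / 7]}
print(weighted_distance(day_a, day_b, [1.0] * 6))
```

Computing this distance for every pair of dates fills the N × N matrix W from which the k nearest days are then read off.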

3.1 Evolutionary Optimization Process Using a GA

A GA was used to optimize the parameters of the proposed model and to select its input variables. Specifically, the parameters included in the genetic code are: the number of nearest neighbours k, the value of the quantile q, the weight coefficients wi, and the binary arrays for input selection αi, i = 1, ..., 3 (see Figure 2). Before beginning this process, a population of 50 individuals is initialized with random parameter values. For each individual, training

[Figure 2: chromosome layout — weights w1 to w6 (6 bits each), k neighbours (5 bits), q quantile (5 bits), and the selection binary arrays α1 (mL bits, local festivities), α2 (mR bits, regional festivities), α3 (mS bits, national festivities)]

Fig. 2. Binary-coded chromosome for optimization process

data is randomly selected from 70 % of the database. The other 30 % of the database (validation data) is used to calculate the validation error between predictions and real room reservations. The data showed a highly skewed dependent variable (Figure 3). Following Liu and Chawla [10], we defined a quadratic mean that measures the magnitude of varying quantities: the square root of the arithmetic mean of the averaged squares of each group of elements. In particular, we define the following quadratic mean validation error function, named RMSE10:

RMSE10 = sqrt{ [ (1/n1) Σi=1..n1 e1(i)² + (1/n2) Σi=1..n2 e2(i)² + ... + (1/n10) Σi=1..n10 e10(i)² ] / 10 }        (2)

where e1(i) is the error for the n1 normalized target values within the range [0, 0.1], e2(i) the error for the n2 target values within the range (0.1, 0.2], and so on. Then, for each day of the validation database, the k nearest days are selected using the training matrix. In each case, the process is repeated 10 times with different training and validation data. Finally, the fitness function J to minimize is:

min(J) = min[ (1/10) Σi=1..10 RMSE10i ]        (3)

where RMSE10i is the quadratic mean validation error of the i-th run. The individuals in generation 0 with the lowest fitness function are selected as parents for the next generation (generation 1), which is built as follows:

– 20 % comprises the best individuals from the previous generation. These are the parents for the next generation.
– 70 % comprises individuals obtained by crossover of the parents. The crossover process changes various digits in the chromosomes of the variables to be modified; these chromosomes are constructed from the digits of the variables, removing decimal points and creating a single bit string.
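A minimal sketch of RMSE10 as defined in Eq. (2), under our own reading of the formula; skipping empty bins (rather than dividing by a fixed 10) is an assumption not stated in the paper.

```python
import math

def rmse10(targets, predictions):
    """Eq. (2): squared errors are grouped into ten bins by the normalized
    target value ([0, 0.1], (0.1, 0.2], ...), each bin is averaged, and the
    square root of the mean of the bin averages is returned."""
    bins = [[] for _ in range(10)]
    for t, p in zip(targets, predictions):
        idx = min(math.ceil(t * 10) - 1, 9) if t > 0 else 0
        bins[idx].append((t - p) ** 2)
    used = [b for b in bins if b]  # assumption: empty bins are skipped
    return math.sqrt(sum(sum(b) / len(b) for b in used) / len(used))

# toy example: two small targets and two large ones
y_true = [0.05, 0.08, 0.95, 0.97]
y_pred = [0.05, 0.10, 0.90, 0.97]
print(rmse10(y_true, y_pred))  # ≈ 0.0269
```

Because each bin contributes equally regardless of its size, errors on rare high-demand days weigh as much as errors on the abundant low-demand days, which is the point of using this measure on skewed data.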

Fig. 3. Histogram of the target variable: room reservations demand



Fig. 4. Evolution of RMSE values for the 10 best individuals of generations 1 to 130: light grey boxes are validation RMSE values, dark grey boxes are testing RMSE values

– The remaining 10 % is obtained by mutation. This mechanism creates random chromosomes within the established ranges, with the aim of finding new solutions in unexplored regions of the search space.

This process is repeated over several generations until the fitness of the best individual remains constant or shows no significant variation from one generation to the next.
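The 20 %/70 %/10 % generation scheme can be sketched as follows. This is a sketch under our assumptions: a one-point crossover on a flat bit string and a toy fitness function, not the exact crossover operator described above.

```python
import random

def next_generation(population, fitness, rng):
    """Build the next generation: 20 % elite parents, 70 % one-point
    crossovers of two parents, 10 % fresh random chromosomes (mutation)."""
    size, n_bits = len(population), len(population[0])
    ranked = sorted(population, key=fitness)       # lower fitness J is better
    n_elite = max(1, size * 20 // 100)
    n_mut = max(1, size * 10 // 100)
    parents = ranked[:n_elite]
    children = []
    while len(children) < size - n_elite - n_mut:
        a, b = (rng.sample(parents, 2) if len(parents) > 1
                else (parents[0], parents[0]))
        cut = rng.randrange(1, n_bits)             # one-point crossover
        children.append(a[:cut] + b[cut:])
    mutants = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_mut)]
    return parents + children + mutants

rng = random.Random(0)
pop = [[rng.randint(0, 1) for _ in range(8)] for _ in range(10)]
new = next_generation(pop, fitness=sum, rng=rng)   # toy fitness: count of ones
print(len(new))  # → 10, the population size is preserved
```

Keeping the elite unchanged guarantees that the best fitness can never worsen between generations, which is why the curve in Figure 4 is monotone.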

4 Experiments and Results

The database was created from historical reservation data from a Spanish hotel. The first step was analysing significant variables (macro-economic, social patterns, meteorological state, annual festivities). These concepts were


Table 1. Validation errors: RMSE and MAE

Algorithm      RMSEmean  RMSEsd  MAEmean  MAEsd  time
M5P            0.191     0.002   0.142    0.001  43.15
IBk (k = 8)    0.202     0.001   0.152    0.001  0.022
LINREG         0.254     0.001   0.202    0.001  1.993
MLP (n = 15)   0.274     0.008   0.219    0.013  134.9
LMSQ           0.298     0.001   0.206    0.000  267.2

obtained from different sources, such as the National Institute of Statistics of Spain (INE), meteorological databases and other data sources. The next step involved trained experts selecting the most important variables. In this case study, 119 attributes were selected from the following sources: the historical room booking dataset, an annual parameters database and a set of indicators from one specific Spanish region. Moreover, several scatterplots were used to identify the correlation between these variables and hotel overnight attributes from different Spanish regions, in order to reduce the list of initially important variables according to the analysed correlation. After selecting the most significant variables, we searched for a methodology able to develop useful models that would allow us to produce a forecast calendar for a specific hotel. Finally, the prediction models were structured with 22 input attributes, which define the main characteristics of each day (month, day of the week, season, festivities in nearby regions or cities, festivities in Spain, etc.), and one output variable with the number of reservations for each day. All models were created and validated using historical reservation data from the years 2007 to 2009 inclusive as the training-validation dataset. Data before 2007 were not considered because of the different economic situation in Spain. The final testing dataset was created with the reservation data between January and July 2010.

4.1 Experiments Using Classical Machine Learning Techniques

The models listed in Section 2 were trained using 70 % of randomly sampled data, and the remaining 30 % were employed to validate each model. The results were measured over 10 training/validation runs. Table 1 displays the computation time and the mean and standard deviation (SD) of the validation root mean squared error (RMSE) and the validation mean absolute error (MAE). The summary of the validation errors is sorted by RMSEmean. As observed, the best algorithm in terms of RMSE was the M5P tree, with a validation RMSEmean of 19.1 % and an MAEmean of 14.2 %. By contrast,


Table 2. RMSE and MAE using data from January 2010 to July 2010

Algorithm     MAE    RMSE
k-NN + GA     0.100  0.157
M5P           0.131  0.164
IBk (k = 8)   0.141  0.180
IBk (k = 10)  0.143  0.180

Table 2 shows the results obtained in the testing process. The best algorithm was not the same as in the validation process: the model proposed in this work achieved the minimum error, with a testing RMSE of 15.7 % and an MAE of 10 %. These results are detailed below.

4.2 Experiments Using k-NN Method Optimized with GA

Taking the 300 generations calculated over 50 days, the minimum J achieved with the validation database was 15.20 and the minimum RMSE was 15.69. Figure 4 shows the evolution of J and of the testing RMSE for the 10 best individuals from generation 1 to 130. Figure 6 shows the evolution of the test RMSE of the best individuals with different fitness functions: MAE, RMSE, relative root squared error (RRSE) and the proposed quadratic mean error (RMSE10). RMSE10 clearly shows superior performance. Moreover, Figure 5 compares the predicted values with the real reservations for the test database (the first seven months of 2010). The plot shows a good trend, given its proximity to the diagonal line. The testing RMSE of the proposed model is 15.69, while that of the M5P model is 16.40.





Fig. 5. Ordered plot between real and predicted room booking: a) proposed k-NN model optimized with GA with variable selection (test RMSE = 15.69); b) Quinlan's M5P model (test RMSE = 16.40)

Fig. 6. Evolution of test errors using different measures as fitness function J (MAE, RMSE, RRSE and RMSE10)

[Figure 7: forecast calendar for April, May and June 2010; legend grey levels: 0–14.2 (0 %–20 %), 14.2–28.4 (20 %–40 %), 28.4–42.6 (40 %–60 %), 42.6–56.8 (60 %–80 %), 56.8–71 (80 %–100 %)]

Fig. 7. Calendar with room booking forecast for April, May and June of 2010. Colour in each box represents the booking prediction level for each day

Finally, Figure 7 presents a forecasting calendar for April, May and June of 2010, calculated using the database created from the years 2007 to 2009. The calendar gives the prediction for each day in the following manner: the first number is the day of the month, the number in parentheses is the room booking prediction, and the number in square brackets indicates the 95 % confidence interval. Moreover, the booking level (type of day) is distinguished by a grey scale, interpreted according to the figure legend.

5 Conclusions and Future Work

In this article, a methodology has been designed with a heuristic strategy based on GAs and the combination of three steps: input selection, identification of the similarity between booking days, and the use of a truncated mean over the identified


days. The results with the truncated mean were more robust, avoiding the influence of days with abnormal behaviour (e.g. congress meetings or weddings). Those days would look equal if we did not use historical booking information, yet they show very different room demand during the selected period. As noted, the global parameters have been optimized with a GA. The optimization process calculated 300 generations, but we want to emphasize that there was no significant improvement after generation 170. The process is time-consuming (calculating 300 generations took almost two months); however, after a small number of generations the error is almost stable. Taking into consideration the problems in booking forecasting, we should point out that the model is limited by the high randomness of the problem. Nevertheless, the results with the proposed model are considered to be above expectations.

In conclusion, future research could extend this work in several directions. One addition to consider is the creation of short-term booking prediction models. The main objective would be to predict prices in the short term (weeks or days) by means of actual booking curves and prices. This course of action would also exploit the type of day derived from our significant-variable selection in the long-term calendar.

Acknowledgments. We would like to convey our gratitude to José Ignacio Pérez Moneo for his support and for access to tools like RMS (http://hoteloptimizer.com/). We would also like to thank the Autonomous Government of La Rioja for its continuous encouragement through the "Tercer Plan Riojano de Investigación y Desarrollo de la Rioja" under project FOMENTA 2010/13, and the University of La Rioja and Santander Bank for project API11/13.

References

1. Aha, D.W., Kibler, D.: Instance-based learning algorithms. Machine Learning, 37–66 (1991)
2. Cantoni, L., Fans, M., Inversini, A., Passini, V.: Hotel websites and booking engines: A challenging relationship. In: Law, R., Fuchs, M., Ricci, F. (eds.) Information and Communication Technologies in Tourism 2011 (Proceedings of the Int. Conf. in Innsbruck, Austria), pp. 241–252. Springer, Heidelberg (2011)
3. Corchado, E., Abraham, A., Carvalho, A.: Hybrid intelligent algorithms and applications. Information Sciences 180(14), 2633–2634 (2010)
4. Corchado, E., Graña, M., Wozniak, M.: New trends and applications on hybrid artificial intelligence systems. Neurocomputing 75(1), 61–63 (2012)
5. Corchado, E., Herrero, A.: Neural visualization of network traffic data for intrusion detection. Applied Soft Computing (2010)
6. Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13, 21–27 (1967)
7. Deza, E., Deza, M.M.: Encyclopedia of Distances. Springer, Heidelberg (2009)
8. Haensel, A., Koole, G.: Booking horizon forecasting with dynamic updating: A case study of hotel reservation data. International Journal of Forecasting 27, 942–960 (2011)
9. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall (1999)
10. Liu, W., Chawla, S.: A quadratic mean based supervised learning model for managing data skewness. In: SDM, pp. 188–198. SIAM/Omnipress (2011)
11. Mitchell, M.: An Introduction to Genetic Algorithms. The MIT Press (1998)
12. Portnoy, S., Koenker, R.: The Gaussian hare and the Laplacian tortoise: Computability of squared-error versus absolute-error estimators. Statistical Science 12, 279–296 (1997)
13. Pölt, S.: Forecasting is difficult - especially if it refers to the future. In: AGIFORS - Reservations and Yield Management Study Group Meeting Proceedings (1998)
14. Quinlan, J.R.: Learning with continuous classes. In: 5th Australian Joint Conference on Artificial Intelligence, pp. 343–348 (1992)
15. Rajopadhye, M., Ben-Ghalia, M., Wang, P.P., Baker, T., Eister, C.V.: Forecasting uncertain hotel room demand. Information Science 132, 1–11 (2001)
16. Sedano, J., Curiel, L., Corchado, E., de la Cal, E., Villar, J.: A soft computing method for detecting lifetime building thermal insulation failures. Integrated Computer-Aided Engineering 17(2), 103–115 (2010)
17. Sparks, B.A., Browning, V.: The impact of online reviews on hotel booking intentions and perception of trust. Tourism Management (2011)
18. Weatherford, L.R., Kimes, S.E.: A comparison of forecasting methods for hotel revenue management. International Journal of Forecasting 19, 401–415 (2003)
19. Weinberg, J., Brown, L.D., Stroud, J.R.: Bayesian forecasting of an inhomogeneous Poisson process with applications to call center data. Journal of the American Statistical Association 102(480), 1185–1198 (2007)
20. Wilkinson, G.N., Rogers, C.E.: Symbolic description of factorial models for analysis of variance. Journal of the Royal Statistical Society, Series C (Applied Statistics) 22, 392–399 (1973)
21. Zakhary, A., Atiya, A.F., El-Shishiny, H., Gayar, N.E.: Forecasting hotel arrivals and occupancy using Monte Carlo simulation. Journal of Revenue and Pricing Management 42, 1–11 (2009)