Business Intelligence for Strategic Marketing: Predictive Modelling of Customer Behaviour Using Fuzzy Logic and Evolutionary Algorithms

Andrea G.B. Tettamanzi¹, Maria Carlesi², Lucia Pannese², and Mauro Santalmasi²

¹ Università degli Studi di Milano, Dipartimento di Tecnologie dell'Informazione, via Bramante 65, I-26013 Crema, Italy
² imaginary s.r.l., c/o Acceleratore d'Impresa del Politecnico di Milano, Via Garofalo 39, I-20133 Milan, Italy

Abstract. This paper describes an application of evolutionary algorithms to the predictive modelling of customer behaviour in a business environment. Predictive models are represented as fuzzy rule bases, which allows for intuitive human interpretability of the results obtained, while providing satisfactory accuracy. An empirical case study is presented to show the effectiveness of the approach.

Keywords: Business Intelligence, Data Mining, Modelling, Strategic Marketing, Forecast, Evolutionary Algorithms.

1 Introduction

Companies face everyday problems related to uncertainty in organizational planning: accurate and timely knowledge means improved business performance. In this framework, business intelligence applications are instruments for improving the decision-making process within the company, by achieving a deeper understanding of market dynamics and customers' behaviour. In business and finance in particular, executives can improve their insight into market scenarios by foreseeing customers' behaviour. This information allows them to maximize revenues and manage costs by increasing the effectiveness and efficiency of all the strategies and processes that involve customers. Predictions about customers' intentions to purchase a product, their loyalty rating, and the gross operating margins or turnover they will generate are fundamental for two reasons. First, they are instrumental to effective planning of production volumes and specific promotional activities. Second, comparing projections with actual results makes it possible to spot meaningful indicators that are useful for improving performance.

M. Giacobini et al. (Eds.): EvoWorkshops 2007, LNCS 4448, pp. 233–240, 2007. © Springer-Verlag Berlin Heidelberg 2007

A versatile solution for business intelligence, called iCLIP (imaginary’s Client Profiler), has been developed by imaginary, a Milan-based company specialising in knowledge management. This paper describes the fuzzy-evolutionary approach to data mining used by iCLIP, in particular the MOLE engine, and presents the results of a case study on customer turnover prediction.

2 The Context

Traditional methods of customer analysis, such as segmentation and market research, provide static knowledge about customers, which may become unreliable over time. A competitive advantage can be gained by adopting a data-mining approach in which predictive models of customer behaviour are learned from historical data. Such knowledge is more fine-grained, in that it allows one to reason about an individual customer rather than a segment; furthermore, by re-running the learning algorithm as newer data become available, such an approach can maintain a continuously updated picture of the current situation and thus provide dynamic knowledge about customers. iCLIP uses evolutionary algorithms (EAs) for model learning and expresses models as fuzzy rule bases. EAs are known to be well suited to tracking optima in dynamic optimization problems [5]. Fuzzy rule bases have the desirable characteristic of being intelligible, as they are expressed in a language typically used by human experts to express their knowledge.

3 The Approach

In the area of business intelligence, data mining is a process aimed at discovering meaningful correlations, patterns, and trends in large amounts of data collected in a dataset. Once an objective of strategic marketing has been established, the system needs a large dataset, including as much data as possible, not only to describe customers but also to characterize their behaviour and trace their actions. The model is determined by observing the past behaviour of customers and extracting the relevant variables and the correlations between the data and the rating (the dependent variable); it then provides the company with projections based on the characteristics of each customer. A good knowledge of customers is the key to a successful marketing strategy. The tool is based on EAs that recognise patterns within the dataset by learning classifiers represented as sets of fuzzy rules.

3.1 Fuzzy Classifiers

Each classifier is described by a set of fuzzy rules. A rule is made of one or more antecedent clauses ("IF …") and a consequent clause ("THEN …"). Each clause is represented by a pair of indices referring, respectively, to a variable and to one of its fuzzy sub-domains, i.e., a membership function.
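To make this representation concrete, the following is a minimal Python sketch that encodes clauses as (variable index, membership-function index) pairs and evaluates a rule base on one record. Beyond what the text states, it assumes that antecedents are combined with the minimum operator (AND), that input membership functions are trapezoids given by four breakpoints (MOLE's input functions are trapezoidal, as Section 3.2 notes), and that the crisp output is the firing-strength-weighted average of the centres of the output membership functions. The names `Clause`, `Rule`, `trapezoid`, `firing_strength` and `predict` are hypothetical and not part of iCLIP or MOLE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """A clause as a pair of indices (hypothetical names)."""
    var: int  # index of the variable
    mf: int   # index of one of that variable's membership functions


@dataclass
class Rule:
    antecedents: list   # Clauses joined by AND ("IF ...")
    consequent: Clause  # clause on the output variable ("THEN ...")


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear in between.
    Assumes a < b <= c < d; a triangle is the special case b == c."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def firing_strength(rule, input_mfs, x):
    """Degree to which record x satisfies all antecedents (min as AND, an assumption)."""
    return min(trapezoid(x[c.var], *input_mfs[c.var][c.mf]) for c in rule.antecedents)


def predict(rules, input_mfs, output_centres, x):
    """Weighted average of output-MF centres (assumed defuzzification); None if no rule fires."""
    num = den = 0.0
    for rule in rules:
        w = firing_strength(rule, input_mfs, x)
        num += w * output_centres[rule.consequent.mf]
        den += w
    return num / den if den > 0 else None


if __name__ == "__main__":
    # One input variable (index 0) with two fuzzy sub-domains, "low" and "high".
    input_mfs = {0: [(-1, 0, 10, 50), (10, 50, 100, 101)]}
    output_centres = [0.0, 1.0]  # centres of the two output membership functions
    rules = [
        Rule([Clause(0, 0)], Clause(0, 0)),  # IF x0 is low  THEN y is mf 0
        Rule([Clause(0, 1)], Clause(0, 1)),  # IF x0 is high THEN y is mf 1
    ]
    for value in (5.0, 30.0, 80.0):
        # prints "5.0 0.0", "30.0 0.5", "80.0 1.0"
        print(value, predict(rules, input_mfs, output_centres, [value]))
```

Between the cores of the two rules the prediction moves gradually from 0 to 1 (0.5 at x0 = 30), illustrating the interpolation between rules that a partition based on a crisp threshold would not provide.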

Using fuzzy rules makes it possible to obtain homogeneous predictions for different clusters without imposing a traditional partition based on crisp thresholds, which often do not fit the data, particularly in business applications. Fuzzy decision rules are useful for approximating non-linear functions because they have good interpolative power and are, at the same time, intuitive and easily intelligible. These characteristics allow the model to represent reality effectively while avoiding the "black-box" effect of, e.g., neural networks. The output of the iCLIP application is a set of rules written as plain IF-THEN sentences. The intelligibility of the model and the high explanatory power of the obtained rules are useful for the firm: the rules are easy to interpret and explain, so that an expert in the firm can read and understand them clearly. Ease of understanding is a fundamental characteristic of a forecasting method, since otherwise managers are reluctant to use forecasts [1]. Moreover, the proposed approach provides managers with information that is more transparent for the stakeholders and can easily be shared with them.

3.2 The Evolutionary Algorithm

EAs are a broad class of stochastic optimization algorithms inspired by biology, and in particular by those biological processes that allow populations of organisms to adapt to their surrounding environment: genetic inheritance and survival of the fittest. Recent reference texts and surveys in the field of EAs include [7,6,3,2]. The iCLIP system incorporates an EA for the design and optimization of fuzzy rule bases that was originally developed to learn fuzzy controllers automatically [9,8], was then adapted for data mining [4], and is at the basis of MOLE, a general-purpose distributed engine for modelling and data mining based on EAs and fuzzy logic.

A MOLE classifier is a rule base of up to 256 rules, each comprising up to four antecedent clauses and one consequent clause. Up to 256 input variables and one output variable can be handled, each described by up to 16 distinct membership functions. Membership functions for input variables are trapezoidal, while membership functions for the output variable are triangular.

An island-based distributed EA is used to evolve classifiers. The sequential algorithm executed on every island is a standard elitist EA with generational replacement. Crossover and mutation are never applied to the best individual in the population.

Genetic Operators. The recombination operator is designed to preserve the syntactic legality of classifiers. A new classifier is obtained by combining pieces of two parent classifiers. Each rule of the offspring classifier can be inherited from either parent with probability 1/2. When inherited, a rule takes with it into the offspring classifier all the domains it refers to, together with their membership functions. Other domains may also be inherited from the parents, even if they are not used in the rule set of the child classifier, so that the size of the offspring is roughly the average of its parents' sizes.
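A minimal Python sketch of the rule-level recombination just described follows. A classifier is reduced to a list of rules, and rules are paired by position: where both parents have a rule, the child takes one of the two with probability 1/2; where only the longer parent has one, the child keeps it with probability 1/2, so the expected offspring size is the average of the parents' sizes. The positional pairing is an assumption, and the transfer of the referenced domains and membership functions is deliberately left out (the sketch assumes both parents share the same domain definitions). The function name `crossover` is hypothetical.

```python
import random


def crossover(parent_a, parent_b, rng=random):
    """Rule-level uniform crossover (sketch): each child rule comes from either parent with probability 1/2."""
    child = []
    for i in range(max(len(parent_a), len(parent_b))):
        has_a = i < len(parent_a)
        has_b = i < len(parent_b)
        if has_a and has_b:
            # Both parents have a rule at this position: pick one of the two.
            child.append(parent_a[i] if rng.random() < 0.5 else parent_b[i])
        elif rng.random() < 0.5:
            # Only the longer parent has a rule here: keep it half of the time,
            # so the expected child size is the mean of the parents' sizes.
            child.append(parent_a[i] if has_a else parent_b[i])
    return child


if __name__ == "__main__":
    rng = random.Random(42)
    a = ["a0", "a1", "a2", "a3"]
    b = ["b0", "b1"]
    print(crossover(a, b, rng))
    sizes = [len(crossover(a, b, rng)) for _ in range(10_000)]
    print(sum(sizes) / len(sizes))  # close to (4 + 2) / 2 = 3
```

Because the child never has more rules than its longer parent, it stays within MOLE's limit of 256 rules whenever both parents do.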

Like recombination, mutation produces only legal models, by applying small changes to the various syntactic parts of a fuzzy rule base. Migration is responsible for the diffusion of genetic material between populations residing on different islands. At each generation, with a small probability (the migration rate), a copy of the best individual of an island is sent to all connected islands, and as many of the worst individuals as there are connected islands are replaced with an equal number of immigrants.

3.3 Fitness

Modelling can be thought of as an optimization problem, where we wish to find the model $M^*$ that maximizes some criterion measuring its accuracy in predicting $y_i = x_{im}$ for all records $i = 1, \dots, N$ in the training dataset. The most natural criteria for measuring model accuracy are the mean absolute error and the mean square error. One big problem with using such criteria is that the dataset must be balanced, i.e., an equal number of representatives of each possible value of the predictive attribute $y_i$ must be present; otherwise the underrepresented classes will end up being modelled with lower accuracy. In other words, the optimal model would be very good at predicting representatives of highly represented classes and quite poor at predicting individuals from other classes. To solve this problem, MOLE divides the range $[y_{\min}, y_{\max}]$ of the predictive variable into 256 bins. The $b$th bin, $X_b$, contains all the indices $i$ such that

$$1 + \left\lfloor 255 \, \frac{y_i - y_{\min}}{y_{\max} - y_{\min}} \right\rfloor = b. \tag{1}$$

For each bin $b = 1, \dots, 256$, it computes the mean absolute error for that bin,

$$\mathrm{err}_b(M) = \frac{1}{|X_b|} \sum_{i \in X_b} \left| y_i - M(x_{i1}, \dots, x_{i,m-1}) \right|, \tag{2}$$

then the total absolute error as an integral of the histogram of the absolute errors over all the bins, $\mathrm{tae}(M) = \sum_{b : X_b \neq \emptyset} \mathrm{err}_b(M)$. The mean absolute error of every bin in this summation counts the same no matter how many records in the dataset belong to that bin. In other words, the level of representation of each bin (which, roughly speaking, corresponds to a class) has been factored out by the calculation of $\mathrm{err}_b(M)$. What we want from a model is that it be accurate in predicting all classes, independently of their cardinality. The fitness used by the EA is given by

$$f(M) = \frac{1}{\mathrm{tae}(M) + 1},$$

so that a greater fitness corresponds to a more accurate model.

3.4 Selection and Overfitting Control

In order to avoid overfitting, the following mechanism is applied: the dataset is split into two subsets, namely the training set and the test set. The training set is used to compute the fitness considered by selection, whereas the test set is used to compute a test fitness. For each island, the best model so far, $M^*$, is stored aside; at every generation, the best individual with respect to fitness is obtained, $M_{\text{best}} = \arg\max_i f(M_i)$. The test fitness of $M_{\text{best}}$, $f_{\text{test}}(M_{\text{best}})$, is computed and, together with $f(M_{\text{best}})$, used to determine an optimistic and a pessimistic estimate of the real quality of a model: for every model $M$, $f_{\text{opt}}(M) = \max\{f(M), f_{\text{test}}(M)\}$ and $f_{\text{pess}}(M) = \min\{f(M), f_{\text{test}}(M)\}$. Then $M_{\text{best}}$ replaces $M^*$ if and only if $f_{\text{pess}}(M_{\text{best}}) > f_{\text{pess}}(M^*)$ or, in case $f_{\text{pess}}(M_{\text{best}}) = f_{\text{pess}}(M^*)$, if $f_{\text{opt}}(M_{\text{best}}) > f_{\text{opt}}(M^*)$. Elitist linear ranking selection, with an adjustable selection pressure, is responsible for improvements from one generation to the next. Overall, the algorithm is elitist, in the sense that the best individual in the population is always passed on unchanged to the next generation, without undergoing crossover or mutation.
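The binned fitness of Section 3.3 and the selection and archive rules described above can be sketched in Python as follows. `fitness` takes the target values $y_i$ and a model's predictions and returns $f(M)$ and $\mathrm{tae}(M)$ following equations (1) and (2); `should_replace` implements the pessimistic/optimistic comparison between $M_{\text{best}}$ and $M^*$; `linear_ranking_probs` and `select` use the common linear-ranking formula with selection pressure $s \in [1, 2]$ (the best of $n$ individuals is drawn with probability $s/n$, the worst with $(2-s)/n$), which is an assumption, since the paper does not give MOLE's exact parameterization. All function names are hypothetical, and the sketch assumes $y_{\max} > y_{\min}$.

```python
import math
import random
from collections import defaultdict

N_BINS = 256  # MOLE divides [y_min, y_max] into 256 bins


def bin_index(y, y_min, y_max):
    """Eq. (1): b = 1 + floor(255 * (y - y_min) / (y_max - y_min)), so b is in 1..256."""
    return 1 + math.floor((N_BINS - 1) * (y - y_min) / (y_max - y_min))


def fitness(ys, preds):
    """Return (f, tae): tae sums err_b over non-empty bins and f = 1 / (tae + 1)."""
    y_min, y_max = min(ys), max(ys)  # assumes y_max > y_min
    abs_errors = defaultdict(list)
    for y, p in zip(ys, preds):
        abs_errors[bin_index(y, y_min, y_max)].append(abs(y - p))
    # err_b (Eq. 2): mean absolute error within bin b, so every non-empty bin
    # counts equally regardless of how many records it holds.
    tae = sum(sum(errs) / len(errs) for errs in abs_errors.values())
    return 1.0 / (tae + 1.0), tae


def should_replace(f_new, ftest_new, f_star, ftest_star):
    """Archive update: compare pessimistic estimates first, optimistic ones on ties."""
    pess_new, opt_new = min(f_new, ftest_new), max(f_new, ftest_new)
    pess_star, opt_star = min(f_star, ftest_star), max(f_star, ftest_star)
    if pess_new > pess_star:
        return True
    return pess_new == pess_star and opt_new > opt_star


def linear_ranking_probs(n, s):
    """Linear ranking: rank 0 is the worst, rank n-1 the best; 1 <= s <= 2."""
    if n == 1:
        return [1.0]
    return [(2 - s) / n + 2 * i * (s - 1) / (n * (n - 1)) for i in range(n)]


def select(population, fitnesses, s, k, rng=random):
    """Draw k parents with probabilities given by their fitness rank."""
    order = sorted(range(len(population)), key=lambda i: fitnesses[i])
    ranked = [population[i] for i in order]
    return rng.choices(ranked, weights=linear_ranking_probs(len(population), s), k=k)


if __name__ == "__main__":
    # Nine well-predicted records of a common class and one badly predicted rare record.
    ys = [0.0] * 9 + [10.0]
    preds = [0.0] * 10
    f, tae = fitness(ys, preds)
    print(tae, f)  # plain MAE would be 1.0; tae is 10.0 because the rare bin counts fully
    print(should_replace(0.5, 0.4, 0.45, 0.3))  # True: pessimistic 0.4 > 0.3
    print(linear_ranking_probs(4, 1.5))
```

The example shows the point of binning: a single mispredicted record of a rare class dominates $\mathrm{tae}$, whereas it would contribute only a tenth of its error to the plain mean absolute error.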

4 A Case Study on Customer Turnover Modelling

The iCLIP system has been applied to the predictive modelling of the turnover generated by customers of a medium-sized Italian manufacturing corporation operating in the field of wood and wood products. A pilot test was performed to demonstrate the feasibility of an innovative approach to modelling customers by turnover segment. In order to reduce time and costs, the traditional statistical analysis of the data was skipped. Classifying customers into turnover segments is useful not only for planning activities and anticipating the returns for the next period, but also for identifying the characteristics that describe different patterns of customers, recognising strategic and occasional customers, targeting commercial and marketing activities, and so on. iCLIP was used to develop a predictive model forecasting customers' turnover segments for a quarter, using historical data from the year preceding the analysis. Customers were classified into three quarterly turnover segments:
1st segment: turnover >50,000 euro/quarter;
2nd segment: turnover between 10,000 and 50,000 euro/quarter;
3rd segment: turnover <10,000 euro/quarter.
[…]
1 (>50,000 EUR/Q) 0.71 0.5
2 (10,000–50,000 EUR/Q) 0.54 0.5
3 (…