An adaptive ordered fuzzy time series with application to FOREX

Expert Systems with Applications 38 (2011) 475–485


Majid Bahrepour a,*, Mohammad-R. Akbarzadeh-T. b, Mahdi Yaghoobi a, Mohammad-B. Naghibi-S. c

a Islamic Azad University, Mashhad Branch, Iran
b Center for Applied Research on Intelligent Systems and Soft Computing, Departments of Electrical Engineering and Computer Engineering, Ferdowsi University of Mashhad, Iran
c Department of Electrical Engineering, Ferdowsi University of Mashhad, Iran


Keywords: Fuzzy time series; Adaptive order selection; Self-organising maps; FOREX; Prediction

Abstract

An adaptive ordered fuzzy time series (FTS) is proposed that employs an adaptive order selection algorithm for composing the rule structure and partitions the universe of discourse into unequal intervals based on a fast self-organising strategy. The automatic order selection of the FTS, as well as the adaptive partitioning of each interval in the universe of discourse, is shown to greatly affect forecasting accuracy. This strategy is then applied to prediction of the FOREX market. Financial markets, such as FOREX, are generally attractive applications of FTS because their underlying model is poorly understood and because of the great deal of uncertainty in quote fluctuations and in the behaviour of the humans in the loop. Specifically, since the FOREX market can exhibit different behaviours at different times, the adaptive order selection is executed online to find the best order of the FTS for the current prediction. The order selection module uses voting, statistical analytic and emotional decision making agents. Comparison of the proposed method with earlier studies demonstrates improved prediction accuracy at similar computation cost. © 2010 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +31 53 489 3765; fax: +31 53 489 4590. E-mail addresses: [email protected] (M. Bahrepour), Akbarzadeh@ieee.org (Mohammad-R. Akbarzadeh-T.). 0957-4174/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2010.06.087

1. Introduction

Forecasting time series data from a time-dependent sequence of continuous values is important in a wide array of applications such as monitoring air pollution in the environment, estimating blood pressure, and predicting market trends in both stock and foreign exchange markets (Li & Cheng, 2007). In 1993, Song and Chissom proposed a new concept of time series data prediction, namely Fuzzy Time Series (FTS), which uses the notion of fuzzy sets and approximate reasoning (Song & Chissom, 1993a, 1993b, 1994). They studied the problem of forecasting fuzzy time series by using the enrolment data of the University of Alabama and proposed a forecasting model that is mainly composed of five steps: (1) partitioning the universe of discourse into equal intervals, (2) defining fuzzy sets on the universe of discourse and fuzzifying the time series accordingly, (3) mining the fuzzy logical relationships that exist in the fuzzified time series, (4) forecasting and then (5) defuzzifying the forecasted output. Song and Chissom showed these steps to reduce the time complexity of FTS in comparison with the previous studies.

Since the contribution of Song and Chissom, a number of other studies have been presented to either reduce computational overhead or increase forecasting accuracy. For example, to reduce the computational overhead produced in deriving the fuzzy relationship in Song and Chissom's model, Sullivan and Woodall proposed a 'Markov-based model' (Sullivan & Woodall, 1994) using a conventional matrix multiplication. Also in 1994, Song and Chissom applied a first-order time-variant strategy for forecasting enrolment and discussed the differences between time-variant and time-invariant models (Song & Chissom, 1994). To improve forecasting accuracy, Chen presented an efficient forecasting procedure for prediction of enrolments in the University of Alabama using simplified arithmetic operations (Chen, 1996) that reduced the complex arithmetic operations to some essential operations. Huarng proposed heuristic models by integrating problem-specific heuristic knowledge with Chen's model to reduce forecasting error (Huarng, 2001). Chen in his later work proposed a high-order fuzzy time series in which more than one previous step is given as input to the FTS for prediction (Chen, 2002). His work was compared with the previous studies that used only one previous step to provide the prediction. The high-order FTS revealed that prediction accuracy is significantly increased by using a higher order of inputs (more than one previous step as the input of the FTS). Yu proposed a weighted averaging operator to record occurrences of each fuzzy relation and applied a weighting factor for the defuzzification (Yu, 2005). Li and Cheng proposed deterministic automatons to deal with the uncertainties in the defuzzifying and partitioning phases (Li & Cheng, 2007). Bahrepour et al. modified Yu's weighting model by partitioning the universe of discourse unequally by using a genetic algorithm (Bahrepour et al., 2008). In their study, a genetic algorithm was employed to find the optimal length of each partition along with the weighted averaging technique. Similar studies were also reported in Bahrepour et al. (2008) and Chen and Chung (2006). However, a GA-based approach is inherently slow, and hence not applicable to fast decision making under rapid changes in the behaviour of markets such as FOREX. More specifically, when a GA is used to partition the universe of discourse unequally, the time complexity is $O_{GeneticAlgorithm} = O(G \times I \times q^2)$, where G is the number of generations, I is the number of individuals, and q is the number of training data used in computing the fitness function. This time complexity is further discussed in Section 4.4.

In this paper, a novel approach to high-order fuzzy time series is presented. The proposed model differs from the previous studies in two facets. The first is the use of a self-organising map (SOM) to partition the universe of discourse unequally. SOM is chosen here for its fast clustering function, particularly when compared with a genetic algorithm (GA). Using SOM and the Kohonen training algorithm, the time complexity of partitioning the universe of discourse into n unequal intervals is reduced to $O_{SOM} = O(k \times n)$, where k is the number of training epochs and n is the number of intervals. The training algorithm is described in Section 3 and the time complexity is proved in Section 4.4. The second facet of the presented approach is an adaptive order selection that finds the best estimated order by incorporating three different agents: a voting agent, a statistical agent and an emotional agent. These agents work sequentially to find the best order of the high-order fuzzy time series. The voting agent tries to reach a consensus between different solutions. This consensus should be computationally inexpensive, yet efficient. The statistical analyst agent analyses the data for inconsistencies and removes the inconsistent solutions.
The emotional decision making agent provides an emotional signal when none of the previous agents can obtain a reliable solution. This signal is similar to a "gut feeling", which is adapted from the Somatic Marker Hypothesis (SMH) (Bechara & Damasio, 2005). The rest of this paper is organised as follows: Sections 2 and 3 review the basics of fuzzy time series and self-organising maps, respectively. The proposed method is explained in Section 4. Section 5 reports application of the proposed technique to the FOREX market and provides empirical analysis. Finally, conclusions are made in Section 6.

2. Fuzzy time series basics

Several basic definitions and principles of FTS are reviewed here. Let $U(t) \subset \mathbb{R}$ ($t = \ldots, 0, 1, 2, \ldots$) be the universe of discourse on which fuzzy values (sets) $f_i(t)$ ($i = 1, 2, \ldots$) are defined, and let $F(t)$ be a sequence of $f_i(t)$. Then $F(t)$ is called a fuzzy time series on $U(t)$ ($t = \ldots, 0, 1, 2, \ldots$). Let $F(t)$ and $F(t-1)$ be fuzzy time series on $U(t)$ and $U(t-1)$ ($t = \ldots, 0, 1, 2, \ldots$). If for any $f_j(t) \in F(t)$ ($j = 1, 2, \ldots$) there exists an $f_i(t-1) \in F(t-1)$ ($i = 1, 2, \ldots$) such that there is a first-order fuzzy relation $R_{ij}(t, t-1)$ and $f_j(t) = f_i(t-1) \circ R_{ij}(t, t-1)$, then $F(t)$ is said to be caused by $F(t-1)$. This can be denoted as $f_i(t-1) \to f_j(t)$ or equivalently $F(t-1) \to F(t)$. Song and Chissom derived the first-order model based on the first-order relation and extended it to an mth-order model (Song, 1993).

Definition 1. Suppose $F(t)$ is caused by $F(t-1)$ or $F(t-2)$ or ... or $F(t-m)$ ($m > 0$) only. The relation between $F(t)$ and its cause can then be expressed by the following fuzzy relational equation:

$$F(t) = F(t-1) \circ R(t, t-1) \ \text{or} \ F(t) = F(t-2) \circ R(t, t-2) \ \text{or} \ \ldots \ \text{or} \ F(t) = F(t-m) \circ R(t, t-m)$$

or alternatively

$$F(t) = (F(t-1) \cup F(t-2) \cup \cdots \cup F(t-m)) \circ R(t, t-m) \qquad (1)$$

where '$\cup$' is the union operator and '$\circ$' is the composition operator. $R(t, t-m)$ is a relation matrix that describes the fuzzy relationship between $F(t-m)$ and $F(t)$. Eq. (1) is called the first-order model of $F(t)$. From Definition 1, we note that (Chen, 2002):

1. $F(t)$ is a function of time.
2. $F(t)$ is a linguistic function, i.e. a function whose values are linguistic values represented by fuzzy sets.
3. $f_i(t)$ ($i = 1, 2, \ldots$) are possible linguistic values (fuzzy sets) that belong to $F(t)$, where $F(t)$ is a series of $f_i(t)$ ($i = 1, 2, \ldots$).

Definition 2. Suppose that $F(t)$ is caused by $F(t-1)$, $F(t-2)$, ..., and $F(t-m)$ ($m > 0$) simultaneously. This relation can be expressed by the following fuzzy relational equation:

$$F(t) = (F(t-1), F(t-2), \ldots, F(t-m)) \circ R_a(t, t-m) \qquad (2)$$

Eq. (2) is called the mth-order model of $F(t)$, and $R_a(t, t-m)$ is a relation matrix that describes the fuzzy relationship between $F(t-1)$, $F(t-2)$, ..., $F(t-m)$ and $F(t)$ (Chen, 2002). In short, Eq. (2) means that more than one input, in composition with a relational matrix, can produce the predicted result. Based upon the above preliminaries, the proposed approach (similar to most other approaches on fuzzy time series) is presented in Section 4.

3. An introduction to self-organising maps

SOMs consist of components called nodes that are centres of clusters in clustering application of SOMs . . . (Software, 2004). In our study, a simple self-organising map is used to bundle historical data into clusters. These clusters are then used for partitioning the universe of discourse unequally. The Kohonen training algorithm, which is widely used for this network, proceeds as follows:

(1) Initialise the centre of each cluster, $C_i$ ($i = 1, 2, \ldots, n$), randomly.
(2) Grab an input vector.
(3) Traverse each cluster centre:
   a. Use a similarity measure to find the distance between each cluster centre $C_i$ ($i = 1, 2, \ldots, n$) and the input data vector $D_j$. Euclidean distance is a common measure of similarity; it is the one used in this paper and is calculated as below:

   $$d(C_i, D_j) = \sqrt{\sum_{l=1}^{m} (C_{il} - D_{jl})^2}$$

   where $C_{il}$ and $D_{jl}$ are the lth elements of the two vectors $C_i$ and $D_j$, and m is the vector dimension.
   b. Find the cluster centre $C^*$ that produces the smallest distance to the input vector.
(4) Update the neighbours of the cluster centre $C^*$, $C_v$ ($v = 1, 2, \ldots, k$), where $C_v$ is a neighbour of $C^*$ and k is the number of neighbours of $C^*$. This update is performed by pulling $C_v$ closer to the input data vector $D(t)$ using the formula below:

   $$C_v(t+1) = C_v(t) + H(t)\,a(t)\,(D(t) - C_v(t))$$

   where t is the current iteration, $H(t)$ is the restraint due to distance from $C^*$, and $a(t)$ is the learning restraint due to time.
(5) Increment t and repeat while $t < T$, where T is the limit on time iterations.
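The training steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `train_som`, the linear decay of the learning rate, the Gaussian neighbourhood function and all default parameter values are assumptions, since the text does not specify the forms of H(t) and a(t).

```python
import numpy as np

def train_som(data, n_clusters, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Train a one-dimensional Kohonen map and return the cluster centres.

    lr0 and radius0 are assumed starting values for a(t) and the
    neighbourhood width; both decay linearly with the iteration count.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        data = data[:, None]
    # (1) Initialise the centres randomly inside the range of the data.
    lo, hi = data.min(axis=0), data.max(axis=0)
    centres = rng.uniform(lo, hi, size=(n_clusters, data.shape[1]))
    total = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for d in rng.permutation(data):            # (2) grab an input vector
            # (3) Euclidean distance to every centre, then the winner C*.
            dist = np.sqrt(((centres - d) ** 2).sum(axis=1))
            winner = int(np.argmin(dist))
            frac = 1.0 - t / total
            alpha = lr0 * frac                      # a(t): restraint due to time
            radius = max(radius0 * frac, 1e-3)
            grid_dist = np.abs(np.arange(n_clusters) - winner)
            h = np.exp(-(grid_dist ** 2) / (2.0 * radius ** 2))  # H(t)
            # (4) Pull the winner and its neighbours towards the input.
            centres += (h * alpha)[:, None] * (d - centres)
            t += 1                                  # (5) next iteration
    # Sorting is only meaningful for scalar data such as quotes.
    return np.sort(centres, axis=0) if data.shape[1] == 1 else centres

quotes = np.array([104.20, 108.34, 119.25, 112.23, 122.45])
centres = train_som(quotes, n_clusters=3)
```

Because every update is a convex combination of a centre and a data point, the centres stay inside the range of the data, which is convenient when they are later used as interval anchors.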

The output of the SOM is a set of cluster centres $C_i$ ($i = 1, 2, \ldots, n$). Further information on SOMs can be found in Gurney (1997), Demuth, Beale, and Hagan (2006), and Gupta, Jin, and Homma (2003).

4. Proposed approach

In this study, two modifications are proposed to Chen's high-order fuzzy time series (Chen, 2002). The first modification is partitioning the universe of discourse unequally by using the SOM, and the second is adaptively finding the best order of the FTS. SOM is important due to its fast clustering function, which can bundle data into clusters faster than GA (Section 4.4 addresses this time complexity). In previous studies such as Chen's and Bahrepour's (Bahrepour et al., 2008; Chen & Chung, 2006), GA was used to find the best length of the intervals. To reduce computational overhead during the partitioning of the universe of discourse, SOM is recommended here; and to improve prediction accuracy, the adaptive order selection is introduced. In the following, the proposed algorithm is presented in several steps. An example on the USD/JPY currency pair serves to illustrate the approach. The approach is then applied in Section 5 to a FOREX daily dataset.

4.1. The algorithm

Step 1. Partition the universe of discourse U into n unequal intervals, where $U = \{u_1, u_2, \ldots, u_n\}$. This partitioning is accomplished by the following routine:

I. Find the centres of n clusters ($c_1, c_2, \ldots, c_n$) using SOM.
II. Let $D_{min}$ and $D_{max}$ be the minimum value and the maximum value of the historical data (minimum and maximum quotes in the FOREX dataset example). Let $U = [D_{min} - D_1, D_{max} + D_2]$ be the universe of discourse, where $D_1$ and $D_2$ are two proper positive numbers for marginal extensions (that might be needed for unseen data). Then U is partitioned into n unequal intervals by the rule below:

$$u_1 = \left[D_{min} - D_1, \frac{c_1 + c_2}{2}\right], \quad u_2 = \left[\frac{c_1 + c_2}{2}, \frac{c_2 + c_3}{2}\right], \quad \ldots, \quad u_n = \left[\frac{c_{n-1} + c_n}{2}, D_{max} + D_2\right]$$

In the USD/JPY currency-pair example, $(D_{min} - D_1) = 102$ and $(D_{max} + D_2) = 123$. The universe of discourse is partitioned into seven unequal intervals and the outputs of SOM are c1 = 108, c2 = 110, c3 = 114, c4 = 115, c5 = 117, c6 = 119, c7 = 121. Therefore, u1 = [102, 109], u2 = [109, 112], u3 = [112, 113], u4 = [113, 116], u5 = [116, 118], u6 = [118, 120], u7 = [120, 123].

Step 2. Define fuzzy sets on the universe of discourse U and fuzzify the historical data. A fuzzy set $A_i$ of U is defined as

$$A_i = f_{A_i}(u_1)/u_1 + f_{A_i}(u_2)/u_2 + \cdots + f_{A_i}(u_n)/u_n \qquad (3)$$

where $f_{A_i}(u_j)$ indicates the grade of membership of $u_j$ in $A_i$. If the data (e.g. the quotes in the example of the FOREX dataset) obtains the highest membership degree with $A_k$, then the fuzzified data is labelled as $A_k$. For example, linguistic values are defined as:

A1 = 1/u1 + 0.5/u2 + 0/u3 + 0/u4 + 0/u5 + 0/u6 + 0/u7
A2 = 0.5/u1 + 1/u2 + 0.5/u3 + 0/u4 + 0/u5 + 0/u6 + 0/u7
A3 = 0/u1 + 0.5/u2 + 1/u3 + 0.5/u4 + 0/u5 + 0/u6 + 0/u7
A4 = 0/u1 + 0/u2 + 0.5/u3 + 1/u4 + 0.5/u5 + 0/u6 + 0/u7
A5 = 0/u1 + 0/u2 + 0/u3 + 0.5/u4 + 1/u5 + 0.5/u6 + 0/u7
A6 = 0/u1 + 0/u2 + 0/u3 + 0/u4 + 0.5/u5 + 1/u6 + 0.5/u7
A7 = 0/u1 + 0/u2 + 0/u3 + 0/u4 + 0/u5 + 0.5/u6 + 1/u7

where n = 7. Table 1 contains a number of USD/JPY quotes with their corresponding linguistic values.

Table 1
Several USD/JPY quotes with their corresponding linguistic values.

USD/JPY quotes    Linguistic value
104.20            A1
108.34            A1
119.25            A6
112.23            A3
122.45            A7

Step 3. Derive fuzzy logical relationships. These relationships are used for prediction in the next step. For example, if the quote at time t − 1 is A5 and at time t is A7, then A5 → A7 (according to Definition 1 for first-order fuzzy time series). Since a fuzzy variable may entail more than one output (e.g. A1 entails A3 and A5 at two different times), all the entailed values should be gathered in groups. In addition, repeated observations are stored in the groups (e.g. if A3 entails A6 on two different occasions, this is stored as A3 → {A6, A6}). Therefore all the fuzzified historical data should be gathered in the groups. An example of the fuzzy logical relationships for the first-order model is illustrated in Table 2.

Table 2
The fuzzy logical relationships for the first-order model.

A1 → {A2, A2, A3}    A2 → {A1, A2, A3}        A3 → {A1}
A4 → {A5, A6}        A5 → {A2, A6, A6, A6}    A6 → {A4, A5}

From Definition 2, mth-order relationships are similarly group-based: $A_{j1}, A_{j2}, \ldots, A_{jk} \to \{A_{j1}, A_{j2}, \ldots, A_{jl}\}$, where k is the order of the FTS, the index j refers to different linguistic variables, and l is the number of entailed linguistic values. For the example of USD/JPY quote prediction, some fuzzy logical relationships are shown in Table 3. The number sign ('#') indicates null or missing values, i.e. there is no corresponding input/output in the historical data. These relationships are used in the next step for prediction and defuzzification. In the proposed model, these relationships are derived for first-order, second-order, and up to mth-order, where m is determined by the user. The adaptive order selection module then finds the best order among these m predictions.

Table 3
The fuzzy logical relationships for the high-order model.

Second-order
A1, A1 → {A2, A3}
A1, A2 → {A1}
A2, A3 → {A1, A2, A3, A3}
A7, A6 → #

Third-order
#, A1, A1 → {A2}
A1, A1, A2 → {A1, A3}
A2, A2, A3 → {A2, A3, A3}
A6, A7, A7 → #

Fourth-order
#, A1, A1, A1 → {A2, A4}
A1, A1, A2, A2 → {A1, A3, A3}
A1, A2, A2, A3 → {A2, A3, A4, A4}
A6, A6, A7, A7 → #

Fifth-order
#, A1, A1, A1, A2 → {A2, A4, A5}
A1, A1, A2, A2, A3 → {A3, A3}
A1, A2, A2, A3, A3 → {A2, A4}
A5, A6, A6, A7, A7 → #

Step 4. Forecasting and defuzzifying. In this step the input and the fuzzy logical relationships (obtained from the previous step) are used to forecast and defuzzify the forecasted result. The following rules perform forecasting and defuzzification for first-order and high-order FTS.
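Before the individual rules, Steps 1 to 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names `partition`, `fuzzify` and `relationship_groups` are hypothetical, labels are represented by their 1-based index, and the five quotes of Table 1 are treated as consecutive observations purely for illustration. Note that the midpoint rule applied to c3 = 114 and c4 = 115 yields 114.5 as the boundary between u3 and u4, whereas the worked example lists 113; the sketch follows the rule as stated.

```python
from collections import defaultdict

def partition(centres, lower, upper):
    """Step 1: intervals bounded by midpoints of adjacent sorted centres."""
    mids = [(a + b) / 2 for a, b in zip(centres, centres[1:])]
    bounds = [lower] + mids + [upper]
    return list(zip(bounds, bounds[1:]))

def fuzzify(value, intervals):
    """Step 2: 1-based index i of the label A_i with maximum membership.

    With memberships shaped like the A1..A7 example, A_i peaks on u_i,
    so the label is the interval that contains the value.
    """
    if value < intervals[0][0]:
        return 1  # below the universe: clamp to the first label
    for i, (lo, hi) in enumerate(intervals, start=1):
        if lo <= value < hi:
            return i
    return len(intervals)  # on or above the upper bound

def relationship_groups(labels, order):
    """Step 3: map each antecedent tuple (oldest first) to its consequents."""
    groups = defaultdict(list)
    for t in range(order, len(labels)):
        # Repeated consequents are kept, e.g. A3 -> {A6, A6}.
        groups[tuple(labels[t - order:t])].append(labels[t])
    return dict(groups)

centres = [108, 110, 114, 115, 117, 119, 121]   # SOM outputs from the example
intervals = partition(centres, lower=102, upper=123)
labels = [fuzzify(q, intervals) for q in [104.20, 108.34, 119.25, 112.23, 122.45]]
groups = relationship_groups(labels, order=1)
```

The labels obtained for the five quotes agree with Table 1 (A1, A1, A6, A3, A7).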



First-order FTS:

Rule 1: If $F(t-1) = A_i$ and $A_i \to \{\ \}$, i.e. there is no match/precedence in the historical data for $A_i$, then the predicted result $\hat{y}$ at time t is the midpoint of interval $u_i$, being the centre of the ith cluster ($c_i$), in which the maximum membership degree of $A_i$ is located:

$$\hat{y} = c_i$$

In other words, in the absence of earlier historical data with similar conditions, the best assumption is that there is no change in the time series. For example, given the fuzzy relationships of Table 2 and the input A7, since A7 → { }, $\hat{y} = c_7$, where c7 is the centre of the cluster for u7, in which the maximum membership of A7 is located.

Rule 2: If $F(t-1) = A_i$ and $A_i \to \{A_j\}$, i.e. there is consistently only one match in the historical data, then the predicted result $\hat{y}$ at time t is the midpoint of interval $u_j$, being the centre of the jth cluster ($c_j$), in which the maximum membership degree of $A_j$ is located:

$$\hat{y} = c_j$$

For example, given the fuzzy relationships of Table 2 and the input A3, since A3 → {A1}, $\hat{y} = c_1$, where c1 is the centre of the cluster for u1, in which the maximum membership of A1 is located.

Rule 3: If $F(t-1) = A_i$ and $A_i \to \{A_{j1}, A_{j2}, \ldots, A_{jH}\}$, where H > 1 is the number of entailed elements and j refers to different linguistic variables, then the predicted result $\hat{y}$ at time t is

$$\hat{y} = \frac{1}{H} \sum_{i=1}^{H} c_{ji}$$

where $c_{j1}, c_{j2}, \ldots, c_{jH}$ are the midpoints (the cluster centres) of the intervals $u_{j1}, u_{j2}, \ldots, u_{jH}$ in which the maximum membership degrees of $A_{j1}, A_{j2}, \ldots, A_{jH}$ are located, respectively. For example, given the fuzzy relationship A5 → {A2, A6, A6, A6} of Table 2 and the input A5, $\hat{y}$ is computed as:

$$\hat{y} = \frac{1}{4}(c_2 + c_6 + c_6 + c_6)$$

where c2 and c6 are the centres of the clusters for u2 and u6, respectively (in which the maximum memberships of A2 and A6 are located).

Higher-order FTS (kth-order):

Rule 1: If $F(t-1) = A_{j1}, A_{j2}, \ldots, A_{jk}$ and $A_{j1}, A_{j2}, \ldots, A_{jk} \to \{\ \}$, where k is the order of the FTS and j indicates that the linguistic labels vary, i.e. the cardinality $|\{\ \}| = 0$ and there is no match in the historical data for $A_{j1}, A_{j2}, \ldots, A_{jk}$, then the predicted result $\hat{y}$ at time t is:

$$\hat{y} = \frac{\sum_{i=1}^{k} (k + 1 - i) \times c_{j(k+1-i)}}{\sum_{i=1}^{k} i}$$

where $c_{j1}, c_{j2}, \ldots, c_{jk}$ are the midpoints (the cluster centres) of the intervals $u_{j1}, u_{j2}, \ldots, u_{jk}$ in which the maximum membership degrees of $A_{j1}, A_{j2}, \ldots, A_{jk}$ are located, respectively. For example, given the fuzzy relationships of Table 3 and the input A3, A7, since A3, A7 → { }, $\hat{y}$ is:

$$\hat{y} = \frac{(2 \times c_7) + (1 \times c_3)}{3}$$

where c7 and c3 are the centres of the clusters corresponding to u7 and u3, in which the maximum memberships of A7 and A3 are located, respectively.

Rule 2: If $F(t-1) = A_{j1}, A_{j2}, \ldots, A_{jk}$ and $A_{j1}, A_{j2}, \ldots, A_{jk} \to \{A_{j1}\}$, where k is the order of the FTS and j indicates that the linguistic labels vary, then the predicted result $\hat{y}$ at time t is the midpoint (cluster centre, $c_{j1}$) of the interval $u_{j1}$ in which the maximum membership degree of $A_{j1}$ is located:

$$\hat{y} = c_{j1}$$

For example, given the fuzzy relationship A1, A2 → {A1} of Table 3 and the input A1, A2, $\hat{y}$ is:

$$\hat{y} = c_1$$

where c1 is the centre of the cluster for u1, in which the maximum membership of A1 is located.

Rule 3: If $F(t-1) = A_{j1}, A_{j2}, \ldots, A_{jk}$ and $A_{j1}, A_{j2}, \ldots, A_{jk} \to \{A_{j1}, A_{j2}, \ldots, A_{jH}\}$ (H > 1), where k is the order of the FTS and j indicates that the linguistic labels vary, then the predicted result $\hat{y}$ at time t is

$$\hat{y} = \frac{1}{H} \sum_{i=1}^{H} c_{ji}$$

where $c_{j1}, c_{j2}, \ldots, c_{jH}$ are the midpoints (cluster centres) of the intervals $u_{j1}, u_{j2}, \ldots, u_{jH}$ in which the maximum membership degrees of $A_{j1}, A_{j2}, \ldots, A_{jH}$ are located, respectively. For example, given the fuzzy relationship A2, A3 → {A1, A2, A3, A3} of Table 3 and the input A2, A3, $\hat{y}$ is:

$$\hat{y} = \frac{1}{4}(c_1 + c_2 + c_3 + c_3)$$

where c1, c2, c3 are the midpoints (cluster centres) of the intervals u1, u2, u3, in which the maximum memberships of A1, A2, A3 are located, respectively.
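The forecasting rules of Step 4 can be condensed into one small function. The sketch below is an illustration under assumptions rather than the authors' code: the name `forecast` is hypothetical, labels are 1-based indices, antecedents are ordered oldest first (so the most recent label receives the largest weight in Rule 1), and the cluster centres are those of the USD/JPY example.

```python
def forecast(antecedent, groups, centres):
    """Apply Rules 1-3 to one antecedent tuple of 1-based label indices.

    groups maps antecedent tuples to lists of consequent labels;
    centres[i - 1] is the cluster centre c_i associated with label A_i.
    """
    consequents = groups.get(tuple(antecedent), [])
    if not consequents:
        # Rule 1: no match. Weighted average of the antecedent centres,
        # with weight i for the i-th label; for order 1 this is just c_i.
        k = len(antecedent)
        weighted = sum(i * centres[a - 1] for i, a in enumerate(antecedent, start=1))
        return weighted / (k * (k + 1) / 2)
    # Rules 2 and 3: mean of the consequent centres, repeats included.
    return sum(centres[j - 1] for j in consequents) / len(consequents)

centres = [108, 110, 114, 115, 117, 119, 121]
groups = {
    (1,): [2, 2, 3], (2,): [1, 2, 3], (3,): [1],      # Table 2
    (4,): [5, 6], (5,): [2, 6, 6, 6], (6,): [4, 5],
    (1, 2): [1], (2, 3): [1, 2, 3, 3],                # from Table 3
}

y_rule1 = forecast((7,), groups, centres)       # A7 -> { }
y_rule2 = forecast((3,), groups, centres)       # A3 -> {A1}
y_rule3 = forecast((5,), groups, centres)       # A5 -> {A2, A6, A6, A6}
y_high1 = forecast((3, 7), groups, centres)     # A3, A7 -> { }
y_high3 = forecast((2, 3), groups, centres)     # A2, A3 -> {A1, A2, A3, A3}
```

Rules 2 and 3 share one branch because the mean of a single consequent centre is that centre itself.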

Step 5. Compute the quantitative measures H and V, as defined below, from the above grouping process of causal relations.

Definition 3. Let H (number of hits) be the number of matched patterns in the historical data for a given set of antecedent fuzzy propositions. For example, if $F(t-1) = A_{j1}, A_{j2}, \ldots, A_{jk}$ and $A_{j1}, A_{j2}, \ldots, A_{jk} \to \{A_{j1}, A_{j2}, \ldots, A_{jp}\}$, where k is the order of the FTS ($k \geq 1$) and j indicates that the linguistic labels can vary, then H is the cardinality of the group: $H = |\{A_{j1}, A_{j2}, \ldots, A_{jp}\}| = p$.

Definition 4. Let V (dispersion) indicate the dispersion among the elements of a group. For example, consider a relationship such as $F(t-1) = A_{j1}, A_{j2}, \ldots, A_{jk}$, where k is the order of the FTS ($k \geq 1$) and j indicates that the linguistic labels vary, and $A_{j1}, A_{j2}, \ldots, A_{jk} \to \{A_1, A_3, A_3, A_5\}$. Assume that the maximum membership degrees of {A1, A3, A3, A5} are located in {c1, c3, c3, c5}, respectively. Dispersion can be obtained numerically from the following formula:

$$V = \frac{1}{H} \sum_{i=1}^{H} (c_i - \bar{c})^2$$

where V (dispersion) is the same as variance, H is the number of hits (Definition 3), the $c_i$ are cluster centres, and $\bar{c}$ is the mean value of the participating $c_i$.

In Step 3, m sets of fuzzy relationships are derived (first-order, second-order, up to mth-order, where m is chosen by the user). In Step 4, m predicted results are obtained. In this step, a total of m values of V and H are computed. These m predicted results, along with the m values of H and V, are used in the following adaptive order selection module.

Step 6. Find the best order among the m predicted results by using the adaptive order selection module and present the predicted result chosen by the module as the output. The algorithm of the adaptive order selection module is detailed in Section 4.2. Fig. 1 shows the complete flowchart of the proposed approach.

4.2. Adaptive order selection algorithm

The adaptive order selection module employs three agents (voting, statistical and emotional) to find the best order. These agents work sequentially in the order given below. Fig. 2 shows the algorithm of the adaptive order selection module.
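Before the agents are described, the Step 5 quantities of Definitions 3 and 4 can be illustrated with a short Python sketch. The function name `hits_and_dispersion` is hypothetical, and returning V = 0 for an empty group is an assumption made here for convenience; the text does not define V when H = 0. The centres are those of the USD/JPY example.

```python
def hits_and_dispersion(consequents, centres):
    """Return (H, V) for one group of consequent labels (1-based indices)."""
    h = len(consequents)                      # Definition 3: number of hits
    if h == 0:
        return 0, 0.0                         # assumption: V reported as 0
    values = [centres[j - 1] for j in consequents]
    mean = sum(values) / h
    v = sum((c - mean) ** 2 for c in values) / h   # Definition 4: variance
    return h, v

centres = [108, 110, 114, 115, 117, 119, 121]
H, V = hits_and_dispersion([1, 3, 3, 5], centres)   # group {A1, A3, A3, A5}
```

For the group {A1, A3, A3, A5} of Definition 4 the participating centres are 108, 114, 114 and 117, whose mean is 113.25, so H = 4 and V = 42.75 / 4 = 10.6875.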

Fig. 1. Flowchart of the proposed algorithm. The boxes of the flowchart, from Start to End, are:

(1) Partition the universe of discourse into n unequal intervals. The centre of each interval is determined by a SOM.
(2) Define fuzzy sets on the universe of discourse and place the maximum membership degrees on the centres of the clusters obtained from the SOM.
(3) Derive fuzzy logical relationships for first order up to mth order (m is defined by the user and usually m > 1, because in the next step there are m predictions and the automatic order selection finds the best order among them).
(4) Forecast and defuzzify the forecasted outputs for all m predictions.
(5) Compute V (variance) and H (hits) for all m outputs.
(6) Find the best order among the m orders by using the adaptive order selection module.
(7) Display the predicted value obtained from the adaptive order selection module.

1. Voting Agent (AV): Aggregate the results by employing a voting agent. This is a simple yet efficient voting technique based upon popularity that investigates whether there are two or more forecasts with the same value. Voters must have $H \geq 1$ in order to take part in the voting procedure. Having $H \geq 1$ assures that the corresponding voter has found at least one matching pattern in the historical data and is ready to join the voting. Voters with H = 0 are not ready for voting because they have failed to find any solution or matched pattern within the historical data. Mathematically speaking, the voting agent looks for a solution with order z (if such a solution exists) where $\exists z, \exists i\ (\hat{y}_z = \hat{y}_i),\ (i \neq z),\ (i, z = 1, 2, \ldots, m),\ (H_{\hat{y}_z} > 0 \text{ and } H_{\hat{y}_i} > 0)$, where z is the desired order (found by automatic order selection), m is the maximum allowed order, $\hat{y}_i$ is the predicted result obtained with the ith order, and $H_{\hat{y}_i}$ is the number of hits for the predicted result $\hat{y}_i$. If such a solution exists, order z is selected. If such an aggregation cannot be accomplished, the next agent (the statistical agent) seeks the best order.

2. Statistical Agent (AS): Analyse the results by employing a statistical analytic agent. This analysis is carried out in two phases. The first phase tries to find an answer with variance $\sigma^2 = 0$ and number of hits $H \geq 1$. The idea is to seek an answer with no dispersion while ensuring that the predicted result has already occurred in the historical data. H = 0 is not desirable because it means the predicted result is simply a repetition of the previous step (see Step 4 and Definition 3), and such answers are usually not good choices. H = 1 is desirable because it means the predicted result has already occurred once in the historical data and the next step is likely to be the same as what has happened in the past. Therefore, the first phase seeks an order z such that $\exists z\ (\sigma_z^2 = 0),\ (H_z > 0),\ (z = 1, 2, \ldots, m)$. If the first phase fails to find an appropriate order, the second phase tries to find the predicted result with the smallest variance and the largest H. Therefore an order z is desirable here where $\exists z, \forall i\ (\sigma_z^2 < \sigma_i^2),\ (H_z > H_i),\ (H_z, H_i > 0),\ (i, z = 1, 2, \ldots, m,\ i \neq z)$. The answer with greater H and lesser $\sigma^2$ is the answer that is repeated several times in the historical data with less dispersion, which implies that the higher order or the greater H is always chosen. If both phases fail to find the best order, the next agent (the emotional agent) chooses the best order.

3. Emotional Agent (AE): Decide on the order by using the emotional agent. The two aforementioned methods use rational agents, and when rationality cannot find a solution, emotions should be utilised, just as in the human decision making process. This technique can also be regarded as the integration of human/expert knowledge into an expert system; however, since this information comes from the hunches of the FTS users, it is an emotional signal (further information is available in Sections 4.3.3–4.3.5). Therefore, the integration of human knowledge or hunches based on previous experience is called an emotional signal (Antoine Bechara, 2005). According to Bechara et al., the right combination of rationality and emotion yields advantageous decision making (Antoine Bechara, 2005). To make an advantageous decision, the emotional decision making agent chooses the ith order of the FTS, where i depends on the dataset to which the FTS is applied. In our study i is three for the FOREX application. Investigations by Li and Cheng (2008) and our own observations show that three (3) is an appropriate order for financial data prediction. This experience is therefore formulated as an emotional signal that acts as the final arbiter in the decision making procedure. Consequently, if the two previous agents fail to find the best order, the hunches of the FTS users, in the role of an emotional signal, choose the ith order (3rd order on FOREX historical data). For other applications, the hunches of FTS users on the best order should be learnt and substituted for i.

Fig. 2. Flowchart of the adaptive order selection module. One period ahead is forecast using the 1st order up to the mth order, and each forecasted value is stored along with its variance and hits (H). The voting agent (AVG) selects order z if aggregation is possible for order z. Otherwise the statistical agent (ASA) selects, in phase 1, a predicted result z with zero variance and hits, or, in phase 2, an output z whose variance is smaller and whose H is larger than those of every other order i (i = 1, 2, ..., m and i ≠ z). If neither phase succeeds, the emotional agent (AEM) chooses the ith order as the optimal order. The output of the selected order is used as the best answer.

4.3. Introduction to the techniques used in adaptive order selection module

To find the best order, three different agents are employed. Each agent is an approach to problem solving and tries to find the optimal solution. To explain why these agents were chosen, the following subsections give the necessary information about each agent and the corresponding technique it uses.

4.3.1. Introduction to voting technique When number of the choices is increased, decision making becomes complicated. Voting is an intuitive approach to aggregate different choices. Lang in 2004 and 2005 and Chevaleyre et al. in 2006 have discussed different voting methods that are mostly computationally hard algorithms (NP problems). Chevaleyre et al. surveyed a number of voting techniques in their paper (Chevaleyre, 2006) and we have investigated those techniques for problem of aggregation in the adaptive order selection module. Thereafter, a simplified and problem-oriented voting technique based upon popularity is proposed. Choosing the popularity and simplifying it, is because of continuous nature of voting in the adaptive order selection; as a result, no NP-complete voting algorithm is appropriate and a simple yet efficient aggregation as presented in Section 4.2 is suited to the problem we addressed. Time complexity of the proposed voting technique is O(n2), where n is the number of voters.

4.3.2. Introduction to statistical analyst
Variance measures the dispersion of the data, which causes uncertainty and inconsistency in the decision making process. H (hits) is the number of successfully matched patterns in the historical data; thus, increasing H decreases uncertainty. In addition, lower dispersion is desired because it increases the consistency of the outputs.
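The two criteria (more hits, less dispersion) can be combined in a simple ranking. The sketch below assumes one particular reading, namely that hits are compared first and variance breaks ties; the paper's two-phase procedure may differ. The function name `statistical_choice` and the input layout are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List, Optional


def statistical_choice(matches: Dict[int, List[float]]) -> Optional[int]:
    """Rank candidate orders by hits H (descending), then variance (ascending).

    matches maps each order to the historical outcomes that followed every
    matched pattern of that order; H is the length of that list.
    Returns None when the two best candidates cannot be separated.
    """
    stats = {order: (len(values), pvariance(values))
             for order, values in matches.items() if values}
    ranked = sorted(stats.items(), key=lambda kv: (-kv[1][0], kv[1][1]))
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # same H and same dispersion: defer to the next agent
    return ranked[0][0]


history = {
    1: [1.0, 1.2, 0.8],  # H = 3, wider spread
    2: [1.0, 1.1, 0.9],  # H = 3, tighter spread
    3: [1.0],            # H = 1
}
print(statistical_choice(history))
```

Here orders 1 and 2 have the same number of hits, so the tighter spread of order 2 decides.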

4.3.3. Introduction and roles of emotions in human decision making process
According to the experience of neurologists (Bechara & Damasio, 2005), humans with a lesion in the amygdala (the section of the brain that is responsible for emotion) could not perform advantageous decision making. In a card-playing scenario, people with emotional problems were unable to make the best decisions during a gambling game (Bechara & Damasio, 2005). Thus emotion plays a key role in the decision making process and cannot be separated from rationality. On the other hand, relying too much on emotion in decision making can yield inferior final results. As a result, appropriate decision making is one that finds a balance between emotion and rationality.

Table 4
Time complexity comparison.

| Method | Time complexity |
| --- | --- |
| The proposed approach | $O_{\mathrm{ProposedApproach}} = O[(k \times n) + (m^2 \times p^2) + (m^2)]$, where k is the number of training epochs for SOM, p is the number of training data, n is the number of intervals, and m is the maximum order chosen by the user (the prediction is carried out from 1st order up to mth order) |
| High-order method (Chen, 2002) | $O_{\mathrm{HFTS}} = O[(n \times p) + (k \times p^2) + (k \times p)]$, where n is the number of intervals, p is the total number of training data, and k is the order of the FTS |
| FTS and Genetic Algorithm (Chen & Chung, 2006) | $O_{\mathrm{FTSGA}} = O[(p^2) + (G \times I \times q^2)]$, where p is the total number of training data, G is the number of generations, I is the number of individuals and q is the number of training data used in the genetic algorithm (usually q < p to speed up the process of partitioning) |

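Table 5
MSE error rates (average error rates along with their SD).

| Currency pair | The proposed method: Mean | The proposed method: SD | High-order method (Chen, 2002): Mean | High-order method (Chen, 2002): SD | FTS and Genetic Algorithm (Chen & Chung, 2006): Mean | FTS and Genetic Algorithm (Chen & Chung, 2006): SD |
| --- | --- | --- | --- | --- | --- | --- |
| USD/EUR | 1.1042e-005 | 4.2306e-004 | 2.1757e-004 | 1.7702e-004 | 0.0011 | 3.4695e-004 |
| USD/GBP | 7.1063e-005 | 5.6123e-006 | 1.5131e-004 | 6.2356e-005 | 9.4688e-005 | 2.2318e-005 |
| EUR/GBP | 8.6845e-005 | 1.9308e-005 | 4.7017e-004 | 2.5174e-004 | 2.2954e-004 | 8.1593e-005 |

SD: standard deviation.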



4.3.4. Function of emotions in financial decision making process
In economics and financial management, ‘gut feelings’ and ‘hunches’ play a key role in the decision making process (Bechara & Damasio, 2005). These emotional feelings are obtained from an individual’s earlier experiences and are important because they convey personal aspects of those experiences that may not be enumerable. In economics and financial management, experience has been shown to play a key role in success. Therefore, it seems reasonable to incorporate such feelings into an AI-based system to reach a more prudent decision making process.

4.4. Time complexity consideration
Accuracy is not the only criterion for comparing algorithms; time complexity is also significant. In this paper, the time complexity of the proposed approach is derived and compared with those of GA-FTS and Chen’s high-order method. Section 4.4.1 discusses the time complexity of the proposed approach. Proofs for the other methods are available in Appendix A. Section 4.4.2 compares the different time complexity results.

4.4.1. Time complexity of the proposed approach
The proposed approach consists of three main parts. The first partitions the universe of discourse into unequal intervals, the second makes the prediction and the third selects the best order adaptively. Therefore, the time complexity is:

$$O_{\mathrm{ProposedApproach}} = (O_{\mathrm{Clustering}} + O_{\mathrm{Prediction}} + O_{\mathrm{AutomaticOrderSelection}})$$

$O_{\mathrm{Clustering}}$ can be computed by the formula below:

$$O_{\mathrm{Clustering}} = O[k \times (1 + n + n + N_{BMU})]$$

where k is the number of training epochs, n is the number of intervals, and $N_{BMU}$ is the maximum number of neighbours for the best matching unit. SOM and its method of clustering are described in Section 3. The time complexity of making predictions is as follows:

$$O_{\mathrm{Prediction}} = O\left[(p \times n) + \left(\sum_{j=1}^{m} j \times \sum_{i=1}^{p} i\right) + \left(p \times \sum_{j=1}^{m} j\right)\right]$$

where p is the number of training data used in FTS prediction and m is the maximum order chosen by the user (the prediction is carried out from 1st order up to mth order). $(p \times n)$ is the time complexity of fuzzifying the historical data, $\sum_{j=1}^{m} j \times \sum_{i=1}^{p} i$ is the time complexity of making m fuzzy relationships and $p \times \sum_{j=1}^{m} j$ is the time complexity of making predictions from the derived fuzzy relationships (the time complexity of a first-order prediction is $(p)$, of a second-order prediction is $(2 \times p)$, and of m predictions from first order up to mth order is $p \times \sum_{j=1}^{m} j$). The time complexity of adaptive order selection is:

$$O_{\mathrm{AutomaticOrderSelection}} = O\left[\sum_{i=1}^{m} i + m + \sum_{j=1}^{m} j + 1\right]$$

where m is the maximum order chosen by the user (the prediction is carried out from 1st order up to mth order), $\sum_{i=1}^{m} i$ is the time complexity of voting, $m$ is the time complexity of the first phase of statistical analysis, $\sum_{j=1}^{m} j$ is the time complexity of the second phase of statistical analysis and 1 is the constant order of the emotional decision making process. According to the computational complexity techniques (Neapolitan & Naimipour, 2004) for calculating big O, we should choose the most time-consuming term in a polynomial time complexity expression. Therefore we would have:

$$O_{\mathrm{Clustering}} = O(k \times n)$$

$$O_{\mathrm{Prediction}} = O\left(\sum_{j=1}^{m} j \times \sum_{i=1}^{p} i\right) = O\left(\frac{m \times (m+1)}{2} \times \frac{p \times (p+1)}{2}\right) = O(m^2 \times p^2)$$

$$O_{\mathrm{AutomaticOrderSelection}} = O\left(\sum_{i=1}^{m} i + \sum_{j=1}^{m} j\right) = O\left(2 \times \frac{m \times (m+1)}{2}\right) = O(m^2)$$

As a result, the total time complexity of the proposed approach is:

$$O_{\mathrm{ProposedApproach}} = O[(k \times n) + (m^2 \times p^2) + (m^2)] \qquad (4)$$

where k is the number of training epochs for SOM, p is the number of training data, n is the number of intervals and m is the maximum order chosen by the user (the prediction is carried out from first order up to mth order).

4.4.2. Time complexity comparison
The time complexity of the proposed approach is compared with Chen’s high-order method (Chen, 2002) and FTS with genetic algorithm (Chen & Chung, 2006). Table 4 reports the time complexities. Comparing the results in Table 4 shows that the approaches depend on different parameters. The time complexities can be compared by considering the most time-consuming terms. The most time-consuming term of the proposed approach is $(m^2 \times p^2)$, a polynomial term of total degree 4. The most time-consuming term of Chen’s high-order method is $(k \times p^2)$, which is also polynomial, of total degree 3. The most time-consuming term of FTS with genetic algorithm is $(G \times I \times q^2)$, a polynomial term of total degree 4. We can conclude that the proposed approach is similar to GA-FTS in terms of time complexity; however, both approaches are computationally harder than Chen’s high-order method.

5. Experimental results
To compare the proposed approach with the previous studies on FTS, a number of tests and comparisons are conducted on a FOREX daily dataset (FOREX quotes). The dataset consists of real quotes of currency pairs in FOREX, obtained from http://fx.sauder.ubc.ca/ (courtesy of the University of British Columbia, Sauder School of Business). Three currency pairs, USD/EUR, USD/GBP and EUR/GBP, are chosen and investigated. The FTS settings chosen for the FOREX application are as follows: number of intervals (n) = 7; adaptive order selection is carried out from first order up to fifth order (m = 5). For clustering the data with the Neural Network Toolbox of Matlab®, the chosen parameters are as follows: number of clusters: 7, topology: hexagon, distance: Euclidean, ordering phase learning rate: 0.2, ordering phase steps: 100.

The dataset has 670 rows; from these, 450 consecutive rows are randomly selected for training and 10 randomly selected rows are used for testing. Since repeating the same test yields a different error value, the test is repeated 10 times and the average error along with its standard deviation, measured by MSE and MAPE, is reported in Tables 5 and 6. This number of testing and training data is common for financial data prediction (Li & Cheng, 2007). Fig. 3 illustrates one of these tests for each currency pair.

Fig. 3. (a)–(c) One of the predicted signals for each currency pair. The proposed method has been compared with the high-order method (Chen, 2002) and FTS with genetic algorithm (Chen & Chung, 2006).

MAPE and MSE are computed by using Eqs. (7) and (8).

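$$\mathrm{MAPE} = \frac{1}{n}\sum_{j=1}^{n} \frac{\left|\text{Real Value}_j - \text{Forecasted Value}_j\right|}{\text{Real Value}_j} \qquad (7)$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{j=1}^{n} \left(\text{Forecasted Value}_j - \text{Real Value}_j\right)^2 \qquad (8)$$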
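Eqs. (7) and (8) translate directly into code. The following is a minimal sketch: the function names `mape` and `mse` and the toy quotes are illustrative, MAPE is kept as a fraction (not multiplied by 100) to match the magnitude of the values reported in Table 6, and the real values are assumed to be non-zero, as is the case for exchange rates.

```python
from typing import Sequence


def mape(real: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction, following Eq. (7)."""
    if len(real) != len(forecast) or not real:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - f) / r for r, f in zip(real, forecast)) / len(real)


def mse(real: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean squared error, following Eq. (8)."""
    if len(real) != len(forecast) or not real:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((f - r) ** 2 for r, f in zip(real, forecast)) / len(real)


# Toy quotes, invented purely to exercise the two functions.
real_quotes = [100.0, 200.0]
forecast_quotes = [110.0, 190.0]
print(mape(real_quotes, forecast_quotes))  # about 0.075
print(mse(real_quotes, forecast_quotes))   # 100.0
```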

To compare the proposed method, two previous studies are chosen. The first study uses high-order FTS but it partitions the universe of discourse equally (Chen, 2002) and the second study partitions the universe of discourse unequally using genetic algorithm (Chen & Chung, 2006) but does not use higher orders of inputs. To the best of the authors’ knowledge, there is no study that combines both techniques; therefore the proposed method is compared with each study separately and the numerical results are reported in Tables 5 and 6.
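For a sense of scale, the prediction and order selection cost expressions of Section 4.4.1 can be evaluated at the settings used in these experiments (n = 7, m = 5, p = 450). The sketch below simply counts the terms of those expressions; the function names are illustrative, and the clustering term is left out because the values of k and of the neighbourhood size are not fixed by the text.

```python
def triangular(x: int) -> int:
    """Sum 1 + 2 + ... + x, the closed form of the sums in Section 4.4.1."""
    return x * (x + 1) // 2


def prediction_terms(p: int, n: int, m: int) -> dict:
    """Individual terms of the prediction cost expression."""
    return {
        "fuzzification": p * n,                         # (p x n)
        "relationships": triangular(m) * triangular(p),  # sum_j j x sum_i i
        "forecasting": p * triangular(m),                # p x sum_j j
    }


def order_selection_ops(m: int) -> int:
    """Voting + first statistical phase + second phase + emotional agent."""
    return triangular(m) + m + triangular(m) + 1


terms = prediction_terms(p=450, n=7, m=5)
total = sum(terms.values())
print(terms)
print(total, order_selection_ops(5))
print(round(terms["relationships"] / total, 4))  # share of the dominant term
```

With these settings the relationship-building term accounts for more than 99% of the counted operations, which is consistent with keeping only $(m^2 \times p^2)$ in Eq. (4).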

Table 6
MAPE error rates (average error rates along with their SD).

| Currency pair | The proposed method: Mean | The proposed method: SD | High-order method (Chen, 2002): Mean | High-order method (Chen, 2002): SD | FTS and Genetic Algorithm (Chen & Chung, 2006): Mean | FTS and Genetic Algorithm (Chen & Chung, 2006): SD |
| --- | --- | --- | --- | --- | --- | --- |
| USD/EUR | 0.0032 | 0.0103 | 0.0119 | 0.0041 | 0.0066 | 0.0078 |
| USD/GBP | 0.0083 | 0.0013 | 0.0154 | 0.0037 | 0.0115 | 0.0028 |
| EUR/GBP | 0.0057 | 0.0035 | 0.0245 | 0.0078 | 0.0145 | 0.0028 |

SD: standard deviation.

6. Conclusions and future works
Order selection is an important but often overlooked step of identification. Conventional identification algorithms pre-assign the order based on analysis of physical laws, intuition or an ad hoc procedure. While this may be adequate for systems that maintain a general behaviour in time, or those whose behaviour is generally identifiable by analysis of physical laws, it becomes a costly assumption for sufficiently complex systems that change their general behaviour in time. Identification of such systems, for example financial time series, requires a learning algorithm that adjusts not only its parameters but also its structure. Another aspect that is often overlooked is appropriate partitioning of the decision space. Most learning techniques begin their analysis by partitioning the decision space equally. However, different parts of the decision space can require different grades of granulation. In contrast to the earlier techniques, the proposed high-order fuzzy time series identification scheme utilises an adaptive order selection scheme and partitions the universe of discourse using self-organising maps. This partitioning scheme allows different granularity in different parts of the decision space. The proposed technique is then applied to prediction of a FOREX daily dataset. To compare performance, two earlier studies are also applied to this time series data. Results indicate that the proposed method surpasses the two earlier studies by providing more accurate prediction. The improvement in precision is obtained by using the SOM to partition the universe of discourse unequally and by using the adaptive order selection to find the best order at different times. However, this hybrid algorithm also requires more computation. While this limitation is not a severe obstacle for the considered example, since decisions are made only on a daily basis here, it can become a limiting factor when fast decision and adaptation are required. The future direction of this research is to reduce execution time by using more parallel agents. We also believe better performance can be obtained by exploiting the uncertainty in the information and the decision making process.

Appendix A

A.1. Time complexity of the FTS with genetic algorithm
The time complexity of the FTS with genetic algorithm ($O_{\mathrm{FTSGA}}$) introduced in Chen and Chung (2006) is as follows:

$$O_{\mathrm{FTSGA}} = (O_{\mathrm{GeneticAlgorithm}} + O_{\mathrm{Prediction}})$$

where $O_{\mathrm{GeneticAlgorithm}}$ is the time complexity of partitioning the universe of discourse unequally using the genetic algorithm and $O_{\mathrm{Prediction}}$ is the time complexity of fuzzifying the data, deriving fuzzy relationships and predicting. $O_{\mathrm{GeneticAlgorithm}}$ can be obtained by the formula below:

$$O_{\mathrm{GeneticAlgorithm}} = O[(G \times I)(P_c + P_m + O_{F_y} + 1)]$$

where G is the number of generations, I is the number of individuals, $P_c$ is the combination probability, $P_m$ is the mutation probability, $O_{F_y}$ is the time complexity of the fitness function and 1 is the time complexity of selection, which is performed once for each individual in each generation. The time complexity $O_{F_y}$ for the problem of partitioning the universe of discourse is:

$$O_{F_y} = O\left[(n \times q) + \sum_{i=1}^{q} i + q\right]$$

where n is the number of intervals and q is the number of training data used for finding the optimal interval length (q should be much smaller than the total number of training data, p, to speed up the process of partitioning). $(n \times q)$ is the complexity of fuzzifying the data, $\sum_{i=1}^{q} i$ is the time complexity of deriving fuzzy relationships and $q$ is the time complexity of finding the appropriate fuzzy relationship and defuzzification. According to the computational complexity techniques (Neapolitan & Naimipour, 2004) for calculating big O, we should choose the most time-consuming term in a polynomial time complexity expression. Therefore we would have:

$$O_{F_y} = O\left(\sum_{i=1}^{q} i\right) = O\left(\frac{q \times (q+1)}{2}\right) = O(q^2)$$

Therefore $O_{\mathrm{GeneticAlgorithm}}$ is:

$$O_{\mathrm{GeneticAlgorithm}} = O[(G \times I)(P_c + P_m + q^2 + 1)]$$

Again we use the most time-consuming term in the above polynomial to calculate big O, which gives the time complexity of the genetic algorithm in partitioning the universe of discourse unequally:

$$O_{\mathrm{GeneticAlgorithm}} = O(G \times I \times q^2)$$

The time complexity $O_{\mathrm{Prediction}}$ is:

$$O_{\mathrm{Prediction}} = O\left[(n \times p) + \sum_{i=1}^{p} i + p\right]$$

where n is the number of intervals and p is the total number of training data. $(n \times p)$ is the complexity of fuzzifying the data, $\sum_{i=1}^{p} i$ is the complexity of deriving fuzzy relationships and $p$ is the time complexity of finding the appropriate fuzzy relationship and defuzzification. To calculate the time complexity, the most time-consuming term should be chosen. Therefore we would have:

$$O_{\mathrm{Prediction}} = O\left(\sum_{i=1}^{p} i\right) = O\left(\frac{p \times (p+1)}{2}\right) = O(p^2)$$

As a result, the time complexity $O_{\mathrm{FTSGA}}$ is:

$$O_{\mathrm{FTSGA}} = O[(p^2) + (G \times I \times q^2)] \qquad (5)$$

where p is the total number of training data, G is the number of generations, I is the number of individuals and q is the number of training data used in the genetic algorithm (usually q < p to speed up the process of partitioning).

A.2. Time complexity of Chen’s high-order method
The time complexity of Chen’s high-order model ($O_{\mathrm{HFTS}}$), which is introduced in Chen (2002), is as follows:

$$O_{\mathrm{HFTS}} = O\left[(n \times p) + \left(k \times \sum_{i=1}^{p} i\right) + (k \times p)\right]$$

where n is the number of intervals, p is the total number of training data and k is the order of the FTS. $(n \times p)$ is the complexity of fuzzifying the data, $k \times \sum_{i=1}^{p} i$ is the complexity of deriving fuzzy relationships and $(k \times p)$ is the time complexity of finding the appropriate fuzzy relationship and defuzzification. By using the aforementioned techniques for calculating big O, we have:

$$O_{\mathrm{HFTS}} = O[(n \times p) + (k \times p^2) + (k \times p)] \qquad (6)$$

where n is the number of intervals, p is the total number of training data and k is the order of the FTS.

References

Antoine Bechara, A. R. D. (2005). The somatic marker hypothesis: A neural theory of economic decision. Games and Economic Behavior(52), 336–372.
Bahrepour, M., Akbarzadeh-T., M.-R., & Yaghoobi, M. (2008). A novel fuzzy time series. In 13th Iranian computer conference, Kish Island, Persian Gulf.
Bechara, A., & Damasio, A. R. (2005). The somatic marker hypothesis: A neural theory of economic decision. Games and Economic Behavior(52), 336–372.
Chen, S.-M. (1996). Forecasting enrollments based on fuzzy time series. Fuzzy Sets and Systems(81), 311–319.
Chen, S.-M. (2002). Forecasting enrollments based on high-order fuzzy time series. Cybernetics and Systems: An International Journal(33), 1–16.
Chen, S.-M., & Chung, N.-Y. (2006). Forecasting enrollments of students by using fuzzy time series and genetic algorithms. Information and Management Sciences, 17(3), 1–17.
Chevaleyre, Y. et al. (2006). A short introduction to computational social choice. Publications of the Universiteit van Amsterdam (Netherlands).
Demuth, H., Beale, M., & Hagan, M. (2006). Neural network toolbox, for use with MATLAB. User’s guide. The MathWorks.
Gupta, M. M., Jin, L., & Homma, N. (2003). Static and dynamic neural networks, from fundamentals to advanced theory. IEEE Press.
Gurney, K. (1997). An Introduction to Neural Networks. CRC Press.
Huarng, K. (2001). Heuristic models of fuzzy time series for forecasting. Fuzzy Sets and Systems, 123, 369–386.
Li, S.-T., & Cheng, Y.-C. (2007). Deterministic fuzzy time series model for forecasting enrollments. Computer and Mathematics with Applications, 53(12), 1904–1920.
Li, Shen-Tun, & Cheng, Yi-Chung (2008). Deterministic fuzzy time series model for forecasting enrollments. Computer and Mathematics with Applications.
Neapolitan, R. E., & Naimipour, K. (2004). Foundations of algorithms using C++ pseudocode: Using C++ pseudocode. Jones & Bartlett Publishers.
Software, I.O. Self-Organizing Maps Overview (2004). Available from: .
Song, Q. B. S. C. (1993). Fuzzy time series and its models. Fuzzy Sets and Systems, 54, 269–277.
Song, Q., & Chissom, B. S. (1993a). Forecasting enrollments with fuzzy time series - part I. Fuzzy Sets and Systems(54), 1–9.
Song, Q., & Chissom, B. S. (1993b). Fuzzy time series and its models. Fuzzy Sets and Systems, 54, 269–277.
Song, Q., & Chissom, B. S. (1994). Forecasting enrollments with fuzzy time series - part II. Fuzzy Sets and Systems, 62, 1–8.
Sullivan, J., & Woodall, W. H. (1994). A comparison of fuzzy forecasting and Markov modeling. Fuzzy Sets and Systems, 64, 279–293.
Yu, H.-K. (2005). Weighted fuzzy time series models for TAIEX forecasting. Physica(349), 609–624.