Financial Time Series Prediction Using a Support Vector Regression Network

Boyang LI, Jinglu HU, and Kotaro HIRASAWA

Abstract— This paper presents a novel support vector regression (SVR) network for financial time series prediction. The SVR network consists of two layers of SVRs: a transformation layer and a prediction layer. The SVRs in the transformation layer form a modular network; but unlike conventional modular networks, the partition of the SVR modular network is based on the output domain, which has a much smaller dimension. The transformed outputs of the transformation layer are then used as the inputs of the SVR in the prediction layer. The whole SVR network gives an online prediction of financial time series. Simulation results on the prediction of the currency exchange rate between the US dollar and the Japanese Yen show the feasibility and effectiveness of the proposed method.

I. INTRODUCTION

Financial forecasting is a typical example of a time series analysis problem, one that is challenging due to high noise, non-stationarity, and non-linearity [1]. The prediction of future events from noisy time series data is commonly done using various forms of statistical models. Support vector machines (SVMs) are closely related to statistical models and have proved useful in a number of regression applications [2]. An SVM used as a regression predictor is called a support vector regression (SVR). However, there are still some fundamental limitations and inherent difficulties when using SVRs to process high-noise, non-stationary signals.

The first problem is that learning in SVR is ill-posed, i.e., there are infinitely many models which fit the training data well, but few of these generalize well. In order to form a more accurate model, it is desirable to use as large a training set as possible. However, for highly non-stationary data, such as foreign currency exchange data, increasing the size of the training set brings in more data whose statistics are less relevant to the task.

The second problem is overfitting, which is a result of the high noise in the data. Random correlations between the inputs and outputs present great difficulty in modeling. The models typically do not explicitly address the temporal relationship of the inputs, e.g., they do not distinguish between those correlations which occur in temporal order and those which do not [3][4].

In addition, differing from other regression problems, in time series prediction, and especially in financial forecasting, the price is used not only as the input but also as the output [3].

Boyang LI, Jinglu HU, and Kotaro HIRASAWA are with the Graduate School of Information, Production and Systems, Waseda University, Hibikino 2-7, Wakamatsu-ku, Kitakyushu-shi, Fukuoka-ken, Japan (phone/fax: (+81)93-692-5271; email: [email protected], {jinglu, hirasawa}@waseda.jp)

As the price changes, the behavior of the price curve also changes. In other words, the current price of a stock or the current exchange rate of a currency is itself an important factor in the prediction. As a complex regression problem, financial forecasting has several different input variables, and it is hard to determine the relationship between the inputs and the target. A variable may be useful when the price is high, but in the low-price region it may be useless, acting merely as noise.

We therefore propose a new intelligent time series prediction method which addresses these difficulties. The method uses a network consisting of several SVRs to convert the original input into symbolic representations. In existing methods, the data are usually split into clusters by clustering or similar methods as preprocessing before regression [1][5][6]. In the proposed method, by contrast, we divide the training data into subsets based on the output. For example, we divide the data into 3 regions: a high-price region, a mid-price region and a low-price region. In each price region, we use an SVR to describe the characteristics of the time series and to select the useful input variables for that region. These 3 SVRs make up the transformation layer of the network, and their outputs are used as the inputs of the final predictor. In other words, the outputs of the transformation layer are the representations of the inputs for the different price regions, and the SVR used as the final predictor combines them to obtain the final result. We apply the whole SVR network to give an online prediction of currency exchange rates, and we find significant predictability in comprehensive experiments on foreign exchange rates.

This paper is organized as follows: Section 2 provides an overview of financial time series data prediction and an analysis of its properties. Section 3 introduces the idea and structure of the proposed model and its application to financial time series prediction.
Section 4 discusses the results, comparing the performance of the proposed model with that of a conventional SVR on different foreign exchange rate predictions. Section 5 presents conclusions.

II. FINANCIAL TIME SERIES DATA PREDICTION

A. Properties of financial time series

Financial time series prediction is a highly complicated task with the following properties [5]:
• Some researchers assert that financial time series follow a random-walk process, rendering prediction impossible from a theoretical point of view.
• Financial time series are always subject to macroeconomic conditions, regime shifting, macro-control and so on, i.e., the statistical properties of financial time series are different at different points in the time domain and in the price domain [7].
• Financial time series are usually very noisy; there is a large amount of random (unpredictable) day-to-day variation [8].

978-1-4244-1821-3/08/$25.00 ©2008 IEEE

The predictability of most common financial time series is a controversial issue and has been questioned within the scope of the efficient market hypothesis (EMH) [9]. Noise, however, remains a serious problem in financial time series prediction [10].

B. Input selection

Because financial time series are subject to many factors whose effectiveness is difficult to determine, how to select the input variables is also a problem. Several different types of data can be used as variables in financial time series prediction, but their effectiveness differs at different points in the time domain and the price domain. In our research, we want to predict the exchange rate between the US dollar and the Japanese Yen. To include more useful information, we choose two kinds of data as input variables: technical data and fundamental data.
• Technical data is what financial time series prediction usually refers to. This kind of data includes figures such as past prices.
• Fundamental data are data describing current economic activity in the world or in the country whose currency price is to be predicted. Fundamental data include information about the current market situation as well as macroeconomic parameters.

So we choose the past exchange rates between the US dollar and the Japanese Yen, the past exchange rates between the US dollar and the Euro, the NIKKEI 225 index and the price of oil as input variables [11].

III. SVR NETWORK BASED ON THE PRICE DOMAIN DIVISION

A. Price domain partition

As discussed in Section 2, financial time series are subject to macroeconomic conditions, regime shifting, macro-control, etc., and their statistical properties are different at different points in the time domain and the price domain. In other words, it is hard to know which input variable is useful at a certain time or in a certain price region, so it is difficult to build a single model that generalizes well over the whole price domain. It is natural to consider dividing the price domain into subregions and building a regression model for each price region. Some existing methods also divide the training data into subsets before the regression process, but they are all based on the input variables, e.g. through clustering. In contrast, our method divides the price domain into sub price regions, where the price is the output of the financial time series.

Consider a training data set with output $T$, whose maximum exchange rate is $T_{max}$ and whose minimum is $T_{min}$. If we want to partition the price domain into 3 regions, a simple way is to calculate two dividing values $d_1$ and $d_2$ as follows:

$$d_1 = T_{min} + (T_{max} - T_{min})/3, \qquad d_2 = T_{max} - (T_{max} - T_{min})/3 \qquad (1)$$

Then the training data is divided into the following 3 regions:

$$\begin{cases} \text{high-price region}, & T_{max} \ge T > d_2 \\ \text{mid-price region}, & d_2 \ge T > d_1 \\ \text{low-price region}, & d_1 \ge T \ge T_{min} \end{cases} \qquad (2)$$

[Fig. 1. Price domain partition: (a) target output; (b) high-price region; (c) mid-price region; (d) low-price region]

2008 International Joint Conference on Neural Networks (IJCNN 2008)
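As a concrete illustration, the dividing values of Eq. (1) and the region assignment of Eq. (2) take only a few lines (a minimal sketch; the function name `partition_by_price` and the toy rates are our own, not from the paper):

```python
import numpy as np

def partition_by_price(T):
    """Split sample indices into low/mid/high price regions per Eqs. (1)-(2)."""
    t_min, t_max = T.min(), T.max()
    d1 = t_min + (t_max - t_min) / 3.0
    d2 = t_max - (t_max - t_min) / 3.0
    high = np.where(T > d2)[0]                 # T_max >= T > d2
    mid = np.where((T > d1) & (T <= d2))[0]    # d2 >= T > d1
    low = np.where(T <= d1)[0]                 # d1 >= T >= T_min
    return low, mid, high

# toy example: hypothetical rates between roughly 100 and 130 yen per dollar
rates = np.array([101.0, 112.0, 125.0, 108.0, 129.0, 103.0])
low, mid, high = partition_by_price(rates)
```

Each index set then selects the training samples for one transformation-layer SVR.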

As shown in Fig. 1, the significant data in each price region are retained as the sub-target outputs f1(t), f2(t) and f3(t). These sub-target outputs are used to train the SVRs in the transformation layer.

B. Structure of SVR network

After partitioning the training data in the price domain according to Eq. (2), we use SVRs as the regression models in the 3 regions. SVR is the SVM used as a regression model; it was introduced by Vapnik and has attracted much research attention in recent years due to its demonstrated generalization performance over other techniques in many real world applications [12][13][14]. The main difference between the SVM and these techniques, including neural networks, is that it minimizes the structural risk instead of the empirical risk [6][15]. The principle is based on the fact that minimizing an upper bound on the generalization error, rather than minimizing the training error, is expected to perform better. The generalization error rate is bounded by the sum of the training error rate and a term that depends on the Vapnik-Chervonenkis (VC) dimension, which is a measure of the complexity of the hypothesis space. SVRs find a balance between the empirical error and the VC confidence interval [16]. SVRs are able to construct spline approximations of given data independently of the number of input dimensions, with only linear complexity during training, compared to the exponential complexity of conventional methods. In this paper, we choose the past currency exchange rates, the price of oil and the NIKKEI 225 as input variables, so the number of input dimensions is larger than in common cases that use only past prices. It is therefore suitable to choose SVR to deal with this high-dimensional input data.

We construct the whole SVR network as shown in Fig. 2. The network has 4 SVRs: 3 of them make up the transformation layer, which transforms the input into symbolic representations. These 3 representations, i.e., the outputs of the transformation layer, are used as the inputs of the SVR in the prediction layer. The training data for the SVRs in the transformation layer are the data in the 3 regions of Eq. (2). In this paper we are interested in using SVRs to describe the relationship between the input and output in different price regions, and we restrict our attention to networks with a single transformation layer.

For scalar foreign exchange prediction, this network has an n-dimensional input X(t) = (x1, x2, x3, ..., xn). Let f(t) denote the target output of the time series prediction, Fx(t) the exchange rate of the Euro against the US dollar, N(t) the value of the NIKKEI 225, and O(t) the price of oil. Using the past 5 days' data to predict the next day's price, the input can be written in detail as X(t) = (f(t), f(t−1), f(t−2), f(t−3), f(t−4), Fx(t), Fx(t−1), Fx(t−2), Fx(t−3), Fx(t−4), N(t), N(t−1), N(t−2), N(t−3), N(t−4), O(t), O(t−1), O(t−2), O(t−3), O(t−4)), and the input dimension is 20.
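The 20-dimensional input vector described above can be assembled mechanically from the four series (a sketch; `build_input` and the toy series are hypothetical, only the lag layout follows the paper):

```python
import numpy as np

def build_input(f, fx, n225, oil, t, lags=5):
    """Assemble X(t) from the past `lags` days of the USD/JPY rate f,
    the USD/EUR rate fx, the NIKKEI 225 n225, and the oil price oil,
    in the order (f, Fx, N, O) with lags t, t-1, ..., t-lags+1."""
    window = range(t, t - lags, -1)  # t, t-1, t-2, t-3, t-4
    return np.array([series[i] for series in (f, fx, n225, oil)
                     for i in window])

# hypothetical toy series, 10 trading days each
f = np.linspace(110.0, 119.0, 10)
fx = np.linspace(0.90, 0.99, 10)
n225 = np.linspace(15000.0, 15900.0, 10)
oil = np.linspace(60.0, 69.0, 10)
X_t = build_input(f, fx, n225, oil, t=9)
# X_t has 20 components: 5 lags of each of the 4 series
```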

[Fig. 2. Structure of SVR network: (a) SVR network training; (b) SVR network prediction]

As introduced in the previous subsection, the first step of the training process is the data partition. The output dimension of the data is only 1, much smaller than the input dimension, so a partition based on the output price is much easier than a clustering method based on the input. Using the training data corresponding to f1(t), f2(t) and f3(t) in the three price regions, we train the 3 SVR models in the transformation layer to minimize e1(t) = f1(t) − f̂1(t), e2(t) = f2(t) − f̂2(t) and e3(t) = f3(t) − f̂3(t), respectively. The output of the transformation layer and the final target f(t) in the training data are then used as the input and the desired output for the prediction layer; minimizing e(t) = f(t) − f̂(t), we construct SVR4. These 4 SVRs are then used as the prediction model on the test data. The SVR network can give an online prediction of the currency exchange rate; the flowchart of the time series prediction is shown in Fig. 3. Because the prediction is an online process, we update the training data and repeat all operations after each prediction [17].

[Fig. 3. Flowchart of time series prediction using SVR network]

C. Mathematical analysis of SVR network

As shown in Fig. 2, we use 4 SVRs in the proposed model. Although they are used for different price regions or different layers, they share the same structure and algorithm. The basic idea of SVR is to nonlinearly map the input data into a high dimensional feature space by means of a kernel function and then perform linear regression in the transformed space; the whole process results in nonlinear regression in the low-dimensional space [18].

Consider a data set $G = \{(X_i, y_i)\}_{i=1}^{N}$ of $N$ data points, where each input $X_i$ is mapped to the corresponding output $y_i$. In our research the data form a time series, so the input can be written as $X(t)$. Suppose the data set realizes some unknown function $f(t+1) = g(X) = y$; we need to determine a function $\hat{f}(t+1) = \hat{g}(X)$ that approximates $f(t+1)$, based on the knowledge of $G$. In SVR, the vector $X(t)$ is first mapped into a higher dimensional space $F$ via a nonlinear mapping, and linear regression is performed in that space. SVR thus approximates the function as

$$\hat{g}(X) = \sum_{i=1}^{D} \omega_i \Phi_i(X) + b, \quad \text{with } \Phi : \mathbb{R}^n \to F, \ \omega \in F \qquad (3)$$

where $\omega_i$ are the coefficients and $b$ is a threshold value. This approximation can be considered as a hyperplane in the $D$-dimensional feature space $F$ defined by the functions $\Phi(X)$, where the dimensionality can be very high, possibly infinite. Since $\Phi$ is fixed, $\omega$ can be determined from the data by minimizing the sum of the empirical risk and a complexity term, defined by the following risk function:

$$R = C \, \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{g}(X_i)|_{\epsilon} + \frac{1}{2}\|\omega\|^2 \qquad (4)$$

where $\epsilon$ is a parameter to be set a priori, and an error below $\epsilon$ is not penalized, according to the following error function:

$$|y_i - \hat{g}(X_i)|_{\epsilon} = \begin{cases} 0 & \text{if } |y_i - \hat{g}(X_i)| < \epsilon \\ |y_i - \hat{g}(X_i)| & \text{otherwise} \end{cases} \qquad (5)$$

The first term in Eq. (4) is the $\epsilon$-insensitive loss function and the second term is a measure of function flatness. Thus SVR performs linear regression in the high dimensional feature space using the $\epsilon$-insensitive loss, and at the same time tries to reduce model complexity by minimizing $\|\omega\|^2$. The constant $C > 0$ is a regularization constant determining the trade-off between the training error and model flatness. Introducing the slack variables $\xi$ and $\xi^*$, SVR is formulated as the following optimization problem:

$$\text{minimize} \quad \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*) \qquad (6)$$

$$\text{subject to} \quad \begin{cases} \hat{g}(X_i) + b - y_i \le \epsilon + \xi_i \\ y_i - \hat{g}(X_i) - b \le \epsilon + \xi_i^* \\ \xi_i, \xi_i^* \ge 0 \end{cases}$$

The above constrained optimization is a quadratic programming problem that can be solved by constructing a Lagrangian and transforming it into the dual problem; its solution is given as

$$\hat{f}(t+1) = \hat{g}(X) = \sum_{i=1}^{N} (\alpha_i^* - \alpha_i) K(X, X_i) + b \qquad (7)$$

where the Lagrange multipliers $\alpha_i$ and $\alpha_i^*$ are associated with each data point $X_i$ and are subject to the constraints $0 \le \alpha_i, \alpha_i^* \le C$ and $\sum_{i=1}^{N} (\alpha_i^* - \alpha_i) = 0$. Training points with nonzero Lagrange multipliers are called support vectors. The smaller the fraction of support vectors, the more general the solution; however, a large number of support vectors does not necessarily mean an overtrained solution. The kernel function $K(\cdot,\cdot)$ describes an inner product in the $D$-dimensional feature space and satisfies Mercer's condition:

$$K(X, X_i) = \sum_{j=1}^{D} \Phi_j(X) \Phi_j(X_i) \qquad (8)$$

The coefficients $\alpha, \alpha^*$ are obtained by maximizing the following quadratic form subject to the conditions stated above:

$$R(\alpha, \alpha^*) = -\epsilon \sum_{i=1}^{N} (\alpha_i^* + \alpha_i) + \sum_{i=1}^{N} y_i (\alpha_i^* - \alpha_i) - \frac{1}{2} \sum_{i,j=1}^{N} (\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j) K(X_i, X_j) \qquad (9)$$

Once the coefficients are determined, the regression estimate is given by Eq. (7). The threshold $b$ is computed from the constraints in Eq. (6), using the fact that the first constraint becomes an equality with $\xi_i = 0$ if $0 < \alpha_i < C$, and the second constraint becomes an equality with $\xi_i^* = 0$ if $0 < \alpha_i^* < C$.

In the SVR network, let $X^{hi}$, $X^{mid}$ and $X^{lw}$ denote the inputs for the high-price, mid-price and low-price regions; then for the SVRs in the transformation layer,
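The ε-insensitive error function of Eq. (5) is simple to state in code (a sketch; note that, as written in the paper, an error at or above ε is charged in full, rather than the more common |e| − ε form):

```python
import numpy as np

def eps_insensitive(y, y_hat, eps=0.001):
    """Eps-insensitive error of Eq. (5): deviations below eps cost nothing,
    all other deviations are charged at their full absolute value."""
    err = np.abs(y - y_hat)
    return np.where(err < eps, 0.0, err)

# three toy predictions: one inside the eps-tube, two outside it
residuals = eps_insensitive(np.array([1.000, 1.010, 0.980]),
                            np.array([1.0005, 1.000, 1.000]))
```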

Eq. (7) needs to be rewritten as follows:

$$\hat{f}_1(t+1) = \sum_{i=1}^{N} (\alpha_i^* - \alpha_i) K(X, X_i^{hi}) + b$$
$$\hat{f}_2(t+1) = \sum_{i=1}^{N} (\alpha_i^* - \alpha_i) K(X, X_i^{mid}) + b \qquad (10)$$
$$\hat{f}_3(t+1) = \sum_{i=1}^{N} (\alpha_i^* - \alpha_i) K(X, X_i^{lw}) + b$$

$\hat{f}_1(t+1)$, $\hat{f}_2(t+1)$ and $\hat{f}_3(t+1)$ are the outputs of the transformation layer and the inputs of the prediction layer. Let the input of the prediction layer be $Z = (\hat{f}_1(t+1), \hat{f}_2(t+1), \hat{f}_3(t+1))$; then the final solution of the SVR in the prediction layer is given as

$$\hat{f}(t+1) = \hat{g}(Z) = \sum_{i=1}^{N} (\alpha_i^* - \alpha_i) K(Z, Z_i) + b \qquad (11)$$

The generalization performance (i.e., the accuracy in predicting exchange rates in this study) also depends on the choice of kernel type [19]. Common kernel types are linear, polynomial and Gaussian kernels; this paper chooses the linear kernel to build the SVR models.

IV. SIMULATIONS AND RESULTS

The two major methodologies for financial forecasting are technical and fundamental analysis [20]. Technical analysis has drawn particular academic interest due to increasing evidence that markets are less efficient than was originally thought. In our study, we use past exchange rates, the NIKKEI 225 and the price of oil as input variables. In the simulation of predicting the next day's rate, the previous 5 days' exchange rates are used to build the SVR network. The indicators are f(t), f(t−1), f(t−2), f(t−3), f(t−4), Fx(t), Fx(t−1), Fx(t−2), Fx(t−3), Fx(t−4), N(t), N(t−1), N(t−2), N(t−3), N(t−4), O(t), O(t−1), O(t−2), O(t−3), O(t−4), namely, the past 5 days' exchange rates between the Japanese Yen and the US dollar, the past 5 days' rates between the US dollar and the Euro, the past 5 days' values of the NIKKEI 225 and the past 5 days' prices of oil. The predicted value f(t+1) is the next day's exchange rate. Using the same method, we also predict f(t+5) and f(t+10), i.e., next-week and next-two-weeks forecasting. In addition, the 5-day moving average (MA) of the Yen/Dollar exchange rate is also used.

A. Evaluation of the model

The forecasting performance of the above model is evaluated against 4 widely used statistical metrics, namely, Sum of Square Error (SSE), Mean Absolute Error (MAE), Correct Up trend (CP) and Correct Down trend (CD), defined in Table I. SSE and MAE measure the deviation between the actual and forecasted values; smaller values indicate higher forecasting accuracy. CP and CD measure the correctness of the predicted up and down trends, respectively. In general, it is desirable for a forecasting model to attain low SSE and MAE but high CP and CD. In practice, we may sometimes have a model that yields superior SSE and MAE but inferior CP and CD; some researchers have argued that directional change metrics are a better standard for time series prediction.

B. Simulation conditions

The foreign exchange rate of the Japanese Yen against the US dollar is used to test our model. The data cover the exchange rate from January 2000 to December 2006. In total, 1499 daily data points were considered, of which the first 500 were used as the initial training set; the remaining 999 were used to validate the model and update the training set. To build the SVR models we must select the kernel type and the values of the parameters C and ε. Based on comparisons with other kernels in simulations, the models were built with the simple linear kernel defined as K(Xi, Xj) = Xi · Xj, where · is the inner product operator. Because C has very little impact on the prediction performance of the SVR model over a wide range of values, we set C = 10. In time series prediction, the value of ε only slightly affects the accuracy and does not cause any drastic degradation in performance, so we keep ε fixed at 0.001, as is common.

C. Results and discussions

Table II shows the performance metrics for the exchange rate of the Japanese Yen against the US dollar and its MA data. A comparison with the conventional SVR model shows that the SVR network predicts the exchange rate better, in terms of both accuracy (SSE and MAE) and trend prediction (CP and CD): the SVR network produces lower SSE and MAE and higher CP and CD than the conventional SVR based model. Especially in long term prediction, the SVR network performs much better than the conventional SVR. Based on this property, we plan to build a long term trading system using this model in our future research.


[Fig. 4. Forecasting by conventional SVR and SVR network]

Figure 4 shows the actual and forecasted time series produced by the SVR network and the conventional SVR.


TABLE I. PERFORMANCE METRICS IN SIMULATIONS

SSE:  $SSE = \sum_k (y_k - \hat{y}_k)^2$

MAE:  $MAE = \frac{1}{N} \sum_k |y_k - \hat{y}_k|$

CP:   $CP = 100 \cdot \frac{\sum_k d_k}{\sum_k t_k}$, where $d_k = 1$ if $(\hat{y}_k - \hat{y}_{k-1}) > 0$ and $(y_k - y_{k-1})(\hat{y}_k - \hat{y}_{k-1}) \ge 0$, and $0$ otherwise; $t_k = 1$ if $(y_k - y_{k-1}) > 0$, and $0$ otherwise.

CD:   $CD = 100 \cdot \frac{\sum_k d_k}{\sum_k t_k}$, where $d_k = 1$ if $(\hat{y}_k - \hat{y}_{k-1}) < 0$ and $(y_k - y_{k-1})(\hat{y}_k - \hat{y}_{k-1}) \ge 0$, and $0$ otherwise; $t_k = 1$ if $(y_k - y_{k-1}) < 0$, and $0$ otherwise.
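The directional metrics CP and CD of Table I can be computed directly from their definitions (a sketch; `trend_metrics` and the toy series are our own):

```python
import numpy as np

def trend_metrics(y, y_hat):
    """Correct Up trend (CP) and Correct Down trend (CD) of Table I."""
    dy = np.diff(y)        # actual day-to-day changes
    dyh = np.diff(y_hat)   # predicted day-to-day changes
    agree = dy * dyh >= 0  # predicted direction does not contradict actual
    cp = 100.0 * np.sum((dyh > 0) & agree) / np.sum(dy > 0)
    cd = 100.0 * np.sum((dyh < 0) & agree) / np.sum(dy < 0)
    return cp, cd

# toy series: the last predicted move points up while the actual move is down
y = np.array([1.0, 2.0, 1.5, 1.8, 1.2])
y_hat = np.array([1.0, 1.9, 1.6, 1.7, 1.8])
cp, cd = trend_metrics(y, y_hat)
```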

TABLE II. PERFORMANCE METRICS OF THE CURRENCY EXCHANGE RATE AND ITS MA DATA

                                      Currency exchange rate (USD/JPY)   Moving average data of exchange rate
Future forecast          Criteria     SVR         SVR network            SVR         SVR network
next day forecast        SSE          47.3925     36.2098                42.5252     35.1087
(t+1)                    MAE          0.1672      0.1471                 0.1586      0.1451
                         CP           76.8116     77.8468                76.8116     77.2257
                         CD           79.4118     79.8039                78.4736     80.4305
next week forecast       SSE          60.5918     38.4611                56.3304     37.6336
(t+5)                    MAE          0.1877      0.1520                 0.1820      0.1498
                         CP           76.7635     79.6680                77.8468     79.5031
                         CD           78.3465     79.3307                77.9528     79.3307
next 2 weeks forecast    SSE          71.5295     39.7972                67.7718     40.3541
(t+10)                   MAE          0.2052      0.1544                 0.1998      0.1548
                         CP           77.4530     80.1670                77.0833     80.4167
                         CD           78.9370     79.9213                78.3465     80.1181

For example, in predicting the moving average exchange rate of the Japanese Yen between days 300 and 360, where a sudden sharp fall is followed by a steep rise, the SVR network model closely follows the actual rate whereas the conventional SVR suffers from gross deviation. The ability of the SVR network to follow the actual rate closely in such situations will also make a significant difference to the financial gain of a trading system.
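Under the stated simulation conditions (linear kernel, C = 10, ε = 0.001), the whole two-layer network can be sketched with an off-the-shelf SVR implementation (a non-authoritative sketch using scikit-learn's `SVR`; the function names and synthetic data are our own, and details such as the online retraining loop are omitted):

```python
import numpy as np
from sklearn.svm import SVR

def fit_svr_network(X, y):
    """Two-layer SVR network: three transformation-layer SVRs trained on the
    low/mid/high price regions of Eq. (2), plus one prediction-layer SVR
    (SVR4) trained on their outputs, per Eqs. (10)-(11)."""
    y_min, y_max = y.min(), y.max()
    d1 = y_min + (y_max - y_min) / 3.0
    d2 = y_max - (y_max - y_min) / 3.0
    regions = [y <= d1, (y > d1) & (y <= d2), y > d2]
    layer1 = []
    for mask in regions:
        m = SVR(kernel="linear", C=10.0, epsilon=0.001)
        m.fit(X[mask], y[mask])  # SVR1..SVR3, each on its own price region
        layer1.append(m)
    # 3-dimensional representation Z fed to the prediction layer
    Z = np.column_stack([m.predict(X) for m in layer1])
    layer2 = SVR(kernel="linear", C=10.0, epsilon=0.001).fit(Z, y)  # SVR4
    return layer1, layer2

def predict_svr_network(layer1, layer2, X_new):
    Z = np.column_stack([m.predict(X_new) for m in layer1])
    return layer2.predict(Z)

# toy demonstration with synthetic 20-dimensional inputs
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))
y = X @ rng.normal(size=20) * 0.1 + 110.0  # rates around 110 yen/dollar
l1, l2 = fit_svr_network(X, y)
pred = predict_svr_network(l1, l2, X[:5])
```

In an online setting, the paper refits all four SVRs after each prediction once the newly observed rate is appended to the training set.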


V. CONCLUSION AND DIRECTION FOR FUTURE RESEARCH

In this paper, we introduced a new regression model, the SVR network, for dealing with the high-noise, non-stationary foreign exchange prediction problem. Differing from existing methods, we use a price domain partition of the data as the preprocessing step of training. Different SVRs are used to describe the characteristics of the data and the relationship between the input variables and the sub-target in each price region. The outputs of these SVRs are used as the inputs of the prediction layer, which then produces the final result. Because the whole price domain is partitioned into sub-intervals that are dealt with separately, the problem caused by the unstable statistical properties of financial time series in the price domain can be solved to a certain extent. The method was tested on the foreign exchange rate data of the Japanese Yen against the US dollar, and the results show that the SVR network approach performs better than the conventional SVR model.

Some problems remain for future work. Since the price domain is simply partitioned into 3 parts, which may not be the best choice, an important problem is to determine a proper partition mechanism. In addition, more studies should be done with more irregularly sampled data. We also plan to design a long-term trading system based on the model proposed in this paper.

REFERENCES
[1] Neil F. Johnson, David Lamper, Paul Jefferies, Michael L. Hart and Sam Howison. "Application of multi-agent games to the prediction of financial time-series". Elsevier, May 2001.
[2] Klaus-Robert Müller, Alexander J. Smola, Gunnar Rätsch, Bernhard Schölkopf, Jens Kohlmorgen and Vladimir Vapnik. "Using Support Vector Machines for Time Series Prediction". 1999.
[3] M. A. H. Dempster and C. M. Jones. "Trading on the Edge: Neural, Genetic and Fuzzy Systems for Chaotic Financial Markets". New York: Wiley, 1994.
[4] JiFeng Huang, JiaJun Lin, XiaoFu He and Meng Dai. "The Algorithm for Detecting Hiding Information Based on SVM". In Advances in Neural Networks - ISNN 2004, 2004.
[5] C. Lee Giles, Steve Lawrence and Ah Chung Tsoi. "Noisy Time Series Prediction using a Recurrent Neural Network and Grammatical Inference". Machine Learning, Vol. 44, pp. 161-183, 2001.
[6] Rodrigo Fernandez. "Predicting Time Series with a Local Support Vector Regression Machine". In ACAI 99, 1999.
[7] Dave Touretzky and Kornel Laskowski. "Neural Networks for Time Series Prediction". In Artificial Neural Networks, 2006.
[8] Dimitri Pissarenko. "Neural Networks for Financial Time Series Prediction: Overview over Recent Research". 2001-2002.
[9] Yuehui Chen, Bo Yang and Jiwen Dong. "Time-series prediction using a local linear wavelet neural network". Neurocomputing, Vol. 69, pp. 449-465, 2006.
[10] Michael Small and C. K. Tse. "Minimum description length neural networks for time series prediction". Physical Review, 2002.
[11] Justin Wolfers and Eric Zitzewitz. "Prediction Markets". Journal of Economic Perspectives, Vol. 18, 2004.
[12] N. Cristianini and J. Shawe-Taylor. "An Introduction to Support Vector Machines". Cambridge, UK: Cambridge University Press, 2000.
[13] Steve R. Gunn. "Support Vector Machines for Classification and Regression". Technical Report, Faculty of Engineering, Science and Mathematics, School of Electronics and Computer Science, 10 May 1998.
[14] Alexander Smola. "Regression Estimation with Support Vector Learning Machines". Technische Universität München, 1996.
[15] Haizhon Li and Robert Kozma. "A Dynamic Neural Network Method for Time Series Prediction Using the KIII Model". 2003.
[16] Johan Suykens. "Least Squares Support Vector Machines". Tutorial, IJCNN, 2003.
[17] M. A. H. Dempster and C. M. Jones. "A real-time adaptive trading system using genetic programming". Quantitative Finance, Vol. 1, pp. 397-413, 2001.
[18] Mordecai Avriel. "Nonlinear Programming: Analysis and Methods". Dover Publishing, 2003.
[19] O. Chapelle and V. Vapnik. "Model selection for Support Vector Machines". In S. Solla, T. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems 12, Cambridge, MA: MIT Press, 2000.
[20] Larry Williams. "Long-Term Secrets to Short-Term Trading". John Wiley & Sons, April 1999.
