IJCSNS International Journal of Computer Science and Network Security, VOL.7 No.12, December 2007
Using a Trainable Neural Network Ensemble for Trend Prediction of Tehran Stock Exchange

Hossein Nikoo^a, Mahdi Azarpeikan^b, Mohammad Reza Yousefi^(b,c), Reza Ebrahimpour^(b,c), and Abolfazl Shahrabadi^(a,d)

^a Faculty of Management, Qazvin Islamic Azad University, Qazvin, Iran
^b Department of Electrical Engineering, Shahid Rajaee University, Tehran, Iran
^c School of Cognitive Sciences, Institute for Studies on Theoretical Physics and Mathematics, Tehran, Iran
^d Department of Human Science, Zanjan Islamic Azad University, Zanjan, Iran

Summary

This paper presents a study of neural network ensembles for stock price trend prediction. The historical data used in this case study are from the Kharg petrochemical company, listed on the Tehran Stock Exchange (TSE). The company is a major producer of petrochemicals in Iran, including methanol, and its stock price is strongly dependent on the world methanol price. The results show how neural network ensembles, as a non-parametric combinatorial forecasting method, can outperform single Multilayer Perceptrons (MLPs). This study also demonstrates how the market can be beaten without the use of extensive market data or knowledge.

Key words: Trainable neural network ensemble, Stock price trend prediction, Tehran Stock Exchange, Iran

1. Introduction

The purpose of this study is to propose an ensemble neural network for stock price trend prediction and to develop a new method for evaluating the networks. The market is the resultant of supply and demand, probabilities, preferences, and the psychological, political, sociological, biological, and physiological aspects of human perception [1]. As mentioned in [2], "prices fully reflect all available information"; that is, the available information shows its effect by influencing stock prices. Considering the importance of stock prices, we propose a model, based on neural network ensembles, to trace the price trend. A review of the past three decades of literature on the Efficient Market Hypothesis shows that linear, or even quadratic, statistical approaches cannot provide a real perception of market efficiency. As an alternative, we base our model on neural networks, which are known for bringing new findings to old problems. Jensen [3] defines market efficiency as follows: "A market is efficient with respect to information set q_t if it is impossible to make economic profits by trading on the basis of information set q_t." If q_t is historical information, this definition describes the weak form of market

efficiency. If q_t includes not only historical but also currently published information, the definition describes the semi-strong form of market efficiency. And if q_t includes not only historical and currently published information but also insider information, the definition describes the strong form of market efficiency. Based on this definition, there are two distinct analytical methods for valuing a single stock. The first is technical analysis, based on trend movement, in which the constantly changing attitudes of investors in response to different forces are reflected in the stock trend. Using historical prices to predict future prices originates from this analytical method. The weak form of the market efficiency hypothesis contradicts technical analysis. The second is fundamental analysis, based on an in-depth analysis of a company's performance and profitability, which discovers the intrinsic value of a stock. Fundamental analysts believe in making profit from historical and currently published information about the company, market conditions, the related industry, and other environmental factors that can affect stock prices. This second method encompasses a wide range of company valuation models (such as regression and correlation analysis, trend analysis of financial statements, decision trees, dividend-based models, and stock valuation models such as the P/E model and the dividend discount model). The semi-strong form of market efficiency contradicts fundamental analysis. In this paper, we examine the weak form of market efficiency on a single stock of the Tehran Stock Exchange (TSE). In many real-world problems, such as those in finance, Artificial Neural Network (ANN) models have outperformed statistical multiple-regression techniques in data analysis tasks. Neural networks have an excellent capability to learn the input-output mapping for a given data set without any prior knowledge or assumptions. In these efforts, combining single neural networks to achieve higher accuracy is an important issue, which is referred to by different terms such as committee machines, neural network ensembles,


classifier fusion, and multiple classifier systems [4]. We expect to enhance the results by reaching different local minima on the error surface through the convergence of differently trained neural networks. Among the most popular neural network ensemble methods are Min, Max, Average, Majority Vote, and stacked generalization, to name a few. The stacked generalization method, first introduced in [5], is a layered architecture in which the classifiers at level 0 receive the original data as their input. Each successive layer receives the predictions of the layer immediately preceding it as input, and finally a single classifier at the top level outputs the final prediction. Stacked generalization attempts to minimize the generalization error by using classifiers at higher layers to learn the types of errors made by the classifiers immediately below them. As mentioned before, using historical prices to predict future trends originates from technical analysis. A wide range of research uses this factor as an input to neural networks [6-8, 10-17, 19-20]. Moreover, a financial report from the company is available at most every three months, which is not adequate for short-term prediction. In spite of this, because of the high degree of uncertainty in the TSE, most investors are speculators who try to profit from short-term market fluctuations. This short-term character of trading motivates the development of methods that forecast short-term price movements. In this paper, we try a new methodology and a different network topology to enhance previous methods presented in the literature. Although several works in the literature utilize MLPs and neural network ensembles to predict stock price trends and stock market indices, this is, to the best of our knowledge, the first time that a trainable combining methodology (stacked generalization) is applied to stock price trend prediction. In order to improve the prediction results, we apply a rejection criterion, consisting of optimum threshold values, to the output nodes.
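As an illustration of this rejection criterion, the sketch below shows one plausible reading, in which a prediction is accepted only when the winning output node exceeds a per-node threshold tuned on validation data; the function name and threshold values are hypothetical, not taken from the paper.

```python
import numpy as np

def predict_with_rejection(outputs, thresholds):
    """Return the winning class index, or None to reject.

    outputs    -- 1-D array of the network's output-node activations
    thresholds -- hypothetical per-node cutoffs, tuned on validation data
    """
    winner = int(np.argmax(outputs))
    # Accept the prediction only if the winning node clears its threshold.
    if outputs[winner] >= thresholds[winner]:
        return winner
    return None  # too uncertain: reject rather than mispredict

# Hypothetical usage: three trend classes (down, steady, up).
print(predict_with_rejection(np.array([0.2, 0.3, 0.9]),
                             np.array([0.6, 0.6, 0.6])))  # accepted -> 2
print(predict_with_rejection(np.array([0.4, 0.5, 0.55]),
                             np.array([0.6, 0.6, 0.6])))  # rejected -> None
```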

The rest of this paper is organized as follows: Section 2 describes the neural network methodology, Section 3 provides a brief overview of ensembles, Section 4 presents the experimental results and discussion, and Section 5 summarizes and concludes the paper.

2. Neural network methodology

There is a vast literature on the application of back-propagation (BP) neural networks to stock market analysis [6-8, 10-17, 19-20]. It is assumed that a simple MLP neural network with a single hidden layer and an output layer can solve complex non-linear problems. We use a non-linear hyperbolic tangent activation function for the hidden layer and a linear activation function for the output layer. A major advantage of MLPs is their ability to represent non-linear decision surfaces. Hereafter, when referring to MLPs we mean feed-forward networks trained with the back-propagation algorithm. A typical topology of a fully connected MLP is shown in figure 1.

Fig.1. Multilayer Perceptron (MLP) architecture

The back-propagation algorithm is a variation of the Delta rule. While the inputs are fed forward through the ANN, the 'back' in back-propagation refers to the direction in which the error is transmitted. A description of BP can be found in [22]. In table 1, we show the basic steps of the stochastic gradient descent version of the BP algorithm [23]. Here we introduce the $\delta_h$ factor. Target values exist only for the output units, not for the outputs of the hidden layers, so instead of calculating $(t_k - o_k)$ for a hidden unit $h$, we calculate the weighted sum of the error terms $\delta_k$ of the output units connected to it:

$$\delta_h = o_h (1 - o_h) \sum_{k \in \text{outputs}} w_{kh} \delta_k \qquad (1)$$

The total number of these error terms in a fully connected feed-forward network equals the number of output units (one for each output). To put it plainly, each weight $w_{kh}$ in (1) gives the degree of responsibility of hidden unit $h$ for the error in output $k$. The following algorithm can be converted to the standard (batch) gradient descent version of BP if the error term becomes:

$$\delta_k = \sum_{n \in \text{patterns}} o_{nk} (1 - o_{nk}) (t_{nk} - o_{nk}) \qquad (2)$$

where $n$ indexes the training patterns and $k$ is the number of the output unit. Usually $\delta_k$ is divided by the total number of training patterns in order to constrain the weight update to the mean of the updates caused by each training pattern.

Table 1: Stochastic gradient descent version of the BP algorithm

• Initialize all network weights to small random numbers.
• Until the termination condition (discussed later) is met, do:
  For each training example:
    Propagate the input forward through the network and compute the observed outputs.
    Propagate the errors backward as follows:
      For each output unit $k$, calculate its error term $\delta_k = o_k (1 - o_k)(t_k - o_k)$.
      For each hidden unit $h$, calculate its error term $\delta_h = o_h (1 - o_h) \sum_{k \in \text{outputs}} w_{kh} \delta_k$.
    Finally, update each weight: $w_{ji} \leftarrow w_{ji} + \Delta w_{ji}$, where $\Delta w_{ji} = \eta \, \delta_j \, x_{ji}$.
(The subscript $ji$ denotes the weight from unit $i$ to unit $j$.)
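As a concrete illustration of Table 1, here is a minimal sketch of one stochastic-gradient epoch of BP for a single-hidden-layer network. It assumes sigmoid units throughout, so the $o(1-o)$ derivative terms in the table apply verbatim; the networks used in this paper have tanh hidden units and linear outputs, for which those terms would differ. All function and variable names are ours, not from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_epoch(X, T, W_hid, W_out, eta=0.1):
    """One stochastic gradient descent epoch of the BP algorithm in Table 1.

    X     -- (n_patterns, n_inputs) training inputs
    T     -- (n_patterns, n_outputs) target outputs
    W_hid -- (n_hidden, n_inputs) input-to-hidden weights
    W_out -- (n_outputs, n_hidden) hidden-to-output weights
    """
    for x, t in zip(X, T):
        # Forward pass: propagate the input and compute the observed outputs.
        o_h = sigmoid(W_hid @ x)
        o_k = sigmoid(W_out @ o_h)
        # Backward pass: error term for each output unit k.
        delta_k = o_k * (1.0 - o_k) * (t - o_k)
        # Error term for each hidden unit h, per equation (1).
        delta_h = o_h * (1.0 - o_h) * (W_out.T @ delta_k)
        # Weight updates: w_ji <- w_ji + eta * delta_j * x_ji.
        W_out += eta * np.outer(delta_k, o_h)
        W_hid += eta * np.outer(delta_h, x)
    return W_hid, W_out

# Hypothetical usage on a tiny two-pattern problem, just to exercise the routine.
rng = np.random.default_rng(0)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
T = np.array([[1.0], [0.0]])
W_hid = rng.normal(scale=0.5, size=(3, 2))
W_out = rng.normal(scale=0.5, size=(1, 3))
for _ in range(2000):  # "until the termination condition is met"
    W_hid, W_out = bp_epoch(X, T, W_hid, W_out, eta=0.5)
print(sigmoid(W_out @ sigmoid(W_hid @ X.T)))  # predictions for the two patterns
```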

3. Neural network ensembles

From a learning point of view, neural network ensembles can be coarsely divided into two categories: trainable and non-trainable ensembles. In non-trainable ensembles, the experts' outputs are combined with a fixed rule, such as Max, Min, Average, or Majority Vote. Trainable ensembles contain at least two levels of trainable networks. The experts' outputs of the first level are combined by the second-level network, which is trained on the outputs of the first level. In effect, the second-level network, the combiner, learns to map the outputs of the first-level experts to the target data. Although most combining approaches are to a large extent heuristic, there are some theoretical frameworks for neural network ensembles (see Ref. [24]). In this paper, we base our model on stacked generalization, which belongs to the category of trainable ensembles, and compare its performance with four of the most popular combining methods of non-trainable ensembles (namely the minimum, maximum, majority vote, and average methods). These methods are briefly described in the following.


Minimum rule: the final decision is determined by the output node with the maximum value among the minimums of the experts' outputs.
Maximum rule: the final decision is determined by the output node with the maximum value among the maximums of the experts' outputs.
Majority vote method: the final output is the output node that has been selected by the majority of the experts.
Average method: the final decision is made by averaging the experts' outputs.
Stacked generalization method: the general framework of this method consists of two layers, each formed by one or more MLP networks. The networks of the first layer are trained with the input data and the target output. The predictions of the first-level models, along with the corresponding target class of the original input data, are then used to train the network of the second layer. Feeding information from one generalizing set to another before forming the final output is the central concept of the stacked generalization scheme. The basic uniqueness of this method lies in the multiple partitioning of the original learning set that produces the information fed into the generalizing network. The stacked generalization scheme can be regarded as a more sophisticated version of cross-validation and an experimentally effective method for improving the generalization ability of ANN models over single MLPs [25]. The general framework of a neural network ensemble based on stacked generalization is shown in figure 2. As shown in Fig. 2, a set of K "level-0" networks is arranged as the first layer, and their outputs are combined using a "level-1" network. First, the level-0 networks are trained using the input data and the target output. The outputs of the first layer, with the corresponding target class, are then used to train the level-1 network.

Fig.2. Block diagram of a multiple neural network system based on stacked generalization
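To make the combining rules above concrete, the following is a minimal sketch, assuming the experts' outputs are stacked into an (n_experts × n_classes) matrix, of how the four fixed rules could be computed and how a stacked-generalization combiner is trained on level-0 predictions. The function names and the generic fit-style combiner interface are our illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def combine_fixed(expert_outputs, rule):
    """Fixed (non-trainable) combining rules.

    expert_outputs -- (n_experts, n_classes) array of output-node activations
    Returns the index of the winning class.
    """
    if rule == "min":      # max over classes of the per-class minimum
        return int(np.argmax(expert_outputs.min(axis=0)))
    if rule == "max":      # max over classes of the per-class maximum
        return int(np.argmax(expert_outputs.max(axis=0)))
    if rule == "average":  # max over classes of the per-class mean
        return int(np.argmax(expert_outputs.mean(axis=0)))
    if rule == "vote":     # class chosen by the most experts
        votes = np.argmax(expert_outputs, axis=1)
        return int(np.bincount(votes).argmax())
    raise ValueError(rule)

def stack_train(level0_preds, targets, level1_model):
    """Stacked generalization: train the level-1 combiner on level-0 outputs.

    level0_preds -- (n_samples, n_experts * n_classes) concatenated predictions
    level1_model -- any trainable model exposing a fit(X, y) method; in the
                    paper's setting this would be the level-1 MLP
    """
    level1_model.fit(level0_preds, targets)
    return level1_model

# Hypothetical usage: three experts scoring three trend classes.
outs = np.array([[0.1, 0.7, 0.2],
                 [0.3, 0.4, 0.3],
                 [0.6, 0.2, 0.2]])
for rule in ("min", "max", "average", "vote"):
    print(rule, combine_fixed(outs, rule))
```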


4. Experimental results and discussion

Kharg petrochemical company, located on Kharg Island (Persian Gulf, Iran), is one of the major producers of chemical products. Kharg Co.'s products consist of methanol (700,000 tonnes/year), propane (114,000 tonnes/year), butane (188,000 tonnes/year), naphtha (C5+; 86,000 tonnes/year), and sulphide (185,000 tonnes/year). The company's stock has an 8-year history, which is not sufficient data for forecasting. However, because its products are exported, they are strongly dependent on world prices, and as a result, the company's stock price is volatile in both the short and the long term. Another point to be considered is the concept of market efficiency. In the TSE, no evidence of market efficiency, even in the weak form, has been found; therefore, we test it again in our work.

$$P_2 = P_1 + \left[ (P_2 - P_1) \cdot \frac{N}{M} \right] \quad \text{if } \frac{N}{M} \geq 1 \qquad (3)$$