Exception Mining on Multiple Time Series in Stock Market

2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology

Chao Luo, Yanchang Zhao, Longbing Cao, Yuming Ou, and Chengqi Zhang
Faculty of Engineering & Information Technology
University of Technology, Sydney
{chaoluo,yczhao,lbcao,yuming,chengqi}@it.uts.edu.au

Abstract

refers to trades based on non-public information by insiders, such as officers, directors, employees and major shareholders [4, 3], whereas market manipulation refers to actions or trades which attempt to interfere with the free and fair operation of the market and create artificial, false or misleading appearances of a stock [5]. These anomalies severely impair the stock market and cause losses for other investors.

Stock market surveillance aims to identify the above illegal behaviors and maintain the integrity of the stock market [2]. The process of stock market surveillance starts with alerts. For example, when the price of a security rises above a predefined price threshold, an alert is triggered immediately. The surveillance staff then analyze the generated alerts and decide whether to conduct further investigations on them.

The technologies for identifying anomalies in the stock market have been developed for decades, from simple rule-based approaches to advanced data mining technologies [2]. In the early days, alerts were generated by simple rules. Later, statistical techniques, such as the mean and the deviation of the stock price, were used to improve the accuracy of identification. Today, more advanced information technologies are used in stock exchanges, such as outlier mining [7] and clustering [13]. Outlier mining separates outliers from a large amount of normal data [9]. Outliers are defined as the data objects which obviously differ from the rest of the data. Traditionally, outlier mining technologies are classified into four categories: the statistical approach, the distance-based approach, the deviation-based approach and the density-based approach [9]. A successful application of outlier mining in stock market surveillance is the variance-based outlier mining model (VOMM) proposed by Qi and Wang [14]. In their research, they built principal curves to describe stock prices and ranked the most deviated price movements as outliers. There is other research on stock market surveillance using clustering technologies [13], correlation analysis [15] and so on. Most of these studies examine the movements of a single measure over

This paper presents our research on exception mining on multiple time series data, which aims to assist stock market surveillance by identifying market anomalies. Traditional technologies for stock market surveillance have shown their limitations in handling large amounts of complicated stock market data. In our research, Outlier Mining on Multiple time series (OMM) is proposed to improve the effectiveness of exception detection for stock market surveillance. The idea of our research is presented, the challenges of the research are analyzed, and potential research directions are summarized.

1. Motivation

The identification of exceptions in the stock market plays an important role in stock market surveillance [2, 12]. Effective and efficient identification helps to maintain a fair and efficient trading platform and avoids the waste of time and human resources caused by false alerts [2]. Therefore, we study how to improve the accuracy of stock market surveillance by applying exception mining technologies to multiple time series in the stock market. Exception mining identifies outliers, exceptional events or exceptional patterns in a large amount of data. Multiple time series data refers to data with two or more measures (e.g., price, volume and index) in addition to the time dimension. We design effective methods for mining exceptions from multiple time series and apply them in stock market surveillance. Domain knowledge of the stock market is used to design the exception mining models and to evaluate them.

2. Background

There are two key anomalies in stock market: insider trading and market manipulation [1, 11]. Insider trading

978-0-7695-3496-1/08 $25.00 © 2008 IEEE DOI 10.1109/WIIAT.2008.302


time, instead of multiple measures.

3. Problems

Stock markets change all the time, so it is important to measure a stock market with appropriate data [2, 10]. Stock data are normally categorized into inter-day data and intra-day data [5]. Inter-day data contain the daily information of a stock, such as the daily closing price, the daily highest price, the daily lowest price, and the daily volume or daily trade amount, whereas intra-day data contain the detailed trade information, such as the order time, order category, trade price, trade volume and so on [12]. Intra-day data can support more accurate identification than inter-day data, but they are more complicated to process. Data selection is therefore a basic problem of stock market surveillance.

Multiple time series provide more information than a single time series and can be used to generate more accurate results. However, combining multiple time series effectively is a tough task. Combining the data by simple calculation does not work well, because different data have different meanings and units of measurement. Part of our research is to find appropriate ways of combining multiple time series.

The definition of an exception in the stock market is another problem. Previous research has shown that outliers in stock price can be treated as exceptions. However, it is believed that there are more complicated exceptions besides outliers in stock price [2].

Identifying the patterns efficiently is also a challenging problem in stock markets [2, 12]. With the development of the stock market, the size of transaction data increases dramatically. It is imperative to analyze the transactions in limited time and with limited computational resources. Online real-time detection of the patterns remains an open problem.
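The relation between the two data levels can be illustrated with a minimal sketch that rolls intra-day trades up into the inter-day measures named above (closing, highest and lowest price, and daily volume). The record layout `(day, time, price, volume)` and the function name `to_interday` are assumptions made for illustration; they do not come from any exchange data format, and trades are assumed to be listed in chronological order.

```python
def to_interday(trades):
    """Aggregate intra-day trades into one inter-day record per trading day.

    `trades` is an iterable of (day, time, price, volume) tuples, assumed to
    be in chronological order (a hypothetical layout for illustration).
    """
    daily = {}
    for day, _time, price, volume in trades:
        bar = daily.get(day)
        if bar is None:
            # First trade of the day initializes every measure.
            daily[day] = {"close": price, "high": price,
                          "low": price, "volume": volume}
        else:
            bar["close"] = price                  # last trade seen so far
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["volume"] += volume
    return daily


# Made-up trades, not real exchange data.
example_trades = [
    ("2004-06-01", "09:30", 10.0, 100),
    ("2004-06-01", "14:59", 10.5, 200),
    ("2004-06-01", "15:00", 10.2, 50),
    ("2004-06-02", "09:30", 10.3, 10),
]
interday = to_interday(example_trades)
```

The aggregation discards order time and order category, which is one way to see why inter-day data is easier to process but less detailed than intra-day data.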

Figure 1. Process of V-BOMM and P-BOMM.

price return, daily price range and daily trade amount. The daily price return and daily price range are calculated with the following financial formulas:

\[
\mathit{DailyPriceReturn} = \frac{\mathit{Price}_1 - \mathit{Price}_2}{\mathit{Price}_2}, \tag{1}
\]

where $\mathit{Price}_1$ stands for the current daily closing price and $\mathit{Price}_2$ stands for the previous closing price, and

\[
\mathit{DailyPriceRange} = \mathit{Price}_H - \mathit{Price}_L, \tag{2}
\]

where $\mathit{Price}_H$ stands for the daily highest price and $\mathit{Price}_L$ stands for the daily lowest price. The first step is to carry out outlier mining on the three individual measures to generate three groups of outliers. Then the three generated groups of outliers are combined by majority voting to form a final list of outliers. As a result, the records which occur in two or three of the generated groups are output as final outliers. The P-BOMM combines multiple time series based on probabilities. The multiple time series are again daily price return, daily price range and daily trade amount. The first step of P-BOMM is to generate three groups of outliers on the three individual measures, together with their probabilities. A quantitative measurement of outliers is defined as

4. Approach

In our research, we studied approaches to deal with the problems mentioned above. The idea is to detect outliers on multiple time series. The data is at the inter-day level and the identification process is divided into two main steps, shown in Figure 1. The first step is to generate a group of outliers for each individual measure by using the principal curve algorithm. The second step is to generate the final outliers by combining the groups of outliers generated on the individual measures. We propose a Voting-Based Outlier Mining on Multiple time series (V-BOMM) and a Probability-Based Outlier Mining on Multiple time series (P-BOMM). The V-BOMM combines multiple time series based on majority voting. The multiple time series data are daily

\[
R = \frac{HV - AV}{HV - LV}, \tag{3}
\]

where HV stands for the value of the test sample, AV stands for the average value of all samples which are less than the test sample, and LV stands for the lowest value. The larger the measurement R is, the more likely the sample is an outlier. Then a ranked list of outliers is generated by maximizing the probabilities of the generated outliers. Outliers with higher probabilities are regarded as the more exceptional ones in the stock market.
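The two daily measures of Equations (1) and (2) and the majority-voting step of V-BOMM can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the `min_votes` parameter are assumptions, and the per-measure outlier groups are supplied directly as sets of trading-day indices because the principal curve detector itself is not reproduced here.

```python
def daily_price_return(close_today, close_prev):
    """Equation (1): relative change of the closing price."""
    return (close_today - close_prev) / close_prev


def daily_price_range(high, low):
    """Equation (2): daily highest price minus daily lowest price."""
    return high - low


def majority_vote(*outlier_groups, min_votes=2):
    """Keep the trading days flagged in at least `min_votes` groups.

    Each group is an iterable of trading-day indices produced by outlier
    mining on one measure. With three groups and min_votes=2, this matches
    the rule "occurs in two or three generated groups".
    """
    votes = {}
    for group in outlier_groups:
        for day in set(group):          # a group votes at most once per day
            votes[day] = votes.get(day, 0) + 1
    return sorted(day for day, n in votes.items() if n >= min_votes)


# Made-up outlier groups for price return, price range and trade amount.
by_return = {1, 2, 3}
by_range = {2, 3, 4}
by_amount = {3, 5}
final_outliers = majority_vote(by_return, by_range, by_amount)
```

In this example, days 2 and 3 are kept because at least two measures flag them, while days 1, 4 and 5 are dropped because only one measure does.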


5. Experimental Results

probability of being an outlier. The points marked with a star represent the real anomalies in the stock market and the points marked with a circle represent the identified outliers. The result indicates that most of the outliers with the highest probabilities are real exceptions in the stock market.

The experiments on real stock exchange data have shown that both V-BOMM and P-BOMM are feasible and perform better than the previous approaches. The experimental data are from the Shanghai Stock Exchange and cover the period from 1 June 2004 to 3 March 2006, which includes 425 trading days. In this period, stock market surveillance found 21 exceptional trading days, which are used as the benchmark of the experiments.

Figure 4. Evaluation of Different Approaches.

Figure 4 compares the different approaches in terms of accuracy, specificity, precision and recall. The bars from left to right in each group denote the results of VOMM on daily price return, VOMM on daily price range, VOMM on daily trade amount, V-BOMM and P-BOMM. Both V-BOMM and P-BOMM perform better than VOMM on the individual measures. These experimental results show that our technique outperforms the traditional method in stock market surveillance. It also provides a theoretical framework for outlier mining on multiple time series.
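For reference, the four measures can be computed from a set of flagged trading days and a benchmark set of exceptional days using their standard confusion-matrix definitions. The sketch below is illustrative only: the function name `evaluate` and the day indices are hypothetical, and the numbers are not the experimental data.

```python
def evaluate(flagged, benchmark, n_days):
    """Accuracy, specificity, precision and recall for a set of flagged days.

    `flagged` holds the days reported as outliers, `benchmark` the days known
    to be exceptional, and `n_days` the total number of trading days.
    """
    flagged, benchmark = set(flagged), set(benchmark)
    tp = len(flagged & benchmark)       # exceptional days that were flagged
    fp = len(flagged - benchmark)       # normal days that were flagged
    fn = len(benchmark - flagged)       # exceptional days that were missed
    tn = n_days - tp - fp - fn          # normal days left unflagged
    return {
        "accuracy": (tp + tn) / n_days,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up example: 10 trading days, 3 benchmark days, 4 flagged days.
metrics = evaluate(flagged={1, 2, 3, 4}, benchmark={3, 4, 5}, n_days=10)
```

Because exceptional days are rare (21 of 425 in the benchmark above), accuracy and specificity are dominated by the normal days, so precision and recall are usually the more informative of the four measures.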

Figure 2. The Result of Outlier Mining on Price Return.

Figure 2 shows the result of outlier mining on an individual measure. The horizontal axis represents the trading day of one stock, and the vertical axis represents the daily price return. The points marked with a star represent the real anomalies in the stock market, and the points marked with a circle represent the outliers identified by the principal curve algorithm. The curve passing through the middle of the data is the “principal curve”, which describes the price movement trend, and the outliers are calculated based on the distance between the data and the principal curve [14]. Figure 3

6. Future Works

The exceptions identified by V-BOMM and P-BOMM are only outliers which differ from the rest of the data. However, there are more complicated exception patterns hidden in stock markets [9, 11]. For example, some market manipulators may place a large number of small orders to influence the stock market over a long time. Even though each individual transaction may look normal, these orders together constitute typical market manipulation behavior. Such exceptions are more difficult to identify. Frequency-based analysis may be a possible solution. A promising approach is research on market microstructure. Market microstructure theory is a branch of finance theory about how specific trading mechanisms affect the price formation process. It studies the costs, risks and asymmetric information in stock markets [1]. More specifically, it measures the stock market with bid-ask spread, depth, liquidity and volatility [8]. There are many interesting findings on market microstructure in financial

Figure 3. The Result of P-BOMM. (Legend: real alert points; outliers ranked with probabilities.)

shows the result of P-BOMM. The horizontal axis represents the trading day and the vertical axis represents the


Acknowledgements

research [1, 4, 3], which are valuable references for the identification of exceptions in the stock market. For example, some financial studies have examined the impact of insider trading on price, bid-ask spread, depth and volatility [1, 4, 3]. Therefore, it is valuable to identify insider trading by analyzing the movements of these measures.

Another potential approach to identifying complicated exception patterns is to use semi-supervised classification technologies. The task of time series classification is to map each time series to one of the predefined classes. However, few exceptions are actually identified in stock markets, so labeled examples are rare while unlabeled data is abundant. For example, only about 30 insider trading cases were found in the Hong Kong Stock Exchange from 1997 to 2007, and it is almost impossible to train a good classifier on so few identified cases. An alternative is to use semi-supervised classification to construct accurate classifiers from few labeled examples.

Shape-based analysis is another possible way of identifying exceptional patterns [6]. It is important to define the movement pattern based on the shape of the time series data. For example, a time series may increase, decrease and then increase again, which forms the shape of a wave. This approach is based on the assumption that there exist unique time series shapes corresponding to some exceptions in the stock market. First, the frequent shapes on multiple time series are identified, and then the exceptional shapes are found. However, it is challenging to define the shapes of time series movement and the similarity between time series. Another challenge is how to split the data into time slices in order to compare time series.

Statistical analysis can also be used to identify exception patterns on multiple time series. Any combination of counts, frequencies or distributions of the data may be a potential way of identifying the exceptions. It is also challenging to define the statistical measures on proper time slices. For example, marking the close is a typical form of manipulation in most exchanges. The manipulator normally places many small orders just before the close of the market in order to influence the closing price. Therefore, a possible method of identifying this type of manipulation is to analyze the combination of order frequency, price change and time slice with statistical methods.

In conclusion, exception mining on multiple time series in the stock market is a challenging and promising research topic. The key issues are the selection of multiple time series data, the combination of multiple time series, the definition of exception patterns and the approach to identifying exception patterns. Some potential technologies to tackle these problems are the application of market microstructure theory, semi-supervised classification, shape-based analysis and statistical analysis.
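As one illustration of a statistic defined on a time slice, the sketch below measures what share of a day's small orders falls in the last minutes before the close, which is the kind of frequency measure the marking-the-close example suggests. Everything here is an assumption made for illustration: the order layout `(minutes_before_close, size)`, the function name `close_order_share`, and the 10-minute window and 500-unit size thresholds are hypothetical and would need to be set with domain knowledge.

```python
def close_order_share(orders, window_minutes=10, small_size=500):
    """Share of small orders that were placed within the closing window.

    `orders` is a list of (minutes_before_close, size) tuples (a hypothetical
    layout). An order is "small" if its size is at most `small_size`, and
    "near the close" if it was placed at most `window_minutes` before it.
    """
    small = [o for o in orders if o[1] <= small_size]
    if not small:
        return 0.0                      # no small orders on this day
    near_close = [o for o in small if o[0] <= window_minutes]
    return len(near_close) / len(small)


# Made-up orders for one trading day.
orders_today = [(200, 100), (120, 5000), (8, 200), (5, 300), (1, 100)]
share = close_order_share(orders_today)
```

A day on which this share is unusually high and the closing price also moves noticeably would be a candidate for closer inspection; on its own the statistic is only a screening signal, not evidence of manipulation.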

This work was partly supported by the Australian Research Council (ARC) Linkage Project LP0775041 and Discovery Projects DP0667060 & DP0773412, and by the Early Career Researcher Grant from University of Technology, Sydney, Australia.

References

[1] F. Allen and G. Gorton. Stock price manipulation, market microstructure and asymmetric information. European Economic Review, pages 624–630, 1992.
[2] P. GoldSchmidt. Alcod idss: Assisting the Australian stock market surveillance teams review process. Applied Artificial Intelligence, pages 625–641, 1996.
[3] L. Cheng, M. Firth, T. Leung, and O. Rui. The effects of insider trading on liquidity. Pacific-Basin Finance Journal, pages 467–483, 2006.
[4] B. Cornell and B. Sirri. The reaction of investors and stock prices to insider trading. Journal of Finance, pages 1031–1059, 1992.
[5] K. Felixson and A. Pelli. Day end returns: Stock price manipulation. Journal of Multinational Financial Management, pages 95–127, 1999.
[6] T. Fu, F. Chung, R. Luk, and C. Ng. Stock time series pattern matching: Template-based vs. rule-based approaches. Artificial Intelligence, pages 347–364, 2007.
[7] A. Ghoting, M. E. Otey, and S. Parthasarathy. Loaded: Link-based outlier and anomaly detection in evolving data sets. Proceeding of the Fourth IEEE International Conference on Data Mining, 2004.
[8] V. P. P. Gopikrishnan and H. E. Stanley. Quantifying fluctuations in market liquidity: Analysis of the bid-ask spread. The American Physical Society, pages 1–7, 2005.
[9] J. Han and M. Kamber. Data Mining: concepts and techniques. Morgan Kaufmann Publishers, San Francisco, California, USA, 2001.
[10] H. C. Lucas. Market expert surveillance system. Business Computer, pages 28–34, 1993.
[11] M. Minenna. Insider trading abnormal return and preferential information: Supervising through a probabilistic model. Journal of Banking and Finance, pages 59–86, 2003.
[12] J. Neville, O. Simsek, D. Jensen, J. Komoroske, and a. H. G. K. Palmer. Using relational knowledge discovery to prevent securities fraud. In the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pages 449–458, Chicago, Illinois, USA, August 2005. ACM.
[13] G. K. Palshikar and M. M. Apte. Collusion set detection using graph clustering. Data Mining and Knowledge Discovery, 16:135–164, 2008.
[14] H. Qi and J. Wang. A model for mining outliers from complex data sets. In the 2004 ACM symposium on Applied computing, pages 595–599. ACM, March 2004.
[15] M. Vlachos, K. Wu, S. Chen, and P. Yu. Correlating burst events on streaming stock market data. Data Mining and Knowledge Discovery, 16:109–133, 2008.
