International Journal of Computational Intelligence Research. ISSN 0973-1873 Vol.3, No. 1 (2007), pp. 6-10 © Research India Publications http://www.ijcir.info

Intrusion Detection by Backpropagation Neural Networks with Sample-Query and Attribute-Query

Ray-I Chang, Liang-Bin Lai, Wen-De Su*, Jen-Chieh Wang*, Jen-Shiang Kouh
Department of Engineering Science and Ocean Engineering, National Taiwan University
No. 1, Sec. 4, Roosevelt Road, Taipei 106, Taiwan
{rayichang, d93525009, kouhjsh}@ntu.edu.tw

*Information Management Center, Chung-Shan Institute of Science and Technology
Armaments Bureau, Ministry of National Defense, Longtan, Taiwan, ROC

Abstract: The growing number of network intrusions has put companies and organizations at a much greater risk of loss. In this paper, we propose a new learning methodology for developing a novel intrusion detection system (IDS) based on backpropagation neural networks (BPN) with sample-query and attribute-query. We test the proposed method on a benchmark intrusion dataset to verify its feasibility and effectiveness. Results show that choosing good attributes and samples affects not only the detection performance but also the overall execution efficiency. The proposed method significantly reduces the required training time while producing good training results. It provides a powerful tool to help supervisors analyze, model and understand the complex attack behavior of electronic crime.

Keywords: intrusion detection system (IDS), backpropagation neural networks (BPN), query-based learning.

I. Introduction

The growth of the electronic environment comes with a corresponding growth of electronic crime, where the computer is used either as a tool to commit the crime or as a target of the crime [1]. In past years, numerous computers have been hacked because the necessary precautions to protect against network attacks were not taken. The failure to secure their systems puts many companies and organizations at a much greater risk of loss. A single attack can cost millions of dollars in potential revenue, and that is just the beginning: the damages of attacks include not only the loss of intellectual property and liability for compromised customer data (the time and money spent to recover from the attack) but also the loss of customer confidence and market advantage. There is a need to enhance the security of computers and networks to protect the critical infrastructure from threats. Accompanied by the rise of electronic crime, the design of safeguarding information infrastructure, such as the intrusion detection system (IDS) for preventing and detecting incidents, becomes increasingly challenging.

Figure 1 illustrates the intrusion detection system and external/internal network intrusion attacks. The intrusion detector learning task is to build a predictive model (i.e., a classifier) capable of distinguishing between bad intrusions and normal connections. Recently, an increasing amount of research has been conducted on applying neural networks to detect intrusions. An artificial neural network consists of a collection of highly interconnected processing elements. Given a set of inputs and a set of desired outputs, the transformation from input to output is determined by the weights associated with the interconnections among processing elements. By modifying these interconnections, the network adapts to the desired outputs. The high tolerance and learning-by-example ability make neural networks flexible and powerful in IDS. However, the time required to induce the model from a large dataset is long. This paper introduces a novel query-based methodology for learning intrusion detection by neural networks with sample-query and attribute-query. Our method first applies information theory to select good attributes. Then, we use the query-based methodology [2] to include only a subset of samples in learning. Choosing good attributes and samples affects not only the detection performance but also the overall execution efficiency. We examine our method on a benchmark intrusion dataset to verify its feasibility and effectiveness. Results show that the proposed method can not only reduce the training time but also improve the training results, so that probable attack behavior can be predicted accurately in IDS.
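To make this weight-adaptation idea concrete, the following is a minimal sketch of a one-hidden-layer backpropagation network in Python/NumPy. It only illustrates the learning rule; the layer sizes, learning rate and initialization are our own assumptions, not the configuration used in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyBPN:
    """Minimal one-hidden-layer backpropagation network (illustrative only)."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))   # input -> hidden weights
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_out))   # hidden -> output weights
        self.lr = lr

    def forward(self, X):
        self.h = sigmoid(X @ self.W1)        # hidden activations
        self.o = sigmoid(self.h @ self.W2)   # network outputs
        return self.o

    def backward(self, X, y):
        # Gradient of the squared error propagated back through the sigmoid units.
        d_o = (self.o - y) * self.o * (1 - self.o)
        d_h = (d_o @ self.W2.T) * self.h * (1 - self.h)
        self.W2 -= self.lr * self.h.T @ d_o  # adapt hidden -> output weights
        self.W1 -= self.lr * X.T @ d_h       # adapt input -> hidden weights

    def train_step(self, X, y):
        self.forward(X)
        self.backward(X, y)
        return np.sqrt(np.mean((self.o - y) ** 2))  # RMSE for this batch
```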


Figure 1. The intrusion detection system and external/internal network intrusion attacks.

II. Related Works

As network intrusions are constantly changing, a flexible IDS is required to analyze the enormous amount of network traffic in a manner less structured than the traditional rule-based system. In [3][4], neural networks were proposed as alternatives to the statistical analysis component of anomaly detection systems. They determine what is normal and flag an abnormal or anomalous event for further inspection. In [5][6], neural networks were applied to build keyword-count-based misuse detection systems, where the data presented to the systems consisted of attack-specific keyword counts in network traffic. Using neural networks, [7] analyzed program behavior profiles for both anomaly detection and misuse detection to identify normal system behavior. In [8], a neural network detection system was developed in which packet-level network data was classified according to nine packet characteristics. In [9], a statistical neural network classifier for anomaly detection was developed that can identify UDP flood attacks. Comparing different neural network classifiers, the backpropagation neural network (BPN) has been shown to be more efficient in developing IDS. However, the time required to induce models from large datasets is long. In our experiments of learning an IDS dataset with nearly 500000 samples, BPN never obtained a feasible result within the pre-specified criteria; it is too time-consuming. Therefore, this paper focuses on combining data reduction and classification with a query-based learning methodology. By analyzing and identifying the most important components of the training data, we can reduce the processing time, communication overhead and storage requirements in mining network intrusions.

III. Proposed Method

A learning machine consists of a learning protocol to specify the manner in which information is acquired, and a deduction procedure to learn the correct concept [10]. For a learning protocol, the input information can be examples that exemplify the concept to be learned, or oracles that, when presented with data, tell whether or not the data exemplify the concept. Therefore, we can train a system not only with the samples presently at hand but also with extra samples produced by the oracle. When the query point is set to y, the oracle responds with a(y). The pair (y, a(y)) is called the queried sample.
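As a small illustration of this learning protocol (our own sketch, not a component specified by the paper), an oracle can be modeled as a wrapper around a labelling function that turns a query point y into the queried sample (y, a(y)); the threshold rule used in the example is purely hypothetical.

```python
from typing import Callable, Tuple
import numpy as np

class Oracle:
    """Wraps a labelling function a(.) so that a query point y yields (y, a(y))."""

    def __init__(self, label_fn: Callable[[np.ndarray], float]):
        self.label_fn = label_fn

    def query(self, y: np.ndarray) -> Tuple[np.ndarray, float]:
        # The pair (y, a(y)) is the queried sample used for extra training.
        return y, self.label_fn(y)

# Hypothetical example: label a query point by a simple threshold rule.
oracle = Oracle(lambda y: float(y.sum() > 0))
queried_sample = oracle.query(np.array([0.2, -0.5, 0.9]))
```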

A. Sample-Query

According to [2][11][12], samples from the decision boundary produce the best training results. We want to decide the query points y such that a(y) = 0.5. Notably, conventional approaches assume that, for each input or output point, the oracle knows its input-output pattern. Given a randomly selected boundary point P, its conjugate data pair (points P+ and P-) can be extracted along the reverse boundary; the samples P, P+ and P- can be arbitrary input-output patterns. However, we may not have experts or simulators, or the oracle may be very expensive for specifying the correct output. To resolve this drawback, we divide the samples into a training set and a query set. Then, an oracle is designed to follow the self-regulation rule [12] to select samples (environment-focus) that are close to the conjugate data pair (self-focus). This gives the system the ability to interact with the environment and to be trained by queried samples. As reported in [13], a system can use some particular samples in the data set to learn almost completely what the full data set teaches. In this paper, an oracle is designed to obtain appropriate samples for further training; thus, the learning performance is improved by labeling only those data that are expected to be informative. In the proposed method, we first examine the non-trained samples to detect whether they are put in the wrong class. As the output also indicates the distance from the sample to the boundary, we can easily store these mistaken samples in a priority queue (min-heap). Then, the stored points that are closest to the class boundary are picked as the extra training samples.
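A minimal sketch of this sample-query step is given below (the function and variable names are ours, not the paper's): the current network's outputs on the non-trained samples are examined, the misclassified ones are kept in a min-heap keyed by their distance to the boundary output 0.5, and the closest ones are returned as extra training samples.

```python
import heapq
import numpy as np

def sample_query(predict_fn, X_pool, y_pool, k=16):
    """Pick up to k misclassified pool samples whose outputs lie closest to the
    class boundary (output = 0.5), to be used as extra training samples.

    predict_fn : maps an array of samples to outputs in [0, 1]
    X_pool, y_pool : the non-trained samples and their 0/1 labels
    """
    outputs = predict_fn(X_pool)                 # network outputs for the pool
    predicted = (outputs >= 0.5).astype(int)     # tentative class decisions
    heap = []
    for i in np.where(predicted != y_pool)[0]:   # only the mistaken samples
        distance = abs(outputs[i] - 0.5)         # closeness to the boundary
        heapq.heappush(heap, (distance, i))      # min-heap keyed by distance
    chosen = [heapq.heappop(heap)[1] for _ in range(min(k, len(heap)))]
    return np.asarray(chosen, dtype=int)         # indices into the pool
```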

B. Attribute-Query

Notably, the learning accuracy may be degraded by the presence of noise and a large number of attributes. In some applications, the benefit of increased accuracy obtained from the additional data does not justify the expense. The economically reasonable decision is often to use only a subset of the available training samples, and attributes that are not likely to be useful are discarded. For large training datasets, choosing good attributes and samples affects not only the performance but also the overall execution efficiency. In this paper, we use information gain to select subsets of attributes for training. Given a sample set S, the information gain g(Ai) gives the information obtained from the value of Ai:

g(Ai) = I(p, n) - ei                                                        (1)

where I(p, n) is the total information required to classify a random example and ei is the expected amount of information still needed to classify a random example after partitioning S by the values of Ai. Given p and n as the numbers of positive and negative examples in the sample set, I(p, n) can be calculated as follows:

I(p, n) = - [p/(p+n)] log2[p/(p+n)] - [n/(p+n)] log2[n/(p+n)]               (2)
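A direct computation of Eqs. (1)-(2) for one attribute might look like the sketch below. It assumes binary (positive/negative) labels and discrete attribute values, which is an assumption on our part; continuous KDD attributes would need to be discretized first.

```python
import numpy as np

def entropy(p, n):
    """I(p, n) of Eq. (2): total information to classify a random example."""
    total = p + n
    if total == 0 or p == 0 or n == 0:
        return 0.0
    fp, fn = p / total, n / total
    return -fp * np.log2(fp) - fn * np.log2(fn)

def information_gain(attribute_values, labels):
    """g(A_i) of Eq. (1) for one discrete attribute over the sample set.

    attribute_values : 1-D array with the attribute value of each sample
    labels           : 1-D array of 0 (negative) / 1 (positive)
    """
    attribute_values = np.asarray(attribute_values)
    labels = np.asarray(labels)
    p = int(labels.sum())
    n = int(len(labels) - p)
    expected = 0.0                               # e_i: expected information
    for v in np.unique(attribute_values):
        mask = attribute_values == v
        pv = int(labels[mask].sum())
        nv = int(mask.sum() - pv)
        expected += (pv + nv) / (p + n) * entropy(pv, nv)
    return entropy(p, n) - expected
```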

In our experiments, we calculate the information gain of each attribute and select a constant number of attributes for training. The objective is to produce concise and highly informative training sets.

C. Training with Sample-Query and Attribute-Query

A step-by-step description of the proposed algorithm is given below. The learning process finishes when either the number of iterations exceeds the given threshold or the root-mean-squared error falls below the given threshold.

Step 1: Initialize all weights in the neural network randomly. Give the iteration threshold N and the error threshold RMSE.
Step 2: The dataset is S = {ai ∈ R^n}, where n is the number of selected attributes. Get the initial training samples SS ⊂ S by stratified random sampling.
Step 3: Train the neural network with SS. If (the error E < RMSE) or (the iteration number > N), then EXIT.
Step 4: Examine the non-trained samples (S - SS).
Step 5: Add some of the samples that are closest to the class boundary to SS. Go to Step 3.

Notably, the goal of learning intrusion detection is not to obtain an exact representation of the training data but rather to extract a "model" of the attack function. Therefore, the generalization ability to make good predictions for unseen attacks is much more important. As in the real world, a passive learner simply learns from the samples that come along, whereas an active learner explores the unknown portion of the environment to learn extra information. The proposed method, with its generalization ability, is thus very suitable for learning network intrusion. Additionally, network log data usually contain large redundancy, so selecting concise subsets of the training data can reduce the training time.
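Putting the pieces together, the sketch below follows Steps 1-5 under several assumptions of ours: scikit-learn's MLPClassifier stands in for the BPN, labels are binary 0/1, one epoch of partial_fit is run per round, and the subset sizes, thresholds and network size are illustrative rather than the paper's exact settings.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def train_with_queries(X, y, init_fraction=0.01, n_query=16,
                       rmse_threshold=0.01, max_rounds=100, seed=0):
    """Sketch of Steps 1-5: train on a small stratified subset, then repeatedly
    add the misclassified pool samples closest to the class boundary.

    X : 2-D float array of selected attributes, y : 1-D array of 0/1 labels.
    """
    # Step 2: initial training subset SS chosen by stratified random sampling.
    X_pool, X_train, y_pool, y_train = train_test_split(
        X, y, test_size=init_fraction, stratify=y, random_state=seed)
    # Step 1: network with randomly initialized weights (stand-in for BPN).
    net = MLPClassifier(hidden_layer_sizes=(20,), random_state=seed)
    classes = np.unique(y)
    for _ in range(max_rounds):
        # Step 3: train on the current subset (one epoch here, for brevity)
        # and check the error criterion.
        net.partial_fit(X_train, y_train, classes=classes)
        prob = net.predict_proba(X_train)[:, 1]
        rmse = np.sqrt(np.mean((prob - y_train) ** 2))
        if rmse <= rmse_threshold or len(X_pool) == 0:
            break
        # Steps 4-5: examine the non-trained samples and add the mistaken
        # ones whose outputs are nearest the boundary output 0.5.
        pool_prob = net.predict_proba(X_pool)[:, 1]
        wrong = np.where((pool_prob >= 0.5).astype(int) != y_pool)[0]
        nearest = wrong[np.argsort(np.abs(pool_prob[wrong] - 0.5))][:n_query]
        X_train = np.vstack([X_train, X_pool[nearest]])
        y_train = np.concatenate([y_train, y_pool[nearest]])
        keep = np.setdiff1d(np.arange(len(X_pool)), nearest)
        X_pool, y_pool = X_pool[keep], y_pool[keep]
    return net
```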

IV. Experimental Results

In this paper, we use the dataset of the KDD intrusion detection contest to evaluate the performance of the proposed approach. Figure 2 shows the data distribution (attack breakdown) of the training set. This dataset is a version of the DARPA intrusion detection evaluation dataset prepared and managed by MIT Lincoln Laboratory. Researchers set up an environment to acquire nine weeks of raw TCP dump data for a local-area network (LAN) simulating a typical US Air Force LAN. They operated the LAN as if it were a true Air Force environment, but peppered it with multiple attacks. A standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment, was provided; its objective was to survey and evaluate research in intrusion detection. These intrusions fall into four main categories: Denial of Service (DoS), Probe, Remote to User (R2L), and User to Root (U2R).

Figure 2. The data distribution of the DARPA data attack breakdown.

In the KDD dataset, the training set contains 494021 samples and the test set contains 311029 samples. Nearly 80% of the samples are DoS attacks, and samples of normal connections account for about 20%. The other attack types, including U2R (0.011%), R2L (0.228%) and Probe (0.831%), are really rare. It is important to note that the test data is not drawn from the same probability distribution as the training data: it includes 17 specific attack types that are not in the training set, which makes the dataset more realistic.

Table 1. Confusion Matrix [14]

                     Predicted Negative    Predicted Positive
Actual Negative      a                     b
Actual Positive      c                     d

Recall rate: d / (c + d)
Precision rate: d / (b + d)
False-Positive rate: b / (a + b)
False-Negative rate: c / (c + d)
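As a small worked example of these four rates, the helper below evaluates them from the cell counts a, b, c, d of Table 1 (our own helper; the counts in the usage line are placeholders, not values from the paper).

```python
def confusion_rates(a, b, c, d):
    """Rates of Table 1: a = true negatives, b = false positives,
    c = false negatives, d = true positives."""
    return {
        "recall": d / (c + d),
        "precision": d / (b + d),
        "false_positive_rate": b / (a + b),
        "false_negative_rate": c / (c + d),
    }

# Placeholder counts, only to show the call.
print(confusion_rates(a=1000, b=10, c=20, d=970))
```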

Network intrusion detection is a two-class classification problem. Its effectiveness can be defined as the ability to make correct class predictions of samples. For each single prediction, there are four different outcomes, known as the confusion matrix for the two-class case shown in Table 1. The true-positives and true-negatives are correct classifications. A false-positive occurs when the system classifies an action as anomalous (a possible intrusion) when it is a legitimate action. Although this type of error may not be completely eliminated, a good system should minimize its occurrence to provide useful information to the users. A false-negative occurs when an actual intrusive action has occurred but the system allows it to pass as non-intrusive behavior; in other words, malicious activity is not detected and alerted. It is a more serious error. Notably, in a real-world system, the effect of incorrectly detecting abnormal network behavior (false-negative) is different from that of incorrectly predicting a normal classification outcome (false-positive). These two kinds of errors generally have different costs; likewise, the two types of correct classification have different benefits.

In this paper, we examine our proposed method against the original BPN for comparison. The convergence criteria used for training are a root-mean-squared error (RMSE) less than or equal to 0.01 or a maximum of 10000 iterations. The diagram of the training phase is shown in Figure 3. In the KDD dataset, each sample has 41 attributes, some of which may be more useful than others in distinguishing normal connections from attacks. For large training datasets, choosing good attributes and samples affects not only the performance but also the overall execution efficiency. In this paper, we use information gain to select 11 attributes for training; attributes that are not likely to be useful are discarded. Without loss of generality, the results of the bagged boosting method are directly obtained from the KDD intrusion detection contest; it uses all samples and attributes in classification.

Figure 3. The diagram of the training phase.

Our experiments show that the training time of the proposed method is 1447 seconds, whereas the training of BPN takes over 21746 seconds. Notably, as BPN does not converge to a feasible result within the pre-specified training iterations, we can only select 9500 samples (by stratified random sampling) for its training. The proposed method uses much fewer samples: it starts with only 494 samples (about 1% of the KDD training set) in the initial training, and then 16 queried samples are added for further training. All the experimental results are summarized in Table 2.

Table 2. The experimental results in confusion matrix (each cell gives ours / BPN).

                        Predicted Probe    Predicted DoS      Predicted U2R     Predicted R2L
Actual Probe            2523 / 2515        81 / 181           509 / 0           8 / 0
Actual DoS              564 / 25           227080 / 222153    126 / 0           0 / 0
Actual U2R              25 / 0             0 / 0              83 / 0            8 / 0
Actual R2L              14 / 4             479 / 2            147 / 0           6660 / 0

                        Probe              DoS                U2R               R2L
Precision rate          75.8% / 91.1%      99.1% / 99.6%      9.2% / 0.0%       76.5% / 0.0%
Recall rate             60.7% / 60.3%      98.8% / 96.7%      36.4% / 0.0%      41.2% / 0.0%
False-Positive rate     24.2% / 8.96%      0.9% / 0.42%       90.8% / 0.0%      23.5% / 0.0%
False-Negative rate     39.4% / 39.6%      1.20% / 3.4%       63.6% / 100.0%    58.6% / 100.0%

V. Conclusion

In this paper, a new method is introduced for learning intrusion detection. Instead of accepting whatever samples are fed, suitable samples are selected for training to produce valuable results. Experiments show that the proposed method can achieve effective classification with less training cost; it is flexible and powerful. Intrusion detection systems must be capable of distinguishing between normal (not security-critical) and abnormal user activities in order to discover malicious attempts in time. However, translating user behavior (or a complete user-system session) into a consistent security-related decision is often not that simple, as many behavior patterns are indistinguishable and unclear. If uncertain behavior is not considered anomalous, then intrusion activity may not be detected. If uncertain behavior is considered anomalous, then system administrators may be alerted by false alarms. For connection records that cannot be distinguished, the proposed method classifies them as "other" rather than forcing them into a single class, so that these records can be analyzed further. Our future work is to extend this concept to develop more learning methods for more real-world applications.


References

[1] A. Brungs, R. Jamieson, "Identification of Legal Issues for Computer Forensics," Information Systems Management, vol. 22, issue 2, Springer, 2005, pp. 57-66.
[2] R.I. Chang, "Disease Diagnosis using Query-Based Neural Networks," LNCS, 2005.
[3] H. Debar, B. Dorizzi, "An Application of a Recurrent Network to an Intrusion Detection System," International Joint Conference on Neural Networks, 1992, pp. (II) 478-483.
[4] H. Debar, M. Becke, D. Siboni, "A Neural Network Component for an Intrusion Detection System," IEEE Symposium on Research in Security and Privacy, 1992.
[5] J. Ryan, M. Lin, R. Mikkulainen, "Intrusion Detection with Neural Networks," Advances in Neural Information Processing Systems, vol. 10, MIT Press, 1998.
[6] R. Lippmann, R. Cunningham, "Improving Intrusion Detection Performance using Keyword Selection and Neural Networks," RAID Proceedings, West Lafayette, Indiana, Sept. 1999.
[7] A. Ghosh, A. Schwartzbard, "A Study in Using Neural Networks for Anomaly and Misuse Detection," Proceedings of the 8th USENIX Security Symposium, 1999.
[8] J. Cannady, "Artificial Neural Networks for Misuse Detection," Proceedings of the 1998 National Information Systems Security Conference (NISSC'98), 1998.
[9] Z. Zhang, J. Li, C.N. Manikopoulos, J. Jorgenson, J. Ucles, "HIDE: A Hierarchical Network Intrusion Detection System Using Statistical Preprocessing and Neural Network Classification," IEEE Workshop on Information Assurance and Security, 2001, pp. 85-90.
[10] L. Valiant, "A Theory of the Learnable," Communications of the Association for Computing Machinery, vol. 27, no. 11, 1984, pp. 1134-1142.
[11] J.N. Hwang, J.J. Choi, S. Oh, R.J. Marks II, "Query-Based Learning Applied to Partially Trained Multilayer Perceptrons," IEEE Transactions on Neural Networks, vol. 2, no. 1, 1991, pp. 131-136.
[12] R.I. Chang, P.Y. Hsiao, "Unsupervised Query-Based Learning of Neural Networks Using Selective-Attention and Self-Regulation," IEEE Transactions on Neural Networks, vol. 8, no. 2, 1997, pp. 205-217.
[13] T. Oates, D. Jensen, "The Effects of Training Set Size on Decision Tree Complexity," International Conference on Machine Learning, 1997, pp. 254-262.
[14] R. Kohavi, F. Provost, "Glossary of Terms," Machine Learning, vol. 30, no. 2-3, 1998, pp. 271-274.

Author Biographies

Ray-I Chang received the Ph.D. degree in electrical engineering and computer science from National Chiao Tung University, Hsinchu, Taiwan, R.O.C., in 1996. He then joined the Computer Systems and Communications Laboratory (CSCL), Institute of Information Science, Academia Sinica. In 2003, he joined the Department of Engineering Science, National Taiwan University, Taipei, Taiwan.

Liang-Bin Lai received the B.S. degree in information management from the National Yunlin University of Science and Technology, Yunlin, Taiwan, in 1999. Since 2004, he has been working toward the Ph.D. degree. His research interests include neural networks, fuzzy logic, data mining, and database applications.