Toward Lightweight Intrusion Detection System Through Simultaneous Intrinsic Model Identification

Dong Seong Kim, Sang Min Lee, and Jong Sou Park

Network Security Lab., Hankuk Aviation University, Seoul, Korea
{dskim, minuri33, jspark}@hau.ac.kr

Abstract. An Intrusion Detection System (IDS) should guarantee high detection rates with minimal overhead for building the intrusion detection model and processing audit data. Previous approaches have mainly focused on feature selection of audit data and parameter optimization of intrusion detection models. However, feature selection and parameter optimization have been performed separately. Several hybrid approaches based on soft computing techniques can perform both together, but they incur more computational overhead. In this paper, we propose a new approach named Simultaneous Intrinsic Model Identification (SIMI), which enables one to perform both feature selection and parameter optimization together without any additional computational overhead. SIMI adopts Random Forest (RF), a promising machine learning algorithm that has been shown to achieve classification rates similar to or better than those of Support Vector Machines (SVM). SIMI builds a lightweight intrinsic intrusion detection model with optimized parameters and features. After determining the intrinsic intrusion detection model, we visualize normal and attack patterns in a two-dimensional space using Multidimensional Scaling (MDS). We carry out several experiments on the KDD 1999 intrusion detection dataset to validate the feasibility of our approach.

Keywords: Intrusion Detection System, Data mining, Random Forest.

G. Min et al. (Eds.): ISPA 2006 Ws, LNCS 4331, pp. 981–989, 2006.
© Springer-Verlag Berlin Heidelberg 2006

1 Introduction

As the amount of information exchanged over interconnected networks has increased tremendously, network security has become increasingly essential. Among the many methods for protecting network systems, such as firewalls and access control, an Intrusion Detection System (IDS) plays a vital role. The main purpose of an IDS is to inspect all inbound and outbound network activity and identify suspicious patterns that may indicate a network or system attack from someone attempting to compromise a system [4].

An IDS should guarantee high detection rates with minimal overhead for building the intrusion detection model and processing audit data. Previous approaches have mainly focused on two aspects: parameter optimization of the intrusion detection model and feature selection of audit data. The purpose of parameter optimization is to adjust the values of the detection model's parameters and find their optimal values. Many studies on parameter optimization have been based on data mining and machine learning algorithms such as Artificial Neural Networks and Support Vector Machines (SVM). The objective of feature selection is to remove irrelevant features and find the intrinsic features of audit data; several wrapper [9, 13] and filter methods [2, 3] have been proposed. However, feature selection and parameter optimization have been performed separately. Several hybrid approaches [8, 14] based on soft computing techniques can perform both together, but they incur more computational overhead.

Therefore, in this paper, we propose a new approach named Simultaneous Intrinsic Model Identification (SIMI), which enables one to perform both parameter optimization and feature selection without any additional overhead. SIMI adopts Random Forest (RF), a promising machine learning algorithm that has been shown to achieve classification rates similar to or better than those of SVM. Through SIMI, we perform feature selection and parameter optimization together and obtain an intrinsic intrusion detection model that uses only the selected important features. We then visualize normal and attack patterns in a two-dimensional space using Multidimensional Scaling (MDS). We carry out several experiments on the KDD 1999 intrusion detection dataset to validate the feasibility of our approach.

The rest of this paper is organized as follows. Related work is introduced in Section 2. Our proposed approach and its overall flow are described in Section 3. The experiments and their analysis are presented in Section 4. Concluding remarks are given in Section 5.

2 Related Works

In this section, we introduce several works related to our approach. As mentioned in Section 1, previous approaches to designing and modeling intrusion detection systems have mainly addressed two problems: parameter optimization of intrusion detection models and feature selection of audit data.

In the former case, many studies proposed intrusion detection models using standalone machine learning algorithms such as Artificial Neural Networks (ANN) and Support Vector Machines (SVM). Their main concern is to maximize intrusion detection rates while minimizing false positive rates. They tuned the parameters of the machine learning algorithms, for example, the weight values and the number of hidden layers of neural networks, or the parameters of the kernel function of support vector machines. This can be considered a parameter optimization problem. Moradi et al. [11] tuned a Multi-Layer Perceptron (MLP) neural network. Mukkamala et al. [12] optimized the parameters of the kernel function in SVM. Kim et al. [7] also tuned the kernel function using an empirical method.

In the latter case, the objective of feature selection is to find the intrinsically important features. Not all features are essential for classifying network audit data, because irrelevant features not only increase computational cost, such as time and overhead, but also decrease classification rates. Exhaustive analysis requires 2^N experiments if the total number of features is N, so it is impractical in terms of computational overhead. There are two representative methods in machine learning: wrapper [9] and filter methods [2, 3]. Wrapper methods use a classification algorithm and perform cross-validation to identify important features. In contrast, filter methods use correlation-based criteria that are independent of the classification algorithm. Filter methods are more lightweight than wrapper methods in terms of computation time and overhead, but they yield lower classification rates because they are performed independently of the classification algorithm. In IDS, Sung et al. used an empirical method named performance-based feature ranking [17]. However, the differences in importance between features are very small, which makes the ranking difficult to use for modeling an IDS. Middlemiss et al. [10] proposed feature selection using a Genetic Algorithm (GA).

In the above approaches, parameter optimization and feature selection were performed separately. Several hybrid approaches [8, 14] based on soft computing techniques can perform both together. Kim et al. [8] proposed a fusion of GA and SVM for anomaly detection. Park et al. [14] proposed an approach that combines a filter method with a GA-based wrapper method. However, these hybrid approaches sometimes show slightly lower detection rates while requiring more computation than plain filter methods, do not provide the variable importance of individual features, and are complicated to implement.

In this paper, we propose a new approach named Simultaneous Intrinsic Model Identification (SIMI), which performs feature selection and parameter optimization simultaneously without any additional overhead. We adopt Random Forest (RF), a state-of-the-art data mining algorithm comparable to SVM [1]. Zhang et al. [20] also proposed network intrusion detection using RF, but their approach removed only 3 features after identifying important features and optimized only the mtry value of RF. We perform feature selection and parameter optimization based on RF, select only the top m most important features, and optimize both mtry and ntree. This procedure enables one to identify the intrinsic model. Furthermore, we use Multidimensional Scaling (MDS) to visualize attack and normal patterns using only the selected important features.
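To make the cost argument in the feature-selection discussion concrete: with the N = 41 KDD features used later in this paper, exhaustive search would have to evaluate 2^41 candidate subsets, whereas a filter method scores each feature once without training any classifier. The Python sketch below illustrates one simple filter criterion, ranking features by the absolute Pearson correlation between each feature and a binary normal/attack label. This criterion is an illustrative assumption, not the specific correlation-based filters of [2, 3], and the toy rows are made up.

```python
import math
from typing import List, Sequence, Tuple


def exhaustive_subset_count(n_features: int) -> int:
    """Number of feature subsets an exhaustive search would evaluate (2^N)."""
    return 2 ** n_features


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 when either column is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def filter_rank(rows: List[List[float]], labels: List[int]) -> List[Tuple[int, float]]:
    """Rank feature indices by |correlation| with the label, highest first.

    No classifier is trained here, which is why filter methods are cheap
    compared with wrapper methods.
    """
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((j, abs(pearson(column, labels))))
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    print(exhaustive_subset_count(41))  # 2199023255552
    toy_rows = [[1.0, 5.0, 0.0], [2.0, 3.0, 0.0], [3.0, 4.0, 0.0], [4.0, 1.0, 0.0]]
    toy_labels = [0, 0, 1, 1]  # 0 = normal, 1 = attack
    print(filter_rank(toy_rows, toy_labels))
```

A wrapper method would instead call a classifier once per candidate subset, which is where its extra cost comes from.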
The next section presents our proposed approach.

Fig. 1. Overall flow of the proposed approach

3 Proposed Approach

The overall flow of our approach is depicted in Figure 1. The preprocessed network audit data is divided into two datasets: a training set and a testing set. The training set is further separated into a learning set and a validation set. RF does not strictly need cross-validation to obtain an unbiased estimate of the generalization error, since it is robust against over-fitting and provides an internal OOB estimate [1]; we nevertheless adopt n-fold cross-validation to further minimize over-fitting. The learning set is used to generate classifiers and aggregate their results based on RF, and to find the variable importance of each feature of the network audit data and the optimal parameters of RF simultaneously. These classifiers can be considered detection models in the IDS. The validation set is used to compute detection rates from the estimated error rates, which are the Out-Of-Bag (OOB) errors in RF. Feature selection is performed by eliminating irrelevant features, namely those ranked low in variable importance. In the next step, only the important features that have more effect on classification, together with the optimal parameters, are used to build detection models, which are then evaluated on the testing set with respect to detection rates. This procedure constitutes our approach, SIMI. If the detection rates fulfill our design requirements, the overall procedure is finished. To evaluate the feasibility of our approach, we perform several experiments on the KDD 1999 intrusion detection dataset. The following section presents the experimental results and their analysis.
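A minimal Python sketch of the data split in this flow, assuming records are addressed by index: a held-out testing set, then n folds over the training set in which each fold serves once as the validation set while the remaining folds form the learning set. The function names, the random seed, and the 20% test fraction in the usage are hypothetical; the paper does not state the size of its testing set.

```python
import random
from typing import Iterator, List, Tuple


def split_train_test(n_records: int, test_fraction: float,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle record indices and hold out a testing set; returns (training, testing)."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_records * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold(training: List[int], k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (learning, validation) index lists; every record validates exactly once."""
    if k < 2 or k > len(training):
        raise ValueError("k must be between 2 and the number of training records")
    folds = [training[i::k] for i in range(k)]  # round-robin: fold sizes differ by at most 1
    for i in range(k):
        learning = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield learning, folds[i]


if __name__ == "__main__":
    # Hypothetical sizes: 2500 records, 20% held out, leaving 2000 for 10-fold CV.
    training, testing = split_train_test(n_records=2500, test_fraction=0.2)
    for learning, validation in k_fold(training, k=10):
        # Train RF on `learning` and measure the detection rate on `validation` here.
        print(len(learning), len(validation))  # 1800 200 for every fold
```

The testing indices are never touched by `k_fold`, so the final evaluation stays independent of the parameter and feature choices made during cross-validation.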

4 Evaluation

In this section, we carry out several experiments on the KDD 1999 intrusion detection dataset [5] to verify the feasibility of our approach. We first present the experiments on parameter optimization for RF. Then, we describe the experiments using RF to eliminate irrelevant features. Finally, we evaluate our approach. The next subsection describes the experimental dataset and environment, followed by the experimental results.

4.1 Evaluation Dataset and Environments

We have used the KDD 1999 intrusion detection dataset. The dataset contains a total of 24 attack types that fall into four main categories [6]: DoS (Denial of Service), R2L (unauthorized access from a remote machine), U2R (unauthorized access to root privileges), and probing. The data was preprocessed by extracting 41 features from the tcpdump data in the 1998 DARPA datasets, and we have labeled the features f1, f2, f3, f4, and so forth. We have used only DoS attacks, since the other categories have very small numbers of instances and are therefore not suitable for our experiments [16].

Following the overall flow presented in Section 3, the dataset is divided into three datasets: a learning set, a validation set, and a testing set. The learning set is used to build the initial detection models based on RF. The validation set is then used to estimate the generalization errors of the detection models, which are represented as OOB errors in RF. To minimize the OOB errors, in other words, to maximize detection rates, we have used 10-fold cross-validation with 2000 samples. Finally, we have used the testing set to evaluate the detection models built from the training set. All experiments were performed in a Windows environment on an Intel® Pentium® 4 1.70 GHz (over 1.72 GHz) machine with 512 MB of RAM. The RF and MDS implementations of the open-source R project (R 2.2.0) [18] were used to perform the experiments.
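The OOB error referred to above comes from bagging: each tree in RF is grown on a bootstrap sample of n records drawn with replacement, so a given record is left out of a tree with probability (1 − 1/n)^n, which is close to e^(−1) ≈ 0.368 for large n, and each record can be classified by a vote of only the trees that never saw it [1]. The Python sketch below shows this bookkeeping under a simplifying assumption: trees are stubbed as plain functions from record index to class label, and no tree growing is implemented.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Set


def bootstrap_sample(n: int, rng: random.Random) -> List[int]:
    """Draw n record indices with replacement: the training data of one tree."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bag(n: int, in_bag: Sequence[int]) -> Set[int]:
    """Records that the tree never saw during training."""
    return set(range(n)) - set(in_bag)


def oob_error(labels: Sequence[int],
              tree_oob: List[Set[int]],
              tree_predict: List[Callable[[int], int]]) -> float:
    """Majority vote per record over only its out-of-bag trees, then the error rate."""
    votes: Dict[int, Counter] = {}
    for oob, predict in zip(tree_oob, tree_predict):
        for i in oob:
            votes.setdefault(i, Counter())[predict(i)] += 1
    if not votes:
        raise ValueError("no record was out-of-bag for any tree")
    # most_common breaks ties in favour of the label seen first.
    wrong = sum(1 for i, c in votes.items() if c.most_common(1)[0][0] != labels[i])
    return wrong / len(votes)


if __name__ == "__main__":
    rng = random.Random(42)
    n = 2000
    fractions = [len(out_of_bag(n, bootstrap_sample(n, rng))) / n for _ in range(50)]
    print(sum(fractions) / len(fractions), math.exp(-1))  # both roughly 0.37
```

Records that happen to be in-bag for every tree get no OOB vote and are left out of the error estimate, as in RF.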


985

4.2 Evaluation Results and Analysis

There are only two parameters in RF to be optimized: the number of variables in the random subset at each node (mtry) and the number of trees in the forest (ntree). To obtain the best classification rates, that is, the best detection rates, it is essential to optimize both parameters; this is the parameter optimization step. We obtained the optimal value of mtry using the tuneRF() function provided in the randomForest package of the R project [18], which yielded mtry = 6. For ntree, there is no dedicated function for finding the optimal value, so we chose the ntree value that gives high and stable detection rates. We assume that 350 trees is a sufficient maximum for evaluating our approach, and detection rates are computed as 1 − OOB error. The experimental results for determining the optimal value of ntree are shown in Figure 2.
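The ntree selection rule described above can be sketched in Python, assuming the OOB error for each candidate ntree (for example 10, 50, ..., 330) has already been measured. Detection rate is computed as 1 − OOB error, expressed in percent. Treating "stable" as a centred moving average over neighbouring candidates is our own assumption, since the paper does not say how stability was judged; with window=1 the rule reduces to picking the single highest detection rate.

```python
from typing import Dict, List, Tuple


def detection_rates(oob_errors: Dict[int, float]) -> Dict[int, float]:
    """Detection rate in percent, computed as 1 - OOB error for each ntree."""
    return {ntree: 100.0 * (1.0 - err) for ntree, err in oob_errors.items()}


def pick_ntree(oob_errors: Dict[int, float], window: int = 3) -> Tuple[int, float]:
    """Return (ntree, smoothed detection rate) with the best centred moving average.

    Only candidates with a full window of neighbours are considered, so the ends
    of the grid are not favoured by averaging fewer points. Ties keep the
    smaller ntree, which is cheaper to train and evaluate.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    rates = detection_rates(oob_errors)
    grid: List[int] = sorted(rates)
    if len(grid) < window:
        raise ValueError("need at least `window` candidate ntree values")
    half = window // 2
    best_ntree, best_rate = grid[half], float("-inf")
    for pos in range(half, len(grid) - half):
        neighbours = grid[pos - half: pos + half + 1]
        smoothed = sum(rates[t] for t in neighbours) / window
        if smoothed > best_rate:
            best_ntree, best_rate = grid[pos], smoothed
    return best_ntree, best_rate


if __name__ == "__main__":
    # Hypothetical OOB errors for illustration only (not the measurements behind Fig. 2).
    errors = {10: 0.003, 50: 0.002, 90: 0.0015, 130: 0.001, 170: 0.0012}
    ntree, rate = pick_ntree(errors, window=3)
    print(ntree)  # 130
```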

Fig. 2. Average detection rates (%) vs. ntree values

According to Figure 2, the average detection rate of RF reached its highest value at ntree = 310. Based on these experiments, we set the two optimized parameter values to mtry = 6 and ntree = 310. After optimizing both parameters, we carried out feature selection on the network audit data using the variable importance measure provided by RF. We ranked the features by the average variable importance of each feature over 10-fold cross-validation with 2000 samples. Table 1 shows the top 5 important features and their properties. The top-ranked features carry meaningful context information: for example, f23 represents the “number of connections to the same host as the current connection in the past two seconds” and f6 represents the “number of data bytes from destination to source”.
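The ranking step above, which averages each feature's importance over the 10 cross-validation folds and sorts by that average, can be sketched in Python as follows. It assumes the per-fold importance scores have already been exported (for example, from the importance() output of the randomForest package); the feature names and scores in the usage are placeholders, not the values in Table 1.

```python
from statistics import mean
from typing import Dict, List, Tuple


def rank_by_average_importance(per_fold: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Average each feature's importance over the folds; most important first."""
    features = per_fold[0].keys()
    averaged = {f: mean(fold[f] for fold in per_fold) for f in features}
    # Descending importance; ties broken by feature name so the order is deterministic.
    return sorted(averaged.items(), key=lambda kv: (-kv[1], kv[0]))


def top_m(ranking: List[Tuple[str, float]], m: int) -> List[str]:
    """Keep the m highest-ranked features; the rest are treated as irrelevant."""
    return [name for name, _ in ranking[:m]]


if __name__ == "__main__":
    # Placeholder scores for three features over two folds (not Table 1 data).
    folds = [
        {"f1": 0.10, "f2": 0.40, "f3": 0.25},
        {"f1": 0.12, "f2": 0.38, "f3": 0.27},
    ]
    ranking = rank_by_average_importance(folds)
    print(top_m(ranking, 2))  # ['f2', 'f3']
```

Averaging across folds smooths out fold-to-fold variation in the importance scores before the cut at m features is made.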

Table 1. Top 5 important features and their properties

Feature | Property | Average variable importance
f23 | number of connections to the same host as the current connection in the past two seconds | 0.4023
f6 | number of data bytes from destination to source | 0.3318
f24 | number of connections to the same service as the current connection in the past two seconds | 0.3172
f3 | network service on the destination, e.g., http, telnet, etc. | 0.3163
f5 | number of data bytes from source to destination | 0.2973
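Table 1 reports average variable importance values without naming the measure. One common RF measure, the mean decrease in accuracy (type 1 in the randomForest package's importance()), shuffles one feature's values and records how much accuracy drops; RF computes it on each tree's OOB records. The Python sketch below applies the same permutation idea on a held-out set to any classifier passed in as a function. This is a simplification for illustration, not the randomForest implementation, and the toy data are made up.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(predict: Callable[[Row], int], rows: Sequence[Row], labels: Sequence[int]) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict: Callable[[Row], int],
                           rows: Sequence[Row],
                           labels: Sequence[int],
                           n_repeats: int = 5,
                           seed: int = 0) -> List[float]:
    """Mean drop in accuracy when one feature column is shuffled, per feature."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    # Toy data: feature 0 decides the label, feature 1 is noise the classifier ignores.
    rows = [[0.1, 5.0], [0.9, 3.0], [0.2, 1.0], [0.8, 7.0], [0.3, 2.0], [0.7, 9.0]]
    labels = [0, 1, 0, 1, 0, 1]
    print(permutation_importance(lambda r: int(r[0] > 0.5), rows, labels))
```

A feature the classifier never uses scores exactly zero, while a feature it relies on shows a positive accuracy drop.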

We then carried out a series of experiments in which irrelevant (low-ranked) features were eliminated and the detection rates were measured. The experimental results are depicted in Figure 3.

) % ( 99.9 et ar n99.85 oti ce te 99.8 d

Upper Average Lower

99.75 99.7

21

25

29 33 number of features

37

41

Fig. 3. Detection Rates vs. number of Features
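The elimination experiment can be sketched as backward elimination along the fixed importance ranking: repeatedly drop the lowest-ranked remaining feature, re-measure the detection rate, and finally keep the smallest subset whose rate is within a tolerance of the best rate observed. In this Python sketch, `evaluate` is a hypothetical stand-in for training RF with the chosen mtry and ntree on the selected features and scoring it on validation data. The tolerance rule for choosing m is our own assumption, since the paper does not state a numeric rule for choosing m.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def eliminate_features(ranked: Sequence[str],
                       evaluate: Callable[[List[str]], float],
                       min_features: int = 1,
                       tolerance: float = 0.0) -> Tuple[List[str], Dict[int, float]]:
    """Backward elimination along a fixed ranking (most important feature first).

    Returns the chosen feature subset and the detection rate measured at each size.
    """
    if not 1 <= min_features <= len(ranked):
        raise ValueError("min_features must be between 1 and len(ranked)")
    curve: Dict[int, float] = {}
    current = list(ranked)
    while len(current) >= min_features:
        curve[len(current)] = evaluate(current)
        current = current[:-1]  # drop the lowest-ranked remaining feature
    best = max(curve.values())
    # Smallest subset whose detection rate is within `tolerance` of the best one.
    m = min(size for size, rate in curve.items() if rate >= best - tolerance)
    return list(ranked[:m]), curve


if __name__ == "__main__":
    # Hypothetical evaluator: pretend only f23 and f6 matter (illustration only).
    def toy_evaluate(features: List[str]) -> float:
        return 99.9 if {"f23", "f6"} <= set(features) else 99.0

    chosen, curve = eliminate_features(["f23", "f6", "f24", "f3"], toy_evaluate)
    print(chosen, curve)  # ['f23', 'f6'] {4: 99.9, 3: 99.9, 2: 99.9, 1: 99.0}
```

Because the ranking is fixed, each subset is simply a prefix of it, so only one model per subset size has to be trained.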

In Table 2, we compare our approach with other approaches. Our approach showed a higher detection rate than the others. Although the difference in detection rate is small, our approach uses only the selected important features, and its training and testing times are shorter than those of the others. A comparison of computational complexity with the other approaches would be needed, but it is beyond the scope of this paper because both Kim et al.'s and Park et al.'s approaches utilize a Genetic Algorithm [15]. Although both Kim et al.'s and Park et al.'s approaches produced an “optimal feature set”, they did not report numeric variable importance values for each feature.

Table 2. The comparison results with other approaches

Approach | Feature selection method | Feature selection result | Parameter optimization method | Parameter optimization result | Detection rate
Kim et al. [8] | GA | Optimal feature set | GA | Optimal parameter values of the kernel function in SVM | 99.85%
Park et al. [14] | Filter method with GA | Optimal feature set | N/A | Default values of SVM | 98.4%
Zhang et al. [20] | RF | Individual feature importance / 38 features remain | mtry only | Partially regulated RF | 99.4%
Our approach | RF | Individual feature importance / m features remain | mtry and ntree | Optimal RF | 99.87%

Our approach obtains the importance of each individual feature, so only the important features need to be used. Zhang et al. also used RF as the classification algorithm, but they eliminated only 3 features and optimized only the mtry value of RF. We remove irrelevant features and use only m features to detect DoS attacks (see Figure 3). We also optimized the ntree value of RF to identify the intrinsic model (see Figure 2). In summary, these results indicate that our approach outperforms Kim et al.'s [8], Park et al.'s [14], and Zhang et al.'s [20] approaches on this dataset. Furthermore, we visualized normal and DoS attack patterns using MDS [19] plots in Figure 4. These plots make the intrusion context information easy to understand.

Fig. 4. Visualization of normal and DoS attack patterns using MDS
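As a hedged illustration of the MDS step, the Python sketch below implements classical (Torgerson) MDS: it double-centres the matrix of squared pairwise distances and takes the two leading eigenvectors by power iteration with deflation, scaling each by the square root of its eigenvalue. The paper does not specify its dissimilarities; one plausible choice is 1 − RF proximity, which is what the randomForest package's MDSplot() passes to R's cmdscale(). Power iteration keeps the sketch dependency-free, whereas in practice cmdscale() or numpy.linalg.eigh would be used. The sketch also assumes the double-centred matrix is close to positive semidefinite, which holds for Euclidean distances.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def double_centre(dist: Sequence[Sequence[float]]) -> Matrix:
    """B = -1/2 * J D2 J, with D2 the squared distances and J = I - (1/n) 11^T."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + grand) for j in range(n)]
            for i in range(n)]


def top_eigenpair(b: Matrix, iters: int = 500, seed: int = 0) -> Tuple[float, List[float]]:
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    n = len(b)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # remaining matrix is zero: no further structure
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient v^T B v gives the eigenvalue for the unit vector v.
    lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def classical_mds(dist: Sequence[Sequence[float]], k: int = 2) -> Matrix:
    """Embed n objects in k dimensions so Euclidean distances approximate `dist`."""
    b = double_centre(dist)
    n = len(b)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        lam, v = top_eigenpair(b, seed=axis)
        scale = math.sqrt(max(lam, 0.0))  # negative eigenvalues carry no real coordinate
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Deflation: subtract lam * v v^T so the next pass finds the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


if __name__ == "__main__":
    points = [(0.0, 0.0), (4.0, 0.0), (0.0, 1.0), (5.0, 2.0), (2.0, -1.0)]
    dist = [[math.dist(p, q) for q in points] for p in points]
    for row in classical_mds(dist):
        print([round(c, 3) for c in row])
```

For points that already lie in a plane, the two-dimensional embedding reproduces the original pairwise distances up to rotation and reflection.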



5 Conclusions

In this paper, we have presented a new approach named Simultaneous Intrinsic Model Identification (SIMI) for building a lightweight intrusion detection model. We utilized Random Forest (RF) to perform both feature selection of audit data and parameter optimization of the intrusion detection model together without additional overhead. We evaluated our approach through several experiments on the KDD 1999 intrusion detection dataset, and the results show that our approach achieves higher detection rates than the compared approaches while identifying the optimal features and the intrusion detection model together.

Acknowledgement

This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment).

References

1. Breiman, L.: Random Forests. Machine Learning 45(1) (2001) 5–32
2. Dash, M.: Feature Selection for Clustering – A Filter Solution. In: Proc. of IEEE Int. Conf. on Data Mining (ICDM) (2002) 115–122
3. Hall, M.A., Smith, L.A.: Feature Subset Selection: A Correlation Based Filter Approach. In: Proc. of Fourth Int. Conf. on Neural Information Processing and Intelligent Information Systems (1997) 855–858
4. Intrusion Detection System: http://www.webopedia.com/TERM/I/intrusion_detection_system.html
5. KDD Cup 1999 Data: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
6. KDD-Cup-99 Task Description: http://kdd.ics.uci.edu/databases/kddcup99/task.html
7. Kim, D., Park, J.: Network-Based Intrusion Detection with Support Vector Machines. In: Kang, H. (ed.): Information Networking. Lecture Notes in Computer Science, Vol. 2662. Springer-Verlag, Berlin Heidelberg New York (2003) 747–756
8. Kim, D., Nguyen, H.-N., Ohn, S.-Y., Park, J.: Fusions of GA and SVM for Anomaly Detection in Intrusion Detection System. In: Wang, J., Liao, X., Yi, Z. (eds.): Advances in Neural Networks. Lecture Notes in Computer Science, Vol. 3498. Springer-Verlag, Berlin Heidelberg New York (2005) 415–420
9. Kohavi, R., John, G.H.: Wrappers for Feature Subset Selection. Artificial Intelligence 97(1–2) (1997) 273–324
10. Middlemiss, M., Dick, G.: Feature Selection of Intrusion Detection Data using a Hybrid Genetic Algorithm/KNN Approach. In: Third Int. Conf. on Hybrid Intelligent Systems, Melbourne, Australia (2003)
11. Moradi, M., Zulkernine, M.: A Neural Network Based System for Intrusion Detection and Classification of Attacks. In: Proc. of IEEE Int. Conf. on Advances in Intelligent Systems – Theory and Applications, Luxembourg (2004)
12. Mukkamala, S., Sung, A.H., Ribeiro, B.M.: Model Selection for Kernel Based Intrusion Detection Systems. In: Proc. of Int. Conf. on Adaptive and Natural Computing Algorithms, Springer-Verlag (2005) 458–461
13. Noelia S-M.: A New Wrapper Method for Feature Subset Selection
14. Park, J., Shazzad, K.M., Kim, D.: Toward Modeling Lightweight Intrusion Detection System through Correlation-Based Hybrid Feature Selection. In: Feng, D., Lin, D., Yung, M. (eds.): Information Security and Cryptology. Lecture Notes in Computer Science, Vol. 3822. Springer-Verlag, Berlin Heidelberg New York (2005) 279–289
15. Rylander, B.: Computational Complexity and the Genetic Algorithm. Ph.D. Thesis, University of Idaho (2001)
16. Sabhnani, M., Serpen, G.: On Failure of Machine Learning Algorithms for Detecting Misuse in KDD Intrusion Detection Data Set. Intelligent Analysis (2004)
17. Sung, A.H., Mukkamala, S.: Identifying Important Features for Intrusion Detection Using Support Vector Machines and Neural Networks. In: Proc. of the 2003 Int. Symposium on Applications and the Internet Technology, IEEE Computer Society Press (2003) 209–216
18. The R Project for Statistical Computing: http://www.r-project.org/
19. Young, F.W., Hamer, R.M.: Theory and Applications of Multidimensional Scaling. Erlbaum Associates, Hillsdale, NJ (1994)
20. Zhang, J., Zulkernine, M.: Network Intrusion Detection using Random Forests. In: Proc. of 3rd Annual Conf. on Privacy, Security and Trust (2005)