Using a Particle Swarm Optimization Approach for Evolutionary Fuzzy ...

1 downloads 0 Views 105KB Size Report
Mohammad Saniee Abadeh [email protected]. Jafar Habibi [email protected]. Saman Aliari [email protected]. Department of Computer Engineering.
Using a Particle Swarm Optimization Approach for Evolutionary Fuzzy Rule Learning: A Case Study of Intrusion Detection Mohammad Saniee Abadeh [email protected]

Jafar Habibi [email protected]

Saman Aliari [email protected]

Department of Computer Engineering Sharif University of Technology Azadi Ave., Tehran, Iran

Abstract An evolutionary fuzzy rule-learning algorithm is proposed in this paper. This algorithm utilizes a Particle Swarm Optimization (PSO) approach to search the neighborhood of each individual in the problem search space. The evolutionary algorithm is based on the Michigan approach, which means that each individual in the population represents a fuzzy if-then rule. The proposed learning approach is performed on intrusion detection that is a complex classification problem. Results show that the final classification system is capable of detecting known intrusive behaviors in a computer network with an acceptable performance. Keywords: Evolutionary Computation, Fuzzy Rule Learning, Particle Swarm Optimization, Intrusion Detection.

1

Introduction

With the growing number of intrusions into networked computers has grows and raises concerns about computer security. Intrusion Detection Systems are important security tools, placing inside a protected network and looking for known or potential threats in network traffic and/or audit data recorded by hosts. Basically, an IDS analyzes information about users’ behaviors from various sources such as audit trail, system table, and network usage data.

The problem of intrusion detection has been studied extensively in computer security [1, 2], and has recently received a lot of attention in machine learning and data mining [3, 4]. The intrusion detection techniques can be divided into two groups according to the type of information they use: misuse detection and anomaly detection. Signature or misuse detection is based on patterns of known intrusions. In this case, the intrusion detection problem is a classification problem. This approach allows the detection of intrusions which the system has learned their signatures perfectly. To remedy the problem of detecting novel attacks, anomaly detection attempts to construct a model according to the statistical knowledge about the normal activity of the computer system. In this case, intrusions correspond to anomalous network activity, i.e., to traffic whose statistical profile deviates significantly from the normal. In this paper, we exploit a genetic fuzzy system to develop a novel IDS based on misuse detection. The goal of our algorithm is to find high quality fuzzy if-then rules to predict the class of input patterns correctly. Traditionally, fuzzy if-then rules were gained from human experts. Recently various methods have been suggested for automatically generating and adjusting fuzzy if-then rules without using the aid of human experts [5]. Genetic algorithms have been used as rule generation and optimization tools in the design of fuzzy rulebased systems. Those GA-based studies on the design of fuzzy rule based systems are usually referred to as fuzzy genetics-based machine

learning methods. These methods are classified into Pittsburgh or Michigan approaches. [6, 7] The Pittsburgh approach [6] is similar to the traditional GA where every individual in the population is a set of rule, representing a complete solution to the learning problem. Crossover and mutation are then applied in the usual way to create new generations of such populations. The Michigan approach [7], however, has generally used a different partial solution to the overall learning task. Each individual represents a single rule, and through cooperation with the other rules in the population, the overall problem is being solved. The proposed algorithm is a new model based on the Michigan approach. In this algorithm, the global population is divided into some subpopulations. The number of subpopulations is equal to the number of classes in the classification problem. For each subpopulation, all of the individuals represent rules with the same class in the “prediction” part of the rule according to the fact that each individual in the presented algorithm represents a rule which has the form “if condition then prediction”. We have empowered our algorithm using a heuristic local search procedure, which is based on Particle Swarm Optimization (PSO). This procedure aims to improve the quality of derived fuzzy rules by searching their neighborhoods according to some restrictions. Section 3 will discuss about the details of this procedure clearly. This paper is organized as follows: In section 2 we propose the structure of Michigan based genetic fuzzy rule learning algorithm. Section 3 presents the PSO-based heuristic local search procedure. The following section will discuss about the experimental results, which we have obtained. At the last section of the paper we derive some conclusions.

2

Evolutionary Fuzzy Rule Learning

First, let us explain about the method of coding fuzzy rules, which is used in this paper. Each fuzzy if-then rule is coded as a string. The following symbols are used for denoting the five linguistic values (Figure 1): small ( A1 ), medium small ( A2 ), medium ( A3 ), medium large ( A4 ) and large ( A5 ). For example, the following fuzzy if-then rule is coded as ( A3 , A2 , A5 , A1 ) : If

x1 is medium and x 2 is medium small and x3

is large and x 4 is small then Class

Cj

with

CF = CF j .

Figure 1: Membership functions of five linguistic values (S: small, MS: medium small, M: medium, ML: medium large, L: large). Outline of the proposed Michigan approach based IDS algorithm is as follows: Step 1: Generate an initial population of fuzzy if–then rules. (Initialization) Step 2: Generate new fuzzy if–then rules by genetic operations. (Generation) Step 3: Each individual can live according to its age number. The procedure that accomplishes this lifetime is based on a PSO heuristic algorithm. (PSO-Based Local Search) Step 3: Replace a part of the current population with the newly generated rules. (Replacement) Step 4: Terminate the algorithm if a stopping condition is satisfied, otherwise return to Step 2. (Internal termination test) Step 5: Save the best individuals of the resulted population. Terminate the whole algorithm if a maximum number of iterations is satisfied. Otherwise, go to step 1. Each step of this fuzzy classifier system is described as follows:

2.1

Initialization

Let us denote the number of fuzzy if-then rules in the population by N pop . To produce an initial population,

N pop

fuzzy if-then rules are

generated according to a random pattern in the train dataset [7]. The evolutionary fuzzy system (Figure 2) is considered for each of the classes of the

classification problem separately. One of the important benefits of this separation is that the learning system can focus on each of the classes of the classification problem. According to this fact, the mentioned random pattern is extracted according to the patterns of the training dataset, which their consequent class is the same as the class that the algorithm works on. Next, for this random pattern, we determine the most compatible combination of antecedent fuzzy sets using only the five linguistic values (Figure. 1). The compatibility of antecedent fuzzy rules with the random pattern is calculated by equation (1). µ R j ( x p ) = µ A j1 ( x p1 ) × . . . × µ A jn ( x pn ),

where µ A ji (.) is the membership function of A ji . After generating each fuzzy if-then rule, the

consequent class of this rule is determined according to (2). βClasshˆ ( R j ) = max{βClass1 ( R j ), . . . , βClass c ( R j )}. j

(2)

where,



µ j (x p ) x p ∈Class h

(3)

N Class h ,

β Class h ( R j ) is

the

sum

of

the

their corresponding class is Class h . Each of the fuzzy rules in the final classification has a certainty grade, which denotes the strength of that fuzzy rule. This number is calculated according to (4). CF j = ⎛⎜ β Class hˆ ( R j ) − β ⎞⎟ j ⎝ ⎠

c

∑β h =1

Class h ( R j )

(4)

where,

∑β

h ≠ hˆ j

Class h ( R j )

(6)

That is, the winner rule has the maximum product of the compatibility and the certainty grade CF j . The classification is rejected if no fuzzy if–then rule is compatible with the input pattern x p (i.e., µ j ( x p ) = 0 for ∀ R j ∈ S ) . The generation of each fuzzy rule is accepted only if its consequent class is the same as its corresponding random pattern class. Otherwise, the generated fuzzy rule is rejected and the rule generation process is repeated. After generation of N pop fuzzy if-then rules, the fitness value of each rule is evaluated by classifying all the given training patterns using the set of fuzzy if– then rules in the current population. The fitness value of the fuzzy if–then rule is evaluated by the following fitness function:

where,

NCP ( R j )

(7)

denotes the number of

correctly classified training patterns by rule R j .

compatibility grades of the training patterns in Class h with the fuzzy if–then rule R j and N Class h is the number of training patterns which

B=

µ ˆj ( x p ) . CF j = max {µ j ( x p ) . CF j | R j ∈ S }.

fitness( R j ) = NCP( R j )

h =1, 2, . . ., c

where

winner rule R j in S , which is determined as follows: [7]

(1)

p = 1, 2, . . ., m

β Class h ( R j ) =

When a rule set S is given, an input pattern x p = ( x p1, x p 2 ,..., x pn ) is classified by a single

(c − 1)

(5)

2.2

Generation

A pair of fuzzy if-then rules is selected from the current population to generate new fuzzy if-then rules for the next population. Each fuzzy if-then rule in the current population is selected by the tournament selection procedure. A crossover operation is applied to a selected random pair of fuzzy if–then rules with a prespecified crossover probability. Note that the selected individuals for crossover operation should be different. In computer simulations of this paper, we used the one-point crossover. After performing the crossover operation, consequent classes of the generated individuals are determined. If these classes are the same as their parent classes then the generated individuals are accepted, otherwise the crossover operation is repeated until a prespecified iteration number for each individual

that its consequent class is not the same as its parents. We call this number X repeat .

number determines. The unit of age number is in milliseconds.

With a pre-specified mutation probability, each antecedent fuzzy set of fuzzy if–then rules is randomly replaced with a different antecedent fuzzy set after the crossover operation. After performing the mutation operation, consequent class of the mutated individual is determined. If the result class is the same as the class of the individual before the mutation operation, the mutated individual is accepted; otherwise, the mutation operation is repeated until a prespecified iteration number. We call this number M repeat .

2.4

The above procedure is iterated until a prespecified number of pairs of fuzzy if-then rules are generated.

2.3

Local Search

In order to improve the classification rate of the population, we consider a lifetime for each individual of the current population. We simulate this lifetime using a local search algorithm, which is based on a Particle Swarm Optimization heuristic. Before describing the details of the PSO-based local search procedure, which will be done in section 3, it is essential to introduce the concept of age for each individual. An individual can search its neighborhood (or can live) according to its age. The age of each individual depends on its fitness. In other words, fitter individuals have greater age or opportunity to live. This rule is based on the concept of “survival of the fittest” as Darwin mentioned [6]. To accomplish this idea we determine the age of each individual using equation (8).

Age( R j ) = w age

fitness ( R j ) N Class h

(8)

where, wage denotes a weight which depends on the CPU speed of the computer that the algorithm runs on (The faster the CPU the lower the wage ). Note that the division in the equation is for normalizing the fitness of R j . According to equation (13), each individual can live and improves its quality (fitness) as much as its age

Replacement

A pre-specified number of fuzzy if–then rules in the current population are replaced with the newly generated rules. In our fuzzy classifier system, PrepR percent of the worst rules with the smallest fitness values are removed from the current population and (100 - PrepR ) percent of the newly generated fuzzy if–then rules are added.

2.5

Termination Tests

We can use any stopping condition for terminating the internal cycle of the Michigan based fuzzy classification algorithm. In computer simulations of this paper, we used the total number of generations as a stopping condition. Fuzzy rules of the final population that their fitness is not zero are stored in the Fuzzy Rule Pool (FRP). This pool is updated until a prespecified number of iterations for the external cycle of the evolutionary PSO-Based algorithm.

3

PSO Heuristic for Local Search

PSO was developed by Kennedy and Eberhart [8]. The PSO is inspired by the social behavior of a flock of migrating birds trying to reach an unknown destination. In PSO, each solution is a ‘bird’ in the flock and is referred to as a ‘particle’. A particle is analogous to a chromosome (population member) in GAs. As opposed to GAs, the evolutionary process in the PSO does not create new birds from parent ones. Rather, the birds in the population only evolve their social behavior and accordingly their movement towards a destination. Physically, this mimics a flock of birds that communicate together as they fly. Each bird looks in a specific direction, and then when communicating together, they identify the bird that is in the best location. Accordingly, each bird speeds towards the best bird using a velocity that depends on its current position. Each bird, then, investigates the search space from its new local position, and the process repeats until the flock reaches a desired destination. It is important to note that the process involves both intelligence and social

interaction so that birds learn from their own experience (local search) and also from the experience of others around them (global search). PSO is similar to the other evolutionary algorithms in that the system is initialized with a population of random solutions. Each potential solution, call particles, flies in the Ddimensional problem space with a velocity, which is dynamically adjusted according to the flying experience of its own and its colleagues. The location of the ith particle is represented as X id . The best previous position ( which giving the best fitness value) of the ith particle is recorded and represented as Pid . which is also called pbest . The index of the best pbest among all the particles is represented by symbol g. The location Pgd is also called gbest . The velocity for ith particle is represented as Vid . PSO is initialized with a population of random solutions of the objective function. The position of each particle is updated by a new velocity calculated through (9) and (10): Vid ← w *Vid + c1 *η1 * ( Pid − X id ) + c 2 *η 2 * ( Pgd − X id )

X id ← X id + Vid

(9)

(10)

This section will present the PSO algorithm, which is used in the structure of the main evolutionary fuzzy system as a local search procedure.

3.1

Before presenting the PSO-Based Local Search algorithm, let us introduce the RAM concept, which we have used in this algorithm. Consider a fuzzy rule of the population with n antecedents, (11)

Here we define Rule Antecedent Modification (RAM) Operator, RAM (i, A j ) as replacing the current linguistic value of the rule i th precondition with j th linguistic value. Then we define a new fuzzy rule by using the RAMOperator according to (12).

(12)

Therefore, the plus sign “+” in (12) has its new meaning. It can be given a concrete example: Suppose there is a classification problem with five inputs, equation (13) shows a sample fuzzy rule for this problem:

R = ( S , ML, L, MS , L)

(13)

The RAM-Operator is RAM (5, A2 ) . The result is demonstrated in equation (14). R ′ = R + RAM (5, A2 ) = ( S , ML, L, MS , MS )

(14)

A Rule Antecedent Modification Sequence (RAMS) is made up of one or more RAM Operators: RAMS = ( RAM 1 , RAM 2 , . . . , RAM n )

(15)

where RAM 1 , RAM 2 , . . . , RAM n are RAM operators, here the order of the RAM Operators in RAMS is important. RAM Sequence acting on a solution means all the RAM Operators of the RAM Sequence act on the solution in order. This can be described by the following formula: R ′ = R + RAMS = R + ( RAM 1 , RAM 2 ,..., RAM n )

= (...(( R + RAM 1 ) + RAM 2 ) + ... + RAM n )

RAM Concept

R = (a i ) , i = 1,2,. . ., n

R ′ = R + RAM (i, A j )

(16)

Different RAM Sequences acting on the same solution may produce the same new solution. All these RAM Sequences are named the equivalent set of RAM Sequences. In the equivalent set, the sequence which has the least RAM Operator is called Basic RAM Sequence of the set or Basic RAM Sequence (BRAMS) in short. Several RAM Sequences can be merged into a new RAM Sequence; we define the operator ⊕ as merging two RAM Sequences into a new RAM Sequence. Suppose there are two RAM Sequences, RAMS1 and RAMS 2 , RAMS1 and RAMS 2 act on solution R in order, namely RAMs1 first, RAMS 2 second, we get a new solution R ′ , and there is another RAM Sequence RAMS ′ acting on the same solution

R , then get the Same solution R ′ , described as

follows: RAM S ′ = RAMS 1 ⊕ RAMS 2

(17)

RAMS ′ and RAMS1 ⊕ RAMS 2 are in the same

equivalent set. Suppose there are two solutions, R A and R B , and our task is to construct a Basic RAM Sequence BRAMS B → A which can act on R B to get solution R A , we define: BRAMS B → A = R A − R B

From here we can see that the bigger the value of α the greater the influence of Pid is (more RAM Operators in Pid − X id will be maintained). It is also the same as β * ( Pid − X id ) . Outline of the PSO-Based Local Search is presented in figure 2.

Step 1

Initialization: each of the particles positioned according to the current rule. The initial velocity of each particle is determined randomly (a random RAM operator).

(18) Step 2

Here the sign “-“ also has its new meaning. We can modify the nodes in R B according to R A from left to right to get BRAMS B → A . So there must be an equation R A = R B + BRAMS B → A .

If the lifetime of the current rule is finished, go to Step 5. Step 3

For all the particles in position X id , calculate the next position X id′ as follows:

For example, two solutions are presented in (19) and (20) R A = ( L, MS , L, S , S )

(19)

R B = ( L, L, ML, S , MS )

(20)

3.1 Calculate difference between Pid and X id , according to the method which is proposed, BRAMS A = Pid − X id . BRAMS A is a basic sequence. 3.2 Calculate BRAMSB = Pgd − X id which BRAMS B is also a basic sequence.

The BRAMS B → A is calculated as follows:

3.3 Calculate the velocity Vid according to equation (22), and then transform RAM Sequence Vid to a Basic RAM Sequence.

BRAMS B → A = ( RAM (2, A2 ), RAM (3, A5 ), RAM (5, A1 ))

3.4 Calculate the new solution using equation (16).

(21)

3.2

3.5 Update Pid if the new solution is superior

PSO-Based Local Search

to Pid .

Formula (9) has already no longer been suitable for our task, which is rule extraction for a classification problem. We update it as follows:

Step 4

Update Pgd if there is new best solution which is superior to Pgd . GOTO Step 2.

Vid ← Vid ⊕ α * ( Pid − X id ) ⊕ β * ( Pgd − X id )

(22)

where α and β are random numbers between 0 and 1. α * ( Pid − X id ) means all RAM Operators in Basic RAM Sequence Pid − X id should be maintained with the probability of α , it is the same as β * ( Pid − X id ) .

Step 5

Sketch the global best rule as the result of local search procedure.

Figure 2: Outline of the PSO-Based Local Search Procedure.

4

Experimental Results

Experiments were carried out using the 1998 DARPA intrusion detection evaluation data set prepared and managed by MIT Lincoln Labs. We used the subset of this dataset that was preprocessed by the Columbia University and distributed as part of the UCI KDD Archive [9]. The available database is made up of a large number of network connections related to normal and malicious traffic. Each connection is represented with a 41-dimensional feature vector. Connections are also labeled as belonging to one out of five classes, i.e., normal traffic, Denial of Service (DOS) attacks, Remote to Local (R2L) attacks, User to Root (U2R) attacks, and Probing attacks (PRB). Features were linearly normalized so that they took values within the range [0, 1]. The training data set contains 650 randomly generated points from the five classes, with the number of data from each class proportional to its size, except that the smallest class (U2R) is completely included. A different randomly selected set of 9600 points is used for testing the proposed fuzzy classifier system. This section will compare the performance of the proposed PSO-Based Evolutionary Fuzzy System with other approaches for the intrusion detection problem. Table I presents parameter settings for the PSOBased Evolutionary Fuzzy System, which is used in this paper. TABLE I Parameters specification in computer simulations Parameter population size (

N pop )

Value 20

crossover probability ( Pc )

90

mutation probability ( Pm )

10

age weight parameter ( w age )

10000

number of particles ( N particle )

20

replacement percentage ( Prep )

20

stopping condition of the internal algorithm cycle

100

stopping condition of the external algorithm cycle

4

The effect of PSO-Based Local Search procedure can be investigated according to figures 3 and 4. According to these figures, we can comprehend that our proposed local search procedure is capable of improving the performance of fuzzy classifier system when it is applied on strong enough rules. However, performing this procedure only on very strong rules will reduce the procedure efficiency. The classification accuracy of our hybrid algorithm is shown in Table II along with the performance achieved by a simple evolutionary fuzzy system (without local search step) and some of the other algorithms (EFRID [3], Winner entry [10]). As it is clear, the Evolutionary Fuzzy Systems (with or without local search procedure) are completely competitive to other approaches for intrusion detection. Furthermore, by comparing the classification accuracies of the first two rows of Table II, the hybrid evolutionary fuzzy system with PSO-Based local search outperforms the simple evolutionary system. Thus Table III compares the performance of proposed evolutionary algorithms against other methods (EFRID [3] and RIPPER [11]). Our approach resulted in both high detection rate and low false alarm rate. According to Table III we can conclude that using evolutionary fuzzy systems resulted in a more reliable intrusion detection system.

5

Conclusions

In this paper, a novel PSO-Based Local Search procedure is introduced. The proposed local search procedure is used in the structure of a Michigan based evolutionary fuzzy system. The capability of the resulted hybrid fuzzy system is investigated according to the intrusion detection classification problem. Computer simulations on DARPA datasets demonstrate high performance of PSO-Based evolutionary fuzzy system for intrusion detection. Our future work is to consider the comprehensibility of the final detection system as another objective for our classification problem. To accomplish this objective we should minimize number of antecedents of each fuzzy rule while maximizing the classification rate of it.

100

90

90

80

80

70 NORMAL U2R R2L DOS PRB

60

50

Classification Rate

Classification Rate

100

70 NORMAL U2R R2L DOS PRB

60

50

40

40 0

20

40

60

80

100

0

20

Figure 3: Evolution without the local search step

40

60

80

Generation

Generation

Figure 4: Evolution with the Local Search step

TABLE II

TABLE III

COMPARISON OF SOME ALGORITHMS

PERFORMANCE COMPARISON

CLASS

Algorithm

Detection Rate

False Alarm Rate

Algorithm

NORMAL

U2R

R2L

DOS

PRB

PSO+EFS

94.15

1.1

PSO+EFS

90.8

49.1

79.4

97.5

81.2

Simple EFS

88.13

0.11

Simple EFS

99.2

44

79.4

92.9

68.2

EFRID

98.15

7.0

EFRID

92.78

88.13

7.41

98.91

50.35

Winner Entry

94.5

13.2

8.4

97.1

83.3

RIPPER-Artificial Anomalies

94.26

2.02

References [1] J. Allen, A. Christie, W. Fithen, J. McHugh, J. Pickel, and E. Stoner. (1999). State of the practice of intrusion detection technologies. Technical Report CMU/SEI99 -TR-028. ESC-99028, Carnegie Mellon, Software Engineering Institute, Pittsburgh Pennsylvania, 1999. [2] S. Axelsson (2000). Intrusion detection systems: A survey and taxonomy. Technical Report No 99-15, Dept. of Computer Engineering, Chalmers University of Technology, Sweden, March 2000. [3] J. Gomez, and D. Dasgupta, "Evolving Fuzzy Classifies for Intrusion Detection," Proceedings of the 2002 IEEE Information Assurance Workshop, Jun. 2001. [4] M. Saniee Abadeh, J. Habibi, and C. Lucas, “Intrusion Detection Using a Fuzzy GeneticsBased Learning Algorithm,” Journal of Network and Computer Applications, In Press, 2005. [5] C. L. Karr, and E. J. Gentry, “Fuzzy control of pH using genetic algorithms,” IEEE Trans. on Fuzzy Systems, vol. 1, pp. 46-53, February, 1993.

[6] B. Carse, T.C. Fogarty, and A. Muntro, “Evolving fuzzy rule based controllers using genetic algorithms,” Fuzzy Sets and Systems, vol. 80, no. 3, 3, pp. 273-293, June, 1996. [7] H. Ishibuchi, and T. Nakashima, "Improving the Performance of Fuzzy Classifier Systems for Pattern Classification Problems with Continuous Attributes", IEEE Transactions on Industrial Electronics, vol. 46, no. 6, December 1999 [8] Kennedy J, Eberhart R. Particle swarm optimization. Proceedings of the IEEE international conference on neural networks (Perth, Australia), 1942–1948. Piscataway, NJ: IEEE Service Center; 1995. [9] KDD-cup data set: http://kdd.ics.uci.edu/databases/kddcup99/ kddcup99.html. [10] C. Elkan, “Results of the KDD_99 classifier learning,” ACM SIGKDD Explorations 1, pp. 63–64, 2000. [11] W. Fan, W. Lee, M. Miller, S. J. Stolfo, and P. K. Chan, “Using artificial anomalies to detect unknown and know network intrusions,” Proceedings of the First IEEE International Conference on Data Mining, 2001.

100