
Approximate Packet Pre-filtering to Accelerate Pattern Matching

Benfano Soewito and Ning Weng
Department of Electrical and Computer Engineering
Southern Illinois University, Carbondale, Illinois, USA

Abstract

Intrusion detection systems are a promising technique for improving Internet security. A daunting challenge in designing such a system is the need to match hundreds to thousands of attack patterns simultaneously at full wire speed. This paper presents a novel scheme that accelerates pattern matching by placing a prefilter in front of the exact pattern matching engines. The prefilter serves as a fast path for the majority of incoming packets, which dramatically reduces the workload of the exact pattern matching engines. The prefilter checks each packet based on its header and content. To reduce matching complexity, it uses a set of representatives that is much smaller than the set of patterns. The prefilter is free of false negatives but admits false positives, whose rate can be reduced by increasing the representative length. Experimental results show that the prefilter improves system throughput on the order of 100 times.

Keywords: Intrusion Detection System, Pattern Matching, Signature Recognition Systems, Network Security, System Optimization

1 INTRODUCTION

Network Intrusion Detection Systems (NIDS) [12], [18] are among the most promising techniques for securing the Internet. The heart of a signature-based NIDS is a string matching engine, which identifies suspicious activities by comparing network packets against predefined patterns. These patterns are defined as a set of rules; Snort version 2.4 defines 3305 such rules. Each rule consists of two types of strings to be matched: header strings at fixed positions in the packet header (e.g., source/destination network address and source/destination port number), and payload strings at non-deterministic positions in the packet payload (e.g., network worms and computer viruses). A suspicious activity is detected when the header strings and at least one of the payload strings are matched in the packet.

With network traffic at tens of gigabits per second and thousands of possible attack rules, this simple string matching engine can become the bottleneck of an NIDS. Further, because the starting position of a payload string is not known in advance, every byte of a packet must be scanned. Existing software-based NIDS can barely keep up with data rates of a few hundred megabits per second. Hence, different hardware approaches [2], [5], [7], [9] have been proposed for this difficult problem. However, they either lack performance or scalability with respect to traffic rate and number of attack rules, or are too complicated to design and operate.

To address these concerns, this paper proposes a simple but efficient architecture based on a scalable high-speed filter that pre-scans the packets. As shown in Figure 1, a packet entering the NIDS is first processed by the high-speed filter (whose architecture is shown in Figure 2) and is further processed by a verifier if necessary. Based on the decision, the packet is either discarded or forwarded. The key components are the filter and the verifier. The filter classifies each incoming packet into one of three categories: malicious, suspected, or benign. The verifier is the second stage; it determines whether a suspected packet is actually malicious. If a suspected packet turns out to be benign, it is a false positive. Choosing the representatives used to construct the high-speed filter therefore has to balance the false positive rate against the amount of work off-loaded from the verifiers. The verifier design is purely a hardware optimization problem and is beyond the scope of this paper.
However, our proposed high-speed filter can be integrated with any intrusion detection engine that performs exact string matching to verify the suspected packets.

Figure 1. Two-stage intrusion detection system architecture. (Blocks: incoming packets, decoder, filter, verifiers 1 to N, discarder, forwarder. Filter outputs: malicious, suspected, benign.)

The key components of the filter are a header checker and a payload filter, as shown in Figure 2. The header checker scans the packet header at exact positions in the packet. The payload filter performs string matching at every byte of the packet payload, so its throughput plays a vital role in the overall performance of the high-speed filter. The payload is scanned and matched against partial patterns of the Snort patterns. The selected partial patterns must together represent all of the Snort patterns, and one partial pattern can represent more than one pattern. Because the payload is matched only against these representatives, and the representatives are short, the payload filter scans the packet payload very quickly.

The advantages of our approach are that it (i) reduces the number of patterns to be matched, (ii) quickly filters out benign packets, (iii) identifies the pattern to be matched, (iv) points to the position of the pattern in the suspected packet, and (v) reduces the number of packets to be verified. The soundness of this architecture rests on the following observations from Snort 2.4, which is used in our experiments: there are 329 unique header rules; 172 rules have header strings only; the maximum number of payload strings for a particular group of header strings is 97; and most packets (85%) are benign [3].

The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 presents our classifier architecture. Section 4 shows our experimental results, and Section 5 concludes the paper.

2 RELATED WORK

The most notable string matching algorithms, including Boyer-Moore [4], Aho-Corasick [1], Commentz-Walter [6], and Wu-Manber [19], are either optimized for average-case performance or need a large number of memory accesses; hence they are suitable only for software implementation. Tuck et al. [17] extended the worst-case bound of Aho-Corasick using bitmap compression and path compression to reduce the amount of memory needed. This algorithm is very fast; however, it requires an excessively large memory bus to eliminate memory access bottlenecks.

Recently, interesting hardware-optimized techniques have been proposed for string matching. These hardware-based techniques employ commodity technologies such as Bloom filters [7], network processors [8], TCAM [20], and FPGAs [14], [16], [7], [2]. The Bloom filter is a powerful technique for quickly isolating potentially malicious packets. However, large, fast on-chip memories are required to implement multiple Bloom filters and to reduce their well-known high false positive rate. Meanwhile, network processors (NPs), programmable multiprocessors optimized for packet processing, have been evaluated for multiple string matching. These NP-based approaches use the hardware hashing engines provided by most NPs. However, their performance is not scalable, because NPs are designed for simple general-purpose packet processing and have relatively small on-chip memories. TCAM is very fast and particularly suitable for wildcard patterns; however, it suffers from excessive power consumption and high cost.

Our work is closest to [10] and [15]. In [10], the speed-up depends on the step size, which cannot be longer than the window size. Increasing the speed-up requires a larger window and therefore longer fingerprints, which means the method does not work efficiently with patterns shorter than 8 bytes. In [15], the authors used an FPGA for pre-filtering, which is known to exhaust most of the chip resources easily. In addition, the pre-filtering cost grows with the prefix length, because in that work the n-byte prefix of each pattern in all Snort rules is used for pre-filtering. In our work, the partial patterns used for pre-filtering are not the prefixes of the Snort patterns. Instead, each partial pattern is chosen from the n-byte segments that occur most frequently across the patterns, so one partial pattern can serve as the representative of more than one pattern. This reduces the number of patterns to be matched in our filter. Our filter uses a software approach, constructing a finite state machine (similar to Aho-Corasick) from the chosen partial patterns. With this method, the filtering load does not increase as the partial pattern length increases.
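As a rough illustration of a finite state machine built from partial patterns, the sketch below constructs a textbook Aho-Corasick automaton over a few representatives and reports where they occur in a payload. It is not the authors' implementation; the function names `build_automaton` and `scan` and the sample representatives are assumptions made for this example.

```python
from collections import deque


def build_automaton(representatives):
    """Build Aho-Corasick goto/fail/output tables for a list of byte strings."""
    goto = [{}]   # goto[state][byte] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # representatives recognized on entering each state
    for rep in representatives:
        state = 0
        for byte in rep:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state].append(rep)
    # Breadth-first pass: depth-1 states keep fail = 0, deeper states
    # inherit the failure link and outputs of their longest proper suffix.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def scan(payload, automaton):
    """Return (start_offset, representative) for every match in the payload."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for pos, byte in enumerate(payload):
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        for rep in out[state]:
            hits.append((pos - len(rep) + 1, rep))
    return hits


if __name__ == "__main__":
    automaton = build_automaton([b"in", b"un"])
    print(scan(b"GET /bin/uname", automaton))  # -> [(6, b'in'), (9, b'un')]
```

Each payload byte causes one forward transition plus a bounded number of failure transitions, which is why the scan cost per byte does not grow with the number of representatives. The reported offsets correspond to the position information that a verifier could use as its starting point.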

3 APPROXIMATE PACKET FILTER

The filter that we consider has six dimensions: source and destination addresses, which require exact or prefix matches; source and destination ports, which require exact or range matches; a protocol number, which requires an exact match; and content, which requires an approximate match. Compared with previous work, our filter is expected to be scalable and high-performance, to provide a low false positive rate, and to be easily updated.
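The five header dimensions can be checked with ordinary prefix, range, and equality tests. The sketch below is one possible way to express such a check using Python's standard `ipaddress` module; the `HeaderRule` class, its field names, and the sample rule are hypothetical and are not taken from Snort or from the paper. The sixth dimension, content, is handled separately by the payload checker.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class HeaderRule:
    src_net: str      # prefix such as "10.0.0.0/8"; a /32 prefix is an exact match
    dst_net: str
    src_ports: range  # inclusive range [lo, hi] written as range(lo, hi + 1)
    dst_ports: range
    protocol: int     # IP protocol number, e.g. 6 for TCP, 17 for UDP


def header_matches(rule, src, dst, sport, dport, proto):
    """True if all five header fields satisfy the rule."""
    return (
        proto == rule.protocol                            # exact match
        and ip_address(src) in ip_network(rule.src_net)   # prefix match
        and ip_address(dst) in ip_network(rule.dst_net)   # prefix match
        and sport in rule.src_ports                       # range match
        and dport in rule.dst_ports                       # range match
    )


if __name__ == "__main__":
    # Hypothetical rule: any source, TCP port 80 on 192.168.1.0/24.
    rule = HeaderRule("0.0.0.0/0", "192.168.1.0/24",
                      range(0, 65536), range(80, 81), 6)
    print(header_matches(rule, "8.8.8.8", "192.168.1.20", 40000, 80, 6))
    print(header_matches(rule, "8.8.8.8", "192.168.1.20", 40000, 443, 6))
```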

Table 1. Filter processing cost

                 Match                                  Non Match
Length of Rep    Inst   Packet Memory   Data Memory     Inst   Packet Memory   Data Memory
1                1100   49              416             1047   48              398
2                1131   51              429             1068   49              406
4                1193   55              455             1110   51              422
6                1255   59              481             1152   53              438
8                1317   63              507             1194   55              454

Figure 2. Approximate packet filter

3.1 Filter Architecture

The filter architecture for the proposed NIDS is shown in Figure 2. A packet entering the filter is classified, based on its content, into one of three categories: benign, malicious, or suspected. The key components of the filter are the header checker and the payload checker. The header checker examines the header information of the incoming packet; if no match is found, the packet is forwarded to the network. If the header information matches the rule set, the matching rule determines whether the packet is discarded directly or sent to the payload checker, which scans the payload byte by byte against the signatures. In this process, we use parts of the signatures, called pattern representatives, to speed up processing. Based on the result of the payload checker, the packet is either discarded or forwarded to the verifier for further investigation.

3.2 Pattern Representatives Selection

The payload filter achieves its high speed by matching the payload against only short segments of the Snort rule signatures. Comparing incoming packets with segments of the rules rather than the full rules reduces the number of operations the filter performs and hence leads to high-speed performance. In this paper, we refer to the rule segments used by the filter for matching as representatives. For a given set of patterns, the choice of representatives, the number of representatives R, and the representative length r all potentially affect the performance of the filter. Experimental results on this subject are presented in Section 4. In the following discussion, we explain how to select the rule representatives for given R and r values.

For a given pattern and a selected r value, we can use either its r-byte prefix or any r-byte segment contained in the pattern as the representative. For example, if the pattern is /bin and r is 2, the representative is /b in the former case and one of the strings /b, bi, in in the latter case. If the pattern is shorter than r, the entire pattern is used as the representative. In this work, we use the second approach, which gives us more freedom to optimize the selection of representatives.

Once the representative candidates of each pattern have been generated, the next step is to select a set of representatives that covers all the patterns, meaning that each pattern is represented (covered) by at least one representative. The optimization goal in this step is to cover all the patterns with the minimum number of representatives. For example, consider the following four patterns:

a) /bin
b) invalid
c) /booking
d) uname

If r is 2, their representative candidates are:

a) /b, bi, in
b) in, nv, va, al, li, id
c) /b, bo, oo, ok, ki, in, ng
d) un, na, am, me

From these candidates, we can select in to cover patterns a), b), and c), and any one of the candidates of pattern d) (un, na, am, or me) to cover pattern d). This gives a solution that covers all four patterns with two representatives. In our work, a greedy algorithm is used to search for a near-minimal set of representatives. The selection of representatives for length r does not depend on the selection for length r-1; the representatives chosen for length r may be entirely different from those chosen for length r-1.
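The selection step described above is an instance of set cover, and a greedy heuristic for it can be sketched in a few lines. The code below is an illustrative sketch rather than the authors' exact procedure: the function names and the tie-breaking rule (lexicographic order) are assumptions, and a greedy choice is not guaranteed to find the true minimum in general.

```python
def candidates(pattern, r):
    """All r-byte segments of a pattern; the whole pattern if it is not longer than r."""
    if len(pattern) <= r:
        return {pattern}
    return {pattern[i:i + r] for i in range(len(pattern) - r + 1)}


def select_representatives(patterns, r):
    """Greedily pick representatives until every pattern is covered."""
    cover = {}  # candidate -> set of patterns that contain it
    for p in patterns:
        for c in candidates(p, r):
            cover.setdefault(c, set()).add(p)
    uncovered = set(patterns)
    chosen = []
    while uncovered:
        # Candidate covering the most still-uncovered patterns;
        # ties go to the lexicographically smallest candidate.
        best = max(sorted(cover), key=lambda c: len(cover[c] & uncovered))
        chosen.append(best)
        uncovered -= cover[best]
    return chosen


if __name__ == "__main__":
    patterns = ["/bin", "invalid", "/booking", "uname"]
    print(select_representatives(patterns, 2))  # -> ['in', 'am']
```

On the four example patterns the first pick is in, which covers a), b), and c); the second pick is one of the candidates of d), here am because of the tie-breaking rule. Any of un, na, am, or me would give an equally small cover.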

3.3 Filter Cost Characterization

An efficient payload checker is characterized by a small processing cost and a low false positive rate, although improving one may worsen the other. To find the best tradeoff, both the processing cost and the false positive rate must be evaluated quantitatively.

The processing cost of scanning the payload can be characterized by the number of instructions and the number of memory accesses needed to process one byte of a packet. Table 1 shows the results of simulating the Aho-Corasick algorithm for different representative lengths, in both matching and non-matching scenarios. The processing cost depends on the length of the representatives and on the matching probability (for a single packet, it is either 0 or 1); however, it is independent of the number of representatives. The number of representatives affects the storage requirement, which we ignore here. Both the number of instructions and the number of memory accesses are processing costs per byte of packet and per unit length of representative, and can be expressed empirically as

\[ I(\rho, r) = \alpha_{ins} + \beta_{ins} \cdot r \cdot \rho \tag{1} \]

\[ M(\rho, r) = \alpha_{mem} + \beta_{mem} \cdot r \cdot \rho \tag{2} \]

Here $I$ is the number of instructions and $M$ is the number of memory accesses required to process one byte of a packet, and $r$ is the length of the representatives, as explained in Section 3.2. $\alpha_{ins}$, $\beta_{ins}$, $\alpha_{mem}$, and $\beta_{mem}$ are empirical parameters whose values are determined in Section 4. The matching probability $\rho$ used in Equations 1 and 2 consists of the true matching probability $\rho_m$ and the false positive probability $\rho_f$, as expressed in Equation 3.

\[ \rho = \rho_m + \rho_f \tag{3} \]

The true matching probability depends on the packet trace and is obtained from trace profiling (we assume 10%). The false positive rate is calculated as follows. Assume that each of the $n$ characters of the alphabet is equally likely to appear in the packet; then the false positive rate of the filter can be expressed as

\[ \rho_f = 1 - \cdots \tag{4} \]

[Only the leading term of Equation 4 is legible in this copy. The remaining fragments indicate a sum over i = 1, ..., P of a bracketed term raised to the power p_i, involving R and (1/n)^r.]

$P$ is the number of unique patterns, $n$ is the alphabet size, $p_i$ is the length of pattern $i$, and $R$ is the number of unique representatives of length $r$. The probability of a false positive increases as $R$ increases and decreases as $r$ increases. For a given group of patterns and a fixed representative length $r$, the false positive rate is worst when the number of representatives $R$ equals the number of patterns $P$. This worst case may require more storage; however, it is always achievable. The false positive results are presented in Section 4.

Table 2. False positive in random scenario

                 Ratio of number of representatives over number of patterns
Length of rep.   8%       9%       10%      12%      15%      20%
2                99.96%   99.96%   99.96%   99.97%   99.98%   99.98%
3                98.17%   98.36%   98.52%   98.72%   98.98%   99.23%
4                24.96%   32.94%   39.39%   47.47%   57.98%   68.48%
5                20.04%   28.54%   35.42%   44.03%   55.22%   66.42%
6                14.43%   23.53%   30.89%   40.10%   52.08%   64.06%
7                7.98%    17.77%   25.67%   35.58%   48.47%   61.35%
8                0.47%    11.06%   19.61%   30.33%   44.26%   58.20%

Table 3. False positive special snort rules

Length of Rep.   Number of Rep.   Number of match   FP Experiment   FP Expected
2                12               4,454,886         99.77%          99.78%
3                18               261,504           96.08%          93.90%
4                20               24,638            54.42%          24.95%
5                21               11,000            6.87%           0.80%
6                23               10,245            0.01%           0.02%
8                23               10,244            0.00%           0.00%
19               23               10,244            0.00%           0.00%

Table 4. Empirical parameters

α_ins   β_ins   α_mem   β_mem
21      10      9       6
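Equations 1 to 3 can be evaluated directly once the empirical parameters are known. The sketch below plugs in the Table 4 values. The function names and the sample false positive probability are assumptions made for illustration; only the 10% true matching probability comes from the text.

```python
# Empirical parameters from Table 4.
ALPHA_INS, BETA_INS = 21, 10
ALPHA_MEM, BETA_MEM = 9, 6


def matching_probability(rho_m, rho_f):
    """Equation 3: true matching probability plus false positive probability."""
    return rho_m + rho_f


def instructions(rho, r):
    """Equation 1: instructions needed to process one byte of a packet."""
    return ALPHA_INS + BETA_INS * r * rho


def memory_accesses(rho, r):
    """Equation 2: memory accesses needed to process one byte of a packet."""
    return ALPHA_MEM + BETA_MEM * r * rho


if __name__ == "__main__":
    rho_m = 0.10   # true matching probability assumed in the text
    rho_f = 0.05   # illustrative value only, not taken from the paper
    rho = matching_probability(rho_m, rho_f)
    for r in (2, 4, 6, 8):
        print(r, round(instructions(rho, r), 2), round(memory_accesses(rho, r), 2))
```

With rho fixed, both costs grow linearly in r, which is the cost side of the tradeoff; the false positive side enters through rho_f, which shrinks as r grows.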

4 RESULTS

This section presents experimental results for the packet filter with the partial pattern technique. First, we used the ddos rule set of the Snort rules to evaluate Equations 1, 2, 3, and 4. Second, we selected the web-coldfusion rule set of the Snort rules for our trace experiments. This group of rules has only 35 patterns in total and its shortest pattern is 10 bytes, which makes the behavior of our filter easier to analyze. After selecting the representatives that cover all Snort patterns in this group, we build an Aho-Corasick string matching automaton as the engine that scans the packet payload. To evaluate the performance of our two-stage intrusion detection system and to analyze the throughput and effectiveness of the filter, we must feed the system a packet trace that contains the patterns. We used the packet trace (orange file) from defcon10 [13], downloaded from http://cctf.shmoo.com/data/.

Table 3 compares the experimental and expected false positive rates discussed in Section 3.3. It shows how the false positive rate changes with the representative length for a special group of Snort rules. This real group of attack patterns, from the ddos rule set, consists of 23 patterns with lengths from 3 to 19 and 41 unique characters. The second column shows the minimal number of representatives required to represent all patterns for each representative length. The third column shows the number of matches of the representatives of each length against the same randomly generated text. From the number of matches we can derive the false positive rate shown in the fourth column; the fifth column shows the expected false positive rate computed with Equation 4. Both the experimental and the expected results show that the false positive rate drops dramatically to a satisfactory value when the representative length reaches 5. Using Equations 1, 2, and 3 and the data from Table 1 and Table 3, we calculated the empirical parameters shown in Table 4. The data in Table 1 was generated using packet simulation tools [11].

Figure 3. Increasing throughput normalized to non filter approach. (x-axis: length of representatives; y-axis: throughput speed up; traces: orange1.5, orange2.6, orange3.3.)

Figure 4. Percentage of representative to cover all patterns. (x-axis: length of representatives; y-axis: # representatives over # patterns; rule sets: chat.rule, smtp.rule, web-client.rule, web-iis.rule, web-coldfusion.rule.)

Table 2 shows the relation between the false positive rate and the representative length in a special distribution scenario in which the pattern length ranges from 4 to 51: 80% of the 8000 patterns have lengths randomly distributed from 4 to 16, and the remaining 20% have lengths randomly distributed from 17 to 51. The false positive rate is calculated using Equation 4. Since it is difficult to determine the number of representatives R in this random scenario, we show a group of results with R varying from 8% to 20% of the number of patterns.

We selected three traces from the orange file in defcon10 to evaluate the packet filter. The number of matches and the false positive rate are shown in Table 5. The packet filter can dramatically reduce the number of rules or strings to be matched. For example, with r = 5, the number of representatives R is 9, compared with the actual number of patterns, which is 30. Table 5 also shows how the number of matched pattern representatives varies with the representative length, using trace files from defcon10 and patterns from the web-coldfusion rule set (a subfile of Snort 2.4). R is the number of pattern representatives and r is their length.
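The reported false positive rates are consistent with treating the matches that remain at the longest representative length as true matches and counting every additional match at a shorter length as a false positive. A minimal sketch under that assumption follows; the function name `false_positive_rate` is illustrative, and the inputs are entries from Table 3 and Table 5.

```python
def false_positive_rate(matches, true_matches):
    """Fraction of filter matches that are not true matches."""
    if matches == 0:
        return 0.0
    return (matches - true_matches) / matches


if __name__ == "__main__":
    # Table 3 (ddos rules): 10,244 matches remain at r = 19.
    print(f"{false_positive_rate(4_454_886, 10_244):.2%}")  # r = 2 -> 99.77%
    print(f"{false_positive_rate(11_000, 10_244):.2%}")     # r = 5 -> 6.87%
    # Table 5 (trace Orange1.5): 49 matches remain at r = 47.
    print(f"{false_positive_rate(746, 49):.3f}")            # r = 2 -> 0.934
```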

The number of matches for r=2 on trace file Orange1.5 is 746, against 49 for r=47. The number of false positives is therefore 746 - 49 = 697 matches, a false positive rate of 697/746 = 0.934, in a trace of 21986 packets. The number of matched pattern representatives is not determined by the length of the representatives alone; it depends on the payload of the trace file, which is unpredictable.

Throughput is the parameter that measures the effectiveness of our filter; it is shown in Figure 3. We compared the number of packets to be verified in the system with the filter against the system without the filter. The throughput speed-up is derived as follows. Trace orange1.5 has 21986 packets. After filtering with r=4, the number of packets that have to be verified is 66, so the system speed-up is 21986/66 = 333 times. The false positive rate and the throughput speed-up both depend on the trace file, but the two properties are independent of each other. For example, for r=4, the false positive rate is 0.29 with trace orange1.5 and 0.797 with trace orange2.6. Even though the false positive rate of trace orange2.6 is higher than that of trace orange1.5, the throughput speed-up for trace orange2.6 is 119155/94 = 1267 times, which is higher than the speed-up for trace orange1.5 (333 times), as shown in Figure 3.

The overall performance of our two-stage intrusion detection system can be optimized by evaluating the tradeoff between the number of representatives and the false positive rate with respect to the representative length. Figure 4 shows that the longer the representative length r, the larger the number of representatives R. If we set r to the length of the longest pattern, the classifier becomes an exact string matching engine, which is not our purpose. The purpose of our high-speed filter is to filter suspicious and malicious packets as fast as possible. On the other hand, if we make the representative very short, the false positive rate becomes higher, as shown in Figure 5. With r=2, the false positive rate is above 0.9. There is thus a tradeoff between false positives and the number of representatives.

Table 5. Number of patterns matched with different traces in Defcon10, for web-coldfusion.rule (30 patterns; min 10 bytes; max 47 bytes). r and R are the length and the number of representatives. For each trace, the table gives the number of matches and the false positive rate (FP). The longer the length of representative, the smaller the number of matching and the lower the false positive.

             Orange1.5            Orange2.6            Orange3.3
             (21986 packets)      (119155 packets)     (121012 packets)
r     R      Matches   FP         Matches   FP         Matches   FP
2     3      746       0.934      1280      0.975      3196      0.988
3     4      169       0.710      368       0.913      1068      0.963
4     5      69        0.290      158       0.797      240       0.837
5     9      97        0.495      198       0.838      221       0.824
6     9      97        0.495      180       0.822      192       0.797
7     9      97        0.495      178       0.820      189       0.794
8     10     78        0.372      69        0.536      145       0.731
9     16     73        0.329      69        0.536      144       0.729
10    16     73        0.329      62        0.484      79        0.506
16    21     70        0.300      38        0.158      57        0.316
19    28     58        0.155      32        0.000      48        0.187
47    30     49        0.000      32        0.000      39        0.000

Figure 5. False positive rate decreasing with representative length increasing. (x-axis: length of representatives; y-axis: false positive rate; traces: orange1.5, orange2.6, orange3.3.)

5 CONCLUSION

In this work, a high-throughput string matching system is presented. The use of a packet filter can significantly reduce the verifier workload and hence improve system throughput. Selecting pattern representatives can reduce the number of patterns to be compared in the filter by more than 50%, depending on the length of the representatives. False positives have to be considered when selecting the representative length. The high-speed filter can also point to the exact position of the suspected signature in the packet payload that needs to be verified. The filter processing cost for the payload depends on r (Equations 1 and 2): the longer r is, the higher the processing cost. At the same time, a shorter r leads to more false positives, which must also be taken into account. The filter in our two-stage intrusion detection system can dramatically reduce the number of packets to be processed by the exact string matching engine. It can also specify the position in the packet payload of the suspected string to be matched against the pattern. Reducing the number of packets to be matched improves the performance of the overall system. We claim a speed-up in the range of hundreds to thousands of times.

REFERENCES

[1] A. Aho and M. Corasick. Efficient string matching: An aid to bibliographic search. Communications of the ACM, 18, 1975.
[2] M. Aldwairi, T. Conte, and P. Franzon. Configurable string matching hardware for speeding up intrusion detection. SIGARCH Computer Architecture News, pages 99–107, 2005.
[3] M. Attig and J. Lockwood. Sift: Snort intrusion filter for tcp. In Proceedings of 13th Symposium on High Performance Interconnects, pages 121–127, 2005.
[4] R. S. Boyer and J. S. Moore. A fast string searching algorithm. Communications of the ACM, 20(10):762–772, 1977.
[5] L. Bu and J. A. Chandy. Fpga based network intrusion detection using content addressable memories. In FCCM ’04: Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, pages 316–317, Washington, DC, USA, 2004. IEEE Computer Society.
[6] B. Commentz-Walter. A string matching algorithm fast on the average. In Proc. of the 6th International Colloquium on Automata, Languages and Programming, volume 71, 1979.
[7] S. Dharmapurikar, P. Krishnamurthy, T. Sproull, and J. Lockwood. Deep packet inspection using parallel bloom filters. IEEE Micro, 24(1):52–61, Jan. 2004.
[8] R.-T. Liu, N.-F. Huang, C.-H. Chen, and C.-N. Kao. A fast string-matching algorithm for network processor-based intrusion detection system. Trans. on Embedded Computing Sys., 3(3):614–633, 2004.
[9] P. Piyachon and Y. Luo. Efficient memory utilization on network processors for deep packet inspection. In ANCS ’06: Proceedings of the 2006 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, pages 71–80, New York, NY, USA, 2006. ACM Press.
[10] R. Ramaswamy, L. Kencl, and G. Iannaccone. Approximate fingerprinting to accelerate pattern matching. In IMC ’06: Internet Measurement Conference, 2006.
[11] R. Ramaswamy and T. Wolf. PacketBench: A tool for workload characterization of network processing. In Proc. of IEEE 6th Annual Workshop on Workload Characterization (WWC-6), pages 42–50, Austin, TX, Oct. 2003.
[12] M. Roesch. Snort – lightweight intrusion detection for networks. In Proc. of the 13th Systems Administration Conference, 1999.
[13] The Shmoo Group. Capture the Capture the Flag Data: Defcon10. http://cctf.shmoo.com/.
[14] H. Song and J. W. Lockwood. Efficient packet classification for network intrusion detection using fpga. In FPGA ’05: Proceedings of the 2005 ACM/SIGDA 13th International Symposium on Field-Programmable Gate Arrays, pages 238–245, New York, NY, USA, 2005. ACM Press.
[15] P. Sourdis, V. Dimopoulos, D. Pnevmatikatos, and S. Vassiliadis. Packet pre-filtering for network intrusion detection. In ANCS ’06: Proceedings of the 2006 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, pages 183–192, New York, NY, USA, 2006. ACM Press.
[16] Y. Sugawara, M. Inaba, and K. Hiraki. Over 10gbps string matching mechanism for multi-stream packet scanning systems. In Lecture Notes in Computer Science, volume 3203, pages 484–493. Springer-Verlag, 2004.
[17] N. Tuck, T. Sherwood, B. Calder, and G. Varghese. Deterministic memory-efficient string matching algorithms for intrusion detection. In Proc. of the IEEE Infocom Conference, 2004.
[18] G. Varghese. Network Algorithmics: An Interdisciplinary Approach to Designing Fast Networked Devices. Morgan Kaufmann, 1st edition, 2005.
[19] S. Wu and U. Manber. A fast algorithm for multi-pattern searching. Technical Report TR94-17, Department of Computer Science, University of Arizona, 1994.
[20] F. Yu, R. H. Katz, and T. V. Lakshman. Gigabit rate packet pattern-matching using tcam. In ICNP ’04: Proceedings of the 12th IEEE International Conference on Network Protocols (ICNP’04), pages 174–183, Washington, DC, USA, 2004. IEEE Computer Society.