Int. J. Embedded Systems, Vol. 8, No. 1, 2016


Multilayer collaborative traceback technique based on net-flow fingerprint

Cheng Lei* and HongQi Zhang
Zhengzhou Information Science and Technology Institute, Zhengzhou, Henan Province 450001, China
and
Henan Provincial Key Laboratory of Information Security, Zhengzhou, Henan Province 450001, China
Email: [email protected]
Email: [email protected]
*Corresponding author

Yi Sun and XueHui Du Zhengzhou Information Science and Technology Institute, Zhengzhou, Henan Province 450001, China Email: [email protected] Email: [email protected]

XueDong Jia
Zhengzhou Information Science and Technology Institute, Zhengzhou, Henan Province 450001, China
and
Henan Provincial Key Laboratory of Information Security, Zhengzhou, Henan Province 450001, China
Email: [email protected]

Abstract: Aimed at the problems of loop fallacy, indeterminate serialisation of suspicious nodes and local overload in suspicious information source traceback in net-flow exchange, this paper proposes a multilayer collaborative traceback technique based on net-flow fingerprint (NFMCT). The traceback is divided into a controllable inter-AS layer, an intra-AS routing layer and a controllable subnet layer. Based on the characteristics of each layer, it achieves efficient suspicious path extraction in the controllable inter-AS layer by using BGP protocol properties. In the intra-AS routing layer, it solves loop fallacy by directed graph transformation, and indeterminate serialisation of suspicious nodes by a local time relationship approach. In the controllable subnet layer, it achieves precise location by using forwarding tables. Moreover, the multilayer collaborative approach improves the efficiency of suspicious path extraction and reduces local overload of traceback servers without compromising the accuracy of traceback. Finally, the correctness and computational complexity of NFMCT are proved, and the feasibility and correctness of the scheme are discussed through experiments.

Keywords: net-flow exchange; multilayer collaboration; suspicious path extraction; controllable inter-AS layer; intra-AS routing layer; controllable subnet layer.

Reference to this paper should be made as follows: Lei, C., Zhang H., Sun, Y., Du, X. and Jia, X. (2016) ‘Multilayer collaborative traceback technique based on net-flow fingerprint’, Int. J. Embedded Systems, Vol. 8, No. 1, pp.1–11.

Biographical notes: Cheng Lei received his Bachelor's degree from Zhengzhou Information Science and Technology Institute in Zhengzhou, Henan Province, China. Currently, he is a graduate student in the same school. His main research interest is computer science and technology.

HongQi Zhang received his PhD from Zhengzhou Information Science and Technology Institute in Zhengzhou, Henan Province, China. Currently, he is a Professor in the same institute. His main research interest is computer science and internet security.

Copyright © 2016 Inderscience Enterprises Ltd.


C. Lei et al.

Yi Sun received her Master's degree from Zhengzhou Information Science and Technology Institute in Zhengzhou, Henan Province, China. Currently, she is studying for a doctoral degree. Her main research interest is computer science and network security.

XueHui Du received her Doctor's degree from Zhengzhou Information Science and Technology Institute in Zhengzhou, Henan Province, China. Her main research interest is computer science and network security.

XueDong Jia received his Master's degree from Zhengzhou Information Science and Technology Institute in Zhengzhou, Henan Province, China. Currently, his main research interest is computer science and network security.

This paper is a revised and expanded version of a paper entitled ‘Multilayer collaborative traceback technique based on net-flow fingerprint’ presented at the Trusted Computing and Information Security Conference, Enshi, Hu Bei Province, 13 September 2014.

1 Introduction

With the widespread use of net-flow exchange, its security is becoming increasingly important. Whatever their method, the majority of attacks on central servers combine stepping stones and anonymous traffic techniques to hide the suspicious net-flow exchange path and the suspicious information source (Al-Qudah et al., 2014; Xiang and Zhou, 2006; Zhang et al., 2013). The root of these problems is that the net-flow identity is unknown, which makes the suspicious path untraceable. It is therefore urgent to extract suspicious exchange paths efficiently and to locate suspicious information sources accurately in net-flow exchange.

As shown in Table 1, earlier traceback research falls into three categories: traceback based on router logs (Hilgenstieler et al., 2010), traceback based on overlay networks (Yang et al., 2009) and traceback based on marking (Goodrich, 2008; Yang and Yang, 2012; Wang et al., 2010b). Traceback based on router logs can extract a suspicious path only after an attack, and it consumes a certain amount of network resources. Traceback based on overlay networks has large management expenses and high computational complexity, because the trace servers (TS) must hold the complex and mutable router-level network topology. Traceback based on marking can be subdivided into traceback based on packet marking (Goodrich, 2008) and traceback based on net-flow fingerprint (Yang and Yang, 2012; Wang et al., 2010b). Neither of them solves the loop fallacy problem effectively. In addition, packet marking is limited in marking content by the capacity of its carrier, and it suffers from fake packets and IP header substitution.

Moreover, because real-time interactive systems in e-government are characterised by packet encryption, low latency, intersection of multiple net-flows and resource constraints, suspicious path traceback in net-flow exchange should have low bandwidth consumption, the capability of tracing multiple net-flows, extensive applicability and high robustness. Extracting the suspicious path efficiently and locating the suspicious source accurately can then reduce the damage to the target server and increase the difficulty of launching an attack.

Table 1 Comparison of main traceback techniques

Categories                      Management expenses   Network expenses   Applicability               Robustness
Based on log                    High                  Low                After attacks               High
Based on overlay network        High                  High               In the process of attacks   Low
Based on packet marking         Low                   Low                Both cases                  High
Based on net-flow fingerprint   Low                   Low                Both cases                  High

2 Related work

Based on encoding and decoding net-flow fingerprint information accurately, earlier research on traceback based on net-flow fingerprint mainly solves the correlation problem of suspicious net-flows in one connection chain, which yields the correlated relationship information of suspicious nodes. By collaborating across multiple detecting points, these schemes use net-flow fingerprints to correlate suspicious nodes in order to extract the suspicious path and locate suspicious sources. Wang et al. (2002) were the first to propose suspicious net-flow correlation, with four correlation functions: MMS is sensitive to local details; STAT CPF can resist noise but has a high false negative rate; the performance of NDP1 and NDP2 lies between that of MMS and STAT CPF. Omar et al. (2004, 2008) use filtered packet characteristics to correlate suspicious nodes, but relying on only one kind of packet characteristic gives a high error rate, especially in net-flow exchange. To address this problem, Pyun et al. (2007) propose a scheme based on net-flow fingerprint, which tolerates fake packets, flow mixing and packet loss to some extent. However, it is hard to trace multiple net-flows because of conflicts among the encoded marking labels. Wang et al. (2009) propose an interval centroid watermark based on direct sequence spread spectrum. It has the advantages of low latency, the capability of decoding multiple net-flows in parallel and a low error rate. However, the existing research focuses only on detecting whether a node is a stepping stone, and overlooks the problems of loop fallacy in path extraction and of deterministic serialisation, especially when only part of the suspicious nodes correlated relationship is available, which leads to a drastic decline in the correct rate.

On the other hand, compared with router-level traceback, AS-level traceback has the following advantages:

1 Fewer hops: the maximum number of ASs in exchange is 8 hops (Afanasyev et al., 2010; Guo et al., 2011; National Laboratory for Applied Network Research, 2005), while the average number of routers is 15 and the maximum can be up to 30. Besides, unlike routers, which are numerous and unorganised, ASs are numbered by IANA (Haas and Mitchell, 2013; Carmi et al., 2007). So, the computational complexity and the load of traceback servers are much lower in AS-level traceback than in router-level traceback.

2 More stable topology: the network topology is easy to obtain, and the AS-level topology is more stable (Dimitropoulos et al., 2007, 2009) than the router-level topology. Moreover, the border gateways in the same controllable AS can readily collaborate with each other owing to their shared BGP routing information (Zhou et al., 2009; Huang et al., 2011). Hence, it is feasible and efficient to obtain and maintain the AS-level topology.

In short, aimed at the problems of slow extraction of the suspicious exchange path, partial overload, loop fallacy and high uncertainty in serialising suspicious nodes when only part of the suspicious nodes correlated relationship is available, this paper proposes a multilayer collaborative traceback technique based on net-flow fingerprint, NFMCT for short.

3 NFMCT scheme

NFMCT is based on net-flow fingerprint. By using a multilayer collaboration method, it extracts the suspicious path from the controllable inter-AS layer, the intra-AS routing layer and the controllable subnet layer collaboratively. In the controllable inter-AS layer, it constructs the AS-level topology and correlates suspicious nodes by using net-flow fingerprint. Based on the topology and correlation information, NFMCT uses the controllable inter-AS extraction algorithm to quickly extract the suspicious path without loops. In the intra-AS routing layer, it uses the intra-AS routing extraction algorithm to transform the directed graph (DG) so as to solve loop fallacy when the whole suspicious nodes correlated relationship is available, and to solve indeterminate serialisation when only part of the suspicious nodes correlated relationship is available. In the controllable subnet layer, it uses the controllable subnet extraction algorithm to locate the suspicious source. Therefore, NFMCT achieves accurate suspicious source locating and efficient path extraction, and prevents partial overload by decentralising storage and reducing computational complexity.

Figure 1 The process of NFMCT

As is seen from Figure 1, NFMCT consists of controllable routers, TS and control centre. It is assumed that the target server in ASk detects a suspicious net-flow exchange. The process of NFMCT is as follows:

1 Target server detects (Mao and Zong, 2009) the existence of a suspicious exchange net-flow and decodes fingerprint information from the net-flow.

2 Target server sends the net-flow fingerprints and an alarm message to TSk in ASk.

3 TSk verifies the net-flow fingerprints and the alarm message. It sends request messages to the controllable routers in ASk and to control centre to extract the suspicious exchange path.

4 Based on the inter-AS topology, control centre uses the controllable inter-AS extraction algorithm to extract the path in the controllable inter-AS layer.

5 After detecting that the suspicious connection chain passes through another controllable AS (ASj), control centre sends a traceback request message to TSj. At the same time, TSj sends request messages to its corresponding controllable routers.

6 After receiving request messages, controllable routers in each AS use net-flow fingerprint to correlate (Houmansadr and Borisov, 2011) suspicious nodes, and use the controllable subnet extraction algorithm to extract suspicious paths or to locate the suspicious source in the controllable subnet layer.

7 Controllable routers send suspicious paths or suspicious sources to their corresponding TSs (e.g., TSk, TSj).

8 TSs integrate the information of suspicious path subsequences within their own AS and use the intra-AS routing algorithm to extract the suspicious path in the intra-AS routing layer.

9 Each TS sends the suspicious path in its AS to control centre.

10 Control centre integrates all suspicious path subsequences in controllable ASs so as to get the whole suspicious path sequence.

11 Control centre reports the extracted suspicious path and the located suspicious source to the administrator.

Figure 2 Path produced by AS_SEQUENCE

Figure 3 Path produced by AS_SET

3.1 Controllable inter-AS extraction algorithm

Since the inter-AS topology is more stable, and border gateways in each AS share BGP routing information, constructing the inter-AS topology first helps to extract the path efficiently. Besides, the AS_PATH attribute in the BGP protocol has a mechanism to prevent loops. Hence, by constructing the inter-AS topology from BGP protocol properties, the proposed algorithm not only prevents false path extraction caused by loop fallacy, but also improves the efficiency of suspicious path extraction. The process of the controllable inter-AS extraction algorithm is as follows.

3.1.1 Inter-AS topology construction

Every border gateway keeps a neighbour table and a BGP topology table, from which TS can get the ASN (the number of the AS) and IP of its AS as well as the corresponding physical adjacent ASs. By preprocessing the AS_PATH and Next_HOP attributes first, TS judges the logical and physical adjacent ASs of its AS (e.g., ASi), and produces three tables: the external interface table, recording the ASN and IP of ASs which are physical adjacent to the external interfaces of ASi; the logical downstream table (Tdown(ASi)), recording the ASN and IP of ASs which are logical downstream adjacent to ASi; and the logical upstream table (Tup(ASj, ASi | ASj ∈ Tdown(ASi))), recording the ASN and IP of ASs which are logical upstream adjacent to each AS in Tdown(ASi) from the perspective of ASi. The AS_PATH attribute has two types:

a AS_SEQUENCE: Ordered set of ASN to a specific destination. AS_SEQUENCE indicates the ASN of controllable ASs passed by the net-flow from right to left. So, TSi will traverse AS_SEQUENCE from left to right. As is shown in Figure 2, if Tdown(ASi) = {AS7, AS3}SEQUENCE, the logical path is AS3 → AS7 → ASi. Therefore, the suspicious net-flow exchange path is AS3 → AS7 → ASi. The traceback path is ASi → AS7 → AS3.

b AS_SET: Unordered set of ASN to a specific destination. As is shown in Figure 3, if Tdown(ASi) = {AS6, AS2}SET, AS2 and AS6 are upstream adjacent to ASi judging from the property of AS_SET. The remaining elements are downstream adjacent to ASi. Therefore, the suspicious net-flow exchange path is AS6 ↔ AS2 → ASi. The traceback path is ASi → AS2 ↔ AS6.

Having obtained Tdown(ASi), TSi will communicate with TSj in each ASj ∈ Tdown(ASi). TSi will send Tup(ASj, ASi) to TSj. So the logical upstream adjacency table of ASj is Tup(ASj) = ∪ Tup(ASj, ASi), where the union is taken over every ASi with ASj ∈ Tdown(ASi). As is shown in Figure 4, control centre constructs the controllable inter-AS topology by using the three tables above held by the TSs.

Figure 4 Inter-AS topology
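The AS_PATH handling in items a and b of Section 3.1.1 can be sketched in a few lines. This is only a minimal illustration, assuming that AS numbers are given as strings and that the AS_SEQUENCE list is already in the order TSi traverses it (left to right, nearest AS first); the function names are hypothetical and do not come from the paper.

```python
from typing import FrozenSet, List, Tuple


def paths_from_as_sequence(local_as: str, as_sequence: List[str]) -> Tuple[List[str], List[str]]:
    """Return (exchange_path, traceback_path) for an ordered AS_SEQUENCE.

    The traceback path starts at the local AS and follows the sequence from
    left to right; the exchange path is the same list reversed.
    """
    traceback_path = [local_as] + list(as_sequence)
    exchange_path = list(reversed(traceback_path))
    return exchange_path, traceback_path


def group_from_as_set(local_as: str, as_set: List[str]) -> Tuple[str, FrozenSet[str]]:
    """AS_SET is unordered, so only membership is recorded, not an order."""
    return local_as, frozenset(as_set)


# Example taken from the text: Tdown(ASi) = {AS7, AS3}SEQUENCE
exchange, tb_path = paths_from_as_sequence("ASi", ["AS7", "AS3"])
print(" -> ".join(exchange))
print(" -> ".join(tb_path))
```

For an AS_SET the sketch deliberately returns an unordered group, since the text only states that its members are adjacent to ASi without fixing their relative order.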

3.1.2 Decoding net-flow fingerprint information

A net-flow is defined by five elements (Munz and Carle, 2007), which can be used to identify net-flows quickly. Besides, to guard against net-flow transformation in net-flow exchange, the schemes of Lei et al. (2013) and Houmansadr and Borisov (2011) can be used to encode and decode the net-flow fingerprint so as to ensure the accuracy of net-flow identity at controllable AS border gateways. Net-flow identification information is listed in Table 2.

Table 2 Net-flow identification information

Name       Content                               Accessing way
SendIP     Logical adjacency IP                  Fingerprint
Length     Length passing controllable AS        Fingerprint
SrcIP      IP of sending net-flow gateway        IP header
DstIP      IP of receiving net-flow gateway      IP header
SrcPort    Port of sending net-flow gateway      IP header
DstPort    Port of receiving net-flow gateway    IP header
FstSeq     Sequence of the first packet          IP header
LstSeq     Sequence of the last packet           IP header
Protocol   Protocols used in net-flow exchange   IP header
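The identification fields of Table 2 map naturally onto a small record type. The following is a hedged sketch rather than the authors' data structure: the class name and field types are assumptions, and the "five elements" are assumed here to be the usual source/destination address, source/destination port and protocol tuple. The IP addresses in the example are documentation addresses, not data from the experiments.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NetFlowRecord:
    """One row of net-flow identification information (fields as in Table 2)."""
    send_ip: str    # SendIP: logical adjacency IP, decoded from the fingerprint
    length: int     # Length: controllable ASs passed, decoded from the fingerprint
    src_ip: str     # SrcIP: IP of the sending net-flow gateway
    dst_ip: str     # DstIP: IP of the receiving net-flow gateway
    src_port: int   # SrcPort
    dst_port: int   # DstPort
    fst_seq: int    # FstSeq: sequence of the first packet
    lst_seq: int    # LstSeq: sequence of the last packet
    protocol: str   # Protocol used in the net-flow exchange

    def five_tuple(self) -> Tuple[str, str, int, int, str]:
        """Key used to identify the net-flow quickly (assumed 5-tuple)."""
        return (self.src_ip, self.dst_ip, self.src_port, self.dst_port, self.protocol)


record = NetFlowRecord("192.0.2.1", 2, "192.0.2.10", "198.51.100.7",
                       40000, 443, 1000, 1999, "TCP")
print(record.five_tuple())
```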

3.1.3 Controllable inter-AS suspicious path extraction based on fingerprint

When TSi gets the five elements of a suspicious net-flow passed from ASj, it decodes the fingerprint so as to get the first and the last packet sequence and the entrance border gateway interface (Rj–i) of ASi. Besides, TSi gets the IP (SendIP) of the sending border gateway of the controllable AS which is logical adjacent to ASi. Then, TSi sends repi–s (SendIP, Length, SrcIP, DstIP, SrcPort, DstPort, Protocol) to control centre. Control centre uses SendIP and SrcIP to judge whether the logical adjacent AS and the physical adjacent AS are the same, and extracts the path by using repi–s and the topology obtained in Section 3.1.1. If the logical adjacent AS and the physical adjacent AS are the same, ASi is physical adjacent to ASj through Rj–i. Then P′ = ASi_Rj–i + P, where P and P′ represent the suspicious controllable AS set. If the logical adjacent AS and the physical adjacent AS are not the same, P′ = ASj + ASi_Rj–i + P.

Figure 5 Controllable inter-AS layer traceback

As is shown in Figure 5, the AS_PATH attribute of the BGP protocol prevents loops from forming in the process of inter-AS path extraction. When the suspicious path passes through uncontrollable ASs, control centre can use the physical adjacency table and the logical adjacency table in the controllable AS to ensure the correctness and efficiency of path extraction. Controllable inter-AS layer traceback finishes when any one of the following occurs:

a Tup(ASi) = ∅

b Tup(ASi) ⊂ P

c no elements in Tup(ASi) are in P.

3.2 Intra-AS routing extraction algorithm

The intra-AS routing extraction algorithm is also based on fingerprint correlation. It can extract the suspicious path accurately when controllable routers are physically disjoined and when adjacent routers have no downstream or upstream IP. Regarding adjacent connection pairs as edges, it uses DG transformation to solve the problems of one router with many IPs and of loop fallacy. Besides, it uses a local time relationship approach to serialise suspicious nodes deterministically, without time synchronisation, when only part of the suspicious nodes correlated relationship is available.

Firstly, we give the definitions of DG and of the adjacent connection pair directed graph (PCA(DG)). The vertex set of DG is:

\[ V = \{ v \mid \exists \langle v, u \rangle \in C \;\cup\; \exists \langle u, v \rangle \in C \} \]

The edge set of DG is:

\[ E = \{ e \mid Start(e) \cup End(e) \in C \} \]

Start(e) represents the node initiating connection e; End(e) represents the node ending connection e. The vertex set of PCA(DG) is:

\[ V_{PCA} = \{ v \mid v = Start(e_i) \cup End(e_i) \cup End(e_j),\ \langle e_i, e_j \rangle \in PCA \} \]

The edge set of PCA(DG) is:

\[ E_{PCA} = \{ e \mid \exists \langle e, e_j \rangle \cup \langle e_i, e \rangle \in PCA \} \]

Secondly, the binary relation edge connectivity pair (ECP), denoted by ∠ECP, is defined on EPCA as follows:

1 ∀⟨ei, ej⟩ ∈ PCA, ei ∠ECP ej.

2 ∀ei, ej, ek ∈ EPCA, if ei ∠ECP ej and ej ∠ECP ek, then ei ∠ECP ek.

From the above definitions, the binary relation ECP in PCA(DG) is asymmetric and transitive. Therefore, ECP is a partial order. The serialisation problem is thus equivalent to ∠ECP being a well order. Its sufficient conditions are:

1 PCA(DG) is vertex unidirectional link: ∀ei, ej ∈ EPCA, ei ∠ECP ej ∪ ej ∠ECP ei. The edge unidirectional link in DG is vertex unidirectional link in PCA(DG).

2 PCA(DG) is without self-loop: ∀e ∈ EPCA, e ∉ RSPCA(e), where RSPCA(e) = {ei | e ∠ECP ei}. Since ∠ECP is asymmetric, PCA(DG) is without self-loop. Meanwhile, DG is without self-loop.

Connections are represented by different ‘edges’ in PCA(DG) at different times even though they have the same start point and end point. Therefore, the algorithm can solve the loop fallacy and one router with many IPs problems. Consequently, the correctness and certainty of suspicious nodes serialisation are ensured.
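The effect of treating connections, rather than routers, as the objects to be ordered can be illustrated with a short sketch. It is not the authors' implementation: it assumes each connection is encoded as a hypothetical (start, end, index) tuple, that correlated adjacent connection pairs are given as (earlier, later) tuples, and it uses Python's standard `graphlib` topological sorter in place of whatever sorting routine the paper uses. The serialisation is reported as deterministic when every two consecutive connections in the order are directly linked, which is the case in which the topological order is unique.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Set, Tuple

Connection = Tuple[str, str, int]  # hypothetical encoding: (start node, end node, local index)


def serialise_connections(pairs: Iterable[Tuple[Connection, Connection]]) -> Tuple[List[Connection], bool]:
    """Order connections from adjacent connection pairs (e_i precedes e_j).

    Returns the order and a flag telling whether that order is the only
    possible one. graphlib raises CycleError if the pairs contain a loop.
    """
    sorter: TopologicalSorter = TopologicalSorter()
    successors: Dict[Connection, Set[Connection]] = {}
    for e_i, e_j in pairs:
        sorter.add(e_j, e_i)  # e_j can only appear after e_i
        successors.setdefault(e_i, set()).add(e_j)
    order = list(sorter.static_order())
    deterministic = all(b in successors.get(a, set()) for a, b in zip(order, order[1:]))
    return order, deterministic


# Router D is visited twice, which is a loop in DG, but each visit is a
# distinct connection, so the graph over connections has no loop.
chain = [("e3", "D", 1), ("D", "e5", 2), ("e5", "subnet A", 3),
         ("subnet A", "e6", 4), ("e6", "D", 5), ("D", "e10", 6)]
order, deterministic = serialise_connections(zip(chain, chain[1:]))
print(order, deterministic)
```

In the example the node-level path revisits D, yet the connection-level order is a simple chain, which is the intuition behind solving loop fallacy by the DG transformation.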

3.2.1 Under the condition of having all suspicious nodes correlated relationship

Correlated adjacent connection pairs are connection pairs which have the relationship of time and causality. The intra-AS routing algorithm uses the fingerprint to judge the relationship of adjacent connections. Based on this, it transforms DG to PCA(DG) so as to gain a partial-ordered suspicious nodes sequence. The process of the algorithm is:

1 For a new ingress net-flow Ii or egress net-flow Oi, if there is no self-loop, join it into the queue Q.

2 Determine the correlated net-flows of ci in Q based on net-flow fingerprint.

3 Generate a new queue QC, in which all correlated net-flows are ordered by sequence.

4 Assuming that QC = c1, c2, …, cm, if ci ∈ QC is an ingress net-flow, the sequence of correlated connections is {, … }. If ci ∈ QC is an egress net-flow, the sequence of correlated connections is {, … }.

3.2.2 Under the condition of having part of suspicious nodes correlated relationship

Having only part of the suspicious nodes correlated relationship increases the uncertainty of serialisation, so the intra-AS routing extraction algorithm uses a local time relationship approach to reduce the uncertainty in nodes serialisation and improve the accuracy of path extraction. There exist three kinds of relationship (Tian et al., 2009; Liu and Liu, 2007) in one suspicious path: causality, parallel and choose. As is shown in Figure 6, PCA(DG) treats nodes in a parallel or choose relationship as two different edges. Therefore, correlated connections at any exchange node are in pairs in PCA(DG).

Figure 6 Transformation of choose and parallel relationship

In the absence of the full suspicious nodes correlated relationship, the set of suspicious nodes with only part of the correlated relationship can be divided into multiple subsets, each of which has all related information of its suspicious nodes. Since all subsets are well-ordered, each can be converted uniquely into a subsequence of connections. The uncertainty between any two subsequences can be classified into two categories: the uncertainty of two disjoined ingress net-flows or two disjoined egress net-flows; and the uncertainty of disjoined ingress and egress net-flows. The first case can be converted to the second case because ingress net-flows and egress net-flows are in pairs in the process of net-flow exchange. The local time relationship approach describes the tightness of any two subsequences in one suspicious path. ∀ei,x ∈ {ei,1, …, ei,s} and ∀ej,y ∈ {ej,1, …, ej,t}, if ei,x and ej,y are in partial order, {ei,1, …, ei,s, ej,1, …, ej,t} is in full order.

3.3 Controllable subnet extraction algorithm

A subnet consists of switches and repeaters, which use MAC addresses to address and forward net-flows. When the suspicious source is within one controllable subnet, the MAC address of the suspicious source can be determined by using the forwarding table. Besides, a hub can be treated as a transparent intermediary.

4 Theoretical analysis

4.1 The correctness of controllable inter-AS extraction algorithm analysis

By using the physical adjacency table and the logical adjacency table to construct the inter-AS topology, NFMCT can increase the certainty and efficiency of path extraction in the case of multiple paths. Besides, by using the loop-free AS_PATH attribute of the BGP protocol, it can improve the correctness of path extraction. Assume that the redundancy of the fingerprint is r, the probabilities of decoding the fingerprint correctly are p1, …, pk, …, pr, and the number of controllable ASs the net-flow passes through is Length. When control centre extracts the suspicious path by using the topology, the probability of tracing back the xth controllable AS correctly is divided into two cases according to the controllable inter-AS extraction algorithm:

1 As is shown in Figure 7, the xth and the (x–1)th controllable AS are only logical adjacent. Therefore, at least one fingerprint must be decoded successfully, given that the (x–1)th has been traced correctly. The correct probability is:

\[ Pr_x = Pr_{x-1} \prod_{i=1}^{r_x} \left[ 1 - (1 - p_i)^{r_x} \right] \qquad (1) \]

2 As is shown in Figure 8, the xth and the (x–1)th controllable AS are not only logical adjacent but also physical adjacent. Therefore, it is only necessary to trace back the (x–1)th controllable AS correctly. The correct probability is:

\[ Pr_x = Pr_{x-1} \qquad (2) \]

Figure 7 The xth and the (x–1)th is only logical adjacent

Figure 8 The xth and the (x–1)th is both logical and physical adjacent
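The two per-hop cases of Section 4.1 can be accumulated along a chain of controllable ASs. The sketch below is a worked illustration under stated assumptions, not the authors' code: it reads equation (1) as multiplying Pr(x–1) by the product over the r_x fingerprint copies of 1 − (1 − p_i)^(r_x), reads equation (2) as leaving the probability unchanged, and represents each hop by a hypothetical list of decoding probabilities (or None for a hop that is also physical adjacent).

```python
from math import prod
from typing import Optional, Sequence


def hop_factor(decode_probs: Optional[Sequence[float]]) -> float:
    """Multiplicative factor contributed by one hop.

    None means the hop is both logical and physical adjacent (equation (2)),
    so the factor is 1. Otherwise the factor follows the form of equation (1)
    with r_x = len(decode_probs).
    """
    if decode_probs is None:
        return 1.0
    r = len(decode_probs)
    return prod(1.0 - (1.0 - p) ** r for p in decode_probs)


def chain_probability(hops: Sequence[Optional[Sequence[float]]]) -> float:
    """Apply the per-hop recurrence Pr_x = Pr_{x-1} * factor along the chain."""
    pr = 1.0
    for decode_probs in hops:
        pr *= hop_factor(decode_probs)
    return pr


# Three hops: logical-only with two copies, physical adjacent, logical-only with one copy.
example = chain_probability([[0.9, 0.9], None, [0.8]])
print(round(example, 5))
```

The decoding probabilities in the example are made-up values chosen only to exercise both cases.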

In conclusion, when the number of physical disjoined AS is k, the correct probability of suspicious path traceback on the whole controllable inter-AS is:

\[ Pr_{Length} = \underbrace{\left[ \prod_{i=1}^{r_1} \left[ 1 - (1 - p_i)^{r_1} \right] \cdot \ldots \cdot \prod_{i=1}^{r_j} \left[ 1 - (1 - p_i)^{r_j} \right] \cdot \ldots \right]}_{Length - k} \qquad (3) \]

4.2 The correctness of intra-AS routing extraction algorithm analysis

4.2.1 Under the condition of having all suspicious nodes correlated relationship

From the characteristic of net-flow exchange, there does not exist a self-loop in DG. Therefore, DG is irreflexive. The binary relation PC in DG is represented by ∠PC, whose definition is:

1 ∀⟨s, t⟩ ∈ E, s ∠PC t.

2 ∀s, t, w ∈ V, if s ∠PC t and t ∠PC w, then s ∠PC w.

Theorem 1: ∠PC is a well order if and only if the following two conditions are satisfied:

1 DG = ⟨V, E⟩ is vertex unidirectional link.

2 There are no loops in DG.

Proof:

a Sufficiency: For a given DG, there is no loop. So ∠PC is asymmetric: ∀s, t ∈ V, if s ∠PC t, then ¬(t ∠PC s). If DG is unidirectional link, ∠PC is transitive. Therefore, ∠PC is a partial order in V. Besides, if DG is unidirectional link, ∀s, t ∈ V, s ≠ t, ∃s → t ∪ t → s. In conclusion, ∠PC is a total order on V. Assume that ∠PC is not a well order; then there is a non-empty set V′ ⊆ V in which there is no PC-minimal element. While ∠PC is total order, there is ∀s ∈ V′, ∃u ∈ V′. Enumerating all elements in V′, if vi+1 ∠PC vi and vi+1 ∉ {vi, ..., v1}, put vi+1 to the left side of vi. Finally, this yields {vi+1, vi, …, v1}. However, if ∃vi, vi ∠PC vn, there is {vi, vn, …, v1}. This contradicts the second condition in Theorem 1. Therefore, there exists a PC-minimal element, and ∠PC is well-ordered when conditions 1 and 2 are both satisfied.

b Necessity: ∠PC is a total order, because it is a well order. Therefore, ∀s, t ∈ V, s ≠ t, ∃s ∠PC t ∪ t ∠PC s, so DG is unidirectional link. Assume that there is a loop in DG; then there is a non-empty set V′ ⊆ V in which there is no PC-minimal element. This contradicts the condition that ∠PC is a well order.

Theorem 2: The necessary and sufficient condition of deterministic serialisation of elements in VPCA is ∠ECP being well-ordered.

Proof: From the definition of PCA(DG), it is irreflexive and asymmetric. From the definition of ∠ECP, it is irreflexive, asymmetric and transitive. Therefore, ∠ECP is an irreflexive partial order. As is known from Theorem 1, the equivalent condition of ∠ECP being a well order in EPCA is that PCA(DG) is vertex unidirectional link and there are no loops in PCA(DG). Since PCA(DG) is vertex unidirectional link, it corresponds to edge unidirectional link in DG. Therefore, ∠ECP is a total order in EPCA. As for PCA(DG) = ⟨V, E⟩, where V = EPCA and E = PCA, ∠ECP being a total order in EPCA is equal to ∠PC being a total order in V. So, PCA(DG) is unidirectional link. As PCA is asymmetric, ∀v ∈ V, v ↛ v in PCA(DG). It means there is no loop in PCA(DG).

4.2.2 Under the condition of having part of suspicious nodes correlated relationship

Theorem 3: For any two well-ordered subsequences {ei,1, …, ei,s} and {ej,1, …, ej,t} in one connection chain, ∀ei,x ∈ {ei,1, …, ei,s} and ∀ej,y ∈ {ej,1, …, ej,t}, if there exists a partial order relationship between ei,x and ej,y, {ei,1, …, ei,s, ej,1, …, ej,t} is well-ordered.

Proof: Assuming that ei,x happens before ej,y, if ∀ei,x ∈ {ei,1, …, ei,s}, ∈ EPCA, there is ∈ EPCA. Therefore, from the definition of partial order, {ei,1, …, ei,s, ej,1, …, ej,t} has a partial order relationship. Assume that {ei,1, …, ei,s, ej,1, …, ej,t} is not a total order. From the definition of total order, ei,x+m and ej,y are not in partial order. It means ej,y might happen before ei,x+m or simultaneously with it. If ej,y happens before ei,x+m, it means vy,x+m has an outbranch, which contradicts the condition that ingress and egress net-flows are in pairs at any node. If ej,y and ei,x+m happen simultaneously, there are two paths in one node at the same time, which contradicts the premise.

Therefore, for ∀ei,x ∈ {ei,1, …, ei,s} and ∀ej,y ∈ {ej,1, …, ej,t}, if ei,x happens before ej,y, all elements in {ei,1, …, ei,s} happen before those in {ej,1, …, ej,t}. A similar situation holds when ej,y happens before ei,x. In conclusion, if the relationship of ei,x and ej,y is a partial order, the elements in {ei,1, …, ei,s, ej,1, …, ej,t} are well-ordered.
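Theorem 3 implies that one observed "happens before" relation between an element of one subsequence and an element of another is enough to order the two subsequences as whole blocks. The following sketch shows that idea under simplifying assumptions: connection labels are unique strings, each observed relation is a hypothetical (earlier, later) pair, and Python's standard `graphlib` is used to order the blocks. It is an illustration of the local time relationship idea, not the algorithm as implemented by the authors.

```python
from graphlib import TopologicalSorter
from typing import Iterable, List, Sequence, Tuple


def order_subsequences(subsequences: Sequence[Sequence[str]],
                       observed: Iterable[Tuple[str, str]]) -> List[str]:
    """Concatenate well-ordered subsequences using observed local time relations.

    Each (earlier, later) pair links one element of a subsequence to one
    element of another; the whole subsequence of `earlier` is then placed
    before the whole subsequence of `later`.
    """
    owner = {e: idx for idx, seq in enumerate(subsequences) for e in seq}
    sorter: TopologicalSorter = TopologicalSorter({idx: set() for idx in range(len(subsequences))})
    for earlier, later in observed:
        a, b = owner[earlier], owner[later]
        if a != b:
            sorter.add(b, a)  # block b must come after block a
    merged: List[str] = []
    for idx in sorter.static_order():
        merged.extend(subsequences[idx])
    return merged


merged = order_subsequences([["a1", "a2"], ["b1", "b2", "b3"]], [("b3", "a1")])
print(merged)
```

If two subsequences are not linked by any observed relation, their relative position in the output is arbitrary, which corresponds to the remaining uncertainty discussed in the text.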

4.3 The computational complexity of NFMCT

In the controllable inter-AS layer, assuming that the number of ingress net-flows is m and the number of egress net-flows is n, the computational complexity of the controllable inter-AS extraction algorithm is O(n). In the intra-AS routing layer, it is assumed that the number of suspicious nodes is v and the number of unidirectional links is e in every TS. The number of vertices is e and the number of edges is v/2 after DG transformation. The computational complexity of sequencing the vertices in PCA(DG) by using the DG topological sequencing algorithm is O(e + v/2). On the other hand, if there are k subsequences, where the number of subsequences is v/2 at worst, to be serialised by using the local time relationship approach, the computational complexity is O(k · log k). Hence, the computational complexity of the intra-AS routing extraction algorithm is O((e + v/2)k · log k). For each controllable router, the computational complexity of addressing by using the forwarding table is O(1). So the computational complexity of the controllable subnet extraction algorithm is O(1). In conclusion, the computational complexity of NFMCT by using the multilayer collaborative traceback approach is O(n + (ek + vk/2) · log k), which is much less than O(ne(e + v) · log e), the computational complexity of earlier research (Wang et al., 2009).

5 Experimental design and results

5.1 Experimental design

Experimental data used in the controllable inter-AS layer comes from the RouteViews project (http://www.routeviews.org) and the RIPE RIS project (RIPE RIS Raw Data, http://www.ripe.net/data-tools/stats/ris/ris-raw-data), which contain RIS and BGP routing tables. The raw data is collected by RRC and BGPMon, and the format is the standard routing protocol output format (.MRT). The raw data can be converted into ASCII by the libbgpdump and route_btoa tools. Net-flow data used in the intra-AS routing layer is collected from Cisco routers by flow-tools, and the format is arts++. Forwarding tables used in the controllable subnet layer are collected from Cisco switches by using the ‘display mac-address’ command line. The NFMCT experiment is simulated in MATLAB. As is shown in Figure 9, the whole suspicious net-flow exchange path is suspicious source → s1 → e30 → e31 → e32 → A → B → e2 → e3 → D → e5 → subnet A → e6 → D → …… → e10 → subnet C → C → F → G → target server. The IP of the suspicious information source is 195.219.96.54, which is in Controllable AS 5459. The IP of the Target Server is 207.45.223.20, which is in Controllable AS 6453.

Figure 9 Experimental topology and suspicious net-flow path

5.2 Experimental results

5.2.1 Path extraction in controllable inter-AS layer

The TS collects neighbour tables and routing tables from the border gateways in each controllable AS. As is shown in Table 3, the control centre constructs the inter-AS topology based on three kinds of tables from the TSs. Based on the constructed inter-AS topology, the TS decodes the fingerprint of the suspicious net-flow after receiving a traceback request from the target server. The decoded net-flow fingerprint is shown in Table 4. When there are multiple paths between two logically adjacent controllable ASs, the net-flow fingerprint and the inter-AS topology are used to judge which path the suspicious net-flow passes. As is shown in Figure 10, Controllable AS6453 and AS1221 are logically adjacent. There are two paths between them, namely 6453 → 1221 → 4513 and 6453 → 2497 → 4513. The controllable inter-AS extraction algorithm uses the DstIP and SrcIP information to judge the path, which is G → F → C. Finally, the traceback suspicious path in the controllable inter-AS layer is Target Server → G → F → C → B → A (6453 → 1221 → 4513 → 5459). The actual suspicious connection chain at AS level is 5459 → 4513 → 1221 → 6453.

Table 3

Part of inter-AS topology

Controllable ASN   Physical adjacent AS   Logical adjacent AS
5459               6453                   6453
6453               5459                   5459
6453               1343                   1221
6453               1227                   1221
1221               1343                   6453
1221               1227                   6453

Table 4

Part of net-flow information

Detecting node   SendIP   Length   SrcIP   DstIP
B                A        1        A       B
G                C        2        F       G
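The way the decoded fingerprints of Table 4 chain into the inter-AS traceback path can be illustrated with a short sketch. This is not the authors' extraction algorithm. It rests on three assumptions: Length is read as the number of hops between the sending node and the detecting node (which is consistent with both rows of Table 4); the router-to-AS mapping is read off the path "G → F → C → B → A (6453 → 1221 → 4513 → 5459)" quoted in the text; and two fingerprints are joined when the sender of one lies in the same AS as the detector of the other. The function names are hypothetical.

```python
# Fingerprint records transcribed from Table 4.
RECORDS = [
    {"node": "B", "send": "A", "length": 1, "src": "A", "dst": "B"},
    {"node": "G", "send": "C", "length": 2, "src": "F", "dst": "G"},
]

# Assumption: mapping inferred from the AS-level path quoted in the text.
ROUTER_AS = {"G": 6453, "F": 1221, "C": 4513, "B": 4513, "A": 5459}


def segment(record):
    """Forward hops covered by one fingerprint: sender, last hop, detector."""
    hops = [record["send"]]
    if record["src"] != record["send"]:
        hops.append(record["src"])
    hops.append(record["dst"])
    return hops


def trace_back(records, last_router):
    """Walk from the router next to the target back towards the source."""
    by_detector = {r["node"]: r for r in records}
    path, current = [], last_router
    while current is not None:
        rec = by_detector.pop(current, None)
        if rec is None:
            break
        path.extend(reversed(segment(rec)))
        sender_as = ROUTER_AS[rec["send"]]
        # Continue with a fingerprint detected inside the sender's AS, if any.
        current = next(
            (n for n in by_detector if ROUTER_AS[n] == sender_as), None
        )
    return path


def as_level(path):
    """Collapse consecutive routers of the same AS into one AS number."""
    result = []
    for router in path:
        if not result or result[-1] != ROUTER_AS[router]:
            result.append(ROUTER_AS[router])
    return result


router_path = trace_back(RECORDS, "G")
as_path = as_level(router_path)
print(" -> ".join(router_path))
print(" -> ".join(str(asn) for asn in as_path))
```

Under these assumptions the sketch reproduces the router-level and AS-level traceback paths reported in Section 5.2.1. Segments longer than two hops would need the intermediate nodes from additional fingerprints, which the sketch does not model.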

Figure 10

Path extraction in controllable inter-AS layer

5.2.2 Path extraction in intra-AS routing layer

According to the IP address tables (ipAddrTable) and routing tables (ipRoutingTable) in controllable routers (Wang et al., 2010a), an ipRouteType entry describes one of two kinds of link relationship. One is the router-router relationship, in which the next hop is a router, represented by indirect(4). The other is the router-subnet relationship, in which the next hop is a host in a subnet, represented by direct(3). The case of disjoined controllable routers can be divided into controllable router – stepping stone host – controllable router and controllable router – uncontrollable router – controllable router. In the first case, ipRouteType is direct(3). In the second case, ipRouteType is indirect(4). In both cases, the previous controllable router encodes the fingerprint and forwards it. The first controllable router that receives the net-flow with the fingerprint decodes it, so as to extract the suspicious path between the two controllable routers.

In Controllable AS 4513, the intra-AS routing extraction algorithm collects the suspicious routers or subnets and their correlation. The vertex set of DG is made up of the suspicious nodes, and the edge set is made up of the correlation between them. The intra-AS routing extraction algorithm then transforms DG into PCA(DG) and obtains the information shown in Table 5. The subsequence f1 in PCA(DG) is D → e6 → subnet A → e5 → D → e3 → e2. The subsequence f2 in PCA(DG) is subnet C → e10 → D. Because subsequences f1 and f2 belong to the same connection chain, the only correlation between f1 and f2 is Df1 → Df2. Therefore, by using the local time relationship approach, the traceback suspicious sequence in PCA(DG) is subnet C → e10 → D → … → D → e6 → subnet A → e5 → D → e3 → e2. As is shown in Figure 11, by transforming PCA(DG) back into DG, the traceback suspicious path of AS 4513 is C → subnet C → e10 → D → … → D → e6 → subnet A → e5 → D → e3 → e2 → B.

The actual suspicious connection chain in Controllable AS 4513 is … → B → e2 → e3 → D → e5 → subnet A → e6 → D → … → D → e10 → subnet C → C → ….

Table 5

Information of PCA(DG)

Subsequence   Time   EPCA       StartVPCA     EndVPCA
f1            t1     e2         cB–2          c2–3
f1            t2     e3         c2–3          c3–D
f1            t3     D          c3–D          cD–5
f1            t4     e5         cD–5          c5–subnetA
f1            t5     Subnet A   c5–subnetA    csubnetA–6
f1            t6     e6         csubnetA–6    c6–D
f1            t7     D          c6–D          cD–
f2            t8     D          c–D           cD–10
f2            t9     e10        cD–10         c10–subnetC
f2            t10    Subnet C   c10–subnetC   csubnetC–C
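The rows of Table 5 show why the transformation removes the loop: in PCA(DG) the links are the vertices and the suspicious nodes are the edges, so the two visits to router D become two distinct edges. The following sketch orders each subsequence with a standard topological sort and then serialises the subsequences by local time. It is an illustration under assumptions, not the paper's algorithm: each subsequence is assumed to form a simple chain, the integer time indices stand in for t1 to t10, and the function names are hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Rows of Table 5: (subsequence, time index, E_PCA, StartV_PCA, EndV_PCA).
# ASCII hyphens replace the dashes of the table; "cD-" and "c-D" are the
# incomplete link names exactly as the table gives them.
ROWS = [
    ("f1", 1, "e2", "cB-2", "c2-3"),
    ("f1", 2, "e3", "c2-3", "c3-D"),
    ("f1", 3, "D", "c3-D", "cD-5"),
    ("f1", 4, "e5", "cD-5", "c5-subnetA"),
    ("f1", 5, "subnet A", "c5-subnetA", "csubnetA-6"),
    ("f1", 6, "e6", "csubnetA-6", "c6-D"),
    ("f1", 7, "D", "c6-D", "cD-"),
    ("f2", 8, "D", "c-D", "cD-10"),
    ("f2", 9, "e10", "cD-10", "c10-subnetC"),
    ("f2", 10, "subnet C", "c10-subnetC", "csubnetC-C"),
]

GAP = "..."  # marks the unobserved part between two subsequences


def forward_order(rows):
    """Order the edges (suspicious nodes) of one subsequence.

    Assumes the subsequence is a simple chain, so its topological order
    is unique and consecutive vertices are joined by exactly one edge.
    """
    sorter = TopologicalSorter()
    label = {}
    for _, _, node, start, end in rows:
        sorter.add(end, start)  # the start link must precede the end link
        label[(start, end)] = node
    links = list(sorter.static_order())
    return [label[pair] for pair in zip(links, links[1:])]


def trace_back(rows):
    """Serialise subsequences by local time, latest first, and reverse each."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    ordered = sorted(
        groups.values(), key=lambda g: min(r[1] for r in g), reverse=True
    )
    path = []
    for group in ordered:
        if path:
            path.append(GAP)
        path.extend(reversed(forward_order(group)))
    return path


f1_forward = forward_order([r for r in ROWS if r[0] == "f1"])
suspicious_sequence = trace_back(ROWS)
print(" -> ".join(suspicious_sequence))
```

Because the table names the two D edges by different link pairs, the sort never meets a cycle, and the reversed output matches the traceback sequence given for PCA(DG) in the text.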

Figure 12

Path extraction in controllable subnet layer

5.2.3 Path extraction in controllable subnet layer

As is shown in Figure 12, when the suspicious net-flow passes router e30, it enters the corresponding subnet 195.219.96.0/23. By querying the forwarding tables of switches s1 and s2, the controllable subnet extraction algorithm obtains the MAC address of the suspicious source, which is 20-CF-30-18-B8-85. Finally, NFMCT locates the suspicious source IP, which is 195.219.96.54. The suspicious path in the controllable subnet is e30 → s1 → 195.219.96.54. In conclusion, the whole traceback path obtained by NFMCT is 207.45.223.20 → G → F → C → subnet C → e10 → D → … → D → e6 → subnet A → e5 → D → e3 → e2 → B → A → e32 → e31 → e30 → s1 → 195.219.96.54, which is consistent with the real suspicious path given above.
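The constant-time lookup in the subnet layer can be pictured as a hash-table query per switch. The sketch below is only one plausible reading of that step. The MAC address, IP address and subnet are taken from the text; the table contents, the port name, the MAC-to-IP mapping and the function name `locate` are all hypothetical, and the sketch starts from a known MAC address rather than modelling how the algorithm obtains it.

```python
import ipaddress

SUBNET = ipaddress.ip_network("195.219.96.0/23")

# Hypothetical table contents: only the MAC and IP addresses come from the
# text; the port name and the MAC-to-IP mapping are invented for illustration.
FORWARDING_TABLES = {
    "s1": {"20-CF-30-18-B8-85": "access-port"},
    "s2": {},
}
MAC_TO_IP = {"20-CF-30-18-B8-85": "195.219.96.54"}


def locate(mac):
    """Return (switch, port, ip) for a MAC address, or None if unknown."""
    for switch, table in FORWARDING_TABLES.items():
        port = table.get(mac)  # O(1) dictionary lookup per switch
        if port is None:
            continue
        ip = MAC_TO_IP[mac]
        if ipaddress.ip_address(ip) not in SUBNET:
            raise ValueError(f"{ip} is outside {SUBNET}")
        return switch, port, ip
    return None


print(locate("20-CF-30-18-B8-85"))
```

With a fixed number of switches per subnet, each query stays constant-time, which is in line with the O(1) bound stated in Section 4.3.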

6 Conclusions

Under the circumstances of packet encryption, low latency, multi-flow intersection and constrained network resources in net-flow exchange, NFMCT traces back the suspicious information source correctly and efficiently by using net-flow fingerprints to obtain the correlation between suspicious nodes. Section 1 introduces the background of traceback techniques for stepping stones and anonymous systems. Section 2 reviews related work on traceback techniques based on net-flow fingerprints. Section 3 proposes the NFMCT scheme. To address the local overload problem caused by high computational complexity, NFMCT proposes a multilayer collaboration method, which traces the suspicious path through the controllable inter-AS layer, the intra-AS routing layer and the controllable subnet layer, so as to distribute the computational load of the traceback server. To address loop fallacy and the one-router-to-many-IPs problem, the controllable inter-AS extraction algorithm prevents loops based on BGP protocol properties, and the intra-AS routing extraction algorithm solves those problems by DG transformation. For the indeterminate serialisation problem in suspicious path extraction, the controllable inter-AS extraction algorithm extracts the inter-AS suspicious path based on the constructed inter-AS topology, and the intra-AS routing extraction algorithm solves the problem by the local time relationship approach, even when only part of the correlation between suspicious nodes is known. Moreover, NFMCT uses the controllable subnet extraction algorithm to locate the suspicious source accurately. In Sections 4 and 5, theoretical analysis and experiments demonstrate the high accuracy of locating the suspicious source and the correctness of the suspicious path extraction of NFMCT. We further considered the design of the net-flow fingerprint and the deployment of encoders and decoders. We note that:

1 The design of the net-flow fingerprint should focus not only on its content but also on its format. Owing to the different capacities of the carriers of net-flow, the format should be general. Besides, the content of the fingerprint should differ from layer to layer.

2 Research on the deployment of encoders and decoders has not been carried out yet. There should be rules and policies of deployment that are pertinent to different situations, such as anonymous linkage, stepping stone detection and so on.

In conclusion, our analytical results reveal some new insights into the difficulty of traceback in net-flow exchange.

Acknowledgements

We would like to thank Mr. Houmansadr Amir for his suggestions on an earlier version of the paper. This work is supported by the National Basic Research Program of China (973 Program) (2011CB311801); the National High Technology Research and Development Program of China (863 Program) (2012AA012704) and Zhengzhou Science and Technology Talents (131PLKRC644).

References

Afanasyev, A., Tilley, N., Longstaff, B. et al. (2010) ‘BGP routing table: trends and challenges’, Proc. of High Technologies and Intellectual System Conference.
Al-Qudah, Z., Al-Duwairi, B. and Al-Khaleel, O. (2014) ‘DDoS protection as a service: hiding behind the giants’, International Journal of Computational Science and Engineering, Vol. 9, No. 4, pp.292–300.
Carmi, S., Havlin, S., Kirkpatrick, S. et al. (2007) ‘A model of internet topology using k-shell decomposition’, Proceedings of the National Academy of Sciences, Vol. 104, No. 27, pp.11150–11154.
Dimitropoulos, X., Krioukov, D., Fomenkov, M. et al. (2007) ‘AS relationships: inference and validation’, ACM SIGCOMM Computer Communication Review, Vol. 37, No. 1, pp.29–40.
Dimitropoulos, X., Krioukov, D., Vahdat, A. et al. (2009) ‘Graph annotations in modeling complex network topologies’, ACM Transactions on Modeling and Computer Simulation (TOMACS), Vol. 19, No. 4, p.17.
Goodrich, M.T. (2008) ‘Probabilistic packet marking for large-scale IP traceback’, IEEE/ACM Transactions on Networking, Vol. 16, No. 1, pp.15–24.
Guo, H., Yang, B., Lan, J. et al. (2011) ‘Hierarchy analysis and modeling on the internet AS-level topology’, Journal on Communications, Vol. 32, No. 9, pp.182–190.
Haas, J. and Mitchell, J. (2013) Last Autonomous System (AS) Reservations [online] http://tools.ietf.org/html/jhjm-idr-lastas-reservations-00.txt (accessed 1 January 2014).
Hilgenstieler, E., Duarte Jr., E.P., Mansfield-Keeni, G. et al. (2010) ‘Extensions to the source path isolation engine for precise and efficient log-based IP traceback’, Computers & Security, Vol. 29, No. 4, pp.383–392.
Houmansadr, A. and Borisov, N. (2011) ‘SWIRL: a scalable watermark to detect correlated network flows’, NDSS.
Huang, Q., Xiong, W., Yang, X. et al. (2011) ‘Hierarchical stateless single-packet IP traceback technique’, Journal of Communications, Vol. 32, No. 3, pp.150–157.
Lei, C., Zhang, H., Sun, Y. et al. (2013) ‘Survey of network flow identification technology’, Application Research of Computers, Vol. 30, No. 10, pp.2891–2895.
Liu, M. and Liu, Y. (2007) ‘Stepping stone attack source traceback technique based on event integration’, Journal of Computer Security, Vol. 2, No. 2, pp.4–8.
Mao, G. and Zong, D. (2009) ‘An intrusion detection model based on mining multi-dimension data streams’, Journal of Computer Research and Development, Vol. 7, No. 4, pp.602–609.
Munz, G. and Carle, G. (2007) ‘Real-time analysis of flow data for network attack detection’, 10th IFIP/IEEE International Symposium on Integrated Network Management, 2007, IM’07, IEEE, pp.100–108.
National Laboratory for Applied Network Research (2005) AS Path Length [EB/OL] [online] http://moat.nalanr.net/ASPL (accessed 19 June 2013).
Omar, M.N., Maarof, M.A. and Zainal, A. (2004) ‘Solving time gap problems through the optimization of detecting stepping stone algorithm’, The Fourth International Conference on Computer and Information Technology, 2004. CIT’04, IEEE, pp.391–396.
Omar, M.N., Siregar, L. and Budiarto, R. (2008) ‘Dropped packet problems in stepping-stone detection’, International Journal of Computer Science and Network Security, Vol. 8, No. 2, pp.109–115.
Pyun, Y.J., Park, Y.H., Wang, X. et al. (2007) ‘Tracing traffic through intermediate hosts that repacketize flows’, INFOCOM 2007. 26th IEEE International Conference on Computer Communications, IEEE, p.634.
RIPE RIS Raw Data [online] http://www.ripe.net/data-tools/stats/ris/ris-raw-data (accessed 13 July 2013).
RouteViews, O. University of Oregon RouteViews Project, Eugene, OR [online] http://www.routeviews.org (accessed 3 September 2012).
Tian, Z., Zhang, Y., Zhang, W. et al. (2009) ‘An adaptive alert correlation method based on pattern mining and clustering analysis’, Journal of Computer Research and Development, Vol. 9, No. 8, pp.1304–1315.
Wang, T., Li, J. and Chen, H. (2010a) ‘Algorithm for physical topology discovery in multi-subnet networks’, Journal of Chinese Computer Systems, Vol. 2, No. 2, pp.107–112.
Wang, X., Luo, J. and Yang, M. (2010b) ‘A double interval centroid-based watermark for network flow traceback’, 2010 14th International Conference on Computer Supported Cooperative Work in Design (CSCWD), IEEE, p.146.
Wang, X., Luo, J. and Yang, M. (2009) ‘An interval centroid based spread spectrum watermark for tracing multiple network flows’, IEEE International Conference on Systems, Man and Cybernetics, 2009, SMC 2009, IEEE, p.4000.
Wang, X., Reeves, D.S. and Wu, S.F. (2002) ‘Inter-packet delay based correlation for tracing encrypted connections through stepping stones’, Computer Security – ESORICS 2002, Springer, Berlin, Heidelberg, pp.244–263.
Xiang, Y. and Zhou, W. (2006) ‘Protecting information infrastructure from DDoS attacks by MADF’, International Journal of High Performance Computing and Networking, Vol. 4, Nos. 5/6, pp.357–367.
Yang, F., Zhou, X., Zhang, Q. et al. (2009) ‘A practical traceback mechanism in wireless sensor networks’, Acta Electronica Sinica, Vol. 37, No. 1, pp.202–206.
Yang, M.H. and Yang, M.C. (2012) ‘RIHT: a novel hybrid IP traceback scheme’, IEEE Transactions on Information Forensics and Security, Vol. 7, No. 2, pp.789–797.
Zhang, J., Mao, J. and Xu, Y. (2013) ‘On the security of an ID-based anonymous proxy signature scheme and its improved scheme’, International Journal of Embedded Systems, Vol. 5, No. 3, pp.181–188.
Zhou, M., Yang, J., Liu, H. et al. (2009) ‘Modeling the complex internet topology’, Journal of Software, Vol. 20, No. 1, pp.109–123.