
A Novel Visualization Approach for Efficient Network-wide Traffic Monitoring

Taghrid Samak, Adel El-Atawy, Ehab Al-Shaer
School of Computer Science, DePaul University, Chicago, Illinois 60604
Email: {taghrid,aelatawy,ehab}@cs.depaul.edu

Mohamed Ismail
Computer and Systems Engineering, Faculty of Engineering, Alexandria University, Alexandria, Egypt

Abstract: Network traffic visualization provides a very effective means of monitoring anomalous activities and detecting large-scale network attacks. This work proposes a novel and flexible technique for representing the traffic activities that reside in network flows and their patterns. The technique uses a set of different Space-Filling Curves (SFC) to map the collected statistics to images that emphasize traffic patterns. By exploiting the enhanced locality that comes from the clustering property of SFCs, our approach makes anomalies such as large-scale DDoS attacks and scanning activities easier to identify than with traditional techniques. Widely dispersed communication patterns are also easier to understand using our proposed traffic-to-image mappings. This new representation preserves traffic properties, leading to more accurate and robust anomaly detection even if aggressive compression is applied to the resulting images. In addition, with our proposed technique the relation between multiple packet fields can be easily obtained in order to analyze correlated attacks.

I. INTRODUCTION

As network-based attacks become more distributed, diverse and cunning, the need for more intelligent and dynamic analysis tools to detect large-scale attacks becomes more crucial. Network traffic analysis is a cornerstone in the analysis and detection of network attacks as well as traffic anomalies. One of the most pressing challenges in traffic analysis is the real-time processing of a large volume of traffic data in order to obtain an intuitive understanding of the network patterns. The large amount of traffic passing through the network makes it hard to analyze as is. Efficient representations of traffic statistics are therefore essential for practical analysis techniques. However, selecting an appropriate mapping is a key issue in identifying traffic anomalies, particularly large-scale and low-bandwidth DDoS attacks and scanning activities.

Traffic is composed of a set of flows. Each flow is characterized by a set of flow parameters (i.e., source/destination address and port). The source port is rarely (if ever) used to identify a flow/session, but the rest of the parameters are highly important in understanding the traffic distribution and the clients' activities. Each flow is in turn composed of one or more packets, not necessarily of the same size, but all containing the same set of identifying parameters of their flow.

In this work we represent network traffic as images or video frames, sampled over short time intervals. The set of packets that belong to the same time slot constitutes a single image/frame and contributes to the intensity of the image pixels. The traffic analysis is performed over a set of sequential frames (time slots). The dimensions of an image are dynamically selected according to the anomaly type/test to be performed (e.g., source vs. destination address to detect DDoS attacks). Each pixel in the image corresponds to a specific value of one packet header field (or of a vector of fields). The intensity of such a pixel represents the number of packets that arrived during the investigated time slot carrying the pixel's designated field values.

The mapping from the packets' flow properties to pixel coordinates is performed using the space-filling curves described in [12]. These curves were chosen to preserve locality when converting traffic to intensities. The locality property is key to making attacks such as address scanning or DDoS identifiable even if they are dispersed and not perfectly contiguous. First, we study a trivial mapping which just scans the dimension from left to right [4]. Another simple mapping (C-Scan), which preserves the continuity of traffic, is also considered. Then, we use the Lebesgue and Hilbert curves in our traffic mapping. We finally show and justify the use of SFC in robust attack detection techniques.
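The framing step described above (packets of one time slot counted into per-value intensities) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `frames_from_packets`, the `(timestamp, field_value)` tuple layout and the one-second window are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def frames_from_packets(packets, window_seconds):
    """Group packets into time slots and count packets per field value.

    `packets` is an iterable of (timestamp, field_value) pairs; this layout
    is an assumption of the sketch. Returns {slot_index: Counter}, where
    each Counter maps a field value to the number of packets carrying it.
    """
    frames = defaultdict(Counter)
    for timestamp, value in packets:
        slot = int(timestamp // window_seconds)  # index of the time slot
        frames[slot][value] += 1                 # one more packet for this value
    return dict(frames)

# Toy trace: (arrival time in seconds, one byte of the destination address)
packets = [(0.1, 7), (0.4, 7), (0.9, 200), (1.2, 7), (1.7, 8), (1.8, 8)]
frames = frames_from_packets(packets, window_seconds=1.0)
```

Each `Counter` in `frames` is the histogram behind one image/frame; turning it into pixels is the job of the mappings discussed in the rest of the paper.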

Our new approach enables a sensing/monitoring agent to generate images that preserve their main properties even after being aggressively compressed and scaled down. When representing multi-field statistics, the resulting images are huge, and communicating them at regular intervals to the analysis engine would impose a large bandwidth overhead. Simply scaling down the resulting images will dilute the activities if locality is not preserved over most (if not all) dimensions of the collected data. Thus, using a continuous mapping such as an SFC largely guarantees that, when images are scaled down or averaged, common activities are aggregated together, so that the whole image preserves most of its features. The advantage of our technique is strongly manifested when these images are analyzed in the frequency domain. Images produced by naive mapping techniques tend to have high-frequency content, which makes processing harder and causes loss of information when lossy compression is used. Compression not only helps in processing large-scale traffic, but also enables distributed processing. Individual agents have a limited view and are unable to see compound attacks. Therefore, compressing the agents' results as images and sending them over the network for real-time centralized processing is a must, and efficient compression of these images is crucial for the feasibility of such an approach.

The rest of the paper is organized as follows. Section II presents related work in traffic visualization along with other traffic representations. In Section III we present the proposed system. A formal description of our approach is given in Section IV, along with our proposed traffic-to-image mappings for single-field and multi-field visualization.
In Section V the different mappings are compared and evaluated for both single-field and multi-field mappings. Our conclusions and future work are presented in Section VI.

II. RELATED WORK

Anomaly detection and traffic representations have been studied widely in the literature [5], [14]. Here we focus on visual representations of network traffic, described mostly in visualization tools.

Cisco NetFlow Analyzer: Designed to analyze NetFlow data, it is aimed mainly at analyzing bandwidth utilization. It can create reports and generate alerts based on bandwidth usage, which may indicate an attack.

NetViewer: A visualization tool with a detection mechanism. It maps the network traffic to images in real time. The mapping used is quite simple (naive sweeping). The featured detection mechanism uses image processing analysis for object detection and tracking as well as future predictions. The details of this approach can be found in [4].

Other visualization tools have been developed, but few of them considered the locality property, which is important in detecting scanning activity. In [10] a visualization methodology was proposed to characterize scanning activity; wavelet analysis was used to improve the representation. The root polar coordinate mapping was used in [2] to better represent the concentricity of the IP distribution, considering the innermost positions to be home and trusted. The trust value decreases as the radius grows. Visualization for Internet routing anomaly detection was investigated in [13], which integrated both visual and automated data mining methods for discovering BGP anomalies. The visualization of port scan activities was studied in detail in [11]. Many visualization tools use a grid representation to visualize the source/destination IP relation [1], [6], [9].

In [3], space-filling curves, among other techniques, were used to investigate the possibility of displaying a large amount of data on a limited screen display while avoiding over-plotting. SFCs were used to find the nearest unoccupied pixel in which to place the overflowed data points. The main difference between that use of SFCs for visualization and our approach is that in our case the data points have a predefined size and there is no over-plotting problem, but the order of the data points as well as their locality properties have to be strictly maintained.
In contrast to the previously discussed work, which places hosts on a grid and represents communication as links between those nodes (a graph-based representation), our visualization constructs a grid with source and destination data on the x-axis and y-axis respectively. The communication, or amount of traffic, is projected on the grid as the intensity value of the point <x, y>. High intensities correspond to heavy traffic, and vice versa. In our case, there is no collision in placing hosts or calculating neighboring nodes. Visualizing multi-dimensional data is also easier with our approach.

III. SYSTEM DESIGN

The system in its final form will be composed of a centralized processing and analysis engine and a network of sensor agents. Every agent is responsible for collecting traffic statistics from its local position, digesting them into an image, compressing it, and shipping it to the engine.

Fig. 1. Overall System Design. (Blocks: Data Collecting Agents, Input Aggregation Module, Video Generation Module, Preprocessing Module, Decision/Analysis Module, Repository.)

The system components are shown in Figure 1:

a) Data Collection Agent: The data collection agents are the sensing modules that collect real-time statistics from the link where each one is installed. Over specific time windows the statistics are visualized as images, then aggressively compressed. The compressed images are then sent to the engine for analysis.

b) Aggregation Module: This module receives data (i.e., compressed images) from the sensors and prepares them for later analysis. This task includes intensity adjustment and normalization if needed, scale unification, and time synchronization.

c) Video Generation Module: This module converts the given multiple streams of images into a single video stream that represents the state of the whole system. Byproducts of the video formation process, such as scene changes and peak intensity locations, are also passed along.

d) Preprocessing Module: Further annotations are attached to the video stream in this step. Information about objects in the video is extracted, such as object location/trajectory/speed, brightness, and size.

e) Decision/Analysis Module: Using all of the information provided by the previous modules, we can assess whether the activity is normal or anomalous. The engine needs a repository for saving known patterns, special cases, system history, etc.

In this work, we focus on the visualization phase of the system. We present the traffic-to-image mapping formalization and visualization samples.

IV. TRAFFIC-TO-IMAGE MAPPING

The general form of the mapping should convert the flow properties, given as a series of tuples of one or more dimensions (source IP, destination IP, source port, destination port, ...), to a final two-dimensional form. Figure 2 shows the module input/output data. The mapping module converts n-field data to a 2D image (n-D data to 2-D).

Fig. 2. Sensor-side block diagram. (The n fields v1, v2, ..., vn enter the Mapping Module, which outputs a 2D image.)

The series of samples that represent packet information is aggregated to form a histogram representing the frequency of occurrence of each of the possible tuple values (output from the aggregation module). Then, to perform the overall mapping, we need to convert the n-dimensional data (frequency table) to a one-dimensional representation. The single-field (serialized) representation may then be easily converted to a 2D representation (image). The purpose of this mapping is to calculate the position of each element of the field domain values on the 2D image. After each position is identified, the intensity value of this pixel is defined as the number of data samples (field value vectors) that contained the set of values of this position. Calculating the (x, y) position on the image is performed via the following onto mappings:

$$M_1 : V_1 \times V_2 \times V_3 \times \ldots \times V_n \longrightarrow S \qquad (1)$$

$$M_2 : S \longrightarrow X \times Y \qquad (2)$$

where $n$ is the number of dimensions, $V_i$ is the domain of dimension $i$, $S = \{0, 1, \ldots, N-1\}$, $N = |V_1| \times |V_2| \times \ldots \times |V_n|$, and $(X, Y)$ is the pixel position on the image. The intensity value of each pixel on the image is calculated by:

$$I(x, y) = C(v_1, v_2, v_3, \ldots, v_n) \qquad (3)$$

where $v_i \in V_i$ and $C(\ldots)$ is the number of data samples that contained the set of values $\langle v_1, v_2, v_3, \ldots, v_n \rangle$. Space-filling curves can be used for both $M_1$ and $M_2$.
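As a rough illustration of equations 1 to 3, the sketch below uses the simplest possible choice for both mappings: a mixed-radix (naive scan) serialization for M1 and a left-to-right scan for M2. The function names `m1_serialize`, `m2_naive` and `build_image` are hypothetical, and the tiny 4-valued field domains are chosen only to keep the example readable.

```python
from collections import Counter

def m1_serialize(values, domain_sizes):
    """M1: map a tuple (v1, ..., vn) to an index s in {0, ..., N-1}.

    Plain mixed-radix ordering (a naive scan) is assumed here; a
    space-filling curve could be substituted.
    """
    s = 0
    for v, size in zip(values, domain_sizes):
        if not 0 <= v < size:
            raise ValueError("field value outside its domain")
        s = s * size + v
    return s

def m2_naive(s, width):
    """M2: map the serial index s to a pixel (x, y), scanning left to right."""
    return s % width, s // width

def build_image(samples, domain_sizes, width):
    """Return rows of pixels with image[y][x] = I(x, y) = C(v1, ..., vn)."""
    n_total = 1
    for size in domain_sizes:
        n_total *= size                      # N = |V1| x |V2| x ... x |Vn|
    if n_total % width != 0:
        raise ValueError("width must divide N")
    counts = Counter(samples)                # C(...): occurrences of each tuple
    image = [[0] * width for _ in range(n_total // width)]
    for values, c in counts.items():
        x, y = m2_naive(m1_serialize(values, domain_sizes), width)
        image[y][x] = c
    return image

# Four samples over two fields, each with the domain {0, 1, 2, 3}
samples = [(1, 2), (1, 2), (3, 0), (1, 2)]
image = build_image(samples, domain_sizes=(4, 4), width=4)
```

With two fields and the width set to the size of the second domain, this choice of M1 and M2 reproduces the raw 2D histogram, with the first field on the y-axis and the second on the x-axis.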

A. Space Filling Curves

A Space-Filling Curve (SFC) is a way of mapping a multidimensional space into a one-dimensional space and vice versa [12]. One of the most desired properties of such mappings is clustering, which means that the locality between objects in the multidimensional space is preserved in the linear space. We use four types of SFC to convert network traffic measurements to 2D images, as shown in Figure 3:

1) Naive Scan Curve
2) C-Scan Curve
3) Lebesgue Curve
4) Hilbert Curve

Other SFCs (e.g., Peano) have the same properties but traverse the space on base-3 values (i.e., resulting in images of size $3^k \times 3^k$), which is not appropriate for our application, where we need the size of the images to be of the form $2^k \times 2^k$.
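The four curves listed above can be written as index-to-pixel functions on a $2^k \times 2^k$ grid. The sketch below is an assumption-laden illustration, not the paper's code: the function names are hypothetical, the Lebesgue curve is taken to be the usual Z-order (bit de-interleaving) curve, the Hilbert conversion is the widely used iterative algorithm, and the orientation of each curve may differ from Figure 3. The `jumps` helper is likewise an ad hoc measure introduced for this example.

```python
def naive_scan(d, n):
    """Naive scan: every row is swept left to right."""
    return d % n, d // n

def c_scan(d, n):
    """C-Scan: rows are swept in alternating directions, keeping the curve continuous."""
    y, offset = divmod(d, n)
    x = offset if y % 2 == 0 else n - 1 - offset
    return x, y

def lebesgue(d, n):
    """Lebesgue (Z-order) curve: de-interleave the bits of d into x and y."""
    x = y = 0
    bit = 0
    while (1 << bit) < n:
        x |= ((d >> (2 * bit)) & 1) << bit
        y |= ((d >> (2 * bit + 1)) & 1) << bit
        bit += 1
    return x, y

def hilbert(d, n):
    """Hilbert curve: standard iterative index-to-coordinate conversion."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/flip the current quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def jumps(curve, n):
    """Count consecutive indices that do not land on edge-adjacent pixels."""
    count = 0
    for d in range(n * n - 1):
        (x0, y0), (x1, y1) = curve(d, n), curve(d + 1, n)
        if abs(x0 - x1) + abs(y0 - y1) != 1:
            count += 1
    return count

n = 16  # a 16 x 16 image, i.e. one byte-valued field
for curve in (naive_scan, c_scan, lebesgue, hilbert):
    cells = {curve(d, n) for d in range(n * n)}
    assert len(cells) == n * n           # every pixel is visited exactly once
    print(curve.__name__, "jumps:", jumps(curve, n))
```

On this grid the C-Scan and Hilbert curves are continuous (no jumps), the naive scan jumps once at the end of every row except the last, and the Lebesgue curve jumps most often.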

Fig. 3. Space-Filling Curves: (a) Naive Scan, (b) C-Scan, (c) Lebesgue, (d) Hilbert.

Studying the clustering properties of these curves shows that the Hilbert curve preserves locality better than the other presented curves [8]. The performance of space-filling curves was studied in [7]. The Hilbert curve was shown to have the best contiguity property with the fewest jumps, which makes it a better candidate to assess the locality of the traffic. As consecutive values are represented by clusters in those mappings, they are easier to capture visually. As will be shown later in the results, preserving locality in a clustering manner is much better than merely preserving the continuity of the curve. This property renders the image's main features more robust against down-scaling and lossy compression, which is highly desirable for communicating the collected/generated images amongst sensing modules. In a more complex system, the images collected from multiple sensing/mapping modules will be scaled down/compressed and communicated to a centralized analysis module that can correlate light, unnoticed activities from single sensors into a nearly complete picture of the network traffic activity.

The following two subsections apply the mappings described above to visualize network flows. The dimensions to be mapped are the packet header fields. Each header field represents one dimension ($V_i$) in equation 1. The traffic is analyzed on a window basis. Each window represents an image, with consecutive images forming a video stream. The intensity value of pixel $(x_i, y_i)$ is calculated from equation 3, where $C(\ldots)$ is the number of packets that contained the values $\langle v_1, \ldots, v_n \rangle$, normalized over the window period. The selection of $M_1$ and $M_2$ depends on the fields being visualized and on the activities being monitored.

B. Single Field Visualization

In this section, the mapping (and the subsequent analysis) is performed on each of the dimensions of the original data independently. Each of the flow properties (source IP/port, destination IP/port) is visualized independently. However, analyzing the protocol field does not need real visualization because of the very limited set of values that can be assigned to that field. Since we are concerned with only one dimension, only equation 2 is used (i.e., the statistics are already serialized). A space-filling mapping is used in this case for $M_2$ ($M_2 \equiv SFC$):

$$SFC : V \longrightarrow X \times Y \qquad (4)$$

where $V$ is the domain of the field. Each entry of the histogram becomes the intensity of the corresponding pixel in the generated image. In Figure 4, we can see how consecutive values are mapped from the one-dimensional histogram into the two-dimensional plane. Using the same approach mentioned in [4], we visualize each byte of the IP address separately, each as an image of 256 pixels (16×16). Splitting the image representing the IP address into four quadrants simplifies the analysis. It also consumes much less space. Figure 5 shows the IP-separated space.
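The per-byte visualization can be sketched as follows, under several assumptions: addresses are dotted-quad IPv4 strings, each byte gets its own 16×16 image rather than a quadrant of one combined image, image index `i` simply follows the dotted-quad order (which need not match the byte numbering of Figure 5), and the helper names are hypothetical. The example feeds in a toy scan of 16 consecutive addresses and compares the footprint of the resulting object under the naive scan and the Hilbert mapping.

```python
def hilbert_d2xy(d, n):
    """Standard iterative Hilbert index-to-coordinate conversion (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def naive_d2xy(d, n):
    """Naive left-to-right scan."""
    return d % n, d // n

def ip_bytes(address):
    """Split a dotted-quad IPv4 string into its four byte values."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or not all(0 <= p <= 255 for p in parts):
        raise ValueError(f"not an IPv4 address: {address!r}")
    return parts

def byte_images(addresses, mapping, side=16):
    """One side x side image per address byte; intensity = number of packets."""
    images = [[[0] * side for _ in range(side)] for _ in range(4)]
    for address in addresses:
        for i, value in enumerate(ip_bytes(address)):
            x, y = mapping(value, side)      # equation 4: SFC : V -> X x Y
            images[i][y][x] += 1
    return images

def bounding_box(image):
    """Width and height of the smallest rectangle containing all lit pixels."""
    xs = [x for row in image for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(image) if any(row)]
    return max(xs) - min(xs) + 1, max(ys) - min(ys) + 1

# Toy scan: one packet to each of 16 consecutive addresses
scan = [f"10.0.0.{last}" for last in range(16, 32)]
hilbert_box = bounding_box(byte_images(scan, hilbert_d2xy)[3])
naive_box = bounding_box(byte_images(scan, naive_d2xy)[3])
```

For this particular run of values the naive scan draws a one-pixel-high line across the full image width, while the Hilbert mapping packs the same 16 values into a compact 4×4 block, which is the kind of object the clustering property is meant to produce.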

Fig. 4. Mapping serialized data into the 2D plane: (a) Hilbert SFC, (b) Lebesgue.

Fig. 5. IP Bytes Separation (quadrants: Byte 0, Byte 1, Byte 2, Byte 3).

The port number is handled in the same way by splitting it into two images, the high-byte and low-byte images. In most cases, the image that corresponds to the lowest byte of both the port number and the IP address is the only one with interesting features, the same observation made in [4].

C. n-Field Visualization

Visualizing multiple packet header fields is a direct application of both equations 1 and 2. An SFC is used for M2, as in the single-field case. For M1, an SFC is also used to serialize the multi-field data. In practice, visualizing the source vs. destination address is of the most interest to administrators. Information about the port number can be added to these two fields. However, as the number of dimensions increases, the size of the final image explodes. As mentioned earlier, the mappings used facilitate scaling down the final images without dramatic loss of information. This will be evident in the examples shown in the evaluation section (Section V).

V. EVALUATION

A. Single Field Results

This section presents the results of visualizing single-field statistics, each field independently of all the others. The visualization is performed using the Naive Scan, C-Scan, Hilbert and Lebesgue mappings. In Figure 6, a snapshot of normal traffic is visualized over the source IP. It can be seen that in any of the four representations, there is

Fig. 6. Example of Normal Traffic Activity: (a) Histogram (percentage of traffic vs. Source IP, byte 3), (b) Naive Scan mapping, (c) C-Scan mapping, (d) Hilbert mapping, (e) Lebesgue mapping.

no visible object. However, in Figure 7 we can see clear objects in the four representations. These objects (lines in the naive scan and C-scan, and blobs in Lebesgue and Hilbert) represent the high-intensity traffic that is visible in the histogram in Figure 7(a). The clustering property is visible in the images that use SFC mappings. Object recognition techniques will perform better with these blobs than with the corresponding lines that appear with the scan mappings. Moreover, note the high vertical frequency that will be generated if such images are converted to the frequency domain. The sharp rise and fall in the images produced by the scan mappings will be problematic if lossy compression is applied to them.

B. Multi Field Results

We applied the proposed mapping methodology to the same two traces used when visualizing a single field. Here, we use two fields, the source and destination IP. As mentioned earlier, the lowest byte is the one with the

Fig. 7. Example of Anomalous Traffic Activity: (a) Histogram (percentage of traffic vs. Source IP, byte 3), (b) Naive Scan mapping, (c) C-Scan mapping, (d) Hilbert mapping, (e) Lebesgue mapping.

Fig. 8. Example of Anomalous Traffic Activity using two-field visualization: (a) N-Scan-N-Scan mapping, (b) N-Scan-Hilbert mapping.

Fig. 9. Example of Anomalous Traffic Activity using two-field visualization after scaling down: (a) N-Scan-N-Scan mapping, (b) N-Scan-Hilbert mapping.

most interesting characteristics. So, our space is now 256 × 256, resulting in a 256 × 256 pixel image. Obviously, space becomes a serious issue with such a relatively large image. Processing will be more time consuming, as will transmitting the image periodically to the central analysis engine. Sending whole images to the engine from multiple agents would be like an attack on the engine from within.

First, we visualize the source vs. destination addresses using two combinations of the M1 and M2 mappings. We selected the naive scan for serialization (M1) in both examples, followed by another naive scan in the first example and by the Hilbert SFC in the second. The first image can be thought of as the raw 2D histogram of the source × destination statistics. As shown in Figure 8, the clustering property is very evident in the second image. Both images show the anomalous activity clearly. However, when we scale these images down to a dramatically lower resolution, 256 × 256 → 8 × 8 (Figure 9), the second image still retains its main feature, while in the first the anomaly is diluted nearly beyond recognition. Even after decreasing the space requirements of the image by a factor of 1024, the SFC mapping kept the image robust enough to resist this degradation of quality. More gains will be obtained if more than two fields are combined.

VI. CONCLUSION AND FUTURE WORK

In this paper, we presented a new approach for representing network traffic using Space-Filling Curves. Single-field as well as multi-field statistics about network traffic packets can be visualized as 2D images that have superior properties over those visualized directly from the activity matrix (e.g., between source and destination IP). The generated images have two important properties that improve anomaly detection: (1) locality of activities is enhanced, so that instead of forming horizontal/vertical lines, activities tend to be clustered in blocks (objects) that are more easily perceived by humans as well as by image processing techniques; (2) avoiding the generation of lines was shown to make the images robust against strong lossy compression, averaging/scaling down, and noise-removal preprocessing algorithms.

This work can be extended in different directions:

1) The decision and analysis can be automated by using image processing techniques to perform object detection on single images and to track objects for attack prediction.
2) Frequency-domain analysis tools can yield more effective data transfer between the sensors and the engine. Also, spectrum analysis of images generated using different SFCs might provide interesting insights into the behavior of these mappings.
3) Investigating the trade-off between bandwidth usage and representation accuracy. As agents save their link bandwidth by aggressively compressing their result images, the analysis engine gets less information to work with.
4) Investigating possible ways to perform the image combination phase in a distributed fashion to enhance the scalability of the system.
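As a starting point for the bandwidth/accuracy trade-off in item 3, the two-field scaling experiment of Section V-B can be imitated on synthetic data. The sketch below is not the evaluation code and uses no real traces: the anomaly is a made-up scan in which one source (low byte 42) sends one packet to each of the 256 destination low bytes, M1 is a naive scan (`s = src * 256 + dst`), M2 is either a naive scan or the Hilbert curve, and scaling down is plain block averaging. All function names are hypothetical.

```python
def hilbert_d2xy(d, n):
    """Standard iterative Hilbert index-to-coordinate conversion (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def naive_d2xy(d, n):
    """Naive left-to-right scan."""
    return d % n, d // n

def render(counts, m2, side=256):
    """Render {(src, dst): packets} with a naive-scan M1 and the given M2."""
    image = [[0] * side for _ in range(side)]
    for (src, dst), c in counts.items():
        x, y = m2(src * 256 + dst, side)     # M1: s = src * 256 + dst
        image[y][x] += c
    return image

def downscale(image, factor):
    """Average non-overlapping factor x factor blocks of a square image."""
    side = len(image) // factor
    out = [[0.0] * side for _ in range(side)]
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            if v:
                out[y // factor][x // factor] += v / (factor * factor)
    return out

def peak(image):
    return max(max(row) for row in image)

# Synthetic anomaly: source byte 42 contacts every destination byte once
counts = {(42, dst): 1 for dst in range(256)}
naive_small = downscale(render(counts, naive_d2xy), 32)      # 256x256 -> 8x8
hilbert_small = downscale(render(counts, hilbert_d2xy), 32)  # 256x256 -> 8x8
print("peak after scaling, naive M2:  ", peak(naive_small))
print("peak after scaling, Hilbert M2:", peak(hilbert_small))
```

In this toy case the naive mapping spreads the scan over a full row of 8 × 8 blocks, whereas the Hilbert mapping keeps it inside a single block, so the strongest pixel of the scaled-down image is eight times brighter. Repeating the measurement for other scaling factors and anomaly shapes would be one simple way to explore the trade-off.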

REFERENCES

[1] PortVis: A Tool for Port-Based Detection of Security Events. In CCS Workshop on Visualization and Data Mining for Computer Security, 2004.
[2] G. A. Fink and C. North. Root polar layout of internet address data for security administration. In VizSEC. IEEE Computer Society, 2005.
[3] Daniel A. Keim and Annemarie Herrmann. The gridfit algorithm: An efficient and effective approach to visualizing large amounts of spatial data. vis, 00:181, 1998.
[4] S.S. Kim and A.L. Reddy. A study of analyzing network traffic as images in real-time. In INFOCOM, 2005.
[5] Anukool Lakhina, Mark Crovella, and Christophe Diot. Mining anomalies using traffic feature distributions. In Proceedings of SIGCOMM 2005, pages 217–228, Philadelphia, PA, August 2005.
[6] K. Lakkaraju, W. Yurcik, and A. J. Lee. NVisionIP: NetFlow visualizations of system state for security situational awareness. In VizSEC/DMSEC '04: Proceedings of the 2004 ACM Workshop on Visualization and Data Mining for Computer Security, pages 65–72, New York, NY, USA, 2004. ACM Press.
[7] M. F. Mokbel, W. G. Aref, and I. Kamel. Performance of multi-dimensional space-filling curves. In GIS '02: Proceedings of the 10th ACM International Symposium on Advances in Geographic Information Systems, pages 149–154, New York, NY, USA, 2002. ACM Press.
[8] B. Moon, H.V. Jagadish, C. Faloutsos, and J. H. Saltz. Analysis of the clustering properties of the Hilbert space-filling curve. IEEE Transactions on Knowledge and Data Engineering, volume 13, pages 124–141, January/February 2001.
[9] Robert J. Moorhead, Markus Gross, and Kenneth I. Joy, editors. Case Study: Interactive Visualization for Internet Security. IEEE Visualization 2002 Conference, IEEE Computer Science Press, 2002.
[10] C. Muelder, K. Ma, and T. Bartoletti. A visualization methodology for characterization of network scans. In VizSEC. IEEE Computer Society, 2005.
[11] C. Muelder, K.L. Ma, and T. Bartoletti. Interactive visualization for network and port scan detection. In RAID, pages 265–283, 2005.
[12] Hans Sagan. Space-Filling Curves. Springer-Verlag, 1994.
[13] S. T. Teoh, K. Zhang, S.M. Tseng, K.L. Ma, and S. Felix Wu. Combining visual and automated data mining for near-real-time anomaly detection and analysis in BGP. In VizSEC/DMSEC '04: Proceedings of the 2004 ACM Workshop on Visualization and Data Mining for Computer Security, pages 35–44, New York, NY, USA, 2004. ACM Press.
[14] Yin Zhang, Zihui Ge, Albert Greenberg, and Matthew Roughan. Network anomography. In ACM/Usenix Internet Measurement Conference (IMC) 2005, Berkeley, CA, October 2005.