Anomaly Prediction in Network Traffic using Adaptive Wiener Filtering and ARMA Modeling

Mehmet Celenk, Thomas Conley, James Graham, and John Willis
School of Electrical Engineering and Computer Science
Ohio University, Athens, OH 45701 USA
{celenk, conleyt, jg193404, jw174304}@ohio.edu

Abstract— Fast and efficient detection of anomalies is essential for maintaining a robust and secure network. This research presents a method of anomaly detection based on adaptive Wiener filtering of noise followed by ARMA modeling of network flow data. We dynamically calculate noise and traffic signal statistics using network-monitoring metrics for traffic features such as average port, high port, server ports, and peered ports. The underlying approach is tested on near-real-time Internet traffic in the wide-area network (WAN) of Ohio University. The average port feature is determined to be the most informative measure in the estimation process. High port, server ports, and peered ports are used to confirm the anomaly detection result. We empirically determine that most of the network features obey Gaussian-like distributions. Experiments reveal that the method is highly effective in predicting anomalies in network traffic flow and can help prevent the hazards they may cause.

Keywords—Network anomalies, network security, Wiener filtering, ARMA modeling, adaptive digital anomaly predictor, majority voting

I. INTRODUCTION

Firewalls and intrusion detection devices are the primary means of protecting today's enterprise networks from a host of network anomalies such as viruses, worms, scanners, and denial of service from botnets. These defenses rely on detecting attacks after they have begun affecting the targeted network. Existing methods can identify specific packets that match a known pattern or originate from a known location, but such signature-based systems fail to detect unknown anomalies. An anomaly might be an old attack that has been modified to avoid detection, or it could be a completely new form of attack.

Significant research has been devoted to identifying network anomalies using methods from statistical signal analysis and pattern recognition theory. Relevant work includes papers by Kwitt and Hofmann [1] and Shen et al. [2], which dealt with anomaly detection using robust PCA (principal component analysis) and metrics of aggregated network behavior. The approach of Karasaridis et al. [11] deals primarily with the detection of botnets. The work described in [10] considers the detection of network intrusions in covariance space using pattern recognition methods. Sang and Li [12] describe how far into the future one can predict network traffic using an ARMA (auto-regressive moving average) model. Cho et al. [6] describe a method in which near-real-time network traffic data can be measured and filtered using a Patricia tree and an LRU (least recently used) replacement policy. Pang et al. [9] examine known anomalies and possible ways of detecting them by filtering data to reduce the load on the system. Feldman et al. [3] proposed a cascade-based approach that deals with the multifractal behavior of network traffic; the paper also describes a way of detecting network problems with their system. Yurcik and Li [4] and Plonka [5] demonstrate that NetFlows and FlowScans are becoming an efficient way to monitor network traffic, and Gong [8] suggests methods in which NetFlows can be used to detect worms and other types of intrusion into a network.

Although network flow data is easily obtained and contains a useful set of features, there is often too much of it to retain for a long time, so it is essential to analyze the data "on the fly," in near-real-time mode. None of the studies described above provides a method of predicting an attack before it occurs. In this work, we aim to predict network anomalies before they are detectable by existing methods. To this end, we statistically analyze network flow data and apply Wiener filtering to separate the normal traffic component from the measured signal. This, in turn, helps us identify the signal corresponding to network anomalies in the selected feature measurement, which characterizes the network flow in that dimension. By estimating the auto-correlation function of normal traffic, the ARMA predictor [13] is devised using the well-known Yule-Walker regression [14].

In the following sections we describe the approach and the results achieved in our attempt to predict network traffic anomalies. Section II describes the methods undertaken, including the necessary background and the structure of the proposed algorithm. The experimental test bed and computer results are described in detail in Section III. The remaining discussion, in Section IV, is devoted to conclusions, future work, and potential applications.

II. DESCRIPTION OF THE OVERALL APPROACH

In this work, we propose a method of defense based on predicting anomalies before they can have their adverse effect. The mechanism is intended to function out-of-band on a network connection carrying massive amounts of traffic and a large number of connections. Research was conducted using network data captured on the Internet connection of heavily populated /16 and /18 networks using the network monitoring tool Argus [7], which provides real-time measurements of flow, connectivity, capacity, demand, loss, delay, and jitter on a per-transaction basis [5]. This traffic information can be used as measured, or it can be enhanced with specific knowledge about the network. For instance, when measuring port usage an engineer may weight some ports more heavily than others. This heuristic information is factored into the weighted parameters of the system.



In general, a network anomaly tracking metric is a p-dimensional vector $X = [x_1, x_2, \ldots, x_p]^T$, where $T$ denotes matrix transposition and $p$ is set at the discretion of the network engineers. In the following sections we illustrate adaptive anomaly detection on a single dimension $x_i$ using Wiener filtering and ARMA modeling, and extend it to the full vector $X$ using weighted majority voting. It is preferable to predict an anomaly and prevent the attack before its onset rather than to wait for the attack to have an adverse effect. We show that the network flow features mentioned earlier can be used to predict network anomalies at the reconnaissance or preparatory phase, or very early at the onset of the attack.

Our research focuses on a limited set of features (see Table I) pertaining to overall port usage and throughput. Port usage is considered a primary indicator of the type of activity on the network [9], and throughput statistics are used to measure the magnitude of any network event. These characteristics were purposefully chosen because they are not highly specific and do not target specific addresses or ranges. We theorize that by concentrating on discriminating but general features, we will be able to predict an anomaly without a priori knowledge of any specific activity.



A. Adaptive digital anomaly predictor (ADAP)

First, we consider a single component, $x_i$, of the p-dimensional traffic feature set defined above. This measured input is modeled as the sum of the normal traffic signal, $s_i(n)$, and the anomalous traffic noise, $\eta(n)$:

$$x_i(n) = s_i(n) + \eta(n) \qquad (1)$$

The approach undertaken herein is to extract the normal network flow ($s_i$) and use it to predict anomalies, as shown in Figure 1.



The Wiener filter removes the noise from the signal and outputs an estimate of the normal traffic flow. To achieve a balanced estimate, we adjust the window size and associated coefficients in the Wiener filter and the ARMA predictor. The feedback control channel, shown by a dashed line in Figure 1, lets the algorithm adapt to a changing network signal waveform. This follows the adaptive Wiener filter implementation method proposed in [16]:

$$\hat{s}_i(n) = m_{\hat{s}_i}(n) + \frac{\hat{\sigma}^2_{\hat{s}_i}(n)}{\hat{\sigma}^2_{\hat{s}_i}(n) + \hat{\sigma}^2_{\eta}(n)}\,\bigl(x_i(n) - m_{\hat{s}_i}(n)\bigr) \qquad (2)$$

where $\hat{s}_i(n)$ is the Wiener filter output in the discrete time domain, $m_{\hat{s}_i}(n)$ is the mean value of the normal flow, $\hat{\sigma}^2_{x_i}$ is the variance of the measured traffic, and $\hat{\sigma}^2_{\hat{s}_i}$ and $\hat{\sigma}^2_{\eta}$ are the respective variances of the normal traffic and the anomaly, all computed in a window of half-width $M$ (that is, over the $2M+1$ samples from $n-M$ to $n+M$). They are calculated as

$$m_{\hat{s}_i}(n) = \frac{1}{2M+1}\sum_{k=n-M}^{n+M} x_i(k) \qquad (3)$$

$$\hat{\sigma}^2_{\hat{s}_i}(n) = \begin{cases} \hat{\sigma}^2_{x_i}(n) - \hat{\sigma}^2_{\eta}(n), & \text{if } \hat{\sigma}^2_{x_i}(n) - \hat{\sigma}^2_{\eta}(n) > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$

$$\hat{\sigma}^2_{x_i}(n) = \frac{1}{2M+1}\sum_{k=n-M}^{n+M} \bigl(x_i(k) - m_{\hat{s}_i}(k)\bigr)^2 \qquad (5)$$

The estimated signal $\hat{s}_i(n)$ is then applied to the ARMA unit, which estimates the next value $\hat{s}_i(n+1)$. This is similar to what has been done in [12], with the exception that [12] predicts only network traffic and ignores any anomaly that may exist at the time of measurement. That prediction is based on the assumptions of stationary Gaussian white noise with unit variance and of Gaussian network traffic, without any empirical justification. In this research, by contrast, we place no restriction on the network flow or on the noise. Noise is considered to be the combination of network anomalies and the traditional white noise described in [12]. The ARMA unit starts the estimation process by computing the auto-correlation function of $\hat{s}_i(n)$ and using it in the Yule-Walker system of equation (6); the resulting coefficients are then used in the third-order predictor of equation (7).

Figure 1. Block diagram of adaptive digital anomaly predictor (ADAP)
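The adaptive Wiener filter of equations (2)-(5) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: the paper does not say how the noise variance $\hat{\sigma}^2_\eta$ is obtained, so the hypothetical `noise_var` parameter defaults to the average local variance (a common heuristic for this kind of filter); windows are truncated at the ends of the record; and eq. (5) is simplified to use the window mean at $n$ for every $k$.

```python
from statistics import fmean


def adaptive_wiener(x, half_width, noise_var=None):
    """Adaptive (local-statistics) Wiener filter after eqs. (2)-(5).

    x          -- measured feature samples x_i(n)
    half_width -- M; each window spans x_i(n-M) .. x_i(n+M), i.e. 2M+1 samples
    noise_var  -- anomaly/noise variance sigma_eta^2 (ASSUMPTION: if None,
                  the average local variance is used as the estimate)
    Returns the estimate s_hat_i(n) of the normal traffic for every n.
    """
    n_samples = len(x)
    means, variances = [], []
    for n in range(n_samples):
        # Window truncated at the ends of the record (assumption).
        window = x[max(0, n - half_width):min(n_samples, n + half_width + 1)]
        m = fmean(window)                                   # eq. (3)
        means.append(m)
        # eq. (5), simplified to use the window mean at n for every k.
        variances.append(fmean((v - m) ** 2 for v in window))
    if noise_var is None:
        noise_var = fmean(variances)
    s_hat = []
    for x_n, m, var_x in zip(x, means, variances):
        var_s = max(var_x - noise_var, 0.0)                 # eq. (4)
        denom = var_s + noise_var
        gain = var_s / denom if denom > 0 else 0.0
        s_hat.append(m + gain * (x_n - m))                  # eq. (2)
    return s_hat


# Example: a one-second spike is pulled toward the local mean; the part
# removed is what the model attributes to the anomaly term eta(n).
traffic = [100.0, 101.0, 99.0, 100.0, 160.0, 100.0, 101.0, 99.0]
print(adaptive_wiener(traffic, half_width=2))
```

The gain in eq. (2) moves between 0 (output equals the local mean, where the window looks like pure noise) and 1 (output equals the measurement, where the local variance is well above the noise level), which is what lets the filter adapt to a changing waveform.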

$$\begin{bmatrix} R_{\hat{s}_i}(n+1) \\ R_{\hat{s}_i}(n+2) \\ R_{\hat{s}_i}(n+3) \end{bmatrix} = \begin{bmatrix} R_{\hat{s}_i}(n+0) & R_{\hat{s}_i}(n+1) & R_{\hat{s}_i}(n+2) \\ R_{\hat{s}_i}(n-1) & R_{\hat{s}_i}(n+0) & R_{\hat{s}_i}(n+1) \\ R_{\hat{s}_i}(n-2) & R_{\hat{s}_i}(n-1) & R_{\hat{s}_i}(n+0) \end{bmatrix} \cdot \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix} \qquad (6)$$

$$\hat{s}_i(n+1) = \alpha_1 \hat{s}_i(n) + \alpha_2 \hat{s}_i(n-1) + \alpha_3 \hat{s}_i(n-2) \qquad (7)$$

Here, $\alpha_i$ represents the predictor coefficient and $R_{\hat{s}_i}(i)$ denotes the auto-correlation function value at lag $i$. $P_{\eta}(\omega)$ is the power spectrum (i.e., the Fourier transform of the auto-correlation) of the anomalous network signal $\eta(n)$, and $P_{\hat{s}_i}(\omega)$ is the power spectrum of the normal network flow $\hat{s}_i(n)$. We measure these stochastic signal signatures using

$$R_{\eta}(n) = \frac{1}{M}\sum_{k=-\infty}^{\infty} \eta(k)\,\eta^{*}(k-n) \qquad (8)$$

$$R_{\hat{s}_i}(n) = \frac{1}{M}\sum_{k=-\infty}^{\infty} \hat{s}_i(k)\,\hat{s}_i^{*}(k-n) \qquad (9)$$

$$P_{\eta}(\omega) = \sum_{n=-\infty}^{\infty} R_{\eta}(n)\,e^{-j\omega n} \qquad (10)$$

$$P_{\hat{s}_i}(\omega) = \sum_{n=-\infty}^{\infty} R_{\hat{s}_i}(n)\,e^{-j\omega n} \qquad (11)$$

where equations (8) and (9) are the estimated auto-correlation functions of the noise signal $\eta(n)$ and the normal traffic signal $\hat{s}_i(n)$, respectively, computed in a window of size $M$, and equations (10) and (11) are the corresponding power spectra of $\eta(n)$ and $\hat{s}_i(n)$. Since direct measurement of the power spectrum is complex and costly, it is desirable to estimate it using the periodogram approach described in [16], summarized by

$$P_{\eta}(\omega) = \frac{1}{M}\,\lvert N(\omega)\rvert^{2} \qquad (12)$$

$$P_{\hat{s}_i}(\omega) = \frac{1}{M}\,\lvert \hat{S}_i(\omega)\rvert^{2} \qquad (13)$$

where equations (12) and (13) are the periodogram estimates of the power spectra of the noise $\eta(n)$ and the normal traffic signal $\hat{s}_i(n)$, respectively, $N(\omega)$ and $\hat{S}_i(\omega)$ are the Fourier transforms of the measured samples of $\eta(n)$ and $\hat{s}_i(n)$, and $M$ is the number of measurements.

Notice that the auto-correlation function is even, i.e., $R_{\hat{s}_i}(n+i) = R_{\hat{s}_i}(n-i)$ (with $n+0$ marking zero lag, as in (6)), and has its maximum at the origin, that is, $R_{\hat{s}_i}(0) \ge R_{\hat{s}_i}(n)$ for all values of $n$. Using equations (6) and (7), the value of $\hat{s}_i(n+1)$ is predicted by the ARMA unit and compared with the measured signal $x_i(n+1)$ to form the output $q(n+1)$ of the ADAP function. Hence, we have

$$q(n+1) = x_i(n+1) - \hat{s}_i(n+1) \qquad (14)$$

$$q(n+1) = s_i(n+1) + \eta(n+1) - \hat{s}_i(n+1) \qquad (15)$$

$$q(n+1) = \bigl[s_i(n+1) - \hat{s}_i(n+1)\bigr] + \eta(n+1) \qquad (16)$$

$$q(n+1) = \varepsilon + \eta(n+1) \qquad (17)$$

where $\varepsilon = s_i(n+1) - \hat{s}_i(n+1)$ represents the prediction (estimation) error of the autoregressive moving average unit. Equations (14) through (17) yield the following decision predicate for identifying any anomaly:

If $\hat{s}_i(n+1) = x_i(n+1)$, then $q(n+1) = 0$; hence, no anomaly.

If $\hat{s}_i(n+1) = s_i(n+1)$, then $q(n+1) = \eta(n+1)$; hence, an anomaly is predicted.

If $\hat{s}_i(n+1) \ne s_i(n+1)$, then $q(n+1) = \eta(n+1) + \varepsilon$; hence, an anomaly plus error is predicted.

This structured predicate extends the work of [12] by directly predicting near-real-time network attacks.
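Equations (6), (7), (9), and (14) can be combined into a per-feature predictor, sketched below in plain Python. Note that (7) contains only autoregressive terms, so the sketch fits a third-order AR model by solving the Yule-Walker system (6) with the biased auto-correlation estimate (9) over the available window. The function names are hypothetical, the data are assumed real-valued, and removing the window mean before fitting (then adding it back) is an assumption the paper does not spell out; without it the biased estimate pulls predictions of a non-zero-mean signal toward zero.

```python
def autocorr(s, max_lag):
    """Biased estimate R(l) = (1/M) * sum_k s(k) * s(k - l), as in eq. (9),
    for lags l = 0..max_lag over the M samples in s (real-valued data)."""
    m = len(s)
    return [sum(s[k] * s[k - lag] for k in range(lag, m)) / m
            for lag in range(max_lag + 1)]


def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with
    partial pivoting; a is a list of rows."""
    n = len(b)
    scale = max(abs(v) for row in a for v in row) or 1.0
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12 * scale:
            raise ValueError("singular Yule-Walker system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def yule_walker(s, order=3):
    """Coefficients alpha_1..alpha_p from the Toeplitz system of eq. (6);
    negative lags use R(-l) = R(l) because the autocorrelation is even."""
    r = autocorr(s, order)
    if r[0] == 0:
        return [0.0] * order          # flat window: nothing to model
    toeplitz = [[r[abs(i - j)] for j in range(order)] for i in range(order)]
    return solve(toeplitz, r[1:order + 1])


def ar_predict(s, alphas):
    """Eq. (7): alpha_1 * s(n) + alpha_2 * s(n-1) + alpha_3 * s(n-2) + ..."""
    return sum(a * s[-1 - j] for j, a in enumerate(alphas))


def predict_next(window, order=3):
    """Predict s_hat(n+1) from a window of Wiener output. ASSUMPTION: the
    window mean is removed before fitting and added back afterwards."""
    mu = sum(window) / len(window)
    dev = [v - mu for v in window]
    return mu + ar_predict(dev, yule_walker(dev, order))


def adap_output(x_next, s_hat_next):
    """ADAP output of eq. (14): q(n+1) = x_i(n+1) - s_hat_i(n+1)."""
    return x_next - s_hat_next


# Example: predict the next normal value, then score a measured spike.
window = [10.0, 10.4, 10.1, 9.8, 10.2, 10.5, 10.0, 9.9, 10.3, 10.1]
print(adap_output(25.0, predict_next(window)))  # large q(n+1): possible anomaly
```

In a streaming setting the window would slide over the Wiener output one second at a time, so the coefficients are refit as the traffic changes.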

B. Adaptive digital anomaly predictor using multiple features

We now extend the single ADAP functionality described above to incorporate information from multiple features. It is important to consider a diverse feature set in order to detect the signatures of complex anomalies. Malicious activities are often designed to spread their effect over multiple features in order to avoid detection by narrowly focused detectors. A cyber-attack, for instance, may try to reduce its visibility by using a common, well-known port and cause only a slight anomalous increase that raises no alarm. However, even the slightest increase in a single feature, when combined with anomalous activity in other features, can have a synergistic effect that makes the anomaly predictable. By using a multiple-feature space, this research taps the additive power of the rich network traffic feature set.

As a way of dealing with complex anomalies without losing generality, we use the feature set $X = [x_1, x_2, x_3, x_4]^T$, where $x_1$ is "average port," $x_2$ is "high ports," $x_3$ is "server ports," and $x_4$ is "peered ports," as a representative experiment. Since these individual characteristics may be correlated, this feature space can lead to a wrong conclusion unless the underlying features are first decorrelated so that the resulting feature space has a Euclidean metric. This is achieved using well-known transformation methods such as the Karhunen-Loeve transform (PCA or discrete Hotelling transform) [13]. Let $C_{XX}$ be the cross-correlation matrix of $X$, given by

$$C_{XX} = E\{X \cdot X^T\} = \begin{bmatrix} C_{x_1x_1} & C_{x_1x_2} & C_{x_1x_3} & C_{x_1x_4} \\ C_{x_2x_1} & C_{x_2x_2} & C_{x_2x_3} & C_{x_2x_4} \\ C_{x_3x_1} & C_{x_3x_2} & C_{x_3x_3} & C_{x_3x_4} \\ C_{x_4x_1} & C_{x_4x_2} & C_{x_4x_3} & C_{x_4x_4} \end{bmatrix} \qquad (18)$$

where the diagonal elements are the auto-correlations of the features and the off-diagonal elements are the cross-correlations of the respective feature pairs. Notice that $C_{XX}$ is a symmetric square matrix, which leads to a 4th-order characteristic equation and 4 corresponding eigenvectors. By normalizing these eigenvectors we generate a 4-dimensional space in which the selected features are uncorrelated. Hence, the ADAP function can filter the features individually, adaptively recomputing the means and variances as the predictor output starts to deviate significantly from the actual measured values. Here we refer to the uncorrelated $x_i$'s as $y_i$, and the probability density function (pdf) of the uncorrelated version of $X$, denoted $Y$, is given by

$$p_Y = p_{y_1} \cdot p_{y_2} \cdot p_{y_3} \cdot p_{y_4} \qquad (19)$$

By empirical data analysis, we have verified that each feature in $X$ is normally distributed. Hence, we write

$$p_Y = \prod_{i=1}^{4} \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-\frac{(y_i - m_i)^2}{2\sigma_i^2}} \qquad (20)$$

where $m_i$ and $\sigma_i^2$ are the mean and variance of $y_i$. Strictly, the product form of (19) and (20) requires the $y_i$ to be independent; for jointly Gaussian features, being uncorrelated is sufficient.
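The decorrelation step can be sketched as follows: estimate $C_{XX}$ from a window of feature vectors, find its eigenvectors, and project each sample onto them. This is a plain-Python illustration under stated assumptions, not the authors' implementation: the features are mean-centred first (so $C_{XX}$ becomes a covariance matrix), and the eigenvectors are found with the cyclic Jacobi rotation method, which suits small symmetric matrices such as the 4 × 4 case here. All function names are hypothetical.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def centred_covariance(samples):
    """Rows of `samples` are time instants, columns are features x_1..x_p.
    Returns (covariance matrix, feature means)."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    cov = [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in samples) / n
            for b in range(p)] for a in range(p)]
    return cov, means


def jacobi_eigen(c, sweeps=50, tol=1e-24):
    """Eigenvalues and orthonormal eigenvectors (columns of v) of a symmetric
    matrix, by cyclic Jacobi rotations that zero one off-diagonal pair at a time."""
    p = len(c)
    a = [list(row) for row in c]
    v = [[float(i == j) for j in range(p)] for i in range(p)]
    total = sum(x * x for row in a for x in row) or 1.0
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off <= tol * total:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                theta = 0.5 * math.atan2(2.0 * a[i][j], a[j][j] - a[i][i])
                g = [[float(r == s) for s in range(p)] for r in range(p)]
                g[i][i] = g[j][j] = math.cos(theta)
                g[i][j], g[j][i] = math.sin(theta), -math.sin(theta)
                a = matmul(transpose(g), matmul(a, g))   # zeroes a[i][j]
                v = matmul(v, g)                         # accumulate rotations
    return [a[k][k] for k in range(p)], v


def kl_transform(samples):
    """Project centred feature vectors onto the eigenvectors of C_XX, giving
    the uncorrelated features y_i used in eqs. (19)-(20)."""
    cov, means = centred_covariance(samples)
    eigvals, v = jacobi_eigen(cov)
    p = len(means)
    ys = [[sum(v[k][col] * (row[k] - means[k]) for k in range(p)) for col in range(p)]
          for row in samples]
    return ys, eigvals, v
```

Because the raw features have very different scales (port numbers versus percentages), standardizing each feature before the transform is a common alternative; the paper does not say whether it does this.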

For ease of implementation, we consider only the auto-correlations on the diagonal of the correlation matrix (18). As a result, the anomaly prediction decision is made in the $[R_{x_1x_1}, R_{x_2x_2}, R_{x_3x_3}, R_{x_4x_4}]$ space using a weighted majority voting scheme, as shown in Figure 2. The ADAP function processes each feature in parallel and provides a prediction output for each. We present the overall result $A(n+1)$ as a linear combination of the individual channels, with weights equal to the maximum value of each feature's auto-correlation function. Thus

$$A(n+1) = q_1(n+1)\,R_{x_1}(0) + q_2(n+1)\,R_{x_2}(0) + q_3(n+1)\,R_{x_3}(0) + q_4(n+1)\,R_{x_4}(0) \qquad (21)$$

If A(n +1) exceeds a predetermined empirical anomaly threshold, then the activity represented by the feature set X is deemed an anomaly. The system can then be directed to respond at the time instant of n+1.
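Equation (21) and the threshold test amount to a weighted sum of the per-feature ADAP outputs. The sketch below assumes the weights $R_{x_k}(0)$ are computed with the lag-0 form of estimate (9) over the current window; the function names are hypothetical, and so are the threshold and all numbers in the example, since the paper leaves the threshold to empirical tuning.

```python
def peak_autocorr(x):
    """R_x(0) = (1/M) * sum_k x(k)^2, the maximum of the autocorrelation."""
    return sum(v * v for v in x) / len(x)


def anomaly_decision(q_next, weights, threshold):
    """Eq. (21): A(n+1) = sum_k q_k(n+1) * R_xk(0), then compare with the
    empirical threshold. Returns (A(n+1), is_anomaly)."""
    if len(q_next) != len(weights):
        raise ValueError("need exactly one weight per feature channel")
    score = sum(q * w for q, w in zip(q_next, weights))
    return score, score > threshold


# Example with four channels (average port, high ports, server ports, peered
# ports); all values are made up for illustration.
windows = [[1.0, 1.2, 0.9], [0.5, 0.4, 0.6], [2.0, 2.1, 1.9], [0.1, 0.0, 0.2]]
weights = [peak_autocorr(w) for w in windows]
print(anomaly_decision([0.8, 0.1, 0.3, 0.0], weights, threshold=1.0))
```

Because $R_x(0)$ grows with the square of a feature's magnitude, a feature measured in large units can dominate $A(n+1)$; normalizing the weights is one possible refinement, not something the paper describes.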







Figure 2. Detecting network anomalies with ADAP and weighted majority voting.

The anomaly detector's effectiveness is driven by a set of parameters chosen by the researchers. By fine-tuning the set of features, the Wiener filter and auto-correlation window sizes, and the majority voting thresholds, we intend to find an optimal feature space in which the characteristics are uncorrelated and yet still contain all the information associated with network traffic and attacks.

C. Normal density approximation for network traffic

There has been considerable research in the statistical analysis of network traffic data. However, previous work has assumed the data to be normally distributed without supporting this assumption [10]. In our observations, the Gaussian nature of the traffic is determined from the bell-shaped frequency histograms of the features used in this study. Additionally, we examine the periodogram of each feature for similarity with the periodogram of generated Gaussian data with matching mean and variance; the periodogram is also an indicator of correlation [17], [18]. In addition to visual verification, the mean square error (MSE) between the measured density and the generated Gaussian density is computed using

$$\%MSE = \frac{\sum_{i} \bigl(data(i) - norm(i)\bigr)^2}{\bigl(mean(data)\bigr)^2} \qquad (22)$$

where $data(i)$ denotes the normalized histogram of the feature $x_i$, $norm(i)$ is the value of the respective normal density, and $mean(data)$ is the mean value of the histograms of all the features used. We have experimentally shown that the random Gaussian nature of these features does not adversely affect their ability to discriminate network anomalies. In fact, the feature with the lowest MSE in Table I, "average port," turns out to be the most discerning feature. The average port is the average of all the port numbers seen on the network and should not change drastically under normal conditions. On the other hand, "peer factor" is the feature with the highest MSE with respect to the Gaussian density. It measures connections that use the same port on both the source and destination side, which is very unlikely to happen at random and indicates a specific a priori agreement between the two peer computers: the probability that two computers independently pick one particular port at random is $1/(65535)^2 \approx 2.33 \times 10^{-10}$.

TABLE I. MSE SORTED FEATURE SET USED FOR NORMAL DENSITY APPROXIMATION AND ANOMALY DETECTION.

| % MSE | Measured feature | Description |
|---|---|---|
| 0.014403 | Average port | Average port number as indicator of usage |
| 0.055449 | High-ports | Percentage of port numbers > 10000 |
| 0.105316 | Total ports | Number of ports seen |
| 0.105888 | Flow records | Count of flow records |
| 0.119697 | Total bits | Bits per second load on network |
| 0.137958 | Destination bits | Destination bits per second load |
| 0.148073 | Source bits | Source bits per second load |
| 0.183783 | Packets per second | Total packets per second |
| 0.194217 | Destination packets | Destination packets per second |
| 0.229815 | Server factor | Measure of typical port usage |
| 0.257600 | Mid-range ports | Percentage of ports > 1024 and < 10000 |
| 0.284241 | Low-range ports | Percentage of ports < 1025 |
| 0.301142 | Source packets | Source packets per second |
| 2.109031 | Total bytes | Total bytes per second |
| 2.244572 | Destination bytes | Destination bytes per second |
| 4.569049 | Source bytes | Source bytes per second |
| 39.264246 | Peer factor | Measure of same port usage |

III. RESULTS AND DISCUSSION

A. Experimental Test Bed

Raw network flow data is captured at the Internet border of Ohio University and analyzed using the network monitoring tool Argus [7], which produces a stream of network flow connection records. A connection is loosely defined as a bidirectional series of packets identified by a protocol type, source IP, source port, destination IP, and destination port. In order to process the data as a time series, we collect all the records for the same time period into a single record representing one second in time. Cumulative statistics are gathered using a C++ program, and the data are analyzed with MATLAB functions based on equations 9.44-9.46 in reference [16]. Multiple experiments are run on various feature sets and parameter settings in order to identify the system configuration with the most discriminating power.

B. Results

This section describes the results of the normal density approximation study and of the prediction algorithm. Figure 3 shows the periodogram, auto-correlation, and histogram of a selected feature set. The similarity between the measured feature values and generated normal Gaussian data is clearly visible for all features except $x_4$. The significant variation in $P_{x_4}$ may be caused by the highly stochastic nature of the traffic.

[Figure 3 panels: Average Port, High Ports, Server Factor, and Peered Ports, each with Periodogram, Autocorrelation, and Histogram plots (panels A-F).]

Figure 3. Periodograms ($P_{x_1}, P_{x_2}, P_{x_3}, P_{x_4}$), auto-correlations ($R_{x_1}, R_{x_2}, R_{x_3}, R_{x_4}$), and histograms for the measured feature set and the normal PDF approximation.
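The two Gaussianity checks behind Figure 3 and Table I, the periodogram of equation (13) and the histogram %MSE of equation (22), can be sketched as follows. This is an illustrative sketch, not the authors' MATLAB code: the number of histogram bins is a free, hypothetical parameter; the normal density is fitted with the sample mean and population standard deviation; the sum over histogram bins in (22) follows the reconstruction above; and $mean(data)$ defaults to the mean of this feature's own histogram, whereas the paper averages over the histograms of all features (pass the hypothetical `ref_mean` argument for that). The periodogram uses a direct O(M²) DFT for clarity.

```python
import cmath
import math
from statistics import fmean, pstdev


def periodogram(s):
    """Eq. (13): P(w_k) = |S(w_k)|^2 / M at the DFT frequencies w_k = 2*pi*k/M."""
    m = len(s)
    return [abs(sum(s[n] * cmath.exp(-2j * math.pi * k * n / m) for n in range(m))) ** 2 / m
            for k in range(m)]


def normalized_histogram(x, bins):
    """Histogram scaled to unit area, so it is comparable with a pdf."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in x:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    centres = [lo + (k + 0.5) * width for k in range(bins)]
    return centres, [c / (len(x) * width) for c in counts]


def percent_mse(x, bins, ref_mean=None):
    """Eq. (22): sum_i (data(i) - norm(i))^2 / mean(data)^2."""
    centres, data = normalized_histogram(x, bins)
    mu, sigma = fmean(x), pstdev(x)
    if sigma == 0:
        raise ValueError("constant feature: no normal density to compare against")
    norm = [math.exp(-0.5 * ((c - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
            for c in centres]
    scale = fmean(data) if ref_mean is None else ref_mean
    return sum((d - g) ** 2 for d, g in zip(data, norm)) / scale ** 2
```

A lower %MSE means the feature's histogram lies closer to its fitted normal density, which is the ordering used to sort Table I.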

Figure 4 shows the auto-correlation and cross-correlation functions for the selected feature set. While the pairs $(x_1, x_2)$ and $(x_1, x_3)$ are highly correlated, the remaining pairs are not.

[Figure 4 panels: $C_{x_ix_1}$ through $C_{x_ix_4}$ and $C_{x_1x_i}$ through $C_{x_4x_i}$.]

Figure 4. Cross-correlation plots for features x1, x2, x3, and x4 selected in this particular implementation.

Figure 5 depicts 100 seconds of collected real-time data and illustrates the prediction results at different stages of the ADAP algorithm. In subplots (A) and (B), a dashed line shows the boundary of the Wiener and auto-correlation windows, respectively. Part (C) shows the ARMA predictor signal $\hat{s}_i(n+1)$, and (D) shows the difference between the measured values and the Wiener output. The solid vertical line indicates a predicted anomaly at time n+1 (263 seconds). Notice that the peaks in the predictor measurements (E) and (F) occur just before the maximum value of the actual anomaly in (A). This supports our conclusion that the algorithm predicts network anomalies. Figure 6 demonstrates the effect of varying parameters such as window size (shaded area). The solid vertical lines indicate the locations in time of predicted anomalies. Part (A) corresponds to smaller windows, while part (B) represents a larger window size. By changing the parameters, the system predicts anomalies of various magnitudes. We attribute this robust performance to the feedback control in the ADAP unit, which allows it to adjust to the changing signal waveform.

Figure 5. Snapshots of (A) the measured signal $x_i(n)$, (B) the Wiener output $\hat{s}_i(n)$, (C) the ARMA prediction $\hat{s}_i(n+1)$, (D) the difference between the Wiener input and output $x_i(n) - \hat{s}_i(n)$, (E) the difference between the measured signal and the ARMA output $x_i(n) - \hat{s}_i(n+1)$, and (F) the difference between the Wiener and ARMA outputs $\hat{s}_i(n) - \hat{s}_i(n+1)$.

Figure 6. Experimental results of a sample run on x1, x2, x3, and x4, respectively, showing the effect of changing parameters (i.e., window sizes, thresholds, and weights).

IV. CONCLUSIONS

This research presents a method of anomaly detection based on Wiener filtering of noise and ARMA modeling of network flow data. Noise and traffic signal statistics are dynamically calculated using network-monitoring metrics for traffic features such as average port, high port, server ports, and peered ports. The underlying approach has been tested on near-real-time Internet traffic in the wide-area network (WAN) of Ohio University. Port usage, in the form of the average port feature, has been determined to be the most useful measure in the estimation process. Other port measurements are used to confirm the anomaly prediction as part of the majority-voting scheme. Experiments reveal that our method is highly effective and robust in predicting anomalies in network traffic flow. The proposed system enables a network engineer to develop a defense mechanism that filters hazards and other cyber attacks. In this study, we target broad features rather than more specific ones because they are independent of specific IP addresses or ranges. In turn, this allows us to predict an anomaly without a priori knowledge of any specific activity. Additional future work includes the use of an inter-feature cross-correlation matrix, expanding the algorithm to handle more attributes, adding further prediction mechanisms, and/or exploiting the optimal feature space in which a cyber attack can be estimated more easily than in the original feature space.

REFERENCES

[1] R. Kwitt and U. Hofmann, "Unsupervised anomaly detection in network traffic by means of robust PCA," in Proc. Computing in the Global Information Technology (ICCGI 2007), March 2007, pp. 37-37.

[2] G. Shen et al., "Anomaly detection based on aggregated network behavior metrics," in Proc. of Networks and Mobile Computing, Shanghai, China, Sept. 21-25, 2007, pp. 2210-2213.

[3] A. Feldman et al., "Data networks as cascades: Investigating the multifractal nature of Internet WAN traffic," in Proc. ACM SIGCOMM '98, vol. 28, pp. 42-55, 1998.

[4] W. Yurcik and Y. Li, "Internet security visualization case study: Instrumenting a network for NetFlow security visualization tools," in Proc. ACSAC '05, 2005.

[5] D. Plonka, "A network traffic flow reporting and visualization tool," in Proc. 14th USENIX Conference on System Administration, New Orleans, 2000, pp. 305-318.

[6] K. Cho et al., "An aggregation technique for traffic monitoring," in Proc. SAINT, 2002, pp. 74-81.

[7] C. Bullard, "Argus record format," June 2005.

[8] Y. Gong, "Detecting worms and abnormal activities with NetFlows," August 2004.

[9] R. Pang et al., "Characteristics of Internet background radiation," in Proc. IMC '04, Taormina, Sicily, Italy, Oct. 25-27, 2004.

[10] S. Jin et al., "Network intrusion detection in covariance feature space," Pattern Recognition, vol. 40, pp. 2185-2197, 2007.

[11] A. Karasaridis et al., "Wide-scale botnet detection and characterization," in Proc. First Workshop on Hot Topics in Understanding Botnets, Cambridge, MA, 2007, pp. 7-7.

[12] A. Sang and S. Li, "A predictability analysis of network traffic," in Proc. INFOCOM 2000, vol. 1, pp. 342-351.

[13] S. Theodoridis and K. Koutroumbas, Pattern Recognition, 3rd ed., Academic Press, 2006.

[14] B. Porat, Digital Processing of Random Signals: Theory & Methods, Prentice Hall, 1994.

[15] P. Scalart and J. V. Filho, "Speech enhancement based on a priori signal to noise estimation," in Proc. 1996 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP-96), vol. 2, May 7-10, 1996, pp. 629-632.

[16] J. S. Lim, Two-Dimensional Signal and Image Processing, Prentice Hall, 1990.

[17] W. L. Crum, "The resemblance between the ordinate of the periodogram and the correlation coefficient," Journal of the American Statistical Association, vol. 18, no. 143, pp. 889-899, Sep. 1923.

[18] J. Durbin, "Tests for serial correlation in regression analysis based on the periodogram of least-squares residuals," Biometrika, vol. 56, no. 1, pp. 1-15, Mar. 1969.