Traffic Self-Similarity

Adrian Popescu
Department of Telecommunications and Signal Processing
University of Karlskrona/Ronneby
371 79 Karlskrona, Sweden

Contents

A Traffic Self-Similarity
  A.1 Introduction
      A.1.1 Traditional and non-traditional truths
      A.1.2 Fractals and self-similarity
      A.1.3 Chaos and randomness
  A.2 Traffic Self-Similarity
      A.2.1 Stationary random processes
      A.2.2 Self-similar processes: continuous-time definition
      A.2.3 Self-similar processes: discrete-time definition
      A.2.4 Properties of fractal processes
      A.2.5 Traffic measurements
  A.3 Modeling Self-Similar Traffic
      A.3.1 Single-source models
      A.3.2 Aggregate traffic models
      A.3.3 Generation of synthetic self-similar traffic traces
  A.4 Revealing Self-Similarity in Teletraffic
      A.4.1 Rescaled range analysis
      A.4.2 Variance-time analysis
      A.4.3 Periodogram method
      A.4.4 Whittle estimator
      A.4.5 Wavelet method
  A.5 Origins of Self-Similarity in Teletraffic
  A.6 Performance Impacts
  A.7 Conclusions

Appendix A

Traffic Self-Similarity

The unifying concept underlying fractals, chaos and power laws is self-similarity. Self-similarity, or invariance against changes in scale or size, is an attribute of many laws of nature and innumerable phenomena in the world around us. Self-similarity is, in fact, one of the decisive symmetries that shape our universe and our efforts to comprehend it.

Manfred Schroeder, 1991

This document is a review of basic concepts in traffic self-similarity. It covers definitions of self-similarity, key properties, traffic measurements, models, the generation of synthetic traces, methods for revealing self-similar properties in teletraffic, as well as origins and performance impacts. The material is standard and can be used for a first course in Traffic Self-Similarity at the senior/graduate level.


A.1 Introduction

Statistical analyses of high-resolution traffic measurements from a wide range of working packet networks (e.g., Ethernet LANs, WANs, CCSN/SS7, ISDN and VBR video over ATM) have convincingly shown the presence of fractal, or self-similar, properties in both local-area and wide-area traffic traces [3, 6, 9, 11, 25, 33, 41]. This means that similar statistical patterns may occur over time scales that vary by many orders of magnitude (i.e., ranging from milliseconds to minutes and even hours). This discovery calls into question some of the basic assumptions made by earlier research on the control, engineering, and operations of broadband integrated systems. The fact that network traffic is inherently fractal or long-range dependent (LRD), while many studies assume traffic to be short-range dependent (SRD), leads one to question the extent to which the results of these studies are applicable in practice.

At present, there is mounting evidence that LRD is of fundamental importance for a number of teletraffic engineering problems, such as traffic measurements [40], queueing behaviour and buffer sizing [12, 13, 28], admission control [10] and congestion control [36]. Unlike traditional packet traffic models, which give rise to exponential tail behavior in queue size distributions and typically result in overly optimistic performance predictions and inadequate resource allocations, fractal traffic models predict hyperbolic or Weibull (stretched exponential) queue size distributions and could therefore affect the control and management of present and future packet networks within an integrated framework of end-to-end Quality of Service (QoS) provision. This also indicates that increasing link capacity could be more effective in improving performance than increasing buffer size.

Although similar processes have been observed and analysed in a number of other areas, for instance hydrology, biophysics and financial economics [39], the work on the fractal dynamics of network traffic is relatively new. While most of the work done in science and engineering has focused almost exclusively on statistical and practical features of fractal models (e.g., data analysis, mathematical modeling), their engineering impacts on performance and analysis have not yet received adequate attention. This is mainly because of the difficulties related to analysis and to the ability to use these models in control. Self-similarity implies that a specific correlation structure is retained over a wide range of time scales, although it may come in different forms. In order to be able to control such processes, one first needs to explain and to validate on physical grounds the causal mechanisms that could be responsible for generating self-similarity in a realistic network environment. Furthermore, understanding the impacts of such processes is at least as important as understanding their physical origins.

A.1.1 Traditional and non-traditional truths

Self-similarity is a concept related to fractals and chaos theory; this chapter therefore starts with a description of these phenomena. The recent history of mathematics and the natural sciences, in their effort to capture features of natural phenomena, testifies to the revision of several important traditional truths [39]. Features of natural, complex phenomena include, among others, out-of-equilibrium dynamics, unpredictability, non-linearity, feedback loops, phase transitions, attractors, adaptation, and inductive behaviours. For instance, one of the first traditional truths (20th century) takes the form of "quantitativeness", i.e., that most physical theories are and should be quantitative. However, later developments, such as the advent of catastrophe theory (which deals with the forms of things and not their magnitudes), have turned this truth into a non-traditional one. Qualitative aspects have been found to be as important as, and sometimes even more important than, quantitative ones. Accordingly, many natural phenomena can no longer be represented by means of analytic functions (the traditional truth); they are usually singular in character and therefore difficult to represent by analytic functions (fractals). Their evolution, although derivable from dynamical equations, can no longer be predicted from the equations of motion, as it is not necessarily predictable (chaos). Furthermore, another important traditional truth, according to which physical systems can be characterized by so-called fundamental scales (e.g., in time and in length), has turned into a non-traditional one, namely the scaling principle. That is, natural phenomena (whose central moments diverge, such as under an inverse power law) do not necessarily have fundamental scales that describe their behaviour; on the contrary, they may be described by scaling relations. For instance, the length of a fractal curve depends on how it is measured, i.e., on the scale of the measuring instrument. The traditional principle of superposition has also been shown to be violated by most natural phenomena, for diverse reasons such as perturbations, non-linearities, etc. The consequence is that understanding a process as a whole cannot be accomplished only through knowledge of the decomposed elements of its behaviour. The evolution of natural phenomena may become even more complex than the superposition principle would suggest.

A.1.2 Fractals and self-similarity

A self-similar phenomenon represents a process displaying structural similarities across a wide range of scales of a specific dimension. In other words, the reference structure repeats itself over a wide range of scales of diverse dimensions (geometrical, statistical, or temporal), and the statistics of the process do not change with the change of scale. However, these properties do not hold indefinitely for real phenomena: at some point, the structure breaks down. Self-similarity can therefore be associated with "fractals", which are objects whose appearance is unchanged over different scales. The concept of fractals includes, besides the geometrical meaning, statistics as well as dynamics. That is, there are fractal processes of diverse dimensions, e.g., geometrical, statistical and dynamical. Examples of geometrical fractal processes are the Cantor set, the Sierpinski triangle, the Koch curve, etc. [34]. In the case of statistical fractals, it is the probability density that repeats itself on every scale, as found for instance in economics (Pareto's law), in linguistics (Zipf's law) and in sociology (Lotka's law) [39]. On the other hand, a dynamical fractal is generated by a low-dimensional dynamical system with chaotic solutions, as in biology (Willis' law), in medicine (the cardiac spectrum, the mammalian lung) and in teletraffic modeling (chaotic deterministic maps).

A.1.3 Chaos and randomness

The concepts of "chaos" and "order" have long been viewed as antagonistic, with the consequence that specific investigation methods were developed for each. Diverse "natural" laws (e.g., Newton's laws, Kepler's laws) represent good examples of "ordered" nature, whereas chaos belongs to a distinct part of nature where simple laws are no longer valid. That is, chaos does not necessarily mean a higher degree of complexity, but a condition in which nature fails to obey laws of different degrees of complexity [34]. Furthermore, things may get even more challenging because of the observation that natural systems seem to have no difficulty in switching from one state to another. A chaotic system is a system with an extremely sensitive dependence on initial conditions. Very small changes in (diverse) initial conditions (e.g., specific parameters, background noise, inaccuracy of equipment) may drastically change the long-term behaviour of the system. For instance, with an initial parameter of, say, 5.0, the final result of a chaotic system may be entirely different from that of the same system with an initial parameter value of, say, 5.0000001. Chaos was initially discovered, as a meteorological phenomenon, by the meteorologist Edward Lorenz in 1960 [34]. Based on numerous studies, he stated that it is impossible to forecast the weather accurately. However, his studies finally led to other topics of what eventually came to be known as chaos theory (the Lorenz attractor, the interplay between randomness and deterministic fractals, etc.) [34]. Other examples of chaotic systems have subsequently been found in ecology, the prediction of biological populations (the well-known bifurcation diagram), the human heart, computer science, fractal images (the Mandelbrot set, the Sierpinski triangle, the Koch curve), etc. Some of the most fundamental characteristics of these systems are the fact that randomness finally creates deterministic shapes/trajectories, as well as the existence of (diverse) so-called invariants, i.e., different constants that are universally valid (e.g., the Feigenbaum constant, diverse invariants that characterize web server workloads, etc.). The consequence is that nothing in nature seems to be truly random, and a process that we experience as random is perhaps so only because of the incompleteness of our knowledge.

A.2 Traffic Self-Similarity

Processes with LRD are reviewed in this chapter, with emphasis on the connections with second-order time series analysis. Specific properties are outlined, and traffic measurements from a number of experiments are presented.

A.2.1 Stationary random processes

In order to review the salient properties of LRD processes, several definitions are introduced. Assume a time series $X = \{X_n;\ n \in \mathbb{Z}^+\}$ that represents the number $X_n$ of packets arriving during the $n$-th time interval of duration $T_i$. Accordingly,

$$X_n = N[nT_i] - N[(n-1)T_i] \qquad (A.1)$$

The number of packet arrivals up to time $t$ represents a so-called counting process and is calculated as

$$N(t) = \int_0^t dN(x) \qquad (A.2)$$

where $dN(t)$ is a point process that represents packet arrivals. At each point in time $t$, $X_n$ is a random variable, and the complete form of $X_n$ as $t$ varies over time is a random process. Some of the main statistics of such a process are:

- mean
  $$\mu = E[X_n] \qquad (A.3)$$

- variance
  $$\sigma^2 = \mathrm{Var}[X_n] = E[(X_n - \mu)^2] \qquad (A.4)$$

- autocovariance function
  $$\mathrm{Cov}(X_n, X_{n+k}) = E[(X_n - \mu)(X_{n+k} - \mu)] \qquad (A.5)$$

- (normalized) autocorrelation function (ACF)
  $$R(k; T_i) = \frac{\mathrm{Cov}(X_n, X_{n+k})}{\sigma^2} \qquad (A.6)$$

- index of dispersion for counts (IDC)
  $$F(T) = \frac{\mathrm{Var}[N(T)]}{E[N(T)]} \qquad (A.7)$$

- power spectral density (PSD)
  $$S(w) = \sum_{k=-\infty}^{\infty} R(k)\, e^{-jkw} \qquad (A.8)$$
  where $w = 2\pi f$ represents the frequency in radians and $j = \sqrt{-1}$. The dc component of the power spectrum is
  $$S(0) = \sum_{k=-\infty}^{\infty} R(k) \qquad (A.9)$$

A random process $X_n$ is considered to be stationary if its statistical properties do not change over time. This means that the process lacks a characteristic scale, both quantitatively and qualitatively; the first-order statistics are the same on both short-term and long-term scales. Furthermore, a random process is said to be wide-sense stationary (WSS), or stationary up to order 2, if the first- and second-order statistics (i.e., $\mu$ and $\sigma^2$) do not change over time. That is, for wide-sense stationary processes the mean $\mu$ is constant, the variance $\sigma^2$ is finite, and the autocorrelation function $R(k; T_i)$ depends only on $k$ (i.e., the time difference). Also, in the case that $X_1, \ldots, X_n$ are uncorrelated, $R(k; T_i) = 0$ for $k \ne 0$.

A.2.2 Self-similar processes: continuous-time definition

A stochastic process is called fractal when a number of (relevant) statistics exhibit scaling with related scaling exponents. Since scaling leads mathematically to power-law relationships in the scaled quantities (i.e., power-law statistics), the conclusion is that traffic shows fractal properties when several estimated statistics (e.g., ACF, IDC, PSD) exhibit power-law behavior over a wide range of time or frequency scales. A continuous-time stochastic process $X(t)$ is considered to be statistically self-similar with parameter $H$ ($0.5 \le H \le 1.0$) if, for any real, positive $a$, the processes $X(t)$ and $a^{-H} X(at)$ have identical finite-dimensional distributions (i.e., the same statistical properties) for all positive integers $n$:

$$\{X(t_1), X(t_2), \ldots, X(t_n)\} \overset{D}{=} \{a^{-H} X(at_1), a^{-H} X(at_2), \ldots, a^{-H} X(at_n)\} \qquad (A.10)$$

The symbol $\overset{D}{=}$ means "asymptotically equal to" in the sense of distributions. Practically, statistical self-similarity means that the following conditions hold [35]:

- mean
  $$E[X(t)] = \frac{E[X(at)]}{a^H} \qquad (A.11)$$

- variance
  $$\mathrm{Var}[X(t)] = \frac{\mathrm{Var}[X(at)]}{a^{2H}} \qquad (A.12)$$

- autocorrelation function
  $$R(t; \tau) = \frac{R(at; a\tau)}{a^{2H}} \qquad (A.13)$$

The so-called Hurst parameter (H) indicates the "degree" of self-similarity, i.e., the degree of persistence of the statistical phenomenon. A value of $H = 0.5$ indicates the lack of self-similarity, whereas large values of H (close to 1.0) indicate a high degree of self-similarity or long-range dependence (LRD) in the process. This means that, for an LRD process, an increasing (or decreasing) trend in the past implies, with high probability, an increasing (or decreasing) trend in the future.

A.2.3 Self-similar processes: discrete-time definition

Consider a time series $X = \{X_n;\ n \in \mathbb{Z}^+\}$ and define another (m-aggregated) time series $X^{(m)} = \{X_n^{(m)};\ n \in \mathbb{Z}^+\}$ by averaging the original time series over non-overlapping, adjacent blocks of size $m$:

$$X_n^{(m)} = \frac{1}{m} \sum_{i=nm-(m-1)}^{nm} X_i \qquad (A.14)$$

$X^{(1)}$ represents in this case the highest resolution that is possible for the process. Lower-resolution versions $X^{(m)}$ of the process can be obtained by m-aggregating the $X_n$ process, for instance

$$X_n^{(4)} = \frac{X_{4n-3} + X_{4n-2} + X_{4n-1} + X_{4n}}{4} \qquad (A.15)$$
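To make the aggregation in eqs. A.14 and A.15 concrete, the following minimal Python/numpy sketch computes $X^{(m)}$ from a sample series; the function name `aggregate` is illustrative and not from the original text:

```python
import numpy as np

def aggregate(x, m):
    """m-aggregated series X^(m) of eq. (A.14): averages of
    non-overlapping, adjacent blocks of size m."""
    n = len(x) // m                      # number of complete blocks
    return x[:n * m].reshape(n, m).mean(axis=1)

x = np.random.default_rng(0).standard_normal(1000)
x4 = aggregate(x, 4)                     # reproduces eq. (A.15)
```

For an exactly self-similar process, the variance of `aggregate(x, m)` should decay as $m^{-\beta}$ rather than as $m^{-1}$, which is the basis of the variance-time analysis of section A.4.2.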

The process $X^{(m)}$ therefore represents a less-detailed copy of the process $X^{(1)}$. If statistical properties (e.g., mean, variance) are preserved under aggregation, then the process is self-similar in nature. The process $X_n^{(m)}$ can also be viewed as a time average of the process $X_n$. For a stationary ergodic process $X_n$, the time average should equal the ensemble average

$$\bar{X} = E[X_n] = \frac{\sum_{i=1}^{n} X_i}{n} \qquad (A.16)$$

and the variance of the sample mean should equal the variance of the underlying random variable divided by the number of samples:

$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \qquad (A.17)$$

However, in the case of a self-similar process, aggregating by a factor of $m$ is not exactly the same as considering a sample mean with sample size $m$, because of the persistence of statistical properties across time scales. This means that the variance goes to zero more slowly than in the case of a stationary ergodic process, i.e., [4]

$$\mathrm{Var}(\bar{X}) \approx \frac{\sigma^2}{n}\,[1 + \delta_n(R)] \qquad (A.18)$$

where

$$\delta_n(R) = \frac{\sum_{i \ne j} R(i, j)}{n} \qquad (A.19)$$

and the autocorrelation function between $X_i$ and $X_j$ is

$$R(i, j) = \frac{\mathrm{Cov}(X_i, X_j)}{\sigma^2} \qquad (A.20)$$

There are two classes of self-similar processes, namely exactly self-similar processes and asymptotically self-similar processes. A process $X$ is said to be exactly self-similar with parameter $\beta$ ($0 < \beta < 1$) if, for $m \in \mathbb{Z}^+$, the following conditions are fulfilled [35]:

- variance
  $$\mathrm{Var}[X^{(m)}] = \frac{\mathrm{Var}[X]}{m^{\beta}} \qquad (A.21)$$

- autocorrelation
  $$R(k; X^{(m)}) = R(k; X) \qquad (A.22)$$

The parameter $\beta$ is related to the Hurst parameter by

$$\beta = 2(1 - H) \qquad (A.23)$$

It is observed that for stationary ergodic processes $\beta = 1$ and the variance decays to zero. On the contrary, for exactly self-similar processes the variance decays more slowly. The other class of self-similar processes comprises the so-called asymptotically self-similar processes. A process $X$ is said to be asymptotically self-similar if, for $k$ large enough:

- variance
  $$\mathrm{Var}[X^{(m)}] = \frac{\mathrm{Var}[X]}{m^{\beta}} \qquad (A.24)$$

- autocorrelation function
  $$R(k; X^{(m)}) \rightarrow R(k; X) \quad \text{as } m \rightarrow \infty \qquad (A.25)$$

It is observed that, for both classes of self-similar processes, the variance of $X^{(m)}$ decreases more slowly than $\frac{1}{m}$ as $m \to \infty$. Compare this with conventional (short-range dependent) stochastic processes, where the variance decreases proportionally to $\frac{1}{m}$ and approaches zero as $m \to \infty$ (consistent with white noise, i.e., a uniform power spectrum). The most striking feature of self-similar processes is, however, that the autocorrelation function does not degenerate as $m \to \infty$. This is in contrast to conventional stochastic processes, where the autocorrelation function degenerates as $m \to \infty$.

A.2.4 Properties of fractal processes

Self-similarity and fractals are intimately related [34]. Some of the defining properties of fractal processes are as follows:

- Self-similarity
- Long-range dependence
- Slowly decaying variance
- Heavy-tailed distributions
- 1/f noise
- Infinite moments
- Fractal dimensions

Self-similarity

As mentioned above, a process is considered to be exactly self-similar if the aggregated processes have first- and second-order statistical properties that are indistinguishable from those of the process itself. In contrast, an asymptotically self-similar process is one where the autocorrelation function of the aggregated process approaches that of the process itself for a large degree of aggregation. When a self-similar process $X(t)$ has the property of stationary increments (i.e., when the finite-dimensional distributions of $X(t + \tau) - X(t)$ do not depend upon $t$), then this process may serve as an underlying process yielding a fractal process. That means that one can construct a (discrete-time) stationary increment process

$$X_n = X[nT_i] - X[(n-1)T_i] \qquad (A.26)$$

with long-range dependence, slowly decaying variance and 1/f-noise properties, which are specific to fractal processes. The most widely-known example of a self-similar process is the fractional Brownian motion (fBm) process (with infinitely long-run correlations), which is a generalization of the Brownian motion (Bm) (with independent and uncorrelated increments) [4].

Long-range dependence

Long-range dependence (LRD) reflects the persistence phenomenon in a self-similar process and is mainly defined in terms of the behavior of the autocovariance (or autocorrelation) function. A process $X$ is said to be LRD if, for fixed $T_i$, the autocovariance (or autocorrelation) function is non-summable:

$$\sum_{k=0}^{\infty} \mathrm{Cov}(k) = \infty \qquad (A.27)$$

It is observed that this definition reveals little information about the process $X$, as it depends only on the sum of the autocovariance function and not on its specific form. A better definition is one that specifies the behavior of the tail of the autocovariance function: an LRD process has an autocovariance function that decays hyperbolically,

$$\mathrm{Cov}(X_n, X_{n+k}) \sim |k|^{-\beta} \qquad (A.28)$$

as $|k| \to \infty$, with $0 < \beta < 1$. The parameter $\beta$ is defined in eq. A.23. In contrast, short-range dependent (SRD) processes have an exponentially decaying autocovariance,

$$\mathrm{Cov}(X_n, X_{n+k}) \sim a^{|k|} \qquad (A.29)$$

as $|k| \to \infty$, with $0 < a < 1$. The consequence is that an SRD process has summable autocovariance:

$$\sum_{k=0}^{\infty} \mathrm{Cov}(k) < \infty$$

Slowly decaying variance

This property refers to the behavior of the variance under aggregation: as expressed in eqs. A.21 and A.24, the variance of the aggregated process $X^{(m)}$ decays as $m^{-\beta}$ with $0 < \beta < 1$, i.e., more slowly than the $\frac{1}{m}$ decay of conventional processes.

Heavy-tailed distributions

A random variable $X$ follows a heavy-tailed distribution if its tail decays according to a power law, $P[X > x] \sim x^{-\alpha}$ as $x \to \infty$, with $\alpha > 0$. For $0 < \alpha \le 1$ all moments of $X$ are infinite and, generally, the $n$-th moment is infinite for $n \ge \alpha$. The most famous example of a heavy-tailed distribution is the Pareto distribution [35]. An important observation is that the heavy-tailed distribution property is not a necessary condition for self-similarity. However, the self-similar nature of many traffic traces (the so-called Joseph effect) results directly from heavy-tailed distributions.

Infinite moments

As a consequence of the heavy-tailed distribution property, moments above a certain order may not exist, i.e., infinite mean, infinite variance, etc.
0. For 0 < α  1 all moments of X are infinite and generally the n-th moment is infinite for n  α. The most famous example of heavytailed distribution is the Pareto distribution [35]. An important observation is that the heavy-tailed distributions property is not a necessary condition for self-similarity. However, the self-similar nature of many traffic traces result directly from heavy-tailed distributions (the so-called Joseph effect). Infinite moments As a consequence of the property of heavy-tailed distributions, moments above a certain order may not exist, i.e., infinite mean, infinite variance, etc. 12

1/f noise

This property is a frequency-domain manifestation of LRD. Specifically, the power spectral density (PSD, as defined in eq. A.8) of LRD processes diverges at low frequencies. This is in contrast to SRD processes, where the spectral density remains flat at low frequencies. In other words, the PSD of LRD processes shows a power-law behavior near the frequency origin:

$$S(w) \sim \frac{1}{|w|^{\gamma}} \qquad (A.35)$$

as $|w| \to 0$, with $0 < \gamma < 1$, where

$$\gamma = 1 - \beta = 2H - 1$$

Pareto distribution

The Pareto distribution has CDF $F(x) = 1 - \left(\frac{\beta}{x}\right)^{\alpha}$ and pdf

$$f(x) = \frac{\alpha \beta^{\alpha}}{x^{\alpha + 1}} \qquad (A.39)$$

for $x \ge \beta$ and $\alpha > 0$. On the contrary, for $x < \beta$,

$$f(x) = F(x) = 0 \qquad (A.40)$$

The mean value is

$$E[X] = \frac{\alpha \beta}{\alpha - 1} \qquad (A.41)$$

provided that $\alpha > 1$. The variance is

$$\sigma^2 = \frac{\alpha}{(\alpha - 1)^2 (\alpha - 2)} \qquad (A.42)$$

provided that $\alpha > 2$ and $\beta = 1$. The parameter $\alpha$ determines the mean and the variance of $X$ in the following way:

- for $\alpha \le 1$ the distribution has infinite mean
- for $1 < \alpha \le 2$ the distribution has finite mean and infinite variance
- for $\alpha \le 2$ the distribution has infinite variance

Furthermore, the parameter $\alpha$ is related to the Hurst parameter by

$$H = \frac{3 - \alpha}{2} \qquad (A.43)$$

The Pareto distribution has also been observed in many other areas, including social and physical phenomena [33].
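A small sketch, assuming nothing beyond the formulas above: Pareto samples can be drawn by inverting the CDF, and eq. A.41 can be checked empirically (the helper name `pareto_sample` is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def pareto_sample(alpha, beta, size):
    # invert F(x) = 1 - (beta/x)**alpha for x >= beta
    u = rng.random(size)
    return beta / (1.0 - u) ** (1.0 / alpha)

x = pareto_sample(alpha=1.5, beta=1.0, size=200_000)
print(x.mean())   # ~ alpha*beta/(alpha - 1) = 3.0 (eq. A.41)
print(x.var())    # does not stabilize: infinite variance for alpha <= 2
```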

Log-normal distribution

If the stochastic variable $Y = \log X$ has a normal distribution, then the variable $X$ has a log-normal distribution. This distribution has two parameters, namely the mean $\mu$ of $\ln(x)$ and the standard deviation $\sigma$ of $\ln(x)$ ($\sigma > 0$), for the range $0 \le x \le \infty$. The log-normal CDF and pdf are

$$F(x) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{\ln x - \mu}{\sigma\sqrt{2}}\right)\right] \qquad (A.44)$$

$$f(x) = \frac{1}{\sigma x \sqrt{2\pi}}\, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}} \qquad (A.45)$$

The error function $\mathrm{erf}(y)$ is defined as

$$\mathrm{erf}(y) = \frac{2}{\sqrt{\pi}} \int_0^y e^{-t^2}\, dt \qquad (A.46)$$

The mean and the variance of the log-normal distribution are

$$E[X] = e^{\mu + \frac{\sigma^2}{2}} \qquad (A.47)$$

$$\mathrm{Var}[X] = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right) \qquad (A.48)$$

The log-normal distribution has applications in many areas, for instance in regression modeling and the analysis of experimental designs. It is also often used to model errors that are the product of the effects of a large number of factors.

A.3.2 Aggregate traffic models

Some of the models most used for modeling bursty network traffic in LAN and WAN environments are the ON-OFF model, the fractional Gaussian noise model, fractional ARIMA processes, and chaotic deterministic maps.

The ON-OFF model

This approach, originally suggested by Mandelbrot (in an economic setting) [26], is based on renewal-reward processes and was rephrased in the context of teletraffic by W. Willinger, M. Taqqu, R. Sherman and D. Wilson [24]. The aggregate traffic is generated by multiplexing a number of independent ON-OFF sources (renewal-reward processes), where each source alternates between these states at certain time intervals. The generated traffic is (asymptotically) self-similar if many renewal-reward processes are superimposed, the rewards are restricted to the values 0 and 1 only, and the inter-renewal times are heavy-tailed (e.g., Pareto-distributed with $1 < \alpha < 2$). In other words, the superposition of many ON-OFF sources exhibiting the infinite variance syndrome (the so-called Noah effect) results in self-similar aggregate traffic approaching fractional Brownian motion (fBm) as the number of sources grows large (the so-called Joseph effect), with the Hurst parameter

$$H = \frac{3 - \alpha}{2} \qquad (A.49)$$

The ON-OFF model is simple and very attractive as it provides, for instance, an intuition into the self-similarity of Ethernet traffic. A workstation attached to an Ethernet network can in this case be considered as a renewal-reward process, where the process is in the ON state if the workstation is communicating over the network and in the OFF state otherwise. The Ethernet traffic is the result of the aggregation of such renewal-reward processes.

Fractional Brownian motion

The fractional Brownian motion (fBm) was originally introduced by Mandelbrot and Van Ness in 1968 as a generalization of the Brownian motion (Bm) [27]. The Bm process $B(t)$ is a real random function with independent Gaussian increments, where the expected value of the increment process is zero,

$$E[B(t + \tau) - B(t)] = 0 \qquad (A.50)$$

and the variance is proportional to the time difference:

$$\mathrm{Var}[B(t + \tau) - B(t)] = \sigma^2 |\tau| \qquad (A.51)$$

This process can be used for different applications, for instance to describe the movement of a specific particle in a liquid. On the other hand, the fBm process $B_H(t)$ is defined as the moving average of $dB(t)$, where the past increments of $B(t)$ are weighted by the kernel $(t - \tau)^{H - \frac{1}{2}}$ [27]. Weyl's fractional integral of $B(t)$ is used to express the fBm process:

$$B_H(t) = \frac{1}{\Gamma\left(H + \frac{1}{2}\right)} \left[ \int_{-\infty}^{0} \left( (t - \tau)^{H - \frac{1}{2}} - (-\tau)^{H - \frac{1}{2}} \right) dB(\tau) + \int_{0}^{t} (t - \tau)^{H - \frac{1}{2}}\, dB(\tau) \right] \qquad (A.52)$$

where the function $\Gamma(x)$ represents the Gamma function. fBm is a continuous zero-mean Gaussian self-similar process $(B_H(t) : t \ge 0)$, defined for $t \ge 0$ and with parameter $0.5 \le H < 1$. All finite-dimensional marginal distributions are Gaussian. The value of the process at time $t$ can be calculated as

$$B_H(t) = X t^H \qquad (A.53)$$

where $X$ is a normally distributed random variable with zero mean and variance one. Also, because of self-similarity:

$$B_H(at) = a^H B_H(t) \qquad (A.54)$$

For $H = 0.5$, the ordinary Brownian motion (Bm) process is obtained. The main statistical properties of the fBm process are [4]:

- mean: $E[B_H(t)] = 0$
- variance: $\mathrm{Var}[B_H(t)] = \mathrm{Var}[X t^H] = t^{2H}$
- autocorrelation: $R_{B_H}(t, \tau) = E[B_H(t) B_H(\tau)] = \frac{1}{2}\left( t^{2H} + \tau^{2H} - |t - \tau|^{2H} \right)$
- stationary increments: $\mathrm{Var}[B_H(t) - B_H(\tau)] = |t - \tau|^{2H}$

The fBm model is extensively used in analytical and simulation studies regarding the performance evaluation of systems driven by self-similar traffic, especially for simulating data traffic at a system level [41].

Fractional Gaussian noise

The (stationary) increment process $X_H$ of the fractional Brownian motion (fBm) is known as fractional Gaussian noise (fGn):

$$X_H = (X_H(t) = B_H(t + 1) - B_H(t) : t \ge 0) \qquad (A.55)$$

This process was introduced by Mandelbrot and Van Ness in 1968 [27]. A fractional Gaussian noise (fGn) is a process with a Gaussian distribution, with mean $\mu$ and variance $\sigma^2$, where the variance of the increments is proportional to the time differences. The autocorrelation function is

$$R(t) = \frac{1}{2}\left( |t + 1|^{2H} - 2|t|^{2H} + |t - 1|^{2H} \right) \qquad (A.56)$$

In the limit $t \to \infty$,

$$R(t) \sim H(2H - 1)\,|t|^{2H - 2} \qquad (A.57)$$
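Since eq. A.56 specifies the full covariance structure of fGn, a finite sample path can be synthesized exactly by factoring the covariance matrix. A minimal sketch under that idea (O(n^3), so only practical for short traces; all names are illustrative):

```python
import numpy as np
from scipy.linalg import toeplitz, cholesky

def fgn_exact(n, H, seed=0):
    """Exact fGn synthesis via Cholesky factorization of the
    Toeplitz covariance matrix built from eq. (A.56)."""
    k = np.arange(n)
    gamma = 0.5 * ((k + 1.0)**(2*H) - 2.0 * k**(2*H)
                   + np.abs(k - 1.0)**(2*H))
    L = cholesky(toeplitz(gamma), lower=True)
    return L @ np.random.default_rng(seed).standard_normal(n)

x = fgn_exact(1024, H=0.8)   # fGn sample; np.cumsum(x) approximates fBm
```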

fGn is exactly self-similar when $0.5 < H < 1$. The applicability of fGn models in teletraffic modeling is somewhat limited because of the strict autocorrelation structure. Processes with both LRD and a strong SRD correlation structure (e.g., VBR video) are less suitable for modeling with fGn models.

Fractional ARIMA processes (FARIMA)

ARIMA (Auto-Regressive Integrated Moving-Average) models were first introduced by Box and Jenkins in 1970 [5]. Because of their simplicity and flexibility, these models are very popular in many areas, especially time series analysis. Fractional ARIMA (FARIMA) models are a natural extension of the ARIMA models. A fractional ARIMA (p, d, q) process is defined [4, 25] as a stochastic process $X = (X_k : k = 0, 1, 2, \ldots)$ represented by

$$\Phi(B)\, \Delta^d X_k = \Theta(B)\, \varepsilon_k \qquad (A.58)$$

The polynomials $\Phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \ldots - \phi_p B^p$ and $\Theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \ldots - \theta_q B^q$ are polynomials in the backward-shift operator $B X_k = X_{k-1}$. The parameters $p$ and $q$ are integers associated with these polynomials. The operator $\Delta = 1 - B$ denotes the differencing operator. Furthermore, $\Delta^d$ denotes the fractional differencing operator, defined by $\Delta^d = (1 - B)^d = \sum_k C(d, k)(-B)^k$. Fractional differencing amounts to letting the parameter $d$ assume non-integral values in the binomial series. Finally, the term

$$C(d, k)(-1)^k = \frac{\Gamma(-d + k)}{\Gamma(-d)\,\Gamma(k + 1)} \qquad (A.59)$$

and the process $(\varepsilon_k : k = 0, 1, 2, \ldots)$ is a white noise process. It has been shown that, for $0 < d < 0.5$, the FARIMA (p, d, q) process is asymptotically self-similar with $H = d + 0.5$ [7]. Furthermore, another important advantage of this model is its capability to capture both SRD and LRD correlation structure in time series modeling and, by this, to model complex traffic structures such as VBR video. The main drawback is its complexity, but this is often reduced by using the simplified FARIMA (0, d, 0) model.
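As an illustration of the simplified FARIMA (0, d, 0) case, the sketch below approximates $X = (1-B)^{-d}\varepsilon$ by a truncated MA(∞) expansion with weights $\psi_j = \Gamma(j+d)/(\Gamma(d)\Gamma(j+1))$; the truncation length and function name are arbitrary choices, not from the source:

```python
import numpy as np
from scipy.special import gammaln

def farima_0d0(n, d, trunc=5000, seed=0):
    """Truncated MA(inf) approximation of FARIMA(0, d, 0);
    asymptotically self-similar with H = d + 0.5 for 0 < d < 0.5."""
    j = np.arange(trunc)
    psi = np.exp(gammaln(j + d) - gammaln(d) - gammaln(j + 1))
    eps = np.random.default_rng(seed).standard_normal(n + trunc)
    # each output sample is a full psi-window over past noise
    return np.convolve(eps, psi, mode="full")[trunc:trunc + n]

x = farima_0d0(4096, d=0.3)    # target H = 0.8
```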

Chaotic Deterministic Maps (CDM)

Chaotic maps are low-dimensional non-linear systems whose time evolution can be described by an initial state and a set of dynamical laws. The method of using chaotic deterministic maps to model packet traffic was first suggested by Erramilli and Singh in 1990, when they demonstrated that it is possible to generate complex traffic behavior with low-order chaotic systems [14]. Such systems exhibit a chaotic behavior that arises from a property known as Sensitive dependence on Initial Conditions (SIC). If one considers a chaotic map defined as $X_{n+1} = f(X_n)$ and two trajectories with nearly identical initial conditions $X_0$ and $X_0 + \varepsilon$, then SIC can be stated mathematically as

$$|f^N(X_0 + \varepsilon) - f^N(X_0)| = \varepsilon\, e^{N \lambda(x_0)} \qquad (A.60)$$

where $f^N(X_0)$ refers to the map iterated $N$ times and the parameter $\lambda(x_0)$ describes the exponential divergence (the so-called Lyapunov exponent). For the map to be chaotic, the parameter $\lambda(x_0)$ should be positive for "almost all" $X_0$. In other words, the fact that the initial conditions can be specified with only limited accuracy is exploited in this approach: the uncertainties grow at an exponential rate, making the long-term behavior unpredictable. Furthermore, another feature is that trajectories converge (in phase or state space) to an object known as a strange attractor, which typically has a fractal structure. Trajectories that start away from it are finally "attracted" to it. Some of the best-known classes of chaotic maps are the so-called piecewise linear maps (where the map consists of a number of piecewise linear segments, e.g., the Bernoulli shift, the Liebovitch map) and the non-linear or intermittency maps (where non-linear segments are used, e.g., single intermittency maps, double intermittency maps). Details of these maps, as well as of their use as traffic source models, are presented in [29].

A.3.3 Generation of synthetic self-similar traffic traces

Given the complexity of self-similar processes, simulation has proven to be an indispensable tool for performance analysis. Consequently, efficient generation techniques for self-similar traffic traces are necessary in order to speed up simulation times, especially when simulating and lab-testing very high-speed networks. Although a number of methods can be used for generating fBm traces [25], there is still a need for approximate methods that generate long traces within a reasonable time and with high accuracy and quality. Some of the most widely-used methods for generating synthetic self-similar traces are the following [30]:

ON-OFF method

This method (also known as the aggregation method) was described in section A.3.2. An asymptotically self-similar process is generated. The main advantage is that it can be efficiently implemented on a parallel computer. The drawback, however, is that one must trade off computation speed against the degree of self-similarity: a high degree of self-similarity (i.e., large H) requires a high number n of aggregated renewal processes, and this reduces the speed. A minimal sketch of this method is given below.
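The following sketch assumes Pareto-distributed ON and OFF periods with $1 < \alpha < 2$ and unit rewards; the discretization onto integer time slots is a simplification, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def on_off_source(T, alpha=1.4, beta=1.0):
    """One 0/1 renewal-reward source on [0, T): alternating
    Pareto-distributed ON and OFF periods (heavy-tailed for alpha < 2)."""
    rate = np.zeros(T)
    t, on = 0, True
    while t < T:
        dur = int(np.ceil(beta / (1.0 - rng.random()) ** (1.0 / alpha)))
        if on:
            rate[t:t + dur] = 1.0
        t, on = t + dur, not on
    return rate

# superposition of many sources: asymptotically self-similar traffic
# with H = (3 - alpha)/2 = 0.8 (eq. A.49)
traffic = sum(on_off_source(10_000) for _ in range(100))
```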

Norros method

An exactly self-similar process can be generated with the help of an fBm process $Z(t)$ [28]:

$$A(t) = mt + \sqrt{am}\, Z(t) \qquad (A.61)$$

for $-\infty < t < \infty$. The process $A(t)$ represents the cumulative arrival process, i.e., the total amount of traffic arriving until time $t$. $Z(t)$ is a normalized fBm process with parameter $H$ ($0.5 < H < 1.0$), $m$ is the mean rate, and $a$ is the peakedness (i.e., the variance-to-mean ratio over a unit time interval). This method is very popular in spite of the drawback that an fBm process must be generated.

M/G/∞ queue model

Consider an M/G/∞ queue where the customers arrive according to a Poisson process and have service times drawn from a heavy-tailed distribution (with infinite variance). Then the process $\{X_t\}$ representing the number of customers in the system at time $t$ is asymptotically self-similar. This method is not easily implemented on a parallel machine. Besides that, it also has the drawback of a trade-off between the degree of self-similarity and the length of computation.

Random Midpoint Displacement (RMD) method

Given that the exact generation of fBm traces is practically infeasible (due to the large amount of required CPU time), a more efficient approximate algorithm is the RMD method [23]. A self-similar process can be synthesized by progressively subdividing an interval over which a sample path is generated; the value of the sample path at the midpoint of each subinterval is determined from the values at the endpoints plus a zero-mean Gaussian displacement. Self-similarity comes out of appropriate scaling of the variance of the displacement. The method is very popular, mainly because it is fast and also because it can be used to interpolate a self-similar path between observations made on a larger time scale. This is very advantageous, for instance, in traffic measurements, where it is no longer necessary to have very fine time-scale measurements (such as on the order of units of milliseconds) and the relevant traffic parameters can be estimated from coarser measurement samples (such as 1-second counts). The main drawback of this method is that it only generates an approximately self-similar process. In particular, the actual H parameter of the sample paths tends to be larger than the target value for $0.5 < H < 0.75$ and smaller than the target value for $0.75 < H < 1.0$.

Wavelet-based method

This method is based on the so-called Multi-Resolution Analysis (MRA), which consists of splitting a sequence into a (low-pass) approximation and (high-pass) details [16]. The wavelet coefficients corresponding to a wavelet transform of fBm are first computed. These coefficients are then used with an inverse wavelet transformation

to yield sample paths of fBm. This method has the drawback of being only approximate, since the wavelet coefficients are generally not uncorrelated. The problem can, however, be compensated for by using wavelets with a sufficient number of vanishing moments (i.e., the degree of the polynomial for which the inner product with the wavelet is zero) [1].

FARIMA method

An asymptotically self-similar process can be generated by using sample paths from a FARIMA process [18]. This method was described in section A.3.2. It has the drawback of being slow.

Fast Fourier Transform (FFT) method

A faster method for generating synthetic self-similar traces uses the Fast Fourier Transform [30]. A sequence of complex numbers is first generated that corresponds to the power spectrum of an fGn process. The inverse Discrete Time Fourier Transform (DTFT) is then used to obtain the time-domain counterpart of this power spectrum. Because the sequence has the power spectrum of fGn, and because the autocorrelation and the power spectrum form a Fourier pair, this sequence has the autocorrelation properties of the fGn process. Furthermore, the DTFT and its inverse can be computed easily using the FFT algorithm. The main advantage of this method is that it is fast.
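A sketch of the FFT idea, with one simplification flagged: instead of the exact fGn spectral density, it uses the asymptotic power-law form $S(w) \sim |w|^{1-2H}$ (eq. A.35 with $\gamma = 2H - 1$), so the short-range part of the correlation structure is only approximate:

```python
import numpy as np

def fgn_fft(n, H, seed=3):
    """Approximate fGn: shape complex Gaussian noise with the
    asymptotic fGn spectrum, then take the inverse FFT."""
    rng = np.random.default_rng(seed)
    f = np.fft.rfftfreq(n)[1:]               # positive frequencies
    amp = f ** ((1.0 - 2.0 * H) / 2.0)       # sqrt of the target PSD
    noise = rng.standard_normal(f.size) + 1j * rng.standard_normal(f.size)
    spec = np.concatenate(([0.0], amp * noise))   # zero DC component
    x = np.fft.irfft(spec, n=n)
    return x / x.std()

x = fgn_fft(2**14, H=0.8)
```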

A.4 Revealing Self-Similarity in Teletraffic

As discussed above, the level of self-similarity in a time series is indicated by the Hurst parameter H. The self-similarity parameter H was named after the hydrologist H. E. Hurst, who spent many years studying the problem of water storage and determining the level patterns of the Nile and other rivers [19]. He finally found that the Nile obeys a self-similar pattern over an 800-year time period. Other rivers, like the Rhine, show milder fluctuations. The H parameter has a value range of $0.5 \le H \le 1.0$, and strong self-similarity means that H has larger values (close to 1.0). Estimation of H is unfortunately a difficult task. On top of that, things may get even more complicated, as the expressions provided for H generally require infinite-length time series, while the estimation must, in practice, be done from finite time series. Several methods are available today to estimate self-similarity in a time series [1, 4, 35, 37]. The most popular methods are: R/S statistic analysis; variance-time analysis; periodogram-based analysis; the Whittle estimator; and wavelet-based analysis.

A.4.1 Rescaled range analysis

Based on the examination of different phenomena (e.g., the variation of the water level of rivers), Hurst developed a normalized, dimensionless measure to characterize variability, called the rescaled range (R/S). For a given set of observations $X = \{X_n;\ n \in \mathbb{Z}^+\}$ with sample mean $\bar{X}(n)$, sample variance $S^2(n)$, and range $R(n)$, the rescaled adjusted range, or R/S statistic, is given by

$$\frac{R(n)}{S(n)} = \frac{\max(0, \Delta_1, \Delta_2, \ldots, \Delta_n) - \min(0, \Delta_1, \Delta_2, \ldots, \Delta_n)}{S(n)} \qquad (A.62)$$

where

$$\Delta_k = \sum_{i=1}^{k} X_i - k\bar{X} \qquad (A.63)$$

for $k = 1, 2, \ldots, n$. Hurst also observed that, for many natural phenomena,

$$E\left[\frac{R(n)}{S(n)}\right] \sim c\, n^H \qquad (A.64)$$

as $n \to \infty$, with $c$ a finite positive constant independent of $n$. Taking the logarithm of both sides,

$$\log E\left[\frac{R(n)}{S(n)}\right] \sim H \log(n) + \log(c) \qquad (A.65)$$

as $n \to \infty$. Thus, one can estimate H by plotting $\log E[R(n)/S(n)]$ versus $\log(n)$ and least-squares fitting a straight line, with slope H, through the resulting points. The R/S method is not a precise one; it provides only an estimate of the level of self-similarity in a time series. It can therefore be used only to test whether a time series is self-similar and, if so, to obtain a rough estimate of H. A minimal numerical sketch follows.
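In the sketch below, the block sizes, names, and the doubling scheme are implementation choices, not prescribed by the text:

```python
import numpy as np

def hurst_rs(x, min_n=16):
    """Estimate H from the slope of log E[R(n)/S(n)] vs log n
    (eqs. A.62-A.65)."""
    ns, rs = [], []
    n = min_n
    while n <= len(x) // 2:
        vals = []
        for s in range(0, len(x) - n + 1, n):
            block = x[s:s + n]
            delta = np.cumsum(block - block.mean())   # Delta_k (eq. A.63)
            sd = block.std()
            if sd > 0:
                vals.append((delta.max() - delta.min()) / sd)
        ns.append(n); rs.append(np.mean(vals))
        n *= 2
    H, _ = np.polyfit(np.log(ns), np.log(rs), 1)
    return H
```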

A.4.2 Variance-time analysis

The variance-time analysis is based on the slowly decaying variance property of self-similar processes under aggregation (presented in section A.2.4). According to this, the variance of an aggregated (exactly or asymptotically) self-similar process is (eqs. A.21 and A.24)

$$\mathrm{Var}[X^{(m)}] = \frac{\mathrm{Var}[X]}{m^{\beta}} \qquad (A.66)$$

where

$$H = 1 - \frac{\beta}{2} \qquad (A.67)$$

When aggregating a process with different levels of aggregation $m$, the variance normally decreases quite rapidly (with $H = 0.5$). The exception is the case of self-similar processes, where the variance decreases slowly, according to a power law (larger values of H). Eq. A.66 can be rewritten as

$$\log\{\mathrm{Var}[X^{(m)}]\} \sim \log\{\mathrm{Var}[X]\} - \beta \log(m) \qquad (A.68)$$

Since $\log\{\mathrm{Var}[X]\}$ is a constant independent of $m$, if one plots $\mathrm{Var}[X^{(m)}]$ versus $m$ on a log-log graph and fits a least-squares line through the resulting points (ignoring the values for small $m$), the result should be a straight line with a slope of $-\beta$. Slope values between $-1$ and $0$ suggest self-similarity. As in the case of R/S analysis, the variance-time method is only a heuristic method. On top of that, both methods suffer from diverse limitations; for example, they may get seriously biased because of the poor statistics available from a single realization of the self-similar process. The variance-time method can therefore be used only to test whether a time series is self-similar and, if so, to obtain a rough estimate of H. A minimal sketch is given below.
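A minimal sketch of the variance-time estimator (block scheme and names are implementation choices):

```python
import numpy as np

def hurst_variance_time(x, min_blocks=8):
    """Estimate H from the slope -beta of log Var[X^(m)] vs log m
    (eqs. A.66-A.68): H = 1 - beta/2."""
    ms, vs = [], []
    m = 2
    while len(x) // m >= min_blocks:
        n = len(x) // m
        xm = x[:n * m].reshape(n, m).mean(axis=1)   # m-aggregation (eq. A.14)
        ms.append(m); vs.append(xm.var())
        m *= 2
    slope, _ = np.polyfit(np.log(ms), np.log(vs), 1)
    return 1.0 + slope / 2.0      # slope = -beta
```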

A.4.3 Periodogram method

The periodogram-based estimator is a method that offers better statistical rigor than the aggregation-based estimators, but at the price of being a parametric method that requires a parametrized model process to be known in advance. The periodogram (or "intensity function") $I_N(w)$ is the estimated spectral density of a stochastic process $X(t)$ (defined at discrete time instances). It can be estimated by a Fourier series operation over a time period $N$:

$$I_N(w) = \frac{1}{2\pi N} \left| \sum_{k=1}^{N} X_k\, e^{jkw} \right|^2 \qquad (A.69)$$

where $w$ is the frequency, $X_k$ is the time series, and $N$ is the length of the time series. Given that self-similarity affects the behavior of the spectrum $I(w)$ as $w \to 0$, a periodogram of the form

$$I_N(w) \sim |w|^{1 - 2H} \qquad (A.70)$$

should be obtained when $w \to 0$. By plotting $\log[I_N(w)]$ versus $\log(w)$ (only for low frequencies) and fitting a straight line to the curve, the slope is approximately $(1 - 2H)$. In practice, only the lowest 10% of the frequencies should be used for the calculation, as this behavior holds only for frequencies close to zero [37]. The main drawback of this method is that it is computationally intensive. A minimal sketch follows.
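A minimal sketch of the low-frequency periodogram fit (the FFT-based estimate of eq. A.69 and the 10% cutoff follow the description above; names are illustrative):

```python
import numpy as np

def hurst_periodogram(x, frac=0.1):
    """Estimate H from the low-frequency slope (1 - 2H) of the
    periodogram (eqs. A.69-A.70)."""
    n = len(x)
    w = 2.0 * np.pi * np.fft.rfftfreq(n)[1:]
    i_n = np.abs(np.fft.rfft(x - x.mean())[1:]) ** 2 / (2.0 * np.pi * n)
    low = slice(0, max(10, int(frac * w.size)))     # lowest 10% only
    slope, _ = np.polyfit(np.log(w[low]), np.log(i_n[low]), 1)
    return (1.0 - slope) / 2.0
```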

A.4.4 Whittle estimator

The Whittle estimator is based on the periodogram and uses a maximum likelihood, non-graphical technique. Assuming that the process under examination is self-similar (with unknown parameter H), that the fBm model is used to model this process, and that the spectral density $S(w; H)$ is known, it has been shown that the parameter H can be estimated by finding the value of H that minimizes the so-called Whittle expression [4, 37]:

$$\int_{-\pi}^{\pi} \frac{I_N(w)}{S(w; H)}\, dw \qquad (A.71)$$

Asymptotically unbiased estimators derived from Gaussian Maximum Likelihood Estimation (MLE) (like the periodogram and the Whittle estimator) generally show good statistical performance (estimates of H with confidence intervals), but they have the drawback of being parametric estimators that require parametrized model processes (for the spectral density) to be known in advance. This poses large difficulties in terms of exact implementation for large data sets because of the high computational complexity. Furthermore, if the assumed spectral density model is not the correct one, the estimate may be biased as well. Because of this risk, these methods are not very robust.

A.4.5 Wavelet method

The wavelet-based method is related to the variance-time analysis and allows analysis in both the time and frequency domains. The wavelet estimator is based on the discrete wavelet transform (DWT), which has the advantages of both aggregation-based and MLE estimators while avoiding their drawbacks. The underlying concept of the DWT is the so-called Multi-Resolution Analysis (MRA), which consists of splitting a sequence $x(t)$ into a (low-pass) approximation and (high-pass) details. These are associated with the coefficients $a_x$ and $d_x$, respectively [1]:

$$x(t) = \mathrm{approx}_J(t) + \sum_{j=1}^{J} \mathrm{detail}_j(t) = \sum_k a_x(J, k)\, \phi_{J,k}(t) + \sum_{j=1}^{J} \sum_k d_x(j, k)\, \psi_{j,k}(t) \qquad (A.72)$$

The functions $\{\phi_{j,k}(t) = 2^{-j/2} \phi_0(2^{-j} t - k)\}$ and $\{\psi_{j,k}(t) = 2^{-j/2} \psi_0(2^{-j} t - k)\}$ are the sets of shifted and dilated versions of the scaling function $\phi_0$ and the wavelet $\psi_0$. The discrete wavelet transform consists of the collection of coefficients $\{d_x(j, k);\ j = 1, \ldots, J;\ k \in \mathbb{Z}\}$ and $\{a_x(J, k);\ k \in \mathbb{Z}\}$, defined by the inner products

$$d_x(j, k) = \langle x, \psi_{j,k} \rangle \qquad (A.73)$$

$$a_x(j, k) = \langle x, \phi_{j,k} \rangle \qquad (A.74)$$

These coefficients are located on a dyadic grid. The focus is then placed on the details, which are described by the wavelet coefficients. When going from high resolution to lower resolution, the MRA gives rise to details at larger time scales. This can be interpreted in the frequency domain as band-pass filtering, going from high to low frequencies with constant relative bandwidth. Spectral estimators (based on periodograms), on the other hand, may easily get strongly biased due to the fact that their constant bandwidth mismatches the power spectrum to be analyzed [37]. In contrast, the wavelet's constant relative bandwidth provides a perfect match. Given that

$$\mathrm{Var}[\mathrm{detail}_j] \sim 2^{j(2H - 2)} \qquad (A.75)$$

it is possible to estimate H by performing a linear fit in a log-log plot:

$$\log_2 \left( \frac{2^j}{n_0} \sum_k |d_x(j, k)|^2 \right) = (2\hat{H} - 1)\, j + c \qquad (A.76)$$

where $\hat{H}$ is a semi-parametric estimator of H, $n_0$ is the data length, and $c$ is finite. This estimator has been proved to be unbiased under very general conditions and efficient under Gaussian assumptions [1].
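A simplified sketch of this estimator using the Haar wavelet (chosen here only for self-containedness; the original method allows wavelets with more vanishing moments): it fits $\log_2$ of the average squared detail coefficients against the octave $j$, whose slope is $2\hat{H} - 1$ per eq. A.76.

```python
import numpy as np

def hurst_wavelet_haar(x, min_coeffs=8):
    """Semi-parametric wavelet estimate of H with Haar filters."""
    a = np.asarray(x, dtype=float)
    octaves, energies = [], []
    j = 1
    while len(a) // 2 >= min_coeffs:
        a = a[:2 * (len(a) // 2)]
        d = (a[0::2] - a[1::2]) / np.sqrt(2.0)   # detail coefficients
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)   # approximation
        octaves.append(j)
        energies.append(np.log2(np.mean(d ** 2)))
        j += 1
    slope, _ = np.polyfit(octaves, energies, 1)
    return (slope + 1.0) / 2.0                   # slope = 2H - 1
```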

A.5 Origins of Self-Similarity in Teletraffic

Recent studies have shown that self-similarity in an idealized environment (i.e., with unbounded resources and independent traffic sources) can derive from the aggregation of many individual, albeit highly variable, ON-OFF sources (i.e., ON and OFF periods with heavy-tailed, infinite-variance distributions, e.g., Pareto distributions) [6, 9, 25, 32]. In other words, the superposition of many ON-OFF sources exhibiting the infinite variance syndrome (the so-called "Noah Effect") results in self-similar aggregate network traffic approaching fractional Brownian motion (the so-called "Joseph Effect"). Furthermore, investigations of diverse traffic sources have also revealed that the highly variable ON/OFF behavior is an intrinsic property of the client-server interaction [2, 9, 20]. These results, although valuable and elegant, suffer from assumptions that are not realistic in terms of the network environment. Contrary to the above-mentioned assumptions, realistic client/server network environments imply resource boundedness. This means that non-linear processes could also be induced by competition for bounded resources, with the consequence of coupling among traffic sources. Furthermore, diverse (feedback-based) control mechanisms, e.g., OS scheduling algorithms, TCP, Ethernet, may induce additional non-linearity in the case of congestion as well. The complexity of understanding the underlying physics that could give rise to self-similarity in network traffic stems mainly from the fact that it is not induced by a single physical phenomenon but by several. The different correlations present in self-similar network traffic, which act at different temporal scales, may arise because of different physical phenomena that manifest themselves with characteristics relevant to the specific temporal scale, e.g., information organization and retrieval (applications, disk and memory scheduling), user "think time" and preference in file transfers (session/activity), effects of caching, TCP, Ethernet, and diverse ATM control mechanisms (admission control, congestion control, etc.). Furthermore, the difficulty of understanding the interactions between the diverse "individual" correlations created in network traffic by these physical phenomena further complicates the issue. For instance, some of the most significant physical phenomena that may give rise to LRD of different kinds in network traffic are as follows:

- User behavior
- Data generation, organization and retrieval
- Traffic aggregation
- Network controls
- Feedback-based control mechanisms
- Network evolution

User behavior

One of the most important factors affecting traffic behavior (at the session/call level and also during the session) is (human) user behavior. It has been found, for instance, that the distribution of user requests (think time) and of preferences for documents on the Internet (WWW) shows an extreme degree of fluctuation over a wide range of time scales [8, 9]. Furthermore, the different flow control mechanisms present in diverse traffic sources (e.g., VBR video MPEG-coded sources, ABR, TCP), which regulate the output traffic rate depending upon the state of the network, may also contribute to increased burstiness in the network traffic [21]. It is also relevant to mention that the distributions of many phenomena of potential social and biological significance have been shown to have inverse power law tails [39]. Examples of such systems are found in sociology (Lotka's law), linguistics (Zipf's law), economics (Pareto's law), medicine (the cardiac spectrum) and biology (Willis' law).

Data generation, organization and retrieval

An important aspect concerning the origins of traffic self-similarity is related to the way data is processed. Measurement studies have shown that self-similarity in network traffic, as well as in application traffic, is a two-dimensional property that refers partly to file/packet/cell interarrival time distributions and partly to file/packet size distributions. These results suggest that, at least for application traffic, self-similarity may not be a machine-induced artifact. Different applications/sources may supply traffic with statistically distinctive characteristics, but key statistical behavior (e.g., correlation structure) may be invariant from one machine/network to another. For instance, it has been shown that the distribution of the sizes of the information entities usually transferred at the application level (client-server paradigm) is best described by power-law distributions.

Traffic aggregation

One of the most serious obstacles to keeping the traffic generated by individual sources isolated from other traffic (at least for as far into the network as possible) is the actual form of traffic aggregation (i.e., statistical multiplexing) used in packet- and cell-based networks, with the consequence of coupling among traffic sources. As mentioned above, the superposition of many ON-OFF sources exhibiting the infinite variance syndrome results in self-similar aggregate network traffic approaching fractional Brownian motion. The LRD characteristics of traffic have also been shown to be highly robust with respect to a variety of network operations, such as traffic splitting, merging, queueing, policing and shaping [13]. Self-similarity seems to be preserved under the superposition of both homogeneous and heterogeneous (i.e., independent) traffic sources, and this holds over a wide range of conditions, for instance variations in the bandwidth bottleneck and buffer capacity, and mixing with cross-traffic possessing dissimilar (correlation) characteristics [21, 31]. Once a particular traffic source generates LRD traffic, the aggregate network traffic seems to become LRD, irrespective of the characteristics (SRD or LRD) of the other traffic in the mix. The coupling process is a very complex one; an entire range of properties might get (completely) altered, i.e., not only the Hurst parameter but also the mean and the variance might change significantly [13, 15].

Network controls

One of the main reasons behind LRD seems, however, to be the resource boundedness that is typical of a realistic network environment, e.g., bounded communication and switching resources/bandwidth, limited buffer space, and finite (CPU) processing capabilities. This is to be judged in the light of the heavy-tailed asymptotics of the LRD traffic processes present in such environments, which generally claim more resources in the case of traffic control. Non-linear processes could be induced by control mechanisms in which diverse conflicts are not solved properly, simply because of bounded resources. This is a general problem, which could manifest itself in many forms, from simple queueing models to very sophisticated control mechanisms. Perhaps the most eloquent example of this kind of problem is presented in [31], where it has been shown that the degree to which file sizes (WWW documents) are heavy-tailed (Pareto shape parameter α) can directly determine the degree of self-similarity (Hurst parameter H) of network traffic when flow control mechanisms of the TCP type are used.

In contrast, when non-flow-controlled mechanisms such as UDP are used, the resulting network traffic shows few fractal characteristics. In other words, LRD in network traffic may arise because of the heavy-tailed distribution of the files being transferred over the network, as well as the (channel) bandwidth bottleneck. Furthermore, the problem of controlling LRD traffic may be exacerbated when multiple users compete for bounded resources, in which case the problem of correct resource dimensioning becomes of paramount importance. Finally, this issue is further complicated by its multi-dimensional character, e.g., hardware contention (CPU, memory), bandwidth contention, and software contention (OS scheduling strategies, process priorities).

Feedback-based control mechanisms

A further complication is that many control mechanisms are feedback-based, e.g., flow and congestion control mechanisms (TCP, rate-based control mechanisms, etc.). This means that additional non-linearity could be induced in the case of congestion, with possibilities for a wide range of dynamical system behaviour, such as multistability and chaos [22]. Furthermore, it is also important to note that there may exist very complex interactions between workload fluctuations and diverse (network) control mechanisms. These interactions not only include the potential for controls to temper the fractal nature of traffic (e.g., traffic shaping, UDP) but, in some cases, may also create or even enhance it (e.g., OS scheduling contention, TCP, Ethernet) [13, 31]. Two sets of critical issues are therefore identified in this case: one on the impact of realistic traffic on the efficacy of specific traffic control mechanisms, and the other dealing with the extent to which controls may modify traffic characteristics.

Network evolution

Finally, an important reason for the increase of burstiness in network traffic is network evolution, with the continuous appearance of new services and applications. Especially the advent of WWW-based services has resulted in dramatically more sophisticated traffic patterns [17]. As a quantitative estimate, link layer characteristics (e.g., Ethernet) seem to dominate the network traffic profile at millisecond time scales [25]. In contrast, the effects of human behaviour, as well as of data generation and retrieval, seem to dominate at tens of seconds and beyond (minutes and even hours) [8]. In between, the effects of diverse flow control mechanisms, such as TCP, would likely dominate (at time scales of seconds). Queueing characteristics might dominate at tens and hundreds of milliseconds as well [13].

A.6 Performance Impacts

Understanding and validating self-similarity on physical grounds in a realistic network environment is important but not enough when developing efficient and integrated network frameworks within which requested end-to-end QoS guarantees are fully supported. Equal importance must be placed on understanding the impacts of self-similarity. These impacts can be broadly partitioned into fractal queueing (i.e., queueing models driven by fractal traffic processes) and impacts on controllability [13, 15].

Packet network traffic in LAN and WAN environments has been shown to be LRD. Packets enter the network in clusters. Because of the LRD phenomenon, the bursts do not smooth out by simply aggregating over longer timescales. Such traffic is an inherent property of the sources that alternate between ON and OFF states where at least one of the states has heavy-tailed distribution for the time spent in that state. The LRD emerges as a manifestation of the high variability in the ON and OFF periods. Almost all applications on the Internet follow the client-server paradigm. The client makes a REQUEST for a service at the server (client side ON state) and the server responds with a RESPONSE (server side ON state). In between the requests and responses, both the entities have their OFF periods. The duration of the ON period depends on the sizes of the application layer messages. Typically, the REQUEST messages are small with a distribution showing limited variability whereas the RESPONSE messages are large with a distribution showing high variability. The application RESPONSE messages are fragmented into application layer buffers. These buffers are in turn broken up into packets (so-called ”cascading effect” [17]) to be carried by the specific network media being used. The high variability in the RESPONSE sizes directly results in the high variability in the packet arrival process. The OFF periods are typically due to user inactivities and the variability in the OFF process may be due to many different types of users accessing network services concurrently. The level of activity of human users generally varies from user to user. There may even be ”users” which are machines (HTTP proxy servers sitting behind a firewall or robots downloading web pages to create mirror sites). Transporting packets across the network between the application end points is subject to delays and errors. They may even be discarded by a congested router/switch. Because of the presence of LRD phenomenon across many types of networks, metrics of network performance such as throughput, packet loss, response time, buffer occupancy levels are affected [13, 28]. For instance, numerical and analytical studies of LRD traffic show that the tail of queue length distribution decays at a rate slower than that of the characteristic exponential decay as seen in traditional traffic studies. This deviation in the queue length process results in longer waiting times at the network processing elements (switches, routers etc). Furthermore, studies in [21] have shown the variation of some parameters (related to LRD) with link level load. Using a fBm model framework, the robustness of the Hurst effect has been shown to be an invariant under changing load conditions. This questions the effectiveness of traffic rate control mechanisms in altering auto-correlation structures seen in traffic streams. Other important consequences are:

- Packet delays, and consequently application-level delays, have a heavy-tailed distribution. Transport layer protocols like TCP estimate the round-trip timer values from the peer acknowledgements and are hence influenced by this.
- Congestion situations are unavoidable, and they appear as short-lived impulses.
- With increasing load, congestion events appear more frequently while maintaining their impulsive behavior.
- Increasing buffer sizes alone does not result in a significant improvement in packet loss behavior, as illustrated in the sketch following this list.
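A minimal, self-contained simulation of the last point (all parameters, rates and function names are hypothetical choices, not taken from the cited studies): a slotted queue with a finite buffer and a service rate of two packets per slot, fed once by heavy-tailed (Pareto) batch arrivals and once by Poisson arrivals with the same mean rate.

    import numpy as np

    rng = np.random.default_rng(2)

    def loss_rate(arrivals, buffer_size, service=2):
        """Fraction of packets dropped at a slotted finite-buffer queue."""
        q, lost = 0, 0
        for a in arrivals:
            q = max(q - service, 0)              # serve up to `service` packets
            admitted = min(a, buffer_size - q)   # admit only what fits in the buffer
            lost += a - admitted
            q += admitted
        return lost / arrivals.sum()

    n = 200_000
    heavy = np.floor(rng.pareto(1.5, n)).astype(int)   # heavy-tailed batch sizes
    light = rng.poisson(heavy.mean(), n)               # Poisson, same mean rate

    for b in (10, 100, 1000):
        print(b, loss_rate(heavy, b), loss_rate(light, b))

Under Poisson input the loss rate typically falls sharply as the buffer grows, reflecting the exponential queue tail of SRD models; under the heavy-tailed input the improvement is only marginal, consistent with the slower-than-exponential queue-tail decay discussed above.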

The main interest is in evaluating the consequences of LRD on application-layer quality on an end-to-end basis. End-to-end delivery is the responsibility of the transport layer, which encapsulates the manifestations of the LRD property at the level of packet traffic management. The transport layer characteristics are in turn modulated by the events occurring at the application layer and by the way these events are managed there. It is important to understand and to quantify the effect of network controls (e.g., TCP flow control) on application-level parameters, since the management of application traffic has requirements that differ from those for the management of transport-level traffic. Application-level traffic management operates through decisions in the admission control and resource management hierarchy, whereas transport- and network-level management requires the delivery of implied (e.g., IP) or contracted (e.g., ATM) QoS.

Thus, in delivering end-to-end guaranteed service it is important not only to understand the traffic parameters at the application layer, but also to model and to quantify the effects of network control mechanisms on the application-level parameters. For example, if the application happens to be delivering a multimedia service, then a stream of video and audio may be delivered to the client, where the parameters of the video and audio streams are impacted by network congestion and recovery mechanisms. By the time the stream arrives at the client node, it may have been shaped significantly, and the parameters of the stream may have changed drastically from the actual parameters negotiated during client/server connection setup. The case of interactive multimedia communications is even more challenging, as the source itself will shape the stream based upon feedback from the client. Thus, effective resource and traffic engineering to provide QoS on an end-to-end basis remains a difficult question and is the subject of ongoing and future research.
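As one concrete example of a transport-level control reacting to packet-level delay variability, the sketch below implements the standard smoothed-RTT and retransmission-timeout computation (Jacobson/Karels, as standardised in RFC 6298, with the clock-granularity term omitted). The delay model itself is purely illustrative: heavy-tailed delay samples keep the RTT variance estimate, and hence the timeout, inflated.

    import numpy as np

    rng = np.random.default_rng(3)

    def rto_trace(rtt_samples, alpha=1/8, beta=1/4):
        """Smoothed RTT (SRTT) and retransmission timeout (RTO) per RFC 6298."""
        srtt = rtt_samples[0]
        rttvar = rtt_samples[0] / 2.0
        rtos = []
        for r in rtt_samples[1:]:
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
            rtos.append(srtt + 4 * rttvar)
        return np.array(rtos)

    base = 0.1                                     # 100 ms base RTT (hypothetical)
    delays = base * (1.0 + rng.pareto(1.8, 5000))  # heavy-tailed queueing delays
    print("mean RTO:", rto_trace(delays).mean(), "s")

Because the RTO tracks both the mean and the deviation of the delay samples, occasional very large (heavy-tailed) delays propagate directly into conservative timeout values, which is one channel through which packet-level LRD manifests itself at the application level.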

A.7 Conclusions

The traffic self-similarity observed in network and application traffic seems to be a ubiquitous phenomenon that is independent of technology, protocol and environment. As a consequence, many of the basic assumptions made in traffic control, engineering, and operations and maintenance are called into question. Self-similarity also has a profound impact on performance, the potentially heavier queueing asymptotics than those predicted by traditional Markov models being just one example; network performance is dominated by the self-similarity property of the traffic. Other important areas impacted by self-similarity are data analysis, statistical inference, mathematical modeling, and queueing and performance analysis. These questions are still under investigation, and more effort is needed in the future to answer them.


Bibliography

[1] Abry P. and Veitch D., Wavelet Analysis of Long Range Dependent Traffic, IEEE Transactions on Information Theory, Vol. 44, No. 1, January 1998.
[2] Arlitt M.F. and Williamson C.L., Web Server Workload Characterization: The Search for Invariants (Extended Version), IEEE/ACM Transactions on Networking, Vol. 5, No. 5, October 1997.
[3] Addie R., Zukerman M. and Neame T., Fractal Traffic: Measurements, Modelling and Performance Evaluation, Proceedings of IEEE INFOCOM’95, 1995.
[4] Beran J., Statistics for Long-Memory Processes, Chapman and Hall, 1994.
[5] Box G.E.P. and Jenkins G.M., Time Series Analysis: Forecasting and Control, Holden Day, San Francisco, 1970.
[6] Beran J., Sherman R., Taqqu M.S. and Willinger W., Long-Range Dependence in Variable-Bit-Rate Video Traffic, IEEE Transactions on Communications, Vol. 43, No. 2/3/4, February/March/April 1995.
[7] Cox D.R., Long-Range Dependence: A Review, in Statistics: An Appraisal, Iowa State University Press, 1984.
[8] Crovella M.E. and Bestavros A., Explaining World Wide Web Traffic Self-Similarity, Technical Report TR-95-015, Computer Science Department, Boston University, 1995.
[9] Crovella M.E. and Bestavros A., Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes, Proceedings of ACM SIGMETRICS’96, 1996.
[10] Causey J.W. and Kim H.S., Comparison of Admission Control Schemes in ATM Networks, International Journal of Communication Systems, Vol. 8, 1995.
[11] Duffy D.E., McIntosh A.A., Rosenstein M. and Willinger W., Statistical Analysis of CCSN/SS7 Traffic Data from Working Subnetworks, IEEE Journal on Selected Areas in Communications, Vol. 12, No. 3, 1994.
[12] Duffield N.G. and O’Connell N., Large Deviations and Overflow Probabilities for the General Single-Server Queue, with Applications, preprint, Dublin Institute for Advanced Studies, 1993.
[13] Erramilli A., Narayan O. and Willinger W., Experimental Queueing Analysis with Long-Range Dependent Packet Traffic, IEEE/ACM Transactions on Networking, Vol. 4, No. 2, April 1996.

[14] Erramilli A. and Singh R.P., Application of Deterministic Chaotic Maps to Characterize Broadband Traffic, Proceedings of the 7th ITC Specialists Seminar, Morristown, USA, 1990.
[15] Erramilli A., Willinger W. and Wang J.L., Modeling and Management of Self-Similar Traffic Flows in High-Speed Networks, in Network Systems Design, Gordon and Breach Science Publishers, 1999.
[16] Flandrin P., Wavelet Analysis and Synthesis of Fractional Brownian Motion, IEEE Transactions on Information Theory, Vol. 38, No. 2, March 1992.
[17] Feldmann A., Gilbert A.C., Willinger W. and Kurtz T.G., The Changing Nature of Network Traffic: Scaling Phenomena, Computer Communication Review, Vol. 28, No. 2, April 1998.
[18] Garrett M. and Willinger W., Analysis, Modeling and Generation of Self-Similar VBR Video Traffic, Proceedings of ACM SIGCOMM’94, 1994.
[19] Hurst H.E., Black R. and Simaika Y., Long-Term Storage: An Experimental Study, Constable, London, 1965.
[20] Jena A.K., Pruthi P. and Popescu A., Resource Engineering for Internet Applications, Proceedings of the 7th IFIP ATM Workshop, Antwerp, Belgium, June 1999.
[21] Jena A.K., Pruthi P. and Popescu A., Modeling and Evaluation of Network Applications and Services, Proceedings of the RVK’99 Conference, Ronneby, Sweden, June 1999.
[22] Li G-L. and Dowd P.W., An Analysis of Network Performance Degradation Induced by Workload Fluctuations, IEEE/ACM Transactions on Networking, Vol. 3, No. 4, August 1995.
[23] Lau W.C., Erramilli A., Wang J.L. and Willinger W., Self-Similar Traffic Generation: The Random Midpoint Displacement Algorithm and Its Properties, Proceedings of IEEE ICC’95, 1995.
[24] Leland W., Taqqu M., Willinger W. and Wilson D., On the Self-Similar Nature of Ethernet Traffic, Proceedings of ACM SIGCOMM’93, 1993.
[25] Leland W., Taqqu M., Willinger W. and Wilson D., On the Self-Similar Nature of Ethernet Traffic (Extended Version), IEEE/ACM Transactions on Networking, Vol. 2, No. 1, February 1994.
[26] Mandelbrot B.B., New Methods in Statistical Economics, Journal of Political Economy, Vol. 71, No. 5, October 1963.
[27] Mandelbrot B.B. and van Ness J.W., Fractional Brownian Motions, Fractional Noises and Applications, SIAM Review, Vol. 10, No. 4, 1968.
[28] Norros I., A Storage Model with Self-Similar Input, Queueing Systems: Theory and Applications, Vol. 16, No. 3-4, 1994.
[29] Pruthi P., An Application of Chaotic Maps to Packet Traffic Modeling, PhD dissertation, Royal Institute of Technology, Stockholm, Sweden, 1995.

[30] Paxson V., Fast Approximation of Self-Similar Network Traffic, preprint, Lawrence Berkeley Laboratory and EECS Division, University of California, Berkeley, USA, 1995.
[31] Park K., Kim G.T. and Crovella M.E., On the Relationship Between File Sizes, Transport Protocols, and Self-Similar Network Traffic, preprint, Boston University, 1996.
[32] Paxson V. and Floyd S., Wide Area Traffic: The Failure of Poisson Modeling, Proceedings of ACM SIGCOMM’94, 1994.
[33] Paxson V. and Floyd S., Wide Area Traffic: The Failure of Poisson Modeling, IEEE/ACM Transactions on Networking, Vol. 3, No. 3, June 1995.
[34] Peitgen H.-O., Jürgens H. and Saupe D., Chaos and Fractals: New Frontiers of Science, Springer-Verlag, 1992.
[35] Stallings W., High-Speed Networks: TCP/IP and ATM Design Principles, Prentice-Hall, Inc., 1998.
[36] Tuan T. and Park K., Congestion Control for Self-Similar Network Traffic, preprint, Purdue University, 1998.
[37] Taqqu M.S. and Teverovsky V., On Estimating the Intensity of Long-Range Dependence in Finite and Infinite Variance Time Series, preprint, Boston University, USA, 1996.
[38] Taqqu M.S., Willinger W. and Sherman R., Proof of a Fundamental Result in Self-Similar Traffic Modeling, Computer Communication Review, Vol. 27, No. 2, April 1997.
[39] West B.J. and Deering B., The Lure of Modern Science: Fractal Thinking, Studies of Nonlinear Phenomena in Life Sciences, No. 3, World Scientific Publishing Co., 1995.
[40] Wang J.L. and Erramilli A., Monitoring Packet Traffic Levels, Proceedings of IEEE GLOBECOM’94, 1994.
[41] Willinger W., Taqqu M.S. and Erramilli A., A Bibliographical Guide to Self-Similar Traffic and Performance Modeling for Modern High-Speed Networks, in Stochastic Networks: Theory and Applications, Oxford University Press, 1996.
[42] Willinger W., Taqqu M.S., Sherman R. and Wilson D.V., Self-Similarity Through High Variability: Statistical Analysis of Ethernet LAN Traffic at the Source Level, IEEE/ACM Transactions on Networking, Vol. 5, No. 1, February 1997.
