NONPARAMETRIC RENEWAL FUNCTION ESTIMATION AND SMOOTHING BY EMPIRICAL DATA

Natalia M. Markovich
Institute of Control Sciences, Russian Academy of Sciences, Profsoyuznaya 65, 117997 Moscow, Russia, E-mail: [email protected]

Abstract. We consider an estimate of the renewal function (rf) using a limited number of independent observations of the interarrival times for an unknown interarrival-time distribution (itd). The nonparametric estimate is derived from the rf-representation as series of distribution functions (dfs) of consecutive arrival times using a finite summation and approximations of the latter by empirical dfs. Due to the limited number of observed interarrival times the estimate is accurate just for closed time intervals $[0, t]$. An important aspect is the selection of an optimal number of terms $k$ of the finite sum. Here two methods are proposed: (1) an a priori choice of $k$ as function of the sample size $l$ which provides almost surely (a.s.) the uniform convergence of the estimate to the rf for light- and heavy-tailed itds if the time interval is not too large, and (2) a data-dependent selection of $k$ by the plot of the proposed estimate against $k$ for a fixed time $t$. To evaluate both the efficiency of the estimate and the selection method of $k$, a Monte Carlo study is performed.

Keywords: Renewal function, nonparametric estimate, heavy-tailed distribution.

1. INTRODUCTION

Renewal processes have a wide range of applications in warranty control, in the reliability analysis of technical systems and particularly of telecommunication networks such as high-speed packet-switched networks like the Internet. Normally, measurement facilities count the events of interest, e.g. the number of requested and transferred Web pages, incoming or outgoing calls, frames, packets or cells in consecutive time intervals of fixed length. It is important for planning and control purposes to estimate the related traffic load in terms of the mean numbers of counted events and their variances in these intervals. In such applications the renewal function (rf) constitutes the basic characteristic of an underlying renewal process since by means of this function the expectation and variance of the number of arrivals of the relevant events before a fixed time instant can be calculated (cf. [11]). To estimate the rf, several realizations of the counting process may be required, e.g. the observations of the number of calls within several days. We estimate here the rf using inter-arrival times between events of only one realization of the process.

Let $F(t) = \mathbb{P}\{\tau_n < t\}$ denote the common distribution function (df) of the i.i.d. interarrival times $\{\tau_n, n = 1, 2, \ldots\}$ of these events with $F(0+) = 0$. The renewal counting process $\{N_t, t \ge 0\}$ denotes the number of events before time $t$, $N_t = \max\{n : t_n < t\}$ for $t \ge 0$, where $t_n = \sum_{i=1}^{n} \tau_i$, $t_0 = 0$ are the arrival times. The rf $H(t)$ is expressed by
\[ H(t) = \mathbb{E}(N_t) = \sum_{n=1}^{\infty} \mathbb{P}\{t_n < t\} = \sum_{n=1}^{\infty} F^{*n}(t) \tag{1} \]


for $t \ge 0$ where $F^{*n}$ denotes the $n$-fold recursive Stieltjes convolution of $F$. Several rf-estimation methods have been developed for a known interarrival-time distribution. Unfortunately, explicit forms of the rf are obtained only in rare cases, for example, if the interarrival times have a uniform distribution, or for the wide class of matrix-exponential distributions (exponential and Erlang distributions are from this class) (cf. [1]). Therefore, several attempts have been made to evaluate the rf computationally (cf. [3], [5], [8], [16], [23]). If the mean $\mu$ and variance $\sigma^2$ of $F$ are finite, then the rf $H(t)$ may be approximated for large $t$ by the expression
\[ H(t) = \frac{t}{\mu} + \frac{\sigma^2}{2\mu^2} - \frac{1}{2} + o(1) \]
widely used in the literature (cf. [11]). In [3] an alternative asymptotic expression is stated for the case that the Laplace-Stieltjes transform (LST) of $F(t)$ is a rational function. Such estimates do not perform well for small times $t$ relative to $\mu$, which is especially important for the warranty control of devices (cf. [9]).

In practice, it is a more realistic situation that the distribution is unknown or that just general information describing it is available. The restoration of the df or the probability density function (pdf), if the latter exists, may become complicated if the distributions of the random variables (rvs) are heavy-tailed. This means that those distributions have heavier tails than an exponential one (see [6], [7], [12], [17], [18]). Weibull distributions with a shape parameter less than one and Pareto distributions provide examples of such distributions. Heavy-tailed distributions often arise in practice, for example, in insurance and queueing or in the characterization of World Wide Web (WWW) traffic (cf. [19]).

In this paper we propose to estimate the rf without any information about the form of the underlying distribution and we use only an empirical sample $T^l = \{\tau_n, n = 1, 2, \ldots, l\}$ of the nonnegative i.i.d.
interarrival times between events of the size $l$. The stated nonparametric estimate is related to a histogram-type estimate where the unknown probabilities $\mathbb{P}\{t_n < t\}$ in (1) are replaced by the corresponding empirical dfs and a limited number of terms $k$ is used in the summation. A similar nonparametric estimate
\[ H_l(t, k) = \sum_{n=1}^{k} F_l^{(n)}(t) \tag{2} \]

was proposed by Frees [9], [10] and further investigated in [22]. These authors have used
\[ F_l^{(n)}(t) = \binom{l}{n}^{-1} \sum_{c} \theta\bigl(t - (\tau_{i_1} + \ldots + \tau_{i_n})\bigr) \]
as estimate of the arrival-time distribution. Here $\theta(t) = 1$ for $t \ge 0$ and $\theta(t) = 0$ for $t < 0$, and $\sum_{c}$ denotes the sum over all $\binom{l}{n}$ distinct index combinations $\{i_1, i_2, \ldots, i_n\}$ of length $n$. The $U$-statistic $F_l^{(n)}(t)$ is a minimum-variance unbiased estimator of $F^{*n}(t)$. In contrast to our estimate (4), which uses just one combination of adjacent interarrival times, the computation of the $H_l(t, k)$ is awkward. The accuracy of such types of estimates depends on $k$. Frees has obtained the uniform consistency of $H_l(t, k)$ a.s. on compact intervals $[0, t]$, $0 \le t < \infty$, under the assumptions


that $k = l$ and $F(t)$ has a positive mean and finite variance, and the asymptotic normality of $H_l(t, k)$ for each fixed point $t$ under some moment conditions (cf. [10]). In [13] the convergence of a non-computational empirical rf
\[ H_l^g(t) = \sum_{n=1}^{\infty} \hat{F}_l^{*n}(t) \tag{3} \]

on $\mathbb{R}$ as $l \to \infty$ has been proved. Here, $\hat{F}_l^{*n}(t)$ is the $n$-fold convolution of the empirical df $F_l(t)$ based on the sample $T^l$. However, the data-dependent selection of $k$ (which is important for moderate samples) was not considered in [9], [10], [13], and [22]. In our consideration, we use an unbiased estimate of $F^{*n}(t)$, but its variance is not minimal. We compensate for this inaccuracy by the data-dependent selection of $k$ and the use of larger samples. An a priori choice of $k$ as a function of the sample size $l$ is proposed to obtain a.s. the uniform convergence of the estimate to the rf as $l \to \infty$ for light- and heavy-tailed distributions. As a data-dependent choice of $k$, the plot of the estimate (4) against $k$ for a fixed time $t$ is applied.

The paper is organized as follows. In Section 2 the histogram-type estimate of the rf is investigated. Theorems 1-4 related to the a.s. uniform convergence of this estimate to the rf are stated. The selection of the parameter $k$ by the plot is described. In Section 3 a comparison of (4) with Frees' estimate is given. Finally, the findings are summarized in the Conclusion. In the Appendix the proofs of the Theorems of Section 2 are presented.

2. HISTOGRAM-TYPE ESTIMATE OF THE RENEWAL FUNCTION

We consider the estimate of the rf $H(t)$ which was first introduced in [15]. Let $[r]$ denote the integer part of a number $r$. Inserting the empirical mean for $\mathbb{E}(N_t)$, we replace the df $\mathbb{P}\{t_n < t\}$ by the empirical df $F_{l_n}(t) = \frac{1}{l_n} \sum_{i=1}^{l_n} \theta(t - t_n^i)$. It is its unbiased estimate and $t_n^i = \sum_{q=1+n(i-1)}^{n \cdot i} \tau_q$, $i = 1, \ldots, l_n$, $l_n = [l/n]$, $n = 1, \ldots, k$, are the observations of the rv $t_n$. Then one can estimate the renewal function $H(t)$ based on the samples of independent renewal-time observations $t_1 = \{t_1^1, \ldots, t_1^{l_1}\}, \ldots, t_k = \{t_k^1, \ldots, t_k^{l_k}\}$ by:
\[ \widetilde{H}(t, k, l) = \sum_{n=1}^{k} \frac{1}{l_n} \sum_{i=1}^{l_n} \theta(t - t_n^i) \tag{4} \]
Note that $\widetilde{H}(t, k, l) = k$ holds for $t \in [t_{\max}(k), \infty)$ where $t_{\max}(k) = \max_{1 \le n \le k} \max_{1 \le i \le l_n} t_n^i$ and $k$ is some fixed number. The errors of the estimation arise both from the approximation of $H(t)$ in (4) by a finite sum and the approximation of $\mathbb{P}\{t_n < t\}$ by the empirical df $F_{l_n}(t)$:
\[ \| H(t) - \widetilde{H}(t, k, l) \| = \Bigl\| \sum_{n=k+1}^{\infty} \mathbb{P}\{t_n < t\} + \sum_{n=1}^{k} \bigl( \mathbb{P}\{t_n < t\} - F_{l_n}(t) \bigr) \Bigr\| \tag{5} \]

By this formula one can see that $\widetilde{H}(t, k, l)$ as well as the estimator (2) are biased since $k$ is limited. A rough upper bound of the bias is given by
\[ \mathrm{bias}(t, k, l) = H(t) - \mathbb{E}\widetilde{H}(t, k, l) = \sum_{n=k+1}^{\infty} \mathbb{P}\{t_n < t\} \le \sum_{n=k+1}^{\infty} (F(t))^n = \frac{(F(t))^{k+1}}{1 - F(t)}. \tag{6} \]
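The construction of estimate (4) and the bound (6) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's implementation: the function names `histogram_rf_estimate` and `bias_bound` are hypothetical, NumPy is assumed to be available, and the value $F(t)$ passed to `bias_bound` must be supplied by the user, since it is unknown in the nonparametric setting.

```python
import numpy as np


def histogram_rf_estimate(tau, t, k):
    """Histogram-type estimate (4) from one sample of interarrival times.

    tau : observed interarrival times tau_1, ..., tau_l
    t   : time point at which the renewal function is estimated
    k   : number of terms kept in the finite sum, 1 <= k <= l
    """
    tau = np.asarray(tau, dtype=float)
    l = tau.size
    if not 1 <= k <= l:
        raise ValueError("k must satisfy 1 <= k <= l")
    total = 0.0
    for n in range(1, k + 1):
        l_n = l // n  # l_n = [l / n], number of complete blocks of length n
        # t_n^i: sums of adjacent, non-overlapping blocks of n interarrival times
        block_sums = tau[: l_n * n].reshape(l_n, n).sum(axis=1)
        # empirical df at t: theta(t - t_n^i) equals 1 when t_n^i <= t
        total += np.mean(block_sums <= t)
    return total


def bias_bound(F_t, k):
    """Right-hand side of (6), F(t)^(k+1) / (1 - F(t)), for 0 <= F(t) < 1."""
    if not 0.0 <= F_t < 1.0:
        raise ValueError("F(t) must lie in [0, 1)")
    return F_t ** (k + 1) / (1.0 - F_t)


if __name__ == "__main__":
    sample = [1.0, 1.0, 1.0, 1.0]
    print(histogram_rf_estimate(sample, t=2.5, k=3))  # 2.0
    print(bias_bound(0.5, k=1))                       # 0.5
```

With four interarrival times equal to 1 and $t = 2.5$, the blocks of length 1 and 2 all fall below $t$ while the single block of length 3 does not, so the estimate equals 2. For $t$ beyond the largest block sum the function returns $k$, in line with the remark following (4).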


For small $t$, $F(t)$ is generally small and $F(t) < 1$; thus, this error is small. To provide a good approximation of $\mathbb{P}\{t_n < t\}$ by the empirical df, according to the Glivenko-Cantelli theorem sufficiently large values $l_n$ should be used, i.e. $k < l$. Note that $l_k = 1$ for $l/2 < k \le l$, i.e. the sample $t_k$ contains only one point. Therefore, it is reasonable to take $k \le l/2$. In the following, we provide an optimized estimate of $k$. On the other hand, to provide a good approximation of $H(t)$ by means of $\widetilde{H}(t, k, l)$ in general, the value of $k$ should be large enough. Therefore, the estimate $\widetilde{H}(t, k, l)$ is sensitive to the choice of $k$ and the length of the estimation interval $[0, t]$. Obviously, the estimate $\widetilde{H}(t, k, l)$ may only be accurate within the interval $[0, t_{\max}(k)]$, since the sample size $l$ is limited.

2.1. Convergence of the Histogram-Type Estimate

Now we investigate the convergence of estimate (4) to the rf in the metric of the space $C$ of continuous functions. To estimate the risk (5), one is interested in the $t$-regions $[0, t] \subseteq [0, t_{\max}(k)]$. The main problem is to estimate the systematic error $\sum_{n=k+1}^{\infty} \mathbb{P}\{t_n < t\}$. To estimate it, one needs some information about the df $F(t)$ of the rv $\tau$ and then one can use precise large deviation results for $\mathbb{P}\{t_n < t\}$. Such principal information may be the existence of the moment generating function (Cramér's condition (C.c.)) (cf. [20]). The rv $\tau$ satisfies C.c. if there exists $\theta > 0$ such that $\mathbb{E}(e^{\theta\tau}) < \infty$. The C.c. is equivalent to an exponential decay rate of $1 - F(t)$ and it is satisfied for light-tailed distributions. The C.c. provides the existence of all moments of the rv $\tau$.

Theorem 1. Let $\{\tau_1, \ldots, \tau_l\}$ be a sequence of i.i.d. rvs and $t \in [0, t_{\max}(k)]$. We suppose that $\mathbb{E}|\tau_i|^m < \infty$ for some integer $m \ge 3$, $\mathbb{E}\tau_i = \mu$, $\mathrm{var}(\tau_i) = \sigma^2$, and that the parameter $k$ obeys
\[ k = o(l^{\rho}) \ (\text{as } l \to \infty), \quad 0 < \rho < 1/3. \tag{7} \]
Then
\[ \mathbb{P}\Bigl\{ \omega : \lim_{l \to \infty} \sup_t | H(t) - \widetilde{H}(t, k, l) | = 0 \Bigr\} = 1 \]
holds.

The rate of this uniform convergence may be proved for the class $\widetilde{S}$ of itds such that
\[ 1 - F(t) \ge \exp(-\nu t) \]
for any $t \in [0, T]$ and some $\nu > 0$. We assume, without loss of generality, that $[0, T] = [0, 1]$. The class $\widetilde{S}$ includes, for example, the exponential distribution and the Weibull distribution with a shape parameter larger than one. Hence, it follows for the estimate of the right-hand side of (6):
\[ \frac{(F(t))^{k+1}}{1 - F(t)} \le \frac{(1 - \exp(-\nu t))^{k+1}}{\exp(-\nu t)}. \]
Then, for $F(t) \in \widetilde{S}$ the error of an approximation by (4) in the metric of $C$ is estimated by
\[ \sup_t | H(t) - \widetilde{H}(t, k, l) | \le \sup_t \left( \frac{(1 - \exp(-\nu t))^{k+1}}{\exp(-\nu t)} + \Bigl| \sum_{n=1}^{k} \bigl( \mathbb{P}\{t_n < t\} - F_{l_n}(t) \bigr) \Bigr| \right). \tag{8} \]


Theorem 2. If $\{\tau_1, \ldots, \tau_l\}$ is an i.i.d. sample with the df $F \in \widetilde{S}$, $t \in [0, 1]$ and the parameter $k = c \cdot l^{\rho}$ with
\[ c = c(\nu, \alpha, \rho) \ge -l^{-\rho} \left( \frac{\nu + \alpha \ln l}{\ln(1 - \exp(-\nu))} + 1 \right) > 0, \]
$0 < \rho < 1/3 - (2/3)\alpha$, $0 < \alpha < 1/2$, then the asymptotic rate of convergence of the estimate $\widetilde{H}(t, k, l)$ to $H(t)$ is given by the expression
\[ \mathbb{P}\Bigl\{ \omega : \lim_{l \to \infty} \sup_t l^{\alpha} | H(t) - \widetilde{H}(t, k, l) | \le c_1 \Bigr\} = 1, \]
where $c_1$ is a constant that is independent of $l$.
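The lower bound on $c$ in Theorem 2 corresponds to the requirement, used in the proof in the Appendix, that $l^{\alpha}(1 - \exp(-\nu))^{k+1}\exp(\nu) \le 1$. The following sketch computes the smallest integer $k$ meeting that requirement for given $l$, $\nu$, $\alpha$ and $\rho$. The function names `truncation_term` and `a_priori_k` are hypothetical, and the value of $\nu$ is assumed to be known, which is rarely the case in practice.

```python
import math


def truncation_term(l, k, nu, alpha):
    """l^alpha * (1 - exp(-nu))^(k+1) * exp(nu), the term bounded by 1 in the proof."""
    return l ** alpha * (1.0 - math.exp(-nu)) ** (k + 1) * math.exp(nu)


def a_priori_k(l, nu, alpha, rho):
    """Smallest integer k >= 1 with truncation_term(l, k, nu, alpha) <= 1.

    rho enters only through the admissibility check of Theorem 2;
    writing k = c * l**rho then gives c = k / l**rho.
    """
    if nu <= 0:
        raise ValueError("nu must be positive")
    if not 0.0 < alpha < 0.5:
        raise ValueError("alpha must lie in (0, 1/2)")
    if not 0.0 < rho < 1.0 / 3.0 - 2.0 * alpha / 3.0:
        raise ValueError("rho must lie in (0, 1/3 - (2/3) * alpha)")
    # Solve alpha*ln(l) + (k+1)*ln(1 - exp(-nu)) + nu <= 0 for k
    k_min = -((nu + alpha * math.log(l)) / math.log(1.0 - math.exp(-nu)) + 1.0)
    return max(1, math.ceil(k_min))


if __name__ == "__main__":
    k = a_priori_k(l=1000, nu=1.0, alpha=0.2, rho=0.1)
    print(k, truncation_term(1000, k, 1.0, 0.2))
```

The result depends on $l$ through $\ln l$, so it should be read as a finite-sample illustration of the condition rather than as the constant $c$ of the asymptotic statement.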

Then, the following confidence interval is derived for the rf.

Corollary 1. If the assumptions of Theorem 2 hold, then the following inequalities hold with a probability of at least $1 - \chi$, $0 < \chi < 1$:
\[ \widetilde{H}(t, k, l) - D \le H(t) \le \widetilde{H}(t, k, l) + D, \tag{9} \]
where
\[ D = l^{-\alpha} + k \sqrt{ -\frac{k}{2l} \ln(\chi/2) }. \]
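As a numerical illustration of Corollary 1, the half-width $D = l^{-\alpha} + k\sqrt{-\frac{k}{2l}\ln(\chi/2)}$ can be evaluated directly. The sketch below is only a calculator for this expression: the name `confidence_half_width` is hypothetical, and the function does not check that $k$, $\alpha$ and the underlying distribution satisfy the assumptions of Theorem 2.

```python
import math


def confidence_half_width(l, k, alpha, chi):
    """Half-width D of the interval (9): l^(-alpha) + k * sqrt(-(k / (2 l)) * ln(chi / 2))."""
    if not 0.0 < chi < 1.0:
        raise ValueError("chi must lie in (0, 1)")
    if l < 1 or k < 1:
        raise ValueError("l and k must be positive")
    # ln(chi / 2) is negative for chi in (0, 1), so the radicand is positive
    radicand = -(k / (2.0 * l)) * math.log(chi / 2.0)
    return l ** (-alpha) + k * math.sqrt(radicand)


if __name__ == "__main__":
    # Example values chosen for illustration only
    print(confidence_half_width(l=1000, k=2, alpha=0.25, chi=0.05))
```

For $l = 1000$, $k = 2$, $\alpha = 0.25$ and $\chi = 0.05$ this gives $D \approx 0.30$. The second term grows like $k^{3/2}$, which shows why small values of $k$ relative to $l$ are needed for a useful interval.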

In practice, interarrival times are often described by distributions with heavy tails (cf. [4], [12]). Two classes of heavy-tailed distributions are well known: the distributions with regularly varying tails where $1 - F(t) = t^{-\alpha} L(t)$, $t > 0$, $\alpha > 0$ and $L$ is a slowly varying function, and the subexponential distributions, i.e. the distributions with the property: for any $\varepsilon > 0$ there exists $T = T(\varepsilon, F)$ such that for any $t > T$, $1 - F(t) > \exp(-\varepsilon t)$. It is the specific feature of heavy-tailed distributions that they do not satisfy C.c. If $t$ is not too large, an approximation of $\mathbb{P}\{t_n > t\}$ by the tail of the standard normal distribution $\bar{\Phi}(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-y^2/2} \, dy$ is used for heavy-tailed distributions, namely,
\[ \mathbb{P}\{t_n > t\} \sim \bar{\Phi}\Bigl( \frac{t}{\sqrt{n}} \Bigr) \tag{10} \]

{tn >t} cn (this means that limn→∞ sup0 2, cn /hn may be ∼ n0.5 ln0.5 n (cf. [18]). Hence, the following Theorem can be proved. Theorem 3. If {τ1 , . . . , τl } is a sequence of i.i.d. rvs with the heavy-tailed df F (t), t ∈ (0, min(tmax (k), hckk )], and the parameter k obeys (7), then   e IP ω : liml→∞ sup |H(t) − H (t, k, l) | = 0 = 1, t

holds.


We consider now the rate of uniform convergence for distributions with regularly varying tails. By the representation theorem [7], [18] the slowly varying function $L(x)$ can be rewritten in the form
\[ L(x) = c(x) \exp\Bigl\{ \int_{x_0}^{x} \frac{\varepsilon(y)}{y} \, dy \Bigr\}, \quad x \ge x_0 \tag{11} \]
for some $x_0 > 0$, where $c(\cdot)$ is a measurable non-negative function such that $\lim_{x \to \infty} c(x) = c_0 \in (0, \infty)$ and $\varepsilon(x)$ is a continuous function, $\varepsilon(x) \to 0$ as $x \to \infty$. In the following theorem, we will assume that $c(x)$ is a monotone decreasing or increasing function and $\varepsilon(x)$ is a non-positive function.

Theorem 4. Let $\{\tau_1, \ldots, \tau_l\}$ be i.i.d. non-negative regularly varying rvs with the tail $1 - F(x) = L(x) x^{-\alpha}$, $x > 0$, $\alpha > 0$, the parameter
\[ k = d \cdot l^{\rho}, \quad d = -A, \quad 0 \le \rho < \frac{1}{3} - \frac{2}{3}\beta, \tag{12} \]
\[ A = A(\beta) = \frac{(\beta + (\alpha - \varepsilon(\theta))(1 + \eta)) \ln l - \ln c^* + \varepsilon(\theta) \ln x_0}{\ln\bigl( 1 - c^* l^{(1+\eta)(-\alpha + \varepsilon(\theta))} x_0^{-\varepsilon(\theta)} \bigr)} + 1, \]
$c^* = \min(c_0, c(a))$, $\eta > 1/(\alpha - \varepsilon^*)$, $0 < \varepsilon^* < \alpha$, $0 < \beta < 1/2$, $\theta \in [x_0, t_{\max}(k)]$, $x_0 > 0$, $t \in [a, t_{\max}(k)]$, $a > 0$, then
\[ \mathbb{P}\Bigl\{ \omega : \lim_{l \to \infty} \sup_t l^{\beta} | H(t) - \widetilde{H}(t, k, l) | \le c_1 \Bigr\} = 1, \]
where $c_1$ is a constant that is independent of $l$.

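Corollary 2. If the assumptions of Theorem 4 hold and $\eta > (1 - \ln \nu / \ln l)/(\alpha - \varepsilon^*)$, then with probability at least $1 - \nu$, $0 < \nu < 1$, the inequality (9) holds, where
\[ D = l^{-\beta} + k \sqrt{ -\frac{k}{2l} \ln\Bigl( \frac{\nu - l^{1 - \eta(\alpha - \varepsilon^*)}}{2} \Bigr) }. \]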

The Theorems determine the values of $k$ as functions of the sample size $l$. These values of $k$ are given only up to a rough asymptotic equivalence. For instance, $k$ can be multiplied by any positive constant and the Theorems remain valid. In practice, one needs exact optimal values of $k$ which are adapted to the empirical data. Therefore, we subsequently consider a data-dependent selection of $k$.

2.2. Selection of k by a plot

As a practical tool for the visual selection of $k$, the plot of the histogram-type estimate against $k$ may be used. A similar approach is the Hill plot for tail index selection. The idea of the plot is based on the uniform convergence of estimate (4) to the true rf when $k$ increases as $l \to \infty$. Then one can select, for a fixed $t$, the minimal $k$ corresponding to the stable interval of the plot, i.e.
\[ k^* = \arg\min\{ k : \widetilde{H}(t, k, l) = \widetilde{H}(t, k + 1, l), \ k = 1, \ldots, l - 1 \}. \tag{13} \]
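The selection rule (13) can be sketched as follows, assuming NumPy. The helper names `estimate_path` and `select_k_by_plot` are hypothetical. The first computes $\widetilde{H}(t, k, l)$ for all $k = 1, \ldots, l$ in one pass, which is the curve shown in a plot against $k$; the second returns the first $k$ at which that curve stops changing. Falling back to $k = l$ when no such $k$ exists is an assumption of this sketch, not part of (13).

```python
import numpy as np


def estimate_path(tau, t):
    """Values of the histogram-type estimate (4) for k = 1, ..., l as an array."""
    tau = np.asarray(tau, dtype=float)
    l = tau.size
    terms = np.empty(l)
    for n in range(1, l + 1):
        l_n = l // n  # number of complete blocks of n adjacent interarrival times
        block_sums = tau[: l_n * n].reshape(l_n, n).sum(axis=1)
        terms[n - 1] = np.mean(block_sums <= t)  # empirical df of t_n at t
    # Partial sums over n give the estimate for every k at once
    return np.cumsum(terms)


def select_k_by_plot(path):
    """Minimal k with path[k-1] == path[k], i.e. H~(t,k,l) == H~(t,k+1,l), as in (13)."""
    stable = np.flatnonzero(np.diff(path) == 0.0)
    if stable.size == 0:
        return int(path.size)  # assumed fallback when the curve never stabilizes
    return int(stable[0]) + 1


if __name__ == "__main__":
    path = estimate_path([1.0, 1.0, 1.0, 1.0], t=2.5)
    print(path)                    # [1. 2. 2. 2.]
    print(select_k_by_plot(path))  # 2
```

The array returned by `estimate_path` can be passed to any plotting tool to inspect the stable interval visually before trusting the automatic choice.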


In Figure 1 one can see the plots of the histogram-type estimate (4) against k for different fixed time intervals [0, t], t ∈ {1, 3, 5, 10}. The Weibull distribution with the shape parameter s = 3 and the sample size l = 50 are considered.

Fig. 1. Histogram-type estimate of the renewal function against k for a Weibull distribution.

In Figure 2 the histogram-type estimate for Weibull($s = 3$) and Gamma($s = 0.55$, $\lambda = 1$) is shown. The parameter $k$ is selected by the plot.

3. A SIMULATION STUDY

The simulation study relates to the comparison with Frees' tables presented in [9]. For this purpose samples of the Gamma distribution $Ga(\lambda, s)$ with the parameters $s = 2$ and $\lambda = 1$,
\[ f_G(t) = \begin{cases} t^{s-1} \lambda^{-s} \exp(-t/\lambda)/\Gamma(s), & t > 0 \\ 0, & t \le 0 \end{cases} \]
and of the Weibull distribution
\[ f_W(t) = \begin{cases} s t^{s-1} \exp(-t^s), & t > 0 \\ 0, & t \le 0 \end{cases} \]
with $s = 3$ (not heavy-tailed Weibull) have been generated. The time points are $T \in \{0.25, 0.5, 0.75, 1.0, 1.25\}$ and the sample sizes are $l \in \{10, 15, 20, 25, 30, 100\}$. As in [9], two characteristics, the bias and the mean-squared error of the estimates, were calculated over 50 Monte Carlo repetitions, where $H(t)$ is the true rf. For the fixed time points $T$ and for the mentioned distributions the latter was taken from the tables presented in [2]. The results of the calculation are presented in Tables 1 to 4. Frees' results are included here, where $H_{3n}(t)$ denotes the estimate (2). Since (2) requires much computational effort, only $k \in \{5, 10\}$ and $l \le 30$ were considered. For (4) the parameter $k$ is calculated by the plot method, i.e. by (13). Tables 1 to 4 show that for all estimates


Fig. 2. Histogram-type estimate of the renewal function against k for the Weibull(s = 3) (top plot) and Gamma(s = 0.55, λ = 1) (bottom plot) distributions and the corresponding rfs. k is selected by the plot. The values of the rfs were taken from the tables in [2].

(a) the mean squared error increases as $T$ increases;
(b) for any fixed $T$ the mean squared error decreases as $l$ becomes larger;
(c) the bias does not exhibit a stable behavior.

Comparing $\widetilde{H}(t, k)$ with $H_{3n}(t)$ one may conclude that

(a) the mean-squared error is less for $\widetilde{H}(t, k)$ for a sample size equal to 100;
(b) the bias of $H_{3n}(t)$ is less than that of $\widetilde{H}(t, k)$; nevertheless, the mean squared error of the latter tends to be less, especially for the smallest $T$.
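A Monte Carlo experiment of the kind described in this section can be sketched as below. Several choices here are assumptions made for illustration and are not taken from the paper: the true rf is not read from the tables in [2] but computed from the closed form $H(t) = t/2 - 1/4 + e^{-2t}/4$, which holds for the Gamma distribution with $s = 2$, $\lambda = 1$ (an Erlang distribution of order 2); the seed, the helper names and the fallback $k = l$ in the selection rule are hypothetical; the sign of the bias follows (6).

```python
import numpy as np


def estimate_path(tau, t):
    """Histogram-type estimate (4) for every k = 1, ..., l."""
    tau = np.asarray(tau, dtype=float)
    l = tau.size
    terms = np.empty(l)
    for n in range(1, l + 1):
        l_n = l // n
        block_sums = tau[: l_n * n].reshape(l_n, n).sum(axis=1)
        terms[n - 1] = np.mean(block_sums <= t)
    return np.cumsum(terms)


def select_k(path):
    """First k with equal consecutive values of the estimate, as in (13)."""
    stable = np.flatnonzero(np.diff(path) == 0.0)
    return int(stable[0]) + 1 if stable.size else int(path.size)


def true_rf_erlang2(t):
    """Closed-form renewal function for Gamma(s = 2, lambda = 1) interarrival times."""
    return t / 2.0 - 0.25 + 0.25 * np.exp(-2.0 * t)


def monte_carlo(l, t, reps=50, seed=0):
    """Bias H(t) - mean(estimate) and mean squared error over `reps` samples."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for r in range(reps):
        tau = rng.gamma(shape=2.0, scale=1.0, size=l)
        path = estimate_path(tau, t)
        estimates[r] = path[select_k(path) - 1]
    h = true_rf_erlang2(t)
    bias = h - estimates.mean()
    mse = np.mean((estimates - h) ** 2)
    return bias, mse


if __name__ == "__main__":
    for l in (10, 30, 100):
        bias, mse = monte_carlo(l, t=1.0)
        print(f"l={l:4d}  bias*1e4={bias * 1e4:9.1f}  mse*1e4={mse * 1e4:9.1f}")
```

The numbers produced by this sketch depend on the seed and are not expected to reproduce the tabulated values; the point is only to show how bias and mean squared error are obtained from repeated samples.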

Table 1. Part I: Gamma (s = 0.55, λ = 1, Eτ = 0.55). BIAS and MSE are multiplied by 10^4.

              H3n(t), k = 5       H3n(t), k = 10      H̃(t, k, l), k by plot
Size   T      BIAS      MSE       BIAS      MSE       BIAS      MSE
10     0.25   -157      1346      -151      1361      -110      1870
10     0.5    -169      3158      -157      3194      -36.33    5260
10     0.75   -48       5864      -34       5884      420       10010
10     1      -61       10510     -65       9349      540       14180
10     1.25   -334      15210     -167      13620     -1370     15290
15     0.25   6         980       15        995       74.36     1280
15     0.5    33        2110      75        2167      -340      3180
15     0.75   91        3722      154       3624      -320      5340
15     1      135       7033      172       5507      -280      7720
15     1.25   -99       10730     138       7880      -890      12040

Table 2. Part II: Gamma (s = 0.55, λ = 1, Eτ = 0.55). BIAS and MSE are multiplied by 10^4.

              H3n(t), k = 5       H̃(t, k, l), k by plot
Size   T      BIAS      MSE       BIAS      MSE
20     0.25   -5        667       -37.46    990
20     0.5    -68       1468      -190      2250
20     0.75   -22       2672      300       4670
20     1      22        5201      -370      6850
20     1.25   -208      8142      -610      9630
25     0.25   33        573       -84.07    750
25     0.5    0         1226      -110      2020
25     0.75   38        2218      -370      2970
25     1      88        4555      -300      5250
25     1.25   -131      7273      -95.75    6970
30     0.25   5         497       -21.64    630
30     0.5    -27       1032      170       1770
30     0.75   -6        1845      -53.57    3030
30     1      42        3926      -740      3850
30     1.25   -170      6500      290       8040
100    0.25   n.a.      n.a.      -46.02    210
100    0.5    n.a.      n.a.      28.48     550
100    0.75   n.a.      n.a.      -81.31    890
100    1      n.a.      n.a.      -260      1360
100    1.25   n.a.      n.a.      63.47     1830

Table 3. Part I: Weibull (s = 3, Eτ = 0.89). BIAS and MSE are multiplied by 10^4.

              H3n(t), k = 5       H3n(t), k = 10      H̃(t, k, l), k by plot
Size   T      BIAS      MSE       BIAS      MSE       BIAS      MSE
10     0.25   -12       15        -12       15        -28.26    12.02
10     0.5    9         98        9         98        -49.27    110
10     0.75   16        238       16        238       -59.93    270
10     1      -8        318       -8        318       -1.445    380
10     1.25   -35       307       -35       307       -1.023    420
15     0.25   -17       9         -17       9         15.66     10.83
15     0.5    -13       71        -13       71        -31.78    73.78
15     0.75   36        177       36        177       41.87     160
15     1      9         211       9         211       76.49     250
15     1.25   -20       219       -20       219       86.14     300

Table 4. Part II: Weibull (s = 3, Eτ = 0.89). BIAS and MSE are multiplied by 10^4.

              H3n(t), k = 5       H̃(t, k, l), k by plot
Size   T      BIAS      MSE       BIAS      MSE
20     0.25   -12       7         10.67     7.862
20     0.5    1         53        -21.99    61.22
20     0.75   43        130       24.9      130
20     1      34        174       -20.74    190
20     1.25   35        171       -55.25    230
25     0.25   -13       6         -3.505    5.819
25     0.5    -12       38        25.38     47.21
25     0.75   1         104       -4.171    120
25     1      -12       130       59.9      170
25     1.25   13        125       31.88     210
30     0.25   -14       5         -1.642    4.715
30     0.5    -39       31        51.86     32.65
30     0.75   -35       90        -2.707    90.16
30     1      -46       114       63.76     140
30     1.25   -47       111       10.29     190
100    0.25   n.a.      n.a.      -1.908    1.522
100    0.5    n.a.      n.a.      19.59     11.6
100    0.75   n.a.      n.a.      -28.98    30.78
100    1      n.a.      n.a.      -8.576    45.14
100    1.25   n.a.      n.a.      -29.28    61.91

4. CONCLUSIONS AND DISCUSSION

In this paper we have developed and investigated a nonparametric histogram-type estimate of the rf which does not require any knowledge about the form of the inter-arrival time distribution. Due to the limited number of empirical data the histogram-type estimate (as well as Frees' estimate (2)) can be applied for closed time intervals $[0, t]$ with a relatively small $t$. Compared to Frees' estimate $F_l^{(n)}(t)$ of the arrival-time distribution $F^{*n}(t)$ in (2), the estimate (4) proposed here uses a simpler and rougher estimate of $F^{*n}(t)$. The rf $H(t)$ is approximated by a finite sum of estimates of the arrival-time distributions with $k$ terms. The parameter $k$ is selected to compensate the error of the risk function. The estimate (4) may be computed for sufficiently large $l$ and $k$ which is not realistic for (2). The Theorems 1 and 3 state both for heavy- and light-tailed inter-arrival time distributions (itds) those values of the parameter $k$ as functions of the sample size which provide a.s. the uniform convergence of the histogram-type estimate (4) to the true rf for sufficiently small $t$. It is proved that a smaller value of $k$ ($k < l$) than in [10] is sufficient to get a reliable estimate of the rf. In Theorems 2 and 4 the rates of the uniform convergence and confidence intervals of the rf for the classes of itds with the exponential and regularly varying tails are presented. But these Theorems determine $k$ only up to a rough asymptotic equivalence. Such a value $k$ does not depend on the empirical data. This feature may influence the accuracy of the estimation. To estimate $k$ by samples of a moderate size, the plot method is used. It has been shown that the histogram-type estimate (4) has a larger bias but a smaller mean squared error compared to Frees' estimate (2). In conclusion, a reliable computationally tractable histogram-type estimate of the renewal function has been investigated that is also applicable to heavy-tailed inter-arrival-time distributions.

Acknowledgements

The author is grateful to Prof. Marc Burger and Prof. Paul Embrechts for their kind assistance and invitation to ETH.

APPENDIX 1

Proof of Theorem 1: By (5) we get for $0 \le t \le t_{\max}(k)$:
\[ \sup_t | H(t) - \widetilde{H}(t, k, l) | \le \sup_t \sum_{n=k+1}^{\infty} \mathbb{P}\{t_n < t\} + k \max_{1 \le n \le k} \sup_t | \mathbb{P}\{t_n < t\} - F_{l_n}(t) | \]

Under the conditions of the Theorem it follows by well-known results that IP



tn − nµ √ 0, then at t ∈ [0, tmax (k)], e k, l)| ≤ c + η sup |H(t) − H(t, t

follows. Hence:   e IP sup |H(t) − H(t, k, l)| > c + η



IP k max sup |IP {tn < t} − Fln (t)| > η


η ≤ 2 exp −2ln η 2 , t

which is satisfied for sufficiently large $l$. Then it follows
\[ \mathbb{P}\Bigl\{ \sup_t | H(t) - \widetilde{H}(t, k, l) | > c + \eta \Bigr\} < 2 \exp\Bigl( -2 \frac{l}{k^3} \eta^2 \Bigr) = P(\eta, l, k). \]
Since $k = o(l^{\rho})$ holds, $0 < \rho < 1/3$, the series $\sum_{l=1}^{\infty} P(\eta, l, k)$ converges at least for one $\eta > 0$, and according to the Borel-Cantelli lemma the assertion of the Theorem follows.

Proof of Theorem 2: Using (8) we have for $t \in [0, 1]$:
\[ \sup_t | H(t) - \widetilde{H}(t, k, l) | \le (1 - \exp(-\nu))^{k+1} \exp(\nu) + k \max_{1 \le n \le k} \sup_t | \mathbb{P}\{t_n < t\} - F_{l_n}(t) | \]

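and for $\alpha > 0$:
\[ l^{\alpha} \sup_t | H(t) - \widetilde{H}(t, k, l) | \le l^{\alpha} (1 - \exp(-\nu))^{k+1} \exp(\nu) + l^{\alpha} k \max_{1 \le n \le k} \sup_t | \mathbb{P}\{t_n < t\} - F_{l_n}(t) |. \]
Since $\rho < 1/3 - (2/3)\alpha$, $0 < \alpha < 0.5$, $\rho > 0$, for sufficiently large $l$ and the corresponding $c(\nu)$ we get
\[ l^{\alpha} (1 - \exp(-\nu))^{k+1} \exp(\nu) \le 1, \]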


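therefore, if
\[ l^{\alpha} k \max_{1 \le n \le k} \sup_t | \mathbb{P}\{t_n < t\} - F_{l_n}(t) | \le \eta \]
for any constant $\eta > 0$, then it follows
\[ l^{\alpha} \sup_t | H(t) - \widetilde{H}(t, k, l) | \le 1 + \eta. \]
Hence,
\[ \mathbb{P}\Bigl\{ l^{\alpha} \sup_t | H(t) - \widetilde{H}(t, k, l) | > 1 + \eta \Bigr\} < \mathbb{P}\Bigl\{ l^{\alpha} k \max_{1 \le n \le k} \sup_t | \mathbb{P}\{t_n < t\} - F_{l_n}(t) | > \eta \Bigr\}. \]
Using (16) we have:
\[ \mathbb{P}\Bigl\{ l^{\alpha} \sup_t | H(t) - \widetilde{H}(t, k, l) | > 1 + \eta \Bigr\} < 2 \exp\Bigl( -2 \frac{l}{k^3} \Bigl( \frac{\eta}{l^{\alpha}} \Bigr)^2 \Bigr) = P(\eta, l, k) \tag{17} \]
Since $k = c(\nu) \cdot l^{\rho}$ and $\alpha + 1.5\rho < 0.5$ holds, the series $\sum_{l=1}^{\infty} P(\eta, l, k)$ converges at least for one $\eta > 0$, and according to the Borel-Cantelli lemma the assertion of the Theorem holds.

Proof of Corollary 1: Let the right-hand side of (17) be equal to $0 < \chi < 1$:
\[ 2 \exp\Bigl( -2 \frac{l}{k^3} \Bigl( \frac{\eta}{l^{\alpha}} \Bigr)^2 \Bigr) = \chi. \]
Hence, we have
\[ \eta = k l^{\alpha} \sqrt{ -\frac{k}{2l} \ln(\chi/2) }. \]
Then one gets the level of the confidence interval $D = (1 + \eta) l^{-\alpha}$.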

Proof of Theorem 3: Since for t ∈ (0, hckk ) expression (10) is valid for sufficiently large n, ∞ X

IP{tn < t}

n=k+1



∞ X

Φ

n=k+1



t √ n



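holds. The expansion at the right-hand side converges. Therefore, $\sum_{n=k+1}^{\infty} \mathbb{P}\{t_n < t\} < c$ follows, where $c$ is a constant. The rest of the proof is similar to the proof of Theorem 1.

Proof of Theorem 4: For $t \in [a, t_{\max}(k)]$, $a > 0$ we have from (11)
\[ \sup_t \frac{(F(t))^{k+1}}{1 - F(t)} = \sup_t \frac{(1 - L(t) t^{-\alpha})^{k+1}}{L(t) t^{-\alpha}} = \sup_t \frac{\Bigl( 1 - t^{-\alpha} c(t) \exp\bigl\{ \int_{x_0}^{t} \frac{\varepsilon(y)}{y} \, dy \bigr\} \Bigr)^{k+1}}{t^{-\alpha} c(t) \exp\bigl\{ \int_{x_0}^{t} \frac{\varepsilon(y)}{y} \, dy \bigr\}}. \]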


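The mean value theorem implies
\[ \exp\Bigl\{ \int_{x_0}^{t} \frac{\varepsilon(y)}{y} \, dy \Bigr\} = \exp\Bigl\{ \varepsilon(\theta) \ln \frac{t}{x_0} \Bigr\} = \Bigl( \frac{t}{x_0} \Bigr)^{\varepsilon(\theta)}, \]
for some $\theta \in [x_0, t]$. Hence,
\[ \sup_t \frac{(F(t))^{k+1}}{1 - F(t)} = \sup_t \frac{\bigl( 1 - c(t) t^{-\alpha + \varepsilon(\theta)} x_0^{-\varepsilon(\theta)} \bigr)^{k+1}}{c(t) t^{-\alpha + \varepsilon(\theta)} x_0^{-\varepsilon(\theta)}}. \]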

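Then by $-\alpha + \varepsilon(\theta) < 0$ we have
\[ \sup_t \frac{(F(t))^{k+1}}{1 - F(t)} = \frac{\bigl( 1 - c_{\inf} (t_{\max}(k))^{-\alpha + \varepsilon(\theta)} x_0^{-\varepsilon(\theta)} \bigr)^{k+1}}{c_{\inf} (t_{\max}(k))^{-\alpha + \varepsilon(\theta)} x_0^{-\varepsilon(\theta)}}, \tag{18} \]
where
\[ c_{\inf} = \inf_t c(t) = \begin{cases} c(t_{\max}(k)), & \text{if } c(t) \text{ is a monotone decreasing function} \\ c(a), & \text{if } c(t) \text{ is a monotone increasing function.} \end{cases} \]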

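Since $c(t_{\max}(k)) > c_0$, then $c_{\inf} \ge \min(c_0, c(a)) = c^*$ and the right-hand side of (18) is less than
\[ \frac{\bigl( 1 - c^* (t_{\max}(k))^{-\alpha + \varepsilon(\theta)} x_0^{-\varepsilon(\theta)} \bigr)^{k+1}}{c^* (t_{\max}(k))^{-\alpha + \varepsilon(\theta)} x_0^{-\varepsilon(\theta)}}. \tag{19} \]
Assume that
\[ \max_i \tau_i \le l^{\eta} \tag{20} \]
holds, where $\eta > 1/(\alpha - \varepsilon^*)$. Then
\[ t_{\max}(k) \le l \max_{i=1,\ldots,l} \tau_i < l^{1+\eta}. \]
It implies that (19) is less than or equal to
\[ \frac{\bigl( 1 - c^* l^{(1+\eta)(-\alpha + \varepsilon(\theta))} x_0^{-\varepsilon(\theta)} \bigr)^{k+1}}{c^* l^{(1+\eta)(-\alpha + \varepsilon(\theta))} x_0^{-\varepsilon(\theta)}}. \]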

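Then from (18) we have
\[ \sup_t l^{\beta} \frac{(F(t))^{k+1}}{1 - F(t)} \le l^{\beta} \frac{\bigl( 1 - c^* l^{(1+\eta)(-\alpha + \varepsilon(\theta))} x_0^{-\varepsilon(\theta)} \bigr)^{k+1}}{c^* l^{(1+\eta)(-\alpha + \varepsilon(\theta))} x_0^{-\varepsilon(\theta)}}. \]
Since $(1 + \eta)(-\alpha + \varepsilon(\theta)) < 0$, for sufficiently large $l$ the right-hand side of the latter inequality is less than or equal to 1 for $k = -A \cdot l^{\rho}$ and $\rho \ge 0$. Note that for $\beta > 0$ and sufficiently large $l$, $A < 0$ holds. Therefore, if
\[ l^{\beta} k \max_{1 \le n \le k} \sup_t | \mathbb{P}\{t_n < t\} - F_{l_n}(t) | \le \chi \]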


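for any constant $\chi > 0$, it follows from (5), (6)
\[ l^{\beta} \sup_t \| H(t) - \widetilde{H}(t, k, l) \| \le 1 + \chi. \]
Hence, from (20)
\[ \mathbb{P}\Bigl\{ l^{\beta} \sup_t \| H(t) - \widetilde{H}(t, k, l) \| > 1 + \chi \Bigr\} < \mathbb{P}\Bigl\{ l^{\beta} k \max_{1 \le n \le k} \sup_t | \mathbb{P}\{t_n < t\} - F_{l_n}(t) | > \chi \Bigr\} + \mathbb{P}\Bigl\{ \max_i \tau_i > l^{\eta} \Bigr\} \]
holds. By the global property of the regularly varying rvs (see [7], p.38, [18]) we have
\[ \mathbb{P}\{ \max_i \tau_i > x \} \sim l \, \mathbb{P}\{ \tau_i > x \} = l x^{-\alpha} L(x) \quad \text{as } x \to \infty. \]
The following property of slowly varying functions is often used: for all $\varepsilon^* > 0$ there exists $T$ such that for $x > T$
\[ x^{-\varepsilon^*} \le L(x) \le x^{\varepsilon^*} \]
holds. Let $\varepsilon^* < \alpha$. Hence,
\[ \mathbb{P}\{ \max_i \tau_i > l^{\eta} \} \sim l^{1 + \eta(\varepsilon^* - \alpha)} \quad \text{as } l \to \infty. \]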

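Using (16) we finally have
\[ \mathbb{P}\Bigl\{ l^{\beta} \sup_t \| H(t) - \widetilde{H}(t, k, l) \| > 1 + \chi \Bigr\} < l^{1 + \eta(\varepsilon^* - \alpha)} + 2 \exp\bigl( -2 \chi^2 l^{1 - 2\beta} / k^3 \bigr) = P(\eta, l, k). \tag{21} \]
Since $k = d l^{\rho}$ and $0 < \rho < 1/3 - (2/3)\beta$, $\eta > 1/(\alpha - \varepsilon^*)$ hold, the series $\sum_{l=1}^{\infty} P(\eta, l, k)$ converges at least for one $\eta > 0$, and according to the Borel-Cantelli lemma the assertion of the Theorem holds.

Proof of Corollary 2: Let the right-hand side of (21) be equal to $0 < \nu < 1$:
\[ l^{1 - \eta(\alpha - \varepsilon^*)} + 2 \exp\bigl( -2 \chi^2 l^{1 - 2\beta} / k^3 \bigr) = \nu. \]
Hence, we have
\[ \chi = k l^{\beta} \sqrt{ -\frac{k}{2l} \ln\Bigl( \frac{\nu - l^{1 - \eta(\alpha - \varepsilon^*)}}{2} \Bigr) }. \]
Then one gets the level of the confidence interval $D = (1 + \chi) l^{-\beta}$.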

REFERENCES

[1] Asmussen, S., 1996. Renewal Theory and Queueing Algorithms for Matrix-Exponential Distributions, in: Matrix-Analytic Methods in Stochastic Models, S.R. Chakravarthy and A.A. Alfa, eds., New York, 313–341.
[2] Baxter, L.A., McConalogue, D.J., Scheuer, E.M., Blischke, W.R., 1982. On the Tabulation of the Renewal Function. Technometrics 24, 2, 640–648.
[3] Chaudhry, M.L., 1995. On Computations of the Mean and Variance of the Number of Renewals: a Unified Approach. Journal of the Operational Research Society 46, 1352–1364.
[4] Chistyakov, V.P., 1964. A theorem on sums of independent positive random variables and its applications to branching random processes. Theory Probab. Appl. 9, 640–648.
[5] Deligönül, Z.S., 1985. An approximate solution of the integral equation of renewal theory. J. Appl. Prob. 22, 926–931.
[6] Devroye, L., Györfi, L., 1985. Nonparametric Density Estimation, the L1 View. John Wiley, New York.
[7] Embrechts, P., Klüppelberg, C., Mikosch, T., 1997. Modelling Extremal Events. Springer, Berlin.
[8] Feller, W., 1941. On the integral equation of renewal theory. Ann. Math. Statist. 12, 243–267.
[9] Frees, E.W., 1986. Warranty Analysis and Renewal Function Estimation. Nav. Res. Logist. Quart. 33, 361–372.
[10] Frees, E.W., 1986. Nonparametric renewal function estimation. Ann. Statist. 14, 1366–1378.
[11] Gnedenko, B.W., Kowalenko, I.N., 1971. Einführung in die Bedienungstheorie. Oldenbourg Verlag, München.
[12] Goldie, C.M., Klüppelberg, C., 1998. Subexponential distributions, in: A Practical Guide to Heavy Tails: Statistical Techniques for Analysing Heavy Tailed Distributions, R. Adler, R. Feldman and M.S. Taqqu, eds., Birkhäuser, Boston, 435–459.
[13] Grübel, R., Pitts, S.M., 1993. Nonparametric estimation in renewal theory 1: the empirical renewal function. Ann. Statist. 21, 3, 1431–1451.
[14] Hall, P., 1990. Using the Bootstrap to Estimate Mean Squared Error and Select Smoothing Parameter in Nonparametric Problems. Journal of Multivariate Analysis 32, 177–203.
[15] Markovitch, N.M., Krieger, U.R., 1999. Estimating Basic Characteristics of Arrival Processes in Advanced Packet-Switched Networks by Empirical Data, in: Proceedings of First IEEE/Popov Workshop on Internet Technologies and Services, October 25-28, Moscow, Russia, 70–78.
[16] McConalogue, D.J., 1981. Numerical treatment of convolution integrals involving distributions with densities having singularities at the origin. Comm. in Statistics, Series B10, 265–280.
[17] Mikosch, T., Nagaev, A.V., 1998. Large deviations for heavy-tailed sums with applications to insurance. Extremes 1, 81–110.
[18] Mikosch, T., 1999. Regular Variation, Subexponentiality and Their Applications in Probability Theory. Technical Report 99-013, ISSN: 1389-2355, University of Groningen.
[19] Nabe, M., Murata, M., Miyahara, H., 1998. Analysis and modelling of World Wide Web traffic for capacity dimensioning of Internet access lines. Performance Evaluation 34, 249–271.
[20] Petrov, V.V., 1975. Sums of Independent Random Variables. Springer, New York.
[21] Prakasa Rao, B.L.S., 1983. Nonparametric Functional Estimation. Academic, Orlando, Fla.
[22] Schneider, H., Lin, B.-S., O'Cinneide, C., 1990. Comparison of Nonparametric Estimators for the Renewal Function. Appl. Statist. 39, 1, 55–61.
[23] Xie, M., 1989. On the solution of renewal-type integral equations. Commun. Statist.-Simula. 18, 1, 281–293.