The Canadian Journal of Statistics / La revue canadienne de statistique
Vol. 28, No. ?, 2000, Pages ???-???

Nonparametric regression for threshold data

Ursula U. MÜLLER

Key words and phrases: binary data, kernel regression, optimal bandwidth, stochastic resonance.
Mathematics subject classification codes (1991): primary 62G07; secondary 62G20.

ABSTRACT

Consider a detector which records the times at which the endogenous variable of a nonparametric regression model exceeds a certain threshold. If the error distribution is known, the regression function can still be identified from these threshold data. The author constructs estimators for the regression function that are transformations of kernel estimators. She determines the bandwidth that minimizes the asymptotic mean average squared error. Her investigation was motivated by recent work on stochastic resonance in neuro-science and signal detection theory, where it was observed that detection of a subthreshold signal is enhanced by the addition of noise. The author compares her model with several others that have been proposed in the recent past.

RÉSUMÉ

Supposons qu'un appareil enregistre les temps auxquels la variable endogène d'un modèle de régression non paramétrique dépasse un certain seuil. Si la loi des erreurs est connue, la fonction de régression peut quand même être identifiée à partir de telles observations. L'auteure propose des estimateurs pour cette fonction de régression qu'elle obtient par transformation d'estimateurs à noyau. Elle détermine la taille de fenêtre qui minimise une erreur quadratique moyenne expérimentale asymptotique. Ses recherches ont été motivées par des travaux récents portant sur la résonance stochastique en théorie de la détection du signal et en neuro-science, domaines où l'on a observé que la présence d'un bruit facilite la détection d'un signal inférieur à un seuil donné. L'auteure compare son modèle à plusieurs autres qui ont été proposés ces dernières années.

1. INTRODUCTION

In a system with a threshold, a subthreshold signal may be detected if noise, either from the background or artificially generated, is added to the input. If the noise is too low, it does not help much. If it is too high, it drowns out the signal. It is plausible, and has been observed both empirically and through simulations, that there is an optimal level of noise. This property of the system is known as stochastic resonance, although this name is not really appropriate unless the signal is periodic.

The term stochastic resonance was introduced by Benzi et al. (1981) in the context of a model describing the periodic recurrence of ice ages. Climate changes are modeled as transitions in a double-well potential system pushed by a signal, the orbital eccentricity, which causes small variations of the solar energy influx. Since the periodic forcing for switching from one state to the other is very weak, it must be assisted by other factors, such as short-term climate fluctuations, that are modeled as noise. However, if there is too much noise, the transitions become independent of the frequency of the periodic signal. Consequently there must be an optimal noise level, i.e., stochastic resonance. Since then, stochastic resonance has been extended to a large variety of physical systems with simpler thresholds (for an overview, see Gammaitoni et al. 1998).

It has attracted particular attention in neuro-science in the past few years. Various models for information processing in the nervous system have been proposed that are explained by stochastic resonance. Here, theory turns away from bistable systems, which do not describe neuronal dynamics well. Moreover, neuronal models with an aperiodic input signal are often more appropriate than models limited to periodic inputs and have gained considerable attention (Collins et al. 1996). Perhaps the simplest model for neural dynamics regards a single neuron as a threshold crossing detector, as follows: the cell is stimulated by an external input (signal); if its membrane voltage exceeds a fixed threshold, the cell fires and is reset. It can be assumed that the sensory system is optimized, so that the spike train of a firing cell contains significant information. An appropriate amount of noise, e.g., background noise from other neurons, then leads to maximal performance, i.e., stochastic resonance (Wiesenfeld & Moss 1995). Stochastic resonance has not only been studied theoretically but has also been exhibited in several experiments. For example, Douglass et al. (1993) demonstrated stochastic resonance in the firing rate of sinusoidally stimulated mechanoreceptor cells of crayfish.

Although a large amount of literature is available, there is little statistical work on this subject. For the models described, stochastic resonance is usually exhibited through simulations. Various measures of detectability are used. In the case of periodic signals, this is typically the signal-to-noise ratio (e.g., Wiesenfeld & Moss 1995). If the signal is aperiodic, usually a correlation measure is considered (e.g., Collins et al. 1995). A more familiar approach from a statistical point of view was considered by Stemmler (1996) in the case of a (nearly) constant signal. Instead of the commonly adopted measures, which break down for a constant signal, he uses Fisher information. Recently, Greenwood et al. (1999) have derived efficient estimators and analyzed stochastic resonance statistically for single and multiple thresholds.

As in Stemmler (1996) and Greenwood et al. (1999), the model considered here is embedded in a neuro-physiological context. We consider the simple neuron detector described above: if the incoming noisy signal reaches a certain threshold, the cell fires and is reset. The observations are the times when the threshold is exceeded. For the quantification of neuronal responses, it is standard practice to sum all spikes in a fixed period to obtain an estimate of the firing rate.
Clearly there is additional information in the timing of the spikes, especially when the shape of the signal is unknown. This information will be used in our statistical approach based on kernel regression methods. We will derive a consistent estimator of a subthreshold signal from the exceedance times data. The problem is cast as nonparametric regression.


Consider the nonparametric regression problem Y(ti) = s(ti) + ε(ti) with independent mean-zero error variables ε(ti). Let a > 0 be a threshold, and suppose that we do not observe the realizations Y(ti) but only the times at which the threshold a is exceeded. Then the observations (threshold data) are Bernoulli variables coded by 1 or 0 as follows:
$$X(t_i) = 1\{s(t_i) + \varepsilon(t_i) > a\} = 1\{Y(t_i) > a\}; \qquad (1)$$
this was suggested by McCulloch & Pitts (1943) as a model of a neuron. We can always estimate the probabilities
$$E\{X(t_i)\} = p(t_i) = P\{X(t_i) = 1\} = P\{s(t_i) + \varepsilon(t_i) > a\}, \qquad (2)$$
say by kernel methods. To identify the signal s(t), the distribution of the ε(ti)'s, say F_{ti}, must be known and invertible. Then
$$s(t_i) = a - F_{t_i}^{-1}\{1 - p(t_i)\}. \qquad (3)$$
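For a concrete sense of (3), suppose (our illustrative numbers, not from the paper) that the errors are standard normal, the threshold is a = 1, and the exceedance probability at some time point is p(ti) = 0.2. Then
$$s(t_i) = a - \Phi^{-1}(1 - 0.2) = 1 - \Phi^{-1}(0.8) \approx 1 - 0.84 = 0.16.$$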

The estimator of s(ti) will be taken to be the kernel estimator for p(ti) transformed as in (3). If the noise is artificially generated, then it is reasonable to assume that the error distribution function is completely known. If the noise is background noise, we may at least know the form of the distribution. For example, it may be plausible to assume that the errors are normally distributed. However, we cannot identify the noise amplitude from the X(ti)'s. One way of dealing with the problem is to get information about the noise from elsewhere, for example by using a second detector with a different threshold; see Greenwood et al. (1999) for a treatment of the constant signal case. A second solution is to note that even if the noise distribution is known only up to a scale parameter, the signal can still be identified up to a one-parameter family of transformations. Hence most of the information is retained, unless the signal is constant, of course.

As in every approach involving kernel estimators, the choice of the bandwidth is of crucial importance. Our criterion for bandwidth selection will be the asymptotic mean average squared error of the estimator of s, which we derive. A formula for the asymptotically optimal bandwidth can then be written. Since the formula involves unknown quantities, we estimate the optimal bandwidth by plugging in estimators for them. An asymptotic approach is reasonable since a linearization of the mean squared error E{ŝ(t) − s(t)}² is necessary, which already involves asymptotics. Because of this approximation, it does not seem worthwhile to pursue a more sophisticated bandwidth selection technique. Our approach is similar to that of Ruppert et al. (1995), who derive an asymptotically optimal bandwidth for the classical setting, i.e., fully observed data. The main difference between this article and theirs is the nonlinear link occurring here, in particular in the mean squared error expression. Through the linearization of this expression, the calculation of the optimal bandwidth becomes similar to the standard case, and familiar results can be used.

This paper is organized as follows. In Section 2 we give the kernel estimator, basic notation and assumptions. Section 3 is the main section of this article: we derive the asymptotic expressions for the mean squared error and the mean average squared error, and the resulting optimal local and global bandwidths. Some remarks and references concerning the suggested plug-in estimation will be given in Section 4. Section 5 concludes the article with an example and a comparison with existing techniques, emphasizing stochastic resonance.

2. KERNEL REGRESSION

Consider the model introduced in Section 1, Equations (1)–(3), and let the time points be equally spaced in [0, 1], i.e., ti = i/n, i = 1, . . . , n. We assume that the distribution functions F_{ti} of the noise are continuous with mean zero. Often, the ε(ti) will be identically distributed, and the distribution will be normal. We also require that s have two continuous derivatives and be bounded from below, viz. s(t) ≥ −c for every t ∈ [0, 1] for some c > 0. In addition, Ft should be four times continuously differentiable, and there should exist p1, p2 ∈ (0, 1) such that for all 0 ≤ t ≤ 1,
$$1 - F_t(a + c) \ge p_1 \qquad \text{and} \qquad F_t(0) \ge p_2.$$
These conditions and the subthreshold assumption s(t) < a imply that p has two continuous derivatives and is bounded away from 0 and 1, viz.
$$p(t) \in [p_1, 1 - p_2] \quad \text{for each } t, \qquad (4)$$

a condition that is easily verified. To simplify the notation, we write Gt(x) = 1 − Ft(a − x), so that p(t) = Gt{s(t)} and s(t) = Gt⁻¹{p(t)}. We treat the problem of estimating the probabilities p(t) as a nonparametric regression problem and estimate p(t) by a modified kernel estimator p̃h(t), where h > 0 denotes the bandwidth. We obtain an estimator for the signal by
$$\hat{s}_h(t) = G_t^{-1}\{\tilde{p}_h(t)\} \qquad \text{with} \qquad \tilde{p}_h(t) = \begin{cases} \hat{p}_h(t) & \text{if } \hat{p}_h(t) \in C = [p_1/2,\, 1 - p_2/2], \\ 1/2 & \text{otherwise.} \end{cases}$$
Here, p̂h(t) is a classical kernel estimator. If the values of p̂h(t) are near 0 and 1, Gt⁻¹{p̂h(t)} is undefined. For simplicity, we set p̃h(t) equal to 1/2 if p̂h(t) ∉ C (or to a suitable different constant). Because of (4), we have p(t) ∈ C. The kernel estimator used here is the Nadaraya-Watson estimator
$$\hat{p}_h(t) = \frac{\sum_{i=1}^{n} \frac{1}{h} K\!\left(\frac{t - t_i}{h}\right) X(t_i)}{\sum_{i=1}^{n} \frac{1}{h} K\!\left(\frac{t - t_i}{h}\right)}.$$
For the estimation at inner points t ∈ [h, 1 − h] considered in this article, let K : IR → IR be some second order kernel function, i.e.,
$$\int K(u)\,du = 1, \qquad \int u K(u)\,du = 0, \qquad \int u^2 K(u)\,du = c \ne 0.$$
We also need that the derivative K' be bounded and assume that the support of K is [−1, 1]. For t ∉ [h, 1 − h], a different approach with some boundary kernel K should be chosen (cf., e.g., Gasser et al. 1985). Instead of the Nadaraya-Watson estimator, one could also consider a local linear kernel estimator (cf. Ruppert et al. 1995), which is known for its superior boundary behaviour but requires more work. The proofs given here can be adapted to this approach in a straightforward way. Later on, we take h = hn → 0 and nh³ → ∞ as n → ∞.
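To fix ideas, the following minimal Python sketch implements the estimator just defined for the case of N(0, σ²) noise. It is not taken from the paper; the function names, the Epanechnikov kernel and the normality assumption are our own illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def p_hat(t, t_grid, x, h):
    """Nadaraya-Watson estimate of p(t) from the 0/1 threshold data x at times t_grid."""
    u = (t - t_grid) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)  # Epanechnikov kernel on [-1, 1]
    return np.sum(w * x) / np.sum(w)

def s_hat(t, t_grid, x, h, a, sigma, p1=0.05, p2=0.05):
    """Signal estimate s_hat(t) = G_t^{-1}(p_tilde(t)) under N(0, sigma^2) noise.

    With normal noise, G_t(x) = 1 - Phi((a - x)/sigma), so G_t^{-1}(p) = a - sigma * Phi^{-1}(1 - p),
    which is formula (3) with F^{-1}(q) = sigma * Phi^{-1}(q).
    """
    p = p_hat(t, t_grid, x, h)
    # Truncation to C = [p1/2, 1 - p2/2]; outside C we fall back to 1/2, as in the text.
    p_tilde = p if p1 / 2 <= p <= 1 - p2 / 2 else 0.5
    return a - sigma * norm.ppf(1.0 - p_tilde)
```

For points t outside [h, 1 − h] one would switch to a boundary kernel, which this sketch does not attempt.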

3. ASYMPTOTIC ERROR AND OPTIMAL BANDWIDTH

In nonparametric regression theory, generally accepted measures of the quality of the estimation are the mean squared error E{ŝh(t) − s(t)}² and the mean average squared error n⁻¹ Σ_{ti∈T} E{ŝh(ti) − s(ti)}². In the latter case, summation is usually restricted to some interval T = [c, d] ⊂ (0, 1) because of boundary effects. These quantities will also be studied here and taken as criteria for an asymptotically optimal local and an asymptotically optimal global bandwidth. Another popular global error measure would be the mean integrated squared error ∫_T E{ŝh(t) − s(t)}² dt. The proofs given herein adapt to this case in a straightforward way.

The approach of this section will be to derive a Taylor approximation for the mean squared error (which immediately gives the approximation for the mean average squared error). The bandwidth h that minimizes the leading terms of the expansion will then be called optimal. The distinguishing characteristic of our model is that it involves the nonlinear transformation ŝh(t) = Gt⁻¹{p̃h(t)}. An additional problem arising from the modification p̃h(t) of the kernel estimator p̂h(t) will be seen to be negligible, because it can easily be verified that p̃h(t) coincides asymptotically with the common estimator p̂h(t).

The following lemmas recall and extend well known auxiliary results from classical theory. They provide approximation formulas for the bias, variance and higher moments of p̂h(t). The derivations are standard and can be found either in Eubank (1988) or Müller (1999). In the following, let h be sufficiently small so that the time points t where estimation takes place satisfy t ∈ [h, 1 − h]. The terminology and conditions introduced in Section 2 will be assumed throughout.

Lemma 1. Consider the asymptotics n → ∞, h = hn → 0, and nh² → ∞. For ℓ = 2, 3, the approximation
$$E[\hat{p}_h(t) - E\{\hat{p}_h(t)\}]^\ell = (nh)^{1-\ell}\, E\{X(t) - p(t)\}^\ell \int_{-1}^{1} K^\ell(u)\,du + o\{(nh)^{1-\ell}\}$$
then holds uniformly in t ∈ [h, 1 − h], where X(t) is a Bernoulli random variable with parameter p(t). Furthermore,
$$E[\hat{p}_h(t) - E\{\hat{p}_h(t)\}]^4 = O\{(nh)^{-2}\} = o\{(nh)^{-1}\},$$
uniformly in t ∈ [h, 1 − h].

Lemma 2. For n → ∞, h = hn → 0 and nh² → ∞, the bias and the mean squared error of p̂h(t) can be approximated uniformly in t ∈ [h, 1 − h] by
$$E\{\hat{p}_h(t) - p(t)\} = h^2 p''(t)\,\mu_2(K)/2 + o(h^2) + O\{(nh^2)^{-1}\}$$
and
$$E\{\hat{p}_h(t) - p(t)\}^2 = \operatorname{Var}\{\hat{p}_h(t)\} + [E\{\hat{p}_h(t)\} - p(t)]^2 = (nh)^{-1} p(t)\{1 - p(t)\} R(K) + h^4 p''(t)^2 \mu_2(K)^2/4 + o\{(nh)^{-1}\} + o(h^4) + O\{(nh^2)^{-2}\} \qquad (5)$$
with µ2(K) = ∫_{−1}^{1} u²K(u) du and R(K) = ∫_{−1}^{1} K²(u) du. Furthermore,
$$E\{\hat{p}_h(t) - p(t)\}^3 = o\{(nh)^{-1}\} + O(h^6) + O\{(nh^2)^{-3}\}, \qquad (6)$$
$$E\{\hat{p}_h(t) - p(t)\}^4 = o\{(nh)^{-1}\} + O(h^8) + O\{(nh^2)^{-4}\}, \qquad (7)$$

uniformly in t ∈ [h, 1 − h].

We now state our main theorem. It concerns the signal estimator ŝ, a function of the modification p̃h of the estimator p̂h, whose properties are described in Lemmas 1 and 2. We give the asymptotic mean squared error (locally), the asymptotic mean average squared error (globally) and the respective optimal bandwidths.

Theorem. Consider the asymptotics n → ∞, h = hn → 0, and nh³ → ∞. The mean squared error MSE(h, t) = E{ŝh(t) − s(t)}² exists and may be approximated as follows by the asymptotic mean squared error, viz.
$$AMSE(h, t) = \frac{(nh)^{-1} p(t)\{1 - p(t)\} R(K) + h^4 p''(t)^2 \mu_2(K)^2/4}{G_t'\{s(t)\}^2}, \qquad (8)$$
where R(K) = ∫_{−1}^{1} K²(u) du and µ2(K) = ∫_{−1}^{1} u²K(u) du. This approximation is valid up to a term of order o{(nh)⁻¹ + h⁴} and uniformly in t ∈ [h, 1 − h]. For the mean average squared error MASE(h) = n⁻¹ Σ_{ti∈T} E{ŝh(ti) − s(ti)}², the approximation
$$AMASE(h) = \frac{R(K)}{n^2 h} \sum_{t_i \in T} \frac{p(t_i)\{1 - p(t_i)\}}{G_{t_i}'\{s(t_i)\}^2} + \frac{h^4 \mu_2(K)^2}{4n} \sum_{t_i \in T} \frac{p''(t_i)^2}{G_{t_i}'\{s(t_i)\}^2}$$
holds up to a term of order o{(nh)⁻¹ + h⁴}. The asymptotically optimal local bandwidth is
$$h_{\mathrm{opt}}(t) = \left[\frac{R(K)\, p(t)\{1 - p(t)\}}{n\, \mu_2(K)^2\, p''(t)^2}\right]^{1/5} \qquad (9)$$
and the asymptotically optimal global bandwidth is
$$h_{\mathrm{opt}} = \left[\frac{R(K) \sum_{t_i \in T} G_{t_i}'\{s(t_i)\}^{-2}\, p(t_i)\{1 - p(t_i)\}}{n\, \mu_2(K)^2 \sum_{t_i \in T} G_{t_i}'\{s(t_i)\}^{-2}\, p''(t_i)^2}\right]^{1/5}. \qquad (10)$$

Proof. Let t ∈ [h, 1 − h] and ŝh(t) = Gt⁻¹{p̃h(t)} as in Section 2. Consider the loss function Ht : (0, 1) → IR defined by
$$H_t(x) = [G_t^{-1}(x) - G_t^{-1}\{p(t)\}]^2.$$
Applying the so-called delta method, we obtain a linearization of the mean squared error,
$$E\{\hat{s}_h(t) - s(t)\}^2 = E[H_t\{\tilde{p}_h(t)\}] = \frac{1}{2} H_t^{(2)}\{p(t)\}\, E\{\tilde{p}_h(t) - p(t)\}^2 + \frac{1}{3!} H_t^{(3)}\{p(t)\}\, E\{\tilde{p}_h(t) - p(t)\}^3 + E[R_t\{\tilde{p}_h(t)\}] \qquad (11)$$
with Ht⁽²⁾{p(t)}/2 = 1/Gt'{s(t)}² and
$$R_t\{\tilde{p}_h(t)\} = \{\tilde{p}_h(t) - p(t)\}^4 \int_0^1 \frac{(1-z)^3}{3!}\, H_t^{(4)}[p(t) + z\{\tilde{p}_h(t) - p(t)\}]\,dz.$$
In order to establish the asserted approximation, we use the auxiliary results
$$E\{\tilde{p}_h(t) - p(t)\}^\ell - E\{\hat{p}_h(t) - p(t)\}^\ell = o\{(nh)^{-1}\} + o(h^4), \quad \ell \in \mathbb{N}, \qquad (12)$$
$$E[R_t\{\tilde{p}_h(t)\}] = o\{(nh)^{-1}\} + o(h^4), \qquad (13)$$
which hold uniformly in t ∈ [h, 1 − h], as will be verified at the end of the proof. Both relations, combined with (6) and nh³ → ∞, imply that the last two terms of (11) are of order o{(nh)⁻¹} + o(h⁴). Similarly, the mean squared error of p̃h(t) in the first term can be replaced by the familiar approximation (5) for the mean squared error of p̂h(t). This establishes (8), uniformly for t ∈ [h, 1 − h]. Since T ⊂ [h, 1 − h] for sufficiently small h, the approximation formula AMASE(h) for the mean average squared error MASE(h) = n⁻¹ Σ_{ti∈T} MSE(h, ti) is immediately derived from this result. The optimal local and global bandwidths given in (9) and (10) are obtained by simple calculus, differentiating AMSE(h, t) and AMASE(h) with respect to h. Hence only the auxiliary statements (12) and (13) remain to be shown.

For the proof of (12), it should first be noticed that C = [p1/2, 1 − p2/2] ⊂ (0, 1) was chosen such that p(t) ∈ C. By equation (4), we have p(t) ∈ [p1, 1 − p2] ⊂ C. Hence there exists some δ > 0 such that [p(t) − δ, p(t) + δ] ⊂ C for all t ∈ [0, 1]. This will be used in the following chain of equalities and inequalities, which holds for an arbitrary nonnegative integer ℓ:
$$\begin{aligned}
\bigl|E\{\tilde{p}_h(t) - p(t)\}^\ell - E\{\hat{p}_h(t) - p(t)\}^\ell\bigr|
&= \Bigl|\sum_{k=1}^{\ell} \binom{\ell}{k} E[\{\tilde{p}_h(t) - \hat{p}_h(t)\}^k \{\hat{p}_h(t) - p(t)\}^{\ell-k}]\Bigr| \\
&\le \sum_{k=1}^{\ell} \binom{\ell}{k} E[1_{[0,1]\setminus C}\{\hat{p}_h(t)\}\, |\tilde{p}_h(t) - \hat{p}_h(t)|^k\, |\hat{p}_h(t) - p(t)|^{\ell-k}] \\
&\le \sum_{k=1}^{\ell} \binom{\ell}{k} E[1_{[0,1]\setminus C}\{\hat{p}_h(t)\}] \\
&= (2^\ell - 1)\, P\{\hat{p}_h(t) \notin C\} \\
&\le (2^\ell - 1)\, P\{|\hat{p}_h(t) - p(t)| > \delta\} \\
&\le (2^\ell - 1)\, E\{\hat{p}_h(t) - p(t)\}^4/\delta^4.
\end{aligned}$$
In the last step, Markov's inequality was applied. Relation (12) now follows from (7) and the fact that nh³ → ∞.

For the proof of (13), we will use the boundedness of Ht⁽⁴⁾ on C, which results from the fact that Ft is four times continuously differentiable. Thus
$$|E[R_t\{\tilde{p}_h(t)\}]| = \Bigl|E\Bigl[\{\tilde{p}_h(t) - p(t)\}^4 \int_0^1 \frac{(1-z)^3}{3!}\, H_t^{(4)}[p(t) + z\{\tilde{p}_h(t) - p(t)\}]\,dz\Bigr]\Bigr| \le E\{\tilde{p}_h(t) - p(t)\}^4 \cdot \sup_{x \in C} |H_t^{(4)}(x)|.$$
Using (12) with ℓ = 4 and E{p̂h(t) − p(t)}⁴ = o{(nh)⁻¹} + o(h⁴), one then gets (13) immediately. Hence the proof is complete.

Let us briefly discuss the approximate mean squared error calculated in (8). Its numerator is the Taylor approximation of the mean squared error E{p̂h(t) − p(t)}² of the kernel estimator p̂h(t). This well known formula yields the decomposition of E{p̂h(t) − p(t)}² into variance and squared bias of p̂h(t) (Equation (5), Lemma 2). In particular, the characteristic variance-bias trade-off becomes evident. With the optimal asymptotic bandwidth hopt at hand, the minimal value of AMASE may now be written as
$$AMASE(h_{\mathrm{opt}}) = \frac{5}{4}\left[\frac{\mu_2(K)^2}{n} \sum_{t_i \in T} \frac{p''(t_i)^2}{G_{t_i}'\{s(t_i)\}^2}\right]^{1/5} \left[\frac{R(K)}{n^2} \sum_{t_i \in T} \frac{p(t_i)\{1 - p(t_i)\}}{G_{t_i}'\{s(t_i)\}^2}\right]^{4/5}.$$
This value depends on the squared second derivatives of p(t) = Gt{s(t)}. Smooth signals will, in general, lead to small values of p''(t)² and thus to small values of inf_{h>0} AMASE(h). Furthermore, this formula shows the influence of the kernel K, which appears only in the expression µ2(K)^{2/5} R(K)^{4/5}. The (second order) kernel with support [−1, 1] which minimizes this term, and thus the asymptotic mean average squared error under the further constraint K ≥ 0, is the Epanechnikov kernel K*(u) = 3(1 − u²)1_{[−1,1]}(u)/4. The optimal kernel K* is not much better than other kernels, for example the Gaussian kernel (cf. Wand & Jones 1995). What is really crucial is the correct choice of the bandwidth h.

4. DATA-DRIVEN BANDWIDTH SELECTION

In this section, we construct an optimal data-driven bandwidth. We assume that the kernel K is given. In particular, the kernel constants R(K) = ∫_{−1}^{1} K²(u) du and µ2(K) = ∫_{−1}^{1} u²K(u) du are known. Recall the asymptotically optimal local bandwidth in our theorem, viz.
$$h_{\mathrm{opt}}(t) = n^{-1/5}\, c(p(t), p''(t)), \qquad c(p(t), p''(t)) = \left[\frac{R(K)\, p(t)\{1 - p(t)\}}{\mu_2(K)^2\, p''(t)^2}\right]^{1/5}.$$
The following arguments will also apply to the optimal global bandwidth hopt. Both are of the form n^{−1/5} c, with c depending on the unknown probability function p and its second derivative, p''. It should first be mentioned that both optimal bandwidths, hopt(t) and hopt, have the (optimal) convergence rate n^{−1/5}, and that this rate is maintained for any bandwidth h of the form h = n^{−1/5} c, where c > 0 is an arbitrary constant. However, the choice of c has a strong influence on the finite sample behaviour. Hence, in order to guarantee a good bandwidth approximation in a concrete application, p and p'' should be estimated reasonably well.

We estimate hopt(t) by a so-called plug-in strategy. This means that we estimate the unknown values p(t) and p''(t) with a preliminary estimator and plug them into the formula above; a rough illustration is sketched below. For the pilot estimator, a large variety of methods is available, since estimating p is a classical nonparametric regression problem with binary responses. One can, for example, apply cross-validation or the so-called blocking method introduced by Härdle & Marron (1995). In the latter, the design space is divided into blocks, and a polynomial of low degree is fitted to every block. Another approach would be to carry out some preliminary kernel smoothing. However, a new bandwidth selection problem then arises and, at some point, a pilot bandwidth has to be determined with a different technique.
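To make the plug-in strategy concrete, here is a rough Python sketch (our own illustration, not the author's implementation). As a crude pilot, it fits a single low-degree polynomial to the binary responses to obtain values of p and p'', assumes N(0, σ²) noise so that Gt' can be evaluated explicitly, and then plugs everything into the global bandwidth formula (10) with an Epanechnikov kernel.

```python
import numpy as np
from scipy.stats import norm

def plug_in_global_bandwidth(t, x, a, sigma, degree=4, T=(0.1, 0.9)):
    """Crude plug-in estimate of the global bandwidth h_opt in (10)."""
    # Kernel constants for the Epanechnikov kernel K(u) = 3(1 - u^2)/4 on [-1, 1].
    R_K = 3.0 / 5.0        # integral of K^2
    mu2_K = 1.0 / 5.0      # integral of u^2 K(u)

    # Pilot estimates of p and p'' from a polynomial fit to the 0/1 responses.
    coef = np.polyfit(t, x, degree)
    p = np.clip(np.polyval(coef, t), 1e-3, 1.0 - 1e-3)
    p2 = np.polyval(np.polyder(coef, 2), t)

    # For N(0, sigma^2) noise, G_t'{s(t)} = phi(Phi^{-1}(p(t))) / sigma.
    g_prime = norm.pdf(norm.ppf(p)) / sigma

    inside = (t >= T[0]) & (t <= T[1])  # restrict the sums to T = [c, d]
    num = R_K * np.sum(p[inside] * (1.0 - p[inside]) / g_prime[inside] ** 2)
    den = len(t) * mu2_K ** 2 * np.sum(p2[inside] ** 2 / g_prime[inside] ** 2)
    return (num / den) ** 0.2
```

The polynomial degree, the clipping constants and the interval T are ad hoc choices; any of the pilot methods mentioned above could be substituted.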

A comprehensive overview of bandwidth selection techniques is given by Wand & Jones (1995). Several plug-in methods for the classical setting are discussed in Ruppert et al. (1995). Further discussions of bandwidth selection techniques can be found in the books of Eubank (1988) and Härdle (1990).

5. THE STOCHASTIC RESONANCE EFFECT

The model considered here was suggested by models for neuron firing triggered by a noisy signal. The literature in this field emphasizes the stochastic resonance effect, i.e., the existence of an optimal noise level for the detectability of the signal. This effect has mainly been shown empirically through simulations. In several papers (Collins et al. 1995, 1996; Henegan et al. 1996; Chialvo et al. 1997a,b) a box kernel is used to estimate the probability that a threshold is crossed. This estimator corresponds to our more sophisticated estimator p̂h(t), but with a fixed bandwidth whose choice is justified by physical arguments. The estimator is compared with the signal using the Pearson correlation or modifications thereof, mostly called "normalized power norm" or "cross-correlation coefficient," namely
$$C = \frac{\sum_{i=1}^{n} \{s(t_i) - \bar{s}\}\{\hat{p}_h(t_i) - \bar{\hat{p}}_h\}}{\bigl[\sum_{i=1}^{n} \{s(t_i) - \bar{s}\}^2\bigr]^{1/2} \bigl[\sum_{i=1}^{n} \{\hat{p}_h(t_i) - \bar{\hat{p}}_h\}^2\bigr]^{1/2}} \in [-1, 1].$$
Here, p̂h stands for any estimator of the exceedance probabilities, and the bars denote the corresponding sample means.
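In code, C is nothing more than the sample Pearson correlation between the signal values and the estimates; a hypothetical helper (ours, not from the cited papers) would be:

```python
import numpy as np

def cross_correlation_coefficient(signal_values, estimates):
    """Normalized power norm: Pearson correlation of s(t_i) with the estimates."""
    return np.corrcoef(signal_values, estimates)[0, 1]
```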

Figure 1. (a) Left panel: simulation of signal estimates ŝh (dashed line) for s(t) = sin(2πt) (solid line), with n = 1000, a = 1, h = 0.16 and i.i.d. N(0, 1.08²) noise; the dotted line shows the probability estimates p̂h. (b) Right panel: realizations of the Pearson correlation C (circles) between the sinusoid s from the left panel and its estimates ŝh, for σ = 0.5, 1, . . . , 3 and n = 100.

An illustration of our technique is given in Figure 1, whose left panel shows an estimate of a simple sinusoidal signal s(t) = sin(2πt). In this example, we chose n = 1000 time points, a threshold a = 1 and independent normal N(0, 1.08²) noise. The bandwidth ĥopt = 0.16 was estimated by plug-in methods, using a modification of the blocking method of Härdle & Marron (1995); it is relatively close to the theoretically optimal bandwidth hopt = 0.13. Besides the estimated signal ŝh, this figure also shows the estimated probabilities p̂h that correspond to the box kernel mentioned above. Of course, these estimators are not consistent for

the signal. Nevertheless, both the Pearson correlation between s and p̂h and the correlation between s and our consistent estimator ŝh are close to 1 (C = 0.977 and C = 0.995, respectively). This is not surprising, however, since C is invariant with respect to linear transformations and p̂h still captures essential characteristics of the shape of s.

For n = 100, the right panel of Figure 1 shows five realizations of the Pearson correlation between s and ŝh for each value of σ = 0.5, 1, . . . , 3. Papers in this field usually perform more elaborate simulation studies and obtain empirical estimates of the mean and standard error of C. In particular, a concave curve of the estimated mean as a function of σ can then be drawn, exhibiting stochastic resonance. That there is an optimal noise level already emerges in our picture, however: for σ = 1, the five values of C are all close to one; for smaller and larger σ, the correlation is not always high. For comparison, we performed the same simulation with n = 1000 time points, getting values of C close to one throughout. This observation illustrates well a phenomenon first reported by Collins et al. (1995) in a different setting, namely "stochastic resonance without tuning": since the variance of the signal estimator, and hence the variance of C, decreases with n, the correlation is high for a broad range of σ's. Although there is stochastic resonance, i.e., an optimal level of noise, C cannot detect it if the time points are too densely spaced.

For threshold data in the nonparametric regression model considered here, a stochastic resonance effect analogous to that shown by simulation in the literature would imply that the asymptotic mean squared error and the asymptotic mean average squared error are convex as functions of the standard deviation of the noise. We do not expect this behaviour for all signals or for all error distributions. For the example with the sinusoid from above, stochastic resonance is easily verified. In Figure 2, we plotted the asymptotic mean average squared error, with the optimal bandwidth, as a function of the noise level σ. Here, we used the "tuned" version, i.e., inf_h{AMASE(σ, h)} multiplied by the convergence rate n^{4/5}. Since the sums appearing in the formula approximate integrals, the same convex curve is produced for all n sufficiently large. In particular, a sharp optimal noise level, σ = 1.08, can be derived, which we have already used for the simulation in Figure 1.
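A simulation in the spirit of Figure 1 can be sketched as follows in Python, reusing s_hat from the sketch in Section 2 (the settings echo those reported above; everything else, including the fixed bandwidth rather than a plug-in estimate, is our own simplification).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_C(n=100, a=1.0, sigma=1.0, h=0.16):
    """One realization of the correlation C between s(t) = sin(2*pi*t) and s_hat."""
    t_grid = np.arange(1, n + 1) / n
    s = np.sin(2 * np.pi * t_grid)
    x = (s + sigma * rng.standard_normal(n) > a).astype(float)  # threshold data X(t_i)
    t_eval = t_grid[(t_grid >= h) & (t_grid <= 1.0 - h)]        # inner points only
    estimates = np.array([s_hat(t, t_grid, x, h, a, sigma) for t in t_eval])
    return np.corrcoef(np.sin(2 * np.pi * t_eval), estimates)[0, 1]

# Five realizations of C per noise level, as in the right panel of Figure 1.
for sigma in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    print(sigma, [round(simulate_C(sigma=sigma), 3) for _ in range(5)])
```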

Figure 2. Plot of n^{4/5} inf_h AMASE(σ, h) for the sinusoid example, s(t) = sin(2πt), with n sufficiently large (here n = 1000).

In general, a proof is not straightforward, not even if we restrict attention to well-behaved unimodal distributions such as the normal N(0, σ²) distribution. The function p of Section 2 is then expressed as p(t) = Φ[{s(t) − a}/σ], in terms of Φ, the distribution function of the standard normal N(0, 1) with density φ. The asymptotic mean squared error is then equal to
$$\frac{1}{nh}\,\frac{\sigma^2\,\Phi\!\left\{\frac{s(t)-a}{\sigma}\right\}\Phi\!\left\{\frac{a-s(t)}{\sigma}\right\}}{\phi\!\left\{\frac{s(t)-a}{\sigma}\right\}^2}\,R(K) + \frac{h^4}{4}\left[\frac{a-s(t)}{\sigma^2}\,s'(t)^2 + s''(t)\right]^2 \mu_2(K)^2.$$

The first term of the sum corresponds to the inverse Fisher information I_{sa} in Greenwood et al. (1999) and shows the typical stochastic resonance behaviour: it tends to infinity when σ² → ∞ or σ² → 0 (cf. Greenwood et al. 1999). The second term, however, varies like 1/σ⁴. The behaviour of the sum requires further analysis. The stochastic resonance behaviour of AMSE and further aspects will be investigated numerically and by simulations in a forthcoming paper of Müller & Ward (1999).

ACKNOWLEDGMENTS
This work was supported by the Natural Sciences and Engineering Research Council of Canada, and the Crisis Points Group of the Peter Wall Institute for Advanced Studies at the University of British Columbia. I thank P. E. Greenwood, N. E. Heckman, L. M. Ward and W. Wefelmeyer for several helpful discussions on the theory and applications of nonparametric regression and stochastic resonance.

REFERENCES
Benzi, R., Sutera, A., & Vulpiani, A. (1981). The mechanism of stochastic resonance. J. Phys. A: Math. Gen., 14, L453-457.
Chialvo, D. R., Longtin, A., & Müller-Gerking, J. (1997a). Stochastic resonance in models of neuron ensembles. Phys. Rev. E (3), 55, 1798-1808.
Chialvo, D. R., Longtin, A., & Müller-Gerking, J. (1997b). Stochastic resonance in models of neuron ensembles revisited. Phys. Rev. E (3), in press.
Collins, J. J., Chow, C. C., Capela, A. C., & Imhoff, T. T. (1996). Aperiodic stochastic resonance. Phys. Rev. E (3), 54, 5575-5584.
Collins, J. J., Chow, C. C., & Imhoff, T. T. (1995). Stochastic resonance without tuning. Nature, 376, 236-239.
Douglass, J. K., Wilkens, L., Pantazelou, E., & Moss, F. (1993). Noise enhancement of information transfer in crayfish mechanoreceptors by stochastic resonance. Nature, 365, 337-340.
Eubank, R. L. (1988). Spline Smoothing and Nonparametric Regression. Marcel Dekker, New York.
Gammaitoni, L., Hänggi, P., Jung, P., & Marchesoni, F. (1998). Stochastic resonance. Rev. Modern Phys., 70, 223-287.
Gasser, T., Müller, H.-G., & Mammitzsch, V. (1985). Kernels for nonparametric curve estimation. J. Roy. Stat. Soc. Ser. B, 47, 238-252.
Greenwood, P. E., Ward, L. M., & Wefelmeyer, W. (1999). Statistical analysis of stochastic resonance in a simple setting. Phys. Rev. E, in press.
Härdle, W. (1990). Applied Nonparametric Regression. Cambridge University Press, Cambridge.
Härdle, W., & Marron, J. S. (1995). Fast and simple scatterplot smoothing. Comput. Statist. Data Anal., 20, 1-17.


Henegan, C., Chow, C. C., Collins, J. J., Imhoff, T. T., Lowen, S. B., & Teich, M. C. (1996). Information measures quantifying aperiodic stochastic resonance. Phys. Rev. E (3), 54, R2228-R2231.
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of ideas immanent in nervous activity. Bull. Math. Biophys., 5, 115-133.
Müller, U. U. (1999). Nonparametric regression for threshold data. Univ. Bremen, Mathematik-Arbeitspapiere A, 52.
Müller, U. U., & Ward, L. M. (1999). Stochastic resonance in a statistical model of a time-integrating detector. Unpublished manuscript.
Ruppert, D., Sheather, S. J., & Wand, M. P. (1995). An effective bandwidth selector for local least squares regression. J. Amer. Statist. Assoc., 90, 1257-1270.
Stemmler, M. (1996). A single spike suffices: The simplest form of stochastic resonance in model neurons. Network: Computations in Neural Systems, 7, 687-716.
Wand, M. P., & Jones, M. C. (1995). Kernel Smoothing. Chapman and Hall, London.
Wiesenfeld, K., & Moss, F. (1995). Stochastic resonance and the benefits of noise: from ice ages to crayfish and SQUIDs. Nature, 373, 33-36.

Received 30 July 1998 Accepted 3 September 1999

Ursula U. Müller
Institut für Statistik
Fachbereich Mathematik/Informatik
Universität Bremen
28334 Bremen
Germany
e-mail: [email protected]
