Prediction Measures in Nonlinear Beta Regression Models

0 downloads 0 Views 697KB Size Report
May 22, 2017 - criteria P2 and the pseudo-R2 statistics are provided. ... vide an overview of varying dispersion beta regression modeling using the betareg ...
Prediction Measures in Nonlinear Beta Regression Models Patr´ıcia Leone Espinheira1

Luana C. Meireles da Silva1

Alisson de Oliveira Silva1

arXiv:1705.07941v1 [stat.ME] 22 May 2017

Raydonal Ospina1,# 1

Departamento de Estat´ıstica, Universidade Federal de Pernambuco, Cidade Universit´aria, Recife/PE, 50740–540, Brazil. # E-mail: [email protected] Abstract

Nonlinear models are frequently applied to determine the optimal supply natural gas to a given residential unit based on economical and technical factors, or used to fit biochemical and pharmaceutical assay nonlinear data. In this article we propose PRESS statistics and prediction coefficients for a class of nonlinear beta regression models, namely P 2 statistics. We aim at using both prediction coefficients and goodness-of-fit measures as a scheme of model select criteria. In this sense, we introduce for beta regression models under nonlinearity the use of the model selection criteria based on robust pseudo-R2 statistics. Monte Carlo simulation results on the finite sample behavior of both prediction-based model selection criteria P 2 and the pseudo-R2 statistics are provided. Three applications for real data are presented. The linear application relates to the distribution of natural gas for home usage in S˜ao Paulo, Brazil. Faced with the economic risk of too overestimate or to underestimate the distribution of gas has been necessary to construct prediction limits and to select the best predicted and fitted model to construct best prediction limits it is the aim of the first application. Additionally, the two nonlinear applications presented also highlight the importance of considering both goodness-of-predictive and goodness-of-fit of the competitive models. Keywords: Nonlinear beta regression; PRESS; prediction coefficient; pseudo-R2 , power prediction.

1

Introduction

Ferrari and Cribari-Neto (2004) introduced a regression model in which the response is beta-distributed, its mean being related to a linear predictor through a link function. The linear predictor includes independent variables and regression parameters. Their model also includes a precision parameter whose reciprocal can be viewed as a dispersion measure. In the standard formulation of the beta regression model it is assumed that the precision is constant across observations. However, in many practical situations this assumption does not hold. Smithson and Verkuilen (2006) consider a beta regression specification in which dispersion is not constant, but is a function of covariates and unknown parameters and (Simas et al., 2010) introduces the class of nonlinear beta regression models. Parameter estimation is carried out by maximum likelihood (ML) and standard asymptotic hypothesis testing can be easily 1

performed. Practitioners can use the betareg package, which is available for the R statistical software (http://www.r-project.org), for fitting beta regressions. Cribari-Neto and Zeileis (2010) provide an overview of varying dispersion beta regression modeling using the betareg package. Diagnostic tools and improve ML estimation were accomplished in Espinheira et al. (2008a,b); Ospina et al. (2006); Chien (2011) and others. Inference for beta regression also have been developed in an Bayesian context (Figuero-Z´ un ˜ iga et al. (2013); Brascum et al. (2007) and Cepeda-Cuervo and Gamerman (2005).) Recently Espinheira et al. (2014) build and evaluated bootstrap-based prediction intervals for the class of beta regression models with varying dispersion. However, a prior approach it is necessary, namely: the selection of the model with the best predictive ability, regardless of the goodness-offit. Indeed, the model selection is a crucial step in data analysis, since all inferential performance is based on the selected model. Bayer and Cribari-Neto (2017) evaluated the performance of different model selection criteria in samples of finite size in a beta regression model, such as Akaike Information Criterion (AIC) (Akaike, 1973), Schwarz Bayesian Criterion (SBC) (Schwarz, 1978) and various approaches based on pseudo-R2 such as the coefficient of determination adjusted RF2 C proposed by 2 Ferrari and Cribari-Neto (2004) and the version based on log-likelihood functions, namely by RLR . Indeed, the authors proposed two new model selection criteria and a fast two step model selection scheme considering both mean and dispersion submodels, for beta regression models with varying dispersion. However, these methods do not offer any insight about the quality of the predictive values in agreement with the findings of Spiess and Neumeyer (2010) for nonlinear models. In this context, Allen (1974), proposed the PRESS (Predictive Residual Sum of Squares) criterion, that can be used as a measure of the predictive power of a model. The PRESS statistic is independent from the goodness-of-fit of the model, since, that its calculation is made by leaving out the observations that the model is trying to predict (Palmer and O’Connell, 2009). The PRESS statistics can be viewed as a sum of squares of external residuals (Bartoli, 2009). Thus, similarly of the approach of R2 , Mediavilla et al. (2008) proposed a coefficient of prediction based on PRESS namely P 2. The P 2 statistic can be used to select models from a predictive perspective adding important information about the predictive ability of the model in various scenarios. Our chief goal in this paper is to propose versions of the PRESS statistics and the coefficients of prediction P 2 associated, for the linear and nonlinear beta regression models. As a second contribution, in 2 especial to beta regression under nonlinearity, we evaluate the behavior of RLR (Bayer and Cribari-Neto, 2017) 2 and RF C (Ferrari and Cribari-Neto, 2004) measures both when the model is correctly specified and when under model misspecification. The results of the simulations showed as the prediction coefficients can be useful in detecting misspecifications, or indicate difficulties on to estimate beta regression models when the data are close to the boundaries of the standard unit interval. Finally, the real data applications are the last and important contribution. Here we provide guidance for researchers in choosing and interpreting the measures proposed. In fact, based on these applications we can shown how it is important to consider both coefficients of prediction and coefficients of determination to build models more useful to describe the data.

2

The P 2 statistic measure

Consider the linear model, Y = Xβ + ε where Y is a vector n × 1 of responses, X is a known matrix of covariates of dimension n × p of full rank, β is the parameter vector of dimension p × 1 and ε is a vector b n × 1 of errors. We have the least squares estimators: βb = (X ⊤ X)−1 X ⊤ y, the residual et = yt − x⊤ t β and ⊤ b b the predicted value ybt = x⊤ t β, where xt = (xt1 , . . . , xtp ), and t = 1, . . . , n. Let β(t) be the least squares b estimate of β without the tth observation and yb(t) = x⊤ t β(t) be the predicted value of the case deleted, 2

such that e(t) = yt − yb(t) is the prediction error or external residual. Thus, for multiple regression, the classis statistic n n X X P RESS = e2(t) = (yt − yb(t) )2 , (1) Pn

t=1

t=1

which can be rewritten as P RESS = t=1 (yt − ybt ) /(1 − htt )2 , where htt is the tth diagonal element of the projection matrix X(X ⊤ X)−1 X ⊤ . Now, let y1 , . . . , yn be independent random variables such that each yt , for t = 1, . . . , n, is beta distributed, beta-distributed, denoted by yt ∼ B(µt , φt ), i.e., each yt has density function given by f (yt ; µt , φt ) =

2

Γ(φt ) ytµt φt −1 (1 − yt )(1−µt )φt −1 , Γ(µt φt )Γ((1 − µt )φt )

0 < yt < 1,

(2)

where 0 < µt < 1 and φt > 0. Here, E(yt ) = µt and Var(yt ) = V (µt )/(1 + φt ), where V (µt ) = µt (1 − µt ). Simas et al. (2010) proposed the class of nonlinear beta regression models in which the mean of yt and the precision parameter can be written as ⊤ g(µt) = η1t = f1 (x⊤ t ; β) and h(φt ) = η2t = f2 (zt , γ),

(3)

where β = (β1 , . . . , βk )⊤ and γ = (γ1 , . . . , γq )⊤ are, respectively, k × 1 and q × 1 vectors of unknown parameters (β ∈ IRk ; γ ∈ IRq ), η1t and η2t are the nonlinear predictors, x⊤ t = (xt1 , . . . , xtk1 ) and zt⊤ = (zt1 , . . . , ztq1 ) are vectors of covariates (i.e., vectors of independent variables), t = 1, . . . , n, k1 ≤ k, q1 ≤ q and k + q < n. Both g(·) and h(·) are strictly monotonic and twice differentiable link functions. Furthermore, fi (·), i = 1, 2, are differentiable and continous functions, such that the matrices J1 = ∂η1 /∂β and J2 = ∂η2 /∂γ have full rank (their ranks are equal to k and q, respectively). The parameters that index the model can be estimated by maximum likelihood (ML). In the Appendix, we present the log-likelihood function, the score vector and Fisher’s information matrix for the nonlinear beta regression model. In the nonlinear beta regression model, the ML estimator βb can be viewed as the least squares estimator of β (see Appendix) obtained by regressing b 1/2 W b 1/2 u1 on Jˇ1 = Φ b 1/2 W b 1/2 J1 , yˇ = Φ

(4)

with Φ = diag(φ1 , . . . , φn ), J1 = ∂η1 /∂β. Here, matrix W and u1 are given in (15)–(17) in the 1/2 1/2 1/2 1/2 ⊤ b ⊤ Appendix. Thus, the prediction error is yˇt − b yˇ(t) = φbt w bt u1,t − φbt w bt J1t β(t) , in which J1t is b the tth row of the J1 matrix. Using the ideas proposed by (Pregibon, 1981) we have that β(t) = 1/2 bW b J1 )−1 J1t φb1/2 βb−{(J1⊤ Φ bt rtβ }/(1 − h∗tt ), where rtβ is the weighted 1 residual (Espinheira et al., 2008a) t w defined as rtβ =

yt∗ − µ b∗ √ t, vt b

(5)

where, yt∗ = log{yt /(1 − yt )}, µ∗t = ψ(µt φt ) − ψ((1 − µt )φt ) and vt is given in 15 in the Appendix. Hence, we can write yˇt − yˆˇ(t) = rtβ /(1 − h∗tt ), where h∗tt is the tth diagonal element of projection matrix b Φ) b 1/2 J1 (J1 Φ bW b J1 )−1 J ⊤ (Φ bW b )1/2 . H ∗ = (W 1 3

Finally, for the nonlinear beta regressions models the classic PRESS statistic based on (1) is given by P RESS =

n X t=1

(ˇ yt − yˆˇ(t) )2 =

n X t=1

rtβ 1 − h∗tt

!2

.

(6)

Note that the tth observation in (6) is not used in fitting the regression model to predict yt , then both the external predicted values yˆ(t) and the external residuals e(t) are independent of yt . This fact enables the PRESS statistic to be a true assessment of the prediction capabilities of the regression model regardless of the overall model fit quality. Additionally, when the predictors in (3) are linear functions ⊤ of the parameters, i.e., g(µt) = x⊤ t β and h(φt ) = zt γ, the expression in (6) also represent the PRESS statistic for a class of linear beta regression models with p = k + q unknown regression parameters. Considering the same approach to construct the determination coefficient R2 for linear models, we can think in a prediction coefficient based on PRESS, namely P2 = 1 −

P RESS , SST(t)

(7)

P where SST(t) = nt=1 (yt − y¯(t) )2 and y¯(t) is the arithmetic average of the y(t) , t = 1, . . . , n.. It can be shown that SST(t) = (n/(n − p))2 SST , wherein p is the number of model parameters and SST is the Total Sum with varying dispersion, P of Squares 2for the full data. For a class of beta regressions models 1/2 1/2 SST = nt=1 (ˇ yt − y¯ˇ) , y¯ˇ is the is the arithmetic average of the yˇt = φbt w bt u1,t , t = 1, . . . , n. It is noteworthy that the measures R2 and P 2 are distinct, since that the R2 propose to measure the model fit quality and the P 2 measure the predictive power of the model. Cook and Weisberg (1982) suggest other versions of PRESS statistic based on different choices of residuals. Thus, we present another version of PRESS statistic and P 2 measure by considering the combined residual proposed by Espinheira et al. (2017). In this way, P RESSβγ =

n X t=1

βγ rp,t 1 − h∗tt

!2

2 and Pβγ =1−

P RESSβγ , SST(t)

(8)

respectively, where rtβγ =

(yt∗ − µb∗t ) + abt q , b ζt

at = µt (yt∗ − µ∗t ) + log(1 − yt ) − ψ((1 − µt )φt ) + ψ(φt )

(9)

and ζt = (1 + µt )2 ψ ′ (µt φt ) + µ2t ψ ′ ((1 − µt )φt ) − ψ ′ (φt ). 2 Note that, P 2 and Pβγ given in (7) and (8), respectively, are not positive quantifiers. In fact, the 2 PRESS/SST(t) is a positive quantity, thus the P 2 and the Pβγ are measures that take values in (−∞; 1]. The closer to one the better is the predictive power of the model. In order to check the model goodness-of-fit with linear or nonlinear predictors for a class of beta regression, we evaluate the RF2 C defined as the square of the sample coefficient of correlation between g(y) and ηb1 (Ferrari and Cribari-Neto, 2004), and its penalized version based on Bayer and Cribari-Neto (2017) given by RF2 Cc = 1 − (1 − RF2 C )(n − 1)/(n − (k1 + q1 )), where k1 and q1 are, respectively, the number of covariates of the mean submodel and dispersion submodel. We also evaluate two version of pseudo-R2 based on likelihood ratio. The first one proposed by 2 Nagelkerke (1991): RLR = 1 − (Lnull /Lf it )2/n , where Lnull is the maximum likelihood achievable (satu-

4

rated model) and Lf it is the achieved by the model under investigation. The second one is a proposal of Bayer and Cribari-Neto (2017) that takes account the inclusion of covariates both in the mean submodel and in the precision submodel, given by: 2 RLR c

= 1 − (1 −

2 RLR )



n−1 n − (1 + α)k1 − (1 − α)q1



,

where α ∈ [0, 1] and δ > 0. Based on simulation results obtained in Bayer and Cribari-Neto (2017) we 2 choose in this work the values α = 0.4 and δ = 1. Therefore, penalized versions of P 2 and Pβγ are 2 2 = 1 − (1 − P )(n − 1)/(n − respectively now given by: Pc2 = 1 − (1 − P 2 )(n − 1)/(n − (k1 + q1 )) and Pβγ βγ c (k1 + q1 )).

3

Simulation

In this section we simulate several different data generating processes to evaluate the performance of the predictive measures. The Monte Carlo experiments were carried out using both fixed and varying dispersion beta regressions as data generating processes. All results are based on 10,000 Monte Carlo replications. Linear models: Table 1 shows the mean values of the predictive statistics obtained by simulation of fixed dispersion beta regression model that involves a systematic component for the mean given by   µt log = β1 + β2 xt2 + β3 xt3 + β4 xt4 + β5 xt5 , t = 1, . . . , n, (10) 1 − µt The covariate values were independently obtained as random draws of the following distributions: Xti ∼ U(0, 1), i = 2, . . . , 5 and were kept fixed throughout the experiment. The precisions, the sample sizes and the range of mean response are, respectively, φ = (20, 50, 148, 400), n = (40, 80, 120, 400), µ ∈ (0.005, 0.12), µ ∈ (0.90, 0.99) and µ ∈ (0.20, 0.88). Under the model specification given in (10) we investigate the performances of the statistics by omitting covariates. In this case, we considered the Scenarios 1, 2 and 3, in which are omitted, three, two and one covariate, respectively. In a fourth scenario the estimated model is correctly specified (true model). The results in Table 1 show that the mean values of all statistics increase as covariates are included in the model and the value of φ increases. On the other hand, as the size of the sample increases, the misspecification of the model is evidenced by lower values of the statistics (Scenarios 1, 2 and 3). It shall be noted that the values of all statistics are considerably larger when µ ∈ (0.20, 0.88). Additionally, its values approaching one when the estimated model is closest to the true model. For instance, in Scenario 2 4 for n = 40, φ = 150 the values of P 2 and RLR are, respectively, 0.936 and 0.947. The behavior of the statistics for finite samples changes substantially when µ ∈ (0.90; 0.99). It is noteworthy the reduction of its values, revealing the difficulty in fitting the model and make prediction when µ ≈ 1 (The log-likelihood of the model tends to no longer limited). Indeed, in this range of µ is more difficult to make prediction than to fit the model. For example, in Scenario 1, when three covariates 2 are omitted from the model, n = 40 and φ = 150 the P 2 value equals to 0.071, whereas the RLR value is 0.243. Similar results were obtained for n = 80, 120. Even under true specification (Scenario 4) the model predictive power is more affected than the model quality of fit by the fact of µ ≈ 1. For instance, 2 2 when n = 120 and φ = 50 we have Pβγ = 0.046 and RLR = 0.565. The same difficulty in obtaining

5

predictions and in fitting the regression model occurs when µ ∈ (0.005, 0.12). Once again the greatest difficulty lies on the predictive sense. 2 2 2 2 , RF2 C , RLR , RLR , Pβγ Figure 1 present the boxplots of the 10,000 replications of the statistics: Pβγ c c and RF2 C c when the model is correctly specified (scenario 4), n = 40 and φ = 150. In all boxplots the “side point” represents the mean value of the replications of the statistics. In the panel (a) we present the boxplots when µ ≈ 0. In the panel (b) we present the boxplots when µ is scattered on the standard unit interval and in the panel (c) we present the boxplots for µ ≈ 0. This figure shows that the means and the medians of all statistics are close. We also can notice based on the Figure 1 that both prediction power and goodness-of-fit of the model are affected when µ is close to the boundaries of the standard unit interval. However, it is noteworthy the great difficult to make prediction. Additionally, is possible to notice that the versions of R2 displays similar behavior. In it follows we shall investigate the empirical distributions behaviour of the statistics proposed. Correct specification

Correct specification

µ ∈ (0.01,0.20)

µ ∈ (0.20,0.88)

µ ∈ (0.80,0.99)

0.5 0.0 −0.5

0.86

−0.5

0.88

0.90

0.0

0.92

0.94

0.5

0.96

0.98

Correct specification

P 2βγ

P 2βγc

R L2R

R L2R c

(a)

R F2C

R F2C c

P 2βγ

P 2βγc

R L2R

R L2R c

(b)

R F2C

R F2C c

P 2βγ

P 2βγc

R L2R

R L2R c

R F2C

R F2C c

(c)

Figure 1: Model estimated correctly: g(µt ) = β1 +β2 xt2 +β3 xt3 +β4 xt4 +β5 xt5 . µ ∈ (0.20, 0.88); β = (−1.9, 1.2, 1.0, 1.1, 1.3)⊤; µ ∈ (0.90, 0.99); β = (1.8, 1.2, 1, 1.1, 0.9)⊤; µ ∈ (0.005, 0.12); β = (−1.5, −1.2, −1.0, −1.1, −1.3)⊤ . In Figures 2 and 3 we consider µ ∈ (0.20, 0.88). In Figure 2 the model is estimated correctly, n = 40 and φ = (20, 50, 150, 400). We notice that the prediction power distortions increase as the precision parameter increases, as expected. In Figure 3 we consider a misspecification problem (three omitted covariates). For illustration, we consider only φ = 50 and n = 40, 80, 120, 400. It is important to notice that here the performance of the prediction measures does not deteriorate when the sample size is increased. Based on these figures we can reach some conclusions. First, the model precision affects both its predict power and goodness-of-fit. Second, for this range of µ the performance of the statistics are similar revealing the correct specification of the model (Figure 2). Third, when three covariates are omitted, with the increasing of sample size the replications values of the statistics tend being concentrated at small values due to the misspecification problem (Figure 3). In what follows, we shall report simulation results on the finite-sample performance of the statistics when the dispersion modeling is neglected. To that end, the true data generating process considers varying dispersion, but a fixed dispersion beta regression is estimated. We also used different covariates in the mean and precision submodels. The samples sizes are n = 40, 80, 120. We generated 20 values for each covariate and replicated them to get covariate values for the three sample sizes (once, twice and three times, respectively).

6

Table 1: Values of the statistics. True model versus misspecification models (omitted covariates (Scenarios 1, 2 and 3)).

Estimated

Scenario 1

Scenario 2

Scenario 3

Scenario 4

g(µt ) = β1 + β2 xt2

g(µt ) = β1 + β2 xt2 +β3 xt3

g(µt ) = β1 + β2 xt2 +β3 xt3 + β4 xt4

g(µt ) = β1 + β2 xt2 + β3 xt3 + β4 xt4 + β5 xt5

model

µ ∈ (0.20, 0.88); n

φ→

20

50

150

0.393

0.463

0.502

0.506

0.602

0.656

0.694

0.847

0.936

0.342

0.418

0.461

0.450

0.557

0.617

0.649

0.825

0.927

2 Pβγ 2 Pβγ

c

c

2 Pβγ 2 Pβγ

c

0.304

0.357

0.385

0.402

0.473

0.512

0.512

0.609

0.663

0.694

0.847

0.936

0.267

0.322

0.352

0.352

0.429

0.472

0.456

0.564

0.625

0.649

0.825

0.927

0.296

0.358

0.391

0.394

0.473

0.515

0.518

0.620

0.675

0.723

0.869

0.947

0.258

0.324

0.358

0.344

0.429

0.475

0.463

0.577

0.638

0.682

0.849

0.939

0.286

0.346

0.379

0.368

0.445

0.488

0.506

0.584

0.643

0.666

0.833

0.930

0.267

0.329

0.363

0.343

0.423

0.468

0.450

0.561

0.624

0.643

0.821

0.925

0.282

0.339

0.370

0.377

0.455

0.498

0.512

0.590

0.650

0.666

0.833

0.930

0.264

0.322

0.353

0.353

0.433

0.478

0.456

0.569

0.632

0.643

0.821

0.925

R2 LR

0.291

0.356

0.391

0.385

0.468

0.513

0.518

0.614

0.672

0.706

0.860

0.943

R2 LRc 2

0.273

0.339

0.375

0.361

0.447

0.494

0.463

0.593

0.655

0.686

0.851

0.939

P

0.279

0.340

0.374

0.360

0.439

0.483

0.469

0.578

0.639

0.656

0.833

0.928

Pc2

0.267

0.329

0.363

0.343

0.424

0.470

0.450

0.563

0.626

0.641

0.821

0.925

2 Pβγ 2 Pβγ

c

c

φ→

0.275

0.333

0.365

0.369

0.449

0.493

0.475

0.585

0.646

0.656

0.833

0.928

0.263

0.322

0.354

0.353

0.435

0.480

0.457

0.570

0.634

0.641

0.821

0.925

0.290

0.355

0.390

0.382

0.467

0.513

0.501

0.612

0.671

0.700

0.860

0.942

0.278

0.344

0.380

0.366

0.453

0.500

0.483

0.598

0.660

0.686

0.851

0.939

50

150

20

50

150 0.213

20

50

150

20

2

0.119

0.061

0.071

0.139

Pc2

0.071

0.010

0.021

0.067

2 Pβγ

0.119

0.061

0.071

0.139

P

2 Pβγ

c

R2 LR R2 LR P2

c

Pc2 2 Pβγ 2 Pβγ

c

50

150

20

0.062

0.072

0.171

0.072

0.156

0.149

0.089

−0.016

−0.006

0.076

−0.034

0.059

0.023

−0.045

0.097

0.062

0.071

0.171

0.072

0.155

0.148

0.089

0.213

0.072

0.010

0.020

0.068

−0.016

−0.007

0.076

−0.034

0.058

0.022

−0.045

0.097

0.164

0.196

0.243

0.221

0.266

0.336

0.271

0.374

0.466

0.444

0.593

0.774

0.119

0.153

0.203

0.157

0.205

0.281

0.188

0.303

0.405

0.362

0.533

0.741

0.093

0.036

0.044

0.112

0.038

0.046

0.149

0.045

0.120

0.123

0.056

0.175

0.070

0.011

0.019

0.077

0.000

0.008

0.103

−0.006

0.073

0.063

−0.007

0.119

0.094

0.036

0.043

0.113

0.038

0.045

0.149

0.045

0.119

0.122

0.056

0.175

0.070

0.011

0.018

0.078

0.000

0.008

0.104

−0.006

0.072

0.063

−0.008

0.119

0.158

0.190

0.240

0.211

0.253

0.327

0.268

0.356

0.451

0.416

0.571

0.760

R2 LRc 2

0.136

0.169

0.221

0.180

0.224

0.301

0.229

0.321

0.422

0.376

0.542

0.744

P

0.085

0.028

0.035

0.102

0.030

0.038

0.141

0.036

0.107

0.114

0.046

0.162

Pc2

0.069

0.012

0.018

0.079

0.005

0.013

0.111

0.002

0.076

0.075

0.004

0.125

2 Pβγ 2 Pβγ

c

R2 LR

c

0.085

0.028

0.034

0.103

0.030

0.037

0.141

0.036

0.107

0.113

0.046

0.162

0.069

0.012

0.018

0.080

0.005

0.012

0.112

0.002

0.076

0.075

0.004

0.125

0.156

0.188

0.239

0.207

0.249

0.324

0.268

0.349

0.447

0.406

0.565

0.756

0.142

0.175

0.226

0.186

0.230

0.306

0.242

0.327

0.428

0.380

0.545

0.745

50

150 0.212

µ ∈ (0.005, 0.12); φ→

50

150

20

0.128

0.063

0.056

0.108

Pc2

0.081

0.013

0.005

0.033

2 Pβγ

0.128

0.063

0.057

0.107

2 Pβγ

c

R2 LR R2 LR P2

c

Pc2

120

20

2

P

80

β = (1.8, 1.2, 1, 1.1, 0.9)⊤ .

R2 LR

R2 LR

40

150

0.393

µ ∈ (0.90, 0.99);

n

50

0.361

R2 LR

120

20

0.363

R2 LR

80

150

0.329

P2

40

50

0.307

Pc2

n

20

0.270

R2 LR

120

150

Pc2

R2 LR

80

50

2

P

40

20

β = (−1.9, 1.2, 1.0, 1.1, 1.3)⊤ .

2 Pβγ 2 Pβγ

c

β = (−1.5, −1.2, −1.0, −1.1, −1.3)⊤ . 50

150

20

50

150

20

0.059

0.028

−0.020

−0.053

0.153

0.070

0.202

0.149

0.090

0.056

−0.036

0.111

0.023

−0.044

0.055

0.026

0.096

0.153

0.071

0.203

0.150

0.090

0.212

0.081

0.013

0.006

0.032

−0.023

−0.055

0.056

−0.036

0.112

0.025

−0.044

0.097

0.199

0.215

0.254

0.265

0.349

0.379

0.326

0.415

0.548

0.442

0.595

0.774

0.156

0.172

0.214

0.204

0.295

0.327

0.249

0.348

0.496

0.360

0.535

0.741

0.105

0.040

0.032

0.083

0.043

0.012

0.128

0.038

0.165

0.123

0.057

0.174

0.081

0.015

0.006

0.047

0.005

−0.027

0.081

−0.013

0.121

0.064

−0.007

0.119

0.104

0.040

0.032

0.081

0.039

0.010

0.128

0.038

0.166

0.124

0.057

0.175

0.081

0.015

0.007

0.045

0.001

−0.030

0.081

−0.013

0.121

0.065

−0.007

0.119

R2 LR

0.197

0.211

0.251

0.253

0.340

0.372

0.311

0.394

0.534

0.416

0.572

0.760

R2 LRc 2

0.176

0.191

0.231

0.223

0.314

0.347

0.274

0.362

0.509

0.376

0.543

0.743

P

0.097

0.033

0.024

0.074

0.037

0.006

0.118

0.028

0.153

0.114

0.046

0.162

Pc2

0.081

0.016

0.007

0.050

0.012

−0.020

0.088

-0.006

0.123

0.075

0.004

0.125

2 Pβγ

0.096

0.032

0.024

0.072

0.034

0.004

0.118

0.028

0.153

0.115

0.046

0.162

0.081

0.016

0.007

0.048

0.009

−0.022

0.087

−0.006

0.124

0.076

0.004

0.125

0.195

0.209

0.250

0.247

0.337

0.370

0.304

0.388

0.530

0.407

0.565

0.755

0.181

0.196

0.237

0.228

0.320

0.354

0.280

0.367

0.513

0.381

0.546

0.744

2 Pβγ

c

R2 LR R2 LR

c

7

φ = 50

0.8 0.4

0.6

0.8 0.6 0.4

µ ∈ (0.20,0.88), n = 40

1.0

Correct specification

φ = 20 1.0

Correct specification

P 2βγ (a)

P2

R 2L R

P 2βγ (b) Correct specification

φ = 150

φ = 400

R 2L R

0.8 0.6 0.4

0.4

0.6

0.8

1.0

Correct specification 1.0

P2

P2

P 2βγ (c)

P2

R 2L R

P 2βγ (d)

R 2L R

Figure 2: Model estimated correctly: g(µt ) = β1 +β2 xt2 +β3 xt3 +β4 xt4 +β5 xt5 . µ ∈ (0.20, 0.88); (−1.9, 1.2, 1.0, 1.1, 1.3)⊤.

β=

This was done so that the intensity degree of nonconstant dispersion max {φt } φmax t=1,...,n = , λ= φmin min {φt }

(11)

t=1,...,n

would remain constant as the sample size changes. The numerical results were obtained using the following beta regression model g(µt ) = log(µt /(1 − µt )) = β1 + βi xti , and log(φt ) = γ1 + γi zti , xti ∼ U(0, 1), zti ∼ U(−0.5, 0.5), i = 2, 3, 4, 5, and t = 1, . . . , n under different choices of parameters (Scenarios): Scenario 5: β = (−1.3, 3.2)⊤ , µ ∈ (0.22, 0.87), [γ = (3.5, 3.0)⊤; λ ≈ 20], [γ = (3.5, 4.0)⊤ ; λ ≈ 50] and [γ = (3.5, 5.0)⊤ ; λ ≈ 150]. Scenario 6: β = (−1.9, 1.2, 1.6, 2.0)⊤ , µ ∈ (0.24, 0.88), [γ = (2.4, 1.2, −1.7, 1.0)⊤; λ ≈ 20], [γ = (2.9, 2.0, −1.7, 2.0)⊤; λ ≈ 50] and [γ = (2.9, 2.0, −1.7, 2.8)⊤; λ ≈ 100]. Scenarios 7 and 8 (Full models): β = (−1.9, 1.2, 1.0, 1.1, 1.3)⊤, µ ∈ (0.20, 0.88), [γ = (3.2, 2.5, −1.1, 1.9, 2.2)⊤; λ ≈ 20], [γ = (3.2, 2.5, −1.1, 1.9, 3.2)⊤; λ ≈ 50], and [γ = (3.2, 2.5, 1.1, 1.9, 4.0)⊤; λ ≈ 100]. All results were obtained using 10,000 replics Monte Carlo replications. Table 2 contain the values of the statistics. We notice that for each scenario the prediction power measure not present high distortion when we increase intensity degree of nonconstant dispersion. However, in the case of misspecification the statistics display smaller values in comparison with Scenario 8 (True specification), in which as greater is λ greater are the values of the statistics, as expected. Other important impression lies in the fact that the values of the RF2 C are considerably smaller than the values of the others statistics, in special when λ increases. That is a strong evidence that the RF2 C does not perform well under nonconstant dispersion models. 8

0.4 0.3 0.2

0.3

0.4

0.5

n = 80

0.5

Incorrect specification

n = 40

0.2

µ ∈ (0.20,0.88), φ = 50

Incorrect specification

P 2βγ (a)

P2

P 2βγ (b)

P2

R 2L R

R 2L R

n = 400

0.4 0.3 0.2

0.2

0.3

0.4

0.5

Incorrect specification

n = 120 0.5

Incorrect specification

P 2βγ (c)

P2

P 2βγ (d)

P2

R 2L R

R 2L R

Figure 3: Omitted covariates. Estimated model: g(µt) = β1 + β2 xt2 . Correct model: g(µt) = β1 + β2 xt2 + β3 xt3 + β4 xt4 + β5 xt5 ; µ ∈ (0.20, 0.88); β = (−1.9, 1.2, 1.0, 1.1, 1.3)⊤ In fact, under nonconstant dispersion models the better performances are of the P 2 statistics, both in identifying wrong and correct specifications. Table 2: Values of the statistics. Misspecified models, φ fixed: Scenarios 5, 6 and 7 versus Scenario 8 (correct specification). Scenario 5

Scenario 6

Scenario 7

Scenario 8

g(µt ) = β1 + β2 xt2

g(µt ) = β1 + β2 xt2 +

g(µt ) = β1 + β2 xt2 +

g(µt ) = β1 + β2 xt2 +

True

+β3 xt3

β3 xt3 + β4 xt4

β3 xt3 + β4 xt4 + β5 xt5

β3 xt3 + β4 xt4 + β5 xt5

models

h(φt ) = γ1 + γ2 zt2

h(φt ) = γ1 + γ2 zt2 +

h(φt ) = γ1 + γ2 zt2 +

h(φt ) = γ1 + γ2 zt2 +

+γ3 zt3

+γ3 zt3 + γ4 zt4

γ3 zt3 + γ4 zt4 + γ5 zt5

γ3 zt3 + γ4 zt4 + γ5 zt5

g(µt ) = β1 + β2 xt2

g(µt ) = β1 + β2 xt2 +

g(µt ) = β1 + β2 xt2 +

g(µt ) = β1 + β2 xt2 +

+β3 xt3

+β3 xt3 + β4 xt4

β3 xt3 + β4 xt4 + β5 xt5

β3 xt3 + β4 xt4 + β5 xt5

Estimated models

h(φt ) = γ1 + γ2 zt2 + γ3 zt3 + γ4 zt4 + γ5 zt5

λ→

20

50

100

20

50

100

20

50

100

20

50

100

P2

0.759

0.718

0.674

0.545

0.565

0.523

0.638

0.624

0.529

0.885

0.906

0.914

Pc2

0.739

0.695

0.647

0.493

0.515

0.469

0.585

0.569

0.460

0.851

0.878

0.888

2 Pβγ

0.758

0.716

0.671

0.546

0.567

0.528

0.637

0.624

0.530

0.885

0.906

0.913

0.738

0.693

0.643

0.494

0.517

0.474

0.584

0.568

0.460

0.851

0.878

0.888

0.782

0.743

0.700

0.580

0.611

0.577

0.670

0.653

0.554

0.796

0.816

0.840

0.764

0.722

0.675

0.532

0.567

0.529

0.622

0.602

0.488

0.735

0.761

0.792

2 Pβγ

c

R2 LR R2 LR

c

Figure 4 summarizes the predictive power performance for each measure. The graphs show that the mean and median performance of RF2 C and RF2 Cc are significantly worse than the performance of the 9

λ = 100

P 2βγ

P 2βγc

R L2R c

R L2R

(b) Correct specification

λ = 20

λ = 100

R F2C c

R F2C c

0.6

P 2βγ

(c)

R F2C

0.2

µ ∈ (0.20,0.88), n = 120 R F2C

R F2C c

−0.2

0.8 0.6

R L2R c

R L2R

R F2C

1.0

(a) Correct specification

0.4

P 2βγc

0.6

R F2C c

0.2

P 2βγ

0.2

µ ∈ (0.20,0.88), n = 40 R F2C

−0.2

0.8 0.6 0.4 0.2

R L2R c

R L2R

1.0

P 2βγc

0.0

µ ∈ (0.20,0.88), n = 120

P 2βγ

1.0

Correct specification

λ = 20 1.0

Correct specification

0.0

µ ∈ (0.20,0.88), n = 40

2 other measures. The comparison among the best measures indicate that the median of P 2 and RLR 2 performance are significantly better than the measures based on pseudo-R and besides reveal some asymmetry of the statistics when the intensity degree of nonconstant dispersion levels increasing. These findings hold in all observation scenarios.

P 2βγc

R L2R c

R L2R

(d)

Figure 4: True model: g(µt ) = β1 + β2 xt , h(φt ) = γ1 + γ2 zt .

Nonlinear models: In it follows we shall present Monte Carlo experiments for the class of nonlinear beta regression models. To that end we shall use the starting values scheme for the estimation by maximum likelihood proposed by Espinheira et al. (2017). The numerical results were obtained using the following beta regression model as data generating processes:   xt3 µt log = β1 + xβt22 + β3 log(xt3 − β4 ) + , t = 1, . . . , n 1 − µt β5 xt2 ∼ U(1, 2), xt3 ∼ U(4.5, 34.5) and φ were kept fixed throughout the experiment. The precisions and the sample sizes are φ = (20, 50, 150, 400), n = (20, 40, 60, 200, 400). Here, the vector of the parameters of the submodel of mean is β = (1.0, 1.9, −2.0, 3.4, 7.2)⊤ that produce approximately a range of values for the mean given by µ ∈ (0.36, 0.98). To evaluate the performances statistics on account of nonlinearity  of  µt negligence we consider the following model specification: log 1−µ = β1 + β2 xt2 + β3 xt3 . All results t are based on 10,000 Monte Carlo replications an for each replication, we generated the response values as yt ∼ B(µt , φt ), t = 1, . . . , n. Table 3 contains numerical results for the fixed dispersion beta regression model as data generating processes. Here, we compared the performances of the statistics both under incorrect specifications and under correct specification of the nonlinear beta regression model. The results presented in this table 10

2 reveal that the P 2 and Pβγ statistics outperform the R2 statistics in identifying more emphatically the misspecification. We must emphasize that the response mean is scattered on the standard unit interval. Thus, we should not have problems to make prediction and the smaller values of P 2 statistics in comparison with the values of the R2 statistics is due to the better performance of the statistics based on residuals in identifying misspecification problems. For example, fixing the precision on φ = 400, for 2 2 n = 20, we have values of P 2 , Pβγ , RLR and RF2 C equal to 0.576, 0.601, 0.700, 0.637, respectively. For n = 40 and n = 60 the values of the statistics are 0.568, 0.593, 0.698, 0.634 and 0.562, 0.588, 0.698, 0.633, respectively. We can also notice that the values of the penalized versions of the statistics tend to be greater as the sample size increasing, what it makes sense. Figure 5 summarizes the predictive power measure performance with boxplots over the Monte Carlo replics. The boxplots clearly show the statistical significance of the performance differences between the 2 measures. The outperformance of the P 2 and Pβγ statistics in identifying misspecification is more clear when we analyzed the plot. When the sample size increases, the distributions of the statistics based on residuals tend been concentrated in small values. For the other hand, the distributions of the R2 statistics tend been concentrated at the same values, considerably greater than the values of the P 2 and 2 Pβγ statistics. Figure 6 summarizes the empirical distribution behavior of predictive power measure when n = 60. 2 The graphs show that the median performance of RLR is significantly worse than the performance of the 2 2 P and Pβγ measures. However, under true specification the statistics perform equally well and as the precision of the model increases the values of the statistics tend being concentrated close to one. Also, we notice that the performance comparison among different levels of φ shows a systematic increase of the power performance.

Table 3: Values of the statistics. True model: g(µt) = β1 + xβt22 + β3 log(xt3 − β4 ) + xβt35 , xt2 ∼ U(1, 2), xt3 ∼ U(4.5, 34.5), β = (1.0, 1.9, −2.0, 3.4, 7.2)⊤, µ ∈ (0.36, 0.98), t = 1, . . . , n, φ fixed. Misspecification: : g(µt) = β1 + β2 xt2 + β3 xt3 (omitted nonlinearity). Estimated Model

With misspecification: g(µt ) = β1 + β2 xt2 + β3 xt3

n

20

φ→

20

50

Correctly

40 150

400

20

50

60 150

400

20

50

60 150

400

50

150

400

2

0.485

0.535

0.564

0.576

0.438

0.508

0.550

0.568

0.420

0.496

0.543

0.562

0.849

0.936

0.975

Pc2

0.388

0.448

0.483

0.497

0.391

0.467

0.513

0.532

0.388

0.469

0.518

0.539

0.835

0.930

0.973

2 Pβγ

0.502

0.556

0.588

0.601

0.456

0.531

0.575

0.593

0.439

0.520

0.568

0.588

0.849

0.936

0.975

2 Pβγ c

0.409

0.473

0.511

0.526

0.411

0.492

0.539

0.559

0.409

0.494

0.545

0.566

0.835

0.930

0.973

R2 LR

0.578

0.647

0.684

0.700

0.563

0.639

0.681

0.698

0.557

0.636

0.680

0.698

0.883

0.953

0.982

R2 LRc

0.499

0.581

0.625

0.643

0.526

0.608

0.654

0.673

0.533

0.616

0.662

0.681

0.863

0.945

0.979

P

R2 FC R2 RC

c

0.486

0.574

0.619

0.637

0.448

0.556

0.612

0.634

0.437

0.550

0.609

0.633

0.879

0.951

0.981

0.389

0.494

0.548

0.569

0.402

0.519

0.580

0.604

0.407

0.526

0.588

0.613

0.867

0.946

0.979

Nonlinearity on dispersion model: The last simulations consider two nonlinear submodels both to mean and dispersion, namely:   µt log = β1 + xβt 2 and log (φt ) = γ1 + ztγ2 . 1 − µt 11

0.8 0.7 0.6 0.5 0.4

0.5

0.6

0.7

0.8

n = 40

0.4

µ ∈ (0.36,0.98), φ = 150

n = 20

P 2βγ (a)

P2

P 2βγ (b)

P2

R 2L R

n = 60

R 2L R

0.8 0.7 0.6 0.5 0.4

0.4

0.5

0.6

0.7

0.8

n = 200

P 2βγ (c)

P2

P 2βγ (d)

P2

R 2L R

R 2L R

Figure 5: Misspecification: omitted nonlinearity. Estimated model: g(µt ) = β1 + β2 xt2 + β3 xt3 . True model: g(µt ) = β1 + xβt22 + β3 log(xt3 − β4 ) + xβt35 , xt2 ∼ U(1, 2), xt3 ∼ U(4.5, 34.5), β = (1.0, 1.9, −2.0, 3.4, 7.2)⊤, t = 1, . . . , n, φ = 150, µ ∈ (0.36, 0.98).

P2

P 2βγ

(a)

R L2R

0.80

0.90

1.00

φ = 400

P2

P 2βγ

(b)

R L2R

0.70

0.80

0.90

1.00

φ = 150

0.70

0.70

0.80

0.90

1.00

φ = 50

P2

P 2βγ

R L2R

(c)

Figure 6: Model correctly specified. True model: g(µt) = β1 + xβt22 + β3 log(xt3 − β4 ) + xβt35 , t = 1, . . . , n, xt2 ∼ U(1, 2), xt3 ∼ U(4.5, 34.5), β = (1.0, 1.9, −2.0, 3.4, 7.2)⊤. Range of values for mean µ ∈ (0.36, 0.98). We fixed: n = 400, β = (−1.1, 1.7)⊤ , xt ∼ U(0.3, 1.3); (µ ∈ (0.28, 0.61)), zt ∼ U(0.5, 1.5) and we varying γ such that γ = (2.6, 3.0)⊤; λ ≈ 25, γ = (1.6, 3.1)⊤ ; λ ≈ 29, γ = (0.9, 3.2)⊤ ; λ ≈ 35 and 2 2 γ = (−0.3, 3.9)⊤ ; λ ≈ 100, t = 1, . . . , n. In Figure 7 we present the boxplots of the P 2 , Pβγ and RLR   µt = β1 + β2 xt and log (φt ) = under negligence of nonlinearity, that is the estimated model is log 1−µ t 12

γ1 + γ2 zt . Based on this figure we notice that once again the statistics based on residuals outperform 2 the R2 statistics since that the values of the P 2 and Pβγ are considerably smaller than the values of the 2 RLR statistic, in especial when the nonconstant dispersion is more several (when λ increases). γ = (2.6,3.0), λ = 25

γ = (1.6,3.1), λ = 29

0.5 0.1

0.3

0.5 0.3 0.1

µ ∈ (0.28,0.61), n = 400

0.7

Incorrect specification

0.7

Incorrect specification

P 2βγ (a)

P2

R 2L R

P 2βγ (b)

P2

γ = (0.9,3.2), λ = 34

γ = (−0.2,3.9), λ = 100

0.5 0.3 0.1

0.1

0.3

0.5

0.7

Incorrect specification

0.7

Incorrect specification

R 2L R

P2

P 2βγ (c)

P2

R 2L R

P 2βγ (d)

R 2L R

Figure 7: Misspecificated model: g(µt ) = β1 + β2 xt , h(φt ) = γ1 + γ2 zt . True model: g(µt ) = β1 + xβt 2 , h(φt ) = γ1 + ztγ2 , xt ∼ U(0.3, 1.3), zt ∼ U(0.5, 1.5), β = (−1.1, 1.7)⊤ , t = 1, . . . , n, µ ∈ (0.28, 0.61), n = 400. Figure 8 summarizes the predictive power measure performance with boxplots over the Monte Carlo replics. Here, we evaluated the empirical distribution of the RF2 C statistics under nonconstant dispersion for two models estimated correctly. The plots reveals evidences that the pseudo-R2 measures are not a good statistics for model selection when the dispersion varying along the observations. It is clear that 2 P 2 and Pβγ measures become more powerful as the λ increases.

4

Applications

In what follows we shall present an application based on real data. Application I: The application relates to the distribution of natural gas for home usage (e.g., in water heaters, ovens and stoves) in S˜ao Paulo, Brazil. Such a distribution is based on two factors: the simultaneity factor (F ) and the total nominal power of appliances that use natural gas, computed power Qmax . Using these factors one obtains an indicator of gas release in a given tubulation section, namely: Qp = F × Qmax . The simultaneity factor assumes values in (0, 1), and can be interpreted as the probability of simultaneous appliances usage. Thus, based on F the company that supplies the gas decides how much gas to supply to a given residential unit. 13

Correct specification

λ = 25

λ = 100

0.80 0.75 0.70 0.65 0.55

0.60

µ ∈ (0.28,0.69), n = 400

0.70 0.65 0.60

0.50

0.55

µ ∈ (0.28,0.69), n = 400

0.75

0.85

Correct specification

P 2βγ

P 2βγc

R L2R

R L2R c

R F2C

R F2C c

P 2βγ

(a)

Figure 8: True model: g(µt ) = β1 + γ = (2.3, 5.3)⊤, n = 400.

P 2βγc

R L2R

R L2R c

R F2C

R F2C c

(b)

xβt 2 ,

h(φt ) = γ1 +

ztγ2 ,

xt ∼ U(0.3, 1.3), zt = xt , β = (−1.1, 1.7)⊤ ,

The data were analyzed by Zerbinatti (2008), obtained from the Instituto de Pesquisas Tecnol´ogicas ´ (IPT) and the Companhia de G´as de S˜ao Paulo (COMGAS). The response variable (y) are the simultaneity factors of 42 valid measurements of sampled households, and the covariate is the computed power (x1 ). The simultaneity factors ranged from 0.02 to 0.46, being the median equals 0.07. Zerbinatti (2008) modeled such data and concluded that the best performing model was the beta regression model based on logit link function for the mean and logarithmic form (log) of computed power used as covariate. However, the author shows that the beta regression model can underpredict the response. Thus, Espinheira et al. (2014) argue that it is important to have at disposal prediction intervals that can be used with beta regressions. To that end, the authors built and evaluated bootstrap-based prediction intervals for the response for the class of beta regression models. They applied the approach to the data on simultaneity factor. However, a important step in this case was the selection of the model with the best predictive power. P42 To reach2 this aim the authors used a simplified version of PRESS statistic given by P RESS = t=1 (yt − yb(t) ) /42 which selected the same model of Zerbinatti (2008). Here we 2 aim at selecting the better predictive model to the data on simultaneity factor using the P 2 and Pβγ 2 2 statistics. We also consider the RLR and RRC as measures of goodness-of-fit model. The response is the simultaneity factor and the covariate X2 is the log of computed power. We adjusted four beta regression models, considering constant and nonconstant dispersion and logit or log-log link function for µ. For the varying dispersion model we used the log link function for φ. The values of the statistics are presented in Table 4. Here, we should emphasize that the model predictive power is better when the measures P 2 2 and Pβγ are close to one. The Table 4 displays two important informations. First, we notice that by the R2 measures the 2 models equally fits well. Second, the P 2 and Pβγ measures lead to the same conclusions, selecting the beta regression model with link log-log for the mean submodel and link log for the dispersion submodel, as the best model to make prediction to the data on simultaneity factor. The maximum likelihood parameter estimates are βb1 = −0.63, βb2 = −0.31, b γ1 = 3.81 and b γ2 = 0.77. Furthermore, the estimative b of intensity of nonconstant dispersion is λ = 21.16 (see (16)), such that φbmax = 242.39 and φbmin = 11.45. Selected among the candidates the best model in a predictive perspective, we still can use the PRESS statistic to identifying which observations are more difficult to predict. In this sense, we plot the individual components of PRESSβγ versus the observations index and we added a horizontal line at P 3 nt=1 PRESSβγ t /n and singled out points that considerably exceeded this threshold. 14

Table 4: Values of the statistics from the candidate models. Data on simultaneity factor Candidate models Mean

log(µt /(1 − µt )) =

− log(− log (µt )) =

log(µt /(1 − µt )) =

− log(− log (µt )) =

submodel

β1 + β2 xt1

β1 + β2 xt1

β1 + β2 xt1

β1 + β2 xt1

log(φt ) =

log(φt ) =

Dispersion





γ1 + γ2 xt1

γ1 + γ2 xt1

P2

0.66

0.42

0.70

0.88

Pc2

0.64

0.39

0.68

0.87

2 Pβγ

0.65

0.42

0.70

0.88

2 Pβγ c R2LR R2LRc R2F C R2F C c

0.64

0.39

0.68

0.87

submodel

0.72

0.70

0.74

0.74

0.69

0.65

0.70

0.70

0.69

0.72

0.70

0.72

0.67

0.70

0.67

0.70

Figure 9 shows that the cases 3, 11, 33 and 33 arise as the observations with more predictive difficulty and are worthy of further investigation. Log−Log Varying Dispersion Model

5

33

31

11

3 0

1

2

PRESSβ, γ

4

3

0

10

20

30

40

observations indices

Figure 9: PRESS plot. Data on simultaneity factor

Application II: The second application consider a nonlinear beta regression used to modeling the proportion of killed grasshopper (y) at an assays on a grasshopper Melanopus sanguinipes with the insecticide carbofuran and the synergist piperonyl butoxide. This model was proposed by Espinheira et al. (2017) after careful building the scheme of starting values to iterative process of maximum likelihood estimation and after a meticulous residual analysis. Our aim here is applies both predictive power statistics and goodness-of-fit statistics to confirm or not the choice of the model made by residual analysis. The covariates are the dose of the insecticide (x1 ) and the dose of the synergist (x2 ). The data can be found in McCullagh and Nelder (1989, p. 385). Additionally, y ∈ [0.04, 0.84], µy = 0.4501 and the median of the response is equal to 0.4967. 15

The model selected with its estimates and respective p-values is present √ in Espinheira et al. (2017) t2 and and is given by log(µt /1 − µt ) = β1 + β2 log (xt1 − β3 ) + β4 xt2x+β φt = γ1 + γ2 xt1 + γ3 xt2 , t = 5 1, . . . , 15. Now, by using residual analysis as diagnostic tools several linear beta regression models are compared with nonlinear √ model. After competition the nonlinear model: log(µt /1 − µt ) = β1 + β2 log (xt1 + 1.0) + β3 xt2 φt = γ1 + γ2 x1t t = 1, . . . , 15 was selected. The estimatives of parameters are βb1 = −4.25; βb2 = 1.79; βb3 = 0.04; b γ1 = 1.19 and b γ2 = 0.19. The values of the P 2 and R2 measures for the two candidate models are present Table 5. Based on this table we can note that the nonlinear model outperforms the linear model under all statistics, that is, the nonlinear model has both better predictive power and better goodness-of-fit. With aim in identifying observations for which to make prediction can be a hard task we plot values of PRESS statistic versus indices of the observations. In Figure 10 it is noteworthy how the case 14 is strongly singled out. In fact, this case was also singled out in plots of residual analysis made by Espinheira et al. (2017). However, the observation 14 is not an influential case, in sense of to affect inferential results. Besides, the choose model was capable to estimated well this case. Thus, we confirm by the model selection measures that the beta nonlinear model proposed by Espinheira et al. (2017) is a suitable alternative to modeling of the data of insecticide carbofuran and the synergist piperonyl butoxide McCullagh and Nelder (1989, p. 385). Table 5: Values of the statistics from the candidate models. Data on insecticide. Candidate models Linear Models

Nolinear models

Mean

log(µt /(1 − µt )) =

log(µt /(1 − µt )) =

submodel Dispersion

β1 + β2 log (xt1 + 1.0) + β3 xt2 √ φt =

t2 β1 + β2 log (xt1 − β3 ) + β4 x x+β t2 5 √ φt =

submodel

γ1 + γ2 xt1

γ1 + γ2 x1 + γ3 x2 + γ4 (x1 x2 )

P2

0.89

0.99

Pc2

0.85

0.99

2 Pβγ 2 Pβγ c R2LR R2LRc R2F C R2F C c

0.89

0.99

0.86

0.99

0.83

0.99

0.70

0.99

0.79

0.97

0.71

0.94

Application III: In the latter application we will use the dataset about available chlorine fraction after weeks of manufacturing from an investigation performed at Proctor & Gamble. A certain product must have a fraction of available chlorine equal to 0.50 at the time of manufacturing. It is known that chlorine fraction of the product decays with time. Eight weeks after the production, before the product is consumed, in theory there is a decline to a level 0.49. The theory related to the problem indicates that the available chlorine fraction (y) decays according to a nonlinear function of the number of weeks (x) after fabrication of the product and unknown parameters (Draper and Smith, 1981 p. 276), given by ηt = β1 + (0.49 − β1 )exp{β2 (xt − 8)}. 16

(12)

25

Nonlinear Model

15 10 0

5

PRESSβ, γ

20

14

2

4

6

8

10

12

14

observations indices

Figure 10: PRESS plots. Data on insecticide. The level 0.49 depends on several uncontrolled factors, as for example warehousing environments or handling facilities. Thus, the predictions based on theoretical model can be not reliable. Cartons of the product were analyzed over a period aiming answer some questions like as: “When should warehouse material be scrapped?” or “When should store stocks be replaced?” According to knowledgeable chemists an equilibrium asymptotic level of available chlorine should be expected somewhere close to 0.30. From predictor based on (12) we can note that when x = 8 the nonlinear model provides a true level for the available chlorine fraction (no error), wherein η = 0.49. We consider a new logit nonlinear beta regression model. We replaced the deterministic value 0.49 by an additional parameter at the predictor. Thus, the new nonlinear predictor for mean submodel is given by ηt = β1 + (β3 − β1 )exp{−β2 (xt − 8)}, t = 1, . . . , 42. Here the available chlorine fraction ranged from 0.38 to 0.49, being the mean and median are approximately equal 0.42. We investigated some competitive models. Table 6 the results of final candidates models and its statistics. The findings reveals that the model log(µt /(1 − µt )) = β1 + (β3 − β1 )exp{β2 (xt1 − 8) and log(φt ) = γ1 + γ2 logxt1 + exp{γ3 (xt1 − 8) is the best performer in sense that it displays the higher statistics values. To estimate this model was necessary to build a starting values procedure for log-likelihood maximization as proposed by Espinheira et al. (2017). Since that we have more parameters than covariates firstly we used the theoretical information about the asymptotic level and found a initial guess to β1 equal to 0.30. Thus, based on equation we took some values to y (0) and x1 and found a initial guess to β2 , β2 = 0.02. Then we carried out the scheme of starting values to be used in nonlinear beta regression maximum likelihood estimation (Espinheira et al., 2017). The parameters estimatives are βb1 = −0.45963 βb2 = −0.04166 βb3 = 0.09479 b γ1 = 13.18335 b γ2 = −0.05413 -b γ3 = 2.63158. It is importance emphasize that the β2 estimative conduce to a level of chlorine fraction equal to 0.4896 ≈ 0.49 that for this dataset confirm the theory that there is a decline to a level 0.49.

5

Conclusion

2 In this paper we develop the P 2 and Pβγ measures based on two versions of PRESS statistics for the class 2 of beta regression models. The P coefficient consider the PRESS statistic based on ordinary residual

17

Table 6: Values of the statistics from the candidate models. Data on chlorine fraction Candidate models η1t

log(µt /(1 − µt )) = β1 +

− log(− log (µt )) = β1 +

log(µt /(1 − µt )) = β1 +

− log(− log (µt )) = β1 +

(β3 − β1 )exp{β2 (xt1 − 8)} (β3 − β1 )exp{β2 (xt1 − 8)} (β3 − β1 )exp{β2 (xt1 − 8)} (β3 − β1 )exp{β2 (xt1 − 8)} log(φt ) = γ1 + γ2 logxt1

log(φt ) = γ1 + γ2 logxt1

+exp{γ3 (xt1 − 8)}

+exp{γ3 (xt1 − 8)}

0.96

0.95

0.86

0.96

0.95

0.87

0.96

0.95

0.87

0.86

0.96

0.95

0.87

0.87

0.88

0.88

0.85

0.85

0.86

0.86

η2t





P2

0.88

0.87

Pc2

0.87

2 Pβγ

0.88

2 Pβγ c R2LR R2LRc

2 obtained from the Fisher’s scoring iterative algorithm for estimating β whereas Pβγ is based on a new residual which is a combination of ordinaries residuals from the Fisher’s scoring iterative algorithm for estimating β and γ. We have presented the results of Monte Carlo simulations carried out to evaluate the 2 performance of predictive coefficients. Additionally, to access the goodness-of-fit model we used the RLR and RF2 C measures. We consider different scenarios including misspecification of omitted covariates and negligence of varying dispersion, simultaneous increase in the number of covariates in the two submodels (mean and dispersion) and and negligence of nonlinearity. 2 In general form, the coefficients P 2 and Pβγ perform similar and both enable to identify when the model are not reliable to predict or when is more difficult to make prediction. In particular it is noteworthy that when the response values are close to one or close to zero the power predictive of the model is substantially affected even under correct specification. In these situations, the R2 statistics also revel that the model does not fit well. Other important conclusion is about the bad performance of the R2 F C for beta regression models with varying dispersion, in sense that even when the model is well specified the values of this statistic tend to be too smaller than the values of the others statistics. Finally, three empirical applications were performed and yield to a relevant information. In cases that the R2 statistics evaluate the candidate models quite equally the predictive measures were decisive to choose the best between the candidate models. But as suggestion, to selected a model even in a predictive sense way it is also important to use goodness-of-fit measures and our recomendation for the 2 class of nonlinear models is to use the version of RLR considered by Bayer and Cribari-Neto (2017) as the more appropriated model select criteria to linear beta regression models with varying dispersion.

Acknowledgement This work was supported in part by Conselho Nacional de Desenvolvimento Cient´ıfico e Tecnol´ogico (CNPq) and Funda¸c˜ao de Amparo `a Ciˆencia e Tecnologia de Pernambuco (FACEPE).

Conflict of Interest The authors have declared no conflict of interest.

18

Appendix Fisher’s scoring iterative algorithm: In what follows we shall present the score function and Fisher’s information for β and γP in nonlinear beta regression models (Simas et al., 2010). The log-likelihood function for model (2) is n given by ℓ(β, γ) = t=1 ℓt (µt , φt ), and ℓt (µt , φt ) = log Γ(φt ) − log Γ(µt φt ) − log Γ((1 − µt )φt ) + (µt φt − 1) log yt + {(1 − µt )φt − 1} log(1 − yt ). The score function for β is Uβ (β, γ) = J1⊤ ΦT (y ∗ − µ∗ ),

(13)

where J1 = ∂η1 /∂β (an n × k matrix), Φ = diag{φ1 , . . . , φn }, the tth elements of y ∗ and µ∗ being given in (7). Also, T = diag{1/g ′ (µ1 ), . . . , 1/g ′ (µn )}.. The score function for γ can be written as Uγ (β, γ) = J2⊤ Ha, where J2 = ∂η2 /∂γ (an n × q matrix), at is given in (9) and H = diag{1/h′ (φ1 ), . . . , 1/h′ (φn )}. The components of Fisher’s information matrix are ⊤ ⊤ Kββ = J1⊤ ΦW J1⊤ , Kβγ = K⊤ and Kγγ = J2⊤ DJ2⊤ . (14) γβ = J1 CT HJ2 Here, W = diag{w1 , . . . , wn }, where wt = φt vt [1/{g ′(µt )}2 ] and vt = {ψ ′ (µt φt ) + ψ ′ ((1 − µt )φt )} .

(15)

Also, C = diag{c1 , . . . , cn }; ct = φt {ψ ′ (µt φt )µt − ψ ′ ((1 − µt )φt )(1 − µt )}, D = diag{d1 , . . . , dn }; dt = ξt /(h′ (µt ))2 and ξt = ψ ′ (µt φt )µ2t + ψ ′ ((1 − µt )φt )(1 − µt )2 − ψ ′ (φt ) . To propose PRESS statistics for a beta regression we shall based on Fisher iterative maximum likelihood scheme and weighted least square regressions. Fisher’s scoring iterative scheme used for estimating β, both to linear and nonlinear regression model, can be written as (m)

(m)

β (m+1) = β (m) + (Kββ )−1 Uβ

(β).

(16)

Where m = 0, 1, 2, . . . are the iterations which are carried out until convergence. The convergence happens when the difference |β (m+1) − β (m) | is less than a small, previously specified constant. From (13), (14) and (16) it follows that the mth scoring iteration for β, in the class of linear and nonlinear regression model, can be written as β (m+1) = β (m) + (J1⊤ Φ(m) W (m) J1 )−1 J1⊤ Φ(m) T (m) (y ∗ − µ∗(m) ), where the tth elements of the vectors y ∗ and µ∗ are given in (5). It is possible rewrite this equation in terms of weighted least squares estimator as (m) (m) (m) (m) (m) β (m+1) = (J⊤ W (m) J1 )−1 Φ(m) J⊤ u1 .Here, u1 = J1 β (m) + W −1 T (m) (y ∗ − µ∗ (m) ). Upon convergence, 1Φ 1W −1 b ⊤ b bb βb = (J⊤ ΦJ1 W u1 1 Φ W J1 )

where

b −1 Tb(y ∗ − µ b∗ ). u1 = J1 βb + W

(17)

c , Tb, H b and D b are the matrices W , T , H and D, respectively, evaluated at the maximum likelihood estimates. We Here, W b 1/2 W b 1/2 u1 on Φ b 1/2 W b 1/2 J1 . note that βb in (17) can be viewed as the least squares estimates of β obtained by regressing Φ

References

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (Tsahkadsor, 1971), pp. 267–281. Budapest: Akad´emiai Kiad´ o. Allen, D. M. (1974). The relationship between variable selection and data augmentation and a method for prediction. Technometrics 16, 125–127. Bartoli, A. (2009). On computing the prediction sum of squares statistic in linear least squares problems with multiple parameter ormeasurement sets. International Journal of Computer Vision 85 (2), 133–142. Bayer, F. M. and F. Cribari-Neto (2017). Model selection criteria in beta regression with varying dispersion. Communications in Statistics, Simulation and Computation 46, 720–746. Brascum, A. J., E. O. Johnson, and M. C. Thurmond (2007). Bayesian beta regression: applications to household expenditures and genetic distances between foot-and-mouth disease viruses. Australian and New Zealand Journal of Statistics 49 (3), 287–301. Cepeda-Cuervo, E. and D. Gamerman (2005). Bayesian methodoly for modeling parameters in the two parameter exponential family. Estad´ıstica 57, 93–105.

19

Chien, L.-C. (2011). Diagnostic plots in beta-regression models. Journal of Applied Statistics 38 (8), 1607–1622. Cook, R. D. and S. Weisberg (1982). Residuals and Influence in Regression. Chapman and Hall. Cribari-Neto, F. and A. Zeileis (2010). Beta regression in R. Journal of Statistical Software 34 (2), 1–24. Espinheira, P., S. Ferrari, and F. Cribari-Neto (2008a). On beta regression residuals. Journal of Applied Statistics 35 (4), 407–419. Espinheira, P. L., S. Ferrari, and F. Cribari-Neto (2014). Bootstrap prediction intervals in beta regressions. Computational Statistics 29 (5), 1263–1277. Espinheira, P. L., S. L. P. Ferrari, and F. Cribari-Neto (2008b). Influence diagnostics in beta regression. Computational Statistics and Data Analysis 52, 4417–4431. Espinheira, P. L., E. G. Santos, and F. Cribari-Neto (2017). On nonlinear beta regression residuals. Biometrical Journal n/a(n/a), n/a–n/a. Ferrari, S. and F. Cribari-Neto (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics 31 (7), 799–815. Figuero-Z´ un ˜iga, J. I., R. B. Arellano-Valle, and S. L. Ferrari (2013). Mixed beta regression: A bayesian perspective. Computational Statistics and Data Analysis 61, 137–147. McCullagh, P. and J. A. Nelder (1989). Generalized Linear Models (2 ed.). London: Chapman and Hall. Mediavilla, F., L. F, and V. A. Shah (2008). A comparison of the coefficient of predictive power, the coefficient of determination and aic for linear regression. In K. JE (Ed.), Decision Sciences Institute, Atlanta, pp. 1261–1266. Nagelkerke, N. (1991). A note on a general definition of the coefficient of determination. Biometrika 78 (3), 691–692. Ospina, R., F. Cribari-Neto, and K. L. Vasconcellos (2006). Improved point and interval estimation for a beta regression model. Computational Statistics & Data Analysis 51 (2), 960 – 981. Palmer, P. B. and D. G. O’Connell (2009, sep). Regression analysis for prediction: Understanding the process. Cardiopulmonary Physical Therapy Journal 20 (3), 23–26. Pregibon, D. (1981, 07). Logistic regression diagnostics. The Annals of Statistics 9 (4), 705–724. Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics 6 (2), 461–464. Simas, A. B., W. Barreto-Souza, and A. V. Rocha (2010). Improved estimators for a general class of beta regression models. Computational Statistics & Data Analysis 54 (2), 348–366. Smithson, M. and J. Verkuilen (2006). A Better Lemon Squeezer? Maximum-Likelihood Regression With Beta-Distributed Dependent Variables. Psychological Methods 11 (1), 54–71. Spiess, A.-N. and N. Neumeyer (2010). An evaluation of r2 as an inadequate measure for nonlinear models in pharmacological and biochemical research: a monte carlo approach. BMC Pharmacology 10 (1), 6. Zerbinatti, L. (2008). Predi¸ca˜o de fator de simultaneidade atrav´es de modelos de regress˜ao para propor¸co˜es cont´ınuas. Msc thesis, University of S˜ ao Paulo.

20