Model selection in quantile regression models

Rahim Alhamzawi*
Department of Mathematical Sciences, School of Information Systems, Computing and Mathematics, Brunel University, Uxbridge UB8 3PH, UK
Al-Qadisiyah University, Al Diwaniyah, Iraq

Abstract

Lasso methods are regularization and shrinkage methods widely used for subset selection and estimation in regression problems. From a Bayesian perspective, the Lasso-type estimate can be viewed as a Bayesian posterior mode under independent Laplace prior distributions on the coefficients of the independent variables (Park and Casella, 2008). A scale mixture of normal priors can also provide adaptive regularization and represents an alternative to the Bayesian Lasso-type model. In this paper, we assign a normal prior with mean zero and unknown variance to each quantile coefficient of the independent variables. A simple MCMC-based computation technique is then developed for quantile regression models with continuous, binary and left-censored outcomes. Based on the proposed prior, we propose a criterion for model selection in quantile regression models. The proposed criterion can be applied to classical least squares, classical quantile regression (QReg), classical Tobit QReg and many others; for example, it can be applied to rq(), lm() and crq(), and is available in an R package called Brq. Through simulation studies and the analysis of a prostate cancer dataset, we assess the performance of the proposed methods. Both confirm that our methods perform well compared to other approaches.

Keywords: Bayesian quantile regression, Binary, Lasso, Scale mixture of normals, Tobit.

Corresponding author. Email address: [email protected] (Rahim Alhamzawi)

September 2, 2014

1. Introduction

Since the pioneering work of Koenker and Bassett (1978), QReg has attracted considerable theoretical attention as well as numerous practical applications in fields such as agriculture (Davidova and Kostov, 2013), climate change (Reich, 2011), ecology (Cade et al., 2008), economics (Komunjer, 2005), growth charts (Wei and He, 2006) and survival analysis (Yang, 1999). Quantile regression (QReg) models have emerged as flexible and robust statistical models, widely used in the presence of non-normal errors or outliers. QReg models can provide a more complete investigation of the relationship between an outcome variable and a set of predictors than ordinary mean regression does (Koenker, 2005). Suppose y_i, i = 1, ..., n, is a continuous outcome variable and x_i is a k × 1 vector of predictors. Then the linear QReg model for the pth quantile is y_i = x_i'β + ε_i, where the residuals ε_i are restricted so that ∫_{−∞}^{0} f_p(ε_i) dε_i = p. Following Koenker and Bassett (1978), the k × 1 vector of QReg coefficients β can be found as the solution to the problem

min_β ∑_{i=1}^{n} ρ_p(y_i − x_i'β),    (1)

where ρ_p(h) = h{p − I(h < 0)} is the check function and I(·) denotes the usual indicator function. Koenker and Machado (1999) noted that minimizing (1) is equivalent to maximizing the likelihood under the assumption that the residuals come from an asymmetric Laplace (AL) distribution. Yu and Moyeed (2001) described a Bayesian analysis of QReg in which the errors are given independent AL distributions. This Bayesian approach was later extended by many authors, and the evidence shows that the AL distribution is a working model resting on artificial assumptions (Yuan and Yin, 2010). For example, the approach was extended to Bayesian Tobit and binary QReg by Yu and Stander (2007) and Benoit and Poel (2011), respectively, while Geraci and Bottai (2007) and Yuan and Lin (2005) developed Bayesian QReg methods for longitudinal data using the AL distribution for the errors.

The AL distribution can also be motivated as a scale mixture of normals (Reed and Yu, 2009; Kozumi and Kobayashi, 2011). This motivation connects the linear QReg model for the response variable to a normal linear regression model. If we assume that ε_i ∼ N((1 − 2p)v_i, 2σv_i), where v_i is a mixing variable and σ is a scale parameter, then the AL distribution for ε_i arises when v_i ∼ Exp(σ^{−1}p(1 − p)). Here, N(m, τ²) and Exp(α) denote a normal distribution with mean m and variance τ² and an exponential distribution with rate parameter α, respectively. Under this representation, the QReg coefficient vector β has desirable conditional conjugacy properties, allowing a simple and efficient MCMC algorithm for fitting the model to the data.

QReg has recently been used in variable selection procedures (Wang et al., 2007; Li and Zhu, 2008; Wu and Liu, 2009; Li et al., 2010; Bradic et al., 2011; Alhamzawi et al., 2012). As in standard mean regression models, identifying the active predictors plays a central role in constructing a QReg model. Removing inactive predictors from the selection process improves prediction performance and highlights the predictors that matter most in the regression. However, classical variable selection techniques such as AIC and BIC are usually highly time consuming, because the number of candidate subsets grows exponentially with k, and they also suffer from inherent instability (Chen and Dunson, 2003).
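The check function and the scale-mixture representation above can be illustrated with a short Python sketch. The parameter values (p = 0.25, σ = 1) are illustrative choices, not values from the paper; the sketch simply verifies that draws built from the normal-exponential mixture place probability p below zero, as the AL distribution requires.

```python
import numpy as np

rng = np.random.default_rng(0)

def check_loss(h, p):
    """Koenker-Bassett check function rho_p(h) = h * (p - I(h < 0))."""
    return h * (p - (h < 0))

# Scale-mixture representation of the asymmetric Laplace distribution:
# eps | v ~ N((1 - 2p) v, 2 sigma v),  v ~ Exp(rate = p (1 - p) / sigma).
p, sigma, n = 0.25, 1.0, 200_000
v = rng.exponential(scale=sigma / (p * (1 - p)), size=n)
eps = (1 - 2 * p) * v + np.sqrt(2 * sigma * v) * rng.standard_normal(n)

# The p-th quantile of the resulting AL distribution is 0, so roughly a
# fraction p of the simulated residuals fall below 0.
print(np.mean(eps < 0))
```

Minimizing the check loss over candidate coefficients is then equivalent to maximizing the AL likelihood built from these residuals.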
Several stochastic search variable selection (SSVS) approaches for finding the significant variables in QReg models have been proposed recently; see, for example, Alhamzawi and Yu (2011), Ji et al. (2012) and Yu et al. (2013). However, there is no guarantee that all subsets will be visited when k is large, so many candidate models are effectively never considered (Kinney and Dunson, 2007). In addition,

SSVS is computationally very demanding when there are many regressors, which is typical in areas such as genetic studies (Griffin and Brown, 2010a). Similarly to Park and Casella (2008), who gave a Bayesian interpretation of the Lasso estimator (Tibshirani, 1996), Li et al. (2010) developed Bayesian regularized QReg by specifying an independent Laplace prior on each QReg coefficient. However, this approach relies on shrinkage rather than selection, in that irrelevant predictors are not set exactly to zero (Bae and Mallick, 2004). A class of variable selection techniques that has attracted considerable interest recently employs adaptive shrinkage models, which assume a mixture prior distribution that favors sparseness (Bae and Mallick, 2004; Yi and Xu, 2008; Griffin and Brown, 2010b, 2012). Bae and Mallick (2004) illustrate the importance of adaptive shrinkage models for selecting a small number of influential genes from high-dimensional gene data. They employed a zero-mean normal prior for the regression coefficients with unknown variances. However, the complicated correlation structure among predictors can prevent many irrelevant predictors from being set exactly to zero. In this paper, we assign a zero-mean normal prior to the quantile regression (QReg) coefficients with unknown variances. Based on this prior, we propose a criterion for variable selection in QReg models with continuous, binary and left-censored outcomes. The proposed criterion can be applied to classical least squares, classical QReg, classical Tobit QReg and many others. The rest of this paper is organized as follows. Section 2 presents the model and a two-level prior distribution. We outline the Bayesian MCMC algorithm in Section 2.1, and in Section 3 we propose the model selection criterion.
In Section 4, we carry out simulation examples to test the performance of the proposed criterion and our Bayesian QReg approach, and in Section 5 the proposed criterion is illustrated on the popular prostate cancer (PC) dataset. A brief summary follows in Section 6.


2. The model and prior assumptions

In this paper, to conduct Bayesian inference we assume that ε_i | v_i, σ ∼ N((1 − 2p)v_i, 2σv_i) and v_i | σ ∼ Exp(σ^{−1}p(1 − p)), which is equivalent to assigning an asymmetric Laplace distribution (ALD) to ε_i, i = 1, 2, ..., n. It is worth pointing out that assuming the ALD for ε_i is merely an artificial assumption used to achieve a parametric connection between the minimization in (1) and maximum likelihood theory. Thus, at the pth quantile, the conditional distribution of the ith observation is N(x_i'β + (1 − 2p)v_i, 2σv_i). Here, we use the mixture representation of the ALD to facilitate computation from the posterior via an efficient MCMC-based method. Khare and Hobert (2012) proved that this mixture representation yields an MCMC algorithm that converges at a geometric rate.

To conduct Bayesian inference, we need to assign priors for β and σ. In this paper, we place adaptive shrinkage on β by assuming β ∼ N(0, 2RW), where R = diag(r²_{y,x1}, ..., r²_{y,xk}) is a diagonal constant matrix, W = diag(w_1, ..., w_k), and r_{y,xj} is the correlation coefficient between the outcome of interest y and the predictor x_j. We define the prior of β_j, j = 1, 2, ..., k, using the identity (Andrews and Mallows, 1974)

exp{−|ab|} = ∫_0^∞ (a/√(2πw)) exp{−(a²w + b²w^{−1})/2} dw,  a > 0.    (2)

Letting a = 1/√2, b = β_j/(√2 |r_{y,xj}|) and w = w_j yields

(1/(4|r_{y,xj}|)) exp{−|β_j|/(2|r_{y,xj}|)}
  = ∫_0^∞ (1/(4√(4π w_j r²_{y,xj}))) exp{−(w_j/4 + β_j²/(4 w_j r²_{y,xj}))} dw_j
  = ∫_0^∞ N(β_j; 0, 2w_j r²_{y,xj}) Exp(w_j; 1/4) dw_j.    (3)
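As a quick numerical check of (3), the following Python sketch draws from the two-level prior and confirms that the marginal of β_j is Laplace with scale 2|r_{y,xj}|, so that E|β_j| = 2|r_{y,xj}|. The value r = 0.6 is an illustrative assumption, not a value from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-level prior of (3): beta_j | w_j ~ N(0, 2 w_j r^2), w_j ~ Exp(rate 1/4).
# Marginally, beta_j is Laplace with scale 2|r|, hence E|beta_j| = 2|r|.
r = 0.6                                          # illustrative correlation r_{y,xj}
n = 500_000
w = rng.exponential(scale=4.0, size=n)           # Exp with rate 1/4 has mean 4
beta = rng.normal(0.0, np.sqrt(2.0 * w * r**2))  # N(0, 2 w_j r^2)

print(np.mean(np.abs(beta)))  # should be close to 2 * 0.6 = 1.2
```

A weaker correlation |r| gives a smaller Laplace scale and therefore heavier shrinkage of β_j, which is exactly the adaptive behaviour discussed below.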

The model (4) can be seen as a version of the Bayesian Lasso in which w_j is supported by the power of the correlation between the outcome of interest y and the predictor x_j. This two-level prior distribution is attractive because there is no need for the penalty parameter of the Bayesian Lasso model (Park and Casella, 2008; Li et al., 2010), nor for specifying the hyper-parameters of the two-level model of Bae and Mallick (2004). The combination of w_j and r²_{y,xj} plays an active role in shrinking the regression coefficients: a weak correlation between y and the predictor x_j induces, through w_j, more shrinkage of β_j. For the scale parameter σ, we adopt a prior of the form p(σ) ∝ σ^{−a01−1} exp{−a02/σ}. In summary, the proposed hierarchical Bayesian QReg model is given by

y_i | v_i, σ ∼ N(x_i'β + (1 − 2p)v_i, 2σv_i),
v_i | σ ∼ Exp(σ^{−1}p(1 − p)),
β_j ∼ N(0, 2w_j r²_{y,xj}),    (4)
w_j ∼ Exp(1/4),
p(σ) ∝ σ^{−a01−1} exp{−a02/σ}.

The above model yields an efficient MCMC algorithm that samples v_i, w_j, σ, and β from their full conditional distributions.

2.1. Posterior inference

The posterior distribution of β, σ, w = (w_1, ..., w_k)' and v = (v_1, v_2, ..., v_n)' can be updated using a simple and efficient MCMC-based computation technique. Let InvGamma and InvG denote the Inverse-Gamma and Inverse-Gaussian distributions, respectively.

• Updating β

The full conditional distribution of β is N_k(µ_β, Σ_β), where

Σ_β = (X'V X + Λ^{−1})^{−1}  and  µ_β = Σ_β X'V(y − ξv).    (5)

Here, ξ = 1 − 2p, N_k denotes a k-dimensional multivariate normal distribution, V = diag((2σv_1)^{−1}, (2σv_2)^{−1}, ..., (2σv_n)^{−1}) and Λ = 2RW, where R = diag(r²_{y,x1}, ..., r²_{y,xk}) is a diagonal matrix.

• Updating v_i^{−1}

The full conditional distribution of each v_i^{−1}, i = 1, ..., n, is InvG(ϕ_i, θ), where ϕ_i = 1/√((y_i − x_i'β)²) and θ = 1/(2σ).

• Updating w_j^{−1}

The full conditional distribution of each w_j^{−1}, j = 1, ..., k, is InvG(ϕ_j, θ), where ϕ_j = √(r²_{y,xj}/β_j²) and θ = 1/2.

• Updating σ

σ | β, v ∼ InvGamma(G_1, G_2), where G_1 = 3n/2 + a01 and G_2 = (y − Xβ − ξv)'V_0(y − Xβ − ξv)/4 + p(1 − p)∑_{i=1}^{n} v_i + a02, where V_0 = diag(v_1^{−1}, v_2^{−1}, ..., v_n^{−1}).
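The full conditional distributions above can be sketched as a compact Gibbs sampler in Python. The function name, the hyper-parameter defaults a01 = a02 = 0.1, and the small numerical guards are our own illustrative assumptions; the paper's actual implementation is the R package Brq.

```python
import numpy as np
from scipy.stats import invgauss, invgamma

def gibbs_qreg(y, X, p, n_iter=2000, a01=0.1, a02=0.1, seed=0):
    """Sketch of the Gibbs sampler of Section 2.1 (illustrative, not the Brq code)."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    xi = 1.0 - 2.0 * p
    # Squared correlations r^2_{y,xj} forming R = diag(r^2)
    r2 = np.array([np.corrcoef(y, X[:, j])[0, 1] ** 2 for j in range(k)])
    beta, v, w, sigma = np.zeros(k), np.ones(n), np.ones(k), 1.0
    draws = np.empty((n_iter, k))
    for it in range(n_iter):
        # beta | . ~ N_k(mu, Sigma), Sigma = (X'VX + Lambda^{-1})^{-1}, Lambda = 2RW
        V = 1.0 / (2.0 * sigma * v)
        lam_inv = 1.0 / (2.0 * r2 * w)
        Sigma = np.linalg.inv(X.T @ (X * V[:, None]) + np.diag(lam_inv))
        mu = Sigma @ (X.T @ (V * (y - xi * v)))
        beta = rng.multivariate_normal(mu, Sigma)
        # v_i^{-1} | . ~ InvGauss(mean = 1/|y_i - x_i'beta|, shape = 1/(2 sigma))
        resid = np.abs(y - X @ beta) + 1e-10
        shape_v = 1.0 / (2.0 * sigma)
        v = 1.0 / invgauss.rvs((1.0 / resid) / shape_v, scale=shape_v, random_state=rng)
        # w_j^{-1} | . ~ InvGauss(mean = sqrt(r^2_j / beta_j^2), shape = 1/2)
        mu_w = np.sqrt(r2 / (beta ** 2 + 1e-10))
        w = 1.0 / invgauss.rvs(mu_w / 0.5, scale=0.5, random_state=rng)
        # sigma | . ~ InvGamma(3n/2 + a01, G2)
        G2 = np.sum((y - X @ beta - xi * v) ** 2 / (4.0 * v)) \
             + p * (1.0 - p) * v.sum() + a02
        sigma = invgamma.rvs(1.5 * n + a01, scale=G2, random_state=rng)
        draws[it] = beta
    return draws
```

SciPy's `invgauss` is parameterised so that `invgauss.rvs(m / lam, scale=lam)` has mean m and shape lam, which is how the InvG(ϕ, θ) draws above are obtained.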

2.2. Model selection in binary and Tobit QReg

Binary and Tobit QReg are important special cases of QReg, widely used in bioinformatics, ecology, economics, education, geology and medicine. A serious challenge in binary and Tobit QReg lies in the identification of the active independent variables in

regression. So far, AIC and BIC have not been developed for binary and Tobit QReg models, and such criteria are in any case highly time consuming because the number of candidate subsets grows exponentially with k. Ji et al. (2012) proposed an SSVS algorithm to find the active independent variables in Tobit and binary QReg. However, when k is large this algorithm is computationally very demanding, and there is no guarantee that the best model will be visited. Benoit et al. (2013) proposed Bayesian Lasso binary QReg, which can be extended easily to Bayesian Lasso Tobit QReg. Because no point mass at 0 is assigned in Bayesian Lasso approaches, posterior draws of the inactive coefficients are never exactly 0, so ad hoc thresholding rules must be used to identify the active coefficients. The procedure in Section 2 can be employed, with some modifications, to identify the active coefficients in binary and Tobit QReg models. At the pth quantile, the binary and Tobit QReg models are defined as in Manski (1975, 1985) and Powell (1986), respectively,

y_i* = x_i'β + ε_i,  and  y_i = η(y_i*),    (6)

where η(·) is a link function. For a binary QReg model, η(y_i*) = I(y_i* > 0), and for a Tobit QReg model, η(y_i*) = max{y⁰, y_i*}, where y⁰ is a known censoring point. Following Manski (1975, 1985) and Powell (1986), the k × 1 vector of QReg coefficients can be found as the solution to the problem

min_β ∑_{i=1}^{n} ρ_p(y_i − η(x_i'β)),    (7)

Yu and Stander (2007) and Benoit and Poel (2011) developed Bayesian approaches for Tobit and binary quantile regression, respectively. Based on the model in (4), we propose the


following hierarchical Bayesian modelling for Tobit and binary quantile regression models

y_i = η(y_i*),
y_i* | v_i, σ ∼ N(x_i'β + (1 − 2p)v_i, 2σv_i),
v_i | σ ∼ Exp(σ^{−1}p(1 − p)),
β_j ∼ N(0, 2w_j r²_{y,xj}),    (8)
w_j ∼ Exp(1/4),
p(σ) ∝ σ^{−a01−1} exp{−a02/σ}.

Under model (8), the proposed Gibbs sampler of Section 2.1 can be employed to find promising subsets in binary and Tobit QReg models, by replacing y_i everywhere with y_i* and sampling y_i* from its full conditional distribution. Based on model (8), the full conditional distributions of y_i* in the binary and Tobit QReg models are, respectively,

y_i* | y_i, β, v_i, σ ∼ N(x_i'β + ξv_i, 2σv_i) I(y_i* > 0) if y_i = 1, and N(x_i'β + ξv_i, 2σv_i) I(y_i* ≤ 0) otherwise;

y_i* | y_i, β, v_i, σ ∼ δ(y_i) if y_i > y⁰, and N(x_i'β + ξv_i, 2σv_i) I(y_i* ≤ y⁰) otherwise,

where δ(·) denotes a degenerate distribution.
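The data-augmentation step above can be sketched in Python as a single update that draws y_i* from the appropriate truncated normal. The function name and its signature are our own illustrative choices under the parameterisation of Section 2.

```python
import numpy as np
from scipy.stats import truncnorm

def sample_latent(y, Xbeta, v, sigma, p, kind="tobit", y0=0.0, rng=None):
    """One latent-variable step for model (8): draw y* | y, beta, v, sigma.

    y*_i | v_i, sigma is N(x_i'beta + xi v_i, 2 sigma v_i), truncated
    according to the observed y_i (illustrative sketch, not the Brq code).
    """
    rng = np.random.default_rng() if rng is None else rng
    xi = 1.0 - 2.0 * p
    m = Xbeta + xi * v              # conditional mean
    s = np.sqrt(2.0 * sigma * v)    # conditional standard deviation
    ystar = np.empty_like(m)
    for i in range(len(m)):
        if kind == "binary":
            # y_i = 1 -> y*_i > 0 ; y_i = 0 -> y*_i <= 0
            if y[i] == 1:
                a, b = (0.0 - m[i]) / s[i], np.inf
            else:
                a, b = -np.inf, (0.0 - m[i]) / s[i]
            ystar[i] = truncnorm.rvs(a, b, loc=m[i], scale=s[i], random_state=rng)
        else:  # tobit: uncensored observations are degenerate at y_i
            if y[i] > y0:
                ystar[i] = y[i]
            else:
                b = (y0 - m[i]) / s[i]
                ystar[i] = truncnorm.rvs(-np.inf, b, loc=m[i], scale=s[i],
                                         random_state=rng)
    return ystar
```

Alternating this step with the updates of Section 2.1 (applied to y*) gives the full sampler for binary and Tobit QReg.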

3. Variable selection criterion

Although no point mass at 0 is assigned in Bayesian Lasso approaches, these approaches still provide important summaries for assessing the importance of each independent variable in the regression. For example, credible intervals (CrIs) can be used to flag inactive independent variables when 0 lies within the CrI (Fahrmeir et al., 2010). The posterior estimates of β_j and of the heritability h_j² = (σ_{xj}/σ_y × β_j)² can also be employed to guide model selection (Yi and Xu, 2008). Here, σ_{xj} is the standard deviation of the jth independent variable and σ_y is the standard deviation of the outcome variable. In genetic studies, Bae and Mallick (2004) used the posterior mean of w_j to select significant variables, deleting each variable with w_j < 10^{−12}. The main drawback of this criterion is that only a very small number of genes can satisfy it; in fact, the complicated correlation structure among independent variables can prevent many inactive independent variables from satisfying it. Hoti and Sillanpää (2006) suggested including the variable x_j if its standardised effect is greater than 0.1, that is, σ_{xj}/σ_y × β_j² > 0.1. Nevertheless, variable selection criteria remain a serious challenge, mainly because of the complicated correlation structure among variables. In this paper, we construct an alternative criterion based on σ_y, σ_{xj}, β_j, w_j and r_{y,xj} for j = 1, 2, ..., k. From (8), if we set r_{y,xj} = 1, then β_j ∼ N(0, 2w_j).

Since the normal prior is a location-scale family, we can convert β_j to the standard normal via z_j = β_j/√(2w_j). The Bayesian Lasso methods shrink inactive coefficients towards zero, and hence the posterior estimates of β_j and z_j can guide variable selection. In practice, many inactive variables have |z_j| less than 0.2, suggesting the rule that x_j is excluded if z*_{1j} < 0.2, where z*_{1j} = |z_j|. However, in extensive simulation studies we found that some inactive variables have z*_{1j} > 0.2 and are very difficult to exclude from the final model. To remedy this, we reduce the value of z*_{1j} by setting z*_{2j} = |z_j||r_{y,xj}| and excluding the variable x_j if z*_{2j} < 0.2. Clearly, a weak correlation between y and the variable x_j, j = 1, ..., k, effectively supports the criterion z*_{2j} < 0.2 in excluding inactive variables from the model. However, because some active variables may be only weakly related to the response variable, such variables could be excluded by the criterion z*_{2j} < 0.2, which is undesirable. To overcome this problem, we combine the criterion z*_{1j} < 0.2 with z*_{2j} < 0.2, so that any variable satisfying z*_{1j} + z*_{2j} < 0.4 can be excluded from the model. For greater accuracy, we add the criterion of Hoti and Sillanpää (2006), σ_{xj}/σ_y × β_j² < 0.1, to the criterion z*_{1j} + z*_{2j} < 0.4.

Now, to judge whether the variable x_j should be included or excluded, we propose the variable selection criterion

z*_{1j} + z*_{2j} + β_j² σ_{xj}/σ_y > 0.5.    (9)

Equivalently, we may write (9) as

|z_j| (1 + |r_{y,xj}| + |β_j| √(2w_j) σ_{xj}/σ_y) > 0.5.    (10)

Any variable satisfying (10) is included in the final model. The criterion (10) can be applied during the MCMC iterations, or after finding the posterior median or mean of β_j and then sampling w_j based on this estimate. Therefore, it can easily be linked to other methods, such as classical least squares, classical QReg, Tobit QReg, Tobit regression, logistic regression, and many others. For example, the proposed criterion is fitted to lm(), rq() and crq() and is available in an R package called Brq.
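Given posterior summaries of β_j and w_j, criterion (10) reduces to a simple threshold rule. The following Python sketch shows it applied to hypothetical posterior summaries; the function name and all numeric inputs are illustrative assumptions (the Brq package's interface differs).

```python
import numpy as np

def select_variables(beta, w, r, sd_x, sd_y, threshold=0.5):
    """Criterion (10): keep x_j when
    |z_j| (1 + |r_j| + |beta_j| sqrt(2 w_j) sd_xj / sd_y) > threshold,
    with z_j = beta_j / sqrt(2 w_j)."""
    beta, w, r = map(np.asarray, (beta, w, r))
    z = np.abs(beta) / np.sqrt(2.0 * w)
    score = z * (1.0 + np.abs(r) + np.abs(beta) * np.sqrt(2.0 * w) * sd_x / sd_y)
    return score > threshold

# Hypothetical posterior summaries for three standardised predictors:
keep = select_variables(beta=[1.8, 0.05, 0.9], w=[0.5, 0.4, 0.3],
                        r=[0.7, 0.1, 0.5], sd_x=np.ones(3), sd_y=1.0)
print(keep)  # the small, weakly correlated coefficient is excluded
```

Note that the score equals z*_{1j} + z*_{2j} + β_j² σ_{xj}/σ_y, since β_j² = |z_j||β_j|√(2w_j), so (9) and (10) give the same decision.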

4. Simulation Studies

In this section, several simulation scenarios are considered to assess the performance of our criterion and our Bayesian QReg approach. Although our focus in this paper is on QReg models, it is important to have a criterion that can be fitted to different models. Therefore, we first fitted the proposed criterion to the lm() function and compared the results to the AIC and BIC approaches, using the least squares approximation (LSA) technique of Wang and Leng (2007). Secondly, the proposed Bayesian model selection in QReg is compared with the LSA method and Lasso quantile regression, referred to as "RQL". Thirdly, the proposed Bayesian model selection in Tobit QReg is compared

with the SSVS method discussed in Ji et al. (2012) and the classical Tobit QReg using the crq() function in R (Koenker, 2013).

Example 1. In this example, four simulation studies are considered to examine the performance of the proposed criterion for ordinary and quantile regression. Specifically, the data are simulated from the linear model

y_i = x_i'β + ε_i,  i = 1, ..., n.

The independent variables were sampled independently from N_k(0, Σ_x) with (Σ_x)_{j1 j2} = ρ^{|j1 − j2|}, where j1 j2 refers to the (j1, j2)th entry of the matrix Σ_x. The true values of β were set as follows:

Simulation 1: β = (5, 0, 0, 0, 0, 0, 0, 0)'
Simulation 2: β = (3, 1.5, 0, 0, 2, 0, 0, 0)'
Simulation 3: β = (3, 1.5, 0, 0, 2, 0, ..., 0)', with 25 trailing zeros
Simulation 4: β = (3, 1.5, 0, 0, 2, 0, ..., 0)', with 45 trailing zeros

In each simulation study, we set ρ ∈ {0.25, 0.50, 0.75} and ε_i ∼ N(0, 1). For each ρ ∈ {0.25, 0.50, 0.75}, we generated 1000 datasets, each with n = 100 observations. The methods in the comparison are assessed based on the median of mean absolute deviations (MMAD), the standard deviation of the MADs, and the mean number of true zero components of β. Here, MMAD = median((1/100) ∑_{i=1}^{100} |x_i'β̂ − x_i'β^{true}|), where the median is taken over the total number of replications. For classical regression, we can observe from Table 1 that the proposed criterion was highly efficient in all the simulation scenarios under consideration.
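The simulation design and the MAD evaluation can be sketched in Python as follows. The function names are illustrative, and the ordinary least-squares fit shown in the usage is only a stand-in estimator for demonstrating the MAD computation, not one of the methods compared in the tables.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(beta, rho, n=100):
    """One dataset of Example 1: rows of X ~ N_k(0, Sigma), (Sigma)_{j1 j2} = rho^|j1-j2|,
    y = X beta + N(0,1) noise."""
    k = len(beta)
    Sigma = rho ** np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
    X = rng.multivariate_normal(np.zeros(k), Sigma, size=n)
    y = X @ beta + rng.standard_normal(n)
    return X, y

def mad(X, beta_hat, beta_true):
    """Mean absolute deviation (1/n) sum_i |x_i'beta_hat - x_i'beta_true|;
    the MMAD of Tables 1-3 is the median of this quantity over replications."""
    return float(np.mean(np.abs(X @ beta_hat - X @ beta_true)))

beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])  # Simulation 2
X, y = simulate(beta_true, rho=0.5)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # illustrative estimator only
print(round(mad(X, beta_ols, beta_true), 3))
```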

With regard to Bayesian QReg, whose results are listed in Tables 2 and 3, our method performs well compared to LSA and RQL. The gap in the median of mean

Table 1: Comparing our criterion using the link function lm() with the LSA method, based on the mean number of true zero components of β, referred to as "correct", for the simulated data in Example 1 using classical regression. The mean number of false zero components of β is also listed, referred to as "wrong".

                  Simulation 1          Simulation 2
Model      ρ      correct  wrong        correct  wrong
LSA.aic   0.25     5.932  (0.000)        4.253  (0.000)
LSA.bic   0.25     6.836  (0.000)        4.853  (0.000)
Brq       0.25     7.000  (0.000)        5.000  (0.000)
LSA.aic   0.50     5.950  (0.000)        4.238  (0.000)
LSA.bic   0.50     6.818  (0.000)        4.857  (0.000)
Brq       0.50     7.000  (0.000)        5.000  (0.000)
LSA.aic   0.75     5.948  (0.000)        4.216  (0.000)
LSA.bic   0.75     6.828  (0.000)        4.852  (0.000)
Brq       0.75     6.941  (0.000)        4.925  (0.000)

                  Simulation 3          Simulation 4
Model      ρ      correct  wrong        correct  wrong
LSA.aic   0.25    23.026  (0.000)       40.555  (0.000)
LSA.bic   0.25    26.595  (0.000)       46.493  (0.000)
Brq       0.25    27.000  (0.000)       47.000  (0.000)
LSA.aic   0.50    22.762  (0.000)       40.556  (0.000)
LSA.bic   0.50    26.576  (0.000)       46.490  (0.000)
Brq       0.50    26.999  (0.000)       46.951  (0.000)
LSA.aic   0.75    23.388  (0.000)       40.875  (0.000)
LSA.bic   0.75    26.667  (0.000)       46.441  (0.000)
Brq       0.75    26.743  (0.000)       46.023  (0.000)

Table 2: MMADs and the standard deviations of MADs (SD) for the simulated data in Example 1, using quantile regression methods. The mean number of true zero components of β ("correct") is also reported.

Model      p     ρ     Simulation 1: MMAD (SD)  correct    Simulation 2: MMAD (SD)  correct
LSA.aic   0.25  0.25   0.236 (0.103)   4.485    0.255 (0.099)   3.322
LSA.bic   0.25  0.25   0.159 (0.107)   6.116    0.214 (0.101)   4.362
RQL       0.25  0.25   0.165 (0.077)   4.108    0.240 (0.083)   2.694
Brq       0.25  0.25   0.079 (0.071)   7.000    0.180 (0.085)   5.000
LSA.aic   0.25  0.50   0.238 (0.112)   4.439    0.263 (0.095)   3.233
LSA.bic   0.25  0.50   0.157 (0.111)   6.074    0.218 (0.097)   4.337
RQL       0.25  0.50   0.158 (0.077)   4.127    0.225 (0.081)   2.737
Brq       0.25  0.50   0.099 (0.067)   6.979    0.214 (0.098)   5.000
LSA.aic   0.25  0.75   0.245 (0.108)   4.390    0.289 (0.097)   2.907
LSA.bic   0.25  0.75   0.156 (0.113)   6.086    0.232 (0.098)   4.171
RQL       0.25  0.75   0.160 (0.072)   4.036    0.221 (0.078)   2.712
Brq       0.25  0.75   0.120 (0.109)   6.904    0.248 (0.131)   4.983
LSA.aic   0.50  0.25   0.176 (0.100)   5.463    0.216 (0.088)   3.930
LSA.bic   0.50  0.25   0.094 (0.087)   6.652    0.180 (0.084)   4.729
RQL       0.50  0.25   0.152 (0.074)   4.508    0.225 (0.080)   3.057
Brq       0.50  0.25   0.066 (0.059)   7.000    0.186 (0.074)   5.000
LSA.aic   0.50  0.50   0.166 (0.101)   5.545    0.207 (0.088)   3.942
LSA.bic   0.50  0.50   0.090 (0.086)   6.697    0.173 (0.081)   4.724
RQL       0.50  0.50   0.144 (0.071)   4.518    0.221 (0.077)   3.015
Brq       0.50  0.50   0.064 (0.065)   6.987    0.183 (0.084)   4.987
LSA.aic   0.50  0.75   0.176 (0.100)   5.494    0.206 (0.086)   3.961
LSA.bic   0.50  0.75   0.090 (0.089)   6.645    0.171 (0.082)   4.766
RQL       0.50  0.75   0.138 (0.069)   4.347    0.207 (0.077)   2.921
Brq       0.50  0.75   0.094 (0.099)   6.634    0.244 (0.114)   4.659
LSA.aic   0.75  0.25   0.242 (0.104)   4.459    0.257 (0.092)   3.293
LSA.bic   0.75  0.25   0.158 (0.107)   6.067    0.215 (0.093)   4.354
RQL       0.75  0.25   0.170 (0.077)   4.083    0.238 (0.083)   2.788
Brq       0.75  0.25   0.072 (0.077)   6.999    0.186 (0.102)   5.000
LSA.aic   0.75  0.50   0.238 (0.106)   4.466    0.262 (0.097)   3.273
LSA.bic   0.75  0.50   0.148 (0.111)   6.169    0.218 (0.096)   4.351
RQL       0.75  0.50   0.160 (0.077)   4.098    0.235 (0.081)   2.770
Brq       0.75  0.50   0.092 (0.079)   7.000    0.196 (0.076)   4.986
LSA.aic   0.75  0.75   0.237 (0.108)   4.417    0.269 (0.092)   3.240
LSA.bic   0.75  0.75   0.156 (0.109)   6.124    0.229 (0.092)   4.340
RQL       0.75  0.75   0.160 (0.075)   4.027    0.229 (0.082)   2.633
Brq       0.75  0.75   0.126 (0.111)   7.000    0.234 (0.123)   4.979

Table 3: MMADs and the standard deviations of MADs (SD) for the simulated data in Example 1, using quantile regression methods. The mean number of true zero components of β ("correct") is also reported.

Model      p     ρ     Simulation 3: MMAD (SD)  correct    Simulation 4: MMAD (SD)  correct
LSA.aic   0.25  0.25   0.568 (0.091)   7.821    0.760 (0.091)   6.818
LSA.bic   0.25  0.25   0.514 (0.109)  12.453    0.738 (0.094)  10.326
RQL       0.25  0.25   0.355 (0.084)  14.409    0.419 (0.088)  26.341
Brq       0.25  0.25   0.187 (0.077)  27.000    0.206 (0.101)  46.998
LSA.aic   0.25  0.50   0.572 (0.096)   7.620    0.768 (0.093)   6.493
LSA.bic   0.25  0.50   0.516 (0.115)  12.347    0.752 (0.096)   9.534
RQL       0.25  0.50   0.356 (0.089)  14.217    0.419 (0.095)  26.786
Brq       0.25  0.50   0.236 (0.106)  26.972    0.244 (0.117)  46.912
LSA.aic   0.25  0.75   0.572 (0.091)   7.270    0.770 (0.093)   6.344
LSA.bic   0.25  0.75   0.522 (0.109)  11.543    0.750 (0.095)   9.516
RQL       0.25  0.75   0.357 (0.081)  14.302    0.416 (0.096)  27.183
Brq       0.25  0.75   0.267 (0.134)  26.255    0.314 (0.187)  45.922
LSA.aic   0.50  0.25   0.479 (0.084)   9.871    0.674 (0.070)   2.594
LSA.bic   0.50  0.25   0.400 (0.109)  16.127    0.670 (0.071)   3.787
RQL       0.50  0.25   0.313 (0.076)  16.415    0.355 (0.079)  30.777
Brq       0.50  0.25   0.183 (0.076)  27.000    0.197 (0.087)  47.000
LSA.aic   0.50  0.50   0.486 (0.084)   9.587    0.678 (0.077)   2.453
LSA.bic   0.50  0.50   0.410 (0.115)  15.797    0.677 (0.078)   3.538
RQL       0.50  0.50   0.314 (0.079)  16.284    0.347 (0.086)  31.526
Brq       0.50  0.50   0.191 (0.084)  26.980    0.223 (0.110)  46.962
LSA.aic   0.50  0.75   0.483 (0.087)   9.537    0.679 (0.072)   2.493
LSA.bic   0.50  0.75   0.410 (0.120)  15.784    0.675 (0.072)   3.636
RQL       0.50  0.75   0.306 (0.080)  16.557    0.348 (0.090)  31.041
Brq       0.50  0.75   0.246 (0.125)  26.408    0.243 (0.118)  46.177
LSA.aic   0.75  0.25   0.567 (0.092)   7.808    0.761 (0.092)   6.354
LSA.bic   0.75  0.25   0.509 (0.112)  12.646    0.744 (0.095)   9.775
RQL       0.75  0.25   0.359 (0.082)  14.383    0.414 (0.091)  26.390
Brq       0.75  0.25   0.225 (0.088)  26.999    0.231 (0.099)  46.999
LSA.aic   0.75  0.50   0.570 (0.096)   7.606    0.771 (0.094)   6.489
LSA.bic   0.75  0.50   0.519 (0.113)  11.976    0.752 (0.096)   9.787
RQL       0.75  0.50   0.353 (0.086)  14.714    0.420 (0.098)  26.649
Brq       0.75  0.50   0.210 (0.099)  26.964    0.246 (0.114)  46.928
LSA.aic   0.75  0.75   0.567 (0.096)   7.353    0.761 (0.094)   6.391
LSA.bic   0.75  0.75   0.517 (0.114)  11.806    0.743 (0.096)   9.568
RQL       0.75  0.75   0.358 (0.084)  14.328    0.422 (0.096)  26.919
Brq       0.75  0.75   0.258 (0.163)  26.800    0.302 (0.165)  46.895

absolute deviations (MMAD), the standard deviations, and the number of true (correct) zero components of β, based on the 1000 generated datasets, is very large between our method and the other methods, which suggests good performance of the proposed criterion.

Example 2. In this example, two simulation studies are considered. Specifically, the latent response y* was simulated from the linear models

Simulation 1: y_i* = 5x_{1i} + ε_i,  i = 1, ..., 100,
Simulation 2: y_i* = 3x_{1i} + 1.5x_{2i} + 2x_{5i} + ε_i,  i = 1, ..., 100,

where the independent variables were sampled from N_k(0, Σ_x) with (Σ_x)_{j1 j2} = 0.5^{|j1 − j2|}. The observed response y_i was obtained by applying the link function y_i = max{0, y_i*}. We consider our Bayesian Tobit QReg method (BLcrq) and the Bayesian Tobit QReg method (TQR) of Ji et al. (2012), and compare both with the standard Tobit QReg method. We ran BLcrq and TQR for 18,000 iterations following a burn-in of 3,000. In Table 4, the MMAD and the number of true (correct) and false (wrong) zero components of β are reported. It can be seen that the proposed Bayesian approach produces much lower MMADs than the others, indicating that our approach performs better. Analysing the average number of wrong zeros leads to a similar conclusion.


Table 4: MMADs and the number of true and false zero components of β for the simulated data in Example 2, using quantile regression methods.

                 Simulation 1               Simulation 2
Model     p     MMAD   correct (wrong)     MMAD   correct (wrong)
crq      0.25   0.668    -  (  -  )        0.580    -  (  -  )
TQR      0.25   0.666   6.01  (0.000)      0.578   5.000 (0.000)
BLcrq    0.25   0.366   6.80  (0.000)      0.362   4.727 (0.000)
crq      0.50   0.513    -  (  -  )        0.532    -  (  -  )
TQR      0.50   0.511   6.010 (0.000)      0.531   5.000 (0.005)
BLcrq    0.50   0.227   6.810 (0.000)      0.316   4.839 (0.000)
crq      0.75   0.556    -  (  -  )        0.551    -  (  -  )
TQR      0.75   0.555   6.000 (0.006)      0.551   5.000 (0.012)
BLcrq    0.75   0.340   6.811 (0.000)      0.363   4.689 (0.000)

5. Prostate cancer data analysis

To illustrate the performance of our criterion, we consider the prostate cancer dataset (Stamey et al., 1989). This study involved 97 male patients with prostate cancer, and the data are available in the R package "bayesQR" (Benoit et al., 2011). The response variable is the level of prostate-specific antigen (lpsa) and there are eight predictors: logarithm of cancer volume (x1), logarithm of prostate weight (x2), age (x3), logarithm of the amount of benign prostatic enlargement (x4), seminal vesicle invasion (x5), logarithm of capsular penetration (x6), Gleason score (x7) and percentage of Gleason scores 4 or 5 (x8). We estimate a QReg model between the response lpsa and the 8 independent variables without an intercept. Since the proposed criterion is based on the correlation coefficients between the outcome variable and the independent variables, we assume the data are standardised before the analysis, and the intercept is excluded from the model. Using the stochastic search variable selection method (SSVSquantreg) in the R package MCMCpack (Martin et al., 2011), Table 5 reports the five most visited models at each quantile level p ∈ {0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95}. As in the work of Yu et al. (2013), we ran SSVSquantreg for 10,000 iterations, removing


the first 1000 as burn-in (see also Reed, 2011). We see that the top model picked out by SSVSquantreg at each quantile level has a significantly higher posterior probability than the others, and that the same top model (x1) was picked out as the best model at every quantile level under consideration. Next, we carried out variable selection using our criterion (Brq) and the LSA method. Table 6 reports the model selected at each quantile level by LSA and Brq. We observe that the top model picked out by our proposed method (Brq) matches the top model selected by SSVSquantreg at each quantile under consideration. On the other hand, there is no match between the models selected by LSA and the five top models selected by SSVSquantreg. Hence, both the simulations and the prostate cancer data show strong support for the use of the Brq criterion.

6. Summary

In this paper we have proposed a Bayesian hierarchical model for subset selection in QReg models with continuous, binary and censored responses. We have assigned a normal prior with mean zero and unknown variance to each quantile coefficient of the independent variables. MCMC-based computation algorithms based on the proposed prior are outlined to generate samples from the posterior distributions. Based on the proposed prior, we have also proposed a criterion for model selection in QReg models. Our criterion has several advantages over existing methods. Firstly, it can be applied to different models, such as classical QReg, classical least squares, Tobit QReg, Tobit regression, logistic regression and many others. Secondly, the proposed model selection approach is computationally fast and has been verified to be useful in practice. The simulation studies and the prostate cancer data analysis both indicate that the proposed methods behave quite well and may be preferred over existing methods. The work considered in this paper opens the door to new research directions for model selection in

Table 5: Top five models visited by SSVSquantreg at p ∈ {0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95} with their estimated posterior probabilities.

p      Model    Probability      p      Model    Probability
0.10   x1       0.49             0.60   x1       0.47
       x1,x2    0.08                    x1,x2    0.08
       x1,x5    0.05                    x1,x5    0.05
       x1,x8    0.04                    x1,x4    0.04
       x1,x4    0.04                    x1,x8    0.04
0.20   x1       0.49             0.70   x1       0.48
       x1,x2    0.08                    x1,x2    0.08
       x1,x5    0.05                    x1,x5    0.05
       x1,x4    0.03                    x1,x8    0.04
       x1,x8    0.03                    x1,x4    0.04
0.30   x1       0.48             0.80   x1       0.47
       x1,x2    0.07                    x1,x2    0.08
       x1,x5    0.05                    x1,x5    0.05
       x1,x4    0.04                    x1,x4    0.04
       x1,x8    0.04                    x1,x8    0.04
0.40   x1       0.47             0.90   x1       0.48
       x1,x2    0.08                    x1,x2    0.08
       x1,x5    0.05                    x1,x5    0.05
       x1,x4    0.04                    x1,x4    0.04
       x1,x8    0.04                    x1,x8    0.03
0.50   x1       0.47             0.95   x1       0.46
       x1,x2    0.08                    x1,x2    0.08
       x1,x5    0.05                    x1,x5    0.05
       x1,x4    0.04                    x1,x4    0.04
       x1,x8    0.04                    x1,x8    0.04

Table 6: The top model selected by using LSA and Brq, p ∈ {0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95}.

p      Method     Model
0.10   LSA.aic    x1, x2, x3, x4, x5, x6, x7, x8
       LSA.bic    x1, x2, x3, x4, x5, x6, x7
       Brq        x1
0.20   LSA.aic    x1, x2, x3, x4, x5, x6, x7, x8
       LSA.bic    x1, x2, x3, x4, x5, x7
       Brq        x1
0.30   LSA.aic    x1, x2, x3, x4, x5, x6, x7
       LSA.bic    x1, x2, x3, x4, x5, x7
       Brq        x1
0.40   LSA.aic    x1, x2, x3, x4, x5, x7
       LSA.bic    x1, x2, x4, x5
       Brq        x1
0.50   LSA.aic    x1, x2, x3, x4, x5, x7
       LSA.bic    x1, x2, x4, x5
       Brq        x1
0.60   LSA.aic    x1, x2, x3, x4, x5, x7
       LSA.bic    x1, x2, x4, x5
       Brq        x1
0.70   LSA.aic    x1, x2, x3, x4, x5, x6, x7
       LSA.bic    x1, x2, x3, x4, x5, x7
       Brq        x1
0.80   LSA.aic    x1, x2, x3, x4, x5, x6, x7, x8
       LSA.bic    x1, x2, x3, x4, x5, x7
       Brq        x1
0.90   LSA.aic    x1, x2, x3, x4, x5, x6, x7, x8
       LSA.bic    x1, x2, x3, x4, x5, x6, x7
       Brq        x1
0.95   LSA.aic    x1, x2, x3, x4, x5, x6, x7, x8
       LSA.bic    x1, x2, x3, x4, x5, x6, x7, x8
       Brq        x1

QReg models. For example, the approach can be extended to Bayesian QReg models with right-censored or interval-censored responses. There are also many other possible extensions, such as using our criterion in Bayesian single-index QReg. We hope that by making the code for our methods available, we will lower the barrier for other researchers to use our methods in their studies.
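To make the role of a check-loss-based information criterion concrete, the sketch below implements a generic Schwarz-type criterion for a fitted quantile regression model. This is an illustration only, not the exact Brq criterion implemented in the R package (whose constants are defined in the paper); the names `check_loss` and `qreg_sic` are ours, and the toy data are simulated.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

def qreg_sic(y, X, beta, tau):
    """Generic Schwarz-type criterion for quantile regression:
    n * log(mean check loss) + k * log(n), where k is the number of
    nonzero coefficients.  The Brq criterion may differ in constants."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    k = int(np.count_nonzero(beta))
    resid = y - np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    loss = check_loss(resid, tau).sum()
    return n * np.log(loss / n) + k * np.log(n)

# Toy comparison at the median (tau = 0.5): the true sparse model should
# attain a lower criterion value than the null (all-zero) model.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(size=n), rng.uniform(size=n)])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(scale=0.5, size=n)
sic_true = qreg_sic(y, X, np.array([1.0, 2.0, 0.0]), tau=0.5)
sic_null = qreg_sic(y, X, np.zeros(3), tau=0.5)
```

In practice the candidate coefficient vectors would come from a fitted model (e.g. posterior estimates at each quantile level), and the model minimizing the criterion would be selected.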

Acknowledgements

The author wishes to thank the Editor, the Associate Editor and the referees for helpful comments and suggestions, which have led to an improvement of this paper.


Bibliography

Alhamzawi, R. and K. Yu (2011). Variable selection in quantile regression via Gibbs sampling. Journal of Applied Statistics.
Alhamzawi, R., K. Yu, and D. F. Benoit (2012). Bayesian adaptive Lasso quantile regression. Statistical Modelling 12, 279–297.
Andrews, D. F. and C. L. Mallows (1974). Scale mixtures of normal distributions. Journal of the Royal Statistical Society, Series B 36, 99–102.
Bae, K. and B. Mallick (2004). Gene selection using a two-level hierarchical Bayesian model. Bioinformatics 20, 3423–3430.
Benoit, D. F., R. Alhamzawi, and K. Yu (2013). Bayesian lasso binary quantile regression. Computational Statistics 28, 2861–2873.
Benoit, D. F. and D. Van den Poel (2011). Binary quantile regression: a Bayesian approach based on the asymmetric Laplace distribution. Journal of Applied Econometrics 27, 1174–1188.
Benoit, D. F., R. Alhamzawi, K. Yu, and D. Van den Poel (2011). bayesQR: Bayesian quantile regression. R package version 2.1.
Bradic, J., J. Fan, and W. Wang (2011). Penalized composite quasi-likelihood for ultrahigh-dimensional variable selection. Journal of the Royal Statistical Society, Series B 73, 325–349.
Cade, B. S., J. W. Terrell, and M. T. Porath (2008). Estimating fish body condition with quantile regression. North American Journal of Fisheries Management 28, 349–359.
Chen, Z. and D. Dunson (2003). Random effects selection in linear mixed models. Biometrics 59, 762–769.
Davidova, S. and P. Kostov (2013). A quantile regression analysis of the effect of farmers' attitudes and perceptions on market participation. Journal of Agricultural Economics 64.
Fahrmeir, L., T. Kneib, and S. Konrath (2010). Bayesian regularisation in structured additive regression: a unifying perspective on shrinkage, smoothing and predictor selection. Statistics and Computing 20, 203–219.
Geraci, M. and M. Bottai (2007). Quantile regression for longitudinal data using the asymmetric Laplace distribution. Biostatistics 8, 140–154.
Griffin, J. E. and P. J. Brown (2010a). Bayesian adaptive Lasso with non-convex penalization. Technical report, Institute of Mathematics, Statistics and Actuarial Science, University of Kent.

Griffin, J. E. and P. J. Brown (2010b). Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis 5, 171–188.
Griffin, J. E. and P. J. Brown (2012). Structuring shrinkage: some correlated priors for regression. Biometrika. doi:10.1093/biomet/asr082.
Hoti, F. and M. J. Sillanpää (2006). Bayesian mapping of genotype × expression interactions in quantitative and qualitative traits. Heredity 97, 4–18.
Ji, Y., N. Lin, and B. Zhang (2012). Model selection in binary and Tobit quantile regression using the Gibbs sampler. Computational Statistics & Data Analysis 56, 827–839.
Khare, K. and J. P. Hobert (2012). Geometric ergodicity of the Gibbs sampler for Bayesian quantile regression. Journal of Multivariate Analysis 112, 108–116.
Kinney, S. K. and D. B. Dunson (2007). Fixed and random effects selection in linear and logistic models. Biometrics 63, 690–698.
Koenker, R. (2005). Quantile Regression. Cambridge Books. Cambridge University Press.
Koenker, R. (2013). quantreg: Quantile Regression. R package version 5.05.
Koenker, R. and G. J. Bassett (1978). Regression quantiles. Econometrica 46, 33–50.
Koenker, R. and J. A. F. Machado (1999). Goodness of fit and related inference processes for quantile regression. Journal of the American Statistical Association 94, 1296–1310.
Komunjer, I. (2005). Quasi-maximum likelihood estimation for conditional quantiles. Journal of Econometrics 128, 137–164.
Kozumi, H. and G. Kobayashi (2011). Gibbs sampling methods for Bayesian quantile regression. Journal of Statistical Computation and Simulation 81, 1565–1578.
Li, Q., R. Xi, and N. Lin (2010). Bayesian regularized quantile regression. Bayesian Analysis 5, 533–556.
Li, Y. and J. Zhu (2008). L1-norm quantile regression. Journal of Computational and Graphical Statistics 17, 163–185.
Manski, C. F. (1975). Maximum score estimation of the stochastic utility model of choice. Journal of Econometrics 3, 205–228.
Manski, C. F. (1985). Semiparametric analysis of discrete response: asymptotic properties of the maximum score estimator. Journal of Econometrics 27, 313–333.
Martin, A., K. Quinn, and J. Park (2011). MCMCpack: Markov chain Monte Carlo. R package version 1.0-10.


Park, T. and G. Casella (2008). The Bayesian Lasso. Journal of the American Statistical Association 103, 681–686.
Powell, J. (1986). Censored regression quantiles. Journal of Econometrics 32, 143–155.
Reed, C. (2011). Bayesian parameter estimation and variable selection for quantile regression. Technical report, Department of Mathematics, School of Information Systems, Computing and Mathematics, Brunel University.
Reed, C. and K. Yu (2009). A partially collapsed Gibbs sampler for Bayesian quantile regression. Technical report, Department of Mathematical Sciences, Brunel University.
Reich, B. J., M. Fuentes, and D. B. Dunson (2011). Bayesian spatial quantile regression. Journal of the American Statistical Association 106, 6–20.
Stamey, T., J. Kabalin, J. McNeal, I. Johnstone, F. Freiha, E. Redwine, and N. Yang (1989). Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate, II: Radical prostatectomy treated patients. Journal of Urology 141, 1076–1083.
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B 58, 267–288.
Wang, H. and C. Leng (2007). Unified Lasso estimation by least squares approximation. Journal of the American Statistical Association 102, 1039–1048.
Wang, H., G. Li, and G. Jiang (2007). Robust regression shrinkage and consistent variable selection through the LAD-Lasso. Journal of Business & Economic Statistics 25, 347–355.
Wei, Y., A. Pere, R. Koenker, and X. He (2006). Quantile regression methods for reference growth charts. Statistics in Medicine 25, 1369–1382.
Wu, Y. and Y. Liu (2009). Variable selection in quantile regression. Statistica Sinica 19, 801–817.
Yang, S. (1999). Censored median regression using weighted empirical survival and hazard functions. Journal of the American Statistical Association 94, 137–145.
Yi, N. and S. Xu (2008). Bayesian Lasso for quantitative trait loci mapping. Genetics 179, 1045–1055.
Yu, K., C. W. Chen, C. Reed, and D. Dunson (2013). Bayesian variable selection in quantile regression. Statistics and Its Interface 6, 261–274.
Yu, K. and R. A. Moyeed (2001). Bayesian quantile regression. Statistics & Probability Letters 54, 437–447.
Yu, K. and J. Stander (2007). Bayesian analysis of a Tobit quantile regression model. Journal of Econometrics 137, 260–276.

Yuan, M. and Y. Lin (2005). Efficient empirical Bayes variable selection and estimation in linear models. Journal of the American Statistical Association 100, 1215–1225.
Yuan, Y. and G. Yin (2010). Bayesian quantile regression for longitudinal studies with nonignorable missing data. Biometrics 66, 105–114.
