Introduction to Generalized Linear Models
Heather Turner
ESRC National Centre for Research Methods, UK, and Department of Statistics, University of Warwick, UK
WU, 2008-04-22 to 24
Copyright © Heather Turner, 2008
Introduction
This short course provides an overview of generalized linear models (GLMs). We shall see that these models extend the linear modelling framework to variables that are not Normally distributed. GLMs are most commonly used to model binary or count data, so we will focus on models for these types of data.
Outline

Part I: Introduction to Generalized Linear Models
Part II: Binary Data
Part III: Count Data
Part I: Introduction

Review of Linear Models
Generalized Linear Models
GLMs in R
Exercises
Part II: Binary Data

Binary Data
Models for Binary Data
Model Selection
Model Evaluation
Exercises
Part III: Count Data

Count Data
Modelling Rates
Modelling Contingency Tables
Exercises
Part I: Introduction to Generalized Linear Models
Review of Linear Models
The General Linear Model
In a general linear model

yi = β0 + β1 x1i + ... + βp xpi + εi

the response yi, i = 1, ..., n, is modelled by a linear function of explanatory variables xj, j = 1, ..., p, plus an error term.
General and Linear

Here general refers to the dependence on potentially more than one explanatory variable, vs. the simple linear model:

yi = β0 + β1 xi + εi

The model is linear in the parameters, e.g.

yi = β0 + β1 x1 + β2 x1² + εi
yi = β0 + γ1 δ1 x1 + exp(β2) x2 + εi

but not, e.g.

yi = β0 + β1 x1^β2 + εi
yi = β0 exp(β1 x1) + εi
Error structure

We assume that the errors εi are independent and identically distributed such that

E[εi] = 0 and var[εi] = σ²

Typically we assume

εi ∼ N(0, σ²)

as a basis for inference, e.g. t-tests on parameters.
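As a quick illustration of this setup (a sketch in Python rather than the course's R, with made-up parameter values), we can simulate from such a model and recover the coefficients by ordinary least squares:

```python
import numpy as np

# Simulate y_i = beta0 + beta1 * x_i + eps_i with iid N(0, sigma^2) errors
# (illustrative values: beta0 = 2, beta1 = 3, sigma = 1)
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, n)
eps = rng.normal(0, 1, n)
y = 2.0 + 3.0 * x + eps

# Least squares fit: design matrix with an intercept column
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With n = 100 observations the estimates should land close to the true values (2, 3).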
Some Examples

bodyfat = −14.59 + 0.7 × biceps − 0.9 × abdomin

[3-D scatter plot of bodyfat against biceps and abdomin, with fitted regression plane]
Length = 75.7 + 17.2 × Age
Length = 143.3 + 232.8 × Age − 66.6 × Age²

[Scatter plot of Length against Age (1–7), with fitted linear and quadratic curves]
particle size_ij = operator_i + resin_j + operator:resin_ij

[Plot of particle size against resin railcar, with points labelled by operator (1–3)]
Restrictions of Linear Models

Although a very useful framework, there are some situations where general linear models are not appropriate:

- the range of Y is restricted (e.g. binary, count)
- the variance of Y depends on the mean

Generalized linear models extend the general linear model framework to address both of these issues.
Generalized Linear Models (GLMs)

A generalized linear model is made up of a linear predictor

ηi = β0 + β1 x1i + ... + βp xpi

and two functions:

- a link function that describes how the mean, E(Yi) = µi, depends on the linear predictor:

  g(µi) = ηi

- a variance function that describes how the variance, var(Yi), depends on the mean:

  var(Yi) = φ V(µi)

  where the dispersion parameter φ is a constant.
Normal General Linear Model as a Special Case

For the general linear model with εi ∼ N(0, σ²) we have the linear predictor

ηi = β0 + β1 x1i + ... + βp xpi

the link function

g(µi) = µi

and the variance function

V(µi) = 1
Modelling Binomial Data

Suppose

Yi ∼ Binomial(ni, pi)

and we wish to model the proportions Yi/ni. Then

E(Yi/ni) = pi        var(Yi/ni) = pi(1 − pi)/ni

So our variance function is

V(µi) = µi(1 − µi)

Our link function must map from (0, 1) to (−∞, ∞). A common choice is

g(µi) = logit(µi) = log( µi / (1 − µi) )
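A minimal sketch of the logit link, its inverse, and the binomial variance function (in Python, since the arithmetic is generic):

```python
import math

def logit(mu):
    """Link: map a probability in (0, 1) to the real line."""
    return math.log(mu / (1 - mu))

def inv_logit(eta):
    """Inverse link: map a linear predictor back to (0, 1)."""
    return 1 / (1 + math.exp(-eta))

def v_binomial(mu):
    """Binomial variance function V(mu) = mu (1 - mu)."""
    return mu * (1 - mu)
```

For example, logit(0.5) = 0, and inv_logit recovers the probability from any value of the linear predictor.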
Modelling Poisson Data

Suppose

Yi ∼ Poisson(λi)

Then

E(Yi) = λi        var(Yi) = λi

So our variance function is

V(µi) = µi

Our link function must map from (0, ∞) to (−∞, ∞). A natural choice is

g(µi) = log(µi)
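The defining mean–variance relationship can be checked by simulation (a Python sketch, not from the slides, using an arbitrary rate λ = 4):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.poisson(lam=4.0, size=100_000)

# For Poisson data the sample mean and sample variance should be close,
# both estimating lambda = 4
mean, var = y.mean(), y.var()
```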
Transformation vs. GLM

In some situations a response variable can be transformed to improve linearity and homogeneity of variance so that a general linear model can be applied. This approach has some drawbacks:

- the response variable has changed!
- the transformation must simultaneously improve linearity and homogeneity of variance
- the transformation may not be defined on the boundaries of the sample space
For example, a common remedy for the variance increasing with the mean is to apply the log transform, e.g.

log(yi) = β0 + β1 x1 + εi  ⇒  E(log Yi) = β0 + β1 x1

This is a linear model for the mean of log Y, which may not always be appropriate. E.g. if Y is income, perhaps we are really interested in the mean income of population subgroups, in which case it would be better to model E(Y) using a glm:

log E(Yi) = β0 + β1 x1

with V(µ) = µ. This also avoids difficulties with y = 0.
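The distinction between modelling E(log Y) and log E(Y) is easy to see numerically. For a log-normal Y with log Y ∼ N(0, 1), E(log Y) = 0 but log E(Y) = σ²/2 = 0.5 (a Python sketch with simulated data, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
y = np.exp(rng.normal(0.0, 1.0, size=200_000))  # log-normal sample

mean_of_log = np.log(y).mean()   # estimates E(log Y) = 0
log_of_mean = np.log(y.mean())   # estimates log E(Y) = 0.5
```

The two quantities estimate different targets, which is why the log transform and the log link answer different questions.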
Exponential Family

Most of the commonly used statistical distributions, e.g. Normal, Binomial and Poisson, are members of the exponential family of distributions, whose densities can be written in the form

f(y; θ, φ) = exp{ (yθ − b(θ))/φ + c(y, φ) }

where φ is the dispersion parameter and θ is the canonical parameter. It can be shown that

E(Y) = b′(θ) = µ  and  var(Y) = φ b″(θ) = φ V(µ)
Canonical Links

For a glm where the response follows an exponential family distribution we have

g(µi) = g(b′(θi)) = β0 + β1 x1i + ... + βp xpi

The canonical link is defined as

g = (b′)⁻¹  ⇒  g(µi) = θi = β0 + β1 x1i + ... + βp xpi

Canonical links lead to desirable statistical properties of the glm, hence tend to be used by default. However, there is no a priori reason why the systematic effects in the model should be additive on the scale given by this link.
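As a worked example (not from the slides), the Poisson distribution can be put in exponential family form, which also identifies its canonical link:

```latex
\begin{align*}
f(y;\lambda) &= \frac{\lambda^{y} e^{-\lambda}}{y!}
  = \exp\{\, y \log\lambda - \lambda - \log y! \,\}
\end{align*}
so with $\theta = \log\lambda$, $b(\theta) = e^{\theta}$, $\phi = 1$ and
$c(y,\phi) = -\log y!$:
\begin{align*}
E(Y) &= b'(\theta) = e^{\theta} = \lambda = \mu \\
\operatorname{var}(Y) &= \phi\, b''(\theta) = e^{\theta} = \mu
  \quad\Rightarrow\quad V(\mu) = \mu \\
g(\mu) &= (b')^{-1}(\mu) = \log\mu \quad\text{(the canonical link)}
\end{align*}
```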
Estimation of the Model Parameters

A single algorithm can be used to estimate the parameters of an exponential family glm using maximum likelihood. The log-likelihood for the sample y1, ..., yn is

l = Σᵢ₌₁ⁿ { (yi θi − b(θi))/φi + c(yi, φi) }

The maximum likelihood estimates are obtained by solving the score equations

s(βj) = ∂l/∂βj = Σᵢ₌₁ⁿ (yi − µi)/(φi V(µi)) × xij/g′(µi) = 0

for parameters βj.
We assume that

φi = φ/ai

where φ is a single dispersion parameter and ai are known prior weights; for example, binomial proportions with known index ni have φ = 1 and ai = ni. The estimating equations are then

∂l/∂βj = Σᵢ₌₁ⁿ ai(yi − µi)/V(µi) × xij/g′(µi) = 0

which does not depend on φ (which may be unknown).
A general method of solving score equations is the iterative algorithm Fisher's Method of Scoring (derived from a Taylor expansion of s(β)). In the r-th iteration, the new estimate β⁽ʳ⁺¹⁾ is obtained from the previous estimate β⁽ʳ⁾ by

β⁽ʳ⁺¹⁾ = β⁽ʳ⁾ + i(β⁽ʳ⁾)⁻¹ s(β⁽ʳ⁾)

where i(β) = E[−H(β)] is the expected information and H is the Hessian matrix: the matrix of second derivatives of the log-likelihood.
It turns out that the updates can be written as

β⁽ʳ⁺¹⁾ = (Xᵀ W⁽ʳ⁾ X)⁻¹ Xᵀ W⁽ʳ⁾ z⁽ʳ⁾

i.e. the score equations for a weighted least squares regression of z⁽ʳ⁾ on X with weights W⁽ʳ⁾ = diag(wi), where

zi⁽ʳ⁾ = ηi⁽ʳ⁾ + (yi − µi⁽ʳ⁾) g′(µi⁽ʳ⁾)

and

wi⁽ʳ⁾ = ai / [ V(µi⁽ʳ⁾) g′(µi⁽ʳ⁾)² ]
Hence the estimates can be found using an Iteratively (Re-)Weighted Least Squares algorithm:

1. Start with initial estimates µi⁽ʳ⁾
2. Calculate working responses zi⁽ʳ⁾ and working weights wi⁽ʳ⁾
3. Calculate β⁽ʳ⁺¹⁾ by weighted least squares
4. Repeat 2 and 3 until convergence

For models with the canonical link, this is simply the Newton-Raphson method.
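The algorithm above can be sketched compactly for a Poisson glm with log link (in Python rather than R; the data and starting values are made up). With the log link, g′(µ) = 1/µ, so the working response is z = η + (y − µ)/µ and the working weight is w = µ:

```python
import numpy as np

def irls_poisson(X, y, n_iter=25):
    """Fit a Poisson glm with log link by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])          # start from beta = 0, i.e. mu = 1
    for _ in range(n_iter):
        eta = X @ beta                   # linear predictor
        mu = np.exp(eta)                 # inverse of the log link
        z = eta + (y - mu) / mu          # working response: eta + (y - mu) g'(mu)
        w = mu                           # working weight: 1 / (V(mu) g'(mu)^2)
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ z)   # weighted least squares step
    return beta

# Toy data lying exactly on mu = exp(x * log 2), so the fit should recover
# beta = (0, log 2)
X = np.column_stack([np.ones(3), np.array([0.0, 1.0, 2.0])])
y = np.array([1.0, 2.0, 4.0])
beta = irls_poisson(X, y)
```

Since the log link is canonical for the Poisson, these iterations are exactly Newton-Raphson on the log-likelihood.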
Standard Errors

The estimates β̂ have the usual properties of maximum likelihood estimators. In particular, β̂ is asymptotically

N(β, i⁻¹)

where

i(β) = φ⁻¹ Xᵀ W X

Standard errors for the βj may therefore be calculated as the square roots of the diagonal elements of

cov̂(β̂) = φ̂ (Xᵀ Ŵ X)⁻¹

in which (Xᵀ Ŵ X)⁻¹ is a by-product of the final IWLS iteration. If φ is unknown, an estimate is required.
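For instance (a Python sketch with made-up final-iteration quantities), given the design matrix and the fitted IWLS weights, the standard errors follow directly:

```python
import numpy as np

# Hypothetical quantities from the final IWLS iteration of some Poisson fit:
X = np.column_stack([np.ones(3), np.array([0.0, 1.0, 2.0])])  # design matrix
w = np.array([1.0, 2.0, 4.0])                                 # fitted weights
phi = 1.0                                                     # Poisson dispersion

cov = phi * np.linalg.inv((X.T * w) @ X)   # phi * (X^T W X)^{-1}
se = np.sqrt(np.diag(cov))                 # standard errors of beta_hat
```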
There are practical difficulties in estimating the dispersion φ by maximum likelihood. Therefore it is usually estimated by the method of moments. If β were known, an unbiased estimate of φ = ai var(Yi)/V(µi) would be

(1/n) Σᵢ₌₁ⁿ ai(yi − µi)²/V(µi)

Allowing for the fact that β must be estimated, we obtain

1/(n − p) Σᵢ₌₁ⁿ ai(yi − µi)²/V(µi)
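In code (a Python sketch with illustrative numbers, using the Poisson variance function V(µ) = µ), the moment estimator is a one-liner:

```python
import numpy as np

y  = np.array([1.2, 0.8, 2.5, 3.1])   # observed responses (made up)
mu = np.array([1.0, 1.0, 2.7, 3.0])   # fitted means (made up)
a  = np.ones(4)                        # prior weights a_i
p  = 2                                 # number of estimated parameters
V  = mu                                # variance function V(mu) = mu

# Method-of-moments estimate of the dispersion phi, with n - p denominator
phi_hat = np.sum(a * (y - mu) ** 2 / V) / (len(y) - p)
```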
GLMs in R

The glm Function

Generalized linear models can be fitted in R using the glm function, which is similar to the lm function for fitting linear models. The arguments to a glm call are as follows:

glm(formula, family = gaussian, data, weights, subset,
    na.action, start = NULL, etastart, mustart, offset,
    control = glm.control(...), model = TRUE,
    method = "glm.fit", x = FALSE, y = TRUE,
    contrasts = NULL, ...)
Formula Argument

The formula is specified to glm as, e.g.

y ~ x1 + x2

where x1, x2 are the names of

- numeric vectors (continuous variables)
- factors (categorical variables)

All specified variables must be in the workspace or in the data frame passed to the data argument.
Other symbols that can be used in the formula include

- a:b for an interaction between a and b
- a*b which expands to a + b + a:b
- . for first order terms of all variables in data
- - to exclude a term or terms
- 1 to include an intercept (included by default)
- 0 to exclude an intercept
Family Argument

The family argument takes (the name of) a family function which specifies

- the link function
- the variance function
- various related objects used by glm, e.g. linkinv

The exponential family functions available in R are

- binomial(link = "logit")
- gaussian(link = "identity")
- Gamma(link = "inverse")
- inverse.gaussian(link = "1/mu^2")
- poisson(link = "log")
Extractor Functions

The glm function returns an object of class c("glm", "lm"). There are several glm or lm methods available for accessing/displaying components of the glm object, including:

- residuals()
- fitted()
- predict()
- coef()
- deviance()
- formula()
- summary()
Example: Household Food Expenditure Griffiths, Hill and Judge (1993) present a dataset on food expenditure for households that have three family members. We consider two variables, the logarithm of expenditure on food and the household income: dat