Logistic Regression – 14 Oct 2009

Recommended reading:
Essential Medical Statistics, Kirkwood and Sterne
Applied Logistic Regression Analysis, Menard
Linear regression relates two quantitative variables. Often, however, we have a binary outcome variable, such as affection status.
(figures)
This can arise from a nonlinear relationship between the probability of the outcome and a predictor variable: a given change in x leads to a smaller change in y when y is close to 0 or to 1. So the probability that y = 1, plotted against x, will be a curve rather than a straight line. The curve is continuous, but still bounded between 0 and 1. Could we use a model like this?

P(Y) = a + bx
Probability and Odds

Probability = # of successes / total # of attempts

Odds = # of successes / # of failures
     = P(success) / P(failure)
     = P(success) / (1 − P(success))

Odds Ratio

Odds Ratio = Odds in 'exposed' / Odds in 'baseline'
Odds in 'exposed' = Odds in 'baseline' · Odds Ratio
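A quick numeric sketch of these definitions in R (the counts here are hypothetical, purely for illustration):

# Hypothetical counts: 8 successes in 10 'exposed', 3 in 10 'baseline'
p_exposed  <- 8 / 10
p_baseline <- 3 / 10
odds_exposed  <- p_exposed / (1 - p_exposed)     # 4
odds_baseline <- p_baseline / (1 - p_baseline)   # ~0.43
odds_exposed / odds_baseline                     # odds ratio, ~9.33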
Odds are convenient because they lie between 0 and infinity
If we model the odds rather than the probability:

Odds(Y=1) = P(Y=1) / (1 − P(Y=1)) = a + bx

Taking the natural log allows this to vary between −∞ and +∞:

ln[ P(Y=1) / (1 − P(Y=1)) ] = a + bx

Equivalently:

ln[Odds(Y)] = a + bx
Odds(Y) = e^(a + bx)
P(Y) = e^(a + bx) / (1 + e^(a + bx))
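A numeric check of the last identity in R (the values of a, b and x are arbitrary):

a <- -2; b <- 0.8; x <- 3     # arbitrary illustrative values
z <- a + b * x
exp(z) / (1 + exp(z))         # P(Y = 1) from the formula above, ~0.599
plogis(z)                     # same value: plogis() is R's built-in inverse logit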
So if our model is:

ln[Odds(Y)] = a + bx

then:

Odds(Y) = e^(a + bx) = e^a · e^(bx)

and e^b is the Odds Ratio for a one-unit increase in x, i.e.:

e^b = Odds Ratio
b = ln(Odds Ratio)
Taking the natural log gives a quantity that varies between −∞ and +∞; this transformation is called the logit:

logit(Y) = a + bx

We can't use least squares to fit this model. Instead we use an iterative procedure called Maximum Likelihood. The likelihood function gives the probability of the data as a function of the parameters; its exact form depends on the data you are modelling. Maximising it yields maximum likelihood estimates of a and b (a sketch follows below).
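As a sketch of what the software is doing (with simulated data, not the lecture's example), we can maximise the binomial log-likelihood directly with optim(), and compare with glm():

# Simulated illustrative data
set.seed(1)
x <- rnorm(30, mean = 3)
y <- rbinom(30, size = 1, prob = plogis(-2 + 0.8 * x))

# Negative log-likelihood of (a, b) under the logistic model
negll <- function(par) {
  p <- plogis(par[1] + par[2] * x)                 # P(Y = 1) for each observation
  -sum(dbinom(y, size = 1, prob = p, log = TRUE))
}

optim(c(0, 0), negll)$par                          # iterative ML estimates of a and b
coef(glm(y ~ x, family = binomial(link = logit)))  # glm() gives essentially the same estimates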
An output for the example data:

Call: glm(formula = Affection ~ Predictor, family = binomial(link = logit))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.2348  -0.7052   0.3778   0.7525   1.6651

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -8.602      3.236  -2.658  0.00785 **
Predictor      3.026      1.096   2.762  0.00575 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 40.381  on 29  degrees of freedom
Residual deviance: 28.741  on 28  degrees of freedom
AIC: 32.741

Number of Fisher Scoring iterations: 4

The fitted model is:

logit(Y) = −8.602 + 3.026X
ln(Odds Ratio) = 3.026
Odds Ratio = e^3.026 = 20.6

This is the odds ratio for the influence of X on the dependent variable. Equivalently, b = 3.026 is the change in logit(Y) for a 1-unit increase in X; but remember that Y and X, or P(Y) and X, do not have a linear relationship.
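The same quantities can be read off the fitted object in R (a sketch, assuming the model in the output above was saved as `fit`):

# fit <- glm(Affection ~ Predictor, family = binomial(link = logit))
coef(fit)["Predictor"]        # b = 3.026, the log odds ratio
exp(coef(fit)["Predictor"])   # odds ratio, ~20.6
exp(confint(fit))             # profile-likelihood CIs on the odds-ratio scale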
Prediction
logit(Y) = −8.602 + 3.026X

Odds(Y) = e^(−8.602 + 3.026X)

P(Y=1) = e^(−8.602 + 3.026X) / (1 + e^(−8.602 + 3.026X))

The same guidelines about extrapolation apply as in linear regression.
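In R, prediction is a one-liner (a sketch, assuming the fitted object `fit` from above; the new Predictor value is made up):

newdata <- data.frame(Predictor = 3.2)    # hypothetical new observation
predict(fit, newdata, type = "link")      # logit(Y) = -8.602 + 3.026 * 3.2
predict(fit, newdata, type = "response")  # P(Y = 1)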
Wald test
Under the null hypothesis b = 0, the coefficient divided by its standard error is approximately a standard normal variable, and can be tested with:

W_b = b / S.E.(b) ~ Normal(0, 1)

We can also calculate the confidence interval:

b ± Z_0.975 · S.E.(b)
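With the numbers from the example output, the Wald test can be reproduced by hand in R:

b  <- 3.026
se <- 1.096
W  <- b / se                       # ~2.761, the reported z value up to rounding
2 * pnorm(-abs(W))                 # two-sided p-value, ~0.006
b + c(-1, 1) * qnorm(0.975) * se   # 95% confidence interval for b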
Likelihood Ratio test
Compares the fitted model to a model without the parameter of interest, using a chi-square test. For the parameter b:

G_b = G_M − G_0 ~ χ²(1)

where G_M is the full-model chi-square and G_0 is the model chi-square for the model without the term.
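A sketch in R (assuming the fitted object `fit` from above); anova() computes the deviance difference for us:

fit0 <- update(fit, . ~ . - Predictor)   # the model without the term of interest
anova(fit0, fit, test = "Chisq")         # likelihood ratio test for b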
Evaluating the model
In linear regression we had the F-ratio test and R². In logistic regression, computer programs will give you a log-likelihood (LL), or sometimes −2LL. The LL for the full model, minus the LL for a model with only an intercept, can be used to evaluate the significance of your model:

−2 · (LL_full model − LL_intercept-only) ~ χ²

with degrees of freedom equal to the difference in the number of parameters (e.g. 1). SPSS refers to this as the Likelihood Ratio chi-square test.
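For the example output, the same statistic can be computed directly from the reported deviances:

G <- 40.381 - 28.741                         # null deviance - residual deviance = 11.64
pchisq(G, df = 29 - 28, lower.tail = FALSE)  # p ~ 0.0006, so the model is significant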
You may also see:
● AIC (Akaike Information Criterion)
● BIC (Bayesian Information Criterion)
● Deviance

These are not great measures of goodness of fit, and are probably most useful for comparing models (if the models are comparable!); a sketch of extracting them follows below.
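All three can be read off a fitted glm object (a sketch, using `fit` and `fit0` from above):

AIC(fit)        # 32.741 in the example output
BIC(fit)
deviance(fit)   # residual deviance, 28.741 in the example output
AIC(fit0)       # lower AIC is preferred, if the models are comparable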
Assumptions
● The model is correctly specified
● Linear relationship between the independent variable and the logit
● Zero cells – a category of a predictor in which the outcome is all 0s or all 1s causes estimation problems

If you have more than one independent variable:
● Additivity – modifying factors, interaction
● Multicollinearity – the independent variables shouldn't be strongly correlated (a sketch of checking this follows below)
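A sketch of checking multicollinearity (the data frame `d` and its predictors are hypothetical, simulated here so the code runs):

set.seed(2)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$y <- rbinom(50, size = 1, prob = plogis(d$x1 - 0.5 * d$x2))
cor(d[, c("x1", "x2")])    # pairwise correlations between predictors
# With the car package installed, variance inflation factors:
# car::vif(glm(y ~ x1 + x2, family = binomial, data = d))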
Checking Residuals

As in linear regression, we want to make sure:
● Outlier points aren't influencing the model fit
● Points aren't poorly predicted by the model

Residuals (always standardized) should be binomially distributed, so we shouldn't worry if they don't look Normal. You may see highly influential cases (leverage, or Cook's distance); a sketch of these checks follows below.
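A sketch of these checks in R (assuming the fitted object `fit` from above):

rs <- rstandard(fit)        # standardized deviance residuals
plot(fitted(fit), rs)       # look for poorly predicted points
hatvalues(fit)              # leverage of each observation
cooks.distance(fit)         # influence of each observation on the fit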
Examples
(figures)