geographically weighted regression

Geographically Geographically Weighted Weighted Regression Regression A Stewart Fotheringham Martin Charlton Chris Brunsdon [email protected] http://ncg.nuim.ie/GWR

Regression In a typical linear regression model applied to spatial data we assume a stationary process -the same stimulus provokes the same response in all parts of the study region process:

yi = β0 + β1x1i + β2x2i +… βnxni + εi

so that... The parameter estimates obtained in the calibration of such a model are constant over space: β’ = (XT X)-1 XT Y

The assumption of stationarity in regression yi = α + β x i

β2 β1

Assumption is that the values of β are the same everywhere.

Why might measured relationships vary spatially? Sampling variation Relationships intrinsically different across space e.g. differences in attitudes, preferences or

different administrative, political or other contextual effects produce different responses to the same stimuli

Model misspecification - suppose a global

statement can ultimately be made but models not properly specified to allow us to make it. Local models good indicator of how model is misspecified.

Can all contextual effects ever be removed? Can all significant variations in local relationships be removed?

Consequently…if there is spatial nonstationarity, we only see it through the residuals We might map the residuals from the regression to determine whether there are any spatial patterns. Or compute an autocorrelation statistic for the residuals We might even try to ‘model’ the error dependency with various types of spatial regression models.

However...

Why not address the issue of spatial nonstationarity directly and allow the relationships we are measuring to vary over space? This is the essence of GWR

yi = β0(i) + β1 (i) x1i + β2 (i) x2i +… βn (i) xni+ εi

… with the estimator β’(i) = (XTW(i) X)-1 XT W(i) Y where W(i) is a matrix of weights specific to location i such that observations nearer to i are given greater weight than observations further away.

W(i)

=

wi1 0 .……..…..0 0 wi2 …..……..0 0 0 wi3 ……..0 . . . . 0 0 0 ………win

where win is the weight given to data point n for the estimate of the local parameters at location i.

A Typical Spatial Weighting Function

Weighting schemes Numerous weighting schemes can be used although they tend to be Gaussian or ‘Gaussian-like’ reflecting the type of dependency found in most spatial processes. Weighting schemes can be either fixed or adaptive. adaptive

Fixed Weighting Scheme

A Fixed Weighting Scheme For each location i at which the local regression model is calibrated,

wij = exp [ - ½ (dij / h)2 ] where dij is the distance between locations i and j h is the bandwidth – as h increases, the gradient of the kernel becomes less steep and more data points are included in the local calibration. We need to find the optimal value of h in the GWR routine.

Spatially Adaptive Weighting Scheme

A Spatially Adaptive Weighting Function wij = [1-(dij2 / h2)]2 =0

if j is one of the Nth nearest neighbours of i otherwise

Here, we find the optimal value of N in the GWR routine

Calibration The results of GWR appear to be relatively insensitive to the choice of weighting function as long as it is a continuous distance-based function Whichever weighting function is used, the results will, however, be sensitive to the degree of distance-decay. Therefore an optimal value of either h or N has to be obtained. This can be found by minimising a crossvalidation score (CV) or the Akaike Information Criterion (AICc)

where... CV = ∑i [yi - y ≠i* (h)]2 where y ≠i* (h) is the fitted value of yi with data from point i omitted from the calibration Lower values of CV indicate better model fits.

AICc = Deviance + 2k [n/(n-k-1)] where n is the number of data points and k is the number of parameters in the model Lower values of AICc indicate better model fits.

Bandwidth Selection Optimal bandwidth selection is a tradeoff between bias and variance Too small a bandwidth leads to large variance in the local estimates Too large a bandwidth leads to large bias in the local estimates

Bias-Variance Trade-Off

Output from GWR Main output from GWR is a set of location-specific parameter estimates which can be mapped and analysed to provide information on spatial nonstationarity in relationships.

An Example using Educational Attainment Data in Georgia

In GWR, we can also ... estimate local standard errors derive local t statistics calculate local goodness-of-fit measures perform tests to assess the significance of the spatial variation in the local parameter estimates perform tests to determine if the local model performs better than the global one, accounting for differences in degrees of freedom

A Simulation Experiment Yi = αi + β1i X1i + β2i X2i Data on X1 and X2 drawn randomly for 2500 locations on a 50 x 50 matrix s.t. r(X1, X2) is controlled. Results shown to be independent of r(X1,X2)

Experiment 1: (parameters spatially invariant) αi = 10 for all i β1i = 3 for all i Β2i = -5 for all i Yi obtained from above Data used to calibrate model by global regression and by GWR

Results… Global: Adj. R2 = 1.0 AIC = -59,390 K = 3 α (est.) = 10; β1 (est.) = 3; β2 (est.) = -5

GWR: Adj. R2 = 1.0 AIC = -59,386 N = 2,434 αi (est.) = 10 for all i β1i (est.) = 3 for all i β2i (est.) = -5 for all i

K = 6.5

Conclusion: GWR does NOT appear to suggest any spurious nonstationarity when relationships are constant

Experiment 2: (parameters spatially variant) 0 ≤ i ≤ 50

0 ≤ j ≤ 50

αi = 0 + 0.2i + 0.2j β1i = -5 + 0.1i + 0.1j Β2i = -5 + 0.2i + 0.2j

0 to 20 -5 to 5 -5 to 15

Yi obtained in same way Data used to calibrate model by global regression and by GWR

Results… Global: Adj. R2 = 0.04 AIC = 17,046 K = 3 α (est.) = 10.26; β1 (est.) = -0.1; β2 (est.) = 5.28 These are close to the averages of the local estimates (10;0;5)

GWR: Adj. R2 = 0.997 AIC = 2,218 K = 167 N = 129 αi (est.) range = 2 to 18.6 β1i (est.) range = -4.3 to 4.7 β2i (est.) range = -3.9 to 13.6

Conclusion: GWR identifies spatial nonstationarity in relationships; global model fails completely.

0 ≤ α(i) ≤ 20

-5 ≤ β1(i) ≤ 5

-5 ≤ β2(i) ≤ 15

An An Empirical Empirical Example: Example: House House Prices Prices in in London London 1990 sales price data for 12,493 houses in London along with various attributes of each property and a postcode so locations down to 100m can be obtained via the Central Postcode Directory neighbourhood data obtained for enumeration districts (via postcode-to-ED LUT)

Locations of house sales in data set

Hedonic Price Modelling Very common tool to examine determinants of house prices House prices related to determinants (usually) in the form of a linear-inparameters model generally calibrated by some form of regression Problem: A global model is almost always assumed where “one size fits all”

Explanatory Variables Floor area House type (detached; semi-d; flat etc) Date built Garage Central Heating 2+ bathrooms % professionals in neighbourhood % unemployed in neighbourhood distance to centre of London

Global Regression Parameter Estimates Variable Intercept

Parameter Estimate 58,900

T value 23.3

FLRAREA

697

49.3

FLRDETACH* FLRFLAT* FLRBNGLW* FLRTRRCD*

205 -123 -87 -119

7.5 -5.6 -1.4 -6.2

BLDPWW1** BLDPOSTW** BLD60S** BLD70S** BLD80S**

-2,340 -2,786 -5,177 -2,421 6,315

-3.9 -3.1 -5.0 -2.1 6.9

GARAGE CENHEAT BATH2+

5,956 7,777 22,297

10.6 12.4 19.1

72 -211

3.0 -5.5

-18,137

-30.1

PROF UNEMPLOY ln(DISTCL) R2 = 0.60

* Excluded house type is Semi-detached ** Excluded age is Inter-war 1914-1939

Using GWR In this case an adaptive kernel is used a bisquare function Calibration yielded an optimal number of nearest neighbours = 931 Results presented in a series of parameter surfaces - those shown all have significant spatial variation

Value of terraced property £/m2 (global estimate = £578)

Pre-1914 housing compared to inter-war (global estimate = £-2,340)

1960s housing compared to inter-war (global estimate = £-5,177)

Residuals from GWR are generally much lower and are not spatially autocorrelated GWR models give much better fits to data, even accounting for increases in number of parameters GWR residuals are generally not spatially autocorrelated so reducing/removing the need for spatial regression models

Global Regression Parameter Estimates Variable Intercept

Parameter Estimate 58,900

T value 23.3

FLRAREA

697

49.3

FLRDETACH* FLRFLAT* FLRBNGLW* FLRTRRCD*

205 -123 -87 -119

7.5 -5.6 -1.4 -6.2

BLDPWW1** BLDPOSTW** BLD60S** BLD70S** BLD80S**

-2,340 -2,786 -5,177 -2,421 6,315

-3.9 -3.1 -5.0 -2.1 6.9

GARAGE CENHEAT BATH2+

5,956 7,777 22,297

10.6 12.4 19.1

72 -211

3.0 -5.5

-18,137

-30.1

PROF UNEMPLOY ln(DISTCL) R2 = 0.60

* Excluded house type is Semi-detached ** Excluded age is Inter-war 1914-1939

Residuals from Global Model

Residuals from GWR Model

Assessing whether the spatial variation in measured relationships might be important (i.e. the variation is unlikely to be just a product of sampling variation)

1. Monte-Carlo tests 2. Local t values 3. Variability of local parameter estimates

1. Monte- Carlo Tests 1. 2. 3. 4. 5. 6.

Obtain local parameter estimates and calculate variance of estimates Rearrange data randomly across the zones (keeping Yi X1i X2i …Xni) together Compute new set of local parameter estimates based on rearranged data Repeat steps 2 and 3 LOTS of times each time computing the variance of the local estimates Compare variance of local parameter estimates in step 1 with those from steps 2 and 3 p value associated with 1 is then the proportion of variances that lie above that for 1 in a list of variances sorted high to low.

Can do this very easily within GWR3.0!

An example from the Georgia data ************************************************* * * * Test for spatial variability of parameters * * * ************************************************* Tests based on the Monte Carlo significance test procedure due to Hope [1968,JRSB,30(3),582-598] Parameter ---------Intercept TotPop90 PctRural PctEld PctFB PctPov PctBlack

P-value ------------0.29000 0.10000 0.24000 0.75000 0.00000 0.59000 0.02000

Spatial variation in the % FB local parameter estimate

Spatial variation in the % black local parameter

2. Local t values ti = βi* / SE (βi* ) Map these values. Look for areas on the map with ti values > 2 and/or ti values < -2 Local t values are provided automatically in the GWR3.0 output file and can be mapped in ArcGIS

3. Variability of local estimates To examine the ‘importance’ of the spatial variability of any relationship, compare the variation of the local parameter estimates from GWR with the SE of the global parameter estimate. This can be done in two ways. 3.1 Calculate the following index: I = S.D. of local estimates / S.E. global estimate

An example from the Georgia data set Par.

S.D. local

S.E. global I

Int %Rur %Eld %FB %Pov %Bla

1.744 0.012 0.095 0.985 0.099 0.045

1.753 0.014 0.130 0.307 0.072 0.026

0.99 0.86 0.73 3.21 1.38 1.73

M-C p value 0.42 0.41 0.60 0.00 0.50 0.01

Very rough ‘Rule of Thumb’: Potentially interesting spatial variation if I > 1.5

3.2 Use the inter-quartile range in the listing file 50% of the local parameter estimates will lie within the inter-quartile range Approx. 68% of the values in a Normal distribution lie between ± 1 SD of the mean. The global parameter estimate is the mean of a Normal distribution. Therefore if the inter-quartile range of the local estimates is greater than 2 SD of the global mean, this is indicative of a possible non-stationary process.

An example from the Georgia data set Parameter

2 x S.E. global

Inter-quartile range (local)

Int %Rur %Eld %FB %Pov %Bla

3.506 0.028 0.260 0.614 0.144 0.052

3.400 0.019 0.109 2.034 0.094 0.080

Same interpretations as M-C tests

X X X √ X √

Can Use GWR as a ‘Spatial Microscope’ Instead of determining an optimal bandwidth during the calibration of a GWR model, a bandwidth can be input a priori. A series of bandwidths can be selected and the resulting parameter surfaces examined at different levels of smoothing For example, consider a very simple model of house prices regressed on floor area for 570 houses in Tyne & Wear, North East England. Surfaces of the local floorspace parameter are derived for bandwidths corresponding to 400, 350, 300, 250, 200, 150, 100 and 50 NN

OK, So how do you all this? Running the GWR software: GWR 3.0

First, create a data file… File type xxxx.dat (xxxx.csv) First line is a comma separated list of variable names (] symbol

The Model Editor To specify independent variables, highlight them individually in the list on the left and click on the [->] symbol

The Model Editor To specify location variables, highlight them individually in the list on the left and click on the corresponding [->] symbols

The Model Editor Next you specify the type of kernel: this can be fixed (Gaussian) or adaptive (bisquare)

The Model Editor You can either preset the bandwidth in the units that the location variables are measured in (for example, metres)

Or if you want the program to determine the optimal bandwidth, specify either crossvalidation or AIC minimisation. For large files, there is a sampling option to speed the process.

The Model Editor Select the type of coordinate system you are using for your location variables – choice is either Cartesian or spherical

The Model Editor The type of output in the printed listing can also be controlled by clicking on Model Options

Bandwidth selection Bandwidth AICc 56.043532255000 913.159190588348 84.500000000000 885.119969660068 112.956467745000 872.910381423844 130.543532046749 868.887720190066 141.412935569545 869.149708997055 123.825871267796 870.450868861077 134.695274741431 869.114420384913 127.977613962479 869.551269557617 ** Convergence after 8 function calls ** Convergence: Local Sample Size= 131

Useful if you want to plot the relationship to see how steep or flat it is

List predictions… Predictions from this model... Obs Y(i) Yhat(i) 1 8.200 9.006 2 6.400 6.958 3 6.600 8.524 4 9.400 8.308 5 13.300 13.835 6 6.400 8.910 7 9.200 11.760 8 9.000 11.446 9 7.600 10.231 10 7.500 9.104

Res(i) -0.806 -0.558 -1.924 1.092 -0.535 -2.510 -2.560 -2.446 -2.631 -1.604

X(i) -82.286 -82.875 -82.451 -84.454 -83.251 -83.501 -83.712 -84.839 -83.220 -83.232

Y(i) 31.753 31.295 31.557 31.331 33.072 34.353 33.993 34.238 31.759 31.274

F F F F F F F F F F

If you have requested an output file, this information and the diagnostics are also written to this file.

List Pointwise Diagnostics

Obs Observed Predicted Residual Std Resid R-Square Influence Cook's D ----- -------------- -------------- -------------- ----------- ----------- ----------- ----------1 8.20000 8.84819 -0.64819 -0.182251 0.784156 0.032346 0.000073 2 6.40000 6.39738 0.00262 0.000759 0.775286 0.085459 0.000000 3 6.60000 8.48954 -1.88954 -0.549871 0.782090 0.096687 0.002126 4 9.40000 8.35258 1.04742 0.302776 0.808351 0.084519 0.000556 5 13.30000 14.60358 -1.30358 -0.377493 0.834522 0.087768 0.000901 6 6.40000 8.29036 -1.89036 -0.538081 0.839070 0.055846 0.001125 7 9.20000 12.02529 -2.82529 -0.794934 0.841344 0.033697 0.001448 8 9.00000 10.97210 -1.97210 -0.554246 0.846250 0.031492 0.000656 9 7.60000 10.73917 -3.13917 -0.892715 0.778960 0.054083 0.002994 10 7.50000 9.04908 -1.54908 -0.444086 0.778295 0.069182 0.000963

The Model Editor Once the model specification is completed, give the file a title and save it before you run it. The saved file can be used in later sessions.

- the listing of the output will be saved here and then click on ‘Run’

The local parameter estimates will be saved in your named output file e.g. georgia.e00 This can be used for subsequent mapping in ArcGIS…

Summary GWR appears to be a useful method to investigate spatial non-stationarity - simply assuming relationships are stationary over space is no longer tenable GWR can be likened to a ‘spatial microscope’ allows us to see variations in relationships that were previous unobservable Can use GWR as a model diagnostic or to identify interesting locations for investigation. Windows-based software makes it easy to apply to any spatial data set.

Things to watch out for... Local colinearity occurs sometimes esp. with binary variables

Inference in GWR be careful about multiple hypothesis testing issues

Software limits max. no. of explanatory variables = 50 max. no. of observations = 80,000

Running time running a ‘full’ GWR with a large data set and a large model can be time consuming!

Think about your model GWR is unlikely to be able to rescue a poor global model

End of presentation

Bandwidth and Effective Numbers of Parameters As the bandwidth → ∞, the local model will tend to the global model with number of parameters = k. As the bandwidth → 0, the local model ‘wraps itself around the data’ so the number of parameters = n The number of parameters in local models therefore ranges between k and n and depends on the bandwidth. This number need not be an integer and we refer to it as the effective number of parameters in the model

An Example from the Georgia Data Bandwidth and The Effective Number of Parameters Effective Number of Paramaters

160 140 120 100 80 60 40 20 0 0

50

100

Bandwidth

150

200

The reason … reason… Suppose we have a non-stationary process that can be modelled by: yi = α + βi xi but we model it incorrectly with a global model of the form: yi = α + β xi

Real values of βi .9 .8 .7 .6 .5

.8 .7 .6 .5 .4

.8 .6 .5 .4 .3

.7 .5 .4 .3 .2

.5 .4 .4 .2 .1

Estimated value of βi from global model .5 .5 .5 .5 .5

.5 .5 .5 .5 .5

.5 .5 .5 .5 .5

.5 .5 .5 .5 .5

.5 .5 .5 .5 .5

Residuals (yi - yi’) + + + + 0

+ + + 0 -

+ + 0 -

+ 0 -

0 -