A Geotaxonomic Approach to Classification in Urban and Regional Studies

K. S. O. Beavon and A. V. Hall*

*The authors are indebted to Professor P. D. Tyson and Dr. J. C. Doornkamp for their helpful comments, and to Mrs. H. D. Marais for drawing the diagrams.

Keith Beavon is senior lecturer in human geography, University of the Witwatersrand, Johannesburg. Anthony Hall is senior lecturer in botany, University of Cape Town.

Geographical Analysis

The recognition of central place hierarchies is fundamental to most studies in urban geography. The work of Christaller, Berry and Garrison, Brush, Bracey, and Clark [6, 7, 8, 9, 10, 12, 14] has established the nature of these hierarchies within both regions and cities. In determining the hierarchies, some form of classification is implicit. The basis for many such classifications is the two-parameter approach first proposed by Berry and Garrison [6], using Clark's [13] definition of groups. Subsequent studies using the parameters of population and functional units are those of King [20] in his study of small towns in New Zealand, Stafford's [26] study in Iowa, and Davies' [16] analysis of the South African urban hierarchy. Recently, Abiodun [1] has questioned the relevance of accurate population data as a sine qua non for grouping procedures.

Multivariate approaches to the general problem of grouping in urban and regional studies offer a solution to this particular problem. Like the earlier studies of Berry [S], those of Murdie, Rees, and Berry [22, 24, 5] focus attention on factor and principal components analyses and the use of linkage algorithms to distinguish groupings within hierarchies and regions. The factor analytic procedure involves the use of a product-moment correlation matrix based on normal or normalized data. Much of the data required for urban and regional research is difficult to normalize. Consequently, it has been recognized that there is a need to develop further generalized multidimensional measures of similarity as a preliminary to grouping procedures (Spence and Taylor [25]). This paper presents a geotaxonomic approach to classification in urban and regional studies.

SUGGESTED MEASURES OF SIMILARITY

A satisfactory measure of similarity must ensure that every item of information contributes to the homogeneity or similarity index in such a way that there is no undue weighting within the data matrix, which comprises items (e.g., shopping centers, administrative regions) and properties (e.g., types of shop, employment categories). Secondly, each property should contribute independently to the measure of similarity, to ensure that groups are maximally homogeneous for the greatest number of properties. Two measures that meet these requirements are the relative heterogeneity ($H_n$) and the frequency modulated relative homogeneity ($H'_{nm}$) functions.

Relative Heterogeneity

In order to calculate the heterogeneity values for dichotomous data, each property row in the data matrix is related to an imaginary maximally heterogeneous group that acts as a standard against which the heterogeneity of the property is compared. The standard has the same number of information items as the data set and is made up of equiprobable proportions of zeros and repeats of the largest property value per property row. The measure of heterogeneity for the actual data set and for the standard set is the standard deviation of the property row. The relative heterogeneity $H_n$ of i samples averaged over p properties is:

$$H_n = \frac{1}{p} \sum_{j=1}^{p} \frac{\sigma_{jn}}{\sigma_{jm}} \qquad (1)$$

where $\sigma_{jn}$ is the standard deviation of the values for a particular property across a set of items and $\sigma_{jm}$ is the same for the standard group with maximum heterogeneity.

Whereas the procedure outlined above is suitable for dichotomous data, geographers deal most frequently with ratio data. In such circumstances the $H_n$ function reduces to a simple modulus when the relative heterogeneity for a pair of items is calculated. For a property j that has been scaled to unity,¹ the relative heterogeneity function is:

$$H_n = |a'_{jt} - a'_{jk}|$$

where $a'_{jt}$ and $a'_{jk}$ are the scaled values of property j in items t and k.

¹In practice each property is divided by the largest value in the property row to give resultant values ranging between zero and unity.

Research Notes and Comments / 409

More generally, averaged over p properties,

$$H_n = \frac{1}{p} \sum_{j=1}^{p} |a'_{jt} - a'_{jk}| \qquad (2)$$

This function resembles the Mean Character Difference of Cain and Harrison [11], derived on different grounds. The form of $H_n$ given in equation (2) is particularly useful in computing a similarity matrix of all possible pairs of items. It is preferable to the taxonomic distance measure d, where larger differences between property values for any pair of items tend to be overemphasized in the process of being squared before addition (Hall [19]). This is shown for some data sets in Table 1. It will be noticed that the differences between results obtained by $H_n$ and d are not exceptionally large. However, although its semigeometric derivation may be attractive, the taxonomic distance function has distortions that make it undesirable, a view also put forward by Colless [15]. Relative homogeneity $H'_n$ is given by

$$H'_n = 1 - H_n. \qquad (3)$$
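The comparison reported in Table 1 can be checked with a few lines of code. The following is a minimal sketch, not the authors' program: the function names are hypothetical, the pairing of values to items within each data set is assumed (it does not affect the results), and the taxonomic distance is taken to be the root mean squared difference implied by the worked entries of Table 1. The third function illustrates why the ratio of standard deviations reduces to a simple modulus for a pair of items whose properties are scaled to unity.

```python
import math
import statistics


def relative_heterogeneity_pair(t, k):
    """Mean absolute difference between the scaled values of two items."""
    return sum(abs(a - b) for a, b in zip(t, k)) / len(t)


def taxonomic_distance(t, k):
    """Root mean squared difference, as in the d entries of Table 1."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t, k)) / len(t))


def heterogeneity_from_sd(t, k):
    """Standard-deviation form for a pair of items.

    Each property's standard deviation is divided by that of the standard
    group, which for scaled data and two items holds the values 0 and 1.
    """
    sd_standard = statistics.pstdev([0.0, 1.0])
    ratios = [statistics.pstdev([a, b]) / sd_standard for a, b in zip(t, k)]
    return sum(ratios) / len(ratios)


# Hypothetical data of Table 1: (values of one item, values of the other),
# one entry per property.
TABLE_1 = [
    ([1.0, 0.2], [0.1, 0.1]),
    ([0.5, 0.2], [0.1, 0.1]),
    ([0.3, 0.2], [0.1, 0.1]),
    ([0.2, 0.2], [0.1, 0.1]),
    ([1.0, 0.2, 0.2], [0.1, 0.1, 0.1]),
]

for t, k in TABLE_1:
    h = relative_heterogeneity_pair(t, k)
    d = taxonomic_distance(t, k)
    print(f"H_n = {h:.2f}   d = {d:.2f}")
```

For every data set d is at least as large as $H_n$, and the gap is widest where one property difference dominates, which is the overemphasis of large differences noted above.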

TABLE 1
A COMPARISON OF RESULTS FOR THE RELATIVE HETEROGENEITY FUNCTION $H_n$ AND THE TAXONOMIC DISTANCE MEASURE d USING HYPOTHETICAL DATA

Data set   Property   Scaled values of the two items   H_n                             d
1          1          1.0    0.1                        1/2 (0.9 + 0.1) = 0.50          [1/2 (0.81 + 0.01)]^(1/2) = 0.64
           2          0.2    0.1
2          1          0.5    0.1                        1/2 (0.4 + 0.1) = 0.25          [1/2 (0.16 + 0.01)]^(1/2) = 0.29
           2          0.2    0.1
3          1          0.3    0.1                        1/2 (0.2 + 0.1) = 0.15          [1/2 (0.04 + 0.01)]^(1/2) = 0.16
           2          0.2    0.1
4          1          0.2    0.1                        1/2 (0.1 + 0.1) = 0.10          [1/2 (0.01 + 0.01)]^(1/2) = 0.10
           2          0.2    0.1
5          1          1.0    0.1                        1/3 (0.9 + 0.1 + 0.1) = 0.37    [1/3 (0.81 + 0.01 + 0.01)]^(1/2) = 0.53
           2          0.2    0.1
           3          0.2    0.1

Frequency Modulated Relative Homogeneity

In order to emphasize similarity among the property rows that have higher frequencies of occurrence, the frequency modulated relative homogeneity function $H'_{nm}$ is best used. For a set of k items and j = 1 ... p properties, $H'_{nm}$ is given by

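$$H'_{nm} = \sum_{j=1}^{p} \left[ \frac{\sum_{i=1}^{k} a'_{ji}}{\sum_{j=1}^{p} \sum_{i=1}^{k} a'_{ji}} \right] \left[ 1 - \frac{\sigma_{jn}}{\sigma_{jm}} \right] \qquad (4)$$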

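In a manner similar to the derivation of equation (2), $H'_{nm}$ can be rewritten for ease of computing a similarity matrix for all possible pairs of items. Thus for a set of items t and k and j = 1 ... p properties, $H'_{nm}$ is given by

$$H'_{nm} = \sum_{j=1}^{p} \left[ \frac{a'_{jt} + a'_{jk}}{\sum_{j=1}^{p} (a'_{jt} + a'_{jk})} \right] \left[ 1 - |a'_{jt} - a'_{jk}| \right] \qquad (5)$$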

where $a_{jt}$ and $a_{jk}$ are the number of units of property j in items t and k respectively. The primed variables indicate scaled property data. The first expression in square brackets serves to give the frequency-modulating factor. The homogeneity is found by the second expression in square brackets, where the absolute difference between the scaled values for a given property is subtracted from unity. The calculation of $H'_{nm}$ for a set of items t and k is shown in Table 2.
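The pairwise calculation can be sketched in code using the hypothetical scaled data of Table 2. This is an illustrative sketch only: the function name is hypothetical, the modulating factor is assumed to be formed from the scaled values as in the worked entries of Table 2, and raising an error when both items are empty is an assumption, since the text does not treat that case.

```python
def fm_relative_homogeneity(t, k):
    """Frequency modulated relative homogeneity for a pair of items.

    t and k hold the scaled values (0 to 1) of each property in the two items.
    """
    total = sum(a + b for a, b in zip(t, k))
    if total == 0:
        # Assumption: the similarity of two empty items is left undefined.
        raise ValueError("both items are empty")
    result = 0.0
    for a, b in zip(t, k):
        modulating_factor = (a + b) / total   # weight of this property row
        homogeneity = 1 - abs(a - b)          # agreement on this property
        result += modulating_factor * homogeneity
    return result


# Scaled data for items t and k from Table 2 (six property rows).
item_t = [0.0, 1.0, 0.1, 0.6, 0.0, 0.8]
item_k = [0.0, 1.0, 0.1, 0.4, 1.0, 0.8]

print(round(fm_relative_homogeneity(item_t, item_k), 2))
```

Carried out without intermediate rounding, the sum for these data comes to about 0.79, whereas Table 2 arrives at 0.81 by adding products that have each been rounded to two decimal places.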

TABLE 2
HYPOTHETICAL SCALED DATA FOR TWO ITEMS t AND k WITH SIX STRATA*

Property row   Scaled data for two items   Modulating factor, M   Homogeneity value, H   MH
               t       k
1              0.0     0.0                 0/5.8 = 0              1.0                    0.00
2              1.0     1.0                 2/5.8 = 0.35           1.0                    0.35
3              0.1     0.1                 0.2/5.8 = 0.04         1.0                    0.04
4              0.6     0.4                 1/5.8 = 0.17           0.8                    0.14
5              0.0     1.0                 1/5.8 = 0.17           0.0                    0.00
6              0.8     0.8                 1.6/5.8 = 0.28         1.0                    0.28
Sum                                                                                      0.81

*The sum of the MH products gives a frequency modulated relative homogeneity of 0.81.

THE CLUSTERING PROCEDURE

In general, the object of many clustering algorithms is to give a balance between maximized packing and the greatest informational homogeneity at any level of the classification. To make allowance for sampling variations, the clustering algorithm is written in such a way as to conserve space between distinctive minority sets. For most purposes, a classification using space conservation is desirable. The calculated average member linkage algorithm is of such a nature. In this procedure every unlinked item and cluster is treated as a separate group. The values of each property are averaged over the items comprising a cluster to give a measure of central tendency which is in essence a centroid, a term already reserved for a somewhat different situation in group-forming (see Lance and Williams [21]). The term calculated average member is used instead. The best fusion is found by locating the link among the average members and items that shows the highest similarity level. After each fusion the average member values are recalculated, and the procedure of scaling properties to unity and calculating a new similarity matrix for the remaining items (or clusters) is repeated until all clusters and single items are grouped into one set. Further details concerning the techniques here proposed for geotaxonomic purposes of classification can be obtained from the work of Beavon [2] and Hall [ M I .
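The steps just described can be sketched as a short program. This is an illustrative reading of the procedure, not the authors' implementation: all function names and the toy data are hypothetical, similarity is taken to be the pairwise frequency modulated relative homogeneity, each property is rescaled by its largest value among the current groups before every round, and the handling of empty members is an assumption.

```python
from itertools import combinations


def fm_homogeneity(t, k):
    """Pairwise frequency modulated relative homogeneity of scaled members."""
    total = sum(a + b for a, b in zip(t, k))
    if total == 0:
        # Assumption: similarity of two empty members is left undefined.
        raise ValueError("both members are empty")
    return sum((a + b) / total * (1 - abs(a - b)) for a, b in zip(t, k))


def scale_to_unity(members):
    """Divide each property by its largest value among the current groups."""
    maxima = [max(column) for column in zip(*members)]
    return [[v / m if m else 0.0 for v, m in zip(member, maxima)]
            for member in members]


def cam_linkage(items):
    """Fuse groups step by step; return (group_a, group_b, level) per fusion.

    items maps an item label to its raw property values.
    """
    # Each group keeps the raw values of the items it contains.
    clusters = {(label,): [values] for label, values in items.items()}
    fusions = []
    while len(clusters) > 1:
        keys = list(clusters)
        # Calculated average member: property means over a group's items.
        averages = [[sum(col) / len(col) for col in zip(*clusters[key])]
                    for key in keys]
        scaled = scale_to_unity(averages)
        # The best fusion is the pair with the highest similarity level.
        i, j = max(combinations(range(len(keys)), 2),
                   key=lambda pair: fm_homogeneity(scaled[pair[0]],
                                                   scaled[pair[1]]))
        level = fm_homogeneity(scaled[i], scaled[j])
        a, b = keys[i], keys[j]
        clusters[a + b] = clusters.pop(a) + clusters.pop(b)
        fusions.append((a, b, level))
    return fusions


# Hypothetical toy data: four items described by two properties.
toy_items = {"A": [10, 2], "B": [9, 2], "C": [1, 8], "D": [1, 9]}
fusions = cam_linkage(toy_items)
for group_a, group_b, level in fusions:
    print(group_a, group_b, round(level, 2))
```

Because the average members are recomputed from the original item values and the properties are rescaled after every fusion, the whole similarity matrix is rebuilt at each step rather than updated by a combinatorial formula; for the small matrices considered here the cost is negligible.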

DERIVING MULTIFACTOR UNIFORM REGIONS

In order to illustrate the use of the frequency modulated relative homogeneity function and of calculated average member linkage in determining multifactor uniform regions, Berry's data [3], consisting of the number of service establishments per 1000 persons over six categories in each of the nine census divisions of the U.S.A., are used. These are set out in Table 3. Berry derives a linkage tree based on direct factor analysis to show four major groups (Figure 1). In offering alternative methods of regionalization for the same data, and using squared Euclidean distance as a measure of similarity, Pocock and Wishart [23] suggest different groupings based on group-average and centroid algorithms (Figure 2).

TABLE 3
SERVICES PER THOUSAND POPULATION FOR U.S.A. CENSUS DIVISIONS, 1954*

Services           (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)
1. Personal        2.56   2.70   2.10   2.11   1.74   1.38   2.04   1.92   2.37
2. Business        0.57   0.72   0.50   0.47   0.38   0.25   0.45   0.57   0.87
3. Auto Repair     0.53   0.54   0.52   0.71   0.49   0.38   0.68   0.70   0.82
4. Misc. Repair    0.69   0.72   0.68   0.84   0.53   0.41   0.80   0.78   0.87
5. Amusement       0.43   0.41   0.46   0.56   0.42   0.33   0.45   0.55   0.51
6. Hotels, etc.    0.46   0.25   0.30   0.53   0.42   0.22   0.40   1.24   0.63

*Source: Berry [3].

[Figure 1: linkage tree with the divisions in the order 1, 3, 2, 4, 7, 8, 9, 5, 6, linkage steps 1st to 7th, and the final fourfold regional grouping.]

FIG. 1. Linkage Tree of U.S.A. Census Divisions. Source: Berry [3].

[Figure 2: dendrograms for (a) the centroid method and (b) the group average method, each with its initial and final interpretation of the groups.]

FIG. 2. Dendrograms of U.S.A. Census Divisions Based on (a) Centroid and (b) Group Average Methods of Linkage. (The relationship between groups is given by the vertical scale of squared Euclidean distance. The number of groups at any level is emphasized by the thickness of the stems in the dendrogram. Source: Pocock and Wishart [23].)

In Figure 2b, based on the group-average method, three groups comprising census divisions 1, 2, 3, 7, 4; 8, 9; and 5, 6 are apparent. By contrast, in Figure 2a, based on the centroid method, only two groups (1, 2, 3, 7, 4, 9, 8 and 5, 6) are evident. Pocock and Wishart point out that the centroid method fuses samples 4, 9, and 8 to the group (1, 2, 3, 7) individually, while the group-average method forms the group 8, 9 before fusing the two groups. Since the level of fusion for the two groups by group-average is a relatively small increase on the previous level, the authors point out that the validity of 8, 9 as a separate group is questionable. They suggest that, from inspection of the two dendrograms, the two final groupings 1, 2, 3, 7, 4, 8, 9 and 5, 6 are mutually dissimilar and probably constitute the realistic classification.

[Figure 3: dendrogram with a vertical scale of $H'_{nm}$ from 1.00 to 0.70. Three main groups at the $H'_{nm}$ level of 0.85, viz.: 1, 2, 4, 7, 3, 9; 8; 5, 6. At the $H'_{nm}$ level of 0.79 there are two main groups, viz.: 1, 2, 4, 7, 3, 9, 8 and 5, 6.]

FIG. 3. U.S.A. Census Divisions Grouped by the Calculated Average Member Linkage Algorithm Based on the Frequency Modulated Relative Homogeneity Measure of Similarity.

In the alternative approach suggested in this paper only two groups can be distinguished at the 80 percent level of homogeneity (Figure 3). In this respect the result accords with that of Figure 2a and bears little resemblance to the groups proposed by Berry. On the question of substructural detail, it is clear from Figure 3 that at a homogeneity level of at least 85 percent three groups (1, 2, 4, 7, 3, 9; 8; and 5, 6) are evident, suggesting that at this level division 8 is to be regarded as a completely separate entity. This situation is emphasized when the fusion of division 9 to the group 1, 2, 4, 7, 3 is considered. Group 1, 2, 4, 7, 3 consists of two subgroups, 1, 2 and 3, 4, 7, which link at a lower level than that at which division 9 links to the overall group 1, 2, 4, 7, 3.

CONCLUSIONS

In this paper two alternative measures of similarity, the relative heterogeneity and frequency modulated relative homogeneity functions, have been presented. Both functions enable a wide range of geographical data matrices to be analysed. The two functions are closely related and appear less liable to distort relationships between sets of properties than is the case with the taxonomic distance measure. Both offer a useful and refined alternative to group-average and centroid linkage algorithms and other factor analytic techniques. In the analysis of large data matrices, substructural variations assume increasing importance and require a sensitive technique to isolate their detail. It is suggested that the frequency modulated relative homogeneity function in association with the calculated average member linkage algorithm offers the necessary resolution capability for geotaxonomic classification in urban and regional studies.

LITERATURE CITED

1. ABIODUN, J. O. "Urban Hierarchies in a Developing Country," Economic Geography, 43 (1967), 347-67.
2. BEAVON, K. S. O. "An Alternative Approach to the Classification of Urban Hierarchies," South African Geographical Journal, 52 (1970), 129-33.
3. BERRY, B. J. L. "A Method for Deriving Multifactor Uniform Regions," Przegląd Geograficzny, 33 (1961), 263-82.
4. BERRY, B. J. L. "Basic Patterns of Economic Development," in N. Ginsberg (Ed.), Atlas of Economic Development (Chicago: University of Chicago, Department of Geography, Research Paper 68, 1961), 110-19.
5. BERRY, B. J. L. "Grouping and Regionalizing: An Approach to the Problem Using Multivariate Analysis," in W. L. Garrison and D. F. Marble (Eds.), Quantitative Geography, Part I: Economic and Cultural Topics (Evanston: Northwestern University, Studies in Geography, 13, 1967), 219-51.
6. BERRY, B. J. L. and W. L. GARRISON. "Functional Bases of the Central Place Hierarchy," Economic Geography, 34 (1958), 134-54.
7. BERRY, B. J. L. and W. L. GARRISON. "Recent Developments of Central Place Theory," Regional Science Association Papers and Proceedings, 4 (1958), 107-120.
8. BERRY, B. J. L. and W. L. GARRISON. "A Note on Central Place Theory and the Range of a Good," Economic Geography, 34 (1958), 304-311.
9. BRUSH, J. E. "The Hierarchy of Central Places in Southwestern Wisconsin," Geographical Review, 43 (1953), 380-402.
10. BRUSH, J. E. and H. E. BRACEY. "Rural Service Centres in Southwestern Wisconsin and Southern England," Geographical Review, 45 (1955), 559-69.
11. CAIN, A. J. and J. A. HARRISON. "An Analysis of the Taxonomic Judgment of Affinity," Proceedings of the Zoological Society of London, 131 (1958), 85-98.
12. CHRISTALLER, W. Central Places in Southern Germany. (Translated by C. W. Baskin). Englewood Cliffs, N.J.: Prentice-Hall, 1966.
13. CLARK, P. J. "Grouping in Spatial Distribution," Science, 123 (1956), 373-74.
14. CLARK, W. A. V. "The Spatial Structure of Retail Functions in a New Zealand City," New Zealand Geographer, 23 (1967), 23-33.
15. COLLESS, D. H. "An Examination of Certain Concepts in Phenetic Taxonomy," Systematic Zoology, 16 (1967), 627.
16. DAVIES, R. J. "The South African Urban Hierarchy," South African Geographical Journal, 49 (1967), 9-19.
17. HALL, A. V. "Methods for Demonstrating Resemblance in Taxonomy and Ecology," Nature, 214 (1967), 830-31.
18. HALL, A. V. "Avoiding Informational Distortions in Automatic Grouping Programs," Systematic Zoology, 18 (1969), 318-29.
19. HALL, A. V. "Group Forming and Discrimination with Homogeneity Functions," in A. J. Cole (Ed.), Numerical Taxonomy. London: Academic Press, 1969, 53-68.
20. KING, L. J. "The Functional Role of Small Towns in Canterbury," Proceedings of the Third New Zealand Geographical Conference (Palmerston North, 1962), 139-49.
21. LANCE, G. N. and W. T. WILLIAMS. "Computer Programs for Hierarchical Polythetic Classification ("Similarity Analyses")," Computer Journal, 9 (1966), 60-64.
22. MURDIE, R. A. The Factorial Ecology of Metropolitan Toronto, 1951-1961. Chicago: University of Chicago, Dept. of Geography Research Paper 116, 1968.
23. POCOCK, D. C. D. and D. WISHART. "Methods of Deriving Multifactor Uniform Regions," Transactions of the Institute of British Geographers, 47 (1969), 73-98.
24. REES, P. H. "The Factorial Ecology of Metropolitan Chicago," in B. J. L. Berry and F. E. Horton (Eds.), Geographic Perspectives on Urban Systems. Englewood Cliffs, N.J.: Prentice Hall, 1970. Pp. 306-397.
25. SPENCE, N. A. and P. J. TAYLOR. "Quantitative Methods in Regional Taxonomy," in C. Board, R. J. Chorley, P. Haggett, and D. R. Stoddart (Eds.), Progress in Geography, 2 (1970), 1-64.
26. STAFFORD, H. A. "The Functional Bases of Small Towns," Economic Geography, 39 (1963), 165-75.