European Journal of Operational Research 145 (2003) 121–135 www.elsevier.com/locate/dsw

O.R. Applications

A multivariate methodology to uncover regional disparities: A contribution to improve European Union and governmental decisions

João Oliveira Soares a,*, Maria Manuela Lourenço Marquês b, Carlos Manuel Ferreira Monteiro a

a CEG-IST – Instituto Superior Técnico, Av. Rovisco Pais, 1049-001 Lisbon, Portugal
b Departamento de Matemática, Faculdade de Ciências e Tecnologia – U.N.L., Lisbon, Portugal

Received 2 March 2001; accepted 12 December 2001

Abstract

The aim of this paper is to present a new methodology to classify the levels of socio-economic development of a country’s territory, in order to support regional development policy. This classification is obtained through the use of multivariate statistical methods – factor and cluster analysis – and is based on a large number of demographic, economic, health, education, employment and culture indicators. The Portuguese continental territory is used as the working example. Results lead to the identification of nine axes of socio-economic characterisation, and the division of the Portuguese territory into four regions with differing degrees of development, reflecting the well-known asymmetry between coastal and inland zones. The ‘socio-economic regions’ uncovered with this methodology allow a much more useful characterisation of the Portuguese territory, for policy making, than does the NUTS-2 classification scheme used by the European Union. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Government; Regional development; Multivariate statistics; Factor analysis; Cluster analysis

* Corresponding author. Tel.: +351-218417729; fax: +351-218417979. E-mail addresses: [email protected] (J.O. Soares), [email protected] (C.M.F. Monteiro).

1. Introduction

In Continental Portugal there are essentially two levels of (elected) political authority: national government and municipal government. Between them, there is an intermediate level – the administrative regions. They constitute the basis for regional development policy, and correspond to 5 of the 206 NUTS-2 regions of the European Union – Norte, Centro, Lisboa e Vale do Tejo, Alentejo and Algarve (see Fig. 1). According to the latest figures published by the European Commission (see European Commission, 1999), three of these regions – Norte, Centro and Alentejo – are amongst the poorest 25 of the Union. They have very similar GDP per capita

0377-2217/03/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0377-2217(02)00146-7


J.O. Soares et al. / European Journal of Operational Research 145 (2003) 121–135

Fig. 1. NUTS-2 Regions of Continental Portugal.

(measured in terms of purchasing power standards, PPS) – respectively 62%, 60% and 58% of the Union’s average in 1995. This percentage is a little higher for Algarve (70%), but only that of Lisboa e Vale do Tejo (89%) exceeds the threshold of 75%, established to delimit the regions that constitute a priority in the application of the Structural Funds (the so-called ‘objective 1’ regions: see European Council, 1999). These figures give us a rough idea of the relative situation of the five regions. The question under study here is whether they provide a relevant framework for establishing a regional development policy. The same question could obviously be extended to other countries. In fact, as noted by Lipshitz and Raveh (1998), ‘research on regional disparities and on policies for reducing them devotes little attention to socio-economic distinctions within peripheral and core regions, rather stressing disparities between regions’. As we will try to show, a deeper analysis must take into account smaller geographic units and a broader spectrum of indicators than GDP per capita, ‘the standard measure of the size and performance of a regional economy’ (European Commission, 1999). From our point of view, this line of reasoning is consistent with the task of the European Regional Development Fund, as stated by the European Parliament and Council (1999): to provide assistance within the framework of a comprehensive and integrated strategy for sustainable development. It is also consistent with a multifaceted definition of quality of life, which is implicit in the European documents that refer to this matter. However, while this reasoning may be unanimously accepted, its practical implementation requires the development of a methodological framework that can support the political decisions.

In this context, our contribution, based on the illustrative example studied in this paper, focuses on two main objectives: first, to identify a small number of socio-economic dimensions that adequately summarise the information contained in a range of regional indicators; second, to look for homogeneous regions in terms of socio-economic development.

The study deals with all the 275 municipalities (concelhos) into which the continental Portuguese territory was divided in 1995. The Portuguese statistical authority (INE – Instituto Nacional de Estatística) has provided data concerning demographic, economic, health, education, employment and cultural characteristics of each municipality (covering the different areas mentioned in European Council, 1999; European Parliament and Council, 1999). The methodology used in this work includes multivariate statistical methods, namely factor and cluster analysis.
Similar studies, but with different variables and goals, have been conducted for the US and the UK by Ozimek (1993) and Openshaw (1995). For Portugal, one can mention the pioneering work of Lema and Mather (1977) (cf. Lema and Mather, 1970), Ferrão and Jensen-Butler (1988) and, more recently, Brandão et al. (1998). Nevertheless, none of these studies included as many indicators to measure different aspects of the quality of life of the populations, nor did they fully exploit the potential of the methodology to build an alternative classification of the territory that is useful for regional development policy.

The remaining part of this paper is organised as follows. Details of the socio-economic data for each Portuguese municipality are presented in Section 2. Section 3 deals with the identification of the smaller number of socio-economic dimensions underlying the original data structure. In Section 4, cluster analysis is used to uncover groups of municipalities that can be considered homogeneous in terms of socio-economic development. Finally, the main conclusions of this study are presented in Section 5.

2. The data: Main local socio-economic indicators

The variables used in this paper consist of 33 local socio-economic indicators published by the Portuguese statistical authority (INE, 1991, 1995a,b). Their code, description and type can be found in Table 1. With the exception of two variables (the “labour force as percentage of total

Table 1
Description of variables and respective codes

Code        Description
DENSPOP     Dem – population density (inhabitants per km²)
POP_0–24    Dem – percentage of total population aged up to 24 years
POP_25–64   Dem – percentage of total population between 25 and 64 years of age
TNAT        Dem – number of births per 1000 inhabitants
TMORT       Dem – number of deaths per 1000 inhabitants
TMIG        Dem – migration arrivals minus migration departures per 1000 inhabitants
IMPORT      Eco – value of imports per inhabitant (10^4 Pte.)
EXPORT      Eco – value of exports per inhabitant (10^4 Pte.)
ALOJAM      Eco – number of hotel beds per 1000 inhabitants
EMP_ABC     Eco – number of firms in the primary sector per 1000 inhabitants
EMP_DEF     Eco – number of firms in the secondary sector per 1000 inhabitants
EMP_GQ      Eco – number of firms in the tertiary sector per 1000 inhabitants
DEPOSITO    Eco – bank deposits per inhabitant (10^3 Pte.)
DESPCAM     Eco – Town Council expenditure per inhabitant (10^3 Pte.)
ELECDOM     Eco – domestic consumption of electricity per inhabitant (10 kW h)
ELECIND     Eco – industrial consumption of electricity per inhabitant (10 kW h)
AGUA        Eco – amount of water distributed by the Town Council per inhabitant (10 Pte.)
TELEFONE    Eco – number of telephones per 100 inhabitants
COMBUST     Eco – sales of fuel per inhabitant (kg)
HOSP        Hea – number of hospitals per 1000 inhabitants
CENTSAU     Hea – number of health centres per 1000 inhabitants
CAMAS       Hea – number of hospital beds per 1000 inhabitants
MEDICOS     Hea – number of medical doctors per 1000 inhabitants
TMORTINF    Hea – number of deaths before the age of 1 year per 1000 births
ENSBAS      Edu – number of schools at the basic level per km²
ENSSEC      Edu – number of schools at the secondary level per km²
ENSSUP      Edu – number of schools at the college level per km²
ESPECT      Cul – number of attendances of public performances per inhabitant
BIBLIOT     Cul – libraries per 1000 inhabitants
AMB_TOT     Env – ratio of Town Council environmental expenditure to Town Council total expenditure
TACTIV      Emp – labour force as percentage of total population
TDES        Emp – unemployment rate
PODCOMP     Eco – purchasing power indicator, per capita (composite index)

There are six types of indicators: Dem – demography; Eco – economy; Hea – health; Edu – education; Cul – culture; Emp – employment.


population” and the “unemployment rate”), which refer to 1991, all variables refer to 1995. It is important to note that there are more economic indicators than indicators for the other areas, because of the need to characterise different aspects within this area: the importance of agriculture, industry and services, tourism, import–export activity, household consumption, town council expenditure and purchasing power. Concerning the other indicators, despite our intention to be parsimonious, we still retained many of the available statistics, after consulting different experts in the respective areas.

The descriptive statistics shown in Table 2 reflect some huge differences among the 275 municipalities. For example, the population density (DENSPOP) has a range of more than 7000 inhabitants per km², and its maximum value is about 1000 times larger than its minimum value; the figures for the different types of consumption per capita (ELECDOM, ELECIND, AGUA and COMBUST) are also remarkably disparate; and one should also stress the very large coefficient of variation of the variable that measures the migration flow (TMIG), about 27. In order to eliminate the effects due to variables having been measured on different scales, further analysis will be carried out with

Table 2
Descriptive statistics

Variable        Mean    Median      S.D.  Kurtosis  Skewness   Minimum   Maximum Var. coef.
DENSPOP        281.1      70.7     849.3      48.9       6.5       7.2    7786.3        3.0
POP_0–24        32.2      31.9       4.6       0.0       0.2      20.6      43.1        0.1
POP_25–64       48.9      49.0       3.4      -0.6       0.0      39.5      56.8        0.1
TNAT             9.3       9.1       2.5      -0.1       0.3       3.6      16.6        0.3
TMORT           12.8      12.3       3.9       3.7       1.2       5.6      34.2        0.3
TMIG             0.2       0.6       6.4       1.9      -0.1     -26.0      22.2       26.8
IMPORT          19.8       5.4      46.3      52.4       6.2       0.0     511.4        2.3
EXPORT          19.6       3.9      44.3      35.4       5.2       0.0     412.6        2.3
ALOJAM          19.1       0.0      93.2     163.8      11.8       0.0    1369.5        4.9
EMP_ABC         17.1      16.1      11.6       3.4       1.2       0.3      82.0        0.7
EMP_DEF         26.3      23.5      22.0     101.6       9.1       3.1     294.5        0.8
EMP_GQ          56.2      50.1      43.2     137.8      10.7       4.8     649.8        0.8
DEPOSITO      1307.2    1134.5    1108.7      61.8       6.9       0.0   12652.8        0.8
DESPCAM         82.1      74.9      34.8       2.3       1.4      34.9     222.5        0.4
ELECDOM      25178.6    7696.3   54331.0      36.1       5.3      45.8  492354.1        2.2
ELECIND      37045.3    4917.7   82369.5      14.9       3.6       6.8  581223.9        2.2
AGUA        197862.6   41301.0  585056.2      60.2       6.9     159.0 6483982.0        3.0
TELEFONE        31.8      31.3       8.0       2.7       0.8       6.4      68.7        0.3
COMBUST        972.5     749.3    3536.1     253.1       9.6     104.4   58210.1        3.6
HOSP             0.0       0.0       0.0      55.9       5.8       0.0       0.3        2.3
CENTSAU          0.5       0.4       0.4       2.9       1.6       0.1       2.2        0.8
CAMAS            2.3       1.1       4.1      23.6       4.2       0.0      34.4        1.8
MEDICOS          1.2       0.8       1.7      55.6       6.4       0.0      18.7        1.3
TMORTINF         7.2       5.5       8.7       5.2       1.9       0.0      56.6        1.2
ENSBAS           0.3       0.2       0.5      51.0       6.5       0.0       4.9        1.9
ENSSEC           0.0       0.0       0.1      88.2       8.9       0.0       1.2        4.1
ENSSUP           0.0       0.0       0.1     134.7      11.6       0.0       1.1        8.6
ESPECT           0.3       0.0       0.6      29.2       4.7       0.0       5.7        2.2
BIBLIOT          0.2       0.1       0.1       2.7       1.5       0.0       0.6        0.7
AMB_TOT          9.1       7.4       6.6       4.0       1.8       0.8      41.5        0.7
TACTIV          40.3      40.2       5.7      -0.5       0.0      25.7      53.0        0.1
TDES             6.5       5.7       3.4       4.5       1.7       1.7      24.4        0.5
PCOMP           60.9      51.2      33.9      14.8       2.9      22.8     314.1        0.6

Note: See variables description on Table 1.

Table 3
Correlation matrix (lower triangle)

Variable     DENSPOP  POP_0–24 POP_25–64      TNAT     TMORT      TMIG    IMPORT    EXPORT    ALOJAM   EMP_ABC   EMP_DEF
DENSPOP         1.00
POP_0–24        0.08      1.00
POP_25–64       0.39      0.08      1.00
TNAT            0.21      0.77      0.37      1.00
TMORT          -0.25     -0.72     -0.51     -0.67      1.00
TMIG           -0.09      0.22      0.38      0.27     -0.38      1.00
IMPORT          0.40      0.09      0.38      0.19     -0.23      0.14      1.00
EXPORT          0.14      0.21      0.34      0.29     -0.31      0.23      0.79      1.00
ALOJAM         -0.01      0.00      0.11      0.18     -0.01      0.11     -0.04     -0.04      1.00
EMP_ABC        -0.34     -0.46     -0.23     -0.45      0.48     -0.18      0.20     -0.23     -0.04      1.00
EMP_DEF        -0.02     -0.07      0.15      0.10     -0.04      0.13      0.01      0.02      0.06      0.37      1.00
EMP_GQ          0.10     -0.14      0.17      0.05      0.02      0.07      0.05      0.00      0.17      0.36      0.90
DEPOSITO        0.17     -0.08      0.01     -0.02      0.03     -0.05      0.07     -0.02      0.04      0.28      0.75
DESPCAM        -0.12     -0.54     -0.31     -0.44      0.61     -0.19     -0.10     -0.16      0.23      0.34      0.05
ELECDOM         0.78      0.12      0.46      0.30     -0.31      0.01      0.41      0.19      0.05     -0.39     -0.01
ELECIND         0.35      0.22      0.50      0.36     -0.42      0.20      0.41      0.43     -0.03     -0.40      0.02
AGUA            0.79      0.02      0.40      0.20     -0.21     -0.05      0.42      0.13      0.06     -0.31     -0.02
TELEFONE        0.39     -0.26      0.45      0.09      0.02      0.07      0.28      0.15      0.36     -0.12      0.12
COMBUST         0.00      0.01      0.14      0.01     -0.04      0.11      0.26      0.40      0.03     -0.03      0.02
HOSP            0.20      0.02      0.22      0.14     -0.09     -0.08      0.12      0.05     -0.02     -0.17     -0.03
CENTSAU        -0.25     -0.59     -0.43     -0.57      0.61     -0.37     -0.22     -0.23     -0.08      0.37     -0.01
CAMAS           0.26     -0.03      0.20      0.10     -0.05     -0.18      0.09      0.01     -0.01     -0.20     -0.06
MEDICOS         0.57      0.01      0.40      0.21     -0.21     -0.08      0.28      0.10      0.04     -0.30     -0.02
TMORTINF        0.01      0.10     -0.09      0.06      0.01     -0.06     -0.03      0.00      0.03     -0.09     -0.03
ENSBAS          0.97      0.17      0.36      0.27     -0.30     -0.11      0.41      0.15     -0.01     -0.39     -0.01
ENSSEC          0.87      0.00      0.27      0.12     -0.13     -0.22      0.38      0.12      0.00     -0.26     -0.03
ENSSUP          0.70     -0.06      0.14      0.03     -0.02     -0.29      0.31      0.06      0.02     -0.15     -0.03
ESPECT          0.48     -0.02      0.29      0.17     -0.11     -0.03      0.23      0.08      0.16     -0.24     -0.03
BIBLIOT         0.06     -0.32     -0.10     -0.30      0.36     -0.24     -0.02     -0.12     -0.09      0.22      0.05
AMB_TOT         0.15     -0.02      0.44      0.18     -0.22      0.31      0.26      0.25      0.13     -0.08      0.02
TACTIV          0.37      0.34      0.79      0.59     -0.61      0.38      0.35      0.37      0.13     -0.31      0.11
TDES            0.01     -0.28      0.08     -0.23      0.20     -0.13     -0.04     -0.08     -0.09      0.20     -0.06
PCOMP           0.69      0.00      0.66      0.31     -0.30      0.07      0.45      0.25      0.18     -0.32      0.05

Table 3 (continued)

Variable      EMP_GQ  DEPOSITO   DESPCAM   ELECDOM   ELECIND      AGUA  TELEFONE   COMBUST      HOSP   CENTSAU     CAMAS
EMP_GQ          1.00
DEPOSITO        0.81      1.00
DESPCAM         0.18      0.13      1.00
ELECDOM         0.10      0.17     -0.18      1.00
ELECIND         0.02      0.00     -0.29      0.58      1.00
AGUA            0.12      0.22     -0.08      0.94      0.47      1.00
TELEFONE        0.28      0.22      0.14      0.43      0.20      0.49      1.00
COMBUST         0.03      0.00      0.07      0.00      0.05      0.00      0.11      1.00
HOSP            0.02      0.06     -0.08      0.21      0.13      0.22      0.20     -0.01      1.00
CENTSAU         0.01     -0.03      0.46     -0.30     -0.31     -0.23     -0.02     -0.05     -0.06      1.00
CAMAS           0.02      0.12     -0.07      0.33      0.16      0.34      0.24     -0.02      0.78     -0.05      1.00
MEDICOS         0.12      0.21     -0.13      0.70      0.40      0.68      0.48      0.02      0.36     -0.21      0.57
TMORTINF       -0.03     -0.02      0.02      0.00      0.00      0.00      0.00     -0.06      0.11      0.05      0.08
ENSBAS          0.09      0.21     -0.16      0.82      0.35      0.83      0.36     -0.02      0.22     -0.29      0.29
ENSSEC          0.12      0.28     -0.01      0.77      0.24      0.82      0.40      0.00      0.26     -0.16      0.34
ENSSUP          0.10      0.29      0.05      0.73      0.17      0.78      0.34      0.00      0.23     -0.09      0.35
ESPECT          0.13      0.24      0.03      0.56      0.24      0.63      0.49      0.10      0.31     -0.17      0.40
BIBLIOT         0.13      0.23      0.40     -0.01     -0.13      0.09      0.12      0.04      0.06      0.24      0.15
AMB_TOT         0.08     -0.03     -0.18      0.28      0.28      0.24      0.37      0.01      0.19     -0.18      0.17
TACTIV          0.08     -0.02     -0.42      0.45      0.51      0.36      0.28      0.10      0.18     -0.54      0.13
TDES            0.01     -0.06      0.19     -0.07     -0.08      0.00      0.04      0.06      0.03      0.24      0.02
PCOMP           0.21      0.25     -0.14      0.73      0.45      0.75      0.73      0.16      0.31     -0.32      0.41

Table 3 (continued)

Variable     MEDICOS  TMORTINF    ENSBAS    ENSSEC    ENSSUP    ESPECT   BIBLIOT   AMB_TOT    TACTIV      TDES     PCOMP
MEDICOS         1.00
TMORTINF       -0.02      1.00
ENSBAS          0.61      0.02      1.00
ENSSEC          0.61      0.02      0.92      1.00
ENSSUP          0.59      0.02      0.78      0.92      1.00
ESPECT          0.56     -0.02      0.53      0.59      0.59      1.00
BIBLIOT         0.21      0.03      0.05      0.18      0.22      0.17      1.00
AMB_TOT         0.28     -0.05      0.12      0.09      0.05      0.19     -0.07      1.00
TACTIV          0.35     -0.07      0.37      0.25      0.12      0.28     -0.20      0.35      1.00
TDES           -0.02     -0.04     -0.07      0.01      0.02      0.05      0.27      0.01     -0.08      1.00
PCOMP           0.72     -0.01      0.69      0.71      0.61      0.69      0.08      0.41      0.56      0.02      1.00

standardised variables. Other transformations could also be suggested by the high values of skewness and kurtosis found in some variables. However, our analysis revealed that it was not possible to find a transformation that would guarantee the normality (or quasi-normality) of all the variables. In a situation such as this, the decision to transform only a subset of the variables leads to an arbitrary modification of the discriminative capacity of only some of the aspects (variables) under consideration. Consequently, and as the methods employed in this study do not make any distributional assumptions (Sharma, 1996; Hair et al., 1998), no transformation of the data was performed.

We may also look at the correlation matrix (Table 3), which reveals the existence of strong relationships between some variables. While some of these interdependencies were expected, others were not. Some of these aspects will now be considered:

(i) The population density is obviously closely correlated with the various types of educational settings per km². It is also important to notice its strong correlation with the purchasing power indicator and the domestic consumption of electricity and water.
(ii) The birth rate and the percentage of young population are inversely and highly correlated with the death rate, suggesting the gradual ageing of some municipalities.
(iii) The value of the imports is highly correlated with the value of the exports (0.79), reflecting the dependency of the Portuguese export sector on foreign supplies.
(iv) Intriguingly, the number of health centres per 1000 inhabitants is highly correlated with the death rate. A possible explanation is that the mortality rate is higher in the less populated and larger (in area) municipalities.
(v) Also intriguing is the lack of correlation between the number of schools at the secondary level per km² and the percentage of total population aged up to 24 years. This is understandable, however, if one notices that the latter variable has a low correlation with the population density.
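The standardisation and correlation steps discussed in this section can be illustrated with a short Python sketch. It is not the software used in this study: the five data values are hypothetical, the function names are illustrative, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the text does not state which convention underlies Table 2.

```python
import statistics


def z_scores(values):
    """Standardise a variable to zero mean and unit sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1): an assumption
    return [(v - mean) / sd for v in values]


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (cf. 'Var. coef.' in Table 2)."""
    return statistics.stdev(values) / statistics.fmean(values)


def pearson(x, y):
    """Pearson correlation as the sum of products of z-scores over n - 1."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Hypothetical values for five municipalities (NOT taken from the INE data).
density = [12.0, 45.0, 70.0, 310.0, 7500.0]   # inhabitants per km2
schools = [0.01, 0.05, 0.08, 0.40, 4.60]      # basic schools per km2

z_density = z_scores(density)
r = pearson(density, schools)
print(f"CV of density: {coefficient_of_variation(density):.2f}")
print(f"correlation density/schools: {r:.3f}")
```

After standardisation every variable has mean 0 and standard deviation 1, so a variable measured in tens of kW h no longer dominates one measured in percentages; the correlation coefficient itself is unaffected by this rescaling.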


3. Identification of the underlying socio-economic dimensions

Factor analysis will be used to look for a small number of socio-economic dimensions that adequately summarise the information contained in the original set of variables. Factor analysis is a class of multivariate statistical methods aimed at investigating the dimensions or constructs assumed to underlie a set of interdependent variables (Kim and Mueller, 1978). Exploratory factor analysis summarises the information contained in the set of original variables into a smaller group of factors (or dimensions) with minimum loss of information (Gorsuch, 1983). The derived factors are linear combinations of sets of original, highly correlated, variables.

Factor analysis involves various steps (Norusis, 1990). Firstly, based on the correlation matrix for all variables, the appropriateness of the factor model has to be evaluated. Secondly, it is necessary to decide which factor model should be used and how many factors should be extracted, and to assess how well the model fits the original data. Thirdly, a rotation method has to be chosen to make the factors more interpretable. Finally, the computed factor scores can be used in other statistical analyses. This methodology has been applied in the present research.

3.1. Evaluating the appropriateness of factor analysis

Evaluating the appropriateness of factor analysis means assessing whether the variables are significantly and sufficiently correlated with each other so that their number can be reduced by applying the factor analytic model. This can be done with a visual inspection of the correlation matrix for all variables, and by computing some statistics, including the Bartlett test of sphericity and the Kaiser–Meyer–Olkin measure of sampling adequacy.

The correlation matrix reveals that all but two variables have at least one correlation coefficient with an absolute value larger than 0.3, the value that Kinnear and Gray (1994) suggest as the minimum value for including a variable in the


analysis. Both the Bartlett test of sphericity, with a statistic of 8668.7 and an associated probability of less than 0.05, and the Kaiser–Meyer–Olkin measure of sampling adequacy, with a value of 0.71, considered “middling” by Kaiser (1974), suggest that the data structure is adequate for factor analysis. For this reason, all variables have been included in the analysis, although one should expect to find two factors, each representing one of the two variables with correlation coefficients below 0.3 (the “number of deaths before the age of 1 year per 1000 births” and the “unemployment rate”).

3.2. Methods for factor extraction

Kline (1994) suggests that, to check the stability and robustness of the solution, data should be subjected both to principal components factor analysis and to maximum likelihood factor analysis. The maximum likelihood method has the additional advantage of providing a goodness-of-fit test for the adequacy of a K-factor model. However, the maximum likelihood method was not used in this study because the variables are not normally distributed.

3.3. Criteria to decide on the number of factors to extract

Five criteria are frequently used to decide the number of factors to extract. These are the eigenvalue criterion; the scree test criterion (Cattell, 1966); the percentage of variance criterion; the test of fit on the number of factors, provided by the maximum likelihood method; and finally, the interpretability of the factor structure solution (Hair et al., 1998; Kline, 1994).

The eigenvalue criterion considers that all factors having eigenvalues greater than 1 should be retained. The rationale for the eigenvalue criterion is that any factor should account for at least the variance of a single variable. Looking at Table 4, eight factors have eigenvalues above 1 and the ninth (0.96) falls just below this threshold, so eight or nine factors should be retained. The scree test indicates the maximum number of factors to extract. This is given by the point at which the curve first begins to level out to become
This is given by the point at which the curve first begins to level out to become

Table 4
Results of principal components factor analysis

Factor  Eigenvalue  Percent of variance  Cumulative percent of variance
     1     9.32516              28.2581                        28.25805
     2     4.69876              14.2387                        42.49672
     3     2.92866              8.87473                        51.37145
     4     2.10624              6.38255                        57.75400
     5     1.77361              5.37456                        63.12856
     6     1.48180              4.49031                        67.61887
     7     1.30103              3.94251                        71.56137
     8     1.03655              3.14107                        74.70245
     9     0.96247              2.91656                        77.61901
    10     0.78268              2.37177                        79.99078
    11     0.72619              2.20058                        82.19136
    12     0.71190              2.15728                        84.34864
    13     0.63651              1.92881                        86.27745
    14     0.59930              1.81606                        88.09350
    15     0.51752              1.56825                        89.66176
    16     0.49056              1.48654                        91.14830
    17     0.41701              1.26366                        92.41195
    18     0.34728              1.05236                        93.46431
    19     0.33904              1.02739                        94.49170
    20     0.27651              0.83792                        95.32962
    21     0.25573              0.77493                        96.10455
    22     0.22669              0.68694                        96.79150
    23     0.20070              0.60817                        97.39966
    24     0.16912              0.51250                        97.91216
    25     0.15401              0.46669                        98.37885
    26     0.14240              0.43151                        98.81037
    27     0.11022              0.33400                        99.14437
    28     0.08924              0.27044                        99.41480
    29     0.07357              0.22295                        99.63776
    30     0.04889              0.14815                        99.78590
    31     0.03846              0.11654                        99.90244
    32     0.02647              0.08022                        99.98266
    33     0.00572              0.01734                       100.00000

horizontal. Looking at Fig. 2, this points to the selection of nine factors. The percentage of variance criterion suggests that one should extract all factors that account for at least 60% (approximately) of the variance of the original variables. Although no absolute cutoff point has been adopted for all data, this figure is normally accepted as satisfactory in the Social Sciences. Looking at Table 4, this means that the minimum number of factors that should be retained is five. After the number of candidate factor solutions has been determined through the use of the abovementioned criteria, the final number of factors to extract has to pass the interpretability test. In


Fig. 2. Scree plot for principal components factor analysis.

practice, the ability to interpret and assign some meaning to the factors acts as an extremely important criterion in determining the final number of factors to extract (Hair et al., 1998). Nine factors explaining 77.6% of the total variance of the original variables were retained. Varimax rotation was then used to provide a more interpretable factor structure. Varimax rotation, which imposes an orthogonal structure on the data, should always be used when the resulting factor scores are to be analysed by other statistical procedures, as is the case in the present study (Hair et al., 1998). The rotated factor matrix is shown in Table 5.

3.4. Assessing the quality of the resulting factor matrix

In an ideal solution matrix each variable would load on only one factor, with a loading of one, and would not load at all on the other factors. However, in practice, factor loadings greater than 0.30 are considered significant, whereas loadings greater than 0.50 are considered very significant (Hair et al., 1998). Table 5 reveals that all variables have at least one factor loading greater than 0.489 in absolute value, whereas the great majority of the variables have very high loadings on only one factor. A good factor solution should also account for between 50% and 70% of the amount of variance

of each individual variable. The 9-factor structure found explains between 52% and 93% of the variance of each individual variable. Moreover, for 22 out of 33 variables, the model accounts for more than 75% of variance of each individual variable. This highlights the very good quality of the results of the factor analytic model.

3.5. Naming the factors

The first factor, labelled “Purchasing Power and Population Density”, has high loadings on the population density, domestic electricity consumption, amount of water distributed by the Town Council, number of schools at the basic level, number of schools at the secondary level, number of schools at the college level, and purchasing power indicator.

The second factor has positive high loadings on the percentage of total population aged up to 24 years, and the number of births; and negative high loadings on the number of deaths, and number of health centres. We also notice the significant positive loading on the migration variable. This factor was labelled “Demographic Mobility”.

The third factor has high loadings on the variables “Number of firms in the secondary sector”, “Number of firms in the tertiary sector”, and “Bank deposits”. This factor was named “Private Business”.


Table 5
Varimax rotated factor matrix

Variable    Factor 1  Factor 2  Factor 3  Factor 4  Factor 5  Factor 6  Factor 7  Factor 8  Factor 9
DENSPOP       0.8971    0.1385    0.0078    0.0325   -0.0219   -0.0589    0.0889    0.0363    0.0361
POP_0–24     -0.0156    0.8893   -0.0776    0.0630   -0.0121   -0.0509   -0.1934   -0.1436    0.1198
POP_25–64     0.2809    0.3620    0.0859    0.1429    0.0768    0.0545    0.7170    0.2241   -0.0942
TNAT          0.1163    0.8246    0.0473    0.0765    0.0792    0.1565    0.1116   -0.0897    0.1188
TMORT        -0.1247   -0.8844    0.0042   -0.0739   -0.0038    0.0908   -0.2373    0.0634    0.0021
TMIG         -0.2244    0.3640    0.0565    0.1276   -0.1929    0.1587    0.4980   -0.0636   -0.0722
IMPORT        0.3748    0.0490    0.0148    0.7439   -0.0197   -0.1357    0.2743   -0.1216    0.0714
EXPORT        0.0721    0.1788   -0.0097    0.8527   -0.0154   -0.1183    0.2691   -0.1237    0.0995
ALOJAM        0.0114    0.0476    0.0495   -0.0602   -0.0565    0.8474    0.1067   -0.1204    0.0568
EMP_ABC      -0.3106   -0.4951    0.4579   -0.1013   -0.1172   -0.0952   -0.0505    0.1764   -0.1359
EMP_DEF      -0.0651    0.0477    0.9479    0.0025   -0.0474   -0.0150    0.0968   -0.0156    0.0104
EMP_GQ        0.0841   -0.0423    0.9458    0.0086   -0.0036    0.1308    0.1049    0.0357    0.0108
DEPOSITO      0.2411   -0.0491    0.8829    0.0141    0.0832    0.0322   -0.1183   -0.0301   -0.0203
DESPCAM      -0.0238   -0.6359    0.1110    0.0863   -0.0827    0.4072   -0.2229    0.1709    0.0762
ELECDOM       0.8724    0.1741    0.0035    0.0388    0.0604   -0.0191    0.2574   -0.0777    0.0313
ELECIND       0.3485    0.2872   -0.0358    0.2338    0.0294   -0.1929    0.4892   -0.1074    0.1245
AGUA          0.9108    0.0607    0.0197    0.0380    0.0661    0.0310    0.1927   -0.0184    0.0177
TELEFONE      0.4438   -0.1749    0.1327    0.1019    0.1505    0.5119    0.4515    0.0409    0.0077
COMBUST      -0.0538    0.0602   -0.0005    0.7226    0.0294    0.2141   -0.1022    0.2252   -0.1976
HOSP          0.1435    0.0569   -0.0012    0.0210    0.8764   -0.0259    0.1011    0.0161    0.0777
CENTSAU      -0.1690   -0.7515   -0.0180   -0.0668    0.0110   -0.0602   -0.1381    0.0684    0.1360
CAMAS         0.2819    0.0100   -0.0097   -0.0300    0.8958   -0.0127    0.0520    0.0212    0.0228
MEDICOS       0.6773    0.0873    0.0338    0.0053    0.4179    0.0588    0.2172    0.0540   -0.0594
TMORTINF     -0.0088    0.0291   -0.0172   -0.0226    0.0792    0.0602   -0.0698    0.0337    0.9367
ENSBAS        0.9333    0.2016    0.0226    0.0415    0.0204   -0.0620    0.0116   -0.0243    0.0340
ENSSEC        0.9480    0.0152    0.0493    0.0663    0.0919   -0.0040   -0.0675    0.0362    0.0060
ENSSUP        0.8894   -0.0816    0.0591    0.0423    0.1284    0.0090   -0.1552    0.0154   -0.0163
ESPECT        0.6247    0.0610    0.0377    0.0758    0.3060    0.3301    0.0639    0.1359   -0.1022
BIBLIOT       0.1652   -0.3541    0.1537    0.0554    0.1516    0.0302   -0.2023    0.5351    0.0418
AMB_TOT       0.0953    0.0133   -0.0118    0.0470    0.1683    0.0802    0.7264   -0.0856   -0.0187
TACTIV        0.2565    0.5812    0.0452    0.1395    0.0361    0.0541    0.5558    0.1006   -0.0754
TDES         -0.0102   -0.2064   -0.0651   -0.0391   -0.0285   -0.1026    0.0988    0.8348    0.0132
PCOMP         0.7319    0.1513    0.0894    0.1548    0.2273    0.2454    0.4118    0.1073   -0.0439

Expl. Var.    7.3641    4.4664    2.8832    2.0154    2.0916    1.5624    2.8525    1.3081    1.0707
Prp. Totl     0.2232    0.1353    0.0874    0.0611    0.0634    0.0473    0.0864    0.0396    0.0324

Factor loadings (varimax normalised). Extraction: principal components (marked loadings are > 0.7).
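The share of each variable's variance accounted for by the retained factors (its communality, discussed in Section 3.4) can be recovered from Table 5 as the sum of the squared loadings in its row, because the varimax factors are orthogonal. The Python sketch below is illustrative only: the function names are not part of the original analysis, and the three rows are copied from Table 5.

```python
# Rotated loadings copied from Table 5 (factors 1 to 9) for three variables.
LOADINGS = {
    "DENSPOP": [0.8971, 0.1385, 0.0078, 0.0325, -0.0219,
                -0.0589, 0.0889, 0.0363, 0.0361],
    "TMIG": [-0.2244, 0.3640, 0.0565, 0.1276, -0.1929,
             0.1587, 0.4980, -0.0636, -0.0722],
    "TDES": [-0.0102, -0.2064, -0.0651, -0.0391, -0.0285,
             -0.1026, 0.0988, 0.8348, 0.0132],
}


def communality(loadings):
    """Sum of squared loadings: variance share explained (orthogonal factors)."""
    return sum(value * value for value in loadings)


def salient_factors(loadings, threshold=0.7):
    """1-based indices of factors whose absolute loading exceeds the threshold."""
    return [i + 1 for i, value in enumerate(loadings) if abs(value) > threshold]


for name, row in LOADINGS.items():
    print(name, round(communality(row), 3), salient_factors(row))
```

For TMIG this gives about 0.52 and for DENSPOP about 0.84, in line with the range of 52% to 93% quoted in Section 3.4. The 0.7 threshold mirrors the marking rule in the note to Table 5; with it, DENSPOP is assigned to factor 1, TDES to factor 8, and TMIG to no factor.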

The fourth factor, named “Industrial Activity”, represents the variables “Value of imports”, “Value of exports”, and “Sales of Fuel”.

The fifth factor, which represents the variables “Number of hospitals” and “Number of hospital beds”, was named “Hospital Services”.

Factor seven has high loadings on the variables “Percentage of total population between 25 and 64 years of age”, and “Ratio of Town Council environmental expenditure to Town Council total expenditure”. It was named “Active Adults and Environmental Needs”.

Finally, factors six (named “Tourism”), eight (named “Unemployment Rate”), and nine (named “Infant Mortality Rate”), each represent one single variable, respectively “Number of hotel beds”, “Unemployment rate”, and the “Number of deaths at less than 1 year of age per 1000 births”.
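The eigenvalue and percentage of variance criteria of Section 3.3 can be re-checked directly from the eigenvalues listed in Table 4. The following Python sketch is illustrative only: the function names are hypothetical, and the 60% share is the approximate threshold mentioned in the text rather than a fixed rule.

```python
# Eigenvalues of the 33 principal components, copied from Table 4.
EIGENVALUES = [
    9.32516, 4.69876, 2.92866, 2.10624, 1.77361, 1.48180, 1.30103, 1.03655,
    0.96247, 0.78268, 0.72619, 0.71190, 0.63651, 0.59930, 0.51752, 0.49056,
    0.41701, 0.34728, 0.33904, 0.27651, 0.25573, 0.22669, 0.20070, 0.16912,
    0.15401, 0.14240, 0.11022, 0.08924, 0.07357, 0.04889, 0.03846, 0.02647,
    0.00572,
]


def eigenvalue_criterion(eigenvalues, cutoff=1.0):
    """Number of factors whose eigenvalue exceeds the cutoff."""
    return sum(1 for value in eigenvalues if value > cutoff)


def cumulative_percent(eigenvalues):
    """Cumulative percentage of total variance after each factor."""
    total = sum(eigenvalues)
    running, result = 0.0, []
    for value in eigenvalues:
        running += value
        result.append(100.0 * running / total)
    return result


def min_factors_for_share(eigenvalues, share=60.0):
    """Smallest number of factors whose cumulative variance reaches `share` percent."""
    for k, percent in enumerate(cumulative_percent(eigenvalues), start=1):
        if percent >= share:
            return k
    return len(eigenvalues)


print(eigenvalue_criterion(EIGENVALUES))
print(min_factors_for_share(EIGENVALUES, 60.0))
print(round(cumulative_percent(EIGENVALUES)[8], 1))  # first nine factors
```

With these values the eigenvalue criterion retains eight factors, five factors are needed to pass 60% of the variance, and the first nine factors account for 77.6%, which agrees with the figures reported in Section 3.3.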

4. Grouping the municipalities

Cluster analysis was used to look for groups of municipalities with similar levels of socio-economic development. This method is recognised as an appropriate technique for finding groups of cases, and has been used extensively in the Social Sciences (Punj and Stewart, 1983). Cluster methods were run using both the standardised original socio-economic variables and the nine socio-economic factors derived in Section 3. Results of both approaches were compared. Various hierarchical cluster procedures were first employed to establish the number of clusters of municipalities. A non-hierarchical cluster procedure (K-means), using the cluster centres of the best hierarchical solution as the initial seed points, was then used to “fine-tune” the results of the best cluster solution derived by the hierarchical procedures (see Everitt, 1993).

4.1. Cluster analysis using the standardised original variables

4.1.1. Hierarchical methods

A range of hierarchical cluster procedures using various distance measures was used in the search for homogeneous groups of municipalities. Results from the majority of analyses suggested a 3-cluster solution, with a great number of municipalities in each cluster, as the best statistical solution. Some

131

methods, however, suggested a fourth cluster formed with just the two main municipalities, Lisboa and Porto, or even a fifth including two municipalities, Sardoal and Vila Nova da Barquinha, with a great number of firms per 1000 inhabitants. The best statistical and interpretative solution was given by WardÕs (1963) method (see Fig. 3). 4.1.2. Non-hierarchical methods The K-means non-hierarchical procedure was then run to improve the 3-cluster solution. However, one of the three clusters that was derived, after various sets of seeds were run, always included just the municipalities of Lisboa and Porto. For this reason, these two municipalities were considered a fourth cluster and were left out of the cluster analysis. The initial seeds (Arraiolos, Bragancßa and Almada) were chosen based on the results of the WardÕs method, and also because these municipalities are vastly different in terms of their socioeconomic characteristics. A graphical representation of the three clusters (four with the Lisboa/Porto cluster) is shown in Fig. 4. These clusters basically divide the Portuguese territory in a way which highlights the

Fig. 3. Dendrogram (WardÕs method).

132

J.O. Soares et al. / European Journal of Operational Research 145 (2003) 121–135

Fig. 4. Clusters of the Portuguese municipalities showing four socio-economic levels of development.

(known) development differences between coastal and inland zones. As one can see, there is a great difference between this characterisation and the picture drawn in the introduction, based on the GDP figures for the NUTS-2 regions also marked in this map. To help analyse the results, the mean and standard deviation of all standardised variables for the different clusters, are presented in Table 6. Cluster 1, named ‘‘Less developed rural areas’’, consists mainly of interior municipalities in the North and Centre of Portugal, usually close to the Spanish border, and it spreads also to the coast in the South. Cluster 1 is characterised by an ageing population, where the birth rate is very low and

the death rate is very high. The existing firms belong almost exclusively to the primary sector. The number of Health Centres per inhabitant is relatively high, which is due to the low population density. The percentage of the active population is very low and the unemployment rate is very high, when compared to the municipalities in the other clusters. Cluster 2, named ‘‘More developed rural areas’’, includes a large number of municipalities from the northern and central regions of Portugal, as well as some municipalities from the Alentejo and Algarve, in the South. Looking at Table 6, one can see that with regard to cluster 2, all variables have mean figures that are close to the global average for each variable. Therefore, it can be said that the municipalities in this cluster have an average level of socio-economic development; they are more developed than the municipalities in cluster 1, but less developed than those in cluster 3. Cluster 3, named ‘‘Coastal urban areas’’, consists of municipalities from the coastal zone of Portugal. It includes the northern municipalities belonging to the metropolitan area of Porto (excluding the even more developed municipality of Porto); some central municipalities such as Coimbra, Figueira da Foz and Leiria; the municipalities around Lisboa, including Set ubal; the municipality of Sines (a large sea port), in the Alentejo; and finally, two groups of municipalities around Faro and Portim~ao, in the Algarve. These municipalities are characterised by a high population density, a high birth rate and a low death rate. The percentage of population with ages between 25 and 64 years is high, as well as the percentage of the working population. Contrasting with municipalities in Cluster 1, Cluster 3 municipalities have a high migration rate. Concerning the economic indicators, one notices that both imports and exports are greater than the national average. 
The number of firms from the primary sector is very low, whereas the number of firms from the secondary and tertiary sectors is above the national average. The purchasing power of individuals living in these municipalities is clearly greater than that of other municipalities (excluding Lisboa and Porto). The water and electricity consumption variables show figures far above the national average for these variables. The number of telephones is also high when compared to that of clusters 1 and 2. Concerning the fields of education, health, and culture, one can see that the number of schools per km² at the various educational levels is much higher than that found in clusters 1 and 2. By contrast, both the number of libraries and the number of health centres per inhabitant are lower than in clusters 1 and 2. This might be explained by the higher population density of municipalities in cluster 3, which allows a large number of people to use the same library or health centre. The higher number of doctors per inhabitant observed in cluster 3 supports this reasoning. Finally, municipalities belonging to cluster 3 have the highest “ratio of Town Council environmental expenditure to Town Council total expenditure” of all clusters (including cluster 4).

As mentioned before, the municipalities of Lisboa and Porto make up cluster 4. This cluster was named “The two main cities”. It displays far superior levels for the socio-economic variables, with special emphasis on the purchasing power index, population density, usual domestic comfort indicators (electricity, water and telephones) and number of hospitals, hospital beds and medical doctors per capita.

Table 6
Descriptive statistics for clusters 1–4 (original variables)

                Cluster 1     Cluster 2     Cluster 3     Cluster 4
Variable      Mean   S.D.   Mean   S.D.   Mean   S.D.   Mean   S.D.
DENSPOP      -0.30   0.02  -0.20   0.11   0.75   1.49   7.80   0.22
POP_0–24     -0.99   0.64   0.38   0.86   0.52   0.66  -0.80   0.58
POP_25–64    -0.60   0.80  -0.12   0.81   1.26   0.58   1.15   0.18
TNAT         -0.92   0.66   0.17   0.84   0.96   0.68   0.10   0.37
TMORT         0.99   1.00  -0.21   0.60  -0.98   0.53   0.15   0.65
TMIG         -0.61   0.97   0.13   0.74   0.73   0.88  -3.61   0.66
IMPORT       -0.39   0.09  -0.16   0.43   0.93   1.77   3.63   2.85
EXPORT       -0.42   0.08  -0.16   0.37   1.09   1.89   0.64   0.77
ALOJAM       -0.13   0.21  -0.06   0.43   0.37   2.19   0.15   0.12
EMP_ABC       0.73   1.03  -0.07   0.79  -0.89   0.58  -1.38   0.05
EMP_DEF       0.08   1.74  -0.07   0.46   0.08   0.32  -0.29   0.22
EMP_GQ        0.11   1.75  -0.14   0.36   0.18   0.43   1.14   0.28
DEPOSITO      0.11   1.61  -0.09   0.45  -0.04   0.51   3.45   2.06
DESPCAM       0.92   1.07  -0.34   0.61  -0.51   0.81   0.86   1.03
ELECDOM      -0.41   0.06  -0.22   0.21   0.96   1.17   8.15   0.63
ELECIND      -0.43   0.04  -0.28   0.23   1.41   1.66   1.75   1.10
AGUA         -0.32   0.03  -0.20   0.16   0.72   1.15   8.93   2.57
TELEFONE     -0.11   0.55  -0.28   0.85   0.82   1.23   3.63   1.40
COMBUST      -0.09   0.19  -0.09   0.15   0.40   2.29   0.01   0.05
HOSP         -0.23   1.35  -0.01   0.77   0.29   0.75   2.46   0.15
CENTSAU       1.03   1.13  -0.29   0.56  -0.77   0.26  -0.73   0.09
CAMAS        -0.20   0.97  -0.02   0.91   0.23   0.98   3.80   0.24
MEDICOS      -0.36   0.18  -0.15   0.41   0.75   1.62   6.22   0.43
TMORTINF      0.04   1.48  -0.02   0.80  -0.02   0.48   0.24   0.23
ENSBAS       -0.41   0.09  -0.13   0.21   0.68   1.22   8.79   0.50
ENSSEC       -0.22   0.03  -0.16   0.06   0.39   0.90  10.44   0.60
ENSSUP       -0.12   0.00  -0.10   0.05   0.02   0.20  11.57   1.14
ESPECT       -0.19   0.58  -0.20   0.47   0.61   1.38   6.65   2.85
BIBLIOT       0.64   1.26  -0.25   0.62  -0.39   0.80   2.59   1.33
AMB_TOT      -0.37   0.62  -0.10   0.95   0.85   1.17   0.25   0.26
TACTIV       -0.75   0.81  -0.04   0.76   1.27   0.52   1.09   0.22
TDES          0.64   1.35  -0.30   0.62  -0.15   0.78   0.19   0.08
PCOMP        -0.52   0.28  -0.19   0.61   1.09   0.90   6.62   1.19
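The two-stage procedure used in Section 4.1 (Ward's hierarchical method to fix the number of clusters, followed by K-means started from the hierarchical cluster centers) can be sketched as follows. This is a minimal illustration on hypothetical two-dimensional standardised data, not the software or the data used in the study; the function names are assumptions, and the Ward step is a naive implementation suitable only for small examples.

```python
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward(points, k):
    """Naive agglomerative clustering with Ward's criterion, stopped at k clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ca = centroid([points[i] for i in clusters[a]])
            cb = centroid([points[i] for i in clusters[b]])
            na, nb = len(clusters[a]), len(clusters[b])
            # Increase in the within-cluster sum of squares if a and b are merged.
            cost = na * nb / (na + nb) * sq_dist(ca, cb)
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def kmeans(points, seeds, max_iter=100):
    """K-means started from the given seed points; returns labels and centers."""
    centers = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels, centers


# Hypothetical standardised data: nine cases forming three loose groups.
points = [(-1.2, -1.0), (-1.0, -1.1), (-0.9, -0.8),
          (0.0, 0.1), (0.1, -0.1), (-0.1, 0.0),
          (1.1, 1.0), (0.9, 1.2), (1.0, 0.9)]

hier = ward(points, k=3)
seeds = [centroid([points[i] for i in members]) for members in hier]
labels, centers = kmeans(points, seeds)
print(labels)
```

In the study the seeds were specific municipalities chosen from the Ward solution; using the hierarchical cluster centroids, as in this sketch, is the variant described at the start of Section 4.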


4.2. Cluster analysis using the factor scores

Using the factor scores obtained in Section 3, Ward's hierarchical procedure was first used to select the number of clusters, and the K-means non-hierarchical cluster procedure, using the cluster centers obtained with Ward's method as the initial seed points, was then used to “fine-tune” the results. The initial seeds used in the K-means procedure were Arraiolos, Nazaré and Almada. Bragança was replaced by Nazaré because the hierarchical procedure now placed Bragança and Almada in the same cluster. Again, Lisboa and Porto were set apart from the cluster analysis and considered as the fourth cluster.

Table 7 shows the mean and standard deviation of the nine factors for clusters 1–4. Looking at the mean factor scores for the first three clusters, one notices that the factors that best explain the differentiation among clusters are factors 1, 2, 7 and 8. As expected, Cluster 4 (Lisboa and Porto) shows mean values very different from the national average.

To conclude, the new map of clusters (see Marquês, 1999) revealed a structure similar to that found in the map of Fig. 4 (almost 80% of the municipalities remained in the same clusters). Some differences were found, however. Some municipalities in the Alentejo, previously categorised in Cluster 2, were now classified in Cluster 1, the cluster of the “less developed municipalities”. In the northern interior of Portugal the opposite occurred, with some municipalities previously classified in Cluster 1 now classified in Cluster 2. These differences in classification might be explained by the percentage of total variance of the original variables that is not explained by the nine factors (22.4%).
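The comparison between the two cluster solutions (share of municipalities that remain in the same cluster, and which moves occur) can be sketched as below. The labels are hypothetical, and the sketch assumes that cluster numbers have already been matched between the two solutions, so that, for instance, Cluster 1 denotes the same kind of group in both.

```python
from collections import Counter


def agreement(labels_a, labels_b):
    """Share of cases assigned to the same cluster in both solutions."""
    same = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return same / len(labels_a)


def cross_tab(labels_a, labels_b):
    """Count of cases for each (cluster in solution A, cluster in solution B) pair."""
    return Counter(zip(labels_a, labels_b))


# Hypothetical cluster labels for ten municipalities (not the paper's results).
from_variables = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
from_factors = [1, 1, 2, 2, 2, 1, 2, 3, 3, 3]

share = agreement(from_variables, from_factors)
moves = cross_tab(from_variables, from_factors)

print(f"Same cluster in both solutions: {share:.0%}")
for (a, b), count in sorted(moves.items()):
    if a != b:
        print(f"Cluster {a} -> Cluster {b}: {count}")
```

The off-diagonal counts of the cross-tabulation identify moves such as those reported above, for example from Cluster 2 to Cluster 1.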

5. Conclusions

The first conclusion of this work is that the multivariate techniques utilised were successful in allowing us to identify (a) the main axes of socio-economic characterisation (the factors), and (b) the regions of the Portuguese territory with differing degrees of development.

The second conclusion has to do with the variables that have been used in this study. From the beginning we were concerned that, because we were using a different number of variables from each sector (economic, health, cultural, etc.), some bias could be introduced in the clustering of the municipalities. But the analysis made with the factors has indicated that there was no cause for concern, as the resulting clusters of municipalities were very similar to those found when the original variables were used. This is encouraging, especially when there is no theoretical justification to exclude, a priori, any variable (Hair et al., 1998).

The third conclusion of this study reinforces a well-known fact in Portugal: that the coastal zone is more developed than the interior of the country.

Table 7
Descriptive statistics for clusters 1–4 (factors)

                Cluster 1     Cluster 2     Cluster 3     Cluster 4
Factor        Mean   S.D.   Mean   S.D.   Mean   S.D.   Mean   S.D.
Factor 1     -0.26   0.24  -0.18   0.15   0.31   0.83  10.11   0.80
Factor 2     -0.70   0.90   0.13   1.01   0.43   0.67  -1.24   1.01
Factor 3     -0.04   1.13   0.02   1.14  -0.02   0.37   0.77   0.68
Factor 4      0.02   1.45  -0.13   0.33   0.22   1.32   0.63   1.60
Factor 5     -0.03   1.33  -0.03   0.76   0.05   1.08   1.26   0.17
Factor 6      0.03   0.69  -0.14   0.38   0.26   1.78   0.05   0.56
Factor 7     -0.17   0.65  -0.50   0.65   1.23   0.76  -2.18   0.37
Factor 8      1.22   1.04  -0.50   0.52  -0.11   0.74   0.08   0.11
Factor 9      0.44   1.49  -0.22   0.76   0.04   0.71  -0.11   0.21
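Cluster-wise descriptive statistics of the kind reported in Tables 6 and 7 can be computed as in the sketch below. The population-density values and cluster labels are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the tables do not state which estimator was used.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardise a variable to zero mean and unit (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def cluster_stats(values, labels):
    """Mean and sample standard deviation of one variable within each cluster."""
    stats = {}
    for c in sorted(set(labels)):
        members = [v for v, lab in zip(values, labels) if lab == c]
        sd = stdev(members) if len(members) > 1 else 0.0
        stats[c] = (mean(members), sd)
    return stats


# Hypothetical raw values of one variable for six municipalities in two clusters.
density = [10, 20, 30, 200, 220, 240]
labels = [1, 1, 1, 2, 2, 2]

stats = cluster_stats(zscores(density), labels)
for c, (m, sd) in stats.items():
    print(f"Cluster {c}: mean = {m:.2f}, S.D. = {sd:.2f}")
```

Because the variable is standardised over all cases before the cluster means are taken, a cluster mean close to zero indicates a cluster close to the overall average, which is how the Cluster 2 figures in Table 6 are read.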


Any regional development policy has to deal with this reality, in order to stop the migration of young people to the coast, by promoting the creation of jobs and public services (namely in the health and education areas) in the less developed zone.

A final conclusion, and perhaps the most important, is that the classification scheme resulting from this research exposes evident weaknesses of the NUTS-2 classification scheme used by the European Union, in line with the concerns raised by Lipshitz and Raveh (1998). In fact, our classification scheme has uncovered striking development differences inside each of the five Portuguese NUTS-2 regions, which are treated as homogeneous from the point of view of the European Union regional development policy.

References

Brandão, A., Pires, A., Portugal, J., 1998. Agrupamentos de concelhos de Portugal e sua caracterização. Revista de Estatística 1, 73–93.
Cattell, R.B., 1966. The scree test for the number of factors. Multivariate Behavioral Research 1, 245–276.
European Commission, 1999. Sixth periodic report on the social and economic situation and development of the regions of the European Union.
European Council, 1999. Regulation (EC) No. 1260/1999 of the European Council, Official Journal of the European Communities 26.6.1999.
European Parliament and Council, 1999. Regulation (EC) No. 1783/1999 of the European Parliament and of the Council, Official Journal of the European Communities 12.7.1999.
Everitt, B.S., 1993. Cluster Analysis, third ed. John Wiley & Sons, New York.
Ferrão, J., Jensen-Butler, C., 1988. Existem 'regiões periféricas' em Portugal? Análise Social 24 (1), 355–371.
Gorsuch, R.L., 1983. Factor Analysis. Lawrence Erlbaum Associates Publishers, Hillsdale, NJ.
Hair, J., Anderson, R., Tatham, R., Black, W., 1998. Multivariate Data Analysis, fifth ed. Prentice-Hall, Englewood Cliffs, NJ.
INE – Instituto Nacional de Estatística, 1991. Censos.
INE – Instituto Nacional de Estatística, 1995a. Anuário Estatístico Regional.
INE – Instituto Nacional de Estatística, 1995b. Estudo Sobre o Poder de Compra Concelhio.
Kaiser, H., 1974. Little Jiffy Mark 4. Educational and Psychological Measurement 34, 111–117.
Kim, J., Mueller, C.W., 1978. Factor Analysis: Statistical Methods and Practical Issues. Sage Publications Inc., Beverly Hills, CA.
Kinnear, P., Gray, C.D., 1994. SPSS for Windows Made Simple. Lawrence Erlbaum Associates Publishers, Hove, UK.
Kline, P., 1994. An Easy Guide to Factor Analysis. Routledge, London.
Lema, P.B., Mather, P.M., 1970. Factor Analysis e Cluster Analysis Aplicados a Dados Estatísticos Sobre Portugal, Geography Department, Nottingham University, Nottingham.
Lema, P.B., Mather, P.M., 1977. O Norte de Portugal – ensaio de análise multivariada, Estudos de Geografia Humana e Regional 5, Centro de Estudos Geográficos, Universidade de Lisboa.
Lipshitz, G., Raveh, A., 1998. Socio-economic differences among localities: A new method of multivariate analysis. Regional Studies 32 (8), 747–757.
Marquês, M.M., 1999. Análise Estatística Multivariada Para Caracterização Sócio-Económica dos Concelhos do Território Continental, IST – Universidade Técnica de Lisboa.
Norusis, M.J., 1990. SPSS/PC+ Statistics 4.0, SPSS Inc.
Openshaw, S., 1995. Census Users' Handbook. Geoinformation International and John Wiley & Sons, Cambridge.
Ozimek, J., 1993. Targeting For Success: A Guide to New Techniques for Measurement and Analysis in Database and Direct Response Market. McGraw-Hill, Berkshire.
Punj, G., Stewart, D.W., 1983. Cluster analysis in marketing research; a review and suggestions for application. Journal of Marketing Research 20, 134–148.
Sharma, S., 1996. Applied Multivariate Techniques. Wiley, New York.
Ward, J., 1963. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association 58, 236–244.