FUZZ-IEEE 2004, 25-29 July, 2004, Budapest, Hungary

On Measuring Association Between Preference Systems

Przemysław Grzegorzewski
Systems Research Institute, Polish Academy of Sciences
Newelska 6, 01-447 Warsaw, Poland
E-mail: [email protected]

Abstract - The problem of measuring association between preference systems in situations with missing information or noncomparable outputs is discussed. A new correlation coefficient, which generalizes Kendall's rank correlation coefficient traditionally used in statistics, is suggested. The construction utilizes intuitionistic fuzzy sets.

I. INTRODUCTION

In statistics and data analysis we often examine more than one variable (or attribute) characterizing the population under study. Therefore, a natural and important problem arises immediately: whether these variables are independent. If two variables are not independent then they are said to be dependent, associated or correlated, and much of the statistical literature is devoted to measuring and describing this association.

In the case of quantitative data the study of the relationship between two variables is called correlation analysis. For such data one can often assess the degree of correlation from the scattergram and calculate the Pearson product-moment correlation coefficient. Moreover, if the data come from the normal distribution one can apply statistical tests to verify hypotheses on the population correlation coefficient. We can derive analogous measures of association in the distribution-free framework by using the concept of rank from the ordering of the observations. Two examples of such measures are Kendall's rank correlation coefficient and Spearman's rank correlation coefficient. These two coefficients can be applied not only to numerical data but also to qualitative data, provided they are on an ordinal scale.

A typical situation where Kendall's or Spearman's rank correlation coefficient is effectively applied is the comparison of preferences. For example, if our variables correspond to two preference systems, then using Kendall's or Spearman's rank correlation coefficient we may state whether these preferences are concordant or discordant and evaluate the strength of the possible association.

Classical statistical tools deal with random phenomena, but they have been constructed for precise and unambiguous observations. In real life, however, we often meet vague data and ambiguous answers which abound with missing information and hesitance. Such data exhibit uncertainty which has a different source than randomness.
Thus traditional statistical procedures cannot be applied there, and new tools which admit uncertainty due to imprecision, vagueness and hesitance are strongly required.

In the present paper we suggest how to measure association between preference systems which are not necessarily unambiguous and admit some hesitance. We propose a natural and simple mathematical model for such preference systems based on intuitionistic fuzzy sets. Then we show how to generalize the classical Kendall's rank correlation coefficient so that it might be used for measuring association between preference systems with missing information or noncomparable outputs. We also discuss basic properties of that generalized Kendall's correlation coefficient.

0-7803-8353-2/04/$20.00 © 2004 IEEE

II. PREFERENCES AND RANK CORRELATION

Let $X = \{x_1, \ldots, x_n\}$ denote a finite universe of discourse. Suppose that elements (objects) $x_1, \ldots, x_n$ are ordered according to the preferences of two persons, A and B. Thus our data are pairs $(A_1, B_1), \ldots, (A_n, B_n)$, where $A_i$ and $B_i$ denote the ranks attributed by A and B, respectively, to element $x_i$, which reflect its position relative to the others $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n$ according to the two preference systems A and B. Here a natural question arises: does any relationship exist between preference systems A and B, or are they independent? Well-known statistical tools such as Kendall's $\tau$ or Spearman's rank correlation coefficient might be applied to measure the association between these preference systems.

When no ties exist (i.e. when there are no two values of A or two values of B with the same rank), the Kendall correlation coefficient is given by

$$\tau = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} K_{ij}, \tag{1}$$

where

$$K_{ij} = \operatorname{sgn}(A_i - A_j) \cdot \operatorname{sgn}(B_i - B_j), \tag{2}$$

while the Spearman correlation coefficient is defined by the following formula

$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}, \tag{3}$$

where $d_i = A_i - B_i$, $i = 1, \ldots, n$, are the differences in ranks of $A_i$ and $B_i$ (see, e.g., [7]). Kendall's $\tau$ and Spearman's correlation coefficient satisfy the usual requirements of a good association measure. It can be shown that both coefficients take values in $[-1, 1]$.
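The classical coefficients for untied ranks can be sketched in a few lines of Python. This is only an illustration of the textbook formulas for Kendall's coefficient (a normalized double sum of sign products) and Spearman's coefficient (based on squared rank differences); the function names `kendall_tau` and `spearman_rs` are assumptions introduced here, not names from the paper, and the code assumes two equally long rank lists without ties.

```python
def sgn(v):
    """Sign function: -1, 0 or 1."""
    return (v > 0) - (v < 0)


def kendall_tau(a, b):
    """Kendall's tau for two rank lists without ties (double-sum form)."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two rank lists of equal length n >= 2")
    total = sum(
        sgn(a[i] - a[j]) * sgn(b[i] - b[j])
        for i in range(n)
        for j in range(n)
    )
    return total / (n * (n - 1))


def spearman_rs(a, b):
    """Spearman's r_s for two rank lists without ties."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two rank lists of equal length n >= 2")
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical ranks given by two persons to four objects.
ranks_a = [1, 2, 3, 4]
ranks_b = [2, 1, 4, 3]
print(kendall_tau(ranks_a, ranks_b), spearman_rs(ranks_a, ranks_b))
```

Identical rankings give the value 1 for both functions, and a fully reversed ranking gives -1, which matches the range stated above.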

The preference systems A and B are perfectly concordant if and only if $\tau = 1$ (or $r_s = 1$). For perfect discordance, i.e. when arrangement B is the reverse of arrangement A, we get $\tau = -1$ (or $r_s = -1$). Absolute values $|\tau|$ and $|r_s|$ between 0 and 1 give a relative indication of the degree of association between A and B, where absolute values close to one reflect strong association, while values close to zero indicate weak association or even independence. Moreover, both coefficients $\tau$ and $r_s$ are commutative, symmetric about zero and invariant under all order-preserving transformations.

If tied observations also appear, then the most common practice for dealing with them, as in most other nonparametric procedures, is to assign equal ranks to indistinguishable observations.

The properties of $\tau$ and $r_s$ and the possible relationship between them have long been studied (see, e.g., [13]). It was shown that there is no sharp functional relationship between these two correlation coefficients, but inequalities relating $\tau$ and $r_s$ do exist (see [4], [5]).

Let us consider the following example.

Example 1
Two persons, John and Susan, were asked to rank the following blends of tea from the most preferred to the least esteemed one. We want to check whether there exists any association between John's and Susan's preferences on the tea blends (and if so, to measure its strength).

Assam                Ceylon
Darjeeling           Earl Grey
English Afternoon    English Breakfast
Irish Breakfast      Victorian Blend

Suppose that John likes English Breakfast best. Next come Irish Breakfast, Darjeeling, Ceylon, Assam and English Afternoon, but he hates Earl Grey. Moreover, he has never drunk Victorian Blend. Susan's favorite tea is Earl Grey. Next she prefers English Afternoon, Assam, Ceylon and Darjeeling. She does not like strong teas like English Breakfast, Irish Breakfast and Victorian Blend, and she appraises them evenly. It is evident that this kind of data requires a rank correlation measure to evaluate the association between John's and Susan's preferences. Unfortunately, neither the classical Kendall's coefficient nor Spearman's coefficient can be applied here, because not all elements have been ranked.

Further on we suggest how to cope with problems like this. In our approach we utilize intuitionistic fuzzy sets for modelling preferences. Therefore we will start by recalling some basic concepts and notation.

III. INTUITIONISTIC FUZZY SETS

Let $X$ denote a universe of discourse. Then a fuzzy set $C$ in $X$ is defined as a set of ordered pairs

$$C = \{(x, \mu_C(x)) : x \in X\}, \tag{6}$$

where $\mu_C : X \to [0, 1]$ is the membership function of $C$ and $\mu_C(x)$ is the grade of belongingness of $x$ to $C$ (see [14]). Thus the grade of nonbelongingness of $x$ to $C$ is automatically equal to $1 - \mu_C(x)$. However, in real life linguistic negation does not always coincide with logical negation. This situation is very common in natural language processing, computing with words, etc. Therefore Atanassov [1], [2] suggested a generalization of the classical fuzzy set, called an intuitionistic fuzzy set. An intuitionistic fuzzy set $C$ in $X$ is given by a set of ordered triples

$$C = \{(x, \mu_C(x), \nu_C(x)) : x \in X\}, \tag{7}$$

where $\mu_C, \nu_C : X \to [0, 1]$ are functions such that

$$0 \le \mu_C(x) + \nu_C(x) \le 1 \quad \forall x \in X. \tag{8}$$

For each $x$ the numbers $\mu_C(x)$ and $\nu_C(x)$ represent the degree of membership and the degree of nonmembership of the element $x \in X$ to $C \subset X$, respectively.
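As a minimal sketch of this definition, an intuitionistic fuzzy set over a finite universe can be stored as a mapping from each element to its pair of membership and nonmembership degrees, with the constraint that the two degrees lie in the unit interval and sum to at most one checked on construction. The representation as a Python dictionary and the name `make_ifs` are assumptions made for illustration only.

```python
def make_ifs(degrees, tol=1e-12):
    """Validate and copy a mapping element -> (mu, nu).

    Each degree must lie in [0, 1] and mu + nu must not exceed 1
    (up to the numerical tolerance `tol`).
    """
    ifs = {}
    for x, (mu, nu) in degrees.items():
        if not (0 <= mu <= 1 and 0 <= nu <= 1):
            raise ValueError(f"degrees of {x!r} must lie in [0, 1]")
        if mu + nu > 1 + tol:
            raise ValueError(f"mu + nu exceeds 1 for {x!r}")
        ifs[x] = (mu, nu)
    return ifs


# Hypothetical two-element universe.
C = make_ifs({"x1": (0.5, 0.3), "x2": (0.2, 0.8)})
print(C["x1"])
```

A pair such as (0.7, 0.6) is rejected, because membership and nonmembership together would exceed one.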

It is easily seen that an intuitionistic fuzzy set $\{(x, \mu_C(x), 1 - \mu_C(x)) : x \in X\}$ is equivalent to (6), i.e. each fuzzy set is a particular case of an intuitionistic fuzzy set. We will denote the family of fuzzy sets in $X$ by $FS(X)$, while $IFS(X)$ stands for the family of all intuitionistic fuzzy sets in $X$.

For each element $x \in X$ we can compute the so-called intuitionistic fuzzy index of $x$ in $C$, defined as follows:

$$\pi_C(x) = 1 - \mu_C(x) - \nu_C(x). \tag{9}$$

It is seen immediately that $\pi_C(x) \in [0, 1]$ $\forall x \in X$. If $C \in FS(X)$ then $\pi_C(x) = 0$ $\forall x \in X$.

IV. IFS IN MODELLING PREFERENCES

In this section we suggest how to apply intuitionistic fuzzy sets in modelling preferences. The method proposed below seems to be useful especially when not all elements under consideration can be ranked according to preference systems A and B.

In our approach we attribute an intuitionistic fuzzy set to each preference system. For simplicity of notation we identify preference systems A and B with the corresponding intuitionistic fuzzy sets. Thus let $A = \{(x_i, \mu_A(x_i), \nu_A(x_i)) : x_i \in X\}$ denote an intuitionistic fuzzy subset of the universe of discourse $X = \{x_1, \ldots, x_n\}$, where the membership function $\mu_A(x_i)$ indicates the degree to which $x_i$ is the most preferred element according to preference system A, while the nonmembership function $\nu_A(x_i)$ shows the degree to which $x_i$ is the least preferred element according to A.

Similarly, let $B = \{(x_i, \mu_B(x_i), \nu_B(x_i)) : x_i \in X\}$ denote an intuitionistic fuzzy subset of the universe of discourse $X$, where the membership function $\mu_B(x_i)$ and the nonmembership function $\nu_B(x_i)$ indicate the degree to which $x_i$ is the most preferred and the least preferred element according to preference system B, respectively.

Here a natural question arises: how to determine these membership and nonmembership functions. Let us recall that the only available information on A and B are orderings that admit ties and elements that cannot be ranked (i.e. we deal with orderings which are not necessarily linear, because there are elements which are noncomparable). However, one can always specify two functions $w_A, b_A : X \to \{0, 1, \ldots, n-1\}$ defined as follows: for each given $x_i \in X$ let $w_A(x_i)$ denote the number of elements $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n$ surely worse than $x_i$, while $b_A(x_i)$ is equal to the number of elements $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n$ surely better than $x_i$ in the ordering corresponding to preference system A. Using the functions $w_A(x_i)$ and $b_A(x_i)$ we may determine the requested membership and nonmembership functions $\mu_A$ and $\nu_A$. Namely, let

$$\mu_A(x_i) = \frac{w_A(x_i)}{n-1}, \tag{10}$$

$$\nu_A(x_i) = \frac{b_A(x_i)}{n-1}. \tag{11}$$

Similarly, we may easily find two functions $w_B, b_B : X \to \{0, 1, \ldots, n-1\}$ such that for each given $x_i \in X$ the value $w_B(x_i)$ denotes the number of elements $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n$ surely worse than $x_i$, while $b_B(x_i)$ is equal to the number of elements $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n$ surely better than $x_i$ in the ordering corresponding to preference system B. Thus, as before, the membership and nonmembership functions $\mu_B$ and $\nu_B$ are given as follows:

$$\mu_B(x_i) = \frac{w_B(x_i)}{n-1}, \tag{12}$$

$$\nu_B(x_i) = \frac{b_B(x_i)}{n-1}. \tag{13}$$

It is easily seen that $w_A(x_i), b_A(x_i), w_B(x_i), b_B(x_i) \in \{0, \ldots, n-1\}$, because we rank $n$ elements and hence for each element $x_i \in X$ there exist no fewer than zero and no more than $n-1$ elements which are better (worse) than $x_i$. Moreover, we admit situations when the same rank is assigned to more than one element, as well as elements that are not comparable with the others. In such a way we get two well-defined intuitionistic fuzzy sets which describe preference systems A and B.

Without loss of generality let us discuss the properties of A. It is seen that the intuitionistic index $\pi_A(x_i) = 0$ for each $x_i \in X$ if and only if all elements are ranked and there are no ties. Conversely, if there exists an element $x_i \in X$ such that $\pi_A(x_i) > 0$, then there are ties or noncomparable elements in the corresponding preference system. Moreover, the more ties or noncomparable elements are present, the bigger the observed values of the intuitionistic fuzzy index. One may also notice that $\pi_A(x_i) = 1$ if and only if element $x_i \in X$ is noncomparable with the other elements or all elements $x_1, \ldots, x_n$ have obtained the same rank.

Hence it is seen that intuitionistic fuzzy sets seem to be a natural and useful tool for modelling preference systems admitting nonlinear orderings.

V. KENDALL'S CORRELATION COEFFICIENT

Several authors have discussed how to measure the correlation between intuitionistic fuzzy sets. Unfortunately, the suggested coefficients very often evaluate only the strength of the relation, i.e. they take values from the unit interval $[0, 1]$ (see, e.g., [6], [9], [10]). However, we need tools that not only provide us with the strength of the relationship but also show whether the intuitionistic fuzzy sets are positively or negatively related. Such coefficients were proposed in [11], [12] for structures which are intuitionistic or interval-valued generalizations of fuzzy numbers, so we cannot apply them directly for our purposes. Therefore we have to construct another measure of correlation. It should possess at least the following properties:

• our coefficient should range in $[-1, 1]$ to indicate both the strength of the relationship and the concordance or discordance;
• it should be applicable to different types of intuitionistic fuzzy sets (i.e. not necessarily normal, convex, etc.).

Moreover, a counterpart of a well-known statistical measure of association would be desirable. Below we suggest how to generalize Kendall's correlation coefficient into the intuitionistic fuzzy set domain.

Definition 1
Let $A = \{(x_i, \mu_A(x_i), \nu_A(x_i)) : x_i \in X\}$ and $B = \{(x_i, \mu_B(x_i), \nu_B(x_i)) : x_i \in X\}$ denote two intuitionistic fuzzy subsets of the universe of discourse $X = \{x_1, \ldots, x_n\}$. Then Kendall's correlation coefficient $\tilde{\tau}$ between A and B is given by

[Equation (14) is not recoverable from the source.]

One may ask whether $\tilde{\tau}(A, B)$ satisfies the usual requirements of a good association measure. Moreover, is it really a good generalization of the classical Kendall's correlation coefficient (i.e. does (14) reduce to (1) if all elements are ranked by both preference systems A and B)? Basic properties of $\tilde{\tau}$ are given in the following propositions.

Proposition 1
For all $A, B \in IFS(X)$, where $X = \{x_1, \ldots, x_n\}$, we have
• $\tilde{\tau}(A, B) = \tilde{\tau}(B, A)$,
• $|\tilde{\tau}(A, B)| \le 1$.


Proposition 2
If all elements are ranked by both preference systems [...] formulae (10), (11) and (12), (13), respectively. Then
• $\tilde{\tau}(A, B) = 1$ iff preference systems A and B are perfectly concordant,
• $\tilde{\tau}(A, B) = -1$ iff preference systems A and B are perfectly discordant (arrangement B must be the reverse of arrangement A).

Proposition 4
Suppose at least one preference system is completely hesitant (i.e. none of the elements can be ranked), so that we cannot find any association between preference systems A and B. Then $\tilde{\tau}(A, B) = 0$.

Example 1 (continuation)
First of all let us construct intuitionistic fuzzy sets describing John's and Susan's preferences. We denote them by A and B, respectively. According to (10)-(13) we get:

$$A = \{(\text{Assam}, \tfrac{2}{7}, \tfrac{4}{7}), (\text{Ceylon}, \tfrac{3}{7}, \tfrac{3}{7}), (\text{Darjeeling}, \tfrac{4}{7}, \tfrac{2}{7}), (\text{Earl Grey}, 0, \tfrac{6}{7}), (\text{English Afternoon}, \tfrac{1}{7}, \tfrac{5}{7}), (\text{English Breakfast}, \tfrac{6}{7}, 0), (\text{Irish Breakfast}, \tfrac{5}{7}, \tfrac{1}{7}), (\text{Victorian Blend}, 0, 0)\}$$

and

$$B = \{(\text{Assam}, \tfrac{5}{7}, \tfrac{2}{7}), (\text{Ceylon}, \tfrac{4}{7}, \tfrac{3}{7}), (\text{Darjeeling}, \tfrac{3}{7}, \tfrac{4}{7}), (\text{Earl Grey}, 1, 0), (\text{English Afternoon}, \tfrac{6}{7}, \tfrac{1}{7}), (\text{English Breakfast}, 0, \tfrac{5}{7}), (\text{Irish Breakfast}, 0, \tfrac{5}{7}), (\text{Victorian Blend}, 0, \tfrac{5}{7})\}.$$

VI. COMPARISONS

In [8] a generalization of the classical Spearman's rank correlation coefficient $\tilde{r}_s$ for preference systems with missing information or noncomparable outputs has been suggested. Its definition is much more complicated than the definition of $\tilde{\tau}$ and is given as follows: for two intuitionistic fuzzy subsets $A = \{(x_i, \mu_A(x_i), \nu_A(x_i)) : x_i \in X\}$ and $B = \{(x_i, \mu_B(x_i), \nu_B(x_i)) : x_i \in X\}$ of the universe of discourse $X = \{x_1, \ldots, x_n\}$, which correspond to the preference systems under study, the generalized Spearman's correlation coefficient $\tilde{r}_s$ is given by

[The defining formula of $\tilde{r}_s(A, B)$ and its auxiliary quantities are not recoverable from the source.]

Example 1 (continuation)
For John's and Susan's preferences we get $\tilde{r}_s(A, B) = -0.399$.
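The intuitionistic fuzzy sets of Example 1 follow mechanically from the two orderings once the counts of surely worse and surely better elements are divided by n - 1, as in (10)-(13). The sketch below is one possible way to do this, under an assumed input format that is not taken from the paper: each preference system is a list of tiers from best to worst, tied elements share a tier, and elements that were not ranked at all are simply absent. The names `ifs_from_tiers` and `hesitation` are hypothetical.

```python
from fractions import Fraction

TEAS = [
    "Assam", "Ceylon", "Darjeeling", "Earl Grey",
    "English Afternoon", "English Breakfast",
    "Irish Breakfast", "Victorian Blend",
]


def ifs_from_tiers(universe, tiers):
    """Map each element to (mu, nu) = (worse / (n-1), better / (n-1)).

    `tiers` lists groups of tied elements from best to worst.
    Elements of `universe` missing from every tier are treated as
    noncomparable and get (0, 0).
    """
    n = len(universe)
    if n < 2:
        raise ValueError("universe must contain at least two elements")
    result = {x: (Fraction(0), Fraction(0)) for x in universe}
    for k, tier in enumerate(tiers):
        better = sum(len(t) for t in tiers[:k])      # strictly higher tiers
        worse = sum(len(t) for t in tiers[k + 1:])   # strictly lower tiers
        for x in tier:
            if x not in result:
                raise ValueError(f"{x!r} is not in the universe")
            result[x] = (Fraction(worse, n - 1), Fraction(better, n - 1))
    return result


def hesitation(ifs):
    """Intuitionistic fuzzy index 1 - mu - nu for every element."""
    return {x: 1 - mu - nu for x, (mu, nu) in ifs.items()}


# John: strict order, Victorian Blend never tasted (left unranked).
john = [["English Breakfast"], ["Irish Breakfast"], ["Darjeeling"],
        ["Ceylon"], ["Assam"], ["English Afternoon"], ["Earl Grey"]]
# Susan: three strong teas tied at the bottom.
susan = [["Earl Grey"], ["English Afternoon"], ["Assam"], ["Ceylon"],
         ["Darjeeling"],
         ["English Breakfast", "Irish Breakfast", "Victorian Blend"]]

A = ifs_from_tiers(TEAS, john)
B = ifs_from_tiers(TEAS, susan)
for tea in TEAS:
    print(tea, A[tea], B[tea])
```

Exact fractions are used so that the sevenths of the example are reproduced without rounding. In this sketch the unranked Victorian Blend gets index 1 in John's set, while each of Susan's three tied teas gets index 2/7.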

Remark
As for the classical Kendall's and Spearman's rank correlation coefficients, there is also no sharp functional relationship between their counterparts $\tilde{\tau}$ and $\tilde{r}_s$. It may happen that $\tilde{\tau}(A, B) > \tilde{r}_s(A, B)$, as in Example 1, but it is not difficult to find a situation with the opposite relationship.

Example 2
Let us consider two preference systems described by the following intuitionistic fuzzy subsets of the universe of discourse

X = {x_1, ...