Decision Tree-based Paraconsistent Learning

It is possible to apply Machine Learning, Uncertainty Management and Paraconsistent Logic concepts to the design of a Paraconsistent Learning system, able.
Fabrício Enembreck, Bráulio Coelho Ávila
PPGIAT PUCPR
R. Imaculada Conceição, 1155 80215-

$v_j$ is the value for the discretized attribute $j$; $n$ is the number of characteristics or attributes, and Belief and Disbelief are numerical values belonging to the [0,1] interval. Belief and Disbelief may be interpreted as follows:

• Belief: it is believed at the Belief degree that the set of values $\langle v_1, \ldots, v_n \rangle$ represents a $class_i$ class;

• Disbelief: it is believed at the Disbelief degree that the set of values $\langle v_1, \ldots, v_n \rangle$ does not represent a $class_i$ class.

The first step of data transformation, Discretization [20] [24], is fundamental in the implemented system, since it can cause two effects on data:

• patterns having similar values for all their attributes tend to be transformed into identical patterns, thus similar patterns belonging to different classes become inconsistent;

• a small number of values for each attribute allows the application of the Naïve Bayes classifier [17] to the calculation of the probabilities employed to obtain evidential factors, explained below.

The discretization method employed is called Recursive Minimum Entropy-Based Partitioning.

$$E(A, T_i; S) = \frac{|S_1|}{|S|}\,Ent(S_1) + \frac{|S_2|}{|S|}\,Ent(S_2) \tag{1}$$

where $S_1$ is the subset of $S$ formed by values lower than or equal to $T_i$ and $S_2$ is the subset formed by values greater than $T_i$, $A$ is the attribute, and the entropy $Ent(D)$ of a given set $D$ is defined as follows:

$$Ent(D) = -\sum_{i=1}^{k} p_i \log_2 p_i \tag{2}$$

where $p_i$ is the probability of class $i$. The value $T_{min}$ that minimizes the entropy value (Equation 1) is chosen as the cut-off point. The process is recursively carried out on subsets $S_1$ and $S_2$ until a stop criterion is met. The stop criterion is given by:

$$gain(A, T_i; S) < \frac{\log_2(N - 1)}{N} + \frac{r(A, T_i; S)}{N} \tag{3}$$

where $N$ is the number of examples of $S$. The gain is defined by:

$$gain(A, T_i; S) = Ent(S) - E(A, T_i; S) \tag{4}$$

$r$ is calculated by the following formula:

$$r(A, T_i; S) = \log_2(3^k - 2) - [k\,Ent(S) - k_1\,Ent(S_1) - k_2\,Ent(S_2)] \tag{5}$$

where $k_j$ is the number of classes present in $S_j$. In addition to the already mentioned characteristics of this discretization method, other advantages can be found in relation to other methods. For instance, it is not necessary to provide the desired number of intervals or any other information on the number of intervals. Furthermore, as each sub-interval is considered independently, regions showing great variability tend to be strongly partitioned, while low-entropy regions will have fewer intervals [11].
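The recursive partitioning described by Equations 1 to 5 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, and placing each candidate cut at the midpoint between two consecutive distinct values is an assumption (the text only states that values lower than or equal to the cut go to $S_1$).

```python
import math
from collections import Counter


def entropy(labels):
    """Ent(D) = -sum_i p_i * log2(p_i) over the classes present in D (Equation 2)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(pairs):
    """Return (E, T, left, right) minimising Equation 1, or None if no cut exists.

    `pairs` is a list of (value, class) tuples sorted by value.
    """
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a cut must separate two distinct values
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        e = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if best is None or e < best[0]:
            cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint: an assumption
            best = (e, cut, left, right)
    return best


def cut_points(pairs):
    """Recursively collect the accepted cut points for one numeric attribute."""
    pairs = sorted(pairs)
    found = best_cut(pairs)
    if found is None:
        return []
    e, cut, left, right = found
    labels = [c for _, c in pairs]
    n = len(labels)
    gain = entropy(labels) - e  # Equation 4
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    r = math.log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right)
    )  # Equation 5
    if gain < math.log2(n - 1) / n + r / n:  # Equation 3: stop criterion
        return []
    return cut_points(pairs[: len(left)]) + [cut] + cut_points(pairs[len(left):])


# Hypothetical toy attribute: two well-separated groups of values.
data = [(1, "a"), (2, "a"), (3, "a"), (4, "a"),
        (10, "b"), (11, "b"), (12, "b"), (13, "b")]
print(cut_points(data))
```

On this toy attribute the only accepted cut separates the two groups; inside each pure group the gain is zero, so the stop criterion of Equation 3 ends the recursion without the number of intervals ever being supplied by the user.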

3.2 Obtaining Evidential Factors

Through the discretization process described above, examples are obtained in the following format:

$$class_i(v_1, \ldots, v_n)$$

where each $v_j$ represents one discrete value for attribute $j$. The following interpretation might be considered for the previous format:

$$\text{if } \langle v_1 \wedge \ldots \wedge v_n \rangle \text{ then } class_i$$

where $e = \langle v_1 \wedge \ldots \wedge v_n \rangle$ is a sequence of conjunctive evidences and $class_i$ is the hypothesis $h$. Therefore, the formalism introduced by the Theory of Certainty Factors [5] [13] [15] (Equations 6 and 7) to calculate Belief = $MB[h, e]$ and Disbelief = $MD[h, e]$ can be used to change the example into the following format:

$$class_i(v_1, \ldots, v_n) : [Belief, Disbelief]$$

$$MB[h, e] = \begin{cases} 1 & \text{if } p(h) = 1 \\ \dfrac{\max[p(h \mid e),\, p(h)] - p(h)}{1 - p(h)} & \text{otherwise} \end{cases} \tag{6}$$

$$MD[h, e] = \begin{cases} 1 & \text{if } p(h) = 0 \\ \dfrac{\min[p(h \mid e),\, p(h)] - p(h)}{0 - p(h)} & \text{otherwise} \end{cases} \tag{7}$$

where $p(h \mid e)$ is the probability of class $h$ given the occurrence of evidence $e$, and $p(h)$ is the probability of occurrence of class $h$. When the occurrence of evidence $e$ increases the probability of the occurrence of class $h$, $MB[h, e]$ represents a proportional increase of belief in $h$. On the other hand, when the occurrence of $e$ decreases the probability of $h$, $MD[h, e]$ represents a proportional decrease of belief in $h$.

It must be pointed out that the conditional probability $p(h \mid e)$ used in the evidential factors calculation can be obtained by Bayes' rule [12], and the conditional probability $p(e \mid h)$ from the Naïve Bayes classifier, as shown in [17] [16] (Equation 8). The definition of the Naïve Bayes classifier is given by:

$$p(e \mid h) = \prod_{i=1}^{m} P(v_i \mid h) \tag{8}$$

where $P(v_i \mid h)$ is the conditional probability of value $v_i$ for attribute $i$, given the occurrence of class $h$, and $m$ is the number of attributes of $e$.

An example of how the MB and MD calculations are performed is presented below for already discretized training examples. Table 1 shows a small database $E$ of totally inconsistent examples. Some additional information on the training set is given below:

$d_1$ = {five, six, seven, eight}
$d_2$ = {one, two, three, four}
$d_3$ = {one, two, three}
$|E|$ = 23

where $d_i$ is the domain of attribute $a_i$ and $|E|$ is the number of training examples of set $E$. From the training set $E$ (Table 1), it is possible to calculate the probabilities presented in Table 2. The conditional probability calculation by the Naïve Bayes classifier for the first training example ($e_1$) is presented in Table 3. It is possible to obtain the conditional probability by means of Bayes' rule by using the probabilities in Table 3, as follows:

$$P(class18 \mid e_1) = \frac{0.174 \times 0.25}{0.174 \times 0.25 + 0.217 \times 0.072 + 0.3043 \times 0.0029} \approx 0.7243$$
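The chain from training examples to evidential factors can be illustrated with a short Python sketch. It assumes the training set of Table 1, estimates $p(e \mid h)$ as the product of the per-attribute relative frequencies, applies Bayes' rule over all classes, and then evaluates Equations 6 and 7. The function names are hypothetical, and no smoothing is applied, so an example never seen in any class would make the Bayes denominator zero.

```python
from collections import Counter

# Training set E of Table 1, written as (class, (a1, a2, a3)).
E = [
    ("class18", ("eight", "one", "two")), ("class22", ("seven", "two", "three")),
    ("class28", ("seven", "two", "one")), ("class19", ("seven", "four", "three")),
    ("class22", ("seven", "two", "one")), ("class28", ("eight", "one", "one")),
    ("class34", ("five", "four", "two")), ("class22", ("eight", "one", "three")),
    ("class28", ("seven", "two", "one")), ("class34", ("five", "four", "three")),
    ("class18", ("eight", "one", "one")), ("class19", ("five", "four", "three")),
    ("class28", ("seven", "two", "three")), ("class19", ("eight", "one", "one")),
    ("class22", ("eight", "one", "two")), ("class28", ("seven", "two", "one")),
    ("class18", ("eight", "one", "one")), ("class19", ("five", "four", "three")),
    ("class22", ("eight", "one", "one")), ("class28", ("seven", "two", "one")),
    ("class18", ("eight", "one", "one")), ("class28", ("seven", "two", "two")),
    ("class34", ("six", "three", "two")),
]
CLASS_COUNTS = Counter(cls for cls, _ in E)


def prior(h):
    """p(h): relative frequency of class h in E."""
    return CLASS_COUNTS[h] / len(E)


def likelihood(e, h):
    """p(e|h) as the product of P(v_i|h) over the attributes (Equation 8)."""
    rows = [values for cls, values in E if cls == h]
    p = 1.0
    for i, v in enumerate(e):
        p *= sum(1 for values in rows if values[i] == v) / len(rows)
    return p


def posterior(e, h):
    """p(h|e) by Bayes' rule, normalising over all classes of E."""
    evidence = sum(likelihood(e, c) * prior(c) for c in CLASS_COUNTS)
    return likelihood(e, h) * prior(h) / evidence


def mb(p_he, p_h):
    """Measure of belief, Equation 6."""
    return 1.0 if p_h == 1 else (max(p_he, p_h) - p_h) / (1 - p_h)


def md(p_he, p_h):
    """Measure of disbelief, Equation 7.

    (p(h) - min[...]) / p(h) is algebraically the same as Equation 7.
    """
    return 1.0 if p_h == 0 else (p_h - min(p_he, p_h)) / p_h


e1 = ("eight", "one", "two")
p = posterior(e1, "class18")
print(round(p, 4), round(mb(p, prior("class18")), 4), md(p, prior("class18")))
```

Because the posterior of class18 for $e_1$ is higher than its prior, only the belief factor is non-zero here; the disbelief factor becomes positive only when the evidence lowers the probability of the class.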

Table 1. Training Set E.

e     Example
e1    class18(eight, one, two)
e2    class22(seven, two, three)
e3    class28(seven, two, one)
e4    class19(seven, four, three)
e5    class22(seven, two, one)
e6    class28(eight, one, one)
e7    class34(five, four, two)
e8    class22(eight, one, three)
e9    class28(seven, two, one)
e10   class34(five, four, three)
e11   class18(eight, one, one)
e12   class19(five, four, three)
e13   class28(seven, two, three)
e14   class19(eight, one, one)
e15   class22(eight, one, two)
e16   class28(seven, two, one)
e17   class18(eight, one, one)
e18   class19(five, four, three)
e19   class22(eight, one, one)
e20   class28(seven, two, one)
e21   class18(eight, one, one)
e22   class28(seven, two, two)
e23   class34(six, three, two)

Table 2. Probabilities Calculated from E.

                        Class (C)
Probabilities           18      19      22      28      34
P(a1 = five | C)        0.0     0.5     0.0     0.0     0.67
P(a1 = six | C)         0.0     0.0     0.0     0.0     0.33
P(a1 = seven | C)       0.0     0.25    0.4     0.857   0.0
P(a1 = eight | C)       1.0     0.25    0.6     0.143   0.0
P(a2 = one | C)         1.0     0.25    0.6     0.143   0.0
P(a2 = two | C)         0.0     0.0     0.4     0.857   0.0
P(a2 = three | C)       0.0     0.0     0.0     0.0     0.33
P(a2 = four | C)        0.0     0.75    0.0     0.0     0.67
P(a3 = one | C)         0.75    0.25    0.4     0.714   0.0
P(a3 = two | C)         0.25    0.0     0.2     0.143   0.67
P(a3 = three | C)       0.0     0.75    0.4     0.143   0.33
P(C)                    0.174   0.174   0.217   0.304   0.13

Table 3. Calculation of the Conditional Probability by the Naive Bayes Classifier.

class     P(e1 | class) (Naïve Bayes)
class18   1.0 × 1.0 × 0.25 = 0.25
class19   0.25 × 0.25 × 0.0 = 0.0
class22   0.6 × 0.6 × 0.2 = 0.072
class28   0.143 × 0.143 × 0.143 = 0.0029
class34   0.0 × 0.0 × 0.67 = 0.0

4. Paraconsistent Evidential Logic Programming

Belief and Disbelief evidential factors are associated to each piece of knowledge in an evidential logic system [2]. Both factors belong to the [0,1] interval [26], that is, infinite values can be associated with the system assumptions [1]. Therefore, an infinite lattice $T = \langle |T|, \leq \rangle$ can be defined, where:

$$|T| = \{x \in \Re \mid 0 \leq x \leq 1\} \times \{x \in \Re \mid 0 \leq x \leq 1\}$$

The $T$ lattice (Figure 1) has a maximum point represented by evidence [1,1]. This point represents the maximum inconsistency, since both the truth and the falsity of the assumption are believed at the same point in time. Intuitively, the truth of an assumption is represented by the evidence [1.0,0.0], since the truth is fully believed in and nothing is known about the falsity of the assumption. On the other hand, the falsity is represented by [0.0,1.0], since the falsity is fully believed in and nothing is known about the truth of the assumption.

Figure 1. Infinitely Valued Lattice.

In addition to the inconsistency interpretation, another concept introduced by Paraconsistent Evidential Logic Programming, as opposed to Classical Logic, is the interpretation of the unknown: [0.0,0.0]. In this case, there is no information on either the truth or the falsity of the premise.

It can be seen that the distinction between the falsity and unknown situations can provide important information for the inference and decision-making processes. For instance, let us consider a system evaluating the involvement of a person in a crime. If the answer obtained by the system is no ([0.0,1.0]), it can be concluded that the person under investigation is not involved and is, therefore, innocent. However, a [0.0,0.0] answer may indicate a likely involvement, and more information is needed for another conclusion to be obtained.

In conventional logic systems, e.g. the Prolog language [6] [25], this distinction cannot be directly obtained, since the only two possible interpretations are true or not-true. When a not-true answer is obtained, it is not known whether it is related to the falsity or to the lack of knowledge of the truth-value of the premise (Closed World Assumption) [23].

Contrary to the Theory of Probability and other quantitative reasoning means, the Belief and Disbelief factors are not directly related. For instance, in the Theory of Probability the belief in a given event $A$ is given by $p(A)$ (the probability of occurrence of $A$); thus, the probability of belief in $\neg A$ is given by $1 - p(A)$. This and other relationships are not valid in Evidential Logic Programs (PLEs). Furthermore, the Belief and Disbelief factors can be combined into a single factor, I/U (Inconsistency/Undetermination), by Equation 10.

$$I/U = \frac{d(u, v)}{0.5\sqrt{2}} \times 100 \tag{10}$$

where $d(u, v)$ is the distance from point $(u, v)$ in relation to the straight line presented in Figure 2.
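As a hedged illustration of Equation 10, the Python sketch below computes the I/U factor of an evidence $[u, v]$. It assumes that the straight line of Figure 2 is $x + y - 1 = 0$ and that the distance is normalised by its largest possible value, $0.5\sqrt{2}$, reached at [0,0] and [1,1]; the function names are hypothetical.

```python
import math


def distance_to_line(u, v):
    """Euclidean distance from the point (u, v) to the line x + y - 1 = 0."""
    return abs(u + v - 1) / math.sqrt(2)


def iu_factor(u, v):
    """I/U factor of the evidence [u, v] as a percentage (Equation 10)."""
    return distance_to_line(u, v) / (0.5 * math.sqrt(2)) * 100


for evidence in [(1.0, 0.0), (0.0, 1.0), (0.0, 0.0), (1.0, 1.0), (0.62, 0.12)]:
    print(evidence, round(iu_factor(*evidence), 2))
```

Under these assumptions the factor reduces to $|u + v - 1| \times 100$, so it is zero for the fully true and fully false evidences and maximal for the unknown and the totally inconsistent ones.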

Figure 2. Lattice in the Cartesian Plane.

It can be said that straight line $r$ is given by the equation $x + y - 1 = 0$. Therefore, an evidential clause $p : [u, v]$ is said to be perfectly defined if $u + v = 1$. Similarly, if $u + v < 1$, that is, if $(u, v)$ is a point below the straight line $r$, clause $p : [u, v]$ is called underdeterminate. On the other hand, if $u + v > 1$, $p : [u, v]$ is called overdeterminate or inconsistent.

5. Paraconsistent Evidential Logic Programming in Decision Tree

As defined in [21] [14], a decision tree (DT) is formed by a set $N$ of nodes and a set of arcs linking those nodes. A node $N_i \in N$ represents a subset $E_{N_i}$ of the total training example or pattern set $E$. If $N_i$ is the root, then $E_{N_i} = E$.

The basic Decision Tree building algorithm is simple. Considering a set $E = \{e_1, \ldots, e_n\}$ of training examples and a set $C = \{c_1, \ldots, c_m\}$ of mutually exclusive classes that could be associated with one example $e_i \in E$, the basic DT algorithm is described below:

1. If $E$ is a non-empty set of examples pertaining to the same class $c_j$, then form a leaf from $E$ and $c_j$ and end;

2. If $E$ is an empty set (without examples), then form a leaf and choose, according to some criterion, class $c_j$ for the leaf. The C4.5 algorithm [22] uses the most frequent class existing in the parent-node;

3. Otherwise, choose an attribute $a_i$ among $a_1, \ldots, a_r$, partition $E$ into subsets $E_1, \ldots, E_k$ according to the $k$ values of $a_i$, and apply the algorithm recursively to each subset $E_j$.

The tree can be queried in two situations:

1. informing only the values of the predicting attributes;

2. supplying as input the values of all attributes, including the value of the class: in this case, the answer provided will be the evidential factor calculated from the supreme [1] of the respective leaf-node.

In the first situation, an important question must be solved: which is the criterion to choose a given class in a leaf? Example 5.1 is provided to justify the criterion chosen.

Example 5.1

Let $Q$ be a query with the following meaning: what is the class of test example $e$? It is assumed that the traversal in the DT led the inference mechanism to leaf node $N_F$, made up of subset $E_{N_F} \subseteq E$ (Table 4), where $E$ is the set of all training examples.

Table 4.

class1(e) : [0.50, 0.12]
class1(e) : [0.62, 0.07]
class2(e) : [0.02, 0.2…

Obviously, when an $N_F$ leaf is formed by a set $E_{N_F} \subseteq E$ of examples having just a certain class $C$, the answer to query $Q$ is given by:

$$Q = C : \sup\{E_{N_F}\}$$

where $\sup\{E\}$ is given by

$$\sup\{[B_1, D_1], \ldots, [B_n, D_n]\} = [\max(\{B_1, \ldots, B_n\}),\ \max(\{D_1, \ldots, D_n\})]$$

according to [26] [7] [1] [2]. However, in Example 5.1 leaf node $N_F$ presents examples of both class 1 ($C_1$) and class 2 ($C_2$). Here, the supreme of the set of examples belonging to each class $C_i$ ($E^{C_i}_{N_F}$) must be calculated, denoted $\sup\{E^{C_i}_{N_F}\}$. The supremes of $C_1$ and $C_2$ are given by:

$$C_1 : \sup\{E^{C_1}_{N_F}\} = [0.62, 0.12]$$

$$C_2 : \sup\{E^{C_2}_{N_F}\} = [0.18, 0.\ldots$$

Intuitively, it would be possible to obtain their respective factors of inconsistency/underdetermination (I/U) [1] from those supremes and choose, among the classes, the one with the smallest factor. In Example 5.1, this means choosing class 2, since …

… 0.0], ev = [0.0, 0.8] …
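The supreme used in Example 5.1 is the componentwise maximum of the evidences, which the following Python sketch makes concrete. The two class1 rows are the ones that survive in Table 4; the class2 row is a hypothetical value added only so that the leaf holds two classes, and the function names are likewise hypothetical.

```python
from collections import defaultdict


def supreme(evidences):
    """sup{[B1, D1], ..., [Bn, Dn]} = [max(B1..Bn), max(D1..Dn)]."""
    beliefs, disbeliefs = zip(*evidences)
    return (max(beliefs), max(disbeliefs))


def supremes_by_class(leaf):
    """Group the examples of a leaf by class and take the supreme of each group."""
    groups = defaultdict(list)
    for cls, evidence in leaf:
        groups[cls].append(evidence)
    return {cls: supreme(evs) for cls, evs in groups.items()}


leaf = [
    ("class1", (0.50, 0.12)),  # Table 4
    ("class1", (0.62, 0.07)),  # Table 4
    ("class2", (0.10, 0.30)),  # hypothetical row, not taken from the source
]
result = supremes_by_class(leaf)
print(result)
```

For class1 the sketch reproduces the supreme [0.62, 0.12] quoted in the example. Note that the belief and the disbelief of a supreme may come from different examples.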

6. Generation of Classifiers

It is possible to generate databases with the same format as Example 5.1 (Table 4) after employing the transformations described above (see Section 3). Therefore, one can generate a classifier based on a Paraconsistent Decision Tree representing the model for such databases.

As can be seen in Example 5.1, leaf nodes may contain examples of different classes. Therefore, a change is needed in relation to the original DT algorithm. The change consists in introducing a new stop criterion that implements a pre-pruning mechanism, described below. Given a parameter $F$ supplied as an input to the system, the new stop criterion is:

• If $E$ is a non-empty set of examples and there is a subset $E_i \subseteq E$ of examples belonging to the same class, so that $\frac{|E_i|}{|E|} \geq F$, then form a leaf and finish;

• otherwise continue generating the tree.

As shown, $F$ belongs to the interval [0, 1] and controls the degree of generalization of the tree. The generated tree tends to be more specific, and therefore more complex, as the value of $F$ increases. On the other hand, the tree tends to be more generic and simpler as the value of $F$ decreases. This behavior can be observed in Figures 3 and 4. The trees depicted were produced from the same database, just by altering the pruning factor. The trees in Figures 3 and 4 were generated with pruning factors of 0.8 and 0.85, respectively. The database used as an example is called IRIS, a widely known database in the Machine Learning community. That database presents 150 examples made up of 4 numerically valued attributes and three classes.

    ?- idt(0.8,'/fabricio/databases/iris/iris.tre').
    yes
    ?- show_decision_tree.
    a4=four
      ==> class=three:[1.0,0.0]:0.0
      ==> class=two:[0.0,0.8852]:0.1148
    a4=three
      ==> class=two:[1.0,0.0]:0.0
      ==> class=three:[1.0,0.

• the second reason for the pre-pruning choice is the greater efficiency obtained with the algorithm as opposed to post-pruning [22].

In those trials no comparative tests were performed with other pruning criteria, and therefore it cannot be guaranteed that the trees obtained are the best. Another important consideration is related to the criterion to choose the attribute. The criterion employed by the system is the same shown in [22]: gain ratio.
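The pre-pruning stop criterion of this section amounts to a single test on the class distribution of a node. A minimal Python sketch follows, assuming a node is represented simply by the list of class labels of its examples; `should_stop` is a hypothetical name.

```python
from collections import Counter


def should_stop(classes, f):
    """Return True when some class covers at least a fraction f of the node.

    `classes` holds the class label of every example in the node; an empty
    node is left to the empty-set rule of the basic algorithm.
    """
    if not classes:
        return False
    _, count = Counter(classes).most_common(1)[0]
    return count / len(classes) >= f


node = ["two"] * 8 + ["three"] * 2  # hypothetical node: 80% of one class
print(should_stop(node, 0.8), should_stop(node, 0.85))
```

With the hypothetical node above, a pruning factor of 0.8 turns the node into a leaf, whereas 0.85 keeps splitting it, which is why larger values of $F$ lead to more specific trees.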

7. Test Methodology

A cross-validation procedure with 10 iterations per database was employed. It was arbitrarily stipulated that 70% of the examples would be used for training and 30% for tests. The bases employed in the trials were the following: Myoeletric, Lenses, Hayes, Zoo, Iris, Sport, Wine, Glass, Flag, Adult, Adult1. A detailed description of the databases employed in the trials can be found at http://www.ics.uci.edu/~mlearn/MLSummary.html, except for the Sport database, shown in [22] [17]. For each database several classifiers (PDTs) were obtained, since several pruning factors were tested. From that set of classifiers, the one with the best classification rate was chosen. Furthermore, an additional parameter, $I$, was employed, representing the percentage of inconsistent examples that must be automatically generated before training.

8. Preliminary Results

Figures 5 and 6 present the average classification error achieved and the average size of the trees generated, respectively, by algorithms C4.5 and PDT, as the inconsistency factor is increased. The results presented in Figures 5 and 6 were obtained from the average of all databases using the following procedure:

• For each inconsistency percentage do
  • Error = the sum of the classification error of all databases;
  • Size = the sum of the number of nodes of all trees obtained (one for each database);
  • Average_Error = Error / N
  • Average_Size = Size / N

where N is the number of databases employed.

As to the classification results obtained, it can be seen that for bases presenting few inconsistent examples the behavior of the PDT system is very similar to C4.5. It can be observed in Figure 5 that for inconsistency rates varying between 0.0 and 0.3, the system presents results close to C4.5. However, from inconsistency factor 0.4 on, the results show a clear improvement in the classification error rate, suggesting that the system is more stable when applied to databases with higher inconsistency rates.

Figure 5. Average Error Classification.

The results displayed in Figure 6 show that for databases with small inconsistency rates the trees generated tend to be more complex than those obtained with algorithm C4.5. It is believed that the excessive generation of intervals for certain attributes, caused by discretization, is the main factor responsible for the complex decision trees obtained. Furthermore, the pruning criterion can also contribute to the excessive growth of trees. However, after factor 0.5 it is observed that the system tends to produce smaller trees.

Figure 6. Average Size Trees.

Figures 7 and 8 present the results obtained by systems C4.5 and PDT when applied to 100% inconsistent databases. The databases were submitted to the automatic inconsistent example generation procedure. It can be seen from Figure 7 that the system developed presented better classification results than the C4.5 system for all databases when those bases showed 100% inconsistency. As to the tree size, Figure 8 shows that the trees generated for 100% inconsistent databases are also smaller for most databases.

Figure 7. Error of Classification Obtained of 100% Inconsistent Databases.

Figure 8. Size of the Trees Obtained of 100% Inconsistent Databases.

1 It is important to stress that, in this paper, a database is considered inconsistent when it presents examples having identical predicting attribute values but differing in the value of the class attribute. Therefore, a database $E$ is considered 100% inconsistent when for every example $e_i \in E$ an example $e_j \in E$ exists such that $e_i$ is inconsistent in relation to $e_j$.
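The definition of an inconsistent database used in this section can be checked mechanically. The Python sketch below measures the fraction of examples that share all predicting attribute values with at least one example of a different class. It is an illustration only: the function name and the toy data are hypothetical and do not come from the trials.

```python
from collections import defaultdict


def inconsistency_rate(examples):
    """Fraction of examples clashing with an example of another class.

    `examples` is a list of (class, attribute_values) pairs, where
    attribute_values is a tuple of discretized predicting attributes.
    """
    classes_by_values = defaultdict(set)
    for cls, values in examples:
        classes_by_values[values].add(cls)
    clashing = sum(
        1 for _, values in examples if len(classes_by_values[values]) > 1
    )
    return clashing / len(examples)


# Hypothetical toy database: the first two examples contradict each other.
toy = [
    ("c1", ("x", "y")), ("c2", ("x", "y")),
    ("c1", ("x", "z")), ("c1", ("x", "z")),
]
print(inconsistency_rate(toy))
```

A database is 100% inconsistent in the sense of the footnote when this rate equals 1.0.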

[6] Casanova, M.A.; Giorno, F.A.C.; Furtado, A. L. F.,