An Efficient Privacy Preserving Search Scheme with ... - Science Direct

Available online at www.sciencedirect.com

ScienceDirect Procedia Technology 25 (2016) 310 – 317

Global Colloquium in Recent Advancement and Effectual Researches in Engineering, Science and Technology (RAEREST 2016)

An efficient privacy preserving search scheme with access control for cloud data centers

Tresa Mary George V, Shamna S, Jubilant J Kizhakkethottam

Department of Computer Science and Engineering, Musaliar College of Engineering and Technology, Pathanamthitta 689653, India

Abstract

The internet and the emergence of social networks produce terabytes of data every day. In this big data scenario, the ability to outsource the data to a cloud storage facility saves the data management and storage facility cost. Some major challenges with this scheme are providing security and ensuring the privacy of the outsourced data. Although data security can be achieved through encryption, searching on encrypted data becomes a complex task. The proposed work suggests an efficient searching scheme for encrypted cloud data based on hierarchical clustering of documents. The hierarchical clustering method preserves the semantic relationship between the documents in the encrypted domain to speed up the search process. Consequently, the proposed system has linear computational complexity during the search phase in response to an exponential increase in the number of documents. The system also ensures data privacy by providing only limited access to the documents to the different types of users by implementing access control mechanisms, resulting in more secure data storage in the cloud.

© 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the organizing committee of RAEREST 2016

Keywords: searchable encryption; multi-keyword search; hierarchical clustering; access control

1. Introduction

A fundamental application of cloud computing is the ability to outsource remote data to external cloud servers to enable scalable data storage. The cloud server can provide a huge storage space and high computational power [].



Corresponding author. E-mail address: vtresamg@gmail.com

2212-0173. doi:10.1016/j.protcy.2016.08.112

V. Tresa Mary George et al. / Procedia Technology 25 (2016) 310 – 317


Accordingly, enterprises and users who own a large amount of data can overcome their hardware limitations. As this technique is becoming more and more popular, the data volume in cloud storage facilities is experiencing a dramatic growth.

A major concern regarding the use of cloud computing for data storage is that the outsourced data may contain sensitive information such as photos, emails, bank statements, etc. If the data is stored in a public cloud which is accessible to several other people without an efficient protection mechanism, it can lead to severe privacy and confidentiality violations []. The traditional way to protect sensitive data is encryption. The documents are encrypted before outsourcing them to the cloud. This, however, introduces further complexities during the search operation on encrypted data, when legitimate users need access to those documents. Many researchers have investigated this issue in recent days and proposed several ciphertext search schemes based on cryptography techniques [] []. However, these methods need extensive computations and suffer from high time complexity. Hence these methods are not suitable for a big data environment []. Another major drawback is that the relationship between the documents is concealed during the encryption process. Maintaining such a relationship is important as it represents the properties of the documents.

It is also necessary to provide controlled access to the outsourced cloud data to different classes of users. The system must prevent unauthorized users from uploading corrupted documents to the cloud server. For example, consider a university cloud in which the student mark lists are stored. In such a scenario, the students must be prevented from uploading their own mark lists and thereby overwriting the original copy. To prevent this, the system will provide only download privileges to the student users of the cloud. Proper implementation of access control mechanisms will ensure such limited access for the different classes of cloud users.

The proposed system uses a searching scheme based on multi-keyword ranked search. In addition, a hierarchical clustering method is used to cluster the documents based on a relevance score. There is also a limit on the maximum size of each cluster. If the size of a cluster exceeds this limit, the cluster is further divided into sub-clusters until the size of each cluster falls below the threshold value. During the search phase, the system iteratively determines the most relevant cluster. Only the documents in that cluster need to be searched, which reduces the overall search time.

2. Related works

Many researchers have proposed methods for search on encrypted data in the cloud. Some of them and their drawbacks are discussed below.

2.1. Searchable encryption based on single keyword

In the method proposed by Song et al. [], each word in the document is encrypted independently. This requires scanning of the entire data collection word by word. The major drawback of this method is the high search cost resulting from the scanning of the entire document. Cash et al. [] proposed a symmetric searchable encryption scheme. Though it provides high efficiency for large databases, it lacks a rank mechanism. If a large number of documents contain the searched keyword, the user has to manually select what they actually want, which in turn increases the overall search time.

2.2. Searchable encryption based on multiple keywords

Cao et al. [] proposed an architecture which performs multi-keyword search and also supports result ranking by using the k-nearest neighbor algorithm. However, the search time of this method grows exponentially in response to an exponentially increasing size of the document collections. Sun et al. [] proposed a new architecture. Though it provides better efficiency, the relevance between the documents is ignored and hence it does not return the most relevant results.


2.3. Boolean Symmetric Searchable Encryption

Tarik Moataz and Abdullatif Shikfa [] proposed a system for searching multiple keywords over encrypted data using Boolean Symmetric Searchable Encryption (BSSE). It uses the Gram-Schmidt process to optimize the search process. It considers arbitrary boolean expressions such as conjunctions and disjunctions of keywords and their complement on keywords.

2.4. Fuzzy Keyword Search

The above-mentioned searching schemes retrieve files only on an exact match of the keyword. Any typos and inconsistencies in the format will not return the required documents. J. Li et al. [] proposed a wildcard-based technique to create efficient fuzzy keyword sets that can be used for matching relevant documents. Whenever the exact match search fails, the search result is provided based on the fuzzy keyword data set.

3. System model and problem formulation

The proposed system uses a vector space model in which every document is represented by a vector. Every document can be seen as a point in a high dimensional space. The documents are classified into categories by using a clustering method. The proposed system uses a hierarchical clustering index, i.e., a hierarchy of clusters at different levels. Each cluster has a constraint on the minimum relevance score between the documents in that cluster. When a new document is added to the cluster, the constraint may get broken. In such a case, a new cluster center will be added to the system. After that, all the cluster centers will be reselected and all the documents will be reassigned. The maximum size of the cluster is also fixed for each level. If the size of a cluster exceeds the maximum limit, that cluster will be divided into multiple sub-clusters. When a search is being performed, only the documents in the relevant clusters need to be searched, which reduces the overall search time.

During the search phase, the relevance score between the search query and the cluster centers of the first level index is computed. The cluster center with the maximum relevance score will be selected, and this process will be iteratively repeated for the children in the next level clusters until the smallest cluster in the lowest level is found. If this cluster does not contain the desired document, the system will trace back to the parent of the smallest cluster. This process is repeated until the desired document is found or the root cluster is reached.

3.1. System architecture

The system architecture is composed of mainly four entities, as shown in Fig. 1. They are the data owner, the data user, the cloud server and the cloud manager. The data owner is the module responsible for collecting documents, performing the encapsulation, building the document index and outsourcing the encrypted documents to the cloud server. The data user is the consumer of the documents and must have the necessary authorization before accessing this data. The cloud server is the entity which provides a huge storage space and the necessary computational resources for the ciphertext search. The cloud manager is responsible for ensuring access control. It blocks all unauthorized requests for the data by checking the privacy settings of each user. When the cloud server receives a request for a document, this request is verified by the cloud manager. Upon successful verification, the cloud server returns the required documents.
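The request verification performed by the cloud manager can be illustrated with a minimal sketch. The role names, the permission table and the `CloudManager` and `CloudServer` classes are hypothetical, chosen to mirror the university example from the introduction; the paper does not specify how the privacy settings are stored or checked.

```python
from dataclasses import dataclass, field

# Hypothetical privacy settings: the actions each class of user may perform.
PERMISSIONS = {
    "university": {"upload", "download"},
    "college": {"upload", "download"},
    "student": {"download"},  # students may only download
}


@dataclass
class CloudManager:
    """Checks every request before the cloud server acts on it."""
    permissions: dict = field(default_factory=lambda: dict(PERMISSIONS))

    def is_authorized(self, role: str, action: str) -> bool:
        # Unknown roles have no privileges at all.
        return action in self.permissions.get(role, set())


@dataclass
class CloudServer:
    """Stores ciphertexts and defers every access decision to the manager."""
    manager: CloudManager
    store: dict = field(default_factory=dict)  # document id -> ciphertext

    def upload(self, role: str, doc_id: str, ciphertext: bytes) -> bool:
        if not self.manager.is_authorized(role, "upload"):
            return False  # request blocked by the cloud manager
        self.store[doc_id] = ciphertext
        return True

    def download(self, role: str, doc_id: str):
        if not self.manager.is_authorized(role, "download"):
            return None  # request blocked by the cloud manager
        return self.store.get(doc_id)
```

With this table, a student request to upload a mark list is rejected, while a download request for the same document is served.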


Fig. 1. System architecture
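The search phase described in Section 3 (descend to the most relevant cluster, then trace back to the parent when the document is not found) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the vectors are plaintext rather than encrypted, the `Cluster` class and the `min_score` acceptance threshold are hypothetical, and the example tree is invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def relevance(a: List[int], b: List[int]) -> int:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


@dataclass
class Cluster:
    center: List[int]
    children: List["Cluster"] = field(default_factory=list)
    documents: Dict[str, List[int]] = field(default_factory=dict)


def search(cluster: Cluster, query: List[int], min_score: int) -> Optional[str]:
    """Best-first descent; falls back to the next-best sibling on failure."""
    if not cluster.children:
        # Lowest level: pick the best-scoring document in this cluster.
        best = max(
            cluster.documents.items(),
            key=lambda item: relevance(query, item[1]),
            default=None,
        )
        if best is not None and relevance(query, best[1]) >= min_score:
            return best[0]
        return None  # not found here; the caller traces back to the parent
    ranked = sorted(
        cluster.children,
        key=lambda child: relevance(query, child.center),
        reverse=True,
    )
    for child in ranked:
        found = search(child, query, min_score)
        if found is not None:
            return found
    return None


# Invented two-level example over a four-keyword dictionary.
a = Cluster(center=[1, 1, 0, 0],
            documents={"d1": [1, 0, 0, 0], "d2": [0, 1, 0, 0]})
b = Cluster(center=[0, 0, 1, 0],
            documents={"d3": [1, 0, 1, 0], "d4": [0, 0, 1, 0]})
root = Cluster(center=[0, 0, 0, 0], children=[a, b])

# Cluster a scores highest but holds no document scoring 2, so the search
# traces back to the root and finds d3 in cluster b.
result = search(root, [1, 1, 1, 0], min_score=2)  # "d3"
```

The recursion makes the traceback implicit: returning `None` from a child hands control back to its parent, which then tries the sibling with the next highest score.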

4. Implementation details

4.1. MRSE-HCI architecture

The proposed system uses the Multi-keyword Ranked Search over Encrypted data based on Hierarchical Clustering Index (MRSE-HCI) scheme, in which the vector space model is adopted from the Multi-keyword Ranked Search over Encrypted data (MRSE) [] and the indexing is based on Hierarchical Indexing Structure (HCI) []. The detailed description is as follows. Every document is indexed by a vector and each dimension of the vector refers to a keyword. The value of each dimension indicates whether the keyword appears in the particular document. The query is also represented in a similar way as a vector. The lengths of the document vectors are normalized and hence the distance of points in the n-dimensional space reflects the relevance of the corresponding documents. During the search phase, the cloud server component computes the relevance score between the query vector and the document vectors by computing their inner product. When the documents are stored in the cloud in an encrypted form, the semantic relationship between the documents will be lost. However, the proposed system uses a clustering method. In the n-dimensional space, the points of highly relevant documents are very close to each other, so the semantic relationship between the documents is preserved.

When the volume of data in the cloud experiences a dramatic growth, the traditional search approaches become very inefficient and their search time grows exponentially. To improve the search efficiency, a hierarchical clustering method is used. The hierarchical approach clusters the documents based on the relevance score at different levels. When the size of a cluster reaches the maximum cluster size threshold, the system partitions the cluster into sub-clusters until the criterion is satisfied. When the documents are being uploaded, the data owner also builds an encrypted index. A symmetric key encryption algorithm is used and the documents are encrypted using some random numbers and a secret key. When the data user needs a particular document, a query is submitted to the cloud server. The cloud server will return the target document to the data user.

The functions of the different components are described below.
Keygen: This function will generate the secret key $sk$ used to encrypt the index and the documents. For this, an $(n + v + 1)$-bit vector in which each element is an integer 0 or 1, and two invertible $(n + v + 1) \times (n + v + 1)$ matrices M1 and M2 whose elements are random integers, are generated.

Index: This phase generates the encrypted index by using the secret key generated above. The clustering process also takes place in this phase. The index algorithm is as follows:

(1) A tokenizer and a parser are used to extract all the keywords present in the document.
(2) The documents are transformed into a collection of Document Vectors (DV).
(3) A Quality Hierarchical Clustering (QHC) method is used to generate the information about Documents Classification (DC) and the collection of Cluster Centers Vectors (CCV) $\{c_1, \ldots, c_n\}$.
(4) The data owner performs the dimension-expanding and vector-splitting procedure on every document vector.
(a) During the dimension-expanding procedure, each vector in CCV is extended to an $(n + v + 1)$-bit-long vector, where the value in the $n + j$ $(0 \le j \le v)$ dimension is an integer number generated randomly and the last dimension is set to 1.
(b) During the vector-splitting procedure, every extended document vector is split into two $(n + v + 1)$-bit-long vectors, $V'$ and $V''$, using the $(n + v + 1)$-bit vector generated above as a splitting indicator.

Encryption: The plain document set D is encrypted using any secure symmetric encryption algorithm such as AES. The encrypted document is then outsourced to the cloud.

Trapdoor: When a user submits a query, the cloud manager will analyse the query and verify that the request comes from an authenticated user. The keywords in the query are analyzed with the help of the dictionary DW and a query vector QV is generated, which is then extended to an $(n + v + 1)$-bit vector.

Search: When the cloud server receives the query vector, the relevance scores between the query vector and the index vectors of the clusters are computed in a hierarchical manner. It finally chooses the cluster with the maximum relevance score as the target cluster and searches for the required document. If the document is not found, it backtracks and chooses a different cluster with the next highest score. This process is repeated until the target document is found.

Decryption: This component is used by the data user to decrypt the returned document. The secret key is shared with the user through a secure mechanism.

4.2. Relevance measure

In the proposed system, the concept of coordinate matching is used as a relevance measure. The relevance score between document $d_i$ and query $q_w$ is determined as described in Equation (1).

$$R_{qd_i} = \sum_{t=1}^{n+v+1} \left( q_{w,t} \times d_{i,t} \right) \qquad (1)$$

The relevance score between query $q_w$ and cluster center $lc_{i,j}$ is determined as described in Equation (2).

$$R_{qc_i} = \sum_{t=1}^{n+v+1} \left( q_{w,t} \times lc_{i,j,t} \right) \qquad (2)$$

The relevance score between documents $d_i$ and $d_j$ is determined as described in Equation (3).

$$R_{dd_i} = \sum_{t=1}^{n+v+1} \left( d_{i,t} \times d_{j,t} \right) \qquad (3)$$
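All three relevance scores above are the same inner product applied to different pairs of vectors. The following minimal sketch shows coordinate matching on plain binary keyword vectors. The helper names `to_vector` and `relevance` and the sample dictionary are hypothetical, and the randomly filled extra dimensions and the encryption of the vectors are deliberately left out.

```python
from typing import List, Sequence, Set


def to_vector(keywords: Set[str], dictionary: Sequence[str]) -> List[int]:
    """Binary vector: 1 where the dictionary term occurs in `keywords`."""
    return [1 if term in keywords else 0 for term in dictionary]


def relevance(a: Sequence[int], b: Sequence[int]) -> int:
    """Coordinate matching: inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


# Hypothetical five-keyword dictionary and sample keyword sets.
dictionary = ["cloud", "index", "mark", "privacy", "search"]
document = to_vector({"cloud", "privacy", "search"}, dictionary)  # [1, 0, 0, 1, 1]
query = to_vector({"mark", "privacy", "search"}, dictionary)      # [0, 0, 1, 1, 1]

score = relevance(query, document)  # 2: "privacy" and "search" match
```

With binary vectors the score is simply the number of keywords shared by the two vectors, which is why the same function serves for query-document, query-center and document-document comparisons.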

4.3. Quality Hierarchical Clustering Algorithm

Some of the most widely used and popular clustering algorithms are K-means and K-medoids. In these algorithms, the value of k is fixed in advance. However, in a big data scenario, it is impossible to predict the value of k early. The clusters have to be generated dynamically. Hence a dynamic K-means algorithm is used. To keep the clusters dense and compact, a minimum relevance threshold value is maintained. While performing the clustering process, the relevance score between each document and its cluster center is computed, and if this value is less than the minimum threshold value, a new cluster is added and all the documents are reassigned accordingly. This procedure is executed iteratively until a stable value of k is reached.

4.4. Search Algorithm

To search for a particular document, the cloud server first needs to find the cluster that best matches the query. The cloud server uses the cluster index $I_c$ and an iterative procedure, as described below, to find the top matched cluster.

(1) The cloud server first computes the relevance score between query $T_w$ and the encrypted vectors of the first level cluster centers in cluster index $I_c$, as described in Equation (2). It then chooses the i-th cluster center $I_{c,1,i}$ with the highest score.
(2) For the child cluster centers of the cluster center selected above, the cloud server computes the relevance score between $T_w$ and every encrypted vector of the child cluster centers, and finally gets the cluster center $I_{c,2,i}$ with the top score.
(3) The above procedure is iterated until the ultimate cluster center $I_{c,l,i}$ in the last level $l$ is reached.

5. Results and analysis

5.1. Search Efficiency

The efficiency of the system was tested with a two-level clustering model. The number of operations needed for the entire search process can be computed as described in Equation (4). To increase the search efficiency, the system uses a static dictionary of keywords that do not contribute effectively to the search process. Terms like 'for', 'and', etc. in the search query will be removed and a modified query vector will be constructed. The subsequent comparisons are made only with the modified query vector. Let x denote the size of the static dictionary, w denote the number of query keywords, u denote the number of keywords in the modified query vector, n denote the total number of documents in the documents collection, k denote the number of categories in the first level cluster and t denote the average number of documents in the subsequent cluster.

$$\mathrm{Operations}(\text{Search process}) = w \times x + (w - u)\,k + (w - u - 1)\,t \qquad (4)$$

The number of operations required by a system without any clustering technique is described in Equation (5).

$$\mathrm{Operations}(\text{existing system}) = w \times x + (w - u)\,n \qquad (5)$$
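To make the difference between the two operation counts above concrete, the sketch below evaluates both expressions exactly as they are stated. The function names and all input values are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
def ops_with_clustering(x: int, w: int, u: int, k: int, t: int) -> int:
    """Operation count for the two-level clustered search, as stated above."""
    return w * x + (w - u) * k + (w - u - 1) * t


def ops_without_clustering(x: int, w: int, u: int, n: int) -> int:
    """Operation count for a search over the whole collection, as stated above."""
    return w * x + (w - u) * n


# Hypothetical values: dictionary of 50 terms, 6 query keywords, u = 2,
# 5000 documents, 20 first-level clusters, 250 documents per cluster.
clustered = ops_with_clustering(x=50, w=6, u=2, k=20, t=250)  # 1130
flat = ops_without_clustering(x=50, w=6, u=2, n=5000)         # 20300
```

The dictionary term is identical in both expressions, so the gap comes entirely from replacing the factor n by terms in k and t, both of which are much smaller than n when the collection is split into many clusters.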



 

During the search step, the existing system compares the query vector with the entire documents collection, whereas the proposed system compares it only with the relevant cluster, leading to a significant reduction in search time.

5.2. Performance analysis

To test the performance of the proposed system, an experimental setup was built as follows. An application simulating the activities of a university was created. The cloud storage platform for the system was provided by the Google public cloud. The data owners of the system are:

(1) The university, which owns the mark lists and certificates of all the passed-out and presently studying students.
(2) The college, which uploads the sessional marks and other student-specific documents of all the students.

The dataset for the performance analysis was built from the above-mentioned types of documents. The system was tested with a linear increase in the number of documents and the corresponding search times were estimated. It is evident from Fig. 2 that the proposed system outperforms the existing system without clustering. The system was also tested with an exponential growth in the number of documents. Fig. 3 shows that the proposed system with clustering has a linear growth in search time, while the system without clustering has an exponential growth in search time.

Fig. 2. Comparison of search time with a linear growth in documents collection (x-axis: number of documents (x100); y-axis: search time; series: without clustering, with hierarchical clustering).

Fig. 3. Comparison of search time with an exponential growth in documents collection (x-axis: number of documents; y-axis: search time; series: without clustering, with clustering).

5.3. Security analysis

A dedicated module called the cloud manager is added to the proposed system to verify the authenticity of the arriving requests. To ensure the confidentiality and privacy of the documents stored in the cloud server, all the documents are encrypted using a symmetric encryption algorithm before they are uploaded to the cloud. In addition, the cloud storage provider also performs a two-level encryption on the documents and returns a public key to the cloud manager. All the keys are managed by the cloud manager and only people with sufficient access rights can decrypt the document. Consequently, the system ensures that even if an intruder accesses the document directly from the cloud server, they cannot get the plain text of the documents.

6. Conclusion and future work

The problem of searching and securely accessing the encrypted data in the cloud is analyzed. It is understood that maintaining the semantic relationship between the documents reduces the search time for a document. The proposed work is based on multi-keyword ranked search over encrypted data. The use of a hierarchical clustering method to cluster the documents preserves the semantic relationship between the documents. The experimental results show that the proposed system has a linear growth in time complexity when the size of the documents collection increases exponentially. It also implements a dedicated module named cloud manager to ensure the privacy of cloud data by granting only limited access to the documents collection to different classes of users. As future work, more secure algorithms can be developed for improving the privacy of the uploaded documents. More secure access control schemes such as Dynamic Information Flow Tracking (DIFT) techniques [], with capabilities to recognize advanced vulnerabilities, can also boost the overall performance of the system.

References

[] Xian C, Lu …
[] Li H, Dai …
[] Cash D, Jaeger J, Jarecki S, Jutla C, Krawczyk H, Rosu MC, Steiner M. (October). Dynamic searchable encryption in very large databases: Data structures and implementation. In Network and Distributed System Security Symposium (NDSS).
[] Cao N, Wang C, Li M, Ren K, Lou W. Privacy-preserving multi-keyword ranked search over encrypted cloud data. Parallel and Distributed Systems, IEEE Transactions on.
[] Sun W, Wang B, Cao N, Li M, Lou W, Hou …
[] Chen C, Zhu X, Shen P, Hu J, Guo S, Tari Z, Zomaya A. An Efficient Privacy-Preserving Ranked Keyword Search Method.
[] Dalton M, Kozyrakis C, Zeldovich N. (August). Nemesis: Preventing Authentication & Access Control Vulnerabilities in Web Applications. In USENIX Security Symposium.