Distributed XML Stream Filtering System with High Scalability

Hiroyuki Uchiyama, Makoto Onizuka, Takashi Honishi
NTT Cyber Space Laboratories, NTT Corporation
1-1 Hikari-no-oka, Yokosuka-Shi, Kanagawa, 239-0847 Japan
{uchiyama.hiroyuki, onizuka.makoto, honishi.takashi}@lab.ntt.co.jp

Abstract

We propose a distributed XML stream filtering system that uses a large number of subscribers’ profiles, written as XPath expressions, to filter XML streams and then publish the filtered data in real time. To realize the proposed system, we define features of XPath expressions on XML data and use them to forecast the servers’ loads. Our method combines two techniques: sharing the total transfer load among the filtering servers, and equalizing the sum of overlap sizes between filtering servers. Experiments show that the rate at which publishing time increases with the number of XPath expressions is three times smaller in the proposed system than with the round-robin method. Furthermore, the overhead of the proposed method is quite low.
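The abstract compares the proposed assignment with round-robin. The following Python sketch illustrates why load-aware placement can help; it is not the paper’s algorithm. It assumes each XPE’s transfer load has already been forecast as a single number, uses hypothetical loads rather than experimental data, treats the largest per-server total as a rough proxy for publishing time, and omits the overlap-size equalization entirely. The function names `round_robin` and `least_loaded` are illustrative.

```python
from itertools import cycle


def round_robin(loads: list[float], n_servers: int) -> list[float]:
    """Baseline: the i-th XPE goes to server i mod n_servers, ignoring its load."""
    totals = [0.0] * n_servers
    for load, fs in zip(loads, cycle(range(n_servers))):
        totals[fs] += load
    return totals


def least_loaded(loads: list[float], n_servers: int) -> list[float]:
    """Load-aware sketch: each new XPE, taken in registration order, goes to
    the server with the smallest forecast total so far (ties -> lowest index)."""
    totals = [0.0] * n_servers
    for load in loads:
        fs = min(range(n_servers), key=totals.__getitem__)
        totals[fs] += load
    return totals


# Hypothetical skewed forecast loads: heavy and light XPEs alternate.
loads = [9.0, 1.0, 9.0, 1.0, 9.0, 1.0]
print(round_robin(loads, 2))   # [27.0, 3.0]
print(least_loaded(loads, 2))  # [19.0, 11.0]
```

Greedy least-loaded placement is only one possible balancing rule; any such rule depends on a per-XPE load forecast, which is what the XPE features are meant to provide.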

1 Introduction

Because of the spread of the Internet, Selective Dissemination of Information (SDI) [7, 9, 23, 2, 3] is being actively applied in many areas, such as real estate sales, personalized electronic newspapers and advertisements, and sensor-driven services. Several XML schemas have been designed for these services [6, 21]. An SDI system requires its subscribers to register their profiles; it uses the profiles to filter the incoming information and then publishes the filtered XML streams to the respective subscribers. In such services, the number of subscribers can be expected to be extremely large. Our preliminary experiments showed that a single filtering server (FS) hosting the XPath filtering engine [16] can support only a few thousand users. This capacity is far too low: user numbers are expected to be on the order of tens of thousands, so an efficient way of managing multiple servers is needed. This paper proposes a novel load-sharing method that rapidly assigns the XPath expressions (XPEs)¹ registered by subscribers to FSs so as to share the transfer loads among the FSs. First, we define XPE features on XML streams. Our method uses these features to forecast the transfer load of a new XPE and to determine which FS the new XPE should be assigned to so that the FSs’ loads are equalized. The results of experiments on real XML data show the effectiveness of the proposed method.

Paper Outline. The next section introduces the XML stream filtering system, explains the client applications, and briefly describes the incremental update method of lazyDFA for XPath insertion and deletion. A performance analysis of a single FS and a problem definition are given in Sec. 3. Section 4 describes the transfer load sharing problem and defines the XPath features. The XPE distribution method is explained in Sec. 5. The experiments and results are described in Sec. 6. Sections 7 and 8 present related work and a discussion, respectively. The conclusion and future work are given in Sec. 9.

¹ Subscribers’ profiles are written as XPEs.
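The forecast-then-assign step described in the introduction might be sketched in Python as follows. The paper’s actual XPE features are defined in Sec. 4 and are not reproduced here; instead, this sketch assumes, purely for illustration, that an XPE’s transfer load is approximated by the mean number of bytes it selects from a sample of recent documents, and that a new XPE goes to the FS with the smallest current forecast load. `forecast_transfer_load` and `assign` are hypothetical names, and Python’s `xml.etree.ElementTree` supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET


def _size_without_tail(elem: ET.Element) -> int:
    """Bytes of the serialized element, excluding its tail text."""
    # tostring() also writes elem.tail (text after the closing tag), which
    # is not part of the selected element, so hide it temporarily.
    tail, elem.tail = elem.tail, None
    try:
        return len(ET.tostring(elem, encoding="utf-8"))
    finally:
        elem.tail = tail


def forecast_transfer_load(xpe: str, samples: list[str]) -> float:
    """Hypothetical feature: mean bytes per sample document selected by an
    absolute XPE such as '/news//item' (ElementTree's XPath subset only)."""
    total = 0
    for doc in samples:
        # Wrap the root so the absolute path can be evaluated as a
        # relative one (e.g. './news//item') from the wrapper element.
        wrapper = ET.Element("_doc")
        wrapper.append(ET.fromstring(doc))
        total += sum(_size_without_tail(e) for e in wrapper.findall("." + xpe))
    return total / len(samples)


def assign(xpe: str, samples: list[str], fs_loads: list[float]) -> int:
    """Put the new XPE on the FS with the smallest forecast load; return its index."""
    load = forecast_transfer_load(xpe, samples)
    fs = min(range(len(fs_loads)), key=fs_loads.__getitem__)
    fs_loads[fs] += load
    return fs


samples = [
    "<news><sports><item>goal</item></sports><item>rain</item></news>",
    "<news><item>sun</item></news>",
]
fs_loads = [0.0, 0.0]
print(forecast_transfer_load("/news//item", samples))         # 25.0
print(assign("/news//item", samples, fs_loads), fs_loads)     # 0 [25.0, 0.0]
print(assign("/news/sports/*", samples, fs_loads), fs_loads)  # 1 [25.0, 8.5]
```

Because the forecast is computed once per registration, its cost is paid when an XPE is inserted rather than on every published document.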

2 XML Stream Filtering System

In this section, we describe the XML stream filtering system in detail. The system uses the XPath filtering engine [16], which is based on lazyDFA, an extended variant of the deterministic finite automaton (DFA). Figure 1 shows a schematic of the service: an FS receives incoming XML streams, filters them based on the registered XPEs, and transfers the filtered XML streams to the appropriate subscribers. Figure 2 shows the publish/subscribe system architecture. The application server stores the subscribers’ XPEs. Upon receiving an XML stream from the publisher’s application, the application server sends it to the XPath filtering engine, which filters the stream as specified by the registered XPEs. The engine’s output is returned to the application server via callback functions. Finally, the application server transfers the filtered XML streams to the subscribers.
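The publish/subscribe flow of Figure 2 can be sketched in Python under several assumptions: each subscriber registers exactly one linear XPE built from `/`, `//` and `*` (no predicates, no namespaces); a small lazily built DFA stands in for the XPath filtering engine of [16] and is not that engine’s code; and "transferring" a result merely appends the serialized element to a per-subscriber outbox. `parse_xpe`, `LazyDFAFilter` and `AppServer` are hypothetical names. A DFA state is a set of NFA states (one per matched XPE prefix), and each transition is computed only the first time a (state, tag) pair occurs, then cached.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Callable, Iterable


def parse_xpe(xpe: str) -> list[tuple[str, str]]:
    """Split a linear XPE such as '/a//b/*' into (axis, name) steps."""
    steps, i = [], 0
    while i < len(xpe):
        if xpe.startswith("//", i):
            axis, i = "desc", i + 2
        elif xpe[i] == "/":
            axis, i = "child", i + 1
        else:
            raise ValueError(f"unsupported XPE: {xpe!r}")
        j = xpe.find("/", i)
        j = len(xpe) if j == -1 else j
        steps.append((axis, xpe[i:j]))
        i = j
    return steps


class LazyDFAFilter:
    """Stand-in filter engine: a DFA over element tags built on demand."""

    def __init__(self, xpes: dict[str, str]):
        self.queries = {qid: parse_xpe(x) for qid, x in xpes.items()}
        # NFA state = (qid, steps matched); DFA state = frozenset of NFA states.
        self.start = frozenset((qid, 0) for qid in self.queries)
        self.delta: dict[tuple[frozenset, str], frozenset] = {}

    def _next(self, state: frozenset, tag: str) -> frozenset:
        key = (state, tag)
        if key not in self.delta:  # lazily compute and cache the transition
            nxt = set()
            for qid, k in state:
                steps = self.queries[qid]
                if k == len(steps):
                    continue  # this XPE already matched at an ancestor
                axis, name = steps[k]
                if axis == "desc":
                    nxt.add((qid, k))  # '//' may skip any number of levels
                if name in ("*", tag):
                    nxt.add((qid, k + 1))
            self.delta[key] = frozenset(nxt)
        return self.delta[key]

    def run(self, chunks: Iterable[str],
            on_match: Callable[[str, ET.Element], None]) -> None:
        parser = ET.XMLPullParser(events=("start", "end"))
        stack = [self.start]

        def drain() -> None:
            for event, elem in parser.read_events():
                if event == "start":
                    stack.append(self._next(stack[-1], elem.tag))
                else:  # the element is complete at its end event
                    for qid, k in stack.pop():
                        if k == len(self.queries[qid]):
                            on_match(qid, elem)  # callback to the app server

        for chunk in chunks:
            parser.feed(chunk)
            drain()
        parser.close()
        drain()


class AppServer:
    """Hypothetical application server: stores XPEs, hands the stream to the
    engine, and collects the results delivered via callback."""

    def __init__(self, subscriptions: dict[str, str]):
        self.engine = LazyDFAFilter(subscriptions)  # subscriber id = query id
        self.outbox: dict[str, list[str]] = defaultdict(list)

    def _on_match(self, subscriber: str, elem: ET.Element) -> None:
        tail, elem.tail = elem.tail, None  # don't send text after the element
        self.outbox[subscriber].append(ET.tostring(elem, encoding="unicode"))
        elem.tail = tail

    def publish(self, chunks: Iterable[str]) -> dict[str, list[str]]:
        self.engine.run(chunks, self._on_match)
        return self.outbox


server = AppServer({"alice": "/news//item", "bob": "/news/sports/*",
                    "carol": "//price"})
out = server.publish(["<news><sports><item>goal</item></sports>",
                      "<item>rain</item></news>"])
print(out["alice"])  # ['<item>goal</item>', '<item>rain</item>']
print(out["bob"])    # ['<item>goal</item>']
```

Because transitions are built only for (state, tag) pairs that actually occur, the cached table can stay much smaller than a fully expanded DFA; in this example only four transitions are ever computed.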

Proceedings of the 21st International Conference on Data Engineering (ICDE 2005) 1084-4627/05 $20.00 © 2005 IEEE
