A Link-Based Ranking Model for Services

Camelia Constantin¹, Bernd Amann¹, and David Gross-Amblard²
¹ LIP6, Univ. Paris 6, France
² CEDRIC, CNAM, France

Abstract. The number of services on the web is growing every day and finding useful and efficient ranking methods for services has become an important issue in modern web applications. In this paper we present a link-based importance model and efficient algorithms for distributed services collaborating through service calls. We adapt the PageRank algorithm and define a service importance that reflects its activity and its contribution to the quality of other services.

1 Introduction

The basic task of a ranking model is to define scores for ordering a set of entities according to some specific criteria. A large number of ranking models and algorithms have been proposed for various kinds of applications and information entities (documents, tuples, services) in different domains such as document retrieval, Web search, service discovery and P2P query processing. Content-based ranking models classify entities according to the relevance of their contents or other associated metadata to a given input query. This kind of ranking has proven its efficiency in document retrieval systems and in quality-based data and service selection. Link-based ranking exploits any kind of structural, semantic or navigational links between entities for estimating the importance of an entity among all other entities. The most prominent example of this kind of ranking is certainly Google’s PageRank, which estimates the importance of a page in the Web graph for efficiently ranking Web search results.

This article presents a new link-based ranking model and algorithms for distributed collaborating services. We consider service-oriented applications relying on a set of services that collaborate for executing certain tasks. Our notion of service is abstract and should be understood as any kind of node, with some local behavior and data, which calls other nodes by exchanging messages. We put no restriction on the structure and contents of the exchanged messages, nor on the functionality, local data and implementation of each individual service. This means that our model applies to heterogeneous service infrastructures including, for instance, simple generic file sharing and document printing services, database query interfaces and Web services encapsulating some complex application-specific behavior.

R. Meersman, Z. Tari et al. (Eds.): OTM 2006, LNCS 4275, pp. 327–344, 2006. © Springer-Verlag Berlin Heidelberg 2006


Many kinds of applications fit our definition of a service and could benefit from our importance-based ranking approach. For instance, in a distributed P2P search engine, each peer can be defined as a service searching some local documents and calling other peer services. Search results can obviously be ranked according to some content-based criteria like document/query relevance, but in some situations it might also be useful to rank results according to the importance of their sources. Another example is standard web service discovery and selection. Whereas it is possible to compare and rank web services according to their WSDL and UDDI descriptions, these techniques are generally based on homogeneous and semantically rich service descriptions. We claim that link-based measures taking into account “collaboration links” between services are an interesting alternative for ranking heterogeneous services in large-scale service-oriented architectures (SOA).

In the sequel we use the example of a distributed news syndication system, where each node can play the role of a news server (provider), a news consumer (client) or both (portal). Figure 2 shows an example of five collaborating news services $s_1$ to $s_5$. Services $s_4$ and $s_5$ are provided by the French news agency Agence France-Presse (AFP) and the U.S. agency Reuters. Services $s_2$ and $s_3$ correspond to two French journals providing daily news. Finally, $s_1$ corresponds to the Google News service. All services collaborate by exchanging news via service calls. This is illustrated in Figure 1, which shows a graph where each edge $s_i \to_t s_j$ corresponds to a service call of service $s_i$ to service $s_j$ at time instant $t$. We see for example that AFP has been called by service Le Monde at three different instants 1, 2 and 9.

Fig. 1. A service call log (nodes: S1 Google News (hourly), S2 Le Monde (daily), S3 20 Minutes (daily), S4 AFP (hourly), S5 Reuters (hourly); edges are labeled with call timestamps)

In our model we assume that each such call $s_i \to_t s_j$ represents a contribution of the called service $s_j$ to the calling service $s_i$ and that this contribution also depends on the timestamp $t$. For example, suppose that Google News is interested in all news published by Le Monde, which updates its news every day. Then the call-dependent contribution of Le Monde to Google News at some given moment $\tau$ could be estimated by the age of the last call received by Le Monde from Google News before $\tau$. If this age is less than one day, Le Monde is estimated to be highly useful for Google News at moment $\tau$. On the other hand, if the age of the last call exceeds a certain period, e.g. one month, all results received from Le Monde become useless for Google News.


The contribution of a service to one of its clients depends not only on the timestamps of the received service calls, but also on how much the service contributes to the client’s quality compared to other services. If we assume that Google News considers the news obtained from Le Monde to be of high quality, the contribution of this service might be estimated higher than the contribution of 20 Minutes, which provides less relevant news. Our notion of “quality” is abstract and each individual service might use different quality features for estimating the contribution of other services. For example, the French newspaper Le Monde supposes that, in general, the French news agency AFP provides more interesting news for its readers than the U.S. agency Reuters. On the other hand, 20 Minutes tries to reduce cost and therefore prefers Google News, which is free of charge, to AFP, which is not. This “quality contribution” is illustrated in the graph of Figure 2, where each edge $s_i \to_c s_j$ is labeled by the value $c \in [0, 1]$ representing the contribution of service $s_j$ to the quality of its client $s_i$ relative to the other services used by $s_i$.

Fig. 2. Quality contribution graph (nodes: S1 Google News, S2 Le Monde, S3 20 Minutes, S4 AFP, S5 Reuters; edges are labeled with the contribution scores 0.8, 0.6, 0.4, 0.2, 0.7 and 0.3)

We are not interested in the way each service defines and acquires this knowledge about the relative quality contribution of other services. Although, for the sake of simplicity, this information is considered to be public in our model, Section 4 shows that public access is not needed for computing service importance.

The main goal of this article is to define a formal model and algorithms for ranking collaborating services according to their contribution to other services. The basic idea is to define and apply a link-based importance model that considers service calls and quality contribution scores for estimating the importance of each service among all other services. For example, if we consider only the quality contribution graph in Figure 2, it is easy to argue that AFP is more important than Reuters since it globally contributes more to the other services. This argument is reinforced by Figure 1, which shows that Reuters has never been called by any other service. However, if we try to compare Le Monde and 20 Minutes in the same way, we observe that both criteria (contribution and usage) are independent of each other. For example, Le Monde is obviously more important than 20 Minutes when considering only its contribution to the quality of Google News but, as shown in Figure 1, it was last called a long time ago, which decreases its contribution to Google News compared to that of 20 Minutes.


The rest of the article is organized as follows. Section 2 presents related work. The formal model is defined in Section 3, followed in Section 4 by the presentation of two scalable distributed algorithms for computing service importance. These algorithms have been implemented and evaluated by simulation; the results are described in Section 5. Section 6 briefly presents ongoing implementation and future work.

2 Related Work

Ranking services has been recognized as an important issue in the context of Web service discovery and dynamic service selection. Multiple criteria can be used for ranking services. For example, [7] proposes to use advanced sampling techniques for comparing and ranking data-intensive Web services according to their local data. Recommendation-based techniques exploit user feedback [17,12] and voting services [11] for dynamic service selection. Other ranking criteria are based on the conformance of QoS features during given time periods [21,12]. All these parameters can be combined using different weights according to specific user requirements [21,17]. In this context, our method can be seen as a recommendation-based approach for ranking services where each service call corresponds to an implicit vote taking into account service quality and observed usage during some time period.

Link-based ranking methods like PageRank [18] and HITS [14] have been developed and applied with success for ranking Web search results. The basic idea of this family of ranking models and algorithms is to consider that each page $p$ propagates a fraction of its importance to all other pages $p'$ it references. Most of the existing algorithms compute the importance of a page by exploiting the Web’s hyperlink structure. OPIC [1] avoids the construction of the Web matrix by using an adaptive on-line algorithm that estimates page importance dynamically during the Web crawling process.

Our ranking method exploits temporal information concerning the usage of a service for calculating a time-dependent importance score. More recent link-based approaches also consider time-dependent importance measures. For example, [4] observes that link-based importance penalizes new pages on the Web and proposes to decrease the PageRank of older pages. The same kind of argument is used by [8], which computes the importance of a page based on PageRank and its time derivative.
[10] exploits temporal information for ranking news and their sources. A recent article is more important than an old one, and a source that produced recent important news is more important than other sources.

Importance-based models and algorithms have also been applied in the context of P2P infrastructures. For example, [22] defines and computes importance scores for pages distributed in a P2P network. The importance of a page is defined as a combination of its local importance inside its peer and the global importance of its peer among the other peers. A similar approach is presented by [19], where each peer refines the score of its local pages by periodically meeting with other peers. A synchronous algorithm for reputation management in P2P systems is described in [13]. Each peer computes a global trust value based on the trust values of other peers and their local trust in it. [20] proposes a distributed asynchronous version of PageRank based on chaotic iterations. A totally asynchronous computation is presented by [15]. Our asynchronous iterative algorithm for computing the importance of collaborating services is inspired by [20,15] and considers the modification proposed by [5] for the termination of the asynchronous computation.

3 A Link-Based Service Importance Model

A service-oriented system is a set of services exchanging messages. We suppose that all messages are time-stamped by a synchronized continuous clock producing an infinite set of clock values $T$.¹ For the sake of simplicity, we do not distinguish between different end-users and external applications and we assume that they are all represented by a single service in the system. We define a logging function that associates with each pair of services $(s_i, s_j)$ the time-stamps of all messages sent by $s_i$ to $s_j$. More formally:

Definition 1 (logging function). Let $S$ be a finite set of identified services exchanging messages. A logging function $\Lambda : T \times S \times S \to 2^T$ returns, for each instant $t \in T$ and pair of distinct services $(s_i, s_j)$, the set of time-stamps $\Lambda(t, s_i, s_j) = \{t_i \mid t_i \le t\} \subseteq T$ of all messages sent by $s_i$ to $s_j$ before $t$.

Logging function $\Lambda$ registers the collaboration between services as a set of exchanged messages. Note that $\Lambda$ ignores local messages. At a higher level of abstraction it is possible to encode different kinds of message exchange patterns for composing messages into service calls. For example, in a request-response pattern (protocol) the service sending the first message (request) is considered to be the client of the receiver service, whereas in a solicit-response protocol the sender of the first message (solicit) will be considered as the server of the second one. In the following, we assume that logging function $\Lambda$ only registers the timestamps $t$ of request-response service calls $s_i \to_t s_j$, where $t$ corresponds to the timestamp of the request message sent from $s_i$ to $s_j$.

Definition 2 (service call graph). Function $\Lambda$ observes the activity between distinct services and generates a directed service call graph $SC(t) = (S, C, \Lambda)$ where (i) $S$ is the set of vertices and (ii) $C$ is the set of edges, such that $(s_i \to s_j) \in C$ iff $\Lambda(t, s_i, s_j) \neq \emptyset$.

A service that is called contributes, directly or indirectly, to other services in the system.
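As an illustration of Definitions 1 and 2, the following minimal Python sketch keeps the call log in memory and derives the edge set $C$ of $SC(t)$ from it. The class name `CallLog` and its methods are hypothetical names chosen for this sketch, not part of the model; the recorded calls are the ones mentioned in the running news example.

```python
from collections import defaultdict


class CallLog:
    """Hypothetical in-memory stand-in for the logging function Lambda."""

    def __init__(self):
        # (caller, callee) -> list of request time-stamps
        self._stamps = defaultdict(list)

    def record(self, caller, callee, t):
        """Register a request sent by `caller` to `callee` at time t."""
        if caller != callee:  # Lambda ignores local messages
            self._stamps[(caller, callee)].append(t)

    def stamps(self, t, caller, callee):
        """Lambda(t, s_i, s_j): time-stamps of calls s_i -> s_j up to t."""
        return {x for x in self._stamps.get((caller, callee), ()) if x <= t}

    def edges(self, t):
        """Edge set C of SC(t): pairs whose log is non-empty at time t."""
        return {pair for pair in self._stamps if self.stamps(t, *pair)}


log = CallLog()
for t in (1, 2, 9):           # Le Monde (s2) calls AFP (s4) at instants 1, 2, 9
    log.record("s2", "s4", t)
log.record("s1", "s2", 1)     # Google News (s1) calls Le Monde (s2) at instant 1

calls_before_5 = log.stamps(5, "s2", "s4")
graph_at_10 = log.edges(10)
```

Because the edge set is derived from the time-stamps, the same log yields a different graph $SC(t)$ for each observation instant $t$.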
We denote by $\mathrm{Out}(t, s_i) = \{s_j \mid s_j \in S \wedge \Lambda(t, s_i, s_j) \neq \emptyset\}$ the set of services called by $s_i$ until time instant $t$. Similarly, $\mathrm{In}(t, s_j) = \{s_i \mid s_i \in S \wedge \Lambda(t, s_i, s_j) \neq \emptyset\}$ is the set of services that called $s_j$ before $t$. Then a service $s_j$ can contribute at some instant $t$ directly to all services $s_i \in \mathrm{In}(t, s_j)$ and transitively to all services $s_k$ to which $s_i$ contributes. As shown in the introduction, we claim that we should distinguish between an intrinsic contribution of a service $s_j$ to the quality of a service $s_i$ with respect to all other services and a usage-based contribution that represents the way in which $s_i$ actually uses $s_j$ (independently of all other services).

¹ Our model does not require exact synchronization between local clocks.

3.1 Service Quality Contribution Scores

We assume that each service defines some measure for estimating its quality according to certain application-specific criteria. The key point behind these measures is that the quality of a service also depends on the quality of the services that it calls. For example, in a P2P search engine, the quality of a service might be defined by the average number of relevant documents returned to its clients. Other quality criteria might be the average freshness of its results (in the case of data replication with update), the average response time or even some simple business criteria based on the price of each service call. We define the contribution score of a service $s_j$ to the quality of the service $s_i$ by a function $\Upsilon$:

Definition 3 (local quality contribution). Let $S$ be a set of services. The contribution function $\Upsilon : S \times S \to [0, 1]$ defines for each pair of distinct services $(s_i, s_j)$ a local contribution score $\Upsilon(s_i, s_j) = \pi_{ji}$ of service $s_j$ to the quality of service $s_i$ such that $\sum_{s_j \in \mathrm{Out}(s_i)} \pi_{ji} = 1$.

Function $\Upsilon(s_i, s_j)$ does not return a quantitative value but a score $\pi_{ji}$ for comparing the contribution of all services $s_j \in \mathrm{Out}(t, s_i)$ to the quality of service $s_i$. We also suppose that these scores are static and statistically independent of a particular call between $s_i$ and $s_j$.² Each service $s_i$ defines the local contribution $\pi_{ki}$ of all services $s_k$ that it uses independently of the services $s_j$ that are used by $s_k$. Nonetheless, the contribution of $s_j$ to the quality of $s_i$ through $s_k$ can be estimated as the part of $\pi_{ki}$ which is due to $s_j$, i.e. $\pi_{ki} * \pi_{jk}$. The quality contribution graph $SC(t, \Upsilon)$ is obtained by adding to each edge $s_i \to s_j$ in the service call graph $SC(t)$ a label $\Upsilon(s_i, s_j) = \pi_{ji}$ (see for example the graph in Figure 2). Any path $p = s_i \to_{\pi_{ki}} s_k \to \ldots s_l \to_{\pi_{jl}} s_j$ in this graph then represents a contribution of service $s_j$ to service $s_i$ with a contribution score $\pi_p = \pi_{ki} * \ldots * \pi_{jl}$.
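The normalization constraint of Definition 3 and the path score $\pi_p$ can be checked with a few lines of Python. This is only a sketch: the edge map `PI` is an assumed reading of Figure 2, reconstructed from the preferences described in the text (Le Monde prefers AFP to Reuters, 20 Minutes prefers Google News to AFP), and the helper names `is_normalized` and `path_score` are hypothetical.

```python
import math

# Assumed reading of Fig. 2: (caller s_i, callee s_j) -> pi_ji
PI = {
    ("s1", "s2"): 0.8, ("s1", "s3"): 0.2,   # Google News uses Le Monde, 20 Minutes
    ("s2", "s4"): 0.6, ("s2", "s5"): 0.4,   # Le Monde uses AFP, Reuters
    ("s3", "s4"): 0.3, ("s3", "s1"): 0.7,   # 20 Minutes uses AFP, Google News
}


def is_normalized(pi, tol=1e-9):
    """Check that the scores given by each caller sum to 1 (Definition 3)."""
    totals = {}
    for (caller, _callee), score in pi.items():
        totals[caller] = totals.get(caller, 0.0) + score
    return all(abs(total - 1.0) <= tol for total in totals.values())


def path_score(pi, path):
    """pi_p: product of the local contribution scores along a call path."""
    return math.prod(pi[(a, b)] for a, b in zip(path, path[1:]))


# Contribution of AFP to Google News through Le Monde: 0.8 * 0.6
afp_via_le_monde = path_score(PI, ["s1", "s2", "s4"])
```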
We define the global contribution of some service $s_j$ to the quality of any service $s_i \in S$ as follows:

Definition 4 (global quality contribution). We denote by $P_{ij}$ the set of all possible paths from $s_i$ to $s_j$ in the contribution graph $SC(t, \Upsilon)$. The global contribution $\pi^*_{ji}$ of service $s_j$ to the quality of $s_i$ is the sum of the contributions of $s_j$ to $s_i$ on all possible paths $p$ from $s_i$ to $s_j$ in $SC(t, \Upsilon)$: $\pi^*_{ji} = \sum_{p \in P_{ij}} \pi_p$.

Example 1. In Figure 2, Le Monde locally contributes to the quality of Google News with a score of 0.8. Since Le Monde calls AFP, AFP indirectly contributes to the quality of Google News with score $0.8 * 0.6 = 0.48$. Observe that the contribution graph might contain cycles, which lead to an infinite number of contribution paths. For example, there exists an infinite number of contribution paths from Google News to AFP that pass through 20 Minutes. The set of paths can be reduced to a finite one by eliminating those with $\pi_p < \varepsilon$ for a given $\varepsilon$. The contribution of AFP to the quality of Google News can then be computed as $(0.8 * 0.6 + 0.2 * 0.3) * \sum_{i \ge 0} (0.7 * 0.2)^i \approx 0.63$.

² This restriction can be relaxed but it simplifies the presentation and the understanding of our importance model.

3.2 Service Usage

The calls registered in $\Lambda(t, s_i, s_j)$ represent the effective usage of $s_j$ by $s_i$ at some given instant $t$ (independently of the other services in the system). Obviously, a service can effectively contribute to the quality of other services only if it is called. In addition, the number and timestamps of the incoming calls influence its local and global contribution to other services. For each pair of services $(s_i, s_j)$ we introduce a service usage score that expresses the way in which $s_j$ is used by $s_i$, depending on the calls received from $s_i$:

Definition 5 (local service usage). The service usage function $U_d : T \times S \times S \to [0, 1]$ returns for each time instant $t$ and pair of services $(s_i, s_j)$ a local service usage score $U_d(t, s_i, s_j) = u_{ji}(t)$ of service $s_j$ by service $s_i$, obtained from the calls in $\Lambda(t, s_i, s_j)$.

The definition and implementation of the usage function depend on the application semantics, and any function aggregating the service call time-stamps in $\Lambda(t, s_i, s_j)$ with an appropriate semantics could be chosen. Each service $s_j$ can choose a specific function for each client $s_i \in \mathrm{In}(t, s_j)$, or apply the same function to all of its clients. In the following examples we assume, without loss of generality, that the usage score $u_{ji}(t)$ decreases with the age of the last call received from client $s_i$ and that the decrease factor is the same for all clients of $s_j$. For instance, if $s_j$ is a data provider which updates its data very frequently, it might choose a usage function which decreases very rapidly for all incoming service calls in order to take into account that it should be called frequently for obtaining maximal usage.

Example 2. Service $s_1$ (Google News) is interested in all news produced by service $s_2$ (Le Monde). Since Le Monde updates its news every day, Google News has to call it daily. If it does so, the service usage of $s_2$ by $s_1$ is maximal, i.e. $u_{21}(t) = 1$ for any $t$. If the only call $s_1 \to_1 s_2$ from Google News to Le Monde happened several days before instant 10, the usage score of Le Monde by Google News has decreased and might even be 0, reflecting the fact that none of the news received from Le Monde is still useful.

Service usage does not take into account possible logical or temporal relationships between incoming and outgoing calls of a service. We introduce the notion of call usage, which describes this kind of relationship between service calls. Call usage scores are obtained by a function combining the information on the incoming service calls in $\Lambda(t, s_i, s_k)$ with the information on the outgoing service calls in $\Lambda(t, s_k, s_j)$ of the same service $s_k$. More formally:


Definition 6 (call usage). Let $S$ be a set of services observed by a logging function $\Lambda$. The call usage function $U_i : T \times S \times S \times S \to [0, 1]$ returns for each instant $t$ and triple of services $(s_i, s_k, s_j)$ the call usage score $U_i(t, s_i, s_k, s_j) = u_{jki}(t)$ of the outgoing service calls $s_k \to s_j$ registered in $\Lambda(t, s_k, s_j)$ for the incoming service calls $s_i \to s_k$ registered in $\Lambda(t, s_i, s_k)$.

Score $u_{jki}(t)$ defines the degree to which calls received by $s_j$ from $s_k$ contribute to calls received by $s_k$ from $s_i$. As for local service usage scores, the way in which incoming and outgoing service calls are compared depends on the application and the service implementation. For example, a service $s_i$ whose calls to some service $s_k$ regularly trigger calls from $s_k$ to some other service $s_j$ leads to a high call usage score $u_{jki}(t)$.

Example 3. If we assume that a call received by Le Monde can only exploit news generated before that call, the call usage of the calls to AFP for the calls from Google News via Le Monde at instant 1 is equal to 0 (Le Monde did not call AFP before instant 1).

We use the definition of call usage to generalize the notion of service usage by defining the global service usage score of a service $s_j$ by any service $s_i \in S$. When a service $s_i$ calls a service $s_k$ that calls $s_j$, we can compute the service usage of $s_j$ for service $s_i$ by taking into consideration the service usage $u_{jk}(t)$ of $s_j$ by $s_k$ combined with the call usage $u_{jki}(t)$ between calls in $\Lambda(t, s_i, s_k)$ and calls in $\Lambda(t, s_k, s_j)$. Consequently, the service usage of $s_j$ by service $s_i$ through $s_k$ is estimated as $u_{jki}(t) * u_{jk}(t)$. More generally, for a call path $p = s_i \to s_l \to s_m \to \ldots \to s_n \to s_k \to s_j$ in $SC(t)$, the global usage score of $s_j$ for $s_i$ on this path is defined by the product $u_p(t) = u_{mli}(t) * \ldots * u_{jkn}(t) * u_{jk}(t)$.

Definition 7 (global service usage). Let $P_{ij}$ be the set of all possible paths from $s_i$ to $s_j$ in the service call graph $SC(t)$. The global service usage score $u^*_{ji}(t)$ of service $s_j$ by service $s_i$ at instant $t$ is the sum of the service usage scores of $s_j$ for $s_i$ on the paths in $P_{ij}$: $u^*_{ji}(t) = \sum_{p \in P_{ij}} u_p(t)$.

3.3 Effective Contribution and Importance

Our objective is to compare and rank services with respect to their activity observed by logging function $\Lambda$, combined with the information about how each service contributes to the quality of other services in a service-oriented system. The importance of $s_j$ is related to its effective contribution to all services $s_i \in S$, which is computed by combining contribution and usage scores on all call paths in $SC(t)$. The main idea is to weight the contribution scores on each path from $s_i$ to $s_j$ with the corresponding usage score, an important service being one that highly contributes by its usage to the quality of other services.

Definition 8 (local effective contribution). The local effective contribution $\tilde{\pi}_{ji}(t)$ of service $s_j$ to service $s_i \in \mathrm{In}(t, s_j)$ at instant $t$ is the product of the local quality contribution and the local usage score of $s_j$ by $s_i$: $\tilde{\pi}_{ji}(t) = \pi_{ji} * u_{ji}(t)$.
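A small Python sketch may help to make Definition 8 concrete. The usage policy below (full usage while the last call is recent, no usage once it is too old, linear decay in between) is only one hypothetical choice among the usage functions the model allows; the function names and the thresholds `fresh` and `stale` are assumptions of this sketch, and timestamps are treated as abstract units.

```python
def last_call_usage(stamps, t, fresh, stale):
    """Usage score in [0, 1] derived from the age of the last call before t.

    Hypothetical policy: 1 while age <= fresh, 0 once age >= stale,
    and a linear decrease in between.
    """
    past = [x for x in stamps if x <= t]
    if not past:
        return 0.0              # never called: no usage at all
    age = t - max(past)
    if age <= fresh:
        return 1.0
    if age >= stale:
        return 0.0
    return (stale - age) / (stale - fresh)


def local_effective_contribution(pi_ji, u_ji):
    """Definition 8: quality contribution weighted by the usage score."""
    return pi_ji * u_ji


# AFP (s4) was called by Le Monde (s2) at instants 1, 2 and 9; pi_42 = 0.6
u_42 = last_call_usage({1, 2, 9}, t=10, fresh=1, stale=30)
eff_42 = local_effective_contribution(0.6, u_42)

# Le Monde (s2) was called by Google News (s1) only at instant 1; pi_21 = 0.8
u_21 = last_call_usage({1}, t=10, fresh=1, stale=5)
eff_21 = local_effective_contribution(0.8, u_21)
```

With these assumed thresholds the recently called AFP keeps its full quality contribution for Le Monde, while the high quality contribution of Le Monde to Google News is cancelled by the age of the last call.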


As for quality contributions and usage scores, we can extend the notion of local effective contribution to services connected by service-call paths. For a given service-call path $p = s_i \to s_l \to s_m \to \ldots \to s_n \to s_k \to s_j$ in $SC(t)$, the effective contribution of $s_j$ to $s_i$ on this path, denoted by $\tilde{\pi}_p(t)$, is computed by multiplying the quality contribution score with the service usage score of $s_j$ for $s_i$ on this path:

$$\tilde{\pi}_p(t) = \pi_p * u_p(t) = \underbrace{\pi_{li} * \pi_{ml} \ldots \pi_{kn} * \pi_{jk}}_{\text{quality contribution}} * \underbrace{u_{mli}(t) * \ldots * u_{jkn}(t) * u_{jk}(t)}_{\text{usage contribution}} \qquad (1)$$

Definition 9 (global effective contribution). Let $P_{ij}$ be the set of all possible paths from $s_i$ to $s_j$ in the service call graph $SC(t)$. The global effective contribution score $\tilde{\pi}^*_{ji}(t)$ of $s_j$ to $s_i$ is obtained as the sum of the effective contributions on all possible paths $p \in P_{ij}$:

$$\tilde{\pi}^*_{ji}(t) = \sum_{p \in P_{ij}} \tilde{\pi}_p(t) \qquad (2)$$

A service is then important if it has a great effective contribution to all other services in the system.

Definition 10 (service importance). The importance of a service $s_j$ is defined as the sum of its global effective contributions to all services $s_i \in S$:

$$I_j(t) = \sum_{s_i \in S} \tilde{\pi}^*_{ji}(t) \qquad (3)$$
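Equations (1) to (3) can be evaluated directly on a small graph by enumerating call paths and dropping those whose weight falls below a threshold, in the spirit of the $\varepsilon$ pruning mentioned in Example 1. The following Python sketch does this. It is a brute-force illustration only, not one of the distributed algorithms of Section 4: it assumes that every cycle has weight strictly below 1 (otherwise the enumeration does not terminate), the edge map `PI` is an assumed reading of Figure 2, and all function names are hypothetical.

```python
# Assumed reading of Fig. 2: (caller s_i, callee s_j) -> pi_ji
PI = {
    ("s1", "s2"): 0.8, ("s1", "s3"): 0.2,
    ("s2", "s4"): 0.6, ("s2", "s5"): 0.4,
    ("s3", "s4"): 0.3, ("s3", "s1"): 0.7,
}


def global_effective_contribution(pi, service_usage, call_usage,
                                  source, target, eps=1e-12):
    """Sum of pi_p * u_p over all call paths from source to target.

    service_usage(s_k, s_j) returns u_jk, call_usage(s_i, s_k, s_j)
    returns u_jki. Path prefixes with weight below eps are pruned.
    """
    out = {}
    for caller, callee in pi:
        out.setdefault(caller, []).append(callee)
    total = 0.0
    stack = [(None, source, 1.0)]       # (previous node, current node, weight)
    while stack:
        prev, node, weight = stack.pop()
        for nxt in out.get(node, ()):
            w = weight * pi[(node, nxt)]
            if prev is not None:        # call usage of node->nxt for prev->node
                w *= call_usage(prev, node, nxt)
            if w < eps:
                continue
            if nxt == target:           # last edge is weighted by service usage
                total += w * service_usage(node, nxt)
            stack.append((node, nxt, w))
    return total


def importance(pi, service_usage, call_usage, target, eps=1e-12):
    """Equation (3): sum of global effective contributions to all services."""
    services = {s for edge in pi for s in edge}
    return sum(global_effective_contribution(pi, service_usage, call_usage,
                                             s, target, eps)
               for s in services)


def full_service_usage(caller, callee):
    return 1.0                          # assumption: every service is fully used


def full_call_usage(s_i, s_k, s_j):
    return 1.0


afp_for_google = global_effective_contribution(
    PI, full_service_usage, full_call_usage, "s1", "s4")
google_importance = importance(PI, full_service_usage, full_call_usage, "s1")
```

With all usage scores set to 1, the effective contribution reduces to the quality contribution, so `afp_for_google` approximates the value of about 0.63 obtained in Example 1.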

Example 4. Figures 1 and 2 show that service $s_4$ = AFP contributes to service $s_2$ = Le Monde with score $\pi_{42} = 0.6$ and has been called by Le Monde at instant 9. If the service usage $u_{42}(t)$ at instant $t = 10$ is estimated by the age of the last call and the time elapsed between instants 9 and 10 is only several hours, the effective contribution of AFP to Le Monde is high (close to the quality contribution score). On the other hand, even if Le Monde highly contributes to the quality of Google News, it was called by Google News several days before instant 10, which leads to a low effective contribution. By a similar kind of reasoning we can see that even if the quality contribution $\pi_{42} * \pi_{21}$ of AFP to Google News through Le Monde is high, the effective contribution of AFP to Google News via Le Monde is low. This is due to the low call usage score of the calls to AFP for the calls from Google News (when the call at a given instant exploits news generated before this instant, as described in Example 3). On the other hand, the quality contribution $\pi_{43} * \pi_{31}$ of AFP to Google News via 20 Minutes is reinforced by a high service usage score $u_{431}(t) * u_{43}(t)$.

4 Computing Importance

This section presents two algorithms for computing the importance of services at time $\tau$ in a service-oriented system $S$. The fundamental idea is to exploit the existing service-oriented architecture for distributing the computation over the different service nodes. The presented algorithms can be deployed efficiently on large-scale service infrastructures since each service computes its importance by exchanging messages only with its already known neighbor services in the service call graph. The computed importance values could then be registered along with other information on the services in a UDDI registry.

In both algorithms, each service $s_k$ computes its own importance $I_k(\tau)$ at $\tau$ by exchanging messages with neighbor services in $\mathrm{In}(\tau, s_k)$ and $\mathrm{Out}(\tau, s_k)$. The computation iterates until convergence: each service $s_k$ starts its computation with an initial importance value $I_k^0(\tau)$ and computes a new importance approximation based on the importance values $\nu_{ki}(\tau)$ received from its neighbors $s_i \in \mathrm{In}(\tau, s_k)$. It also sends importance updates $\nu_{jk}(\tau)$ to services $s_j \in \mathrm{Out}(\tau, s_k)$ and stops its computation when the relative error between the newly computed importance and the previous one is lower than a given (sufficiently small) $\varepsilon$.

The importance value $\nu_{jk}(\tau)$ received by $s_j$ from each $s_k \in \mathrm{In}(\tau, s_j)$ expresses (i) the quality contribution $\pi_{jk}$ of service $s_j$ to service $s_k$ and, recursively, (ii) the quality contribution and call usage of service $s_j$ for other services $s_i$ via $s_k$. More formally, the importance $\nu_{jk}(\tau)$ received by $s_j$ from some client $s_k$ is computed using the importance values $\nu_{ki}(\tau)$ received by $s_k$ from its clients $s_i$:

$$\nu_{jk}(\tau) = \pi_{jk} + \sum_{s_i \in \mathrm{In}(\tau, s_k)} \nu_{ki}(\tau) * u_{jki}(\tau) * \pi_{jk} \qquad (4)$$

Observe that by definition $\nu_{jk}(\tau) = \pi_{jk}$ if no service call from service $s_k$ to service $s_j$ has been useful to any service call received by service $s_k$. The importance $I_j(\tau)$ of a service $s_j$ at some moment $\tau$ is the sum of the received importance values $\nu_{jk}(\tau)$ weighted by the direct usage scores $u_{jk}(\tau)$:

$$I_j(\tau) = \sum_{s_k \in \mathrm{In}(\tau, s_j)} \nu_{jk}(\tau) * u_{jk}(\tau) \qquad (5)$$
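To see informally why the recursive equations 4 and 5 agree with the path-based definition of equation 3, one can unroll equation 4 once. The following derivation is only an illustrative sketch (the index $s_h$ is introduced here for the clients of $s_i$; the argument $\tau$ is dropped in the last line for readability), not a replacement for the proof referenced in [9].

```latex
\begin{align*}
\nu_{jk}(\tau)
  &= \pi_{jk} + \sum_{s_i \in In(\tau, s_k)} \nu_{ki}(\tau)\, u_{jki}(\tau)\, \pi_{jk} \\
  &= \pi_{jk}
   + \sum_{s_i \in In(\tau, s_k)} \pi_{ki}\, \pi_{jk}\, u_{jki}(\tau)
   + \sum_{s_i \in In(\tau, s_k)} \sum_{s_h \in In(\tau, s_i)}
       \nu_{ih}(\tau)\, \pi_{ki}\, \pi_{jk}\, u_{kih}(\tau)\, u_{jki}(\tau) \\[4pt]
I_j(\tau)
  &= \sum_{s_k \in In(\tau, s_j)} \nu_{jk}(\tau)\, u_{jk}(\tau) \\
  &= \sum_{s_k} \underbrace{\pi_{jk}\, u_{jk}}_{\tilde{\pi}_p \text{ for } p = s_k \to s_j}
   + \sum_{s_k} \sum_{s_i}
       \underbrace{\pi_{ki}\, \pi_{jk}\, u_{jki}\, u_{jk}}_{\tilde{\pi}_p \text{ for } p = s_i \to s_k \to s_j}
   + \ldots
\end{align*}
```

Each further unrolling step adds the effective contributions of the paths that are one call longer, in the form given by equation 1.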

We suppose that $s_j$ knows its own service usage $u_{jk}(\tau)$ at the beginning of the algorithm. The proof that equation 5 computes the same importance values as defined by equation 3 is given in [9]. In the following we propose two distributed algorithms which differ in the protocol used for exchanging importance values between services. In the first algorithm, services synchronize their computation: each service sends new importance values only after having received importance values from all of its clients. In the second algorithm, importance messages are not synchronized. Both algorithms are evaluated and compared in Section 5.

4.1 Synchronous Distributed Computation

In this algorithm each service sk first collects importance values νki (τ ) from all services si ∈ In(τ, sk ), recomputes its importance and sends it to services sj ∈ Out(τ, sk ). Variables Ok and Nk are vectors where oki and nki are old and new importance values received by sk from si ∈ In(τ, sk ) during the computation.


All services choose a common time τ and start the computation with an initial importance value I_k^0 equal to 0. Then each service s_k ∈ S executes the following algorithm:

    ComputSync(τ)
    Input: a time τ, ρ = ∅
    Output: importance of the service s_k at time τ
    begin
      O_k = N_k = 0
      do
        for each s_j ∈ Out(τ, s_k) do              // forward importance values
          ν_jk(τ) = λ ∗ π_jk + λ ∗ Σ_{s_i ∈ In(τ, s_k)} n_ki ∗ u_jki(τ) ∗ π_jk
          send ν_jk(τ) to s_j
        endfor
        Tmp = In(τ, s_k)                           // services that did not send their importance
        while Tmp ≠ ∅ do                           // compute new importance
          n_ki = ν_ki(τ)                           // wait for next importance message
          Tmp = Tmp \ {s_i}
        endwhile
        if |N_k − O_k| / |O_k| ≥ ε then ρ = ρ \ {s_k}
        else ρ = ρ ∪ {s_k}                         // local convergence
        endif
        I_k(τ) = Σ_{s_i ∈ In(τ, s_k)} n_ki ∗ u_ki(τ)   // compute service importance
        O_k = N_k
      while ρ ≠ S                                  // stop when all the services have converged
    end
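The synchronous rounds of ComputSync can be imitated in a single process, which is convenient for experimenting with small graphs. The Python sketch below is such a simulation under several assumptions that are not part of the algorithm above: all services are updated inside one loop instead of exchanging messages, the stop test is a global bound on the absolute change between two rounds rather than the per-service relative error and the set ρ, λ = 0.85 is an arbitrary choice in (0, 1), all usage scores are set to 1, and `PI` is an assumed reading of Figure 2.

```python
# Assumed reading of Fig. 2: (caller s_k, callee s_j) -> pi_jk
PI = {
    ("s1", "s2"): 0.8, ("s1", "s3"): 0.2,
    ("s2", "s4"): 0.6, ("s2", "s5"): 0.4,
    ("s3", "s4"): 0.3, ("s3", "s1"): 0.7,
}


def compute_sync(pi, service_usage, call_usage, lam=0.85, eps=1e-9,
                 max_rounds=10_000):
    """Single-process simulation of synchronous (Jacobi) rounds."""
    services = sorted({s for edge in pi for s in edge})
    callers = {s: [a for (a, b) in pi if b == s] for s in services}  # In(s)
    # received[(k, j)] holds n_jk, the last value sent by s_k to s_j
    received = {edge: 0.0 for edge in pi}
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        new = {}
        for (k, j) in pi:
            upstream = sum(received[(i, k)] * call_usage(i, k, j)
                           for i in callers[k])
            new[(k, j)] = lam * pi[(k, j)] * (1.0 + upstream)
        delta = max(abs(new[e] - received[e]) for e in pi)
        received = new                      # all values are replaced at once
        if delta < eps:
            break
    importance = {j: sum(received[(k, j)] * service_usage(k, j)
                         for k in callers[j])
                  for j in services}
    return importance, rounds


def full_service_usage(caller, callee):
    return 1.0                              # assumption: maximal usage everywhere


def full_call_usage(s_i, s_k, s_j):
    return 1.0


LAM = 0.85
sync_importance, sync_rounds = compute_sync(
    PI, full_service_usage, full_call_usage, lam=LAM)
```

In this setting AFP (`s4`) obtains a higher importance than Reuters (`s5`), since it receives importance from two callers and with higher contribution scores.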

The above algorithm implements Jacobi iterations which compute the received importance as the solution of the linear system presented in [9]. Importance values are multiplied by a constant value $\lambda \in (0, 1)$ (similarly to [18]), which guarantees that the solution of the system exists and is unique and that the Jacobi iterations converge to the solution of the system independently of the initial importance values $I_k^0$ (see [9] for a formal proof). The received importance of service $s_k$ converges locally when $|N_k - O_k| / |O_k| < \varepsilon$ for a given $\varepsilon$. Local convergence does not necessarily imply global convergence, since some services converge faster than others. $\rho$ denotes the set of services that have converged after each step of the algorithm. The computation stops when all services $s_k \in S$ have converged locally, i.e., when $\rho = S$.

4.2 Asynchronous Distributed Computation

The algorithm presented in this section avoids additional computation messages and the delays due to synchronization by embedding importance values into application-specific service calls at instants t > τ . The only “synchronization” between services consists in choosing a time τ for starting the computation.


Similarly to the previous algorithm, we use a vector $N_k$ whose elements are initialized to 0 at the beginning of the algorithm. The algorithm is defined as follows:

ComputAsync($\tau$): at each service call $s_k \to_t s_j$, where $t > \tau$:

- Sender $s_k$
  1. computes the importance value $\nu_{jk}(\tau) = \lambda * (\pi_{jk} + \sum_{s_i \in \mathrm{In}(\tau, s_k)} n_{ki} * u_{jki}(\tau) * \pi_{jk})$
  2. embeds $\nu_{jk}(\tau)$ into the service call $s_k \to_t s_j$ (e.g. as an additional element of a SOAP message)
  3. calls service $s_j$
- Receiver $s_j$
  1. memorizes the received importance value in $n_{jk}$ and
  2. updates its importance $I_j(\tau) = \sum_{s_k \in \mathrm{In}(\tau, s_j)} n_{jk} * u_{jk}(\tau)$.
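The sender and receiver steps above can be sketched in Python as a small simulation in which application calls arrive in random order and each call carries an importance value. Everything specific in this sketch is an assumption made for illustration: the `Service` class and its method names, the random call schedule, λ = 0.85, usage scores fixed to 1, and the edge map `PI` as an assumed reading of Figure 2. Missing entries of the received-value table play the role of the zero-initialized vector.

```python
import random

# Assumed reading of Fig. 2: (caller s_k, callee s_j) -> pi_jk
PI = {
    ("s1", "s2"): 0.8, ("s1", "s3"): 0.2,
    ("s2", "s4"): 0.6, ("s2", "s5"): 0.4,
    ("s3", "s4"): 0.3, ("s3", "s1"): 0.7,
}
LAM = 0.85


def full_service_usage(caller, callee):
    return 1.0                      # assumption: maximal usage everywhere


def full_call_usage(s_i, s_k, s_j):
    return 1.0


class Service:
    def __init__(self, name, pi_out, lam):
        self.name = name
        self.pi_out = pi_out        # callee s_j -> pi_jk for this service s_k
        self.lam = lam
        self.received = {}          # caller s_i -> n_ki (absent means 0)

    def outgoing_value(self, callee):
        """Sender step 1: value embedded into a call to `callee`."""
        upstream = sum(value * full_call_usage(caller, self.name, callee)
                       for caller, value in self.received.items())
        return self.lam * self.pi_out[callee] * (1.0 + upstream)

    def on_call(self, caller, value):
        """Receiver step 1: memorize the value carried by the call."""
        self.received[caller] = value

    def importance(self):
        """Receiver step 2: importance from the memorized values."""
        return sum(value * full_service_usage(caller, self.name)
                   for caller, value in self.received.items())


def simulate(pi, lam, n_calls, seed=0):
    rng = random.Random(seed)
    names = {s for edge in pi for s in edge}
    services = {
        name: Service(name,
                      {b: score for (a, b), score in pi.items() if a == name},
                      lam)
        for name in names
    }
    edges = sorted(pi)
    for _ in range(n_calls):        # application calls in arbitrary order
        caller, callee = rng.choice(edges)
        services[callee].on_call(caller,
                                 services[caller].outgoing_value(callee))
    return {name: s.importance() for name, s in services.items()}


async_importance = simulate(PI, LAM, n_calls=3000)
```

As long as every edge keeps being called, the values obtained this way approach the same fixed point as a synchronous computation with the same λ, which is the behaviour expected from totally asynchronous iterations.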

Each service $s_k$ recomputes its local importance at each incoming service call, without waiting for all services $s_i \in In(\tau, s_k)$ to send their importance. Service $s_k$ communicates its new importance values $\nu_{jk}(\tau)$ to its clients $s_j \in Out(\tau, s_k)$ only when it issues a new call to $s_j$. The result is an asynchronous protocol where each service computes its importance at its own pace, based on possibly outdated importance values. Note that some services might update their importance or communicate more frequently than others.

The above algorithm implements the totally asynchronous iterations that compute the solution of the linear system presented in [9]. We suppose that our system fulfils the total asynchronism assumption [5], i.e., all importance values are updated infinitely often and old values are eventually purged from the system. Each service $s_k$ has to be eventually informed of the importance updates from all services $s_i \in In(\tau, s_k)$. Then the totally asynchronous iterations converge to the solution of the linear system (see [9]). For the algorithm to terminate, conforming to [5], the importance $\nu_{jk}(\tau)$ is sent to $s_j$ only if it differs by more than $\varepsilon$ from the last importance sent by $s_k$ to $s_j$. When the computation converges, no updates are sent to $s_j$ anymore. The termination of the algorithm is detected when i) no message is in transit and ii) any recomputation of $I_k(\tau)$ does not change its value (conforming to [5]).

The above algorithms compute the importance of each service $s_j$ at time instant $\tau$ iteratively, based on equations 4 and 5. In [9] we show that the importance obtained when the algorithms converge corresponds to the sum of the global effective contribution of $s_j$ to all services $s_i \in S$ (equation 3). It is also easy to show that each service consumes memory that is linear in the number of its clients (for storing old and new importance values). Global complexity (in terms of the maximal and average number of iterations) is evaluated in the following section.
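The termination rule of the asynchronous protocol can be illustrated with a small single-process simulation. This is a sketch under assumptions and not the authors' Java implementation: calls are drawn at random with a fixed seed, all utilities are 1, $\pi_{jk}$ is uniform, and messages are delivered instantly, so "no message in transit" holds trivially. The names `nu` and `async_importance` are hypothetical.

```python
import random


def nu(out, n, lam, k):
    """Value that s_k would embed in its next call (all u_jki assumed 1)."""
    pi = 1.0 / len(out[k])                     # pi_jk, assumed uniform
    return lam * (pi + sum(n[k].values()) * pi)


def async_importance(out, lam=0.85, eps=1e-6, seed=0, max_calls=100_000):
    rng = random.Random(seed)
    edges = [(k, j) for k in out for j in out[k]]   # possible calls s_k -> s_j
    n = {k: {} for k in out}                         # n[j][k]: value s_j holds from s_k
    last_sent = {}                                   # last value sent on each edge
    sent = 0
    for _ in range(max_calls):
        # Edges whose value moved by more than eps since it was last sent.
        pending = {e for e in edges
                   if e not in last_sent
                   or abs(nu(out, n, lam, e[0]) - last_sent[e]) > eps}
        if not pending:
            break                                    # quiescent: nothing left to send
        call = rng.choice(edges)                     # arbitrary application call
        if call in pending:                          # otherwise the update is suppressed
            k, j = call
            n[j][k] = last_sent[call] = nu(out, n, lam, k)
            sent += 1
    return {k: sum(n[k].values()) for k in out}, sent


importance, sent = async_importance({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print(importance, sent)
```

Calls that would carry a value within $\varepsilon$ of the last one sent carry no update, which is what lets the computation fall silent once the values have stabilized.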

5

Experimental Evaluation

We implemented both algorithms in Java (JDK 1.5) on an AMD Turion 64 laptop (1.6 GHz, 2 GB RAM) under SUSE Linux 10.0. In the following experiments we considered a network of 1000 collaborating services, which were simulated as Java threads. We generated different network configurations (described later), defined by the way each service $s_i$ chooses its neighbor services $s_j \in Out(\tau, s_i)$ that contribute to its quality at some time instant $\tau$. Quality contribution is distributed uniformly over the services in $Out(\tau, s_i)$: $\pi_{ji} = 1 / |Out(\tau, s_i)|$.

For modeling service and call utility, we assign to each edge from $s_i$ to $s_j$ a random value $\delta_{ji} \in [0, 10]$ representing the age of the last call from $s_i$ to $s_j$ with respect to the instant $\tau$. We consider that all services use the same utility functions $u_{ji}(\tau) = 1 - \alpha * \delta_{ji}$ (service utility) and $u_{jki} = 1 - \alpha * |\delta_{ki} - \delta_{jk}|$ (call utility). The utility factor $\alpha$ controls the influence of the utility function on the importance computation ($\alpha = 0$ means that all service contribution links are taken into consideration during the computation, ignoring the age of the last service calls).

Each service $s_i$ starts its computation with an importance value of 0 and stops computing after local convergence (when $|N_i - O_i| / O_i < \varepsilon$). In the synchronous algorithm described in section 4, some services might converge faster than others, and we consider that a service $s_i$ performs an iteration when it receives importance updates from all services in $In(\tau, s_i)$ that have not converged yet (this is different from the presented algorithm, where each service waits for updates from all services in $In(\tau, s_i)$). For asynchronous computation we suppose that a service $s_i$ performs an iteration when it makes a number of importance updates equal to the number of its neighbors in $In(\tau, s_i)$. In the following, unless specified otherwise, all experiments are run with threshold $\varepsilon = 10^{-4}$, utility factor $\alpha = 0$ and $\lambda = 0.85$.

5.1

Service Graph Generation

In the following we will call the services in $Out(\tau, s_i)$ the neighbors of $s_i$. The neighbors of a service are chosen by the following four strategies, leading to service importance graphs with different topologies:

Max-graph [MAX]. This graph is similar to the Web graph model proposed by [2]. Each service chooses as neighbors, with probability 0.75, five “popular” services, i.e. services which have already been chosen by many other services.

Linear-copying graph [LC]. Each service randomly selects a “prototype” service $p$ among all existing services. It then chooses five neighbors among all services, where each such neighbor is with probability 0.75 a neighbor of $p$. The obtained graph is similar to the Web graph model proposed by [16].

Small-world network [SW]. This configuration simulates a small-world network by creating 20 communities composed of 50 services. Each service in such a community randomly connects to 5 neighbors in the same community, and each community interacts on average with 5 services of other, randomly drawn, communities.

Client-server configuration [CS]. This configuration combines the above three strategies to model a client-server setting with 80 client communities calling services of a single server community (SW). Each client community contains 10 services with a “prototype” service connected to some randomly chosen server services. The other 9 services in the same community are connected randomly to 1 service in the same community and to at most 5 server services, each server being with probability 0.75 a neighbor of the community’s prototype (LC strategy). The server community is a MAX graph composed of 200 server services, each connected on average to 5 other server services.

5.2

Experimental Results

Figure 3 shows, for all four graph configurations, the average number of computation messages generated per service until convergence using global synchronization, local synchronization (do not wait for services that have already converged) and no synchronization. We see that global synchronization generates the highest number of messages, since services that have already converged locally at some iteration step $i$ still keep sending messages until all services have converged. With local synchronization, services which have converged stop sending messages. Using the asynchronous algorithm, services converge faster than with the locally synchronous algorithm (as shown by figure 5). The reason for this gain is that a service does not systematically send freshly computed importance values to all its neighbors, as in the synchronous algorithm, but “optimizes” communication by sending new importance values only once in a while.

We see that SW and LC generate more messages than the other configurations. Services in SW are randomly connected, which might lead to many cycles and long contribution paths for many services. The same argument holds for prototype services in LC, which are chosen randomly. On the contrary, contribution paths in MAX are rather short, as all services link to the popular services. There are many services which are not neighbors of other services and which converge very fast. In CS there are many independent small communities with few connections.

Figure 4 shows the influence of the threshold value $\varepsilon$ used for convergence on the number of iterations. As expected, lower values for $\varepsilon$ lead to a higher number of iterations, since we should take into consideration longer paths (with lower contribution) to achieve convergence. Nonetheless, Figure 4 illustrates that the growth in the number of iterations is logarithmic, independently of the graph model. Similar results were reported by [6] on the convergence of PageRank.

Fig. 3. Number of messages for the different models

Fig. 4. Number of iterations for convergence w.r.t. $\varepsilon$

Figures 5(a) and (b) show the ratio of the importance values that reached their final ranking with respect to the number of iterations. With the synchronous algorithm (figure 5(a)), for LC (resp. SW) only 10% (resp. 1%) of the services have converged after 30 iterations. Global convergence is achieved after about 50 iterations. This can be explained by the high connectivity of the small-world communities, which leads to a large number of possible paths to be explored. On the contrary, the services in CS converge more quickly (after 10 iterations almost all the services have converged), since client services are grouped in many small communities. By comparing figure 5(a) with figure 5(b), we first note that removing synchronization allows services to converge faster. The reason is that a service sends in each importance message more information on the contribution paths than in the synchronous algorithm, which accelerates convergence. In other words, with the asynchronous algorithm services “learn” all their contribution paths more quickly.

Fig. 5. Ratio of converged nodes w.r.t. number of iterations: (a) synchronous algorithm, (b) asynchronous algorithm

We also performed several experiments to illustrate the impact of the utility function on the importance computation. In Figure 6 we see that the number of iterations decreases when the value of $\alpha$ increases, independently of the graph model. This was expected, since smaller utility values lead to lower connectivity and shorter paths: the quality contribution of a service on a path is reduced by the utility factor $\alpha$. In Figure 7 we study the influence of the utility factor $\alpha$ on the ranking of the services for three graph configurations. We consider as reference the ranking obtained for $\alpha = 0.0$ (ranking obtained by considering only service contribution links without service utility).
For different values of $\alpha$ we compute the number of services that are still in the top 10 and top 50 services. We see that service utility strongly impacts the obtained ranking. For example, in the SW configuration and for $\alpha = 0.2$, only 20% of the most important contributing services still belong to the top 10 (resp. top 50) services.

Fig. 6. Number of iterations for different utility functions

$\alpha$   LC              SW              CS
           top 10  top 50  top 10  top 50  top 10  top 50
0.0        10      50      10      50      10      50
0.04       9       46      7       39      10      47
0.08       8       45      6       30      10      44
0.12       7       42      4       26      10      42
0.16       6       39      2       17      8       38
0.2        4       25      2       10      4       27

Fig. 7. Influence of the utility on the result set

Finally, Table 1 illustrates the influence of a service on the importance of other services. After an initial importance computation, we removed a randomly chosen node and recomputed the importance values of all services. The table shows the number of services involved in the importance recomputation (nodes), the length of the maximum path that is taken into consideration (path) and the number of recomputation messages that are exchanged (mess).

Table 1. Cost of the recomputation when 1 peer leaves

      without utility                           utility: $\alpha = 0.1$
      sync                 async                sync                 async
      nodes  path   mess   nodes  path   mess   nodes  path   mess   nodes  path   mess
LC    938    18     16682  887    99     7670   938    14     12694  764    61     2979
SW    990    32     12577  450    210    16325  995    16     2223   256    85     2854
CS    896    200    230    101    31     648    898    126    200    94     20     245

We see that with the locally synchronous algorithm almost all services are involved in the importance computation, whereas with the asynchronous one only a part of the services in the neighborhood of the removed node recompute their importance. For instance, with $\alpha = 0.1$ and the CS graph, only 94 nodes participate in the computation with the asynchronous algorithm. Note also that the path lengths and the number of exchanged messages are smaller with a utility factor of 0.1. The difference in the path lengths and in the number of exchanged messages between the synchronous and the asynchronous algorithm seems to depend on the graph topology. For example, with the LC graph the synchronous algorithm exchanges more messages than the asynchronous one for both values of $\alpha$, whereas for SW the contrary is true.
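The top-10 and top-50 comparison reported in Fig. 7 amounts to counting how many services of a reference top-k list remain in the top-k list of another ranking. A minimal sketch of this measure follows; the function name `top_k_overlap` and the example scores are hypothetical and are not taken from the experiments.

```python
def top_k_overlap(reference, other, k):
    """Count the services that are in the top k of both score dictionaries."""
    def top(scores):
        # Services sorted by decreasing importance, truncated to k entries.
        return set(sorted(scores, key=scores.get, reverse=True)[:k])
    return len(top(reference) & top(other))


# Hypothetical importance values for alpha = 0 (reference) and some alpha > 0.
ref = {"a": 5.0, "b": 4.0, "c": 3.0, "d": 1.0}
with_utility = {"a": 1.0, "b": 4.0, "c": 5.0, "d": 3.0}
print(top_k_overlap(ref, with_utility, 2))
```

Dividing the returned count by k gives the fraction of the reference top-k services that survive a change of the utility factor.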

6

Conclusion

This article presents a general framework for ranking services with respect to their global contribution to other services. Whereas the basic approach has been illustrated in the context of a service-based news syndication system, we believe that the proposed model and algorithms are useful for many other applications like web service discovery and selection [7,21], service-based P2P data warehousing [3] and XML data integration. We are currently evaluating our model in the context of data-centric web services, where each service is defined as a parameterized view on a local data repository containing the results of calls to other services [3]. The basic idea is to redefine service usage and quality contribution in terms of service call expiration, data validity and queries.

Finally, we are also aware that there are many open security issues related to our importance model and algorithms. In particular, we do not tackle the problem of services which try to cheat by increasing their importance artificially or by sending fake importance values. This phenomenon is also well known in the context of traditional web search engines and P2P systems, and we believe that existing solutions for securing the computation (like in [13]) could be adapted to our algorithms. This is also part of our future work.

References

1. S. Abiteboul, M. Preda, and G. Cobena. Adaptive On-Line Page Importance Computation. In Proc. Intl. World Wide Web Conference (WWW), pages 280–290, 2003.
2. R. Albert, H. Jeong, and A.-L. Barabási. The Diameter of the World Wide Web. Science, 286:509–512, 1999.
3. The Active XML Project. http://activexml.net.
4. R. A. Baeza-Yates, C. Castillo, and F. Saint-Jean. Web Dynamics, Structure, and Page Quality. In Web Dynamics, pages 93–112. 2004.
5. D. P. Bertsekas and J. N. Tsitsiklis. Some aspects of parallel and distributed iterative algorithms: a survey. Automatica, 27(1):3–21, 1991.
6. M. Bianchini, M. Gori, and F. Scarselli. Inside PageRank. ACM Transactions on Internet Technology (TOIT), 5(1):92–128, 2005.
7. J. Caverlee, L. Liu, and D. Rocco. Discovering and ranking web services with BASIL: a personalized approach with biased focus. In Proc. Intl. Conf. on Service-Oriented Computing (ICSOC), pages 153–162, 2004.
8. J. Cho, S. Roy, and R. Adams. Page Quality: In Search of an Unbiased Web Ranking. In Proc. ACM Symp. on the Management of Data (SIGMOD), 2005.
9. C. Constantin, B. Amann, and D. Gross-Amblard. A Link-based Ranking Model for Services (long version), 2006. http://www-poleia.lip6.fr/~amann/coopis_long.pdf.
10. G. M. D. Corso, A. Gulli, and F. Romani. Ranking a Stream of News. In Proc. Intl. World Wide Web Conference (WWW), pages 97–106, 2005.
11. F. Emekçi, O. D. Sahin, D. Agrawal, and A. E. Abbadi. A Peer-to-Peer Framework for Web Service Discovery with Ranking. In Proc. Intl. Conf. on Web Services (ICWS), pages 192–199, 2004.
12. S. Kalepu, S. Krishnaswamy, and S. W. Loke. Reputation = f(User Ranking, Compliance, Verity). In Proc. Intl. Conf. on Web Services (ICWS), pages 200–207, 2004.
13. S. D. Kamvar, M. T. Schlosser, and H. Garcia-Molina. The Eigentrust algorithm for reputation management in P2P networks. In Proc. Intl. World Wide Web Conference (WWW), pages 640–651, 2003.
14. J. M. Kleinberg. Authoritative Sources in a Hyperlinked Environment. J. ACM, 46(5):604–632, 1999.
15. G. Kollias, E. Gallopoulos, and D. B. Szyld. Asynchronous Iterative Computations with Web Information Retrieval Structures: the PageRank Case. In Proc. Intl. Conf. on Parallel Computing (PARCO), 2005.
16. R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. Random Graph Models for the Web Graph. In Proc. Intl. Symp. on Foundations of Computer Science (FOCS), pages 57–65, 2000.
17. Y. Liu, A. H. H. Ngu, and L. Zeng. QoS computation and policing in dynamic web service selection. In Proc. Intl. World Wide Web Conference (WWW), pages 66–73, 2004.
18. L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical report, Stanford Digital Library Technologies Project, 1998.
19. J. X. Parreira and G. Weikum. JXP: Global Authority Scores in a P2P Network. In Proc. Intl. Workshop on the Web and Databases (WebDB), pages 31–36, 2005.
20. K. Sankaralingam, S. Sethumadhavan, and J. C. Browne. Distributed Pagerank for P2P Systems. In Proc. Intl. Symp. on High Performance Distributed Computing (HPDC), pages 58–69, 2003.
21. L.-H. Vu, M. Hauswirth, and K. Aberer. QoS-Based Service Selection and Ranking with Trust and Reputation Management. In Proc. Intl. Conf. on Cooperative Information Systems (CoopIS), pages 466–483, 2005.
22. Y. Wang and D. J. DeWitt. Computing PageRank in a Distributed Internet Search Engine System. In Proc. Intl. Conf. on Very Large Data Bases (VLDB), pages 420–431, 2004.