A Retrieval Technique for Software Components Using Directed Replaceability Similarity

Hironori Washizaki and Yoshiaki Fukazawa

Department of Information and Computer Science, Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
{washi, fukazawa}@fuka.info.waseda.ac.jp

Abstract. A mechanism for retrieving software components is indispensable for component-based software development. However, conventional retrieval techniques require additional descriptions and cannot evaluate the overall characteristics of a component. In this paper, we propose a new similarity metric, “directed replaceability similarity” (DRS), which represents how two components differ in terms of structure, behavior, and granularity. We developed a retrieval system that automatically measures the DRS between a user’s prototype component and the components stored in a repository, without requiring any source code or additional information. Evaluation experiments show that the retrieval performance of our system is higher than that of conventional techniques.

1 Introduction

Recently, software component technology, which builds software systems from reusable components, has attracted attention because it can reduce development costs. In a narrow sense, a software component is defined as a unit of composition that can be exchanged independently in the form of object code, without source code; its internal structure is not disclosed to the public. Since it is natural to model and implement components in an object-oriented (OO) paradigm/language[1], we limit this study to components implemented in an OO language. The reuse of components over the Internet is emerging, but a technique for retrieving a component that satisfies a given requirement has not yet been established[2]. Important characteristics of components are the following[3]:

(1) Structure: internal participants and how they collaborate
(2) Behavior: stateless behavior and behavior that relates to states
(3) Granularity: the component size and its classification
(4) Encapsulation: the degree to which design/implementation decisions are hidden
(5) Nature: the main stage of the development process at which the component is used
(6) Accessibility to Source Code: the modifiability of the component

We aim to reuse components in the form of object code at the implementation stage. Moreover, users generally retrieve a component on the basis of its functionality, and its encapsulation can be verified after retrieval. Therefore, “structure”, “behavior”, and “granularity” can be considered the important characteristics of a component in terms of retrieval.

Z. Bellahsène, D. Patel, and C. Rolland (Eds.): OOIS 2002, LNCS 2425, pp. 298–310, 2002.
© Springer-Verlag Berlin Heidelberg 2002


2 Component Retrieval

Conventional retrieval approaches for software components can be classified into four types: the automatic extraction approach, the specification-based approach, the similarity-based approach, and the type-based approach. The automatic extraction approach is based on automatically extracting structural information from components[4]; when source code is not available, the extracted information is insufficient for retrieval[5]. The semi-formal specification-based approach is based on catalog information about components[2]. In addition, a formal specification-based approach, which uses a semantic description of the component’s behavior, has been proposed[6]. Both specification-based approaches incur large preparation costs because additional descriptions are necessary. The similarity-based approach is based on the similarity between a user’s query and the components stored in the repository[5, 7]; the query is given as a prototype of the component that satisfies the user’s requirement. The type-based approach is based on the component type and the method types[8]. Search results are classified according to adaptability, for example, exact match and generalized match, but a more detailed ranking within each match set cannot be obtained. Another type-based approach does yield a detailed ranking[9], but it requires the source code of the components. Each of these approaches considers only a single characteristic of the component and cannot evaluate the total semantic adaptability of the component[2]. A retrieval mechanism should be able to consider two or more characteristics simultaneously. In addition, not all components available over the Internet come with additional specification descriptions[5], so the retrieval mechanism should not require any information other than the components themselves.

3 Directed Replaceability Similarity

We propose directed replaceability similarity (DRS) as a metric that semantically represents the degree of difference between two components. Consider a situation in which a component $c_q$ is in use and the system requirements are the same before and after replacement: when $c_q$ is replaced with another component $c_s$, the parts that use $c_q$ must be modified. $DRS(c_q, c_s)$ indicates the adaptation cost necessary in such a situation. It is assumed that all methods of $c_q$ are used uniformly. DRS is composed of three primitive similarities corresponding to the considered characteristics: the structural similarity $DRS_S$, the behavioral similarity $DRS_B$, and the granularity similarity $DRS_G$. All primitive similarities are normalized between 0 and 1. DRS is defined as a dynamically weighted linear combination of the primitive similarities; users can adjust the weight values to reflect their own views on the relative importance of the three characteristics. $DRS(c_q, c_s)$ is defined as follows, where $\sum_{i=1}^{3} w_i = 1$ and $w_i \ge 0$:

$$DRS(c_q, c_s) ::= w_1\,DRS_S(c_q, c_s) + w_2\,DRS_B(c_q, c_s) + w_3\,DRS_G(c_q, c_s)$$
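As a minimal sketch of the weighted combination, the following Python computes DRS from three primitive similarities that are assumed to be already computed. The names `PrimitiveSimilarities`, `drs`, and `rank_candidates` are hypothetical, and the checks simply encode the constraints stated in the text ($w_i \ge 0$, $\sum w_i = 1$, primitives in [0, 1]). Because DRS measures an adaptation cost, ranking repository candidates by ascending DRS is one plausible use, assumed here rather than taken from the authors' system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimitiveSimilarities:
    """Hypothetical container for DRS_S, DRS_B and DRS_G of one candidate, each in [0, 1]."""
    structural: float
    behavioral: float
    granularity: float


def drs(sims: PrimitiveSimilarities, w1: float, w2: float, w3: float) -> float:
    """DRS = w1*DRS_S + w2*DRS_B + w3*DRS_G with w_i >= 0 and w1 + w2 + w3 = 1."""
    weights = (w1, w2, w3)
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    parts = (sims.structural, sims.behavioral, sims.granularity)
    if any(not 0.0 <= p <= 1.0 for p in parts):
        raise ValueError("primitive similarities must lie in [0, 1]")
    return sum(w * p for w, p in zip(weights, parts))


def rank_candidates(candidates: dict[str, PrimitiveSimilarities],
                    weights: tuple[float, float, float]) -> list[tuple[str, float]]:
    """Assumed usage: order candidates by ascending DRS (lowest adaptation cost first)."""
    scored = [(name, drs(sims, *weights)) for name, sims in candidates.items()]
    return sorted(scored, key=lambda item: item[1])
```

Since every primitive lies in [0, 1] and the weights form a convex combination, the resulting DRS also stays in [0, 1], so changing the weights only re-orders candidates without rescaling them.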


We first define a similarity function $d(x, y, z)$, which is used in common when defining the primitive similarities. In this paper, “type” means the classification of any attribute of the component. It is assumed that a binary relation $\sqsubseteq$ is defined on the set $\tau$ composed of the instances of a type $t$, and that $\langle \tau, \sqsubseteq \rangle$ is a partially ordered set. We call the least element of $\langle \tau, \sqsubseteq \rangle$ “the least instance” ($root_t$). Let $f$ be the transformation that yields the immediate predecessor of a given instance. The position of a type instance is defined as 1 plus the number of applications of $f$ needed for the instance to reach $root_t$ in $\langle \tau, \sqsubseteq \rangle$; thus $root_t$ itself has position 1. $d(x, y, z)$ represents how an instance $Y$ differs from an instance $X$ from the viewpoint of $X$, using the positions $x$, $y$, and $z$ of $X$, $Y$, and $Z$, where $Z$ is a common deepest ancestor instance of $X$ and $Y$. $Z$ satisfies the following expression:

$$M = \{Z' : t \mid Z' \sqsubseteq X \wedge Z' \sqsubseteq Y : Z'\}, \quad Z \in M \wedge (\forall Z' : t \mid Z' \in M : Z' \sqsubseteq Z)$$

$M$ is the set of instances that are less than or equal to both $X$ and $Y$ with respect to $\sqsubseteq$. The requirements defining $d$ are the following:
(r1) The similarity is always normalized between 0 and 1.
(r2) The similarity between equivalent instances is always 0.
(r3) When the position of $X$ is smaller than that of $Y$, the similarity between $X$ and $Y$ seen from $X$ is smaller than that seen from $Y$.
(r4) If the relative positions among the instances $X$, $Y$, $Z$ are fixed, the similarity between $X$ and $Y$ becomes smaller as the position of $Z$ becomes deeper.

Based on these requirements, we define $d(x, y, z)$ as follows, where $x, y, z$ are positive integers and $0 < z \le x, y$:

$$d(x, y, z) ::= \int_{1 + \frac{2yz}{x+y}}^{1+y} \frac{1}{t^2}\,dt = \frac{y(x + y - 2z)}{(y + 1)(x + y + 2yz)}$$
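To make the closed form concrete, the following Python sketch evaluates $d(x, y, z)$ both from the closed form and from the antiderivative $-1/t$ of the integrand, using exact rational arithmetic. The function names `d` and `d_by_integral` are illustrative; the sketch only exercises the definition above and is not taken from the authors' system.

```python
from fractions import Fraction


def d(x: int, y: int, z: int) -> Fraction:
    """Closed form y(x+y-2z) / ((y+1)(x+y+2yz)).

    x, y: positions of instances X and Y; z: position of their common deepest
    ancestor Z. Positions start at 1 for the least instance root_t.
    """
    if not (0 < z <= min(x, y)):
        raise ValueError("require 0 < z <= x, y")
    return Fraction(y * (x + y - 2 * z), (y + 1) * (x + y + 2 * y * z))


def d_by_integral(x: int, y: int, z: int) -> Fraction:
    """Same value via the antiderivative of t^-2: 1/lower - 1/upper."""
    lower = 1 + Fraction(2 * y * z, x + y)   # lower limit 1 + 2yz/(x+y)
    upper = Fraction(1 + y)                  # upper limit 1 + y
    return 1 / lower - 1 / upper
```

Exact `Fraction` values avoid floating-point noise in comparisons; for example, $d(2, 3, 1) = 9/44$ is smaller than $d(3, 2, 1) = 2/9$, which illustrates requirement (r3): seen from the shallower instance, the difference is smaller.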

$d(x, y, z)$ has the following features, which correspond to the requirements. In the following, $a$, $b$, $c$, $e$ are all positive integers.
(f1) $c \le a, b \Rightarrow \lim_{b \to \infty} \int_{1}^{1+b} \frac{1}{t^2}\,dt = 1 \Rightarrow d(a, b, c) < 1$
(f2) $a = b = c \Rightarrow d(a, b, c) = d(b, a, c) = 0$
(f3) $c < a < b \Rightarrow 0 < d(a, b, c) < d(b, a, c) < 1$
(f4) $c \le a, b \Rightarrow d(a + e, b + e, c + e) < d(a, b, c)$

3.1 Structural Similarity

The component’s name and the structures (signatures) of its methods can be enumerated as the attributes that compose the structural characteristic of a component. For example, consider the four components C1 to C4 shown in Figure 1, and assume that C1’s calc1 is used. Each of these components has exactly one method and one member field. The parts that use calc1 need not be modified when calc1 is replaced with a method whose argument type has a value range equal to or greater than that of int, and whose return type has a value range equal to or less than that of int. The value ranges of the types are related as follows: $\{x : short \mid : x\} \subset \{x : int \mid : x\} \subset \{x : long \mid : x\}$. Therefore, in terms of structure, the order of ease of replacing calc1 is calc2 < calc3 < calc4.

Component | Field | Signature | Body
C1 | int data = 0; | int calc1(int x) { | data = x; return x; }
C2 | long data = 0; | short calc2(int x) { | data = (long) x; return 0; }
C3 | long data = 0; | short calc3(long x) { | data = (long) x; return x; }
C4 | int data = 0; | long calc4(short x) { | return 0; }

Fig. 1. Examples of methods with different structures/behaviors

$DRS_S$ is calculated from the set of such method structural differences and from the difference between the names of the components before and after replacement. Using the string similarity $dw$ for component names and the instance set similarity $dr$ for sets of method structures, $DRS_S$ is defined as follows, where $CS$ is the structure of a component and $MS$ is the method structure:

$$CS ::= \{name : String,\ methods : \{m_1 : MS, \ldots, m_n : MS\}\}$$
$$c_q, c_s : CS, \quad c_q = \{name = n_q,\ methods = ms_q\}, \quad c_s = \{name = n_s,\ methods = ms_s\}$$
$$DRS_S(c_q, c_s) ::= \frac{dw(n_q, n_s) + 2\,dr(ms_q, ms_s)}{3}$$

The string similarity $dw(w_q, w_s)$ between strings $w_q$ and $w_s$ is defined as follows, using the longest common substring $w_p$ of the two strings and the function $d$, where $w_q, w_s, w_p : String$ and $\#w_q$ denotes the length of $w_q$:

$$dw(w_q, w_s) ::= \begin{cases} d(\#w_q, \#w_s, \#w_p) & (w_p \text{ exists}) \\ 1 & (w_p \text{ does not exist}) \end{cases}$$

An instance set is a set of instances of the same type. The instance set similarity $dr(R_q, R_s)$ of two instance sets $R_q$ and $R_s$ of type $x$ can be calculated by averaging $dx$ over pairs in $R_q \times R_s$ without any duplication of instances, where $dx$ is the similarity of two instances of type $x$; we call $dx$ “the internal similarity”. However, the instance set similarity should also reflect the difference in the number of instances. Here, $\#R$ denotes the number of instances in $R$. First, at $f_1(R_q, R_s)$, $dx(q, s)$ is calculated for all pairs $(q, s)$ consisting of an instance $q$ in $R_q$ and an instance $s$ in $R_s$. From all these pairs, pairs are selected in ascending order of $dx(q, s)$ such that neither instance of a selected pair overlaps with an instance in an already selected pair. The set of selected pairs is denoted $S_f$. Second, if $\#R_q > \#R_s$, then for every $q$ remaining in $R_q$ after $f_1$, a new pair $(q, root_x)$ is created at $f_2(R_q, R_s)$ using $root_x$, the least instance of type $x$.
On the other hand, if $\#R_q < \#R_s$, then for every $s$ remaining in $R_s$, a new pair $(root_x, s)$ is created at $f_3(R_q, R_s)$. Finally, $dr(R_q, R_s)$ is defined as the total of $f_1$, $f_2$, and $f_3$ divided by $\max(\#R_q, \#R_s)$:


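$$f_1(R_q, R_s) ::= \sum_{(q, s) \in S_f} dx(q, s)$$
$$f_2(R_q, R_s) ::= \sum_{q \in R_q - \{q' : x \mid q' \in S_f : q'\}} dx(q, root_x) \quad (\text{if } \#R_q > \#R_s)$$
$$f_3(R_q, R_s) ::= \sum_{s \in R_s - \{s' : x \mid s' \in S_f : s'\}} dx(root_x, s) \quad (\text{if } \#R_q < \#R_s)$$
$$dr(R_q, R_s) ::= \frac{f_1(R_q, R_s) + f_2(R_q, R_s) + f_3(R_q, R_s)}{\max(\#R_q, \#R_s)}$$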
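The definitions of $dw$, $dr$, and $DRS_S$ translate fairly directly into code. The following Python sketch is one possible reading: it finds the longest common substring with a standard dynamic-programming table, builds $S_f$ greedily in ascending order of $dx$ as described in the text, and pairs leftover instances with the least instance. Tie-breaking among equal $dx$ values (here, by index) and the result for two empty sets are assumptions the text does not specify; all function names are hypothetical.

```python
from fractions import Fraction
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")


def d(x: int, y: int, z: int) -> Fraction:
    """Closed form of d(x, y, z); requires 0 < z <= x, y."""
    if not (0 < z <= min(x, y)):
        raise ValueError("require 0 < z <= x, y")
    return Fraction(y * (x + y - 2 * z), (y + 1) * (x + y + 2 * y * z))


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common (contiguous) substring, by dynamic programming."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def dw(wq: str, ws: str) -> Fraction:
    """String similarity: d(#wq, #ws, #wp), or 1 if no common substring wp exists."""
    p = lcs_length(wq, ws)
    return Fraction(1) if p == 0 else d(len(wq), len(ws), p)


def dr(rq: Sequence[X], rs: Sequence[X],
       dx: Callable[[X, X], Fraction], root: X) -> Fraction:
    """Instance set similarity: greedy pairs S_f (f1), root pairs for leftovers (f2, f3)."""
    if not rq and not rs:
        return Fraction(0)  # assumption: the text does not cover two empty sets
    # f1: scan all pairs in ascending order of dx; keep a pair only if neither
    # instance is already used by a previously selected pair.
    candidates = sorted((dx(q, s), i, j) for i, q in enumerate(rq) for j, s in enumerate(rs))
    used_q: set[int] = set()
    used_s: set[int] = set()
    f1 = Fraction(0)
    for score, i, j in candidates:
        if i not in used_q and j not in used_s:
            used_q.add(i)
            used_s.add(j)
            f1 += score
    # f2 / f3: leftovers can only remain on the larger side; pair them with root.
    f2 = sum((dx(q, root) for i, q in enumerate(rq) if i not in used_q), Fraction(0))
    f3 = sum((dx(root, s) for j, s in enumerate(rs) if j not in used_s), Fraction(0))
    return (f1 + f2 + f3) / max(len(rq), len(rs))


def drs_s(nq: str, ns: str, msq: Sequence[X], mss: Sequence[X],
          dms: Callable[[X, X], Fraction], root_ms: X) -> Fraction:
    """DRS_S = (dw(nq, ns) + 2 * dr(msq, mss)) / 3, with dms as the internal similarity."""
    return (dw(nq, ns) + 2 * dr(msq, mss, dms, root_ms)) / 3
```

For instance, dw("calc1", "calc2") compares two five-character names sharing the four-character substring "calc", giving $d(5, 5, 4) = 1/30$. In a quick experiment, `dw` itself can stand in for `dms` by comparing method names only; this is a simplification, since the text defines $dms$ over full method structures.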

The method structure $MS$ is composed of the method name and the functional type of the signature. The method structural similarity $dms(m_q, m_s)$ between method structures $m_q$ and $m_s$ is defined as follows, using the string similarity between the method names and the functional similarity $df$ between the signatures:

$$MS ::= \{name : String,\ signature : F\}$$
$$m_q, m_s : MS, \quad m_q = \{name = name_q,\ signature = sig_q\}, \quad m_s = \{name = name_s,\ signature = sig_s\}$$
$$dms(m_q, m_s) ::= \frac{dw(name_q, name_s) + 2\,df(sig_q, sig_s)}{3}$$

When the instance set similarity between sets of method structures is calculated in $DRS_S$, $dms$ is used as the internal similarity, and the following $root_{ms}$ is used as the least instance of the method structure:

$$root_{ms} = \{name = \text{""},\ signature = \{params = \{\} \rightarrow return = root_T\}\}$$

The functional similarity $df(f_q, f_s)$ between functional types $f_q$ and $f_s$ uses the instance set similarity for the arguments and the normal type similarity $dt$ for the return value. Since the arguments of a functional type in an object-oriented type system follow the contravariance rule with respect to subtyping[10], the arguments after replacement ($p_s$) are compared with those before replacement ($p_q$). In the following, $T$ denotes the power type of normal types; each instance of $T$ is a normal type itself.

$$F ::= \{params : \{t_1 : T, \ldots, t_n : T\} \rightarrow return : T\}$$
$$f_q, f_s : F, \quad f_q = \{params = p_q \rightarrow return = r_q\}, \quad f_s = \{params = p_s \rightarrow return = r_s\}$$
$$df(f_q, f_s) ::= \frac{dr(p_s, p_q) + dt(r_q, r_s)}{2}$$

Value types (int, etc.), object types (Object, etc.), and value-wrapper types (Integer, etc.) are enumerated as normal types. By introducing the least instance of $T$ ($root_T$) as the supertype of all normal types, the normal types form a single partially ordered Is-a graph. We use the subclass relation as the subtyping relation of object types.
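The following Python sketch illustrates $df$, including the swapped argument order $dr(p_s, p_q)$, on the signatures of Figure 1. It rests on two assumptions that are not stated in this section: the normal type similarity $dt$ is taken to be $d$ applied to the positions of the two types and of their deepest common ancestor in the Is-a graph, and the graph is a hypothetical single chain $root_T$, long, int, short ordered by value range, not the Java graph of Figure 2. Names such as `PARENT`, `chain`, and `SIGNATURES` are illustrative only.

```python
from fractions import Fraction

ROOT = "rootT"
# HYPOTHETICAL Is-a fragment: a single chain ordered by value range
# (short ⊂ int ⊂ long); each type maps to its immediate supertype.
PARENT = {"long": ROOT, "int": "long", "short": "int"}


def chain(t: str) -> list[str]:
    """t followed by its ancestors, ending at ROOT."""
    out = [t]
    while out[-1] != ROOT:
        out.append(PARENT[out[-1]])
    return out


def position(t: str) -> int:
    """1 + number of steps to ROOT (ROOT itself has position 1)."""
    return len(chain(t))


def d(x: int, y: int, z: int) -> Fraction:
    return Fraction(y * (x + y - 2 * z), (y + 1) * (x + y + 2 * y * z))


def dt(tq: str, ts: str) -> Fraction:
    """ASSUMED dt: d on the positions of tq, ts and their deepest common ancestor."""
    ancestors_q = set(chain(tq))
    z = next(a for a in chain(ts) if a in ancestors_q)
    return d(position(tq), position(ts), position(z))


def dr(rq, rs, dx, root) -> Fraction:
    """Instance set similarity: greedy pairs S_f plus root pairs for leftovers."""
    if not rq and not rs:
        return Fraction(0)  # assumption: two empty parameter lists are identical
    candidates = sorted((dx(q, s), i, j) for i, q in enumerate(rq) for j, s in enumerate(rs))
    used_q, used_s = set(), set()
    total = Fraction(0)
    for score, i, j in candidates:
        if i not in used_q and j not in used_s:
            used_q.add(i)
            used_s.add(j)
            total += score
    total += sum((dx(q, root) for i, q in enumerate(rq) if i not in used_q), Fraction(0))
    total += sum((dx(root, s) for j, s in enumerate(rs) if j not in used_s), Fraction(0))
    return total / max(len(rq), len(rs))


def df(fq, fs) -> Fraction:
    """df = (dr(p_s, p_q) + dt(r_q, r_s)) / 2; parameters compared contravariantly."""
    (pq, rq), (ps, rs) = fq, fs
    return (dr(ps, pq, dt, ROOT) + dt(rq, rs)) / 2


# Signatures from Figure 1 as (parameter types, return type).
SIGNATURES = {
    "calc1": (["int"], "int"),
    "calc2": (["int"], "short"),
    "calc3": (["long"], "short"),
    "calc4": (["short"], "long"),
}

for name in ("calc2", "calc3", "calc4"):
    print(name, df(SIGNATURES["calc1"], SIGNATURES[name]))
```

Under these assumptions, df(calc1, ·) evaluates to 2/155, 737/21080, and 317/7800 for calc2, calc3, and calc4, which orders them calc2 < calc3 < calc4, consistent with the structural ordering given for Figure 1. The full $dms$ would additionally weigh in the method-name similarity $dw$.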
Since value-wrapper types hold primitive values (instances of value types), we use subset subtyping over these primitive values as the subtyping relation of value-wrapper types. Figure 2 shows a standard Is-a graph in the Java language. The subtyping relation is described as subtype