
Supporting Web Application Evolution by Dynamic Analysis

Giuseppe Antonio Di Lucca*, Massimiliano Di Penta*, Anna Rita Fasolino°, Porfirio Tramontana°
{dilucca, dipenta}@unisannio.it, {fasolino, ptramont}@unina.it

* RCOST - Research Centre on Software Technology, University of Sannio, Palazzo ex Poste, via Traiano, 82100 Benevento, Italy
° Dipartimento di Informatica e Sistemistica, Università di Napoli Federico II, Via Claudio, 21, 80125 Napoli, Italy

appropriate notations (see, for example, Conallen's UML extension [4]) have been proposed. However, in most cases the only documentation available is the source code of the WA itself. This is because the required fast development often does not permit the production of adequate development documentation, which would reduce the effort of maintenance/evolution operations. In this case, the only possibility is to recover the missing information by reverse engineering the WA [8].

Abstract

The evolution of Web Applications needs to be supported by the availability of proper analysis and design documents. UML use case diagrams are certainly useful to identify features to evolve, as well as to study the Web Application evolution in terms of features added/removed or changed. Unfortunately, very often the only source of documentation available is the Web Application source code. This paper proposes an approach to abstract use case diagrams from execution traces of a Web Application. The approach is mainly based on the analysis of a graph modelling the transitions between the pages navigated along user sessions and on the clustering of the navigated pages. A case study carried out to validate the proposed approach and to show its feasibility is reported in the paper.

In this paper we focus on the reverse engineering of WA use case diagrams. These diagrams can support WA evolution in different ways, such as:
- identifying the features to be evolved, mapping the modification requests to use cases;
- identifying the features impacted by a modification;
- analyzing how the WA evolves in terms of features added, removed or changed: this can be done by analyzing use case diagram snapshots taken at different releases; or
- supporting the regression testing of the evolved WA.

Keywords: web application reverse engineering, dynamic analysis, UML diagram abstraction

1. Introduction

The rapid diffusion of Web Applications (WAs), and the growth of their complexity, have raised the need for supporting their evolution with a disciplined life cycle and proper methodologies. Due to market pressure, Web applications are characterized by fast development and by a high rate of maintenance and evolution operations, needed to continuously adapt the application to new needs. When performing maintenance/evolution activities, it is necessary to have the WA analysis and design documentation available to perform the required intervention effectively and correctly. To this aim,

Many WA reverse engineering tasks, including the one described in this paper, are quite difficult to perform relying only on static analysis of the code. This is already well known for traditional software: the recovery of design patterns [10], scenarios and sequence diagrams [16, 3] are just some examples. WAs tend to be more interactive and dynamic than traditional applications: HTML pages can be dynamically built by server pages; thus, according to the user inputs or requests, the WA user interface may


Proceedings of the 2005 Eighth International Workshop on Principles of Software Evolution (IWPSE’05) 1550-4077/05 $20.00 © 2005 IEEE

adoption of the Conallen notation [4] for WA documentation introduced the need for reverse engineering UML documentation relying on that extension. To this aim, Di Lucca et al. [8] proposed an approach and a tool, named WARE, to recover WA's documentation represented by UML diagrams. In particular, Di Lucca et al. [6, 7] presented an approach to abstract use case diagrams, sequence diagrams and business object models from WAs. The approach proposed relies on static information, which may not suffice for an effective and complete abstraction of UML diagrams, due to the dynamic nature of some WA components.

change at run-time. Moreover, even pieces of code (e.g., client-side scripts) can be dynamically generated. Thus, for highly dynamic WAs, static analysis is likely to give only an imprecise and approximate picture, and only dynamic analysis allows a proper understanding of complex and dynamic application behaviour (such as the client-side logic). Dynamic analysis also makes it possible to track several other kinds of information, such as the session and cookie data, the DBMS tables and queried entities, the frequency of exercising a particular link [17], and the type of link actually exercised (e.g., hyperlinks, submit with GET or POST).

This paper proposes an approach for abstracting UML documentation from information collected dynamically by the execution of instrumented WAs. In particular, the execution traces of user sessions are exploited to recognise the use cases the users executed and to highlight their relationships in use case diagrams. The approach is based on the production of a graph, named Transition Graph, depicting the Web pages the users navigated along a set of user sessions. Some criteria are defined and applied to analyse this graph and deduce use cases.

In the past, approaches for the abstraction of sequence diagrams and interaction scenarios have been proposed for traditional applications. Kollmann et al. [11] compared the (static) reverse-engineering capabilities of the existing commercial UML tools. T. Systä [16] presented a tool for abstracting scenarios from traces obtained from debugging Java bytecode. Similarly, Richner and Ducasse [15] used dynamic information for recovering collaboration diagrams and roles. Tonella and Potrich [18] presented an approach, based on static flow analysis, to extract interaction diagrams. Briand et al. [3] presented an approach for reverse engineering sequence diagrams from execution traces. They also presented a survey of the existing techniques, highlighting their pros and cons. El Ramly et al. [9] presented an approach to recover use cases from execution traces for the purpose of reengineering legacy systems. Use case extraction was performed by detecting patterns over sequences of screens. Similarities can be found here when detecting sequences of pages, even if, as shown in Section 3, peculiarities such as the need for clustering similar generated pages emerge when analyzing WAs.

The paper is organized as follows. After a review of the related work in Section 2, Section 3 describes the proposed approach. Section 4 illustrates the applicability of the approach on a case study. Finally, Section 5 concludes and outlines the directions for future work.

2. Related Work

The lack of a disciplined development process for WAs has introduced the need for suitable reverse engineering approaches. Antoniol et al. [1] proposed an approach, based on the Relationship Management Methodology (RMM), to recover web site architectures. Ricca and Tonella developed the ReWeb tool to analyze web sites [12, 13, 14]. In particular, they extended to WAs traditional static flow analyses such as reachability, dominance, and data flow analysis. Ricca and Tonella also proposed to enhance the analyses by considering dynamic information [17]. We agree with their statement: the abstraction of use case diagrams relies, in the present paper, on dynamic information extracted from WA execution. However, while ReWeb obtains dynamic information from web server logs, we obtained dynamic information by instrumenting the WAs, in order to capture some other data that are not available from server logs, such as data stored into or read from a database. The first approach does not require page instrumentation; however, its fact extraction capability is limited. To obtain information such as variables passed between pages, or database and file accesses, instrumentation is necessary. The

Antoniol et al. [2] proposed a tool, named WANDA, for WA dynamic analysis. The tool enables a fine-grained level dynamic analysis of WAs under execution. This permits the extraction of extended UML diagrams, using stereotypes and tagged values, with information such as the frequency of traversing a link, the percentage of read or write operations performed on a file, the type of operations performed on databases, the interaction with Web Services. Also, by analyzing session variables, cookies and variables passed by the GET or POST method, the tool permits the identification of the data flow between pages. The information extracted by WANDA relies on a metamodel of a WA, and is stored into a database designed according to such metamodel. A similar metamodel is used in the present paper as a baseline to extract UML documentation from WAs.


3. The Abstraction Process

Abstracting the behaviour model of a WA based on static analysis of its source code may produce incomplete and imprecise results, due to the technologies currently adopted to implement WAs (such as JSP, PHP, and so on). Dynamic analysis represents a viable solution to overcome the limitations of static analysis and to support an effective abstraction of diagrams describing the WA behaviour. In this section, a reverse engineering approach based on dynamic analysis is presented to abstract use case diagrams of an existing WA. The approach exploits the information recorded during the executions of an instrumented WA.

To this aim, a WA is modelled as a set of web pages that a user can access along a working session. An accessed page may be a Server Page or a Client Page, by which the user interacts with the WA. A Client Page may be a Static Client Page (its content is fixed and permanently stored) or a dynamically Built Client Page (its content may vary over time and is generated on the fly as the output of an execution of a server page). A Transition connects two sequentially visited pages: it is produced by navigating a link from a Starting Page to a Target Page. A Transition is due to different types of relationships between pages (Hyperlink, Form Submission, Build, Redirection, Inclusion). A Transition may be characterised by a set of Parameters passed from the Starting Page to the Target one. It is worthwhile to note that we consider just the transitions corresponding to links between two pages actually implemented in the WA, i.e. we do not consider the transitions a user makes by using the forward/back browser buttons or by directly typing the URL of a page in the browser address bar. However, this does not affect the effectiveness of the proposed analysis, because when a user goes back or forward to a page already accessed, the WA shows the same behaviour as before.

Thus, each WA execution is modelled by a User Session Trace, i.e. a sequence of Web Pages accessed by a user along her/his working session. All the User Session Traces may be collected in an Execution Trace, representing all the executions of the WA. The class diagram in Figure 1 models this view of a WA; this model is based on the one proposed in [8]. The complete set of the execution traces can be represented by a directed graph, named Transition Graph, where each node represents a web page and a directed edge between two pages represents a transition, made by a user along a session, from the starting page to the target one.
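The model described above can be pictured with a few plain data types. The following is only a minimal sketch, assuming Python dataclasses; the class and field names (`WebPage`, `Transition`, `UserSessionTrace`, `pages`) are hypothetical and are not taken from WANDA or from the metamodel of [8]. The page identifiers CP13 and SP1 come from Table 1, while the transition between them and the `user` parameter are invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class PageKind(Enum):
    SERVER = "Server Page"
    STATIC_CLIENT = "Static Client Page"
    BUILT_CLIENT = "Built Client Page"


class TransitionKind(Enum):
    HYPERLINK = "Hyperlink"
    FORM_SUBMISSION = "Form Submission"
    BUILD = "Build"
    REDIRECTION = "Redirection"
    INCLUSION = "Inclusion"


@dataclass(frozen=True)
class WebPage:
    page_id: str
    kind: PageKind


@dataclass(frozen=True)
class Transition:
    start: WebPage                      # Starting Page
    target: WebPage                     # Target Page
    kind: TransitionKind
    parameters: Tuple[str, ...] = ()    # names of the passed Parameters


@dataclass
class UserSessionTrace:
    transitions: List[Transition] = field(default_factory=list)

    def pages(self) -> List[str]:
        """Return the identifiers of the pages in the order they were visited."""
        if not self.transitions:
            return []
        first = self.transitions[0].start.page_id
        return [first] + [t.target.page_id for t in self.transitions]


# Hypothetical session: the login form is submitted to the login-check page.
login = WebPage("CP13", PageKind.STATIC_CLIENT)
accept = WebPage("SP1", PageKind.SERVER)
trace = UserSessionTrace(
    [Transition(login, accept, TransitionKind.FORM_SUBMISSION, ("user",))]
)
visited = trace.pages()
```

An Execution Trace would then simply be a collection of `UserSessionTrace` objects.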

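[Class diagram. Classes: Web Page, specialised into Client Page and Server Page; Client Page, specialised into Static Client Page and Built Client Page; Transition, linked to Web Page through the roles +starting page and +target page, carrying Parameters, and specialised into Hyperlink, Form Submission, Build, Redirection and Inclusion.]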

Figure 1: The WA model

Appropriate criteria have been defined and applied to the Transition Graph for identifying subsets of Web Pages potentially implementing different Use Cases of the system and for suggesting possible relationships between use cases. The abstraction process proposed in this paper includes the following steps:
1) WA Instrumentation
2) Execution of the Instrumented WA
3) Identification and Grouping of Equivalent Built Client Pages
4) Generation of the Transition Graph
5) Use Case Diagram abstraction
   a. Clustering of the Transition Graph
   b. Abstracting Use Cases and their relationships
Figure 2 depicts the proposed process, whose steps will be described in the remainder of this section.

3.1 Web Application Instrumentation

The WA is instrumented by using the tool WANDA [2], which automatically instruments the code of a WA by inserting probes able to capture relevant information, such as which pages a user accessed, which transitions he/she activated, which parameters were involved in the activated transitions, and which databases or files were accessed. This information is stored in a repository. Moreover, WANDA also stores in the repository the HTML source code of each Built Client Page generated by the Server Pages of the application as a response to a user request.


[Process diagram. Activities: WA Instrumentation; WA Execution; Cloned Built Pages Identifier; Transition Graph Generation and Analysis (Pruning Backward Transitions, Clustering); Use Case Diagram Abstraction (Use Case Identification, Relationship Identification); Validation. Artefacts: WA, Instrumented WA, Trace Repository, Built Client Pages, User Session Traces, Classes of equivalent Built Pages, Clustered Transition Graph, Use Case Diagram, Validated Diagrams. Roles: User, Human Expert.]

Figure 2: The Abstraction Process

control and data component. Pages with the same control component, but different data component, can be considered as equivalent pages, belonging to the same equivalence class. Of course, all the pages included in the same class exhibit the same behaviour; thus the comprehension effort is reduced, because just one page for each class needs to be analysed rather than all of them. The set of BCPs generated from a server page is analysed to identify groups of equivalent pages. An equivalence class is defined for each group, and a single equivalent page is used to represent each group of equivalent Built Client Pages of a given Server Page. The identification of clusters of equivalent BCPs is obtained by exploiting the clone detection techniques proposed in [5]. These techniques identify as clones groups of similar pages according to a Levenshtein distance over structural information.
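The grouping idea can be illustrated with a small sketch. It is not the clone detector of [5]: it assumes that the "structural information" of a page is simply its sequence of opening HTML tags, that two pages are equivalent when the Levenshtein distance between their tag sequences does not exceed a threshold, and that classes are built greedily against one representative per class. The function names and the `threshold` parameter are hypothetical.

```python
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Collects the names of the opening tags found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    collector = _TagCollector()
    collector.feed(html)
    return collector.tags


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def group_equivalent_pages(pages, threshold=0):
    """Greedily group pages whose tag sequences are within `threshold` edits."""
    classes = []  # list of (representative tag sequence, member page ids)
    for page_id, html in pages.items():
        tags = tag_sequence(html)
        for rep_tags, members in classes:
            if levenshtein(tags, rep_tags) <= threshold:
                members.append(page_id)
                break
        else:
            classes.append((tags, [page_id]))
    return [members for _, members in classes]


# Hypothetical Built Client Pages: p1 and p2 differ only in their data.
bcps = {
    "p1": "<html><body><p>Hello Ann</p></body></html>",
    "p2": "<html><body><p>Hello Bob</p></body></html>",
    "p3": "<html><body><form><input></form></body></html>",
}
equivalence_classes = group_equivalent_pages(bcps)
```

With a threshold of 0, pages sharing exactly the same tag structure fall into one class; a larger threshold tolerates small structural differences.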

It is not possible to know a priori the complete set of client pages that a server page will be able to build at runtime. However, the observed Built Client Pages can be classified into different groups of equivalent pages. A method for grouping these pages will be defined in Section 3.3. Therefore, each server page will build client pages belonging to a finite set of Built Client Page Equivalence Classes.

3.2 Execution of the Instrumented Web Application

To collect execution traces useful for an effective extraction of use case models, the instrumented WA needs to be executed in a real usage environment. This allows the collection of information about the interaction of users with the WA, by storing into a repository the information 'captured' by the probes.

3.3 Identification and Grouping of Equivalent Built Client Pages

3.4 Generation of the Transition Graph

Once a significant set of user session traces has been obtained by executing the instrumented WA, and after clone detection has identified the equivalent BCPs in these traces, the next step of the abstraction process produces a graph representing all the web pages reached during the navigations and all the transitions from a page to the next one. Such a graph is called Transition Graph, TG(N, E), where N is a subset of the WA pages, and E is the set of edges associated with the transitions between consecutive pages.

In general, the layout of Built Client Pages (BCPs) resulting from different executions of a server page will differ, depending on the input data provided by the users. In particular, Built Client Pages will differ either in their control component (i.e., the set of items - such as the HTML code and scripts - determining the page layout, business rule processing, and event management) or in their data component (i.e., the set of items - such as text, images, multimedia objects - determining the information to be read/displayed from/to a user), or in both the


The TG can be obtained by analysing the available execution traces, and collecting all pages and transitions into a graph. The TG is generated by the three-step process described in the following sub-sections.

3.4.2 Unification of the Trace Graphs

The Transition Graph is produced by merging the Trace Graphs. Supposing that m Trace Graphs have been generated, where Ti(Ni, Ei) is the generic i-th graph and BT is the set of identified Backward Transitions, the Transition Graph TG(N, E) will be defined as follows:

3.4.1 Analysis of the User Session Traces for Identifying and Pruning ‘Backward Transitions’

Each user session trace can be represented by a directed graph, called Trace Graph, Ti(Ni, Ei), where Ni is the set of pages included in the trace and Ei is the set of edges representing the transitions between pairs of consecutively navigated pages. This graph may contain cycles, since a trace may include ‘Backward Transitions’, i.e. transitions representing the user navigation from a page to another one that she/he had already accessed during the navigation¹. These transitions are not meaningful for our purposes, since they do not indicate the activation of any new WA behaviour. Therefore, for each Trace Graph, edges associated with backward transitions will be detected and pruned.

A possible process for detecting Backward Transitions in the traces requires that the Trace Graph edges are analysed in order, and the corresponding nodes are stored in a list. If an edge reaches a node already included in the list, then this edge is classified as a Backward Transition and removed from the trace, which is thereby divided into two separate sub-sequences. The remaining part of the trace is then analysed independently of the previous part.

As an example, let us consider the following trace (each letter represents a page, each arrow represents a transition):

a→b→c→d→a→c→g→c

Figure 3 (a) shows the corresponding Trace Graph. Analysing this trace, the transition d→a is identified as a backward transition because it reaches the already visited page a. After this identification, the transition d→a is removed and the trace is split into the two separate sub-sequences a→b→c→d and a→c→g→c. The remaining sub-sequence (a→c→g→c) is analysed independently of a→b→c→d, and therefore only the transition g→c is identified as a backward transition. Figure 3 (b) shows the final sub-sequences pruned of the backward transitions.
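The detection process described above can be sketched in a few lines. This is only an illustrative reading of the text, assuming that a trace is given as an ordered list of page identifiers and that the list of visited pages is restarted after each detected Backward Transition; the function name `prune_backward_transitions` is hypothetical.

```python
def prune_backward_transitions(trace):
    """Split a trace into sub-sequences free of backward transitions.

    Returns (sub_sequences, backward_transitions), where each backward
    transition is a (starting page, target page) pair.
    """
    if not trace:
        return [], []
    sub_sequences, backward = [], []
    current = [trace[0]]            # pages visited in the current sub-sequence
    for page in trace[1:]:
        if page in current:
            # The edge reaches an already visited page: classify it as a
            # backward transition, cut the trace, and restart from the target.
            backward.append((current[-1], page))
            sub_sequences.append(current)
            current = [page]
        else:
            current.append(page)
    sub_sequences.append(current)
    return sub_sequences, backward


# The example trace of this section: a→b→c→d→a→c→g→c
subs, bts = prune_backward_transitions(list("abcdacgc"))
# subs == [['a', 'b', 'c', 'd'], ['a', 'c', 'g'], ['c']]
# bts  == [('d', 'a'), ('g', 'c')]
```

The trailing single-page sub-sequence `['c']` carries no edge; it only records that the trace ended on page c after the last pruned transition.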

$N = N_1 \cup N_2 \cup \dots \cup N_m$

$E = (E_1 \cup E_2 \cup \dots \cup E_m) \setminus BT$

Figure 3 (c) shows the Transition Graph resulting from the unification of the Trace Graph sub-sequences in Figure 3 (b).

Figure 3: Examples of graphs: a) Trace Graph - b) Pruned Trace Graph - c) Transition Graph
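The unification of the Trace Graphs (Section 3.4.2) amounts to a union of node sets and edge sets, followed by the removal of the Backward Transitions. The sketch below assumes that each Trace Graph is given as a pair (nodes, edges) with edges as (starting page, target page) tuples; the function name `unify_trace_graphs` is hypothetical.

```python
def unify_trace_graphs(trace_graphs, backward_transitions):
    """Merge Trace Graphs Ti(Ni, Ei) into a Transition Graph TG(N, E).

    N is the union of all Ni; E is the union of all Ei minus the set BT
    of backward transitions.
    """
    nodes, edges = set(), set()
    for n_i, e_i in trace_graphs:
        nodes |= set(n_i)
        edges |= set(e_i)
    return nodes, edges - set(backward_transitions)


# Trace Graph of the example trace a→b→c→d→a→c→g→c, with BT = {d→a, g→c}.
t1 = (
    {"a", "b", "c", "d", "g"},
    {("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"),
     ("a", "c"), ("c", "g"), ("g", "c")},
)
bt = {("d", "a"), ("g", "c")}
tg_nodes, tg_edges = unify_trace_graphs([t1], bt)
```

For the example trace, the resulting edge set contains the five forward transitions a→b, b→c, c→d, a→c and c→g.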

3.5 Use Case Diagram abstraction

The final step of the abstraction process consists of the identification of use cases and their relationships. This step is carried out on the basis of the following assumptions. Usually, in a WA a use case is implemented by a set of pages that interact through the links existing among them. In an execution trace, the execution of a use case will correspond to a sequence of linked pages. In the Transition Graph (TG) such a sequence will correspond to a TG sub-path made up of nodes, each of which is characterised by just one entering edge and just one leaving edge. Moreover, web pages associated with TG nodes having more than one leaving edge usually correspond either to client pages allowing a user to choose among several actions/functions, or to server

¹ These backward transitions can be associated with the occurrence of backward connections between web pages implementing shortcuts among pages, such as those due to anchors towards the home page, menu pages, or pages navigated previously.


generated, with clusters including a growing number of nodes. Depending on the desired granularity level of the clusters included in the CTG, the software engineer carrying out the analysis will decide when the clustering process will have to be stopped. Therefore, the final CTG will be submitted to the next step of the abstraction process.
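The replacement of a cluster by a single node, which produces each new Clustered Transition Graph, can be sketched as follows. This is an illustrative reading of the rule stated in Section 3.5.1, assuming that the graph is a set of (starting page, target page) edges; the function name `collapse_cluster` and the cluster identifier are hypothetical.

```python
def collapse_cluster(edges, cluster, cluster_id):
    """Replace the nodes of `cluster` with the single node `cluster_id`.

    The new node inherits every edge connecting a cluster node with a node
    outside the cluster; edges internal to the cluster are dropped.
    """
    cluster = set(cluster)
    new_edges = set()
    for src, dst in edges:
        s = cluster_id if src in cluster else src
        d = cluster_id if dst in cluster else dst
        if s == cluster_id and d == cluster_id:
            continue  # both ends are inside the cluster
        new_edges.add((s, d))
    return new_edges


# Hypothetical TG fragment: b and c are grouped into the cluster C1.
tg = {("a", "b"), ("b", "c"), ("c", "d"), ("x", "c")}
ctg = collapse_cluster(tg, {"b", "c"}, "C1")
```

Re-running the node classification and the clustering rules on the returned edge set gives the next level of the CTG hierarchy.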

pages activating different actions/functions according to a user input. Finally, TG nodes with more than one entering edge may be associated to pages (client or server) implementing a common behaviour, included by all the pages belonging to TG sub-paths reaching those nodes. On the basis of these considerations, the TG is analysed in order to identify groups of linked nodes composing notable sub-graphs, and group these nodes into clusters. After the clustering activity, a number of heuristics are used to define the WA use cases and possible relationships between them.

3.5.2 Abstracting Use Cases and their relationships

The use cases of the WA will be deduced from the CTG, according to the rule that associates each cluster of the CTG to a use case. Moreover, analysing the composition of the clusters, the existence of alternative use case scenarios, or possible ‘extend’ or ‘include’ relationships between use cases will be suggested. In particular, the following rules will be applied for proposing the existence of relationships between clusters:
- any CTG cluster obtained by rule b) is a candidate to implement a use case that is likely to be extended by other use cases, associated with clusters whose nodes are reached from the Fork node;
- any CTG cluster obtained by rule c) is a candidate to implement a use case that is included in other use cases: the including use cases are those associated with clusters whose nodes reach the Join node;
- any CTG cluster obtained by heuristic rule e) is a candidate to implement a use case showing more than one interaction scenario.

3.5.1 Clustering of the Transition Graph

Clustering of the TG requires that all TG nodes be classified (and labelled) according to the number of edges entering and leaving them:
- Groupable (G) nodes: each node with just one entering edge and just one leaving edge.
- Fork (F) nodes: each node with just one entering edge and more than one leaving edge.
- Join (J) nodes: each node with more than one entering edge and just one leaving edge.
- Join/Fork (N) nodes: each node with more than one entering edge and more than one leaving edge.

The following heuristics are used to carry out the TG clustering:
a) two or more consecutively linked G nodes (i.e. a sequence of two or more G nodes) will be clustered together;
b) an F node will be clustered with the G node (or sequence of G nodes) reaching it;
c) a J node will be clustered with the G node (or sequence of G nodes) it reaches;
d) a J node will be clustered with an F node it reaches; and
e) an F node will be clustered with the G nodes it reaches if all the edges leaving the F node reach G nodes.
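The node classification can be sketched directly from the in-degree and out-degree of each node. The sketch assumes that the TG is given as a collection of (starting page, target page) edges. Nodes with no entering edge or no leaving edge are not covered by the four categories in the text; labelling them `None` is an assumption of this sketch, and the function name `classify_nodes` is hypothetical.

```python
from collections import defaultdict


def classify_nodes(edges):
    """Label each TG node as G, F, J or N from its entering/leaving edges."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    nodes = set()
    for src, dst in set(edges):        # duplicate edges count once
        outdeg[src] += 1
        indeg[dst] += 1
        nodes.update((src, dst))

    labels = {}
    for node in nodes:
        i, o = indeg[node], outdeg[node]
        if i == 1 and o == 1:
            labels[node] = "G"         # Groupable
        elif i == 1 and o > 1:
            labels[node] = "F"         # Fork
        elif i > 1 and o == 1:
            labels[node] = "J"         # Join
        elif i > 1 and o > 1:
            labels[node] = "N"         # Join/Fork
        else:
            labels[node] = None        # entry or exit node (assumption)
    return labels


# Hypothetical TG: s→a, a forks to b and c, both join in d, d→e.
labels = classify_nodes(
    [("s", "a"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
)
```

In this example a is a Fork node, b and c are Groupable, and d is a Join node, so heuristic e) would cluster a with b and c.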

The proposed use cases and their relationships will have to be validated by analysing the semantics of each web page included in the involved clusters.

3.5.3 Associating Actors to Use Cases

Actors will be associated with each use case corresponding to a cluster that includes at least one client page. However, to keep the resulting use case diagram readable, only the actors associated with base use cases are drawn in the diagram. The type of each actor has to be defined by the software engineer.

Once these clustering rules have been applied to the TG, each group of clustered nodes will be replaced by a single node representing that cluster. This new node will inherit the edges that reach or leave the cluster nodes and that leave from (or arrive at) at least one node not included in the cluster. Consequently, the set of Transition Graph nodes and edges will change and a new graph, called Clustered Transition Graph (CTG), will be obtained. This new graph needs to be analysed in order to detect whether there are new groups of nodes that can be clustered together and collapsed into new single nodes. The rules can be applied iteratively on the Clustered Transition Graph as long as they are able to group nodes on this graph. In this way, a hierarchy of CTGs can be

4. Case Study

To validate the proposed approach, a case study aiming at assessing its effectiveness has been carried out on some small/medium sized WAs. In the following, the results obtained by applying the approach on a small WA will be reported. The Web Application under analysis allows users to make predictions about some sport events (such as football matches); the Player who made the greatest


number of right predictions wins the game. User registration is required to participate in the game. An Administrator inputs the results of the considered sporting events; according to the inputted results and the predictions made, each player is assigned a score, and a ranking of the players is computed. The WA consists of 11 server pages and 2 static client pages; a database is used to record predictions and results. It is implemented using the ASP and JavaScript scripting languages.

The Web Application was instrumented by the WANDA tool, and 1587 user session traces of the instrumented version of the Web Application were recorded and stored in the repository. Each of the 11 server pages generated several Built Client Pages (BCPs), which were stored in the repository, too. These BCPs were analysed using clone detection techniques and, for each server page, the groups of equivalent BCPs were collected in equivalence classes. In total, 20 equivalence classes were identified. Table 1 reports the identifiers and the filenames of the pages of the application and, in the third column, the identifiers of the equivalence classes of BCPs each server page generates. Conventionally, Server Pages are labelled as SPxx, Client Pages as CPxx and Built Client Pages as BCPxx.

The TG nodes were classified as:
- 20 groupable nodes (G);
- 11 fork nodes (F);
- 2 join nodes (J).

Figure 4 reports the Transition Graph. In this figure, F nodes are depicted as diamonds, J nodes as ellipses, N nodes as circles and G nodes as boxes. Boxes are drawn around the nodes to show the clusters generated at each step of the iterative application of the clustering rules; each box has a tag Cx, where x is a number indicating the step at which the cluster was generated. Thus, the inner boxes show the clusters generated in the first steps of the clustering process. The final Clustered Transition Graph presented 8 singleton clusters (i.e. clusters made up of just one page) and 7 clusters with more than one node. Each cluster in the Clustered Transition Graph was associated with a candidate use case. The use cases were submitted to a validation process carried out by a software engineer who had no knowledge of the application. He was able to assign a concept to each of the candidate use cases, i.e. all the clusters made up valid use cases. He also defined two types of Actors: the Player and the Administrator.

The user session traces were analysed and nine Backward Transitions were identified. The analysis of the WA executions confirmed that they were actual Backward Transitions. The generated Transition Graph included 33 pages and 35 transitions among them.

Table 2 reports the list of clusters (the cluster IDs correspond to the ones reported in Figure 4 near the larger box delimiting the clusters, or the page identifier for singleton clusters) and the concepts assigned to each corresponding use case.

Table 1: WA pages

Page ID   Filename        Equivalence Classes of Built Client Pages
CP13      /login.htm
CP21      /nuovo.htm
SP9       /class.asp
SP11      /insscomm.asp   BCP12.1, BCP12.2, BCP12.3
SP4       /menu.asp       BCP15.1, BCP15.2
SP3       /menuadm.asp    BCP18.1, BCP18.2
SP1       /accept.asp     BCP2.1, BCP2.2, BCP2.3
SP19      /nuovo.asp      BCP20.1, BCP20.2
SP17      /risult.asp     BCP22
SP16      /scomm.asp      BCP23.1, BCP23.2
SP5       /adminsr.asp    BCP6.1, BCP6.2
SP7       /admris.asp     BCP8.1, BCP8.2
SP14      /logout.asp     BCP10

Table 2: Abstracted Use Cases

Cluster ID   Use case Description
C4           View Ranking
C5           Registration
C8           Insert Result
C9           Input the Predictions
C10          View Results
C11          Validate Player
C12          Validate Admin
SP1          Check Login
SP14         Logout
BCP2.3       Access Denied
BCP15.1      Player Menu
BCP15.2      Access Denied
BCP18.1      Admin Menu
BCP18.2      Access Denied
CP13         Home Page



The relationships among the validated use cases were defined according to the guidelines described in Section 3.5. In two cases, the relationships proposed by the heuristics were rejected; in both cases an include relationship was proposed, while an extend one was considered more suitable by the software engineer. In another case, both include and extend relationships were proposed, and the extend one was chosen in this case as well. Figure 5 shows the resulting Use Case diagram. It can be observed that only extend relationships exist among the use cases. This is because the fork nodes correspond to pages whose behaviour is conditioned by the selections the users may make in the client pages. To preserve the readability of the diagram, the two actors Player and Administrator are not reported in Figure 5. Figure 5 also highlights that there is more than one use case named 'Access Denied'. These use cases correspond to the generation of a BCP when a user is not recognised as a registered one by the different server pages performing this check. In this case, a reengineering intervention could be suggested to encapsulate all the user checking operations in just one page.

In future work, a wider experimentation involving larger WAs will be carried out with the aim of assessing the scalability of the approach. It will also be interesting to investigate how developers benefit from these diagrams when maintaining/evolving their WAs.

References

[1] G. Antoniol, G. Canfora, G. Casazza, and A. De Lucia, “Web site reengineering using RMM”, in Proceedings of International Workshop on Web Site Evolution, Zurich, Switzerland, March 2000, pp. 9-16
[2] G. Antoniol, M. Di Penta and M. Zazzara, “Understanding Web Applications through Dynamic Analysis”, in Proceedings of the 12th International Workshop on Program Comprehension, 24-26 June 2004, Bari, Italy, pp. 120-129
[3] L. Briand, Y. Labiche, and Y. Miao, “Towards the reverse engineering of UML sequence diagrams”, in Proceedings of 10th IEEE Working Conference on Reverse Engineering, WCRE 2003, 13-16 November 2003, Victoria, British Columbia, Canada, pp. 57-66
[4] J. Conallen, Building Web Applications with UML (2nd Edition). Addison-Wesley Publishing Company, 2002.

5. Conclusions

In this paper dynamic analysis has been proposed for abstracting UML Use Case Diagrams from WAs. These diagrams, together with other extracted documentation, constitute an important support for evolving WAs. Indeed, knowing which WA pages implement each use case makes WA maintenance and evolution easier. The approach first models the recorded WA executions by a graph called Transition Graph; then this graph is analysed and clusters of nodes are defined. Each cluster is associated with a candidate use case. The use cases are arranged in a use case diagram where the relationships among the use cases are defined according to some heuristics. Actors are defined according to the semantics of each use case and associated with base use cases.

Results obtained abstracting use case diagrams from medium size WAs showed that the dynamic information collected via WA instrumentation makes it possible to precisely identify the set of WA use cases. As expected, the analysis of a higher number of user session traces improved the meaningfulness of the abstracted diagrams. The proposed approach significantly reduces the comprehension effort needed to evolve a web application. Indeed, it provides automated support for identifying the groups of web pages responsible for the use cases the application implements, which eases the identification of the pages impacted by the changes an evolutionary operation requires.

[5] G. A. Di Lucca, M. Di Penta, and A. R. Fasolino, “An approach to identify duplicated web pages”, in Proceedings of the 26th Annual International Computer Software and Applications Conference, COMPSAC 2002, 26-29 August 2002, Oxford, England, UK, pp. 481-486.
[6] G. A. Di Lucca, A. Fasolino, P. Tramontana, and U. De Carlini, “Abstracting business level UML diagrams from web applications”, in Proceedings of 5th IEEE International Workshop on Web Site Evolution, WSE 2003, 22 September 2003, Amsterdam, The Netherlands, pp. 12-19.
[7] G. A. Di Lucca, A. Fasolino, P. Tramontana, and U. De Carlini, “Recovering a business object model from web applications”, in Proceedings of 26th Annual Conference on Computer and Software Applications, COMPSAC 2003, 3-6 November 2003, Dallas, Texas, USA, pp. 348-353.
[8] G. A. Di Lucca, A. R. Fasolino, and P. Tramontana, “Reverse Engineering Web Application: the WARE approach”, Journal of Software Maintenance and Evolution: Research and Practice (Wiley), Volume 16, Issue 1-2, 2004, pp. 71-101.
[9] M. El-Ramly, E. Stroulia, and P. Sorenson, “Mining System-User Interaction Traces for Use Case Models”, in Proceedings of the 10th IEEE International Workshop on Program Comprehension, IWPC 2002, 26-29 June 2002, Paris, France, pp. 21-29.


Figure 4: The Clustered Transition Graph of the WA from the case study

Figure 5: The Use Case Diagram of the WA from the case study


[10] D. Heuzeroth, T. Holl, G. Högström, and W. Löwe, “Automatic design pattern detection”, in Proceedings of the 11th IEEE International Workshop on Program Comprehension, IWPC 2003, 10-11 May 2003, Portland, Oregon, USA, pp. 94-103.

[15] T. Richner and S. Ducasse, “Using dynamic information for the iterative recovery of collaborations and roles”, in Proceedings of IEEE International Conference on Software Maintenance, ICSM 2002, 3-6 October 2002, Montréal, Canada, pp. 34-43.

[11] R. Kollmann, P. Selonen, E. Stroulia, T. Systä, and A. Zundorf, “A Study on the Current State of the Art in Tool-Supported UML-Based Static Reverse Engineering”, in Proceedings of the 9th Working Conference on Reverse Engineering, WCRE 2002, 29 October - 1 November 2002, Richmond, Virginia, USA, pp. 22-32.

[16] T. Systä, “On the relationships between static and dynamic models in reverse engineering Java software”, in Proceedings of 6th IEEE Working Conference on Reverse Engineering, WCRE 1999, 6-8 October 1999, Atlanta, Georgia, USA, pp. 304-313.
[17] P. Tonella and F. Ricca, “Dynamic model extraction and statistical analysis of web applications”, in Proceedings of 4th IEEE International Workshop on Web Site Evolution, WSE 2002, 2 October 2002, Montréal, Canada, pp. 43-52.

[12] F. Ricca and P. Tonella, “Web site analysis: Structure and evolution”, in Proceedings of IEEE International Conference on Software Maintenance, ICSM 2000, 11-14 October 2000, San Jose, California, USA, pp. 76-85.

[18] P. Tonella and A. Potrich, “Reverse Engineering of the Interaction Diagrams from C++ Code”, in Proceedings of the International Conference on Software Maintenance, ICSM 2003, 22-26 September 2003, Amsterdam, The Netherlands, pp. 159-168.

[13] F. Ricca and P. Tonella, “Understanding and restructuring web sites with ReWeb”, IEEE Multimedia, vol. 8, pp. 40-51, Apr-Jun 2001.
[14] F. Ricca and P. Tonella, “Analysis and testing of web applications”, in Proceedings of the International Conference on Software Engineering, ICSE 2001, 12-19 May 2001, Toronto, Ontario, Canada, pp. 25-34.
