Extracting Business Rules from Existing Enterprise

Extracting Business Rules from Existing Enterprise Software System Kestutis Normantas, Olegas Vasilecas Information Systems Research Laboratory, Vilnius Gediminas Technical University, Sauletekio al. 11, LT-10223, Vilnius, Lithuania {kestutis.normantas, olegas.vasilecas}@isl.vgtu.lt

Abstract. As software systems evolve, it becomes increasingly complex for maintainers to keep them aligned with rapidly changing business requirements. Therefore the cost of software maintenance often exceeds the cost of its initial development or adaptation. As a result, automated approaches for software comprehension emerge providing valuable improvements and cost-savings for the software maintenance. This paper presents an approach that facilitates software comprehension by enabling traceability of implementation of business rules and business scenarios in the software system. It also describes a case study on application of this approach for comprehension of business logic implemented in the enterprise content management system and reports obtained results. Keywords: Business Knowledge Extraction, Business Rules Discovery, Knowledge Discovery Meta-Model, Architecture-Driven Modernization, Model-Driven Reverse Engineering

1

Introduction

Modern enterprises are characterized by a large amount of integrated software systems, designed using heterogeneous technologies to fulfill flexible business processes. Over time, software systems evolve and it becomes increasingly complex for maintainers to correct faults, improve performances or other attributes, or adapt to a changed environment [1]. Canfora and Cimitile [19] reveal that the cost of maintenance consumes 60% to 80% of the total life cycle costs while Seacord et al. [14] observe that the relative cost for maintaining and evolving the software has been steadily increasing and reached more than 90% of the total cost. At the same time, the Standish Group study [15], [22] shows that only one-third of code in applications contained business logic, while the remaining code was intended to support infrastructure and design activities. It follows that by identifying business logic implementing code parts, the costs of maintenance could be lowered significantly. Recently, the Object Management Group (OMG) within the Architecture-Driven Modernization (ADM) Task Force initiative [8] provides a number of standards [10] for representation and analysis of existing software systems in order to support mod-

ernization activities. The Knowledge Discovery Meta-model (KDM) [9] plays the fundamental role as it defines representation of all aspects of the software system and enables interoperability for tools that captures and analyses information about the existing system. A number of modernization projects [17] report significant cost savings by applying architecture-driven approaches in the modernization of information systems. However, not many works have been done in order to facilitate business logic extraction from the KDM representation of the software system. Therefore, in our research we aim to develop a method for business specific knowledge segregation from the knowledge about the existing software system represented within the KDM. In this paper we address related issues and present our approach for the discovery of business logic from existing software systems. We apply this approach in a case study of enterprise content management system used in several governmental organizations of Lithuania. The cost of the system maintenance has already exceeded its budged; therefore, automation of system comprehension is essential to reduce the maintenance costs. The rest of this paper is organized of follows. In the next section we give an overview to related work in the field of BR extraction from the source code. After, we explain our approach by presenting the process of BR discovery. Then, we present a case study from our on-going project and discuss on the obtained results.

2

Related Work

Numerous methods and techniques for business rules discovery from existing software systems have been contributed in the field of reverse engineering. Though they differ by their objectives, application domains and scopes, it can be seen that most of them are concerned with certain kind of program slicing1 to identify parts of code that implement business rules. Chiang [3] presents a static program slicing based approach, which for any data output determines functions that create or change it, locations of those functions, and conditions that affect the execution of functions. For this reason, a control flow graph (CFG), representing program code statements and their control dependence, is sliced in accordance with criterion C=, where p – is a program point (statement) of interest, and W is a set of output data variables. A slice is a set of executable statements that contribute to the values of W just before statement p is executed. The approach considers selection of program point p to be the last statement of program slice contributing to the computation of a business rule. It follows that a number of slices become extremely large in case of large software system. For this reason, the program code is separated into the three categories: user interface; business logic; and data access. The starting point for slicing user interface is considered as a statement reading or displaying the data. Data I/O statements such as read, write, rewrite, open, or close are considered as candidates for the starting point of slicing data access layer.

1

The term program slicing has been introduced by Mark Weiser [20], as a method for program comprehension using data flow analysis [6].

During the analysis of business logic category, code parts that affect data variables are separated and presented for software developers for validation. Another program slicing based approach for business rules extraction from legacy code is proposed by Huang et al. [5]. They define a number of heuristic rules for domain variables identification, slicing criteria identification, and slicing algorithm selection. Domain variables are identified considering every input and output variables of the system, arguments and return parameters of procedures. Slicing criteria involves input and output statements of the program, dispatch centers and return statements of procedures. A slicing algorithm is selected according formulated slicing criteria: for input variables and dispatch centers forward slicing is used; for output variables backward slicing is used to extract the relevant computation logic. Extracted business rules are represented in three ways: code-view; formula-view (three parts formulae – left hand side for a variable, right hand side for an expression that modifies variable, and conditions under which modifications may be executed); and inputoutput dependence view (bidirectional data flows between input and output parameters). An extension to Huang et al. [5] solution is proposed by Wang et al. [18]. The approach proposed by them consists of five steps as follows. First, the program is sliced into multiple slices. Then, two types of domain variables are identified: pure domain variables that represent system’s input and output; and derived domain variables that depend on pure domain variables. Dependences are established using forward slicing algorithm proposed by Bergeretti and Carre [2]. Having extracted domain variables and their dependences, the next step, called data analysis, identifies business items that are actually implemented in the selected slice. According to the obtained information, a set of business rules is extracted and represented using multiple views in order to be validated with stakeholders. An improvement to Wang et al. work is proposed by Gang [4]. The approach computes backward slices from a program dependence graph (PDG). Resulting slices are presented for validation with stakeholders as code fragments. However, code views requires deep understanding of technological aspects of the software system, therefore they are difficult to be validated with stakeholders. Putrycz and Kark [13] emphasize the fact that business analysts require more than just code snippets referring the business rules. For this reason, they propose an approach that use document (in HTML format) content extraction and key phrase analysis to link the source code implementing business rules with technical and other related documentation. To separate business processing logic from infrastructure related, they focus on single program statements that carry a business meaning, such as calculation and branching, since they most often represents high level processing. Resulting production rules in the form of are represented using business vocabulary and business rules (SBVR). From this we can see that many important issues must be taken into account: what intermediate representation of a software system should be chosen for the software comprehension; what algorithm to employ for the source code analysis; how to identify starting points for the analysis; and finally, how to represent extracted business rules in order to evaluate them with stakeholders. In contrast to related works, our

research concentrates on software systems that are built using heterogeneous technologies, and aims to gather any kind of information about the software system to facilitate the comprehension of business logic implemented in the system. We therefore rely on the KDM standard to represent the knowledge about the software system and apply source code analysis to abstract the business logic.

3

An Approach

An approach for business rules discovery in existing software systems is based on reverse engineering process that obtains intermediate representation of different aspects of the software system using the KDM and abstracts the business logic from this representation (fig. 1). It consists of the three main stages that will be overviewed in this section.

Fig. 1. The process of business logic discovery from existing software systems

3.1

Preliminary Study

The first phase of business rules discovery process involves preliminary study of the existing software system and it consist of the following steps: gather initial information; and define the strategy for knowledge extraction and representation within KDM. During the first step, the present architecture of software system is reviewed, the architectural components are identified, and the high-level dependencies between

them are established. Based on the acquired initial knowledge about the software system, a strategy for obtaining the representation within KDM is defined. The strategy establishes the list of software artifacts that will be processed, the ways they will be processed, and the time expected for delivery of each representation. 3.2

Knowledge Extraction

This phase involves several steps whose purpose is to build knowledge base used as the main source for business logic abstraction. The knowledge base consists of a set of KDM models that represent software system (referred thereafter as KDM representation), the data base of indexed software documentation, and the data base of classifiers (i.e. lookup table values) and system log information. KDM defines representation of the software system at several layers of abstraction: Infrastructure, Program elements, Runtime resources, and Conceptual layers. Within each layer, one or more architectural views on certain aspects of software system might be produced. The Infrastructure layer involves the Inventory model that is intended to represent software system’s physical artifacts: containers, folders, source files, resource definitions, etc. The Inventory model serves as a bridge between physical artifacts and their representation within higher level of abstraction. Therefore it is naturally to discover this model in the early steps, and we employ file system scanning or version control system querying to obtain required information. The Program elements layer involves Code model to represent the structure and behavior of the source code. In addition, the Code model might serve as a container for data types of particular programming language or platform resource definitions. While the representation of the structure of the source code might be transformed directly from abstract syntax trees (AST), the behavior representation requires numerous AST traversals to discover the data and control flow. The AST is generated by parser, which is built from the source code language grammar defined according to specialized AST meta-model (ASTM, [11]). The grammar is supplemented with the software API definition to facilitate identification of API usage. Transformation rules are defined according to ASTM-to-KDM mapping rules specified in [11] and considering MicroKDM [9] semantics. The latter allows obtain KDM representation of source code at sufficiently low level of granularity – statements and expressions represented using certain kinds of KDM ActionElements. The general principles of this process are illustrated in the figure 2.

Fig. 2. Process for obtainment of source code representation within KDM Code model

The Runtime Resources layer involves several kinds of models to represent resources of the system: UI, Data, Platform, and Events models. Depending on the software system, there may be different types of resource definitions, including user interface, data, workflow, system job, task, or component definitions. During runtime, these definitions are processed by the software platform to create runtime objects (e.g. form instances) that might be manipulated by the application code using software API. In order to establish dependencies between source code and runtime resource objects, we additionally create a set of Code models containing elements that represents structural composition of resources as ClassUnit element consisting of either other ClassUnit or MemberUnit elements. The content of resource definition files is structured according to particular schema definition. However, the definition of schema not always may be available; therefore, it might be reversed automatically from the content [7] or defined manually by considering only relevant parts of the content. Then, according to the predefined set of mapping rules between the schema elements and elements of particular KDM model, the content of resource definitions is parsed and corresponding KDM representation is created. KDM runtime resource models are augmented with representation of content of configuration files. While discovering their content, it is possible to discover platform resources other than previously identified. It should be noted that such information does not necessary mean that they are actual, because the configuration data may be obsolete, written by resources that are changed or removed in time. The creation of database of software documentation is built in several steps. Digital documents are parsed using specialized document parsing libraries to retrieve trees representing logical structure of document content: document, chapter, section, subsection, and body. Depending on type of document, it may be retrieved directly from the structure definition (e.g. TOC in word documents), bookmarks that links to different pages within a PDF, or considering a set of rules established regarding the properties of physical content (i.e. blocks) of document. Retrieved information is further tokenized, supplemented with corresponding attributes and indexed with full-text index engine to be available for linking with elements of KDM representation. Finally, a database of classifiers and log information is built by reviewing known lookup tables, files containing classifying data definitions, log files or tables. For each resource, a local copy of data is created and stored in the database to be available for further analysis. The data in this database is later used to define base facts from the established business terms. 3.3

Business Logic Abstraction

Having extracted all the available and relevant knowledge about the software system into the KDM representation, the next phase of the recovery process involves activities to separate KDM model parts that represent business logic implementation from the infrastructure related ones. For this reason, we first of all establish dependencies between model elements. Then, we derive initial set of term and fact units, and validate it with maintainer of the system. We follow the SBVR standard [12] as guidance to classify the business rules and to enable their formal definition. We further apply

static source code analysis techniques to refine this set and to identify the implementation logic of business rules, and to extract particular business scenarios. Finally, extracted knowledge is reviewed and approved with the maintainers of the system. The aim of dependencies establishment step is to build a system-level control flow graph (SCFG) from the KDM representation. SCFG consists of collection of procedure control flow graph (CFG). A CFG is a graph containing nodes that represent statements and predicate expressions (KDM ActionElement), and directed edges between nodes representing the flow of control (KDM ControlFlow subclasses). To obtain system-level graph, CFGs are interconnected by traversing call dependencies. Call dependencies between runtime resources and source code are established considering the following:  Source code handling software platform events (e.g. periodical event, application starting/stopping);  Source code handling user interface events (e.g. form event or form control event);  Source code handling object instantiation or access events (e.g. create, update, delete);  Source code handling events produced due to access to particular software platform functionality (e.g. user login, user management event);  Source code handling events of workflow activities and transitions between them (e.g. on enter to activity or exit from it, on transition);  Source code accessing runtime resources that produce particular kind of event (source code that sends signal to runtime resource invokes execution of source code that handles the event of signal receive, e.g. setting form field value programmatically invokes execution of source code that handles OnSetFieldText event).

Fig. 3. A fragment of extended KDM representing conceptual model elements for definition business rules

Having established dependences and built a SCFG, source code analysis techniques may be applied to identify the business logic implementing source code and related resources, and to represent it within the KDM Conceptual model. The Conceptual model enables mapping of KDM compliant model to models compliant to other specifications. Currently, KDM provides “concept” classes – TermUnit, FactUnit and RuleUnit facilitating mapping to SBVR [9]. We slightly extend this representation (fig 3) in order to facilitate further mappings. A vocabulary containing a set of term units and fact units is derived by primarily considering the representation of structural elements of runtime resources. We first refer to the Data and UI models, because they contain elements that convey business terms explicitly and therefore may be merely understood by the maintainer. Thus, for each KDMEntity which is an instance of specific type of AbstractDataElement or AbstractUIElement a TermUnit is created and added to ConceptualModel. A reference to that entity is added to the collection of elements representing implementation of TermUnit (property “implementation”). The collection is further supplemented with references to elements that bind with UI elements (i.e. data definition fields upon which form and report fields are built). Then, the set of TermUnit elements is augmented with elements that correspond to instances of specific types of AbstractPlatformElement (extracted from the configuration files). To facilitate definition of business terms, we reference TermUnits with indices from the data base of software documentation. For each TermUnit, we construct several types of search queries: the first one limits the search scope to a title of structural elements of indexed documents (i.e. chapter, section, subsection); the second one limits search scope to a body of document’s structural element. Search results ranked by relevance are referenced with TermUnit element as source property containing a set of references for each matching index. Table 1. Fact Types derivation Fact Type

Derivation principles

1) Associative fact type fact type that has more than one role.

References to objects; e.g. lookup fields (foreign keys) within data structure definitions; object properties containing a collection of references to another objects; Primitive object properties or references to another objects; e.g. fields within data structure definitions; form fields; object properties;

1.1) is-property-of fact type fact type that involves two concepts, where the first concepts define essential quality of the second concept. 2) Partitive fact type fact type determining the whole-part relationship 3) Specialization fact type

Object composition; e.g. nested objects; groups of UI controls (framed, tabbed);

3.1 ) Categorization fact type fact type declaring concept dependence on certain categorization concept

Classifying properties; e.g. lookup table values that defines particular categories, such as a kind, type, etc.

Fact Type

Derivation principles

3.2) Contextualization fact type is-role-of fact type and is-facet-of fact type

Data type instantiation; e.g. object instantiation; object reference assignment to a certain variable; Object inheritances; e.g. class inheritances; properties that defines object dependence to certain general concept;

4) Assortment fact type fact type that is defined with respect to a given general concept and a given individual concept.

In order to derive FactUnits from the software system we refer to SBVR fact types categorization scheme that classifies facts based on their semantic nature. The SBVR distinguishes fact types into the following categories: associative fact type, partitive fact type, specialization fact types, and assortment fact types. The table 1 presents general principles for derivation of certain types of FactUnits. Having derived initial set of candidates to business terms and related facts (TermUnits and FactUnits), the vocabulary is reviewed and validated to separate design specific TermUnits and FactUnits from the business specific ones. To identify how particular TermUnit is derived, static inter-procedural backward slicing [16] of the SCFG is performed with respect to slicing criterion CDer=, where DE is a set of KDM DataElements representing implementation of this TermUnit, and aeout is a kind of KDM ActionElement representing output of these data elements to user interface, database or other kind of data repository. A resulting slice SDer={ActionElement} containing a set of control and data dependent ActionElements is added to the property abstraction of derived RuleUnit. ScenarioUnits are derived to represent behavioral paths from platform or user interface events through the SCFG. They are computed by traversing SCFG along the control flow and call edges [6]. ActionElements representing conditional statements (i.e. if or switch statements) within those paths and their intermediate successors are considered as candidates to RuleUnits.

4

A Case Study

So far we have discussed on general principles of the approach for business logic abstraction from the representation of the existing software systems. In this section we will present a case study from our on-going research project, where we apply it to abstract business logic from the commercial-off-the-shelf (COTS) Enterprise Content Management (ECM) system. 4.1

An Overview of the Software System

The system is used for document management, records management, web content management, and collaboration in several governmental organizations of Lithuania. It is designed using multi-tiered Client/Server architecture (fig. 4). The software system serves as a platform for development of business specific solutions: it provides customization capabilities by enabling to define specific data objects (using data defini-

tions); create forms and reports; specify workflows that may be assigned to data objects; configure periodic server jobs. It also provides application programming interface (API) allowing automation of particular system events using a dialect of the Visual Basic for Application (VBA) language and integration with other software systems using the Component Object Model (COM) interface. Internally the system may be considered as a black-box – the logic behind the interface may be understood only from software technical documentation, or from experience gained by using API in the development of business specific solutions.

Fig. 4. Architecture of Enterprise Content Management System

In order to meet the requirements of the organization and the rules of management of electronic documents, it has been greatly customized (table 2 summarizes the results of our initial study on what artifacts contains software system). However, over the time, requirements change and the software system must be updated to reflect those changes. It leads to many undocumented modifications, and even worse, because the impact of these modifications to other parts of the system is very difficult to evaluate, they usually tend to be not fully tested. The maintenance cost exceeds its budged; therefore, the approach for automated system comprehension is essential to reduce the maintenance costs. Table 2. Overview of gathered initial information about the software system Component Client -is a desktop application that interacts over network with broker server using RPC; with external applications over COM API.

Broker Server -is a service component containing subcomponents

Resources

Description

Format

Qnt

Local configs Log files

Client specific configurations: defined actions, application window setting, etc. Client specific log information.

INI

200

CSV

200

Folders

User specific collection of documents.

XML

155

Application files Macros

Platform components

DLL, EXE, etc VBA

132

Visual Basic for Application script mod-

57

Component for interaction with clients (by distributing its resources to client applications), database management system (ODBC), web server (Gateway Server), and stores document objects in repository.

Resources

Defs Reports Forms Workflows

Config. files Log files Database Management System -used to store and retrieve document metadata, lookup table values, audit, and other system data.

DB Tables

DB Views

DB Procedures DB Triggers DB Functions

ScanStation -scans physical document, performs optical recognition, and transfers textual information to client.

Office Integration -transfers user filled document templates to client.

4.2

Lookup table data Log Templates Config. files Config. files Document templates

Description ules containing module level constants, external library declarations, procedure and function declarations (that handles certain resource event). Document, lookup and audit table definitions. Data representation and data source definitions that are based on one or more Defs. Definitions of forms for data editing or querying; are based on one or more Defs. Workflow definitions: activities, transitions, decision points; events corresponding to these workflow elements (on entry, on exit, etc) Platform specific configuration. Server log information. Database tables storing document metadata, lookup table values, audit values, or custom data. Stored SELECT statements that associate one or more tables to retrieve data from them, applies predicates for data filtering, calls functions for output data formatting. Stored procedures that are called from VB script over application provided interface to database connectivity component. Procedures that are executed in response to particular events on table; used for internal audit, temporary tables. User Defined Functions that process input data to produce required output, and are called from stored procedures or views. Information classifying data, such as document kind, type, status and etc. Database specific log. Template definitions for optical content recognition. Fields mapping definitions Configuration mapping Fields mapping definitions Word document templates containing fields mapped to document fields.

Format

Qnt (>40K LOC)

XML

187

XML

16

XML

83

XML

5

INI, XML CSV

3

T-SQL

80

T-SQL

92

T-SQL

30

T-SQL

10

T-SQL

26

Data

20

Data XML

2 23

INI, XML INI

2

doc, docx

3

1 20

The Strategy

Having identified the artifacts of the software system and its integration components, we defined the strategy for the knowledge extraction. Within the strategy it was identified that 8 custom parsers and 13 transformation rules sets must be developed in order to obtain the “as-is” KDM representation of the system. For the implementation

of these tools set, we chose the Eclipse platform, because of numerous reasons: it is an open-source; there is already developed the KDM framework for this platform; it supports model-driven engineering (i.e. model driven parser generators, meta-model based transformation tools, model graphic editors, etc.); and finally, it is extendable and allows additional tools integration. To process documentation of the software system and store it in full-text search database, we selected to use Apache Lucene and Solr technologies in combination with PDF and DOC(X) document parsers (all available documentation has been provided in these formats). We have not considered any specific kind of phrase comparison algorithm to obtain more results in search queries, because the number of documents was not so large (less than 30, including platform documentation, requirements specifications, manuals, etc.), and it was clear that most of identifiers obtained from definitions of platform resources would have identical terms used in the documentation. 4.3

Discovering Knowledge About the Software System

In order to evaluate the feasibility of our approach, we selected to discover artifacts of Client and Broker Server components. Their interaction with other components we decided to represent as external calls and leave the analysis of these components for the further research. By discovering analyzed components content, we obtained a set of KDM models representing architectural views on platform components, UI composition, automation, and data structure of the ECM system. Client.ini configuration file

Dialog Configuration

UI Resource Binding

A B Form received_doc_i definition

Event Handler Binding

Form Definition

C onCloseDialog subroutine definition at module FORMS (FORMS!onCloseDialog())

D Fig. 5. The main dialog of ECM and the form to fill document metadata (A); a snippet of configuration file that defines left side tray bar (B); a snippet of form definition (C); a fragment of code module containing subroutine that handles form event (D).

The picture above presents a fragment of ECM: the main application dialog and the form to fill metadata of receivable document (fig. 5, A). The structure of document object is defined within the data definition file that maps to database object (table or view). The configuration of trays for the particular client is predefined within client.ini configuration file (fig. 5, B). A configuration of tray is a key-value pair that reference to form definitions, which are opened due to action performed by a user of the system (drag-and-drop document object and a double click). The form definition (fig. 5, C) contains a description of UI elements within the form and references to the form or its controls event handling code procedures declared within macro modules (fig. 5, D). The left side of figure 6 shows the snippet of tree view of KDM representation discovered from the software resources and code. The right side represents it logical view with UML Class diagram (where stereotypes correspond to KDM element types).

Fig. 6. KDM representation of discovered system resources and its logical view within UML class diagram

4.4

Business logic abstraction

Our initial attempt to identify a vocabulary resulted in more than 2000 TermUnit elements and more than 4000 FactUnit elements. By analyzing bindings between elements of UI definitions and data definitions and variable assignments we have identified synonymous terms and reduce the number of the actual TermUnit elements to 884. From the data definitions we obtained about 321 structural RuleUnit elements (mandatory fields, unique indices, default values, etc.). The picture below presents the vocabulary entry for TermUnit Receivable Document.

Fig. 7. Vocabulary entry representation of TermUnit

Identification of scenario units was relatively straightforward, because almost each business use case could have been traced from the system actions (menu and trays functionality). We have sliced KDM representation and identified 32 scenario units in total, which actually corresponded to functionality of the system: document registration, preparation, forwarding, notifying, assigning tasks and execution control, etc. Then, we traversed obtained slices to identify implementation of RuleUnit elements that has affect to the flow of control within the scenario unit (ActionElement representing control statement that uses a subset of TermUnits in conditional predicate expression). The CodeModel we have analyzed contained many ActionElements representing nested if and select case statements (nesting level more than 10); however, after preliminary study we have found that a relatively large number of conditional if statements are platform specific or introduced for technical purposes, e.g. check object instantiation, check if document object is loaded or locked, etc. We have identified that at an average the number of such conditional elements for each scenario unit was up to 60%. By eliminating these elements we reduced the number of candidates to operational RuleUnits from more than 2000 to 800.

5

Conclusions and Further Research

In this paper we have presented the approach for business logic discovery from existing software systems. Within this process-centric approach we identified the activities for knowledge extraction and representation with KDM, and determined the steps for business logic separation from this representation. We have presented a case study from our on-going research project, and discussed on our initially obtained results. The vocabulary and related rules as well as business scenario units identified within this case study facilitate the maintenance of the software system providing traceability to the business logic implementation parts representing KDM elements. However, KDM representation is only intermediate format valuable for automated analysis. Seeking to produce more comprehensive representations of views of particular software system aspect, the conversion to static and dynamic UML models must

be considered in the further research. In order to be able to validate discovered knowledge with stakeholders, candidates to business rules should be transformed to SBVR templates, decision tables or trees, or other acceptable format.

References 1. 2. 3. 4.

5. 6. 7.

8. 9. 10.

11. 12. 13.

14.

15. 16. 17. 18.

19. 20.

IEEE Standard Glossary of Software Engineering Terminology, IEEE Std 610.12-1990, IEEE, 1990 Bergeretti, J.-F. & Carré, B.A. Information-flow and data-flow analysis of whileprograms, ACM Trans. Program. Lang. Syst., ACM, 1985, Vol. 7(1), pp. 37-61 Chiang, C.-C. Extracting business rules from legacy systems into reusable components, System of Systems Engineering, 2006 IEEE/SMC International Conference on 2006 Gang, X. Business Rule Extraction from Legacy System Using Dependence-Cache Slicing, Proceedings of the 2009 First IEEE International Conference on Information Science and Engineering, IEEE Computer Society, 2009, pp. 4214-4218 Huang, H. Business Rule Extraction from Legacy Code, Proceedings of the 20th Conference on Computer Software and Applications, IEEE Computer Society, 1996, pp. 162-168 Khedker, U., Sanyal, A. & Karkare, B. Data Flow Analysis: Theory and Practice, CRC Press, Inc., 2009 Nečaský, M. Reverse engineering of XML schemas to conceptual diagrams, Proceedings of the Sixth Asia-Pacific Conference on Conceptual Modeling - Volume 96, Australian Computer Society, Inc., 2009, pp. 117-128 OMG. Architecture Driven Modernization Task Force, URL http://adm.omg.org, 2012 OMG. Knowledge Discovery Metamodel Specification Version 1.3, http://www.omg.org/spec/KDM/1.3/PDF/, 2011 OMG. Architecture driven modernization standards roadmap, http://adm.omg.org/ADMTF Roadmap.pdf , 2009 OMG. Abstract Syntax Tree Metamodel v1.0, http://www.omg.org/spec/ASTM/1.0/, 2009 OMG. Semantics of Business Vocabulary And Business Rules v1.0, http://www.omg.org/spec/SBVR/1.0/, 2008 Putrycz, E. & Kark, A.W. Connecting Legacy Code, Business Rules and Documentation. Proceedings of the International Symposium on Rule Representation, Interchange and Reasoning on the Web Springer-Verlag, 2008, pp. 17-30 Seacord, R.C., Plakosh, D. & Lewis, G.A. Modernizing Legacy Systems: Software Technologies, Engineering Process and Business Practices. Addison-Wesley Longman Publishing Co., Inc., 2003 Standish, T.A. An Essay on Software Reuse. IEEE Trans. Software Eng., 1984, pp. 494497 Tip, F. A Survey of Program Slicing Techniques. Journal of Programming Languages, 1995, Vol. 3, pp. 121-189 Ulrich, W.M. & Newcomb, P. Information Systems Transformation: Architecture-Driven Modernization Case Studies Morgan Kaufmann Publishers Inc., 2010 Wang, X., Sun, J., Yang, X., He, Z. & Maddineni, S. Business Rules Extraction from Large Legacy Systems. Proceedings of the Eighth Euromicro Working Conference on Software Maintenance and Reengineering (CSMR'04). IEEE Computer Society, 2004, pp. 249-254 Canfora, G. & Cimitile, A. Software Maintenance In Proc. 7th Int. Conf. Software Engineering and Knowledge Engineering 1995, pp. 478-486 Weiser, M. Program slicing. IEEE Transactions on Software Engineering SE-10 (4), 1984, pp. 352-357.

21. The Standish Group, The Internet Goes Business Critical, 2009.