Testing Relational Database Schemas with Alternative Instance Analysis

Maria Claudia F. P. Emer
State University of Campinas

Silvia Regina Vergilio
Federal University of Paraná

Mario Jino
State University of Campinas

Abstract
Databases are employed to store large amounts of data that are extremely important for business operations. Database testing evaluates whether a database meets its requirements. In this context, testing database schemas is a fundamental activity to increase confidence in the integrity of the data manipulated by database applications. Few works address this subject. Approaches to test database schemas can help to find faults related to incorrect or absent constraint definitions and can contribute to avoiding failures in the applications. With this in mind, in this paper we present a fault-based approach for testing database schemas and introduce testing criteria based on a classification of the most common types of faults in database schemas. The approach uses database instances and queries to test the schemas. The instances are generated according to patterns defined for each fault class and represent possible faults. Preliminary results are discussed.

1. Introduction
The testing activity in software development contributes to generating reliable products and to evaluating software quality. Test techniques and criteria have been proposed to guide the test process, which may involve many correctness aspects of software applications. For instance, database testing aims to evaluate how well a database meets its requirements and can assess query response time, data integrity, data validity and data recovery [11]. Testing database systems involves many correctness aspects, such as [4]: the application behavior with respect to the specification; how well the database schema models the real world; the accuracy of the data in the database; safety and privacy; and the correct execution of insertions, updates and deletions of the data. Schemas are frequently used in database applications to define the logical structure of and the relationships among data, and they are designed according to the data specification. Testing database schemas is important and can contribute to increasing confidence in the integrity and accuracy of the data being manipulated: if incorrect data is validated by a schema and passed to the application, it can cause a failure. In spite of its importance, the testing of schemas has not been a popular subject in the area. Most works address data generation, application testing and database design testing [2, 3, 4, 5, 7, 12, 15, 17, 18].
In previous works we introduced an approach to test schemas [10] and explored its use mainly in the context of XML [8, 9]. The promising results motivated us to investigate the use of this approach with other widely used kinds of schemas, such as relational database schemas; we report results of that investigation here. We present a fault-based testing approach, called Alternative Data Instance Analysis (ADIA), which considers generic fault classes and can be applied to any schema that can be represented by a model based on the MOF specification [14]. In this paper we address relational databases and the entity-relationship model [6], a very popular model commonly employed in database applications. The idea of the proposed approach is to reveal faults by executing queries on database instances created by simple modifications that describe specific fault classes. A classification of typical faults is presented, and we introduce testing criteria based on these fault classes. A supporting tool was implemented, and experimental results that evaluate the approach in the context of relational database schemas are presented.
The remainder of this paper is organized as follows. Section 2 presents the fault-based testing approach. Section 3 presents the results of a case study. Section 4 describes related work. Section 5 contains conclusions and directions for future work.

2. Alternative Data Instance Analysis
This section introduces the approach, named Alternative Data Instance Analysis (ADIA). Its goal is to reveal constraint faults related to the definition of entities, attributes, relationships and semantics in a database schema. These faults can be related to issues such as incorrect or absent constraint definitions for those schema elements. The idea is to evaluate these issues to avoid faults in the database schema that can affect data integrity and cause failures in the database application. ADIA is fault-based and uses alternative database instances and queries to reveal the faults. The data model that represents schemas, the fault classes identified in the database schema, the schema representation and the test process are presented next.

2.1. Data Model
The data model is represented by a metamodel M defined on the basis of the MOF (Meta-Object Facility) Specification [14]. Fig. 1 illustrates the metamodel M, described in UML notation [1], consisting of the following classes: Element (elements or entities), Attribute (elements' properties) and Constraint (restrictions associated with elements and attributes).

Figure 1. Metamodel M

To illustrate the data model, consider a fragment of an ER diagram (Fig. 2). The diagram describes data on students of a university: course and type of course. Fig. 3 shows the corresponding class diagram based on M.

Figure 2. Fragment of an ER diagram

Figure 3. Data Schema for Course Data concerning the Metamodel M

Table 1. Fault classes

Group 1 (G1) - Domain Constraints: faults related to the domain definition of element or attribute values
  IDT - Incorrect Data Type: incorrect definition of data type
  IV - Incorrect Value: incorrect definition of default value
  IEV - Incorrect Enumerated Value: incorrect definition of the list of acceptable values
  IMMV - Incorrect Maximum and Minimum Values: incorrect definition of upper and lower bound values
  IL - Incorrect Length: incorrect definition of the number of characters allowed for values
  ID - Incorrect Digits: incorrect definition of the total amount of digits or decimal digits for numeric values
  IP - Incorrect Pattern: incorrect definition of the sequence of characters or numbers allowed for values
  IWSC - Incorrect White Space Characters: incorrect definition of how white space characters must be treated

Group 2 (G2) - Definition Constraints: faults related to the attribute definition concerning data integrity
  IU - Incorrect Use: the attribute is defined incorrectly as optional or obligatory
  IN - Incorrect Uniqueness: the attribute is defined incorrectly as unique
  IK - Incorrect Key: the attribute is defined incorrectly as primary key or foreign key

Group 3 (G3) - Relationship Constraints: faults related to the relationship definition among elements
  IO - Incorrect Occurrence: incorrect definition of the number of times a same element may occur
  IR - Incorrect Order: incorrect definition of the order in which elements may appear
  IC - Incorrect Association: incorrect definition of an association: cardinality, generalization/specialization, aggregation, associative element

Group 4 (G4) - Semantic Constraints: faults related to constraint definitions concerning the data content expressed by business rules
  IO - Incorrect Condition: incorrect definition of the predicate expressed for a condition that must be satisfied by attributes

2.2. Fault Classes
ADIA is fault-based; hence, common faults introduced during the conceptual design of a schema are organized into four constraint groups: domain, definition, relationship and semantic constraints. These faults were identified through the analysis of data schemas. Table 1 presents the fault classes.
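To make the fault classes concrete, the sketch below shows, in standard SQL, typical relational constraint constructs whose incorrect or absent definition falls into the groups of Table 1. The table, attribute names and the business rule are purely illustrative assumptions, not part of any schema discussed in this paper, and the reference to the course entity of Fig. 2 is likewise assumed only for illustration.

CREATE TABLE student (
  id_student int PRIMARY KEY,                            -- G1-IDT: data type; G2-IK: key definition
  name varchar(40) NOT NULL,                             -- G1-IL: length; G2-IU: obligatory use
  email varchar(60) UNIQUE,                              -- G2-IN: uniqueness
  gpa numeric(3,1) CHECK (gpa BETWEEN 0 AND 10),         -- G1-ID: digits; G1-IMMV: bounds
  level varchar(20) DEFAULT 'undergraduate'
        CHECK (level IN ('undergraduate', 'graduate')),  -- G1-IV: default value; G1-IEV: enumerated values
  id_course int REFERENCES course (id_course),           -- G2-IK: foreign key; G3-IC: association
  CHECK (level <> 'graduate' OR gpa >= 7.0)              -- G4: condition expressed by a business rule
)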

2.3. Formal Representation
A formal representation is used to process data schemas algorithmically, by supporting the identification of the entities, attributes and constraints and of the associations among them. A data schema S is denoted by S = (E, A, R, P), where:
• E is a finite set of entities;
• A is a finite set of attributes;

• R is a finite set of constraints, concerning domain, definition, relationship and semantics, associated with the elements and attributes;
• P is a finite set of association rules among elements, attributes and constraints.
Consider U = E ∪ A. The association rules are represented by:
o p(x, y) | x, y ∈ E ∧ x ≠ y;
o p(x, r) | x ∈ E ∧ r ∈ R;
o p(x, r, SU) | x ∈ U ∧ r ∈ R ∧ SU = {u1, u2, ..., um} ⊂ U, ∀ ui ≠ x, 1 ≤ i ≤ m, m ≥ 1, where m is the number of elements and attributes in SU.
Example 1 shows the notation used to describe a data schema.



Example 1: Formal representation for the ER diagram of Fig. 2
S = (E, A, R, P)
E = {course, course_type}
A = {ID_course, name, duration, ID_course_type, description}
R = {type, key, use, length, uniqueness, association}
P = {p1(course, ID_course), p2(course, name), p3(course, duration), p4(course, association, course_type), p5(course, key, course_type), p6(ID_course, type), p7(ID_course, key), p8(name, type), p9(name, use), p10(name, length), p11(duration, type), p12(duration, use), p13(course_type, ID_course_type), p14(course_type, description), p15(course_type, association, course), p16(ID_course_type, type), p17(ID_course_type, key), p18(description, type), p19(description, use), p20(description, length), p21(description, uniqueness)}

2.4. Testing Process
The schema under test and the corresponding database instance (the original database instance) are provided by the tester. To illustrate, these inputs to the testing process are presented in Examples 2 and 3. Example 2 shows a fragment of the DDL script related to the ER diagram of Fig. 2. Example 3 presents a sample of the original database instance associated with the ER schema of Example 1.

Example 2. DDL related to the ER diagram of Fig. 2
CREATE TABLE course (
  id_course int IDENTITY,
  name varchar(40) NOT NULL,
  id_course_type int NOT NULL,
  duration int NOT NULL
)
go
ALTER TABLE course ADD PRIMARY KEY NONCLUSTERED (id_course)
go
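Example 2 defines only the course table. For illustration, and in the same DDL dialect, a companion definition for course_type and the foreign key realizing the association of Fig. 2 could be written as below; the size of description and its UNIQUE clause are our assumptions, chosen to match the use, length and uniqueness constraints listed in Example 1.

CREATE TABLE course_type (
  id_course_type int IDENTITY,
  description varchar(60) NOT NULL UNIQUE
)
go
ALTER TABLE course_type ADD PRIMARY KEY NONCLUSTERED (id_course_type)
go
ALTER TABLE course ADD FOREIGN KEY (id_course_type) REFERENCES course_type (id_course_type)
go

An absent or wrongly declared foreign key of this kind is the kind of situation targeted by the Incorrect Key (G2-IK) and Incorrect Association (G3-IC) fault classes.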

Example 3. A sample of the original database instance
Course
id_course | name                    | id_course_type | duration
1         | Computer science        | 1              | 4
2         | Computation Engineering | 1              | 5
3         | Database                | 3              | 2
4         | Images Processing       | 4              | 2

Initially, the representation S of the schema under test is built (Example 1). Based on S, schema elements (entities, attributes, relationships among entities and semantic constraints related to attributes) are associated with the fault classes. These associations are named fault associations. Table 2 presents some associations identified in S.

Table 2. Fault associations identified in S
Entity/Attribute | Fault Class
Course           | Incorrect Association (cardinality); Incorrect Key
ID_course        | Incorrect Data Type
name             | Incorrect Use; Incorrect Data Type; Incorrect Length

The tester can also identify other fault associations between schema elements and fault classes. This allows the detection of faults that could not be detected with the fault associations identified through the representation S; in particular, the tester can determine fault associations to reveal faults related to absent constraint definitions, which are also covered by the fault classes. Next, fault associations are selected. The selected fault associations guide the generation of the alternative database instances, indicating the schema element that should be modified in the original database instance and the fault class that defines the modification patterns to be applied. The alternative database instances are generated through modifications of the original database instance. These single modifications are made by insertions into and changes to records of the original database instance, according to patterns defined for each fault class. These modifications are representative and sufficient to detect the fault of each fault class. For example, to reveal a fault related to G1-IL (Group 1 - Incorrect Length) in an attribute a ∈ A of the database schema S, a record is altered, for example, with content whose number of characters is outside the bound allowed for attribute a, producing an alternative database instance. Example 4 illustrates an alternative database instance for table course of the original database instance presented in Example 3. In Example 4, the attribute name is associated with fault class G1-IL (Group 1 - Incorrect Length).

Example 4. Alternative database instance
Course
id_course | name                                             | id_course_type | duration
1         | Computer science xxxxxxxxxxxxxxxx xxxxxxxxxxxxxx | 1              | 4
2         | Computation Engineering                          | 1              | 5
3         | Database                                         | 3              | 2
4         | Images Processing                                | 4              | 2
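The single modification that produces the alternative instance of Example 4 could be expressed, for instance, by the SQL statement below (our illustration; the paper does not show how XTool applies the modification patterns). The new value has 48 characters, exceeding the 40-character bound declared for name.

UPDATE course
SET name = 'Computer science xxxxxxxxxxxxxxxx xxxxxxxxxxxxxx'  -- 48 characters > varchar(40)
WHERE id_course = 1;

If the length constraint is correctly defined, the DBMS rejects this record, which is then classified as invalid; if the value is accepted, the declared length differs from the specification, and the subsequent query can expose the discrepancy.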

The records of the generated alternative database instances are classified as valid or invalid with respect to the schema under test; that is, a record generated by a modification pattern may not conform to the schema under test. An invalid record is not accepted in the alternative instance; thus, this alternative instance is not queried, but it is part of the test result. The selected fault associations also guide the generation of the queries. The queries are automatically generated as SQL statements, according to the query patterns associated with each fault class that can be detected in the schema. Queries for each selected fault association are generated and executed on each corresponding valid alternative database instance. A test data item is formed by a valid alternative database instance and a query on this alternative instance. The expected result for each test data item is obtained from the database specification. The tester compares the test results with the specification to discover faults in the database schema.
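The paper does not show the generated SQL; as an illustration only, a query associated with the G1-IL fault class for the attribute name could take a form such as:

SELECT id_course, name, char_length(name) AS name_length
FROM course
WHERE char_length(name) > 40;   -- 40 is the length specified for name in the data specification

A non-empty result on the valid alternative instance would show that the schema accepted a value longer than specified; the tester then compares this outcome with the expected result derived from the data specification.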

2.5. Testing Criteria
Testing criteria are used to select the fault associations to be exercised in the testing process. These criteria are based on the fault classes: they require that the fault associations be exercised through query execution on the alternative database instances related to these associations. Let z be an element or attribute. The criteria are:
• All constraints - all fault associations with regard to z related to fault classes of the domain, definition, relationship and semantic constraint groups must be exercised;
• All domain constraints - all fault associations with regard to z related to fault classes of the domain constraint group must be exercised;
• All definition constraints - all fault associations with regard to z related to fault classes of the definition constraint group must be exercised;
• All relationship constraints - all fault associations with regard to z related to fault classes of the relationship constraint group must be exercised;
• All semantic constraints - all fault associations with regard to z related to fault classes of the semantic constraint group must be exercised;
• All constraint groups - at least one fault association with regard to z related to each of the domain, definition, relationship and semantic constraint groups must be exercised, if such an association exists.

3. Case Study
Our case study used a database application developed by graduate students, containing data on university students: personal, academic and professional data. The testing process was performed during the development of the database. The ER schema of the application has 19 relations and 20 relationships. The relational database application was implemented using PostgreSQL.
The testing process was performed with XTool [13], a tool that supports the testing approach for database schemas described in the previous section. The tool was developed in Java and tests relational database schemas by using JDBC (Java Database Connectivity) to manipulate and query the information in the PostgreSQL database. XTool found 19 entities and 73 attributes in S. These entities and attributes were automatically associated with fault classes in the schema under test. Moreover, the tester identified other fault associations between schema elements and fault classes. The testing criterion used was all constraints. Using the selected fault associations, XTool generated alternative database instances and SQL queries automatically. The total number of fault associations (identified by XTool and the tester) was 297, and 1240 queries were executed. Table 3 presents an example with fault associations, the number of records modified in the original database instance to generate the alternative instances, and the number of generated queries for the entity Academic_records and some of its attributes.

Table 3. Fault associations found in S, number of modified records and generated queries for entity Academic_records
Entity/Attribute | Fault Class                                 | Number of records | Number of generated queries
Academic_records | Incorrect Association (cardinality)         | 41                | 12
                 | Incorrect Association (associative element) | 6                 | 1
                 | Incorrect Key                               | 1                 | 1
                 | Incorrect Uniqueness                        | 1                 | 1
Begin_date       | Incorrect Data Type                         | 7                 | 1
                 | Incorrect Use                               | 4                 | 1
                 | Incorrect Condition                         | 48                | 25

The tester had the task of comparing the query results with the expected ones, according to the data specification of the database. Records of the invalid alternative database instances generated were also considered test results and were used during the test analysis. Table 4 shows the number of faults revealed with the test data generated by XTool.

Table 4. Faults revealed
Fault class                | Number of revealed faults
Incorrect Data Type        | 5
Incorrect Length           | 3
Incorrect Digits           | 4
Incorrect Enumerated Value | 1
Incorrect Use              | 13
Incorrect Cardinality      | 1
Total                      | 27

The faults revealed in the test process are related to: the incorrect definition of data type, length and digits constraints; the absence of constraints on use and enumerated values for attributes; and an incorrect cardinality for a relationship. Faults of absent constraints were revealed by the fault associations identified by the tester. The faults were removed, and the use of ADIA contributed to improving the quality of the tested application. It is important to remark that the original database instance is not replicated by XTool to generate the alternative instances: the original instance is updated with a single modification pattern, queried by the generated query, and the modification is then undone.
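One way to realize this apply-query-undo cycle, sketched here only as an assumption since the paper does not detail XTool's mechanism, is to wrap each single modification in a transaction that is rolled back after the query:

BEGIN;
-- apply a single modification pattern to one record of the original instance
UPDATE course
SET name = 'Computer science xxxxxxxxxxxxxxxx xxxxxxxxxxxxxx'
WHERE id_course = 1;
-- run the query generated for the corresponding fault association
SELECT id_course, name FROM course WHERE char_length(name) > 40;
-- undo the modification, restoring the original database instance
ROLLBACK;

If the UPDATE itself is rejected by the schema under test, the record is invalid, the query step is skipped and the rejection is recorded as part of the test result, as described in Section 2.4.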

4. Related Work
As mentioned previously, many works in the literature address the testing of database applications [2, 4, 5, 7, 12, 17, 18]. Some of them [3, 15] propose the use of schema information to test the application. Robbert and Maryanski [15] use information obtained from the database schema to generate a test plan for the database application, indicating points that should be verified in the test. Chan et al. [3] propose a fault-based approach to test SQL statements of database applications, using information captured from the conceptual data model to generate SQL statement mutants. These works use schema information to test the application or SQL statements, but they do not address schema testing. The works mentioned above do not have the goal of validating schemas, which is the focus of the present paper. In this sense, the introduced approach has a different objective and contributes to increasing the reliability of the data stored in the database.

5. Conclusions
The fault-based approach ADIA for database schemas and the testing criteria introduced in this paper were derived from our previous work [9, 10]. The main goal of this testing approach is to reveal faults in the database schema, to ensure the quality of the data stored in the database and, consequently, to contribute to increasing the reliability of the database application. Data integrity in databases is fundamental to avoid incorrect data processing that results in failures in the database application.

ADIA contributes by: presenting a metamodel and a formal representation for database schemas; defining fault classes for database schemas based on the entity-relationship model; introducing testing criteria based on fault associations; generating alternative database instances based on fault classes; and using queries to reveal faults. XTool was implemented to support ADIA and does not need the database application to test the database schema; it needs only the schema to be tested and the original database instance. In addition, the alternative database instances automatically generated by XTool could be used to test the applications that access the database. The case study presented in this paper, performed using XTool, shows that ADIA is effective in revealing faults covered by the fault classes in a database schema, and that the largest cost is related to the analysis of the results by the tester, who compares the test results with the expected ones from the data specification. In this work, ADIA was used to reveal faults in a schema of the relational database model. However, it can also be used to detect faults in schemas of other database models, for instance XML databases. Further experiments are necessary to better evaluate the effectiveness of the testing approach and criteria for the detection of faults in the context of other database models. We also intend to use this testing approach to reveal faults in more complex integrity constraints or faults involving several relations.

6. Acknowledgments
We acknowledge the partial financial support from the Brazilian Research Agency (CNPq).

References
[1] Booch, G.; Rumbaugh, J.; Jacobson, I. The Unified Modeling Language User Guide. Addison-Wesley, 1999.
[2] Chan, M.; Cheung, S. Testing Database Applications with SQL Semantics. In Proc. of the 2nd Intl. Symp. on Cooperative Database Systems for Advanced Applications, pp. 364-375, March 1999.
[3] Chan, W. K.; Cheung, S. C.; Tse, T. H. Fault-Based Testing of Database Application Programs with Conceptual Data Model. In Proc. of the 5th Intl. Conference on Quality Software, pp. 187-196, 2005.
[4] Chays, D.; Dan, S.; Frankl, P. G.; Vokolos, F. I.; Weyuker, E. J. A Framework for Testing Database Applications. In Proc. of the 2000 ACM SIGSOFT Intl. Symp. on Software Testing and Analysis, Vol. 25, Issue 5, August 2000.
[5] Chays, D.; Deng, Y. Demonstration of AGENDA Tool Set for Testing Relational Database Applications. In Proc. of the 25th Intl. Conference on Software Engineering, IEEE Computer Society, pp. 802-803, May 2003.
[6] Chen, P. P. The Entity-Relationship Model - Toward a Unified View of Data. ACM Transactions on Database Systems, Vol. 1, No. 1, pp. 9-36, 1976.
[7] Deng, Y.; Frankl, P.; Chays, D. Testing Database Transactions with AGENDA. In Proc. of the 27th Intl. Conference on Software Engineering, ACM Press, May 2005.
[8] Emer, M. C. F. P.; Nazar, I. F.; Vergilio, S. R.; Jino, M. Evaluating a Fault-Based Testing Approach for XML Schemas. In Proc. of the 8th IEEE Latin-American Test Workshop, March 2007.
[9] Emer, M. C. F. P.; Vergilio, S. R.; Jino, M. A Testing Approach for XML Schemas. In Proc. of the 29th Annual Intl. Computer Software and Applications Conference, Vol. 2, pp. 57-62, July 2005.
[10] Emer, M. C. F. P.; Vergilio, S. R.; Jino, M. A Testing of Data Schemas. In Proc. of the 19th Intl. Conference on Software Engineering and Knowledge Engineering, July 2007.
[11] Freeman, H. Software Testing. IEEE Instrumentation & Measurement Magazine, Vol. 5, Issue 3, pp. 48-50, September 2002.
[12] Kapfhammer, G. M.; Soffa, M. L. A Family of Test Adequacy Criteria for Database-Driven Applications. In Proc. of the 9th European Software Engineering Conference held jointly with the 11th ACM SIGSOFT Intl. Symp. on Foundations of Software Engineering, Vol. 28, Issue 5, September 2003.
[13] Nazar, I. F. A Tool for Data Schemas Testing. Master's thesis, Computer Science Department, Federal University of Paraná, March 2007 (in Portuguese).
[14] OMG. Meta-Object Facility Core Specification, Version 2.0. http://www.omg.org/cgi-bin/doc?formal/2006-01-01, January 2006 (accessed in September 2006).
[15] Robbert, M. A.; Maryanski, F. J. Automated Test Plan Generator for Database Application Systems. In Proc. of the ACM SIGSMALL/PC Symp. on Small Systems, pp. 100-106, 1991.
[16] Silberschatz, A.; Korth, H. F.; Sudarshan, S. Database System Concepts. 3rd ed., McGraw-Hill, 1998.
[17] Suárez-Cabal, M. J.; Tuya, J. Using an SQL Coverage Measurement for Testing Database Applications. In Proc. of the 12th ACM SIGSOFT Intl. Symp. on the Foundations of Software Engineering, November 2004.
[18] Zhang, J.; Xu, C.; Cheung, S.-C. Automatic Generation of Database Instances for White-Box Testing. In Proc. of the 25th Annual Intl. Computer Software and Applications Conference, pp. 161-165, October 2001.