AN AUTOMATED FRAMEWORK FOR SOFTWARE TEST ORACLE BASED ON MULTI-NETWORKS

SEYED REZA SHAHAMIRI

UNIVERSITI TEKNOLOGI MALAYSIA

PART A – Confirmation of Cooperation*

It is hereby confirmed that this thesis research project has been carried out through cooperation between _______________________ and _______________________

Confirmed by:

Signature :

Name :

Position (Official stamp) :

Date :

* If the preparation of the thesis/project involved cooperation.

PART B – For the Use of the Office of the School of Graduate Studies

This thesis has been examined and certified by:

Name and Address of External Examiner : Assoc. Prof. Dr. Nor Adnan Yahya, School of Information and Communication Technology, Albukhary International University, 05200 Alor Setar, Kedah, Malaysia

Name and Address of Internal Examiner : Assoc. Prof. Dr. Norazah Yusof, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, UTM Skudai, 81310 Johor, Malaysia

Name of Other Supervisor(s) (if any) :

Confirmed by the Deputy Registrar at the School of Graduate Studies:

Signature :

Name :

Date :

AN AUTOMATED FRAMEWORK FOR SOFTWARE TEST ORACLE BASED ON MULTI-NETWORKS

SEYED REZA SHAHAMIRI

A thesis submitted in fulfilment of the requirements for the award of the degree of Doctor of Philosophy (Computer Science)

Faculty of Computer Science and Information Systems Universiti Teknologi Malaysia

MAY 2011


To my beloved father and mother, who always support and encourage me in the good times as well as the bad times. To my wife, Safoura, for her patience, love and purity that lights up my life.


ACKNOWLEDGEMENTS

I would like to take this opportunity to thank my main supervisor, Assoc. Prof. Dr. Wan Mohd Nasir Wan Kadir, for his patience, constant support, and invaluable guidance throughout this research work. Special thanks go to my co-supervisor, Assoc. Prof. Dr. Suhaimi bin Ibrahim, for his encouragement, priceless advice, and inspiration throughout this research. Without his patience, I would never have completed this thesis.

Special thanks and gratitude go to my parents, for teaching me to love and respect others. My thanks also go to my wife, who sacrificed a lot for me to be at the point I am now. My gratitude also goes to my father-in-law, my mother-in-law in memoriam, and my whole family for their true love and support. Without a doubt, the endless love of my family kept me on track and helped me survive this adventure.


ABSTRACT

One of the important issues in software testing is to provide an automated test oracle. Test oracles are reliable sources of how the software under test must operate. In the generation of an automated oracle, three challenges were identified: output domain generation, input domain to output domain mapping, and a comparator to decide on the accuracy of the actual outputs. The aim of this research is to propose an automated test oracle framework that addresses these challenges. In particular, I/O Relationship Analysis is proposed to generate the output domain automatically, and Multi-Networks Oracles based on Artificial Neural Networks are introduced to handle the mapping challenge. The last challenge is addressed using an automated comparator that adjusts the oracle precision by defining the comparison tolerance. The proposed framework was evaluated using two case studies. The quality of the proposed oracle was measured by assessing its accuracy, precision, misclassification error, practicality and usability using mutation testing. In addition, Single-Network Oracles were also provided in order to highlight the superiority of the proposed Multi-Networks Oracle. Similarly, a fully automated test driver is provided to execute and evaluate the test cases using the proposed oracle model. Finally, a comparative study between the prominent oracles and the proposed one is provided, based on how they solve the oracle challenges and the degree of automation they provide. The results of the study show that the proposed framework can automate the oracle generation process up to 97.5%, with accuracy up to 98.93%. Moreover, the quality of the Multi-Networks Oracle was higher than that of the Single-Network one in all of the conducted experiments.


ABSTRAK

Salah satu daripada isu utama dalam pengujian perisian ialah untuk menyediakan peramal ujian berautomat. Peramal ujian adalah sumber yang boleh diharap dalam menentukan bagaimana suatu perisian seharusnya beroperasi. Dalam penjanaan peramal berautomat, tiga cabaran telah dikenalpasti iaitu penjanaan domain output, pemetaan domain input ke domain output, dan pembanding yang memutuskan tentang kejituan output sebenar. Tujuan kajian ini adalah untuk mencadangkan rangka kerja peramal ujian berautomat yang mengatasi semua cabaran tersebut. Secara khusus, Analisis Hubungan I/O digunakan untuk menjana domain output secara automatik, manakala Peramal Multi-Rangkaian berasaskan Rangkaian Neural Buatan diperkenalkan bagi menyelesaikan cabaran pemetaan. Cabaran ketiga diatasi dengan menggunakan pembanding berautomat yang melaraskan kepersisan peramal dengan mentakrifkan had kejituan perbandingan. Rangka kerja yang dicadangkan telah dinilai menggunakan dua kajian kes. Kualiti peramal yang dicadangkan telah diukur dengan menilai ketepatan, kejituan, ralat salah pengelasan, aspek praktikal dan kebolehgunaan menggunakan pengujian mutasi. Di samping itu, Peramal Rangkaian-Tunggal juga disediakan bagi membuktikan kelebihan Peramal Multi-Rangkaian yang telah dicadangkan. Demikian juga, pemacu ujian berautomat sepenuhnya disediakan bagi melaksanakan dan menilai kes ujian menggunakan model peramal yang dicadangkan. Akhir sekali, kajian perbandingan di antara peramal utama dan peramal yang dicadangkan disediakan berdasarkan bagaimana mereka menyelesaikan cabaran peramal dan tahap automasi yang disediakan. Hasil kajian menunjukkan rangka kerja yang dicadangkan boleh mengautomat proses penjanaan peramal sehingga 97.5% dengan ketepatan sehingga 98.93%. Di samping itu, kualiti Peramal Multi-Rangkaian didapati lebih tinggi daripada Peramal Rangkaian-Tunggal dalam semua ujikaji yang telah dijalankan.

TABLE OF CONTENTS

CHAPTER   TITLE   PAGE

DECLARATION   ii
DEDICATION   iii
ACKNOWLEDGEMENTS   iv
ABSTRACT   v
ABSTRAK   vi
TABLE OF CONTENTS   vii
LIST OF TABLES   xii
LIST OF FIGURES   xiv
LIST OF APPENDICES   xvi

1   INTRODUCTION   1
    1.1   Background of the Problem   1
          1.1.1   Facts or Issues in Software Testing   1
          1.1.2   Test Oracles   3
          1.1.3   Test Oracle Challenges   5
    1.2   Statement of the Problem   7
    1.3   Objectives of the Study   8
    1.4   Scope of the Study   8
    1.5   Significance of the Study   9
    1.6   Glossary   11
    1.7   Thesis Outline   14

2   LITERATURE REVIEW   16
    2.1   Software Testing   16
    2.2   Software Testing Activities   17
          2.2.1   The First Activity: Modeling the Software Environment   18
          2.2.2   The Second Activity: Selecting Test Scenarios   18
          2.2.3   The Third Activity: Running and Evaluating Test Scenarios   19
          2.2.4   The Fourth Activity: Measuring the Testing Process   19
    2.3   A Survey on Methods to Automate the Testing Activities   20
          2.3.1   Automated Methods in Modeling the Software Environment   21
          2.3.2   Automated Methods in Selecting the Test Scenarios   24
          2.3.3   Automated Methods in Running and Evaluating Test Scenarios   26
          2.3.4   Automated Methods in Measuring the Testing Process   27
    2.4   Test Oracles and Automation   31
          2.4.1   Test Oracle and Automation Challenges   32
          2.4.2   Prominent and State-Of-The-Art Automated Oracles   33
                  2.4.2.1   Decision Tables and Cause-Effect Graphs   34
                  2.4.2.2   Formal Oracles   35
                  2.4.2.3   N-Version Diverse Systems and M-Model Program Testing   36
                  2.4.2.4   IFN Regression Tester   36
                  2.4.2.5   AI Planner Test Oracles   37
                  2.4.2.6   ANN-based Test Oracles   39
                  2.4.2.7   Input/Output Analysis Based Automated Expected Output Generator   42
    2.5   Artificial Neural Networks   46
          2.5.1   Introduction   47
          2.5.2   Artificial Neural Networks Definition   49
                  2.5.2.1   Background   50
                  2.5.2.2   Linear Neural Model   50
          2.5.3   Linear Neuron Parameters Determination Methods   51
                  2.5.3.1   Direct Determination   52
                  2.5.3.2   Repetitive Procedures (Gradient Descent)   52
          2.5.4   Multi-layered Perceptron Networks   53
                  2.5.4.1   Multilayer Perceptron Network Training Algorithm (Back-Propagation)   56
    2.6   Summary   59

3   RESEARCH METHODOLOGY   62
    3.1   Research Design   62
    3.2   Operational Framework   66
    3.3   Sampling   67
    3.4   Research Instruments   68
    3.5   Evaluation Criteria   69
    3.6   Assumptions and Limitations   70
    3.7   The Proposed Framework   71

4   TEST ORACLE MODELING   76
    4.1   The Motivation   76
    4.2   The Proposed Framework in Detail   77
          4.2.1   Element 1: Training Data Generation (Applying I/O Relationship Analysis)   80
          4.2.2   Element 2: Multi-Networks Oracle   83
          4.2.3   Element 3: Test Case Verification   85
    4.3   Evaluation Model   86
    4.4   Data Analysis   89
    4.5   Summary   89

5   DESIGN AND IMPLEMENTATION OF THE AUTOMATED AND INTELLIGENT ORACLE-BASED TESTING TOOL   91
    5.1   The Design of the Oracle-Based Testing Tool   91
          5.1.1   Define the I/O Equivalence Classes   92
          5.1.2   Determine the I/O Relationships   94
          5.1.3   Provide the Reduced Datasets   95
          5.1.4   Generate the Complete Dataset (Training Samples) Automatically   96
    5.2   The Implementation of the Oracle-Based Testing Tool   99
          5.2.1   Define the Multi-Networks Oracle   99
          5.2.2   Create the Multi-Networks Oracle   102
          5.2.3   The Automated Comparator   102
    5.3   Deploying the Automated Oracle   103
    5.4   Summary   105

6   EVALUATION   107
    6.1   Experiment   107
          6.1.1   The First Case Study Experiment   108
                  6.1.1.1   Define the I/O Equivalence Classes   110
                  6.1.1.2   Determine the I/O Relationships   110
                  6.1.1.3   Provide the Reduced Dataset   111
                  6.1.1.4   Generate the Complete Dataset   113
                  6.1.1.5   Define the Oracle   113
                  6.1.1.6   Make the Oracles   115
          6.1.2   The Second Case Study Experiment   118
                  6.1.2.1   Define the I/O Equivalence Classes   128
                  6.1.2.2   Determine the I/O Relationships   129
                  6.1.2.3   Provide the Reduced Datasets   130
                  6.1.2.4   Generate the Complete Dataset   134
                  6.1.2.5   Define the Oracles   134
                  6.1.2.6   Make the Oracle   138
          6.1.3   The Automated Test Driver   140
    6.2   Experimental Results   144
          6.2.1   Implementing the Evaluation Model   144
                  6.2.1.1   Mutation Testing the First Case Study   145
                  6.2.1.2   Mutation Testing the Second Case Study   145
                  6.2.1.3   Thresholds   148
          6.2.2   Quantitative Results   150
                  6.2.2.1   The First Case Study   150
                  6.2.2.2   The Second Case Study   152
          6.2.3   Precision and Accuracy Relationship Dissection   155
          6.2.4   Discussion of the Results   156
          6.2.5   The Comparative Study   157

7   CONCLUSION   162
    7.1   Research Summary and Achievements   162
    7.2   Summary of the Contributions   168
    7.3   Recommendations for Future Research   169
    7.4   Summary   170

REFERENCES   172
Appendices A-G   181-227

LIST OF TABLES

TABLE NO.   TITLE   PAGE

2.1   A comparative study on methods to automate software testing activities (excluding the third activity)   31
2.2   A decision table template for client page testing   35
2.3   Automated Test Oracles Comparative Study   45
3.1   Operational Framework   68
3.2   Data Normalization   72
5.1   The Input Domain Equivalence Classes (D(x))   93
5.2   Tred (W) (Reduced expected output W values)   95
5.3   Tred (Z) (Reduced expected output Z values)   96
5.4   The Complete I/O Dataset (Training Samples)   97
5.5   Normalized Input Values   101
6.1   The first case study Input Domain and D(X)   109
6.2   The first case study Output Domain   109
6.3   Tred (Output #1) (Reduced expected outputs for IsAllowedToRegsiter)   112
6.4   Tred (Output #2) (Reduced expected outputs for MaxAllowedCourses)   113
6.5   Tred (Output #3) (Reduced expected outputs for Discount)   113
6.6   Single-Network Oracle for the first case study   114
6.7   The first case study Multi-Networks Oracle training parameters and MSEs   114
6.8   The second case study D(X) (Input Values)   129
6.9   Tred (Output #1) (Reduced expected output values for Insurance extension allowance)   131
6.10   Tred (Output #2) (Reduced expected output values for Insurance elimination)   132
6.11   Tred (Output #3) (Reduced expected output values for the payment amount)   132
6.12   Tred (Output #4) (Reduced expected output values for the credit)   133
6.13   The second case study Multi-Networks Oracle training parameters and MSEs   135
6.14   Test Driver Usability Survey Questionnaire and Results   142
6.15   The comparison results categorization   145
6.16   The second case study mutants   147
6.17   The first case study Single-Network Oracle evaluation results   151
6.18   The first case study Multi-Networks Oracle evaluation results   151
6.19   The second case study Single-Network Oracle evaluation results   152
6.20   The second case study Multi-Networks Oracle evaluation results   154
6.21   The first case study oracles quality comparison   154
6.22   The second case study oracles quality comparison   155
6.23   A comparative study between the proposed approach and the existing prominent automated oracles   159

LIST OF FIGURES

FIGURE NO.   TITLE   PAGE

2.1   A classification of automated software testing methods   21
2.2   Overview of the regression tester   24
2.3   Automated test generation and reduction using ANNs   26
2.4   Predicting software faults using software metrics   29
2.5   CBR two-group risk classifier   30
2.6   Using an automated test oracle   34
2.7   AI Planning based Test Oracle   38
2.8   Using IFN for running and evaluating test cases   38
2.9   Regression Test Single-Network Oracle   42
2.10   Human Neuron Structure   47
2.11   An Artificial Neuron Structure   48
2.12   Simple Artificial Neuron Structure with Bias   51
2.13   Multilayer Perceptron Network with one hidden layer   55
3.1   Research Design   64
3.2   Single-Network and Multi-Networks Structures   66
3.3   Overview of the Proposed Framework   74
4.1   The Overall Procedure of Developing and Evaluating the Proposed Oracle Model   79
4.2   Elements of the Proposed Oracle Model in detail   82
4.3   A Single-Network Oracle   86
4.4   A Multi-Networks Oracle   86
4.5   The Evaluation Model   88
5.1   The process of applying I/O Relationship Analysis   92
5.2   Sample I/O Relationships within the SUT   94
5.3   Generating the Complete Dataset (i.e. the merging process)   98
5.4   Define the ANNs Using the NeuronDotNet Library   101
5.5   The ANN Training Code   102
5.6   The Automated Comparator Pseudo Code   103
5.7   Automated Test Tool Use Case Diagram   104
5.8   The Automated Test Driver Procedure   105
6.1   The First Case Study I/O Relationships   111
6.2   The first case study Single-Network Oracle error graph   116
6.3   The first case study Multi-Networks Oracle error graph   117
6.4   The Second Case Study Database Tables   120
6.5   The Second Case Study Home Page   121
6.6   The InsuranceOperation Class Scheme   125
6.7   Saina Insurance Automobile Insurance Module Use Case Diagram   127
6.8   I/O relationships of the second case study   130
6.9   Normalizing and preparing the training samples (Second Case Study)   137
6.10   The second case study Single-Network Oracle error graph (8*30*4)   138
6.11   The second case study Multi-Networks Oracle error graphs   139
6.12   The Multi-Networks Oracle Producer   140
6.13   The Automated Test Driver for the Second Case Study   143
6.14   An Automated Test Driver Report Sample   143
6.15   The Altered Comparator Pseudo Code   148
6.16   The Comparator for the Second Case Study   149
7.1   Quality comparison graphs of the First Case Study oracles   166
7.2   Quality comparison graphs of the Second Case Study oracles   167

LIST OF APPENDICES

APPENDIX   TITLE   PAGE

A   InsuranceOperation.cs Class Listing   181
B   The Complete Reduced Sets for the Second Case Study   186
C   DataProvider.cs Class Listing   204
D   Automated Dataset Generation Script   207
E   The MutatedInsuranceOperation.cs Class   211
F   Automated Test Driver   216
G   Published Papers   226

CHAPTER 1

INTRODUCTION

This chapter provides an introduction to the research work reported in this thesis. First, it describes the background of the problem, followed by the statement of the problem and the objectives of the research. Next, it describes the scope of the research work, and finally it explains the significance of the research.

1.1 Background of the Problem

Before investigating the problems in software test oracles, it is important to have an overview of the general aspects of software testing. This section begins with an overview of software testing and its activities. Next, test oracles as an element of software testing are described. The description includes the definition of test oracles, the process of using them, and the challenges of providing an automated test oracle. Each of these is described in the following sections.

1.1.1 Facts or Issues in Software Testing

Since software applications are critical in modern life, it is very important to provide adequate quality and minimize the probability of faults in software products. In order to ensure the quality of a software product, testing and evaluation are considered among the most important activities in software engineering, and there have been many attempts in the software engineering literature to increase the quality of software products.

Software testing is used to improve software quality by finding errors and failures in software applications. Errors, or faults, are any discrepancy between what the software is expected to do and its actual behavior [1]. Software testing has recently become one of the most popular research topics.

In order to detect errors, various levels of testing, such as unit, integration, system and acceptance testing [2, 3], and test methods, such as stress testing [4], load testing [5] and regression testing [6], are combined with different test strategies, such as Black-Box and White-Box testing, to design the test scenarios. Each of them focuses on different methods to identify software faults. For example, Black-Box testing investigates the accuracy of the software behavior without any consideration of how the results are generated. On the other hand, White-Box testing evaluates the internal structure of the Software Under Test (SUT) [1].

Performing an adequate testing process may effectively increase software quality. However, since software testing is an expensive process in terms of time, budget and resources, and the process must balance these factors [7], comprehensive testing may be very difficult in practice. Therefore, many software companies do not pay enough attention to adequately testing their products; hence, their products run a high risk of failing and losing the market [8].

Thus, it is very important to find approaches that decrease the testing cost while increasing the efficiency of the testing process. Test automation is one of the main approaches being applied to facilitate the process. Previous research has shown that automating the test process, or at least a portion of it, can significantly decrease the testing costs [9-11]. In automated testing, an attempt is made to transform the testing activities that are done manually into activities that are performed automatically by software applications, using intelligent techniques and algorithms such as Artificial Intelligence and statistical methods [12, 13].

Sometimes human testers can influence the testing process negatively. They may not only increase the testing costs but also decrease the software quality [14]. To put it differently, since most of the testing activities include repetitive tasks and heavy calculations, human testers may degrade both the quality and the speed of software testing. Automation is one of the solutions to decrease the human impact on the testing process.

According to [14], the software testing process can be divided into four activities:

1. Modeling the software environment
2. Selecting test scenarios
3. Test case execution and evaluation
4. Measuring the testing process

The first activity simulates the interactions between the SUT and its related environment. The second activity deals with generating the test cases. The third activity executes the generated test cases and evaluates the results produced by the software. The last activity identifies when the testing may be stopped.

Automated test models try to perform the above activities automatically. The next chapter explains these activities in detail and provides solutions to automate them.

1.1.2 Test Oracles

A test oracle is a mechanism to determine whether an application executed correctly. It is used in the third software testing activity (i.e. test case execution and evaluation). A test oracle is considered a reliable source of how the SUT must operate [14]. It is also expected to provide the correct result(s) for any input(s) specified in the software specifications, and a comparator to verify the actual behavior. Automated test oracles are helpful in providing an adequate automated testing framework.

After the test cases are executed and the results of the testing are generated, it is necessary to decide whether the results are valid in order to determine the correctness of the software behavior. To verify the behavior of the SUT, correct results need to be compared with the results generated by the software. The results produced by the SUT that need to be verified are called actual outputs, and the correct results that are used to evaluate the actual outputs are called expected outputs. Test oracles are used as a complete and reliable source of expected results and as a tool to verify the correctness of the actual results. Usually, the verifier makes a comparison between actual and expected outputs.

In addition, according to [15], oracle information and an oracle procedure are the building blocks of each test oracle. The former is a source of expected results, and the latter is the comparator. Thus, the activities to provide an oracle and verify the test cases are as follows [16]:

1. Generate the expected outputs
2. Execute the test cases
3. Map the input domain to the expected outputs and fetch the corresponding output for the executed test case
4. Compare the expected and actual outputs
5. Decide whether there is a fault or not

Among these activities, an oracle may deal with all of them except the test case execution activity. In particular, oracles are not responsible for executing the test cases; rather, they are employed to verify the execution results.
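The oracle procedure above can be sketched minimally in Python. Everything here is an invented stand-in (a toy SUT and a trusted formula as the expected-output source), not the implementation developed in this thesis:

```python
# A minimal sketch of the oracle activities listed above.

def sut(x):
    """Hypothetical software under test."""
    return 2 * x + 1

# Activity 1: generate the expected outputs (here, from a trusted formula).
expected_outputs = {x: 2 * x + 1 for x in range(5)}

def verify(test_input):
    actual = sut(test_input)                 # Activity 2: execute the test case
    expected = expected_outputs[test_input]  # Activity 3: map input to expected output
    return actual == expected                # Activities 4-5: compare and decide

print(all(verify(x) for x in expected_outputs))  # → True
```

Note that the oracle itself only supplies steps 1, 3, 4 and 5; the test driver performs the execution in step 2.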

A non-automated test oracle may be the program specifications [17], domain expert knowledge, or the developers' knowledge of how the software must operate [18]. Usually, human oracles are the source used to determine the expected behavior of the software. Nevertheless, considering the size and complexity of modern software, it may be very expensive, unreliable and difficult to use only human oracles. Thus, it is extremely important for the software engineering community to find methods that enable test oracles to be provided automatically.

1.1.3 Test Oracle Challenges

Automated oracle models attempt to automate the related activities as much as possible. Nonetheless, there are some challenges regarding the automation of the oracle-related activities [16, 19, 20]. The first challenge deals with the first activity, which is how to provide the output domain automatically. It could be difficult and expensive to provide the expected outputs manually. In general, expected outputs are generated manually based on software specifications, domain specialist information and programmers' knowledge of how the software should operate [21]. An automated oracle needs automatic output domain generation.

The next challenge is to map the input domain to the output domain automatically, i.e. the third activity in the oracle procedure. A test oracle must provide expected results for any input combination specified by the software specifications. Moreover, it is impossible to provide an automated oracle without automated mapping.

The final challenge is the automated comparator, which deals with the last two activities. Sometimes it is not sufficient to perform a direct point-to-point comparison, and the comparator needs to consider some tolerance when comparing the actual and the expected results. As an illustration, it is possible that the actual and expected outputs differ slightly but can still be considered the same, because the SUT may not need ultimate precision. In particular, some tolerance may be acceptable during the comparison.
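A tolerance-based comparator of this kind can be sketched in a few lines of Python; the tolerance value below is purely illustrative, not a value prescribed by this thesis:

```python
def within_tolerance(actual, expected, tol=0.01):
    """Automated comparator sketch: accept the actual output if it
    differs from the expected output by no more than the chosen
    comparison tolerance."""
    return abs(actual - expected) <= tol

print(within_tolerance(0.501, 0.500))  # → True  (slight difference tolerated)
print(within_tolerance(0.600, 0.500))  # → False (difference signals a fault)
```

Raising `tol` makes the oracle more forgiving; lowering it increases the oracle precision, which is exactly the adjustment the framework exposes.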

Several studies have been conducted to provide automated and intelligent oracles, aiming to increase the quality of software products while decreasing the testing costs. As an illustration, some studies applied Artificial Neural Networks (ANNs) to simulate the behavior of the SUT and used them as oracles [18, 22, 23]. Other studies considered Info-Fuzzy Networks (IFNs) [24, 25] as a replacement for ANNs. Moreover, Decision Tables [26] were studied as well in order to map the input domain to the output domain. These approaches are explained in the next chapter in detail.

Previous studies proved that ANNs are capable of learning and modeling the functional behavior of software from input/expected-output pairs (i.e. training samples), and showed that they are a promising approach to automating the testing process [22, 27-29]. Several attempts have been made to provide automated test oracles using one ANN, and they proved the productivity of neural networks as test oracles [18-20, 22, 23, 29, 30]. However, an ANN-based oracle that is comprised of only one ANN may not be reliable when the SUT is complex. Apart from that, automated methods are needed to create the output domain and the data required to produce the ANN-based oracle itself. Therefore, previous ANN-based oracles have two issues. First, they did not discuss how to provide the training samples and the output domain independently of the testing methodology being applied; hence, the frameworks they proposed are not complete. In particular, previous ANN-based oracles cannot address the first challenge, i.e. output domain generation, except in regression testing [23]. Second, they may fail to provide a high-quality oracle when the complexity of the SUT increases or the size of the I/O domain is huge, because they applied only one ANN to approximate the SUT.

In this thesis, any ANN-based oracle that uses only one ANN to model the SUT and make the oracle, regardless of the ANN type, is called a Single-Network Oracle [20]. Note that an ANN may be comprised of one or many connected layers, in which each layer is composed of one or many neurons [87, 88]. In particular, the single ANN that is applied to make the oracle can be any type of neural network with any structure, such as different numbers and levels of layers and neurons, different learning algorithms, activation functions and so on, but the entire structure should act as a single, standalone neural network. However, most of the previous studies considered Multi-Layer Perceptron networks with a Sigmoid activation function. To put it differently, they used a three-layer neural network, but all of the layers form a single ANN.
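The Single-Network structure can be illustrated with a pure-Python forward pass through a three-layer perceptron with sigmoid activations. The layer sizes and random weights are illustrative only; a real oracle would be trained on training samples (the thesis's tooling uses the NeuronDotNet library in C#, not this sketch):

```python
import math
import random

random.seed(0)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def make_weights(n_in, n_out):
    # one weight row per neuron; the last entry of each row is the bias
    return [[random.uniform(-1, 1) for _ in range(n_in + 1)]
            for _ in range(n_out)]

def layer(inputs, weights):
    # sigmoid of the weighted sum plus bias, for each neuron in the layer
    return [sigmoid(sum(w * x for w, x in zip(ws[:-1], inputs)) + ws[-1])
            for ws in weights]

n_inputs, n_hidden, n_outputs = 3, 5, 2      # illustrative sizes
w_hidden = make_weights(n_inputs, n_hidden)
w_output = make_weights(n_hidden, n_outputs)

def single_network_oracle(x):
    # the whole structure acts as one standalone network modeling the SUT
    return layer(layer(x, w_hidden), w_output)

outputs = single_network_oracle([0.2, 0.5, 0.9])
print(len(outputs))  # → 2 (one value per modeled SUT output)
```

The key point is that this one network must approximate every output of the SUT at once, which is precisely what becomes unreliable as the SUT grows.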

Providing an accurate ANN-based oracle requires the necessary training samples [18, 23]. These are inputs to the software and the corresponding expected outputs. Normally, large commercial software applications have very large input and output domains, and traditionally the samples are generated by hand, which costs a great deal. Therefore, testers need automated approaches to prepare the expected outputs and training samples in order to model the software behavior and provide an automated ANN-based test oracle.

1.2 Statement of the Problem

Available oracles have some issues. For example, they may be very expensive, unreliable, test strategy dependent, or un-automated. The aim of this research is to investigate a framework based on I/O Relationship Analysis, Multi-Networks ANN-based Oracles and an automated comparator in order to develop an automated test oracle that may address the oracle challenges with minimal human effort. It is expected to decrease the testing cost while increasing the software quality.

Note that the proposed framework cannot automate the entire oracle production process, because it still needs some manual activities to provide the necessary data and set up the environment. However, after the oracle is produced, it can be used automatically. The general research question this research tries to answer is:

How is it possible to develop an automated test oracle framework in order to address the oracle challenges automatically with adequate quality and cost?

The sub-questions of the main research question are as follows.

1. How is software testing performed?
2. What are the software testing activities?
3. What is a test oracle?
4. How are test oracles applied in software testing?
5. What are the challenges in providing an automated test oracle (as explained in Section 1.1.3)?
6. How can an automated oracle with high accuracy be produced in order to overcome the identified challenges and support the software testing process?
7. How can the proposed oracle be evaluated?

1.3 Objectives of the Study

Based on the problem statement mentioned above, this research encompasses a set of objectives that are associated with the milestones of the research process. The research objectives are as follows.

1. To investigate and determine the challenges in the existing software test oracles.
2. To design and propose an automated test oracle framework that addresses the above challenges and considers I/O Relationship Analysis, a Multi-Networks Oracle and an automated comparator.
3. To demonstrate and measure the effectiveness of the proposed approach through the application of the developed tools in the selected case studies, and a comparison with other existing prominent oracles.

1.4 Scope of the Study

The scope of the study includes methods that address the identified oracle challenges. Based on the exhaustive literature review, three challenges in providing an automated oracle were identified. The first challenge is output domain generation, which the proposed approach addresses using I/O Relationship Analysis. The second challenge is I/O mapping, which is handled by Multi-Networks ANN-based Oracles. The comparison challenge, i.e. the last challenge, is overcome by an automated comparator.

The proposed approach can be used to automate the evaluation phase of the third software testing activity mentioned before. In particular, it can be employed as an automated test oracle in order to evaluate the functional behavior of the SUT. It may also decrease the testing cost significantly.

Finally, this research provides an automated test oracle without considering the development environment or the technology and/or programming language used to develop the software. To put it differently, the proposed approach is platform independent and may be used to test software functionalities that are implemented in any programming language.

1.5 Significance of the Study

As explained in section 1.2, previous studies on automated oracles failed to provide a reliable model that can address all of the challenges without some limitations of choosing the test methods. In particular, they are dependent to the testing methods. For example, there are fully automated oracles to be employed in regression testing but they could not be applied for fresh testing. Furthermore, other oracle models are either unreliable or very expensive.

Among several automated oracles, it seems that ANN-based oracles are more acceptable because they are easy to be used and can be cheap in case the required data to make them are already provided; in addition, their productivity as test oracle has been shown as well [18-20, 22, 23, 29, 30]. Although all of the studies on ANNbased oracles highlight the significance of them, none of them provides a complete automated framework because they did not consider the first challenge, the outputdomain generation. They assumed the expected results were preexisted hence they did not discuss how they can be produced to make the oracle. In particular, they only tried to address the second oracle challenge, i.e. the mapping.

Moreover, an oracle consisting of one ANN may not be reliable when the complexity of the SUT increases, because complex software requires larger training samples that may complicate the ANN learning process [94, 97]. A tiny ANN error could increase the oracle misclassification error significantly in large software applications. Finally, ANN-based oracles may not produce exactly the same output vector as the expected results. In particular, there may be a minuscule difference between the expected outputs generated by the ANN-based oracles and the correct ones. A direct comparison may classify these types of results as faulty where they should be perceived as correct, because the SUT does not require ultimate precision.
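To illustrate the point about near-equal outputs, a tolerance-based comparator of the kind needed here might look as follows (a minimal sketch; the tolerance value and the list representation of output vectors are assumptions, not taken from the thesis):

```python
def compare(expected, actual, tolerance=0.01):
    """Classify an actual output vector against the oracle's expected
    vector. A strict equality check flags tiny numeric deviations as
    faults; a tolerance-based check treats them as correct."""
    return all(abs(e - a) <= tolerance for e, a in zip(expected, actual))

expected = [0.500, 1.250]
actual = [0.501, 1.250]            # ANN-generated output differs by 0.001
assert expected != actual          # direct comparison -> reported as "fault"
assert compare(expected, actual)   # tolerance comparison -> "correct"
```

The tolerance would in practice be chosen per application, according to the precision the SUT actually requires.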

Although some automated oracles are available, they have drawbacks and limitations: they are unreliable, expensive, or test-strategy dependent [23, 18]. Therefore, more studies need to be conducted not only to address the oracle challenges in providing an automated oracle, but also to decrease its cost and increase its quality. In this research, an automated test oracle model is proposed to address all of the oracle challenges. I/O Relationship Analysis is used to generate the output domain automatically, and Multi-Networks Oracles are introduced to handle the mapping challenge. The last challenge is addressed using an automated comparator. Two case studies were applied to evaluate the proposed approach; in particular, the quality of the proposed oracle was measured by assessing its accuracy, precision, misclassification error and practicality while testing the case studies. Mutation testing was used to provide the evaluation framework by implementing two different versions of each case study: a Golden Version, a fault-free implementation of the SUT that produces correct expected results, and a Mutated Version, into which some faults were injected.

The effectiveness of mutation testing has been studied before [31-34]. The cost of providing the proposed oracle is low, considering the existing tools that implement I/O Relationship Analysis and ANNs. Moreover, a comparative study between the existing oracles and the proposed one is provided as well, based on how they solve the oracle challenges and the degree of automation they provide.
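The golden-version/mutant comparison described above can be sketched in miniature as follows (a toy illustration; the function and the injected fault are invented, not taken from the case studies):

```python
def original_max(a, b):
    """Golden version: a fault-free implementation."""
    return a if a > b else b

def mutant_max(a, b):
    """Mutated version: the comparison operator is flipped (injected fault)."""
    return a if a < b else b

def oracle_detects(mutant, test_cases):
    """A mutant is 'killed' when some test case makes it disagree
    with the golden version's expected result."""
    return any(original_max(*t) != mutant(*t) for t in test_cases)

# The fault is revealed by any test case with unequal arguments:
assert oracle_detects(mutant_max, [(1, 2), (3, 3), (5, 4)])
```

An oracle's quality can then be judged by how many such injected faults it detects across the mutant set.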

The major differences between this study and previous ones can be summarized as follows:

• The oracle process and its activities are explained in detail.

• The oracle challenges and possible solutions are provided.

• The proposed approach provides an automated framework to create test oracles, taking advantage of two different automatic techniques: ANNs and I/O Relationship Analysis.

• Multi-Networks Oracles are introduced to model the SUT behavior with better accuracy than Single-Network Oracles.

• It addresses the oracle challenges.

• It can be applied to test the SUT regardless of the testing methods.

• It is platform independent.

• To evaluate the proposed approach, two industry-strength case studies were applied.

• A comparative study between the existing oracles and the proposed one is provided.

The results of this research could be useful in decreasing testing cost and time. To put it differently, automated testing has several advantages. First, testing time can be significantly reduced because computers are faster than humans at performing repetitive tasks. Second, fewer human resources may be needed during testing. Third, more aspects of the software under test may be tested. Finally, by reducing the human impact on the process, it is possible to prevent intentional or unintentional human faults. Consequently, testing cost may be reduced while testing quality improves and the software product becomes more reliable.

1.6 Glossary

This section explains some of the terms used in this research; detailed explanations are provided in the following chapters.

• Automatic Testing: Any transfer of a testing task performed by humans so that it is performed instead by computer applications or other automated methods.

• Test Oracle: A complete and accurate source of expected behavior of the SUT. A test oracle is used to compare the expected results with the actual results generated by the SUT. More details are provided in the next chapter.

• Automated Oracle Challenges: The challenges in providing an automated oracle. They are output domain generation, input-to-output domain mapping, and the comparison.



• Artificial Neural Network (ANN): A mathematical model of human neural networks, which can learn from previous experience using input/output pairs in a training phase and generate outputs for unknown inputs based on previous data. An ANN consists of layers, each containing one or more processing units called neurons, and the connections between them. ANNs learn by adjusting the connection weights within the network [35]. For complete details on ANNs and the training algorithm used in this research, please refer to chapter 2.



• I/O Relationship Analysis: A mathematical analysis technique that discovers the relationships between software inputs and outputs in order to generate expected outputs automatically [50]. It generates a reduced set of I/O data and then expands it to create a complete set automatically. The method is used to handle the output domain challenge and to generate the required ANN training samples.



• ANN-based Oracles: Any oracle that uses ANNs to perform the input-to-output mapping, addressing the corresponding oracle challenge. They simulate the software behavior.



• Single-Network Oracles: In this thesis, any ANN-based oracle that uses only one ANN to learn the SUT and map the input vector to the output vector is called a Single-Network Oracle. The ANN can consist of one or many layers according to its type. The reliability of these oracles may not be adequate when the SUT is complex.



• Multi-Networks Oracles: The proposed model of ANN-based oracles that takes advantage of several ANNs instead of one to handle software complexity and provide a more reliable oracle. They consist of several Single-Network Oracles, each of which can differ in structure and type. For example, each of the ANNs can be a standalone multi-layered ANN. Nevertheless, all of the ANNs used in the experiments are Multi-Layer Perceptron networks with a Sigmoid activation function. Please refer to section 2.5.4 for more information regarding multi-layered networks. Multi-Networks Oracles were introduced in order to overcome the limitations of Single-Network Oracles.

• Mutants: Altered versions of the SUT, which are employed in mutation testing. In particular, the SUT source code is modified in order to inject some intentional faults.



• Mutation Testing: A testing methodology to evaluate the effectiveness of a test tool or model by providing some mutants of the SUT [31]. To put it differently, the SUT is altered and some faults are injected into it in order to produce mutants; then the test tool is applied to find the injected faults. Since this research introduces a new testing model whose quality and effectiveness must be assessed, mutation testing is one of the best approaches to verify the proposed oracle model.



• Web Applications: A web application is any software program that uses a browser as its GUI and is usually delivered over the World Wide Web (WWW). A web application performs its operations through scripts and uses HTML to present itself in browsers. These scripts can be server side or client side [36].



• Quality Parameters: The quality factors considered to measure the efficiency of the proposed oracle. They are accuracy, precision, misclassification error and practicality, and they are explained in chapter 5 in detail.
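As a rough illustration of how such quality parameters are computed, the sketch below uses the conventional confusion-matrix definitions; the thesis's exact formulation appears in chapter 5, and the counts here are invented:

```python
def quality_parameters(tp, tn, fp, fn):
    """Conventional definitions from a confusion matrix:
    tp/tn = correct verdicts, fp/fn = wrong verdicts by the oracle."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total              # fraction of correct verdicts
    precision = tp / (tp + fp)                # correct positives among positives
    misclassification = (fp + fn) / total     # fraction of wrong verdicts
    return accuracy, precision, misclassification

acc, prec, err = quality_parameters(tp=40, tn=50, fp=5, fn=5)
assert acc == 0.9 and err == 0.1
assert prec == 40 / 45
```

Note that accuracy and misclassification error are complements under these definitions, which is why both reaching good values together is expected.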

1.7 Thesis Outline

This thesis discusses specific issues associated with software test oracles, especially oracle automation. It also describes the newly proposed automated oracle framework in detail. The thesis is organized as follows:

Chapter 2: It presents the literature review on software testing and test oracles. First, software testing activities are explained and some existing solutions to automate each activity are presented. Next, test oracles are defined and the challenges in automating them are identified. Finally, after some prominent oracles and state-of-the-art automated oracles are reviewed in detail, ANNs are described as one of the tools employed in the proposed approach. Moreover, a comparative study among automated oracle models is presented as well.

Chapter 3: This chapter describes the research design and procedure utilized in this research work. It also describes the sampling, research instruments and evaluation criteria considered in this research. Finally, it explains some research assumptions and limitations.

Chapter 4: It explains the proposed model in detail. The discussion includes the elements necessary to build the proposed oracle. In addition, this chapter describes the evaluation model used to assess the proposed approach, followed by the data analysis.

Chapter 5: This chapter explains the design and implementation of the automated tool that supports the proposed approach. To clarify the implementation, the process of building the proposed oracle model is described using a simple example.

Chapter 6: It explains the evaluation of the proposed approach in detail. First, the two case studies applied to evaluate the proposed oracle are described. Then, the process of implementing the proposed approach is presented for each of the case studies. The evaluation is performed based on the results of the case studies and the synthetic experiment, by measuring the accuracy, practicality, misclassification error and precision of the proposed oracle. At the end of this chapter, the experimental results are provided and discussed in detail. Furthermore, a comparative study among the oracle models mentioned in chapter two and the proposed one is presented.

Chapter 7: Provides the statements on research achievements, contributions and the conclusion of this thesis. It is followed by the research summary and suggestions for future work.

CHAPTER 2

LITERATURE REVIEW

In this chapter, a background on software testing activities and test oracles is provided and a review of automated test oracles is presented. It begins with a discussion of the testing activities and some solutions to automate them. Next, some methods that can be used to automate each activity are presented. Then, after test oracles are defined as an element of the third testing activity, the challenges in providing an automated oracle are discussed. Finally, prominent and state-of-the-art automated oracles are reviewed based on how they manage to handle the challenges. In addition, there is a brief discussion of ANNs at the end of this chapter.

2.1 Software Testing

This section provides a general definition of software testing. During and after coding, the SUT must be evaluated to ensure that it conforms to its specifications and delivers the functionality expected by customers [37, 38]. Verification and Validation (V&V) is the process of this checking and analysis. Validation ensures that the specifications capture what the user expects, and verification checks that they are implemented correctly [8, 39]. The primary goal of the V&V process is to establish confidence that the software works as users expect. V&V starts with requirements validation, continues through each activity of the software process model, and ends with software testing.

Software testing is a process of finding faults that pursues two goals [39]. First, it provides an approach for developers and customers to confirm that the software fits its purpose. Second, it finds faults in the software. Faults or errors are any unexpected software behavior such as system crashes, undesirable interactions, incorrect computations, or faulty outputs. Patton explained that software faults occur when one or more of the following conditions are true [40]:

1. The software does not do something that the product specification says it should do.
2. The software does something that the product specification says it should not do.
3. The software does something that the product specification does not mention.
4. The software does not do something that the product specification does not mention but should.
5. The software is difficult to understand, hard to use, slow, or, in the tester's eyes as an end user, it simply seems not right.

As briefly mentioned in the previous chapter, the software testing process can generally be divided into four main activities. The next sections discuss what these activities are and how to automate them.

2.2 Software Testing Activities

Software testing activities are suggested to determine which problem testers must address before moving to the next one. The following sub-sections explain each activity and the requirements to automate it; some methods to automate each activity are described in the next section.

2.2.1 The First Activity: Modeling the Software Environment

The first activity in software testing is to provide the software environment model. Testers must simulate the relationships and interactions between the software and its environment. Usually these interactions are performed via interfaces such as human, software, file system and communication interfaces. Methods that can simulate these interfaces may be helpful in automating this activity [41].

2.2.2 The Second Activity: Selecting Test Scenarios

The second testing activity is to select and generate test cases. In this activity, testers must select proper (or effective) test scenarios (i.e. test cases) that cover each line of source code, input sequences and execution paths to ensure all software modules are tested adequately. Because the number of test cases can be too large to execute within the limited time of testing, it is important to select test cases that have a higher probability of finding errors. Test case generation methods can be either black-box or white-box. Black-box methods consider software specifications to generate the test cases, such as Equivalence Partitioning and Boundary Value Analysis. On the other hand, white-box methods investigate the source code to generate the test cases, such as Statement and Branch Coverage [42]. Gray-box methods are approaches that are neither completely black-box nor white-box [38].
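As a brief illustration of the black-box methods just mentioned, Boundary Value Analysis and Equivalence Partitioning for a single integer input can be sketched as follows (the valid range and the invalid representatives are invented for illustration):

```python
def boundary_values(low, high):
    """Boundary Value Analysis for an integer input with valid range
    [low, high]: test just outside, on, and just inside each boundary."""
    return [low - 1, low, low + 1, high - 1, high, high + 1]

def partition_representatives(low, high):
    """Equivalence Partitioning: pick one representative per partition
    (invalid-below, valid, invalid-above); offsets are arbitrary choices."""
    mid = (low + high) // 2
    return [low - 10, mid, high + 10]

# For an input valid in [1, 100]:
assert boundary_values(1, 100) == [0, 1, 2, 99, 100, 101]
```

The appeal of both methods is that they shrink a huge input domain to a handful of test inputs with high fault-finding probability.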

It is important to select approaches that generate effective test cases. According to [43], an effective test case should:

1. Have a high probability of finding an error.
2. Not re-evaluate pre-tested sections.
3. Be the best of its breed.
4. Be neither too complex nor too simple.

Thus, approaches for the automatic determination and selection of important test cases are highly demanded [28].

2.2.3 The Third Activity: Running and Evaluating Test Scenarios

The third activity is to execute the selected test cases and monitor the outcomes. After preparing and selecting test cases, testers must execute them on the SUT and evaluate the results to determine whether there is a fault. The expected output domain is provided by a test oracle. The test oracle can be unstructured, such as human oracles or program documentation, or structured, such as Decision Tables and Cause-Effect Graphs (explained later). To automate this activity, methods are required to map the input domain to the corresponding output domain. The input and output domains should cover the entire operational environment. Moreover, an automated comparator to validate the actual output is required as well. In particular, an automated oracle is extremely applicable [23, 44, 45]. The purpose of this research is to present an approach that provides an automated test oracle. More detail about test oracles is presented in section 2.3.

Because of the complexity or uncertainty in software behavior, or lack of complete specifications, sometimes the expected output domain is not clearly defined. Stochastic software modeling methods may address this difficulty [46].

2.2.4 The Fourth Activity: Measuring the Testing Process

The last activity in software testing is to measure the process of testing. It is important to identify the status of the testing process and when it can be stopped. Testers need quantitative measurements to identify the status of the testing process by predicting the number of bugs in the software and the probability that any of these bugs may be discovered. Approaches that automatically predict the number of bugs based on software specifications or previous similar projects are useful. For example, Software Quality Estimation Techniques are considered to automate this activity [47-49].

2.3 A Survey on Methods to Automate the Testing Activities

Software testing activities were explained in the previous section. This section surveys some methods to automate each activity. The methods mentioned here may automate the whole activity or at least some parts of it. Nonetheless, methods to automate the third activity, i.e. automated test oracles, are presented in the next section in detail.

Note that because the scope of this study is limited to automated test oracles, a comprehensive reference to other test automation methods is not presented here. The purpose of this section is only to show some state-of-the-art techniques to automate the testing activities. Dozens of other test automation techniques may exist in the testing literature, but they are not related to our topic. However, a comprehensive survey on automated test oracles is provided in the next section.

A classification of the existing software testing automation tools referenced in this section is shown in Figure 2.1 [50]. The aim of this classification is to explain how different methods were applied to automate each activity. The following sub-sections describe the methods shown in Figure 2.1 in detail.

[Figure 2-1: A classification of automated software testing methods, organized by testing activity — modeling the software environment (AI planning, GUI regression tester); selecting test scenarios (I/O relationship analysis, ANN-based test case generation and reduction, GA-based test case optimization); running and evaluating test scenarios (formal oracles, AI planner test oracles, N-version diverse systems, M-mp program testing, IFN regression tester, automated oracles); and measuring the testing process (quality modeling methods covering linear relationships via regression models and non-linear relationships via CBR and ANNs, quality control, risk analysis, and testability).]

2.3.1 Automated Methods in Modeling the Software Environment

As explained before, the first software testing activity is to model the software environment, which simulates the SUT interactions. Nowadays, most software has Graphical User Interfaces (GUIs). Modeling a GUI is a challenging task in the testing process. A GUI state modeler is explained here in order to deliver proper test cases.

A GUI test case may contain a reachable initial state, a legal event sequence and expected states. The initial state is applied to initialize the GUI in a desired state for a specific test case. An expected state is the state after specific events are executed [51]. Therefore, a modification to the GUI can influence any of these parts and make pre-designed test cases useless.

Regression testing is a process of re-testing software functionalities that remain unchanged in new versions of the application; thus, GUI regression testing is a process of re-evaluating the pre-tested parts of modified versions of the software GUI. The GUI test designer must regenerate the test cases that target these common functionalities; however, keeping track of such parts is an expensive and challenging process. Therefore, usually in practice, no regression testing on the GUI is performed [52]. GUI regression test cases can be divided into two groups: affected and unaffected test cases. Affected test cases must be re-executed, but they need to be redesigned because of the modifications made to the GUI. Unaffected test cases can be executed without any changes, but it is unnecessary to re-run them because they were already evaluated in the previous testing process: they verify functionalities of the software GUI that remain unchanged in the new version. As mentioned before, re-designing the affected test cases can be expensive and difficult [52].

Memon [41] presented a method to perform GUI regression testing using an AI Planner. He defined GUI test cases using tasks, i.e. pairs of initial and goal states. These tasks remain valid in the modified GUI even if the changes render the test cases unusable. Each task represents a GUI functionality; therefore, it is possible to generate the affected test cases from these tasks automatically. In addition, this technique used a GUI model to automatically detect changes to the GUI and identify test cases that must be re-executed.

Furthermore, a Regression Tester was designed by the author to determine and regenerate affected test cases. An overview of this regression tester is shown in Figure 2.2. As can be seen, one of the inputs is the original test suites that were generated to test the original GUI. The other inputs are representations of the original and modified GUIs. The regression tester determines which test cases are affected, unaffected, or must be discarded. Because discarded test cases verify functionalities that no longer exist in the modified software GUI, they can be eliminated from the process. The test case selector partitions the original test suites into 1) unaffected test cases, 2) obsolete-task test cases, 3) illegal-event-sequence affected test cases, and 4) incorrect-expected-state affected test cases. Illegal-event-sequence affected test cases are regenerated by the planning-based test case regenerator. Nevertheless, if the planner fails to find a plan, the test case is marked as discarded because it belongs to an obsolete task. The expected-state regenerator is used to regenerate the expected state for incorrect-expected-state test cases, and the test case is discarded in case it fails.

Finally, this method performs regression testing by re-planning the affected test cases: it associates a task with each test case and creates an interface between the original and modified GUIs to generate the test cases. In other words, this method automates the test case selection activity (i.e. the second of the software testing activities) in GUI regression testing.


Figure 2-2: Overview of the regression tester

2.3.2 Automated Methods in Selecting the Test Scenarios

Test case selection is the second activity of the testing process. Testers consider effective test cases that are likely capable of revealing the majority of software faults. Each test case is defined by a set of inputs and expected output values. In addition, pre- and post-conditions of the test case execution may be specified as well. Since a complete set of test cases is very large in modern software, it is impossible to execute them all within the limited testing time and resources. On the other hand, since most similar test cases evaluate the same sections or behavior of the SUT, there is no need to execute them all. Therefore, testers must wisely select effective test cases with a higher probability of finding faults. If executing a test case does not report any faults, testers must not conclude that the software is fault free and completely reliable; in such situations, testers only waste their time, because it is mathematically impossible to claim that a software application is completely fault free.

Therefore, it is important to determine and select effective test cases, and automating this process can significantly decrease the testing cost and increase its quality. A good effective test case selection approach was introduced by [28]. This research revealed that input-output analysis can identify which input attributes have more influence over the outputs, and showed that I/O analysis may significantly reduce the number of test cases. An ANN was applied to automate I/O analysis by identifying important attributes and ranking them.

This study modeled the software behavior using ANNs and identified which inputs have less influence on producing the results via an ANN pruning algorithm. Pruning an ANN removes unnecessary connections between units but retains significant ones. The removal process deletes less important inputs and decreases the number of test cases. Finally, test cases were generated using the remaining most significant inputs. Figure 2.3 depicts this process.
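The intuition behind this pruning-based reduction can be sketched as follows. This is a simplification: the cited study applies a proper ANN pruning algorithm, whereas this toy version merely ranks input attributes by the total magnitude of their outgoing connection weights in a trained network; the attribute names and weights are invented:

```python
# Hypothetical input-to-hidden weights of a trained network:
# one row of three connection weights per input attribute.
weights = {
    "age":      [0.9, -1.2, 0.7],
    "discount": [0.05, 0.02, -0.01],   # barely influences the outputs
    "quantity": [1.5, 0.8, -0.6],
}

def rank_inputs(w):
    """Rank attributes by total absolute connection weight:
    weak attributes are candidates for removal, shrinking the
    combinatorial space of test cases to generate."""
    score = {name: sum(abs(v) for v in row) for name, row in w.items()}
    return sorted(score, key=score.get, reverse=True)

ranked = rank_inputs(weights)
assert ranked[-1] == "discount"   # weakest input -> prune its test values
```

Dropping one low-influence attribute with, say, ten candidate values reduces the number of combinatorial test cases by a factor of ten.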

Manually searching for test cases that increase condition coverage is difficult when the SUT has many nested decision-making structures. Therefore, an automated approach to generate effective test cases is helpful. Michael et al. [53, 54] introduced an automated Dynamic Test Generator using Genetic Algorithms (GA) to identify effective test cases. Dynamic test generators are white-box techniques that examine the source code of the SUT to collect information about it; this information can then be used to optimize the testing process. The authors applied GA to perform condition coverage for C/C++ programs, identifying test cases that may increase the source code coverage criteria. The problem with this approach is that it may not be reliable for testing complex programs consisting of many nested conditional statements, each with several logical operations. Furthermore, this model is not platform independent.

Taking advantage of the above approach, Sofokleous and Andreou [55] employed GA with a Program Analyzer to propose an automated framework for generating optimized test cases targeting condition coverage testing. The program analyzer examines the source code and provides its associated control flow graph, which can be used to measure the coverage achieved by the test cases. Then, GA applies the graph to create optimized test cases that may reach high condition coverage criteria. The authors claimed that this framework provides better coverage than the previous one when testing complex decision-making structures. However, it is still platform dependent. In addition to the above approaches, Schroeder et al. [56, 57] applied I/O Relationship Analysis to create test cases automatically. Their approach is explained later in detail.

[Figure 2-3: Automated test generation and reduction using ANNs — (1) create and train an ANN using inputs and corresponding outputs; (2) prune the trained ANN and extract the most significant rules; (3) generate test cases based on the remaining rules.]

2.3.3 Automated Methods in Running and Evaluating Test Scenarios

As mentioned before, evaluating test results in the third activity requires accurate oracles. Testers need an approach to generate the expected results for each input vector. Then, they can compare these expected results with the actual outputs, so that a fault may be detected in case they differ. This is where testers need automated test oracles. A detailed process of using an oracle is explained in the next sections, where a comprehensive review of the prominent oracles and state-of-the-art automated ones is presented.

2.3.4 Automated Methods in Measuring the Testing Process

The final testing activity is to measure the testing process and identify when to stop testing. Software Quality Modeling can be used for this: it has many applications in modeling software reliability, predicting a statistical measure of the software reliability and enabling testers to perform quality control and risk analysis. Quality Control can be applied to answer the question "When should the testing process stop?". Answering this question helps testers measure the progress of testing. One approach is to consider Software Metrics.

Prior studies have shown that software metrics are correlated with the number of faults [47, 49, 58]. These metrics are quantitative descriptors of module attributes. Therefore, software metrics can be applied to predict the number of faults in program modules; furthermore, testers can evaluate the quality level of the SUT and decide when to stop the testing process based on previous experience. Similarly, software metrics can be applied to perform risk analysis, which helps testers identify risky modules and pay special attention to them.

There are two types of methods for performing quality modeling automatically: first, methods that model linear relationships between input and output patterns, such as regression analysis, and second, methods that model non-linear relationships, such as ANNs and Case-Based Reasoning (CBR) [59]. A CBR system is a computationally intelligent expert system that finds solutions for new problems based on the recorded solutions of similar past problems; these prior experiences are represented as cases in a library. A CBR system consists of a case library, a solution process algorithm, a similarity function, and the associated retrieval and decision rules. CBR is useful when the environmental knowledge is insufficient and an optimal solution is not known. To put it differently, CBR is an automated reasoning process aimed at solving new problems [49]. Since the relationships between software metrics and quality factors are usually complex and non-linear, approaches that use the latter methods have better accuracy.
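A minimal sketch of CBR-style retrieval for fault prediction follows; it is illustrative only — the case library, the metrics, and the nearest-neighbor similarity function are invented, not taken from the cited systems:

```python
# Case library: (software metrics of a past module, observed fault count).
case_library = [
    ({"loc": 120, "complexity": 4},  2),
    ({"loc": 900, "complexity": 15}, 11),
    ({"loc": 300, "complexity": 7},  4),
]

def similarity(a, b):
    """Negative Euclidean distance over shared metrics:
    larger value means more similar cases."""
    return -sum((a[k] - b[k]) ** 2 for k in a) ** 0.5

def predict_faults(metrics):
    """Retrieve the most similar past case and reuse its solution."""
    _, faults = max(case_library, key=lambda c: similarity(metrics, c[0]))
    return faults

# A new module resembling the large, complex past case:
assert predict_faults({"loc": 850, "complexity": 14}) == 11
```

A real CBR system would also weight and normalize the metrics and adapt the retrieved solution, but the retrieve-and-reuse cycle is the core idea.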

Khoshgoftaar et al. [58] proposed a method applying ANN and Regression Modeling to predict the number of faults in the SUT, comparing the results of both methods using software metrics. The process is shown in Figure 2.4. Software metrics were provided as inputs to a trained ANN and as the independent variables of the regression model. The outputs of the ANN and the regression model (the dependent variable) are an approximation of the number of faults in the module under test. By comparing the fault predictions of both methods, this study showed that the ANN prediction was superior to the regression model. In addition, in regression modeling testers must manually choose which metrics are related to program quality and affect fault prediction. On the other hand, since effective metrics are selected automatically during the ANN learning process by adjusting the network parameters, the metric selection activity is not necessary for the ANN-based approach.

Modern complex software systems may have a large number of quality metrics while some of them have little influence over fault prediction. Thus, modeling the quality control may need a lot of processing resources. As a result of a study conducted in [60], Principal Component Analysis (PCA) was suggested to reduce the number of software metrics and to derive the most important and effective metrics for modeling the quality of the software. PCA is a statistical technique for finding patterns in data of high dimension and expressing the data in such a way as to highlight their similarities and differences. Once these patterns are found, PCA compresses the data by reducing the number of dimensions without much loss of information [61]. Considering an $n \times m$ matrix, it is possible to reduce it to an $n \times p$ matrix ($p < m$).

[...]

The training set is $T = \{\langle i_1, b_1 \rangle, \langle i_2, b_2 \rangle, \ldots, \langle i_d, b_d \rangle\}$, where each $\langle i_n, b_n \rangle$ is a training sample. Therefore:

$$i^T w = b \;\Rightarrow\; w = (i^T)^{-1} b$$

2.5.3.2 Repetitive Procedures (Gradient Descent)

Before explaining the procedure, the gradient vector $\nabla_x F$ must be defined:

$$\nabla_x F = \frac{dF(x)}{dx} = \begin{bmatrix} \dfrac{dF(x)}{dx_1} \\ \dfrac{dF(x)}{dx_2} \\ \vdots \\ \dfrac{dF(x)}{dx_d} \end{bmatrix}$$

It shows the direction of maximum change of the function F. The Gradient Descent procedure is initialized by choosing random weight values and generating the network outputs. Then ∇w F is calculated in a repetitive procedure that proceeds to decrease the network error. In other words, the procedure changes w to minimize the difference between the generated and expected outputs. The error of the network for wn is e_w = i wn − b, where e_w is the function being minimized. Therefore:

wn+1 = wn − α ∂e_w/∂w = wn − α i

The network error is any difference between the network output (o) and the expected output (t):

e_p = (1/2) (t^p − o^p)^2

The total error is:

E = Σ_{p=1}^{n} e_p

Mean Square Error (MSE) is considered to measure the training quality. It is the mean of the squared distances between the results generated by the network and the results from the training samples. It can be calculated as:

E = (1/n) Σ_{p=1}^{n} (t^p − o^p)^2

The basic principle is to calculate the derivative of the error with respect to w and to force the error variation toward zero. To put it differently, we need to calculate ∂(e_p)/∂w_k, where k is the training cycle number:

∂(e_p)/∂w_k = (∂e_p/∂o^p) (∂o^p/∂g^p) (∂g^p/∂w_k)

Consequently, the correction of the j-th weight on the k-th cycle is:

w_j^{k+1} = w_j^k − α [(o^p − t^p) (∂F(g^p)/∂g^p) i_k^p]

Here ∂F(g^p)/∂g^p is the derivative of the activation function and α is the Learning Rate, chosen in (0, 1). It is one of the parameters that determine how fast the ANN learns and how effective the training is. Choosing a value very close to zero requires a large number of training cycles and makes the training process extremely slow. On the contrary, large values may cause the weights to diverge and the objective error function to fluctuate heavily; the resulting network then reaches a state where the training procedure is no longer effective. Typically, the learning rate is chosen between 0.1 and 0.3 in most practical applications.
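As a concrete illustration of the update rule above, the following sketch trains a single sigmoid neuron with gradient descent. The training data (logical AND), seed, learning rate and epoch count are hypothetical choices for the example, not values from the thesis.

```python
import math
import random

def sigmoid(g):
    return 1.0 / (1.0 + math.exp(-g))

def predict(w, inputs):
    x = list(inputs) + [1.0]                        # append the bias input
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

def train_neuron(samples, alpha=0.5, epochs=5000):
    """Gradient descent for one sigmoid neuron:
    w_j <- w_j - alpha * (o - t) * F'(g) * i_j, with F'(g) = o * (1 - o)."""
    random.seed(1)                                  # fixed seed for repeatability
    n = len(samples[0][0]) + 1                      # +1 for the bias weight
    w = [random.uniform(-0.5, 0.5) for _ in range(n)]
    for _ in range(epochs):
        for inputs, t in samples:
            x = list(inputs) + [1.0]
            o = predict(w, inputs)
            for j in range(n):
                w[j] -= alpha * (o - t) * o * (1 - o) * x[j]
    return w

# A single neuron can learn logical AND (a linearly separable function)
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_neuron(data)
```

After training, rounding the neuron output reproduces the AND truth table; the same loop fails on XOR, which motivates the multi-layer networks of the next section.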

2.5.4 Multi-Layer Perceptron Networks

Neural networks with only one neuron have limitations due to their inability to model non-linear relationships; as an illustration, the XOR function cannot be modeled by such networks. Hence, Multi-Layer Perceptron networks are one of the most popular types of ANNs to solve the non-linearity problem [99]. Multi-Networks Oracles [94, 97] and Multi-Layer Perceptron networks [84-89, 98, 99] are different concepts. A Multi-Networks Oracle may have several standalone ANNs, each of which may have a different structure and number of layers; it may thus be comprised of several multi-layer networks. For example, one of the ANNs in a sample Multi-Networks Oracle can be a three-layered Perceptron network, another may be a network with only one layer, and a third may be a Perceptron ANN with four layers.

According to [84, 87, 88, 98, 99], a Perceptron ANN is a multi-layer network with no limitation on the number of neurons, but it must have an input layer, one or more hidden (middle) layers, and one output layer. The general model of Multi-layer Perceptron networks is feed-forward with a back-propagation training procedure. In a feed-forward network, the inputs of the first layer are connected and propagated to the middle layers, and from the middle layers to the output layer. In the back-propagation procedure, after the results of the network are generated, the parameters from the last layer back to the first layer are corrected in order to decrease the network error, i.e. the MSE.

Figure 2.13 shows a Multi-layer Perceptron network with one hidden layer [87]. In this figure, X is the input layer, Z is the hidden layer and Y is the output layer. Note that each neuron in the figure is a standalone neuron, so each has its input weights, a bias weight, and an activation function. The bias weight is a constant that works as an offset. The output of each neuron is calculated as:

net_i = Σ_{j=1}^{d} w_j i_j + w_{i0}

where d is the number of inputs to the neuron, w_j is the associated weight, i_j is the input value, and w_{i0} is the bias weight. The activation function is then applied to the neuron output (i.e. net_i).
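The weighted-sum computation above can be sketched directly; the weight, bias and input values below are hypothetical illustration values, and a sigmoid is assumed as the activation function.

```python
import math

def neuron_output(inputs, weights, bias):
    """net_i = sum(w_j * i_j) + w_i0, followed by a sigmoid activation."""
    net = sum(w * i for w, i in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-net))

# net = 0.5 * 0.4 + 1.0 * (-0.2) + 0.1 = 0.1, then sigmoid(0.1)
y = neuron_output([0.5, 1.0], [0.4, -0.2], 0.1)
```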


Figure 2-11: Multilayer Perceptron Network with one hidden layer

Perceptron networks have some considerations [88]:
1. Each neuron of each layer is only connected to neurons of the next layer.
2. All of the neurons must be connected to all of the neurons of the next layer.
3. Neurons of the input layer do not perform any function; their weights are fixed and equal to one, and they do not have a squashing function either.
4. The propagation of the operation is feed-forward. All the neurons (except input neurons) have an adder and an independent squashing function.
5. Each neuron may have an independent bias.
6. The number of hidden layers can be selected as necessary.
7. The number of neurons at each hidden layer can be selected as necessary.

2.5.4.1 Multilayer Perceptron Network Training Algorithm (Back-Propagation)

This section describes the Perceptron training algorithm based on the network shown in Figure 2.13 [87]. The notations used by the training algorithm are explained as follows:

• x: The training input vector.
• t: The expected output vector.
• δ_k: The error term used for w_jk correction, based on the error of output unit Y_k. It is propagated to the hidden layers.
• δ_j: The error term for v_ij correction, based on the error information propagated from the output layer back to hidden unit Z_j.
• α: The learning rate.
• X_i: The i-th input unit.
• v_0j: The bias of hidden unit j.
• Z_j: The j-th hidden unit.
• z_in_j: The network input to Z_j, given by z_in_j = v_0j + Σ_i x_i v_ij.
• z_j: The output signal of Z_j, determined by z_j = f(z_in_j).
• w_0k: The bias of output unit k.
• Y_k: The k-th output unit.
• y_in_k: The input to Y_k, calculated by y_in_k = w_0k + Σ_j z_j w_jk.
• y_k: The output signal of Y_k, calculated by y_k = f(y_in_k).

There are three steps to train a network using the back-propagation procedure: first, feed-forward the inputs of the learning patterns; second, back-propagate the resulting error; and finally, adjust the weights.

During the feed-forward procedure, each input unit X_i distributes its signal to each hidden unit Z_1 … Z_p. Then, each hidden unit calculates its activation function and sends the resulting signal to each output neuron. Finally, each output neuron Y_k applies its activation function and generates y_k as the network's response to the input signal.

During the training process, each output neuron compares y_k to t_k and generates the corresponding error. The factor δ_k (k = 1, …, m) is calculated using this error. δ_k is applied to back-propagate the error into all of the neurons of the previous layers; in addition, it is used to update the weights between the hidden and output layers. Similarly, δ_j is calculated for each hidden unit. Since it is not necessary to back-propagate to the input layer, δ_j is applied to update the weights between the input and hidden layers.

Finally, all of the weights are corrected simultaneously once all of the δ values are calculated. The correction of the weights w_jk (between the hidden and output layers) is based on δ_k and z_j, and that of v_ij (between the input and hidden layers) is based on δ_j and x_i.

The training algorithm is as follows:

Level 0: Initialize the weights with random values.
Level 1: While the end condition is false, repeat Levels 2 through 9 (the end condition is introduced later).
Level 2: For each training pair, do Levels 3 through 8.
Level 3: Each input unit X_i, i = 1, …, n, receives the input signal x_i and distributes it to all units of the next layer.
Level 4: Each hidden unit Z_j, j = 1, …, p, sums its weighted input signals, z_in_j = v_0j + Σ_i x_i v_ij, applies its activation function, z_j = f(z_in_j), and propagates the resulting signal to all units of the next layer.
Level 5: Each output unit Y_k, k = 1, …, m, sums all of its weighted input signals, y_in_k = w_0k + Σ_j z_j w_jk, and applies its activation function, y_k = f(y_in_k), in order to generate the network output.
Level 6: Error back-propagation:
1. Each output unit Y_k, k = 1, …, m, calculates δ_k using the outputs from the training set and the outputs generated by the network, δ_k = (t_k − y_k) f′(y_in_k), and computes the values needed to correct w_jk and the bias w_0k:
2. Δw_jk = α δ_k z_j
3. Δw_0k = α δ_k
4. Finally, it sends δ_k to the neurons of the previous layer.
Level 7: Generating the error and correction data:
1. Each hidden unit Z_j, j = 1, …, p, sums the deltas received from the units of the next layer, δ_in_j = Σ_{k=1}^{m} δ_k w_jk, and multiplies the result by the derivative of the activation function to generate the error term δ_j:
δ_j = δ_in_j f′(z_in_j)
2. Then it calculates the values needed to correct v_ij and the bias v_0j:
Δv_ij = α δ_j x_i
Δv_0j = α δ_j
Level 8: Weights and biases correction:
1. Each output unit updates its weights and bias (j = 0, …, p): w_jk(new) = w_jk(old) + Δw_jk
2. Each hidden unit updates its weights and bias (i = 0, …, n): v_ij(new) = v_ij(old) + Δv_ij
Level 9: Test the end condition; when it is true, the training algorithm ends.

Note that, during implementation, separate arrays must be built for the output unit deltas (Level 6, δ_k) and the hidden unit deltas (Level 7, δ_j). An epoch is a complete cycle over the training set, and back-propagation may need many epochs to train an ANN. Usually these epochs continue until the network error, i.e. the MSE, reaches a desirable minimum or zero (the end condition). Moreover, the algorithm also ends if the average error value does not change over several epochs.
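The level-by-level procedure can be condensed into a short sketch, assuming a sigmoid activation f and the XOR function as a toy training set (the classic example that a single neuron cannot model but a one-hidden-layer network can). This is an illustrative implementation with hypothetical hyperparameters, not the NeuronDotNet-based code used later in the thesis.

```python
import math
import random

def f(x):  # sigmoid activation; f'(x) = f(x) * (1 - f(x))
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, v, w):
    """Levels 3-5: feed-forward. v is (n+1) x p (row 0 holds the biases v_0j),
    w is (p+1) x m (row 0 holds the biases w_0k)."""
    z = [f(v[0][j] + sum(x[i] * v[i + 1][j] for i in range(len(x))))
         for j in range(len(v[0]))]
    y = [f(w[0][k] + sum(z[j] * w[j + 1][k] for j in range(len(z))))
         for k in range(len(w[0]))]
    return z, y

def train(samples, p, alpha=0.5, epochs=5000, seed=1):
    random.seed(seed)                               # Level 0: random initialization
    n, m = len(samples[0][0]), len(samples[0][1])
    v = [[random.uniform(-0.5, 0.5) for _ in range(p)] for _ in range(n + 1)]
    w = [[random.uniform(-0.5, 0.5) for _ in range(m)] for _ in range(p + 1)]
    for _ in range(epochs):                         # Level 1: repeat until stop
        for x, t in samples:                        # Level 2: each training pair
            z, y = forward(x, v, w)
            # Level 6: delta_k = (t_k - y_k) f'(y_in_k), with f'(.) = y(1 - y)
            dk = [(t[k] - y[k]) * y[k] * (1 - y[k]) for k in range(m)]
            # Level 7: delta_j = (sum_k delta_k w_jk) f'(z_in_j)
            dj = [sum(dk[k] * w[j + 1][k] for k in range(m)) * z[j] * (1 - z[j])
                  for j in range(p)]
            # Level 8: apply all weight and bias corrections
            for k in range(m):
                w[0][k] += alpha * dk[k]
                for j in range(p):
                    w[j + 1][k] += alpha * dk[k] * z[j]
            for j in range(p):
                v[0][j] += alpha * dj[j]
                for i in range(n):
                    v[i + 1][j] += alpha * dj[j] * x[i]
    return v, w

def mse(samples, v, w):
    """The MSE end-condition metric from section 2.5.3."""
    total = 0.0
    for x, t in samples:
        _, y = forward(x, v, w)
        total += sum((tk - yk) ** 2 for tk, yk in zip(t, y))
    return total / len(samples)

# XOR: not learnable by a single neuron, learnable with one hidden layer
xor = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
v, w = train(xor, p=3)
```

The deltas are computed for the whole pattern before any weight is touched, matching the requirement above that δ_k and δ_j be stored in separate arrays and the weights corrected simultaneously.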

Weights and biases are usually initialized with small random values. It has been shown that training converges to correct weight values for any random selection of these initial values [87]. To put it differently, choosing any initial values allows the ANN to train; the only difference is that poor values may increase the number of training epochs.

There is a method to select the number of learning patterns. Suppose P is the number of training patterns, W is the number of network weights, and e is the maximum allowed network error; then the following relation must hold in order for the network to work with accuracy 1 − e:

W / P = e  ⟹  P = W / e

As an illustration, if e = 0.1 and W = 80, we need 800 learning patterns (P = 800) to ensure that the ANN works with 90% accuracy. Multi-layer Perceptron networks may have any number of hidden layers. Although in many cases only one layer is enough, sometimes two hidden layers simplify the network learning. However, it is necessary to perform the learning algorithm for each hidden layer.

2.6 Summary

The software testing process can be divided into four main activities: modeling the software environment, selecting test scenarios, test case execution and evaluation, and measuring the testing process. Each testing activity shows what testing problem needs to be addressed before moving to the next activity. This chapter explained each activity and provided solutions to automate them. Then, a survey of some state-of-the-art automated methods that can automate each activity was performed. The methods range from AI methods such as ANNs, Case-Based Reasoning (CBR) and Info Fuzzy Networks (IFN) to statistical methods such as regression modeling and Principal Component Analysis (PCA). IFN is an approach developed for knowledge discovery and data mining. A CBR system is a computational intelligent expert system that finds solutions for new problems based on the provided solutions of similar past problems. PCA is a statistical method to find patterns in high-dimensional data and to express the data in such a way as to highlight their similarities and differences. Some of the methods mentioned here can be applied by any testing strategy, whereas others apply only to specific strategies.

Next, test oracles, which testers use in the third activity, and the process of using them were explained. The challenges of providing an automated oracle were identified through an exhaustive literature review as generating the output domain, mapping the input to the output domain, and the comparison challenge. Some of the previous attempts to handle the oracle challenges were shown, ranging from non-automated models such as human oracles to automated models such as Info-Fuzzy Network regression testers and AI Planning oracles. Finally, a comparative study of the oracle models was presented that highlights their advantages, disadvantages and limitations, and which of the oracle activities each of them can automate.

Although automated test oracles can be achieved by various tools, several studies have shown that ANNs have good abilities to act as test oracles. The previous studies on automated Single-Network ANN-based oracles were surveyed in detail. ANN-based test oracles require training samples to model the software behavior and act as test oracles. However, none of the previous works mentioned how these data can be generated, since they all assumed that the data are already available. Nevertheless, this is not true in most situations, especially when fresh testing is being performed; thus, methods to generate the training samples were discussed as the research gap. Furthermore, Single-Network Oracles may not be effective in case the SUT performs complex operations or has a large I/O domain.

In summary, this chapter performed a comprehensive literature review on automated test oracles and identified the main problems in providing them. Finally, it explained how previous attempts to overcome the identified challenges succeeded or failed. In particular, although some fully automated test oracles exist, they are extremely expensive, unreliable, or limited to a particular testing strategy such as regression testing. Thus, it is very important to propose an approach that provides an automated oracle that not only overcomes the oracle challenges but is also economical, reliable, and independent of the testing strategy.

CHAPTER 3

RESEARCH METHODOLOGY

This chapter describes the research methodology used in this research. First, it describes the research design that is utilized. Next, it explains the operational framework of the work. It also describes some research assumptions and limitations. Finally, it explains the evaluation criteria that are used to evaluate the proposed oracle.

3.1 Research Design

In [90], the authors proposed a classification of the research problems of the software engineering discipline into engineering problems, which are concerned with the construction of new objects such as methodologies, approaches and techniques, and scientific problems, which are concerned with the study of existing objects such as algorithmic complexity, software metrics and testing techniques. In the case of engineering, it is necessary to study the existing methodologies, reflect on them to determine their advantages and disadvantages, and propose a new one that retains the advantages of the methodologies studied and lacks their shortcomings as far as possible [91-93].

This research aims to propose an automated framework for software test oracles. Thus, producing the framework is an engineering problem, and this research focuses on engineering design activities such as modeling, constructing, and evaluating the new object.

The design of the research is presented in Figure 3.1. At the beginning of the research, a literature review was performed. Its aim was to discover the testing activities, the automated methods for each activity, test oracles and their components, the oracle challenges, ways to automate the oracle process, and an approach to provide an automated framework. Several studies highlighted the application of ANNs as automated oracles and showed theoretically that ANNs have good abilities to act as oracles in special situations. After analyzing the test oracle challenges, the problems mentioned in Chapter 1 were discovered. For example, any automated oracle framework must provide the expected output domain automatically, but most previous works did not consider this. Similarly, the ANN-based oracles offered by previous studies may not be adequate for large software applications: previous ANN-based oracles use only one ANN to approximate the SUT, although the ANN can be any type of neural network as required. A survey was performed to find solutions to the identified oracle challenges, and the proposed approach was formulated to address them as follows. The first challenge, i.e. output domain generation, is handled by I/O Relationship Analysis, which is applied to generate the expected output vectors and provides a way to prepare the training samples automatically in order to model an ANN-based test oracle. The second challenge, i.e. the oracle mapping challenge, is addressed by introducing the Multi-Networks Oracle, which enhances the previous model by using several ANNs instead of only one to approximate the SUT. The last challenge is the comparison challenge, which is handled by an automated comparator that applies thresholds to compare the actual and expected output vectors and decides on the results generated by the SUT.

Figure 3.1 summarizes the research design as five phases: Literature Review (surveys on the software testing process and its activities, on automated methods that may automate the testing activities, on test oracles, on the oracle challenges, and on automated methods that may address them); Requirements Analysis (an evaluation of state-of-the-art approaches to automated test oracles, and an inventory and analysis of their problems); Solutions (I/O Relationship Analysis for the first oracle challenge, the Multi-Networks Oracle for the second, and an automated comparator for the third); Implementation (preparing the necessary data using the proposed solution, building the proposed oracle for each case study, building the corresponding Single-Network Oracles for comparison, and providing an automated test driver to conduct the testing automatically); and Evaluation (providing the mutants, using the proposed oracle to find the injected faults, measuring the quality of the proposed oracle, testing the case studies with the Single-Network Oracle in the same way, comparing the quality of the Multi-Networks and Single-Network Oracles, identifying the weak and strong points of the solutions, and providing a comparative study between existing test oracles, the Single-Network Oracle and the proposed oracle).

Figure 3-1: Research Design

To evaluate the proposed framework, the proposed oracle needed to be implemented and examined with adequate case studies. In order to strengthen the proposed approach, we employed two case studies to verify the suggested oracle. Mutation testing is the evaluation method; in particular, first, a mutated version of each case study was provided and injected with some faults. Then, a fault-free version of the SUT was developed as a Golden Version for each case study in order to evaluate the capability of the proposed oracle to find the injected faults. An automated test driver was provided to conduct the testing with the provided oracles automatically. The evaluation was done by testing the case studies, observing the results, establishing the model quality, finding the weak and strong points of the proposed solution, and trying to eliminate the identified weaknesses as much as possible. Furthermore, we measured the accuracy, practicality, precision and misclassification error of the proposed oracle for both case studies to perform the evaluation thoroughly.

In addition, in order to compare the quality of the proposed Multi-Networks Oracle with the Single-Network Oracle, we provided a Single-Network Oracle for both case studies using the very same data and applied them to test the case studies in the same way as the Multi-Networks Oracle. The quality of the Single-Network Oracles was measured by the same parameters considered for the Multi-Networks Oracles. Then, we compared the results generated by both oracle models to show how efficient the proposed oracle is compared to the previous one. Figure 3.2 shows the structures of the Single-Network and Multi-Networks Oracles. As can be seen in the figure, a Single-Network Oracle consists of one ANN that learns the SUT behaviors, whereas a Multi-Networks Oracle applies several neural networks to do the same job. In order to design the oracles, the required ANNs must be defined first. We considered Multi-Layer Perceptron ANNs to define each ANN in our experiments. Defining the oracle depends on the number of inputs to the oracle and the number of outputs the oracle should provide. As an illustration, the oracle for the first case study receives eight inputs and generates three outputs. Thus, a Single-Network Oracle for the first case study consists of one Perceptron network with eight input neurons and three output neurons. On the other hand, a Multi-Networks Oracle for the same case study consists of three Perceptron networks, each of which has eight input neurons but only one output neuron, because each ANN is responsible for producing only one of the outputs.

Figure 3-2: Single-Network and Multi-Networks Structures

Finally, a comparative study between the existing oracles discussed earlier, the Single-Network Oracle and the proposed oracle is provided as well.

It is possible to modify the solution to improve it in case the evaluation does not provide adequate results. Thus, it is possible to return from the Evaluation phase to the Solutions phase and repeat the cycle as necessary.

3.2 Operational Framework

Based on the research design mentioned above, the operational framework applied in this research was built. This operational framework is based on the research questions that arose at the beginning of this research. From those questions, the research objectives were derived, followed by creating activities that support each objective. Each activity produces some deliverables that can be measured to justify whether the objective has been achieved. The operational framework of this research is shown in Table 3.1 below.

3.3 Sampling

The sampling was based on testing the case studies and observing the test results. Mutation testing was applied to inject some faults into the case studies, and the proposed approach was used to find and record these faults. The samples include:

• The number of identified I/O relationships
• The number of reduced expected outputs
• The number of actual datasets
• The network error rates, i.e. MSE
• The number of faults injected into the SUT
• The number of injected faults that were found by the oracle (practicality)
• The number of injected faults that were missed by the oracle
• The oracle misclassification error rate
• The oracle absolute error rate
• The oracle accuracy
• The oracle precision

Table 3-1: Operational Framework

Research Question 1: What are the challenges to provide an automated test oracle?
• Activities: Building an operational framework; literature study; comparative evaluation of existing approaches.
• Deliverables: Operational framework; results of the comparative evaluation.

Research Question 2: How to produce an automated oracle with high accuracy in order to overcome the identified challenges and to support the software testing process?
• Objective: To develop a new test oracle approach that addresses the identified challenges.
• Activities: Designing the proposed approach.
• Deliverables: Research design; the proposed model scheme.

Research Question 3: How to evaluate the proposed oracle?
• Objectives: (1) To evaluate the practicability of the proposed approach through its application to two case studies and measuring its quality in practice; (2) to highlight the advantages of the proposed model and compare it to state-of-the-art automated oracles.
• Activities: Preparing the case studies; providing the training samples automatically using I/O Relationship Analysis; providing the proposed oracle for each case study; applying the oracles to test the case studies; measuring the quality parameters of the proposed oracles; repeating the same activities to provide the Single-Network Oracles and applying them to the case studies; comparing the results generated by the Single-Network Oracles and the proposed Multi-Networks Oracles; conducting the comparative study.
• Deliverables: The proposed oracles; the Single-Network Oracles; an automated test driver; test results; results analysis; the comparative study.

3.4 Research Instruments

The proposed oracles were modeled using several feed-forward Multilayer Perceptron ANNs implemented with NeuronDotNet, an ANN modeling software package for Microsoft Visual Studio .NET.

Case studies were required to evaluate the proposed approach; therefore, a car insurance web-based application and a student subject-registration verifier were chosen as the case studies. The purpose of the former is to measure the credit amount for customers, to make decisions on credit approval, to renew insurance, and to perform other related insurance operations. The latter makes decisions based on a student's history to approve his/her registration, determines how many courses the student can choose, and whether the student can apply for a discount. The case studies were selected because their complexity can highlight the advantages of Multi-Networks Oracles over Single-Network Oracles in verifying complex software applications. This study considers Cyclomatic Complexity [40], which reflects the structural complexity of program code. In particular, it measures how many different paths must be traversed in order to fully cover each line of the code, and it is one of the parameters used to show how complicated the SUT is: higher numbers mean that more testing is required and that the SUT is more complicated. The Cyclomatic Complexity of the subject-registration application (i.e. the first case study) is 18, while that of the Saina Insurance application (i.e. the second case study) is 38.

Since the case studies were implemented by us, the I/O relationships are predetermined. Visual Studio .NET was used to provide the necessary data automatically and to generalize the I/O relationship analysis results to the whole dataset.

Being one of the latest and most popular platforms for web-application development to date, ASP.NET was used to implement the case studies and the test driver. The implementation was done in Visual Studio, one of the most famous and strongest IDEs by Microsoft. All of the testing tools and required materials were created with the same framework.

3.5 Evaluation Criteria

As mentioned earlier, two case studies were employed in order to evaluate the proposed approach. The first case study is a registration-verifier application implemented to manage student registrations and course selection. The second case study is an insurance application that performs normal car insurance operations. The details of each case study are provided in Chapter 5.

The proposed approach was applied to test both case studies, and the testing results were used to measure the quality of the proposed oracle model. Previous studies show that the following benchmarks can be applied to assess test oracles [18-20, 23, 83, 94]; thus, they are measured by this study as well to demonstrate the quality of the proposed model. Note that some of them are derived from the Software Engineering Product Quality Standard (ISO/IEC 9126) [95].

1. Accuracy: How accurate the oracle results are, i.e. what percentage of the expected outputs generated by the proposed oracle are accurate. It was measured by comparing the correct expected results, derived from the Golden Version, with the results produced by the proposed oracle.
2. Precision: How precise the oracle is; in particular, the precision of the comparator when comparing the expected and actual results. It can be adjusted using the thresholds that set the comparison tolerance.
3. Misclassification Error: The amount of false reports produced by the comparator.
4. Practicality (or Usability): The amount of injected faults identified by the proposed oracle. Note that it is different from accuracy, because accuracy considers both successful and unsuccessful test cases whereas practicality only considers unsuccessful test cases.

The evaluation process is explained in detail in the following chapters.
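Assuming each test case yields a pair (oracle verdict, ground truth from the Golden Version), the benchmarks above can be counted as in the following sketch; the function and its exact counting rules are an illustration of the informal definitions, not the thesis' measurement code.

```python
def oracle_quality(verdicts):
    """verdicts: list of (oracle_says_fail, truly_faulty) booleans, one per
    test case. Counting follows the informal definitions in the text; the
    thesis' exact formulas may differ."""
    total = len(verdicts)
    correct = sum(1 for said, truth in verdicts if said == truth)
    faults = sum(1 for _, truth in verdicts if truth)
    caught = sum(1 for said, truth in verdicts if said and truth)
    return {
        "accuracy": correct / total,                      # correct verdicts overall
        "misclassification_error": 1 - correct / total,   # false reports
        "practicality": caught / faults if faults else 1.0,  # injected faults found
    }

# Hypothetical run: 4 test cases, 3 on faulty mutants, one fault missed
q = oracle_quality([(True, True), (False, False), (False, True), (True, True)])
```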

3.6 Assumptions and Limitations

As with any other research, there are some assumptions and limitations in this research too. The basic assumption of this research was to present an ANN model of the SUT, at low cost, that simulates the behavior of the SUT and is capable of generating the expected output domain. This model then acts as a test oracle and is applied to automate the third activity of the software testing process.

Based on the above assumptions, the limitations are explained as follows:

• This framework is only applicable to testing data-centric applications. To put it differently, it is unable to test flows of events. For example, it is not possible to verify an actual result that is not numeric data and cannot be normalized into numeric data.
• The quality of the proposed approach is directly related to the quality of the inputs and of the conditions on which the results are generated. For the best accuracy, there must not be any incoherencies or ambiguities within the SUT's result-generation logic. Furthermore, the training samples must include all of the input combinations and corresponding outputs in order to simulate the SUT behavior completely. Finally, the ANN structures and training procedure are important too, because any inadequate training may decrease the oracle quality significantly.
• Since this approach is a black-box test method that only verifies the final outputs, it has no application in white-box or other structural testing methods.

The proposed approach is portable and platform independent. Thus, it can be used to test any type of software application written in any programming language and/or methodology.

3.7 The Proposed Framework

Figure 3.3 shows an overview of the proposed framework. As can be seen, the proposed framework consists of three elements. The first element is training data generation using I/O Relationship Analysis. Before applying I/O Relationship Analysis, testers must perform Equivalence Partitioning manually in order to identify valid classes and compress the input domain. Equivalence Partitioning is a test design technique in which test cases are designed to execute representatives from equivalence partitions, covering each partition at least once [1, 21, 37, 38, 40, 42, 75, 96].
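As a hedged illustration of Equivalence Partitioning (the input and the partitions below are invented for the example, not taken from the case studies), a numeric input domain can be compressed into one representative per valid or invalid class:

```python
def partition_age(age):
    """Hypothetical partitions for an 'age' input: invalid, minor, adult, senior."""
    if age < 0:
        return "invalid"
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"

# One representative value per partition covers each class at least once
representatives = [-1, 10, 30, 70]
classes = [partition_age(a) for a in representatives]
```

Testing with the four representatives then stands in for testing the whole (potentially unbounded) input domain, which is what compresses the input domain before I/O Relationship Analysis is applied.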

Next, testers can apply I/O Relationship Analysis to generate the reduced sets, which must be produced manually, and then use them to create the entire I/O domain automatically. Section 4.2.1 defines I/O Relationship Analysis, and section 5.1 uses a small case study to show in detail how the training data can be generated.

The second element is to train the required ANNs and making the oracles. Note that before proceeding with the training, all non-numeric data must be normalized into numeric data. As an illustration, binary "True" and "False" values can be regarded to one and zero. Text based values such as "Sedan" or "MPV" can be defined as numbers. For example, the former may be defined as one and the last as two. Continuous data can be scaled to a range according to the ANN's activation function. As an illustration, in case of using Sigmoid as the activation function, the range must be [0, 1]. Table 3.2 formulates the normalization process. The only manual activity is to define each of the required ANNs and set some parameters as explained in section 5.2.1.

The rest of the training process can be performed automatically using available neural network tools, such as Netlab for Matlab or NeuronDotNet for Visual Studio .NET.

Table 3-2: Data Normalization

      Data Type     Not Normalized Data                          How to Normalize
  1   Binary        True / False                                 1 / 0
  2   Continuous    Continuous numeric values                    Scaled to the activation function range
  3   Text-Based    Characters such as 'A', 'B' and so on, or    Normalized to numbers such as 1, 2,
                    text items such as 'Sedan', 'Full'           and so on

The last element is to apply the proposed oracle to test the SUT and use the automated comparator to verify the actual results. An algorithm for the automated comparator is presented in Figure 5.5. Note that the associated activities can be fully automated if the tester uses a test driver such as the one introduced in section 5.3.

More details of each element are provided in section 4.2. Similarly, the process of providing and using the proposed oracle is discussed in detail in chapter 5, where a small case study is presented to explain the process.

Figure 3-1: Overview of the Proposed Framework
  • Element 1: Training Data Generation: apply Equivalence Partitioning manually, then apply I/O Relationship Analysis according to section 4.2.1.
  • Element 2: Multi-Networks Oracle: normalize the training data, then train the required ANNs.
  • Element 3: Test Case Verification: apply the Multi-Networks Oracle, then verify the test cases using the automated comparator.

3.8 Summary

The research methodology was explained in this chapter. In particular, the research design, operational framework, sampling, research instruments, and evaluation criteria were discussed. Moreover, the assumptions and limitations of the research were considered as well.

The research began with a literature review to survey state-of-the-art automated oracles. Then, available oracle models were studied so that the problems that previous studies were unable to solve could be identified. Through requirements analysis, solutions to the inherent oracle problems were formulated and the process to verify the proposed solution was highlighted.

The proposed solution offers an automated framework for software test oracles. It uses different methods to solve the identified problems. In particular, I/O Relationship Analysis was adopted to address the output domain generation challenge. Multi-Networks Oracles, which use several standalone ANNs in parallel to approximate the SUT, were introduced; they may be able to handle the mapping challenge and improve on Single-Network Oracles, which consist of only one neural network.

Finally, an overview of the proposed framework was presented. It was shown that the framework is composed of three elements: training data generation, Multi-Networks Oracles, and test case verification. The next chapter explains each element in detail.

CHAPTER 4

TEST ORACLE MODELING

This chapter explains the proposed framework in detail. It presents the elements of the proposed model. Furthermore, the Multi-Networks Oracle is described and compared with the Single-Network Oracle. Finally, the evaluation model, which is used to assess the proposed approach, is presented as well.

4.1 The Motivation

As explained before, there are three challenges in providing an automated oracle: output domain generation, input-to-output domain mapping, and the comparison challenge. Chapter two explained how previous studies attempted to overcome these problems. It was shown that some of the proposed oracle models are either extremely expensive to implement or unreliable. Others may only provide a fully automated oracle for specific test strategies. The rest are not fully automated; in particular, they can handle only some of the three challenges above.

Therefore, it is essential to propose an automated oracle model that considers all of the identified challenges and solves the problems left unsolved by previous studies, such as low reliability, high cost, the need for many manual activities, or test level dependency. This chapter describes the details of the proposed framework. Note that the limitations of the proposed oracle were already discussed in section 3.6.

The proposed oracle model in this research consists of three elements. Each element employs a tool to address one of the challenges. The first is I/O Relationship Analysis, to solve the domain generation challenge. Among the several tools considered to solve the mapping challenge, ANNs are promising because they are easy to implement and can be economical if the data required to build them are available [18-20, 22, 23, 29, 30]. However, the ANN-based oracles used previously may not be effective enough to test complex software applications because they are limited to only one ANN. Thus, Multi-Networks ANN-Based Oracles are introduced in this research in order to handle the second challenge and overcome the limitations of the previous ones. The last challenge is addressed by an automated comparator that can adjust the precision of the oracle. In the following, the proposed approach is described in detail.

4.2 The Proposed Framework in Detail

A pilot study is generally advised to try out the proposed research instruments and procedures on a small example. It is intended to reveal errors in the design, and unanticipated problems that appear may be solved at this stage, saving time and effort later. In particular, one of the main functions of a pilot study is to evaluate the validity of the research instruments. Figure 4.1 shows the overall procedure we followed to develop and evaluate the proposed oracle model. First, the ANN must model the business rules implemented by the SUT using the generated required data sets. In order to train the ANNs, comprehensive training samples that cover the entire functional behavior of the SUT being modeled by the ANN are necessary. Therefore, to generate a complete test oracle, we need the expected outputs for every input vector mentioned by the program specifications. As mentioned earlier, this combination could be very large in complex software applications, and manual output domain generation may be too costly in terms of human resources, time, and budget. To address this issue, we applied I/O Relationship Analysis to generate a reduced set of the output domain and generalize it to build the complete domain covering all of the output vectors, because they are essential to create the oracle. Once the complete output domain is provided, all of the data, including both input and output vectors, need to be normalized into numeric values, since ANNs can only understand numeric data. Then, the normalized vectors are applied as training samples to train the ANNs. After the ANNs are trained, they can be used as a Multi-Networks Oracle; in particular, the expected output vectors are generated by the trained ANNs.

Once the ANNs were trained, mutation testing was conducted in order to verify the quality of the oracles. Two versions of the case studies were provided: a Golden Version, to create correct expected results and verify the oracle itself, and a Mutated Version, into which some faults were injected.

The final step was the oracle evaluation. The purpose of this step was to measure the quality of the oracle and its ability to find the injected faults. Therefore, the oracle was applied to test the Mutated Version and its results were assessed against the Golden Version results.

In the following, the elements of the proposed oracle model, which are shown in Figure 4.2, are described in detail.


1. Generate Required Dataset: determine the I/O relationships; generate the reduced output vectors manually; generate the entire output domain automatically based on the provided reduced vectors.
2. Normalization: normalize all non-numeric input and output data items into numeric values and numeric sets; create the training samples.
3. ANN Training: feed forward the inputs from the training samples; generate the network results; generate the network error data; back-propagate the error data and adjust the network parameters; repeat the above process to train the other required ANNs.
4. Mutation Testing: create the Mutated Version by injecting some faults (mutants) into the SUT; create the Golden Version to evaluate the oracle itself.
5. Evaluation: execute the test cases on both the Golden Version and the Mutated Version; provide the test case input vector to the proposed oracle; apply the Automated Comparator to the results generated by the Golden Version, the Mutated Version, and the oracle; measure the quality of the proposed oracle.

Figure 4-1: The Overall Procedure of Developing and Evaluating the Proposed Oracle Model

4.2.1 Element 1: Training Data Generation (Applying I/O Relationship Analysis)

Training data generation involves applying equivalence partitioning and I/O Relationship Analysis.

Before proceeding with I/O Relationship Analysis, it is important to apply equivalence partitioning in order to identify valid equivalence classes. Once they are identified, we have to provide the expected results vector and prepare the training samples for each of them. In particular, I/O Relationship Analysis consists of three activities, namely creating the reduced dataset, expanding the reduced dataset, and generating the complete training data. I/O Relationship Analysis may be applied to expected output generation [57]. Large software applications may have thousands of I/O combinations; therefore, it could be very difficult and/or expensive to provide them manually. I/O Relationship Analysis may be applied to automate the output domain generation.

Suppose X = {x1, x2, ..., xn} is the inputs vector and Y = {y1, y2, ..., ym} is the outputs vector. The possible values of each input xi are D(xi), and the possible values of each output yj are D(yj). A complete dataset

T = D(x1) × D(x2) × ... × D(xn)

contains every combination of possible input values. The size of the complete dataset is:

|T| = |D(x1)| × |D(x2)| × ... × |D(xn)|

Let X(y) be all of the inputs of vector X that influence output y; then Tred(y), the set of input combinations that influence output y, is defined as follows:

Tred(y) = D'(x1) × D'(x2) × ... × D'(xn)

where:

D'(xi) = D(xi) if xi ∈ X(y), and D'(xi) = {ci}, with ci ∈ D(xi), if xi ∉ X(y)

that is, every input that does not influence y is fixed to a single representative value ci from its value set.

The expected results for the provided reduced sets can easily be generated using any reliable source, such as the software specifications and/or domain experts. Note that these data must be accurate, since the entire output domain will be generated based on them. If any of the expected outputs for the reduced sets are incorrect, this inaccuracy will be propagated to the oracle, and the oracle will generate incorrect expected results.

After the reduced sets, including their associated results, were provided, it is possible to generalize them by combining the reduced sets to generate the complete expected output domain for the entire I/O domain. This is done by computing the union of the reduced sets:

The Complete Training Samples (i.e. Dataset) T = ⋃ y∈Y Tred(y)

More details of how to apply I/O Relationship Analysis are provided in section 5.1, where a small case study shows how the required sets (i.e. the input, output, and reduced sets) can be defined.
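Under the definitions above, Tred(y) can be sketched as a Cartesian product in which every non-influencing input is pinned to one representative value. The function and variable names below are illustrative, not part of the thesis tooling; the example values mirror the small case study of section 5.1:

```python
from itertools import product

def reduced_set(D, X_y, fixed):
    """Tred(y): all combinations of the inputs in X(y); each input
    outside X(y) is fixed to a single representative value c_i."""
    names = list(D)
    axes = [D[n] if n in X_y else [fixed[n]] for n in names]
    return [dict(zip(names, combo)) for combo in product(*axes)]

# Illustrative domain: A and C influence output W, only B influences Z.
D = {"A": [True, False], "B": [0, 1, 2, 3], "C": ["Manual", "Automatic"]}
fixed = {"A": True, "B": 0, "C": "Manual"}
tred_w = reduced_set(D, {"A", "C"}, fixed)   # 2 x 1 x 2 = 4 combinations
tred_z = reduced_set(D, {"B"}, fixed)        # 1 x 4 x 1 = 4 combinations
```

Here only 4 + 4 = 8 reduced rows need expected outputs supplied manually, instead of the full |T| = 2 × 4 × 2 = 16 combinations.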


Figure 4-2: Elements of the Proposed Oracle Model in Detail
  • Element 1: Training Data Generation: apply Equivalence Partitioning and identify valid equivalence classes; create the reduced dataset; expand the reduced dataset; generate the complete training data.
  • Element 2: Multi-Networks Oracle: normalize the training data into training samples; train ANN1 to ANNn through network training cycles, back-propagating the error data to adjust each ANN's weights and using the training data to measure the MSEs, until the oracle is adjusted.
  • Element 3: Test Case Verification: the test case input vector is given to both the Multi-Networks Oracle (which produces the expected output vector) and the SUT (which produces the actual output vector); the comparator compares the two and issues the verdict report.

Once the output domain is generated, we can provide the training samples and normalize them easily. Then, it is possible to proceed with the second step. Note that if testing data that can be used as training samples already exist (i.e. both the input domain and the expected results are available), it is possible to skip the first element (i.e. I/O Relationship Analysis) and proceed with training the ANNs. However, if the expected results are not already provided (i.e. only the input classes are available), I/O Relationship Analysis must be applied to generate them. Naturally, any training data must be normalized as explained in section 3.7.

4.2.2 Element 2: Multi-Networks Oracle

Before moving to the ANN training, all of the non-numeric inputs and outputs must be normalized into numeric values in order to achieve better ANN accuracy. As explained in section 3.7, non-numeric and text data are normalized to numbers, while binary data are treated as zero and one. Similarly, continuous numeric data are scaled to [0, 1]. Then, all of the generated normalized I/O vectors are used as training samples to train the ANNs. By learning the training samples, the ANNs become capable of deciding which output vector must be chosen for each input vector.

In the training phase, each training sample is used to simulate the SUT functional behavior. The input vectors are provided to the network and each ANN generates the output vectors. The results generated by each ANN are compared with the outputs from the training samples. In order to adjust the network parameters, the training algorithm back-propagates the error data into the network and modifies the network parameters automatically. Then, the whole process is repeated until an adequate error, i.e. an adequate MSE, is reached. If the training process is not successful enough (i.e. the MSE remains large), the network structure needs to be modified by a trial and error process until an adequate error is achieved. Note that there is no practical method to automatically determine how low the network error should be, because ANNs are not exact; hence, this must be achieved through trial and error. It also depends on the expertise of the tester in building ANNs. Nevertheless, if the training error is very large, there might be incoherencies or ambiguities within the training samples. For example, there may be two different output vectors for a single input vector.

ANN-based oracles that comprise only one ANN may not be able to model the SUT if the software application is too complicated and generates several results. The main drawback of such oracles is that they learn the entire functionality using only one ANN. Thus, if the number of functionalities or their complexity increases, the single ANN may fail to learn them with enough accuracy. This is why we introduced the Multi-Networks Oracle, which uses several standalone ANNs in parallel instead of one, in order to distribute the complexity of the SUT among several ANNs.

Suppose the output domain of the SUT consists of outputs O1 to On:

Output Domain = {O1, O2, …,On}

In order to test the software using a Single-Network Oracle, the entire I/O domain is modeled using only one ANN. In particular, the ANN must learn the functionalities required to generate all outputs by itself (see Figure 4.3). On the other hand, using a Multi-Networks Oracle, the functionalities associated with each output are modeled by a standalone ANN. In other words, a Multi-Networks Oracle uses several Single-Network Oracles in parallel. Since the complexity of the SUT is distributed among several ANNs, each ANN has less to learn, which eases the training process; thus, it is easier for the ANNs to converge on the training data.

Therefore, because complex software applications may require a huge training dataset, the practicality of Single-Network Oracles to find faults can be reduced. We introduced Multi-Networks Oracles, which use several ANNs instead of one to learn the SUT. To put it differently, a Multi-Networks Oracle is composed of several Single-Network Oracles. A single network is defined for each item of the output domain; then, all of the networks together make the oracle. As an illustration, if the SUT produces seven output items, we need seven ANNs to create the Multi-Networks Oracle. In particular, the complexity of the software is distributed among several networks instead of having a single network do all of the learning. Consequently, separating the ANNs may reduce the complexity of the training process and increase the oracle's practicality to find faults. Note that the training process must be performed for each of the ANNs separately, using the same input vectors but only the output to be generated by that ANN. Figure 4.3 shows a Single-Network Oracle and Figure 4.4 depicts the Multi-Networks Oracle.

There are other ways to distribute the complexity among the ANNs if the software functionalities are increased. For example, it is possible to consider software modules instead of outputs. To put it differently, for each module of the software, we can use an ANN to learn its related functionalities. Consequently, we may decrease the complexity by adding a new ANN to the oracle. Moreover, Multi-Networks Oracles increase the flexibility to use several types of ANNs with different structures and parameters.
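The one-network-per-output decomposition can be sketched as follows. To keep the sketch self-contained, a trivial memorizing table stands in for each trained ANN; the class and method names are illustrative only:

```python
class TableLearner:
    """Stand-in for a trained ANN: memorizes exact input/target pairs."""
    def fit(self, pairs):                       # pairs: [(inputs, target), ...]
        self.table = {tuple(x): t for x, t in pairs}
    def predict(self, x):
        return self.table[tuple(x)]

class MultiNetworksOracle:
    """One standalone learner per output item, queried in parallel."""
    def __init__(self, n_outputs, make_learner=TableLearner):
        self.learners = [make_learner() for _ in range(n_outputs)]
    def train(self, samples):                   # samples: [(inputs, outputs), ...]
        for i, learner in enumerate(self.learners):
            # each learner sees the same inputs but only its own output item
            learner.fit([(x, y[i]) for x, y in samples])
    def expected(self, x):
        return [learner.predict(x) for learner in self.learners]
```

With real ANNs substituted for `TableLearner`, each network would learn one output item of the training samples, and `expected` would assemble the full expected output vector.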

4.2.3 Element 3: Test Case Verification

Once the networks are trained, we can apply them in the testing process. The test cases are executed on the trained networks (the Multi-Networks Oracle) and the SUT simultaneously. The process of using the proposed oracle is as follows: the SUT is executed using the provided test cases; meanwhile, the proposed oracle is given the same test cases. The actual outputs, which are being evaluated, are generated by the SUT, and the expected outputs by the oracle. A comparator is used to apply the thresholds and compare the results. The thresholds define how much distance between the actual and expected outputs may be ignored. To put it differently, they set the comparison tolerance and the oracle precision. The role of the thresholds is discussed in detail in the next chapter. Note that the entire process may be automated, with minimal human effort to prepare the environment and the necessary dataset.
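A threshold-based comparator of the kind described above might look like this minimal sketch; the threshold values shown in the usage example are placeholders:

```python
def compare(actual, expected, thresholds):
    """Per-output verdicts: a distance within the threshold is ignored,
    anything larger is flagged as a possible fault."""
    return ["pass" if abs(a - e) <= t else "fail"
            for a, e, t in zip(actual, expected, thresholds)]
```

For example, `compare([0.98, 0.40], [1.00, 0.70], [0.05, 0.05])` flags only the second output, since a distance of 0.02 falls within the 0.05 tolerance while 0.30 does not.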


Figure 4-3: A Single-Network Oracle (the test case input vector is fed to a single ANN, which produces the whole output vector O1, O2, ..., On)

Figure 4-4: A Multi-Networks Oracle (the test case input vector is fed to ANN1 through ANNn in parallel; each ANNi produces one output item Oi, and together these form the output vector)

4.3 Evaluation Model

In order to evaluate the proposed framework, we used mutation testing to verify the proposed oracle model. Similarly, to verify the quality of the Multi-Networks Oracle, it must be applied in practice to test a software application, because its effectiveness in finding faults must be investigated.

The evaluation model is illustrated in Figure 4.5. In particular, some faults were injected into the SUT to create the Mutated Version. Then, the ability of the proposed oracle to find the mutants was evaluated. The injected faults were chosen from among the most common types of mistakes that programmers make frequently. A Golden Version was required to create the correct expected outputs. Finally, the results from the Golden Version, the Mutated Version, and the proposed oracle were compared using an Automated Comparator. The Automated Comparator measures the distances between the results, compares them to a defined threshold, and decides on the oracle results. The result of the comparator can be one of the following:

1. True Positive: All of the results are the same. Therefore, the comparator reports "No fault". This means there is no fault in either the Mutated Version or the oracle. True Positives represent the successful test cases.
2. True Negative: Although the expected and the oracle results are the same, they are different from the mutated results. In this case, the oracle results are correct. Therefore, the oracle correctly finds a fault in the Mutated Version.
3. False Positive: Both the oracle and the Mutated Version produced the same incorrect results. Therefore, they are different from the expected results, and faults in the oracle and the Mutated Version are reported. To put it differently, the oracle missed a fault.
4. False Negative: The mutated and the expected results are the same, but they are different from the oracle results. Thus, the comparator reports a faulty oracle.
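The four outcomes can be encoded directly. This sketch assumes numeric (normalized) outputs and a small tolerance for equality; the function name is illustrative:

```python
def verdict(expected, oracle, mutated, tol=1e-6):
    """Classify one (expected, oracle, mutated) result triple
    into the four comparator outcomes."""
    same = lambda a, b: abs(a - b) <= tol
    if same(oracle, expected):
        # oracle agrees with the Golden Version
        return "true positive" if same(mutated, expected) else "true negative"
    if same(oracle, mutated):
        return "false positive"   # oracle and mutant share the same wrong result
    return "false negative"       # the oracle's own result is faulty
```

For instance, `verdict(1.0, 1.0, 2.0)` returns "true negative": the oracle matches the Golden Version and correctly exposes the mutant.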


Figure 4-5: The Evaluation Model (the test case is given to the oracle, the Golden Version, and the Mutated Version; the comparator takes the oracle results, the expected results, and the mutated results and classifies each outcome as True Positive, True Negative, False Positive, or False Negative)

The reports from the comparator were used to measure the quality parameters mentioned in section 3.5. The model quality was also measured through several experiments with different oracle precisions.

In order to verify the proposed oracle model, its ability to find the injected faults was evaluated. The evaluation process was as follows:

1. The test cases were executed on both the Golden Version and the Mutated Version.
2. Similarly, the test cases were provided to the proposed oracle.
3. The Golden Version results were completely fault free and considered as the expected results.
4. All of the results (the oracle results, the mutated results, and the expected results) were compared with each other, and any distance between them greater than a defined tolerance was reported as a possible fault, as explained before.
5. To make sure the oracle results are correct, the outputs of the oracles were compared with the expected results (the Golden Version results), and the squared distances between them were considered as the oracle's absolute error.

In addition, in order to highlight the advantages of the proposed Multi-Networks Oracle, we provided a Single-Network Oracle for each proposed oracle and compared their results with each other.

4.4 Data Analysis

The data collected from the study were quantitative, and were gathered from:

• The percentage of automated dataset generation
• The number of injected faults
• The number of faults that the ANN could find successfully
• The ANNs' MSEs
• The oracle misclassification error rate
• The number of False Positives
• The number of False Negatives
• The number of True Positives
• The precision of the proposed oracle

4.5 Summary

This chapter explained the elements of the proposed oracle model. Each of the elements uses a specific technique to address one of the identified oracle challenges. The first element applies I/O Relationship Analysis to handle the output generation challenge and provide the required training samples automatically. The second element uses the Multi-Networks Oracle to map the input domain to the output domain and improve on the quality of former Single-Network Oracles. The last element employs an automated comparator to address the comparison challenge and applies some tolerance in order to adjust the oracle precision.

The evaluation model, which was used to measure the quality of the proposed oracle, was described as well. The evaluation model uses mutation testing to inject some faults, caused by mutants, into an altered version of the SUT called the Mutated Version, and uses a Golden Version to assess the ability of the proposed oracle to find the injected faults. The quality of the proposed oracle is expressed by its accuracy, practicality, precision, and misclassification error, as explained in the next chapter.

CHAPTER 5

DESIGN AND IMPLEMENTATION OF THE AUTOMATED AND INTELLIGENT ORACLE-BASED TESTING TOOL

This chapter explains the design and implementation of the automated tool that supports the proposed approach. In order to clarify the implementation, the process of providing the proposed oracle model is described using a simple example. An automated test driver and its associated process are presented to show how the testing process can be automated using the proposed oracle model.

5.1 The Design of the Oracle-Based Testing Tool

The design of the proposed oracle begins with providing the training data using I/O Relationship Analysis. The process is shown in Figure 5.1. The first three activities must be performed manually by the tester, but the final activity can be automated, as explained in the following sections.


1. Define I/O Equivalence Classes: define the input set (i.e. X); define the output set (i.e. Y); apply Equivalence Partitioning manually to identify equivalence classes.

2. Determine the I/O Relationships: identify the relationships between the inputs and the outputs; define the inputs that are associated with each output (i.e. X(y)); define the value sets for each input (i.e. D(x)).

3. Provide the Reduced Datasets: define the reduced set for each output (i.e. Tred(y)); assign the correct output value to each input combination mentioned in Tred(y).

4. Generate the complete dataset automatically: calculate the union of the reduced sets according to Figure 5.3.

Figure 5-1: The Process of Applying I/O Relationship Analysis

The above activities must be performed according to the SUT; in particular, in order to apply I/O Relationship Analysis, the I/O domain of the SUT and their relationships are required. Similarly, the numbers of inputs and outputs are necessary to define the ANNs later. Thus, a simple example is used to show how the proposed oracle can be provided.

5.1.1 Define the I/O Equivalence Classes

Let the SUT in our example have three inputs, whose equivalence classes are identified as shown in Table 5.1. For example, input A is a binary parameter, so it can be "True" or "False"; input B values are classified into four equivalence categories; and input C is either "Automatic" or "Manual".

Table 5-1: The Input Domain Equivalence Classes (D(x))

      Input   Values               |D|
  1   A       True, False           2
  2   B       0, 1, 2, 3            4
  3   C       Manual, Automatic     2

Suppose the output domain consists of outputs W and Z. According to section 4.2.1, the following parameters can be defined:

Input Vector X = {A, B, C}
Output Vector Y = {W, Z}
D(A) = {True, False} ⇒ |D(A)| = 2
D(B) = {0, 1, 2, 3} ⇒ |D(B)| = 4
D(C) = {Manual, Automatic} ⇒ |D(C)| = 2

Therefore, the size of the complete dataset is:

|T| = 2 × 4 × 2 = 16

Input A has two equivalence values, B has four, and C has two. As a result, a complete training dataset requires 2 × 4 × 2 = 16 I/O sets to cover the entire I/O combinations of the SUT. Testers would have to provide expected outputs for all 16 cases manually if they did not use I/O Relationship Analysis. Nevertheless, the proposed method applies I/O Relationship Analysis to generate the expected results automatically, as explained in the following.

5.1.2 Determine the I/O Relationships

Let the relationships between the inputs and the outputs be as shown in Figure 5.2.

Figure 5-2: Sample I/O Relationships within the SUT (inputs A and C influence output W; input B influences output Z)

As shown in the previous section, the following sets should be defined:

• The input set is defined as: X = {A, B, C}
• The output set (i.e. the results) is defined as: Y = {W, Z}
• Possible values for each input are:
  D(A) = {True, False}
  D(B) = {0, 1, 2, 3}
  D(C) = {Manual, Automatic}
• Possible values for each output are:
  D(W) = {1, 2, 3, 4}
  D(Z) = {East, West, South, North}
• According to Figure 5.2, the inputs that are associated with outputs W and Z are defined as:
  X(W) = {A, C}
  X(Z) = {B}

Note that if the I/O relationships are unknown and cannot be identified manually, other methods such as Structural Analysis or Execution-oriented Analysis can be applied to determine them. Structural Analysis studies the source code to identify the relationships between inputs and outputs. On the other hand, Execution-oriented Analysis finds the relationships by changing the inputs while the SUT is being executed and studying the changes in the results.
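Execution-oriented Analysis can be sketched as a one-input-at-a-time probe. The function and the toy SUT below are hypothetical illustrations, and note that this simple scheme can miss relationships that only appear through input interactions:

```python
def infer_relationships(sut, domains):
    """Vary one input at a time over its value set while holding the others
    at a baseline, and record which outputs change. `sut` maps an input dict
    to an output dict; `domains` maps each input name to its value set."""
    baseline = {name: values[0] for name, values in domains.items()}
    base_out = sut(baseline)
    influences = {}
    for name, values in domains.items():
        changed = set()
        for v in values[1:]:
            out = sut({**baseline, name: v})
            changed |= {o for o in base_out if out[o] != base_out[o]}
        influences[name] = changed
    return influences

# Toy stand-in for the SUT of this example: W depends on A and C, Z on B.
def toy_sut(inp):
    w = (1 if inp["A"] else 3) + (0 if inp["C"] == "Manual" else 1)
    z = ["East", "West", "South", "North"][inp["B"]]
    return {"W": w, "Z": z}

domains = {"A": [True, False], "B": [0, 1, 2, 3], "C": ["Manual", "Automatic"]}
relationships = infer_relationships(toy_sut, domains)
```

Running this probe recovers X(W) = {A, C} and X(Z) = {B} for the toy SUT.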

5.1.3 Provide the Reduced Datasets

The next activity is to provide the reduced datasets as explained in section 4.2.1. Considering the I/O relationships shown in Figure 5.2, Tred(y), i.e. a reduced set, is defined as the expected outputs that need to be generated based only on X(y). In other words, expected outputs are generated by considering only the inputs that influence them. For example, output Z is influenced by input B. Therefore, Z values need to be generated only for all of B's values, without considering the other inputs. Tables 5.2 and 5.3 show the necessary reduced datasets and their associated output values. For example, for any input combination in which A='True' and C='Manual', output W=1 ('1' is one of the possible values of output W, defined by D(W) in the previous section). These tables must be provided from reliable sources such as the software specifications or domain experts, to ensure the validity of the resulting training data, which will be generated later based on these data. Note that the reduced datasets must be provided separately for each of the outputs, so the first table provides the data for the first output and the second table for the second output.

Table 5-2: Tred(W) (Reduced expected output W values)

      Input A   Input C      Output W
  1   True      Manual       1
  2   True      Automatic    2
  3   False     Manual       3
  4   False     Automatic    4


Table 5-3: Tred(Z) (Reduced expected output Z values)

      Input B   Output Z
  5   0         East
  6   1         West
  7   2         South
  8   3         North

5.1.4 Generate the Complete Dataset (Training Samples) Automatically

As explained earlier, the complete dataset, i.e. the training samples used to create the ANNs, can be generated by merging the reduced sets provided by the previous activities. Figure 5.3 shows the procedure to generate the dataset. Similarly, Appendix D shows the implementation details for performing the process automatically.

The complete I/O dataset for this example is shown in Table 5.4. In this example, the output vectors are manually generated for only eight input vectors. Then, they are applied to provide the other eight output vectors automatically. Consequently, 50% of the output domain is generated automatically. This amount of automation may save a huge amount of time and cost in complex and large software applications in which numerous inputs and outputs are involved. Furthermore, this method may reduce the complexity of output domain generation where there are too many conditions influencing the SUT behavior.

Table 5-4: The Complete I/O Dataset (Training Samples)

  Id   Effective Tred(y)   Input A   Input B   Input C      Output W   Output Z
  1    1, 5                True      0         Manual       1          East
  2    2, 5                True      0         Automatic    2          East
  3    1, 6                True      1         Manual       1          West
  4    2, 6                True      1         Automatic    2          West
  5    1, 7                True      2         Manual       1          South
  6    2, 7                True      2         Automatic    2          South
  7    1, 8                True      3         Manual       1          North
  8    2, 8                True      3         Automatic    2          North
  9    3, 5                False     0         Manual       3          East
  10   4, 5                False     0         Automatic    4          East
  11   3, 6                False     1         Manual       3          West
  12   4, 6                False     1         Automatic    4          West
  13   3, 7                False     2         Manual       3          South
  14   4, 7                False     2         Automatic    4          South
  15   3, 8                False     3         Manual       3          North
  16   4, 8                False     3         Automatic    4          North


Read the generated reduced sets; define the input equivalence classes according to Table 5.1 (D(x)); merge the reduced datasets by producing the union of the reduced sets; create the complete input vector; assign the expected outputs to the corresponding input vectors and create the training samples; save the generated training samples.

Figure 5-3: Generating the Complete Dataset (i.e. the Merging Process)
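For the running example, the merging process reduces to expanding the two reduced tables over the full input product. This sketch (with illustrative variable names) reproduces the complete 16-row dataset of Table 5.4 from the 8 manually provided reduced rows:

```python
from itertools import product

# Reduced sets from Tables 5.2 and 5.3: the only manually provided data
tred_w = {("True", "Manual"): 1, ("True", "Automatic"): 2,
          ("False", "Manual"): 3, ("False", "Automatic"): 4}
tred_z = {0: "East", 1: "West", 2: "South", 3: "North"}

D_A, D_B, D_C = ["True", "False"], [0, 1, 2, 3], ["Manual", "Automatic"]

# Expand automatically into the complete training dataset:
# each output is looked up using only the inputs that influence it
dataset = [((a, b, c), (tred_w[(a, c)], tred_z[b]))
           for a, b, c in product(D_A, D_B, D_C)]
```

The resulting `dataset` has 16 rows; for example, inputs (False, 0, Manual) map to outputs (3, East), matching row 9 of Table 5.4.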

Once the output domain is generated using the merging process presented in Figure 5.3, we can provide the training samples and normalize them easily. Note that the normalization process is required for every non-numeric data item, because ANNs can only learn from numeric data, as explained in the next section. Then, we can proceed with the ANN training.

5.2 The Implementation of the Oracle-Based Testing Tool

In order to implement the oracles designed in the previous section, the ANNs must be defined and trained to create the Multi-Networks Oracle. Then, the comparator must be implemented as well.

5.2.1 Define the Multi-Networks Oracle

Defining the oracle means defining the ANNs that make up the oracle. Each of the ANNs may have a different structure, but they all must have the same input layer because their input domain is the same. The number of required ANNs is equal to the size of the output set (i.e. Y) that was defined in section 5.1.2. As an illustration, for the case study used in this chapter, two ANNs are enough to make the Multi-Networks Oracle. Similarly, the number of inputs to each ANN is the size of the input set (i.e. X), which is three in this example. Suppose both of the ANNs are multi-layer perceptron neural networks. Therefore, the input layer of each ANN has three neurons (one neuron to represent each input), one or more hidden layers (to be determined by trial and error), and one neuron in the output layer (because each ANN generates only one of the outputs). The structure of the input and output layers must be the same in all of the ANNs that make up the Multi-Networks Oracle, but the hidden layer parameters, activation function, learning rate, and the number of training cycles could be different and should be determined by trial and error.

In order to train the ANNs, testers only define the following parameters manually:

•	Number of hidden layers,
•	Number of neurons in each hidden layer,
•	Learning rate,
•	Number of training cycles.

Adjusting the rest of the parameters is performed automatically by the training algorithm and the tool that implements it (NeuronDotNet in our experiments). Nevertheless, the tester needs to monitor the training process and change the above parameters if he/she is not satisfied with the trained ANN. Furthermore, if the training samples are changed due to any software changes, the training process must be repeated.

There are several tools to implement ANNs, such as the Matlab ANN components, NeuronDotNet for Microsoft Visual Studio .Net, and other ANN components in most of the major IDEs. We considered NeuronDotNet because the development environment was Visual Studio .Net, so it was possible to use the provided oracle easily throughout the case studies without requiring any additional integration. Nevertheless, testers can use any ANN production package as required.

NeuronDotNet provides a neural network library for the .Net Framework. The library enables Microsoft Visual Studio .Net to model neural networks easily with little coding. This package is used to implement the ANNs. However, it is possible to use any other existing package as well.

Figure 5.4 shows the scripts to define each of the ANNs. Lines 1 through 5 define the required layers and the connections between them, line 6 creates the multi-layer perceptron network, line 7 sets the learning rate, and lines 8 to 10 declare the training set and I/O vectors. In this example, the input layer has three neurons because the input domain has three inputs (one neuron for each input). The hidden layer and its neurons should be determined by trial-and-error attempts in order to achieve an adequate neural network. The output layer consists of only one neuron because each ANN is responsible for generating one of the outputs.

1.  LinearLayer inputLayer = new LinearLayer(InputNeuronCount);
2.  ActivationLayer hiddenLayer = new SigmoidLayer(HiddenNeuronCount);
3.  ActivationLayer outputLayer = new SigmoidLayer(OutputNeuronCount);
4.  new BackpropagationConnector(inputLayer, hiddenLayer);
5.  new BackpropagationConnector(hiddenLayer, outputLayer);
6.  network1 = new BackpropagationNetwork(inputLayer, outputLayer);
7.  network1.SetLearningRate(learningRate);
8.  trainingSet = new TrainingSet(InputNumber, OutputNumber);
9.  double[] input = new double[InputNumber];
10. double[] output = new double[OutputNumber];

Figure 5-4: Define The ANNs Using NeuronDotNet Library

The complete dataset, which was provided by the earlier design section, was considered as the training samples. Nevertheless, as ANNs can only learn from numeric values, it is necessary to normalize all non-numeric data fields into numeric data (the ANN cannot learn from string inputs such as “Sedan”) before assigning them to the training samples. Table 5.5 shows how the input domain in the given example can be normalized.

Table 5-5: Normalized Input Values

| Input | Original Values | Normalized Values |
|-------|-----------------|-------------------|
| 1 (A) | True | 0 |
|       | False | 1 |
| 2 (B) | 0, 1, 2, 3 | No normalization is required |
| 3 (C) | Manual | 0 |
|       | Automatic | 1 |
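A minimal sketch of this normalization step, mirroring the mappings in Table 5.5; the dictionary and function names are illustrative assumptions:

```python
# Normalization maps mirroring Table 5.5; input B is already numeric.
NORM_A = {True: 0, False: 1}
NORM_C = {"Manual": 0, "Automatic": 1}

def normalize(a, b, c):
    """Map a raw (A, B, C) input vector to the numeric form the ANNs can learn from."""
    return [NORM_A[a], b, NORM_C[c]]

print(normalize(True, 2, "Automatic"))   # [0, 2, 1]
```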

Considering the above scenario, since the output domain has two outputs, two ANNs are required to make the Multi-Networks Oracle. The input and output layers of the required ANNs must have the same number of neurons, but other parameters such as the activation function, learning rate, number of hidden layers, number of hidden neurons and so on can be different as necessary.

5.2.2 Create the Multi-Networks Oracle

After the ANNs are defined, the next step is to initiate the training process in order to make the ANNs. Using NeuronDotNet can automate the training process. Testers only need to set the learning rate and the number of hidden neurons, choose how many training cycles are required, and leave the tool to train the ANNs using the scripts shown in Figure 5.5. However, it is a trial-and-error process to achieve a high-quality network because different parameters must be tried until an adequate one is obtained. The trial-and-error process needs human observation to make the necessary changes. Furthermore, if the SUT is modified such that the changes are reflected in the training samples, the training process must be repeated.

Figure 5.5 shows the required code using the NeuronDotNet library. Lines 1 to 8 define the MSE vector to plot the error curve, and line 9 trains the ANN being modeled with the provided training samples. The training process starts by choosing some random weights and adjusting the weights as the MSE moves toward zero. It concludes when the maximum number of training cycles is reached or the minimum MSE is achieved. After the training is done for all of the ANNs, the networks are ready to use as a Multi-Networks Oracle.

1. double max = 0d;
2. double[] errorList = new double[cycles];
3. network1.EndEpochEvent += delegate(object network, TrainingEpochEventArgs args)
4. {
5.     errorList[args.TrainingIteration] = network1.MeanSquaredError;
6.     max = Math.Max(max, network1.MeanSquaredError);
7.     progressBar.Value = (int)(args.TrainingIteration * 100d / cycles);
8. };
9. network1.Learn(trainingSet, cycles);

Figure 5-5: The ANN Training Code
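Outside NeuronDotNet, the same train-and-record-MSE loop can be sketched in plain Python. The single sigmoid unit and the toy AND training set below are illustrative assumptions, not the thesis code; the point is the per-cycle MSE recording that Figure 5.5 performs in the `EndEpochEvent` handler.

```python
import math
import random

def train_sigmoid_unit(train_samples, cycles=2000, learning_rate=0.5):
    """Train a single sigmoid unit with gradient descent, recording the MSE
    after every training cycle (the role of errorList in Figure 5.5)."""
    random.seed(1)
    w = [random.uniform(-0.5, 0.5) for _ in range(3)]   # two inputs + bias
    error_list = []
    for _ in range(cycles):                             # network1.Learn(trainingSet, cycles)
        sq_err = 0.0
        for (x1, x2), target in train_samples:
            y = 1.0 / (1.0 + math.exp(-(w[0] * x1 + w[1] * x2 + w[2])))
            delta = (target - y) * y * (1.0 - y)        # gradient through the sigmoid
            w[0] += learning_rate * delta * x1
            w[1] += learning_rate * delta * x2
            w[2] += learning_rate * delta
            sq_err += (target - y) ** 2
        error_list.append(sq_err / len(train_samples))  # end-of-cycle MSE
    return w, error_list

# Toy training set (already numeric/normalized): logical AND.
train_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, errors = train_sigmoid_unit(train_samples)
print(errors[0] > errors[-1])   # the MSE curve moves toward zero
```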

5.2.3 The Automated Comparator

The final activity is to provide the automated comparator. It is responsible for comparing the actual and expected results and reporting any deviations. Furthermore, it applies some thresholds to set the oracle precision and the comparison tolerance. More information is provided in the next chapter, where the thresholds are discussed in detail. Figure 5.6 shows the comparator pseudocode.

1. Start
2. Calculate the distance between the expected result and the actual result as the tolerance
3. If the tolerance ≤ threshold then
   a. The actual result is correct
   b. Report the test case as successful
4. Else
   a. The actual result is not correct
   b. Report an incident
5. End If
6. End

Figure 5-6: The Automated Comparator Pseudo Code
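A direct Python rendering of this pseudocode; the default threshold value is an illustrative assumption (the next chapter discusses how the thresholds are actually chosen):

```python
def compare(expected, actual, threshold=0.05):
    """Figure 5.6 comparator: accept the actual result when its distance from
    the oracle's expected result is within the comparison tolerance."""
    tolerance = abs(expected - actual)
    if tolerance <= threshold:
        return "pass"        # report the test case as successful
    return "incident"        # report an incident

print(compare(1.0, 0.98), compare(1.0, 0.80))   # pass incident
```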

5.3 Deploying the Automated Oracle

The resulting oracle can be used to automate the testing process. As an illustration, an automated test driver can be defined to perform the testing activities automatically using the proposed oracle model. Figure 5.7 shows a use case diagram that depicts how the tester uses the automated tool. In particular, testers use the test driver to execute and verify the test cases. The test case verification is performed by the proposed oracle model automatically.

Figure 5-7: Automated Test Tool Use Case Diagram (the Tester uses the Automated Test Driver, which includes the Test Case Execution, Test Case Verification, and Report Generation use cases; Test Case Verification in turn includes the Proposed Oracle)

The automated test driver creates the test cases, executes them on the SUT, asks the proposed oracle to generate the expected outputs, verifies the actual results against the expected results using the defined comparator, and creates a comprehensive report, all automatically. Figure 5.8 shows the automated test driver procedure.

The procedure consists of the following steps:

1. Define the test cases.
2. Load the oracle and give the test cases to the oracle.
3. Get the expected results.
4. Execute the test cases on the SUT and get the actual results.
5. Define the comparator and set the thresholds.
6. Compare the results using the comparator.
7. Create the report.

Figure 5-8: The Automated Test Driver Procedure
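The driver procedure above can be sketched end to end. The lambda stand-ins for the trained oracle and the SUT, and the fixed threshold, are illustrative assumptions:

```python
def run_tests(test_cases, sut, oracle, threshold=0.05):
    """Figure 5.8 driver loop: get expected results from the oracle, actual
    results from the SUT, compare them, and build the report."""
    report = []
    for case in test_cases:
        expected = oracle(case)     # ask the oracle for the expected result
        actual = sut(case)          # execute the test case on the SUT
        verdict = "pass" if abs(expected - actual) <= threshold else "incident"
        report.append((case, expected, actual, verdict))
    return report

# Toy stand-ins: the SUT is seeded with a defect for inputs greater than 3.
oracle = lambda x: 2 * x
sut = lambda x: 2 * x if x <= 3 else 2 * x + 1
verdicts = [v for (_, _, _, v) in run_tests([1, 2, 3, 4], sut, oracle)]
print(verdicts)   # ['pass', 'pass', 'pass', 'incident']
```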

5.4 Summary

This chapter explains how the proposed oracle can be designed and implemented in practice. Since the design and implementation of the oracle require a defined I/O domain and the relationships between inputs and outputs, a simple example was used to help clarify the process.

The design includes the following activities to apply I/O Relationship Analysis in order to provide the required data:

1. Define the I/O equivalence classes
2. Determine the I/O relationships
3. Provide the reduced data sets
4. Generate the complete dataset automatically

The above activities are described in detail and the required materials are provided as well.

Implementing the proposed oracle begins by defining the required ANNs and training them using the data produced in the design section. Then, the automated comparator must be implemented to make the oracle complete. Finally, it is possible to employ the oracle in order to automate the testing process using an automated test driver, as explained in the next chapter.

The next chapter explains how the proposed oracle is applied to two industrial strength case studies and presents the evaluation results.

CHAPTER 6

EVALUATION

This chapter aims to evaluate and validate the effectiveness of the proposed approach. It describes the experiment of applying the proposed approach to the case studies and shows the results. The experiment section explains the case studies and the process of employing the proposed oracle model to test them. The experiment was conducted by providing both a Single-Network Oracle and the Multi-Networks Oracle in order to compare their results, which are shown later in the chapter. At the end of this chapter, a comparative study of the proposed and the existing oracle models is presented.

6.1 Experiment

The first section of this chapter describes the process of implementing the proposed oracle model and the required data in detail. It illustrates how the proposed approach was applied to each of the case studies. As a result of applying the proposed oracle model, it was possible to develop a fully automated test driver that executes the test cases and uses the proposed oracle to evaluate the results one hundred percent automatically. The usability of the automated test driver was assessed using a human survey.

Furthermore, in order to increase the quality of the evaluation, the proposed oracle model was assessed using two different case studies, which are presented in the following.

6.1.1 The First Case Study Experiment

The first case study is a registration-verifier application. The goal of the software is to maintain and manage the students’ records and validate their registrations based on the Iranian Universities Bachelor-Students Registration Policies. The policies require a complex logical process based on the students’ data, which are given to the software as the input vector and consist of eight data items. The software and the oracles implement these rules and make decisions on the validity of registrations, the maximum number of courses the students are allowed to select, and whether they can apply for a discount or not.

Cyclomatic Complexity [40] indicates the structural complexity of program code. In particular, it measures how many different paths must be traversed in order to fully cover each line of the code. It is one of the parameters used to show how complicated the SUT is. Higher numbers mean more testing is required and the SUT is more complicated. The Cyclomatic Complexity of the registration policies implemented by the first case study is 18, which was measured using the Software Measurements package included in Microsoft Visual Studio.

The input domain consisted of eight inputs, as shown in Table 6.1, and the output domain consisted of three outputs, depicted by Table 6.2. The “Values” column is provided considering equivalence partitioning. Similarly, the “|D|” column is the number of the equivalence partitions. For example, input four has three equivalence classes; hence, the first case study behaves the same for any input 4 value less than or equal to 12.

Table 6-1: The first case study Input Domain and D(X)

| Inputs | Description | Values | |D| |
|--------|-------------|--------|-----|
| 1 GPA | The student’s GPA. | ≤17, >17 | 2 |
| 2 Semester | Whether the student applies for a short semester or not. | Short Semester, Normal Semester | 2 |
| 3 | | True, False | 2 |
| 4 | | ≤12, >12 and ≤14, >14 | 3 |
| 5 | The total number of conditional registrations the student has used. | ≤3, >3 | 2 |
| 6 CPA | The student’s CPA. | ≤12, >12 and ≤17, >17 | 3 |
| 7 StudyingMode | The applicant can be a full-time or part-time student. | Full Time (True), Part Time (False) | 2 |
| 8 HasDiscountBefore | Whether the student applied for discount before or not. | True, False | 2 |

Table 6-2: The first case study Output Domain

| Outputs | Description |
|---------|-------------|
| 1 IsAllowedToRegsiter | Is the student permitted to register for the requested semester? |
| 2 MaxAllowedCourses | To determine the maximum number of courses that the student can apply for. |
| 3 Discount | To decide if the student is eligible to apply for discount. |

In the following, the process of designing and implementing the oracle model for the first case study is described.

6.1.1.1 Define the I/O Equivalence Classes

As explained in the previous chapter, a complete set of input and output vectors is required in order to train the ANNs and cover the entire business rules implemented by the case study. According to section 4.2.1, I/O Relationship Analysis requires defining the following variables:

•	Input Vector X = {input 1, input 2, …, input 8}
•	Output Vector Y = {output 1, output 2, output 3}

where the inputs and outputs are defined by Tables 6.1 and 6.2.

According to Table 6.1, and after applying equivalence partitioning, the number of equivalence input vectors (|T|) is determined as:

|T| = |D(1)| × |D(2)| × |D(3)| × … × |D(8)| = 2 × 2 × 2 × 3 × 2 × 3 × 2 × 2 = 576

Therefore, the complete training samples require 576 equivalence input vectors and their corresponding output vectors.
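This count can be checked mechanically; the partition sizes are taken from Table 6.1:

```python
from math import prod

# |D(x)| for each of the eight inputs of the first case study (Table 6.1).
partition_sizes = [2, 2, 2, 3, 2, 3, 2, 2]

t_size = prod(partition_sizes)   # |T| = |D(1)| * ... * |D(8)|
print(t_size)   # 576 equivalence input vectors
```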

6.1.1.2 Determine the I/O Relationships

The case studies were developed by us; thus, the I/O relationships were known:

•	X(output 1) = {input 1, input 2, input 4, input 5, input 7}
•	X(output 2) = {input 2, input 3, input 6}
•	X(output 3) = {input 1, input 8}

For example, output 3 is only related to (i.e. influenced by) inputs one and eight. Figure 6.1 depicts the I/O Relationships of the first case study.

Figure 6-1: The First Case Study I/O Relationships

6.1.1.3 Providing the Reduced Data Set

Considering the I/O relationships in Figure 6.1, the output vectors only need to be generated for the inputs that are related to them, and the rest can be provided automatically by calculating the union of the reduced sets, as shown in the previous chapter. Tables 6.3, 6.4 and 6.5 illustrate the reduced sets for each of the outputs separately; all of the expected outputs were provided by domain experts to make sure they are accurate enough.

Table 6-3: Tred (output 1) (Reduced expected outputs for IsAllowedToRegsiter)

| ID | Input 1 | Input 2 | Input 4 | Input 5 | Input 7 | Output 1 |
|----|---------|---------|---------|---------|---------|----------|
| 1  | >17 | Short Semester  | ≤12 | ≤3 | Full Time | True |
| 2  | >17 | Short Semester  | ≤12 | ≤3 | Part Time | True |
| 3  | >17 | Short Semester  | ≤12 | >3 | Full Time | True |
| 4  | >17 | Short Semester  | ≤12 | >3 | Part Time | True |
| 5  | >17 | Short Semester  | >12 and ≤14 | ≤3 | Full Time | True |
| 6  | >17 | Short Semester  | >12 and ≤14 | ≤3 | Part Time | True |
| 7  | >17 | Short Semester  | >12 and ≤14 | >3 | Full Time | True |
| 8  | >17 | Short Semester  | >12 and ≤14 | >3 | Part Time | True |
| 9  | >17 | Short Semester  | >14 | ≤3 | Full Time | True |
| 10 | >17 | Short Semester  | >14 | ≤3 | Part Time | True |
| 11 | >17 | Short Semester  | >14 | >3 | Full Time | True |
| 12 | >17 | Short Semester  | >14 | >3 | Part Time | True |
| 13 | >17 | Normal Semester | ≤12 | ≤3 | Full Time | True |
| 14 | >17 | Normal Semester | ≤12 | ≤3 | Part Time | True |
| 15 | >17 | Normal Semester | ≤12 | >3 | Full Time | True |
| 16 | >17 | Normal Semester | ≤12 | >3 | Part Time | True |
| 17 | >17 | Normal Semester | >12 and ≤14 | ≤3 | Full Time | True |
| 18 | >17 | Normal Semester | >12 and ≤14 | ≤3 | Part Time | True |
| 19 | >17 | Normal Semester | >12 and ≤14 | >3 | Full Time | True |
| 20 | >17 | Normal Semester | >12 and ≤14 | >3 | Part Time | True |
| 21 | >17 | Normal Semester | >14 | ≤3 | Full Time | True |
| 22 | >17 | Normal Semester | >14 | ≤3 | Part Time | True |
| 23 | >17 | Normal Semester | >14 | >3 | Full Time | True |
| 24 | >17 | Normal Semester | >14 | >3 | Part Time | True |
| 25 | ≤17 | Short Semester  | ≤12 | ≤3 | Full Time | True |
| 26 | ≤17 | Short Semester  | ≤12 | ≤3 | Part Time | True |
| 27 | ≤17 | Short Semester  | ≤12 | >3 | Full Time | True |
| 28 | ≤17 | Short Semester  | ≤12 | >3 | Part Time | True |
| 29 | ≤17 | Short Semester  | >12 and ≤14 | ≤3 | Full Time | True |
| 30 | ≤17 | Short Semester  | >12 and ≤14 | ≤3 | Part Time | True |
| 31 | ≤17 | Short Semester  | >12 and ≤14 | >3 | Full Time | True |
| 32 | ≤17 | Short Semester  | >12 and ≤14 | >3 | Part Time | True |
| 33 | ≤17 | Short Semester  | >14 | ≤3 | Full Time | True |
| 34 | ≤17 | Short Semester  | >14 | ≤3 | Part Time | True |
| 35 | ≤17 | Short Semester  | >14 | >3 | Full Time | True |
| 36 | ≤17 | Short Semester  | >14 | >3 | Part Time | True |
| 37 | ≤17 | Normal Semester | ≤12 | ≤3 | Full Time | True |
| 38 | ≤17 | Normal Semester | ≤12 | ≤3 | Part Time | True |
| 39 | ≤17 | Normal Semester | ≤12 | >3 | Full Time | False |
| 40 | ≤17 | Normal Semester | ≤12 | >3 | Part Time | False |
| 41 | ≤17 | Normal Semester | >12 and ≤14 | ≤3 | Full Time | False |
| 42 | ≤17 | Normal Semester | >12 and ≤14 | ≤3 | Part Time | True |
| 43 | ≤17 | Normal Semester | >12 and ≤14 | >3 | Full Time | False |
| 44 | ≤17 | Normal Semester | >12 and ≤14 | >3 | Part Time | False |
| 45 | ≤17 | Normal Semester | >14 | ≤3 | Full Time | False |
| 46 | ≤17 | Normal Semester | >14 | ≤3 | Part Time | False |
| 47 | ≤17 | Normal Semester | >14 | >3 | Full Time | False |
| 48 | ≤17 | Normal Semester | >14 | >3 | Part Time | False |

Table 6-4: Tred (output 2) (Reduced expected outputs for MaxAllowedCourses)

| ID | Inp2 | Inp3 | Inp6 | Output2 |
|----|------|------|------|---------|
| 49 | Short Semester  | True  | ≤12 | 12 |
| 50 | Short Semester  | True  | >12 and ≤17 | 12 |
| 51 | Short Semester  | True  | >17 | 12 |
| 52 | Short Semester  | False | ≤12 | 6 |
| 53 | Short Semester  | False | >12 and ≤17 | 6 |
| 54 | Short Semester  | False | >17 | 6 |
| 55 | Normal Semester | True  | ≤12 | 24 |
| 56 | Normal Semester | True  | >12 and ≤17 | 24 |
| 57 | Normal Semester | True  | >17 | 24 |
| 58 | Normal Semester | False | ≤12 | 12 |
| 59 | Normal Semester | False | >12 and ≤17 | 20 |
| 60 | Normal Semester | False | >17 | 22 |

Table 6-5: Tred (output 3) (Reduced expected outputs for Discount)

| ID | Inp1 | Inp8 | Output3 |
|----|------|------|---------|
| 61 | ≤17 | True  | False |
| 62 | ≤17 | False | False |
| 63 | >17 | True  | False |
| 64 | >17 | False | True |

6.1.1.4 Generate the complete dataset

The complete training samples can be provided by calculating the union of Tables 6.3 through 6.5 (i.e. ⋃ᵢ Tred(output i)). The reduced dataset had 64 training samples, while the remaining 512 samples (576 − 64 = 512) were generated automatically. Consequently, about 89% of the required data were provided automatically by applying the I/O Relationship Analysis.
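The 64/576 split follows directly from the I/O relationships, since each reduced set only enumerates the partitions of the inputs its output depends on. A sketch of the arithmetic, with partition sizes from Table 6.1 and the relationships from section 6.1.1.2:

```python
from math import prod

# |D(x)| per input (Table 6.1) and X(output i) from section 6.1.1.2.
D = {1: 2, 2: 2, 3: 2, 4: 3, 5: 2, 6: 3, 7: 2, 8: 2}
relationships = {1: [1, 2, 4, 5, 7], 2: [2, 3, 6], 3: [1, 8]}

# Size of each reduced set: product of the related inputs' partition counts.
reduced_sizes = {out: prod(D[i] for i in inputs)
                 for out, inputs in relationships.items()}
manual = sum(reduced_sizes.values())
total = prod(D.values())
print(reduced_sizes, manual, total - manual)   # {1: 48, 2: 12, 3: 4} 64 512
```

The sizes match the tables above: 48 rows for output 1, 12 for output 2, and 4 for output 3.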

6.1.1.5 Defining the Oracle

Since the input domain has eight inputs, the input layers must have eight neurons. Moreover, the output domain has three outputs; therefore, the Multi-Networks Oracle consisted of three ANNs. On the other hand, only one ANN is required if we consider providing a Single-Network Oracle.

Note that all of the ANNs used in the experiments are multi-layer perceptron networks with the Sigmoid activation function. The activation function must be chosen according to the results to be generated by the ANNs. Since all of the outputs for both of the case studies are scaled or normalized to the range [0, 1], Sigmoid is adequate.

As mentioned earlier, the experiment was repeated by providing a Single-Network Oracle in order to compare its results with those of the proposed Multi-Networks one. Table 6.6 shows the structure of the Single-Network Oracle and Table 6.7 depicts the same for the Multi-Networks Oracle. Note that the minimum learning rate that can be applied in NeuronDotNet is 0.01, which may result in a low network error but requires too many training cycles.

Table 6-6: Single-Network Oracle for the first case study

| Input Neurons # | Hidden Neurons # | Output Neurons # | Learning Rate | Training Cycles | MSE |
|-----------------|------------------|------------------|---------------|-----------------|-----|
| 8 | 40 | 3 | 0.01 | 10000 | 0.00589 |

Table 6-7: The first case study Multi-Networks Oracle training parameters and MSEs

| Network | Corresponding Output | Input Neuron # | Hidden Neuron # | Output Neuron # | Learning Rate | Training Cycles | MSE |
|---------|----------------------|----------------|-----------------|-----------------|---------------|-----------------|-----|
| Network 1 | Output #1 (AllowedToRegister) | 8 | 30 | 1 | 0.01 | 10000 | 0.0073 |
| Network 2 | Output #2 (MaxAllowedCourses) | 8 | 30 | 1 | 0.01 | 10000 | 0.00001 |
| Network 3 | Output #3 (Discount) | 8 | 30 | 1 | 0.01 | 10000 | 0.00005 |

Total MSE: 0.0024

6.1.1.6 Make the Oracles

After the ANNs are defined, they need to be trained using the training samples provided in the previous sections. Once the training is done, all of the ANNs together make the oracle, and the oracle is then ready to be employed in the testing process.

The training quality of the neural networks can be illustrated by the Mean Squared Error (MSE). It calculates the squared distances between the training sample outputs and the outputs generated by the network at the end of each training cycle. The network parameters (the number of hidden neurons, training cycles and the learning rate) must be adjusted by trial and error in order to minimize the MSE; however, there is no specific formula to determine the exact parameters. Furthermore, the coverage of the training samples is important to increase the overall network quality. In particular, if we choose training samples that do not cover the entire application logic, it is likely the resulting network will not learn well, even if the MSE is very close to zero. Therefore, the training dataset should cover all of the business rules being modeled by the network.
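For reference, the MSE over a set of training samples is simply the averaged squared distance between the target outputs and the network-generated outputs; a one-function sketch:

```python
def mean_squared_error(targets, outputs):
    """MSE between the training-sample outputs and the network-generated outputs."""
    return sum((t - y) ** 2 for t, y in zip(targets, outputs)) / len(targets)

print(mean_squared_error([1, 0, 1, 0], [1, 0, 1, 1]))   # 0.25
```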

To provide better confidence in the network, we also measured the absolute error rate of the oracles after applying the test cases, which shows the squared distances between the oracle results and the expected results.

Since we covered the entire I/O domain by applying Equivalence Partitioning and applied all of the identified partitions as training samples, network prediction was not required anymore and the network only needed to learn the training samples (i.e. no ANN generalization was necessary). Thus, ANN underfitting and overfitting did not need to be considered. Similarly, as shown by Figures 6.2, 6.3, 6.10, and 6.11, no trapping in local minima was seen during the experiment. Moreover, using Multi-Networks Oracles resulted in converting nonlinear functionalities into linear ones, so the speed of learning may be enhanced to avoid slow learning.

Figure 6.2 shows the training error graph for the Single-Network Oracle and Figure 6.3 shows the same for the Multi-Networks Oracle. The MSEs are presented in Tables 6.6 and 6.7.

Figure 6-2: The first case study Single-Network Oracle error graph


Figure 6-3: The first case study Multi-Network Oracle error graph

The first case study Multi-Networks Oracle was made of three ANNs because the output domain has three outputs. For example, Network 1 in Figure 6.3 is the first ANN, which generates output 1; Network 2 is the second ANN, which generates output 2, and so on. Note that each of the ANNs in the Multi-Networks Oracle was only given its associated output; for example, the training samples for the second ANN, which modeled the second output, contained all of the inputs but only output 2, because that ANN only generates the second output. Thus, output 1 and output 3 were not given to the second ANN during training because it does not need to learn them. In particular, the training sample for the second ANN consisted of the following:

•	Second ANN Training Sample = {Input 1, Input 2, Input 3, Input 4, Input 5, Input 6, Input 7, Output 2}

6.1.2 The Second Case Study Experiment

The second case study was a web-based car insurance application called “Saina Insurance”. The purpose of this application is to manage and maintain insurance records, determine the payment amounts claimed by the customers, and handle their requests for renewal and other related operations. The application was programmed using ASP.Net 3.5, the latest web development technology from Microsoft at that time, and the C#.Net programming language. The DBMS was Microsoft SQL Server 2008 and the IDE was Microsoft Visual Studio 2008. A three-layered architecture was chosen to design the application: the underlying layer was the database, the middle layer consisted of classes that implemented the business rules, and the upper layer provided the user interface. We only considered one module of the application, which provided car-related insurance services, since it was complicated enough for our purpose (the complexity of the case study is discussed later). The module is composed of 15 dynamic web pages and 16 implementation classes, one of which, called “the main class”, was responsible for the related insurance operations. In particular, the main class represents the company rules that perform the insurance policies, which were to be verified by the proposed approach. The testing tool had five web pages and 11 classes to provide the required tools and data, and it was integrated into the case study. One of the testing tools is the automated test driver that executes and verifies the test cases using the proposed oracle. A separate application was developed to make the oracle and provide it to the testing tools.

The second case study was more complicated than the first one. The case study has an I/O domain comprising 13824 equivalence partitions, as explained later. The business layer of the application was implemented in 1384 lines, not including the user-interface layer implementation. Furthermore, the Cyclomatic Complexity of the insurance policies implemented by the case study is 38, which was measured using the Software Measurements package included in Microsoft Visual Studio.Net 2010. The complexity of the second case study highlights the advantages of the proposed Multi-Networks Oracle over the Single-Network Oracles.

The database is composed of five main tables to provide all of the data required by the application. Figure 6.4 depicts the database structure. The tables are:

1. DriverDB Table: This table contains the information of the account holder.
2. Car Table: This table represents the car under insurance that is registered to the account holder.
3. Driver_Car Table: This table provides a relationship between the Car and the DriverDB tables to show which car is owned by the account holder. Two data items represent this table and both are external primary keys.
4. InsuranceDB Table: The InsuranceDB table portrays the insurance account in the system and maintains related information.
5. AccidentDB Table: This table holds the accidents claimed by the car owner. Each accident is represented by a record in this table.


Figure 6-4: The Second Case Study Database Tables

Note that the redundancies in the database are intended to prevent unnecessary database access overhead costs.

As mentioned earlier in this section, there are 15 web pages providing the user interface of the application. Their contents are generated dynamically based on user requests. Each page has a corresponding class handling the page operations. Figure 6.5 shows a snapshot of the case study home page. The web pages are as follows:


Figure 6-5: The Second Case Study Home Page

1. MasterPage.master: This page provides a template for the other pages. It contains the logo, navigation system and footer.
2. Default.aspx: This page is the application home page and provides some information regarding the application and the company.
3. Drivers.aspx: This page shows the driver information and the car(s) owned by the driver. Once a driver is selected, their information is automatically added to the system as a “selected driver”.
4. newDriver.aspx: This page creates a new record in the DriverDB table and registers a new driver in the system.
5. updateDriver.aspx: Whenever the driver’s information is changed, this page can be accessed to update their profile.
6. Cars.aspx: The Cars.aspx page allows the user to search for and select a car based on its number plate.
7. newCar.aspx: This page creates a new record in the Cars table and registers a new car in the system. After registration, the user needs to select the owner of the car once.
8. updateCar.aspx: Whenever the information of the car is changed, this page can be accessed to update its record.
9. Insurance.aspx: The insurance record registered to each car can be shown on this page.
10. newInsurance.aspx: This page creates a new insurance account for the selected car. If the car and/or its owner do not comply with the company’s policy, it is not possible to create the insurance account. The InsuranceOperation.cs class (the main class) checks the request and decides automatically whether the owner is eligible to register for the insurance operation.
11. updateInsurance.aspx: This page can be used whenever the insurance information needs to be overridden manually.
12. extendInsurance.aspx: This page handles the renewal requests. The user can increase/decrease the insured value and choose how many months the insurance remains valid. Once the request is ready, it is sent to the InsuranceOperation.cs class to be checked against the insurance rules. If the owner is eligible to extend the insurance, the real credit is determined by the class automatically based on the company policies. Otherwise, the insurance account cannot be extended and will be eliminated.
13. Accident.aspx: The registered accidents are shown by this page. Users can select a specific accident to have the payment processed.
14. newAccident.aspx: This page adds a new accident for the selected car to the system by adding a new record into the AccidentsDB table.
15. makePayment.aspx: In order to make a payment for a selected accident, users navigate to this page. First, the page shows the information regarding the accident and the amount claimed by the customer. Then, the InsuranceOperation.cs class is called and asked to verify the request. After verification, the amount to be paid to the customer is decided by the class and a notification is sent to the related departments. In case the request is not confirmed, no payment can be done. The InsuranceOperation.cs class performs all of these processes automatically based on the predefined company rules.

InsuranceOperation.cs Class:

This class implements all of the insurance operations and can be called by any other page as necessary. The company rules are all portrayed by this class. In particular, the class is a complex compound-condition module comprising six functions. Four of the functions are nested if-then-else structures that represent the insurance policies. The two other functions provide the inputs required by the other functions. The Cyclomatic Complexity of the class is 38; to put it differently, 38 different paths must be traversed in order to test it completely. Figure 6.6 shows the InsuranceOperation class and Figure 6.7 depicts a use case diagram, which explains how the class is used by the second case study.

The input domain of the class includes:

1. The driver experience (input 1)
2. Type of the driver license (input 2). It can be one of the following:
   •	A: For all types of cars and trucks
   •	B: For all types of cars and mid-sized trucks
   •	C: For all types of cars only
   •	D: For sedan and hatchback cars only
3. Type of the car (input 3). It can be one of the following:
   •	Sedan
   •	SUV
   •	MPV
   •	Hatchback
   •	Sport
   •	Pickup
4. The credit remaining in the insurance account (input 4)
5. Cost of the accident (input 5)
6. Type of the insurance (input 6). It can be either “Full” or “Third-party”.
7. The car age (input 7)
8. Number of registered accidents in the last year (input 8)

There are four outputs to be generated by the class (the output domain):

1. Insurance extension allowance (output 1). A Boolean data item that is true if the insurance is allowed for extension and false otherwise.
2. Insurance elimination (output 2). It is a Boolean data item too. A true value means the insurance account is terminated and false means it is not terminated.
3. The payment amount (output 3). This amount is decided by the class to pay for accidents based on the claimed amount and the insurance rules.
4. The credit (output 4). This amount is assigned to each insurance account as its available credit. Every time a payment is made to a customer, the paid amount is deducted from this credit.


Figure 6-6: The InsuranceOperation Class Scheme

Appendix A lists the implementation of InsuranceOperation.cs class.

The functions of the class are as follows:

- Constructors: There are two class constructors that provide the required input values and initialize the class variables.
- ExtendInsurance(): This function processes the input data and applies the rules to make a decision on insurance extension. In particular, output 1 (insurance extension allowance) is generated by this function.
- IsEliminated(): Output 2 (insurance elimination) is decided by this function, which determines whether the insurance account must be terminated.
- MakePayment(): Making a payment is the most complex function of the class, since many terms and conditions must be considered in order to pay for an accident. The function applies them all and generates the third output, i.e. the payment amount.
- CalculateCredit(): This function is also complex. It applies the insurance policies to determine the credit that can be assigned to each insurance account, generating output 4 (the credit).

Since the InsuranceOperation.cs class is a complicated, important and risky module that requires full-coverage testing of all possible paths, the testing process may be difficult. Furthermore, providing a complete test oracle for this class requires the testers either to understand the company policies and insurance rules completely or to employ professional domain experts throughout the testing process. Either option may be infeasible or very expensive; therefore, an automated oracle may decrease the testing complexity and its cost significantly. We applied our approach to provide an automated test oracle that facilitates the testing process. However, we needed to prepare the necessary data and the environment before proceeding to testing.

[Figure 6.7 depicts the use cases of the Saina Insurance automobile insurance module: the Operator can Extend Insurance, Make Payment, Register New Accident, Register New Insurance, Update Insurance, and Update Customer Info; several of these include the Insurance Operations use case.]

Figure 6-7: Saina Insurance Automobile Insurance Module Use Case Diagram

The process of designing and implementing the oracle model for the second case study is similar to the first one; nevertheless, more implementation details are provided here:

6.1.2.1 Define the I/O Equivalence Classes

Following the I/O Relationship Analysis, the following variables were defined based on the specifications of the case study:

- Input vector X = {input 1, input 2, ..., input 8}
- Output vector Y = {output 1, output 2, output 3, output 4}

where the inputs and outputs are defined in the previous section.

After applying equivalence partitioning, the possible input classes were identified as shown in Table 6.8. Accordingly:

|T| = |D(1)| * |D(2)| * |D(3)| * ... * |D(8)| = 3*4*6*2*3*2*4*4 = 13824

Therefore, the case study has 13824 equivalence input classes. Had we not applied I/O Relationship Analysis, we would have needed to provide all 13824 input vectors and their corresponding output vectors manually in order to obtain complete training samples.
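The class count above can be reproduced in a few lines. The following Python snippet is an illustration only (the thesis's tooling is written in C#); it multiplies the partition sizes of Table 6.8:

```python
from math import prod

# |D(i)| for the eight inputs, in the order enumerated in Table 6.8:
# driver experience, licence type, car type, remaining credit,
# accident cost, insurance type, car age, accidents in the last year.
domain_sizes = [3, 4, 6, 2, 3, 2, 4, 4]

# |T| is the size of the Cartesian product of the input equivalence classes.
total_classes = prod(domain_sizes)
print(total_classes)  # 13824
```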

Table 6-8: The second case study D(X) (Input Values)

#  Input                                        Values                                                       |D|
1  The driver experience                        Less than 5 years; between 5 and 10 years;
                                                more than 10 years                                            3
2  Type of the driver license                   A; B; C; D                                                    4
3  Type of the car                              Sedan; SUV; MPV; Hatchback; Sport; Pickup                     6
4  The credit remaining in the insurance        Less than 100 Rial; more than 100 Rial                        2
   account
5  Cost of the accident                         Less than 25% of the initial credit; between 25% and
                                                50% of the initial credit; more than 50% of the
                                                initial credit                                                3
6  Type of the insurance                        Full; Third Party                                             2
7  The car age                                  Less than 10 years; between 10 and 15 years; between
                                                15 and 20 years; more than 20 years                           4
8  Number of registered accidents in the        0; between 0 and 5; between 5 and 8; more than 8              4
   last year

6.1.2.2 Determine the I/O Relationships

By studying the case study and questioning the domain experts, the I/O relationships were discovered as:

- X(output 1) = {input 3, input 7, input 8}
- X(output 2) = {input 4, input 8}
- X(output 3) = {input 1, input 2, input 5, input 6}
- X(output 4) = {input 1, input 2, input 7, input 8}

Figure 6.8 shows the I/O relationships.

[Figure 6.8 shows inputs 1 to 8 on one side, the Saina Insurance module in the middle, and outputs 1 to 4 on the other side, with edges marking which inputs influence which outputs.]

Figure 6-8: I/O relationships of the second case study
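The savings these relationships buy can be estimated directly: each reduced set only needs one expected output per combination of its related inputs. The Python sketch below (an illustration, not the thesis's C# tooling) computes those upper bounds; the products for outputs 1 to 3 match the row counts of Tables 6.9 to 6.11, while the figure for output 4 is an upper bound, since the thesis's Table 6.12 also records a derived "Is Sporty" attribute and lists fewer rows.

```python
from math import prod

# Sizes of the input equivalence classes, |D(i)|, from Table 6.8.
D = {1: 3, 2: 4, 3: 6, 4: 2, 5: 3, 6: 2, 7: 4, 8: 4}

# I/O relationships: which inputs influence each output.
relations = {1: [3, 7, 8], 2: [4, 8], 3: [1, 2, 5, 6], 4: [1, 2, 7, 8]}

# Upper bound on the number of expected outputs each reduced set requires.
reduced_sizes = {out: prod(D[i] for i in ins) for out, ins in relations.items()}
print(reduced_sizes)     # {1: 96, 2: 8, 3: 72, 4: 192}
print(prod(D.values()))  # 13824 full combinations without the analysis
```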

6.1.2.3 Provide the Reduced Data Sets

Using the I/O relationships, we only needed to provide expected outputs for the inputs that influence them. Tables 6.9 to 6.12 show some of the reduced sets, as discussed in chapter 5. The expected outputs for the reduced sets were generated by domain experts. The complete list is provided in Appendix B.

Table 6-9: Tred (output 1) (Reduced expected output values for insurance extension allowance)

 #    Input #3   Input #7            Input #8          Output #1
 1    Sedan      Less than 10        0                 True
 2    Sedan      Less than 10        Between 0 and 5   True
 3    Sedan      Less than 10        Between 5 and 8   True
 4    Sedan      Less than 10        More than 8       False
 5    Sedan      Between 10 and 15   0                 True
 6    Sedan      Between 10 and 15   Between 0 and 5   True
 7    Sedan      Between 10 and 15   Between 5 and 8   True
 8    Sedan      Between 10 and 15   More than 8       False
 9    Sedan      Between 15 and 20   0                 True
 10   Sedan      Between 15 and 20   Between 0 and 5   True
 11   Sedan      Between 15 and 20   Between 5 and 8   True
 12   Sedan      Between 15 and 20   More than 8       False
 13   Sedan      More than 20        0                 False
 14   Sedan      More than 20        Between 0 and 5   False
 15   Sedan      More than 20        Between 5 and 8   False
 16   Sedan      More than 20        More than 8       False
 17   SUV        Less than 10        0                 True
 18   SUV        Less than 10        Between 0 and 5   True
 19   SUV        Less than 10        Between 5 and 8   True
 20   SUV        Less than 10        More than 8       False
 21   SUV        Between 10 and 15   0                 True
 22   SUV        Between 10 and 15   Between 0 and 5   True
 23   SUV        Between 10 and 15   Between 5 and 8   True
 ...  ...        ...                 ...               ...
 96   Pickup     More than 20        More than 8       False

Table 6-10: Tred (output 2) (Reduced expected output values for insurance elimination)

 #     Input #4          Input #8          Output #2
 97    Less than 100 $   0                 False
 98    Less than 100 $   Between 0 and 5   False
 99    Less than 100 $   Between 5 and 8   False
 100   Less than 100 $   More than 8       False
 101   More than 100 $   0                 True
 102   More than 100 $   Between 0 and 5   True
 103   More than 100 $   Between 5 and 8   True
 104   More than 100 $   More than 8       False

Table 6-11: Tred (output 3) (Reduced expected output values for the payment amount)

 #     Input #1       Input #2   Input #5                                    Input #6      Output #3
 105   Less than 5    A          Less than 25% of the initial credit         Full          50% of the claimed amount
 106   Less than 5    A          Less than 25% of the initial credit         Third Party   40% of the claimed amount
 107   Less than 5    A          Between 25% and 50% of the initial credit   Full          35% of the claimed amount
 108   Less than 5    A          Between 25% and 50% of the initial credit   Third Party   28% of the claimed amount
 109   Less than 5    A          More than 50% of the initial credit         Full          25% of the claimed amount
 110   Less than 5    A          More than 50% of the initial credit         Third Party   20% of the claimed amount
 ...   ...            ...        ...                                         ...           ...
 176   More than 10   D          More than 50% of the initial credit         Third Party   20% of the claimed amount

Table 6-12: Tred (output 4) (Reduced expected output values for the credit)

 #     Input #1      Input #2   Input #7            Input #8          Is Sporty   Output #4
 177   Less than 5   A          Less than 10        0                 False       70% of the Requested Credit
 178   Less than 5   A          Less than 10        Between 0 and 5   False       52.5% of the Requested Credit
 179   Less than 5   A          Less than 10        Between 5 and 8   False       35% of the Requested Credit
 180   Less than 5   A          Less than 10        More than 8       False       0% of the Requested Credit
 181   Less than 5   A          Between 10 and 15   0                 False       56% of the Requested Credit
 182   Less than 5   A          Between 10 and 15   Between 0 and 5   False       42% of the Requested Credit
 183   Less than 5   A          Between 10 and 15   Between 5 and 8   False       28% of the Requested Credit
 184   Less than 5   A          Between 10 and 15   More than 8       False       0% of the Requested Credit
 185   Less than 5   A          Between 15 and 20   0                 False       33.6% of the Requested Credit
 186   Less than 5   A          Between 15 and 20   Between 0 and 5   False       31.5% of the Requested Credit
 187   Less than 5   A          Between 15 and 20   Between 5 and 8   False       21% of the Requested Credit
 188   Less than 5   A          Between 15 and 20   More than 8       False       0% of the Requested Credit
 189   Less than 5   A          Between 10 and 15   0                 True        42% of the Requested Credit
 190   Less than 5   A          Less than 10        0                 False       70% of the Requested Credit
 ...   ...           ...        ...                 ...               ...         ...
 336   More than 5   D          More than 20        More than 8       False       0% of the Requested Credit

6.1.2.4 Generate the Complete Dataset

As can be seen in Tables 6.9 to 6.12, we only provided 336 expected outputs manually to create the reduced sets. The remaining 13488 input vectors and corresponding expected output vectors were generated automatically by merging the reduced sets; therefore, 97.5% of the required data were provided automatically using I/O Relationship Analysis. Appendix D shows the implementation code of the automated dataset generation process, which was illustrated in the previous chapter in Figure 5.2.

Since the reduced sets address all of the I/O combinations, it is easy to generate as many additional samples as required if the ANN training needs more. As an illustration, we could generate more than 100000 training samples automatically simply by changing the loop counter value in Appendix D, line 91.

The merging process was performed using the scripts shown in Appendix D. First, the script reads the reduced sets from the text files created in the previous section. Then, it initializes the equivalence input classes (i.e. D(X)) based on Table 5.6. Finally, it calculates the union of the reduced sets, assembles them into a complete set, and assigns the corresponding outputs. The entire process was performed 100% automatically.
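Appendix D's merging script is written in C#, but the underlying idea can be sketched in a few lines of Python. The two-input domain, the output names, and the values below are invented purely for illustration: each full input vector is formed from the Cartesian product of the equivalence classes, and each expected output is looked up in the reduced set keyed by the inputs that influence it.

```python
from itertools import product

# Hypothetical miniature case: two inputs, two outputs.
D = {1: ["low", "high"], 2: ["A", "B"]}       # equivalence classes
relations = {"out1": (1,), "out2": (1, 2)}    # I/O relationships
reduced = {                                   # Tred, keyed by the related inputs
    "out1": {("low",): 0.0, ("high",): 1.0},
    "out2": {("low", "A"): 0.1, ("low", "B"): 0.2,
             ("high", "A"): 0.3, ("high", "B"): 0.4},
}

complete = []
for combo in product(*(D[i] for i in sorted(D))):
    by_input = dict(zip(sorted(D), combo))
    # Expected outputs come from the reduced sets, not from manual effort.
    expected = {out: reduced[out][tuple(by_input[i] for i in relations[out])]
                for out in relations}
    complete.append((combo, expected))

print(len(complete))  # 4 complete samples assembled from 2 + 4 manual entries
```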

6.1.2.5 Define the Oracles

For the second case study, a Single-Network Oracle was first trained to generate all four outputs; in other words, one network was expected to model the entire set of insurance policies for all of the outputs. More than 20 different Multilayer Perceptron networks were tried, and a network with eight input neurons, one hidden layer with 30 neurons, and four output neurons (8*30*4) was selected. It achieved the minimum MSE of 0.0046 after 10000 training cycles with a learning rate of 0.01.

In contrast, the Multi-Networks Oracle for the second case study was easier to obtain because separating the networks made the learning process easier. In addition, since each ANN in the Multi-Networks Oracle is a standalone neural network, we had more flexibility to define the networks with different parameters, without interfering with the others, in order to obtain better accuracy.

The resulting networks were more accurate than the previous attempt and the MSEs were reasonable. Table 6.13 summarizes the Multi-Networks Oracle training parameters and the resulting MSEs for the second case study.

Table 6-13: The second case study Multi-Networks Oracle training parameters and MSEs

Network     Corresponding Output                        Output Type   Input    Hidden   Output   Learning   Training   MSE
                                                                      Neurons  Neurons  Neurons  Rate       Cycles
Network 1   Output #1 (Insurance Extension Allowance)   Binary        8        13       1        0.01       1400       0.00025
Network 2   Output #2 (Insurance Elimination)           Binary        8        3        1        0.1        150        0.00002
Network 3   Output #3 (The Payment Amount)              Continuous    8        30       1        0.25       2000       0.0023
Network 4   Output #4 (Credit)                          Continuous    8        30       1        0.25       2000       0.00048

Total MSE: 0.00076

As can be seen in Table 6.13, the networks' MSEs are very close to zero; thus, as explained in the next sections, the resulting oracle was more reliable than the Single-Network Oracle, as the Multi-Networks Oracle MSE was 0.00076 compared with 0.0046. Note that all of the networks were trained using the very same training samples and tool.
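To make the single-network versus per-output construction concrete, the following self-contained Python/NumPy sketch trains one-hidden-layer sigmoid MLPs with plain batch backpropagation. Everything in it is a stand-in: the data are random, the two binary "policies" are toy thresholds, and the layer sizes and learning rate are placeholders, not the thesis's C# tool or the real insurance rules.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, Y, hidden, lr=1.0, cycles=5000):
    """One-hidden-layer sigmoid MLP, plain batch backprop; returns final MSE."""
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    W2 = rng.normal(0.0, 0.5, (hidden, Y.shape[1]))
    n = len(X)
    for _ in range(cycles):
        H = sigmoid(X @ W1)
        O = sigmoid(H @ W2)
        d_out = (Y - O) * O * (1 - O)           # output-layer delta
        d_hid = (d_out @ W2.T) * H * (1 - H)    # hidden-layer delta
        W2 += lr * (H.T @ d_out) / n
        W1 += lr * (X.T @ d_hid) / n
    return float(np.mean((Y - sigmoid(sigmoid(X @ W1) @ W2)) ** 2))

# Toy stand-in data: 8 normalized inputs, 2 binary outputs driven by
# simple threshold rules (placeholders for the real policies).
X = rng.random((64, 8))
Y = np.column_stack([(X[:, 0] > 0.5).astype(float),
                     (X[:, 1] + X[:, 2] > 1.0).astype(float)])

single_mse = train_mlp(X, Y, hidden=30)                  # one net, all outputs
multi_mses = [train_mlp(X, Y[:, [j]], hidden=13) for j in range(Y.shape[1])]
```

Because each per-output network is independent, its hidden size, learning rate, and cycle count can be tuned separately, which is exactly the flexibility Table 6.13 exploits.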

The second case study normalization was done using the code shown in Figure 6.9, which implements Table 3.2. 'True' values of the binary parameters were encoded as one and 'False' values as zero. Similarly, every continuous value was normalized to [0, 1]. The discrete, text-based inputs were mapped to integer values; as an illustration, the 'A' value of input 2 was encoded as one, 'B' as two, and so on.

for (int i = 0; i < trainingSamples.Count; i++)
{
    string[] t = trainingSamples[i].ToString().Split('/');

    // Setting input vector
    input[0] = Convert.ToDouble(t[0]);
    switch (t[1])
    {
        case "A": input[1] = 1d; break;
        case "B": input[1] = 2d; break;
        case "C": input[1] = 3d; break;
        case "D": input[1] = 4d; break;
    }
    switch (t[2])
    {
        case "Sedan":     input[2] = 1d; break;
        case "SUV":       input[2] = 2d; break;
        case "MPV":       input[2] = 3d; break;
        case "Hatchback": input[2] = 4d; break;
        case "Sport":     input[2] = 5d; break;
        case "Pickup":    input[2] = 6d; break;
    }
    if (t[3] == "True") input[3] = 1d; else input[3] = 0d;
    input[4] = Convert.ToDouble(t[4]);
    if (t[5] == "Full") input[5] = 1d; else input[5] = 0d;
    input[6] = Convert.ToDouble(t[6]);
    input[7] = Convert.ToDouble(t[7]);

    // Setting output vector
    if (t[8] == "True") output[0] = 1d; else output[0] = 0d;
    if (t[9] == "True") output[1] = 1d; else output[1] = 0d;
    output[2] = Convert.ToDouble(t[10]);
    output[3] = Convert.ToDouble(t[11]);

    trainingSet.Add(new TrainingSample(input, output));
}

Figure 6-9: The Second Case Study normalization process (i.e. Table 3.2) source code

6.1.2.6 Make the Oracle

The final step is to train the networks and record their error graphs. Figure 6.10 shows the training error graph for the Single-Network Oracle, and Figure 6.11 shows the same for the Multi-Networks Oracle. The MSEs were presented in the previous section.

Figure 6-10: The second case study Single-Network Oracle error graph (8*30*4)

All of the oracle training and making activities were performed using a tool that we developed. Figure 6.12 shows a snapshot of the tool. In order to make the oracle and train the ANNs, the user simply loads the training samples and sets the training parameters. Then, the tool applies the procedure presented in the previous chapter and trains the networks automatically.


Figure 6-11: The second case study Multi-Network Oracle error graphs


Figure 6-12: The Multi-Networks Oracle Producer

6.1.3 The Automated Test Driver

As mentioned earlier, an automated test driver was developed to show how the test oracle model is applied in practice. It creates the test cases and executes them on the case study; then, it asks the proposed oracle to generate the expected outputs and verifies the mutated and oracle results against the expected results using its comparator. Finally, the test driver creates a comprehensive report. Note that it performs all of the above activities automatically; the tester only needs to set the comparator precision (i.e. the thresholds), choose how many test cases must be created, and leave the tool to do the rest. In addition, it is possible to create a test case manually using any test case generation method, such as Equivalence Partitioning, and run it automatically. Figure 6.13 shows a snapshot of the automated test driver, and Appendix F lists its implementation details.

The automated test driver generates a comprehensive report. It shows the test case inputs, the normalized oracle input vectors, the output vectors generated by the oracle and the mutated version, and the expected outputs from the golden version. Furthermore, it presents the comparator results as well. Figure 6.14 depicts a report sample generated by the tool.
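The comparator at the heart of the driver is a thresholded element-wise check. The sketch below is a hypothetical Python rendering of that step (the actual driver is the C# code in Appendix F); the `verdict` function, the threshold values, and the vectors are all illustrative.

```python
def verdict(oracle_out, actual_out, thresholds):
    """A test case passes only if every output of the SUT agrees with the
    oracle's output to within that output's precision threshold."""
    return all(abs(o - a) <= t
               for o, a, t in zip(oracle_out, actual_out, thresholds))

# Binary outputs get a zero threshold; continuous ones a small tolerance.
print(verdict([1.0, 0.42], [1.0, 0.44], [0.0, 0.05]))  # True: within tolerance
print(verdict([1.0, 0.42], [0.0, 0.44], [0.0, 0.05]))  # False: binary mismatch
```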

In order to verify the usability of the automated test driver, a survey was conducted asking 15 software engineers to complete a questionnaire after they used the tool. The target population of the survey was postgraduate Computer Science and Information Technology students at Universiti Teknologi Malaysia. Table 6.14 shows the questionnaire and the results; question 9 addresses the overall usability of the test driver.

As can be seen in the table, the respondents agree that the tool automates the testing process, and they confirm that the usability of the tool is acceptable. Furthermore, most of the respondents are willing to consider the test driver for their future testing projects.

Table 6-14: Test Driver Usability Survey Questionnaire and Results

1. What is your experience in software testing?
   I am a software tester: 2 persons (13%)
   I test software sometimes: 7 persons (47%)
   I have not tested software yet but I have the basic knowledge: 6 persons (40%)
   I do not know what software testing is: 0 persons (0%)

2. What do you think about test automation?
   It can be helpful: 15 persons (100%)
   It cannot be helpful: 0 persons (0%)
   I do not know: 0 persons (0%)

3. Do you have any experience using test automation tools?
   Yes: 8 persons (53%)
   No: 7 persons (47%)
   I am not sure: 0 persons (0%)

Please answer the following questions regarding the test driver:

4. Do you think the tool automates the testing process?
   Yes: 15 persons (100%); No: 0 persons (0%); I do not know: 0 persons (0%)

5. Do you think the automation offered by the tool is adequate?
   Yes: 13 persons (87%); Kind of: 2 persons (13%); No: 0 persons (0%); I do not know: 0 persons (0%)

6. Do you think using the tool is easy?
   Yes: 14 persons (94%); Kind of: 1 person (6%); No: 0 persons (0%); I do not know: 0 persons (0%)

7. Do you think using the tool may increase software quality?
   Yes: 13 persons (87%); Kind of: 2 persons (13%); No: 0 persons (0%); I do not know: 0 persons (0%)

8. Are you willing to consider the tool in your future software testing projects?
   Definitely: 9 persons (60%); Maybe: 6 persons (40%); Not at all: 0 persons (0%); I do not know: 0 persons (0%)

9. What is your overall opinion about the tool's usability?
   Excellent: 6 persons (40%); Good: 9 persons (60%); Fair: 0 persons (0%); Bad: 0 persons (0%); No opinion: 0 persons (0%)


Figure 6-13: The automated Test Driver for the Second Case Study

Figure 6-14: An Automated Test Driver Report Sample

6.2 Experimental Results

This section explains the results of the experiment. The neural network training quality was discussed in the previous section; hence, the results of applying the generated oracles to the case studies are presented here. The quality benchmarks explained in chapter 4 were measured to assess the quality of the proposed approach. In order to evaluate the quality of the Multi-Networks Oracles, two Single-Network Oracles, one for each case study, were provided as well. Moreover, a comparative study between the proposed oracle and existing oracles is presented.

6.2.1 Implementing the Evaluation Model

After the neural networks were trained, it was possible to use them as automated oracles. However, the resulting oracles had to be evaluated, and the quality parameters (i.e. accuracy, precision, practicality and misclassification error) measured. As explained in chapter 4, we assessed the proposed approach using mutation testing by developing two versions of each case study. The first is a Golden Version, a complete, fault-free implementation used to generate the expected results. The other is a Mutated Version, injected with ordinary programming mistakes.

Since the neural networks are only an approximation of the case studies, some of their outputs may be incorrect. On the other hand, the mutated classes themselves may produce errors; finding their faults is the main purpose of the testing process. Therefore, the results of the comparison can be divided into four categories, as explained in chapter 4. Table 6.15 summarizes the comparison results categorization [13]. Note that categories 3 and 4 represent the oracle misclassification error.

Table 6-15: The comparison results categorization

                        The Oracle's Outputs
The Mutated Outputs     Correct           Wrong
Correct                 True Positive     False Negative
Wrong                   True Negative     False Positive
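From the four categories of Table 6.15, the quality parameters follow by simple counting. The Python sketch below is our reading of those definitions (the function name and the sample counts are invented): false negatives and false positives are the oracle's misclassifications, while true negatives correspond to the faults the oracle actually detects.

```python
def oracle_quality(tp, tn, fp, fn):
    """Quality parameters derived from the Table 6.15 categories (our reading)."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,            # oracle classified correctly
        "misclassification": (fp + fn) / total,   # categories 3 and 4
        "practicality": tn / (tn + fp) if tn + fp else 0.0,  # faults detected
    }

# Invented counts for illustration: 1000 test cases, 90 of them hitting faults.
q = oracle_quality(tp=900, tn=80, fp=10, fn=10)
print(q["accuracy"], q["misclassification"])  # 0.98 0.02
```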

6.2.1.1 Mutation Testing the First Case Study

The first case study was mutated with 14 mutants. As a result, the mutated code injected some faults into the SUT, and the oracles were asked to find them. The mutants were selected based on ordinary programming mistakes such as operator changes, argument changes, value changes and typecasting mistakes.

6.2.1.2 Mutation Testing the Second Case Study

Table 6.16 shows the mutants for the second case study. As can be seen, the mutants are changes made to the source code, such as operator changes, argument changes, value changes, typecasting mistakes, and combinations of these changes. The changes injected several faults into the case study.

The InsuranceOperation.cs class (Appendix A) was mutated with 22 mutants. As with the first case study, the mutated code was selected based on ordinary programming mistakes. The mutated class was saved as MutatedInsuranceOperation.cs and is presented in Appendix E.

The oracle evaluation process is performed automatically via the automated test driver (a web page called Testing.aspx), which is shown in Appendix F. It automatically creates random valid test cases (i.e. Appendix C) and executes them on the Golden Version, the oracle, and the mutated class simultaneously. Then, it compares all of the results and generates a report. The report includes the number of executed test cases, the thresholds (explained in the next subsection), the oracle's absolute errors, the number of false positives (i.e. missed faults) and false negatives, the practicality (detected faults), the misclassification error, and the oracle's accuracy. Furthermore, the tester can create manual test cases and verify them automatically as well. More details are provided in the next sections.
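The random valid test case generation mentioned above can be sketched by sampling one representative from each input's equivalence classes. The Python below is for illustration only; the labels paraphrase Table 6.8, and the driver itself is the C# code in Appendix F.

```python
import random

# Equivalence classes per input, paraphrased from Table 6.8.
DOMAINS = {
    "driver_experience": ["<5", "5-10", ">10"],
    "licence_type": ["A", "B", "C", "D"],
    "car_type": ["Sedan", "SUV", "MPV", "Hatchback", "Sport", "Pickup"],
    "remaining_credit": ["<100", ">100"],
    "accident_cost": ["<25%", "25-50%", ">50%"],
    "insurance_type": ["Full", "Third Party"],
    "car_age": ["<10", "10-15", "15-20", ">20"],
    "accidents_last_year": ["0", "0-5", "5-8", ">8"],
}

def random_test_case(rng=random):
    """Pick one representative value from every input's partition."""
    return {name: rng.choice(classes) for name, classes in DOMAINS.items()}

case = random_test_case()
print(len(case))  # 8 inputs, one value each
```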

Table 6-16: The second case study mutants (Mutated Code)

1

113 if (Credit 10)

10

if (((CarAge > 15) && 179 (CarAge 10) && (CarAge 20)

if (CarAge < 20)

12

205 if (CarAge 5) && (driverExperience