An Approach for Early Prediction of Software Reliability


Sirsendu Mohanta
Dept. of Computer Science and Engg., IIT Kharagpur, India - 721302
e-mail: [email protected]

Gopika Vinod
Bhabha Atomic Research Centre, Mumbai, India - 400085
e-mail: [email protected]

A. K. Ghosh
Bhabha Atomic Research Centre, Mumbai, India - 400085
e-mail: [email protected]

Rajib Mall
Dept. of Computer Science and Engg., IIT Kharagpur, India - 721302
e-mail: [email protected]

Abstract

In the early stages of development, failure information is not available to quantitatively measure the reliability of a software product. In this context, we propose an approach to predict software reliability early in the product development stages from design metrics. First, we predict the reliabilities of the components of a system. For this, we categorize the different kinds of faults that can occur in a component during its development and identify the design metrics that correlate with these faults. We construct a Bayesian Belief Network (BBN) model to predict the reliabilities of the components using the identified design metrics. Based on the predicted reliabilities and usage frequencies of the components of a system, we determine the reliability of the system. The applicability of our proposed model is illustrated through a case study. Results obtained from our case study indicate the effectiveness of our approach.

Keywords: Software reliability, fault model, design metrics, early prediction of reliability, Bayesian belief network.

1 Introduction

Software has become an integral part of many complex applications such as nuclear power generation, medical instrumentation, aviation, and satellite communication. Many of these applications are safety-critical in nature, requiring highly reliable operation. Ensuring the required reliability of these applications has become a major challenge. Researchers have proposed various approaches [CHB+08, BHXN05, Hua05] to estimate the reliability of software products in the later stages of product development. A major difficulty in using these approaches is that by the time reliability figures become available, it is too late and too costly to make any major design changes to achieve specific reliability goals. This has prompted several researchers to focus on predicting software reliability during the early stages of product development. Several reported early reliability prediction approaches [CSC02, GT02, YCA99, RSP03, CRMG08, MRDL92, TM05] target predicting system reliability by analysing the architecture of the software. Most of these approaches [YCA99, GT02] predict system reliability based on two important assumptions. First, the system reliability is assumed to depend on the reliabilities of the individual components of the system and the interactions among them. Second, the reliabilities of the individual components of a system are assumed to be known. In practice, the reliabilities of the components are often not available. A few approaches [RSP03, CRMG08, CSC02] address this issue by determining a component's reliability either in terms of the services offered by it or through traditional code-based testing techniques [Hua05].

We propose a bottom-up software reliability prediction approach, which we have named Early Software Reliability Assessment (ESRA). In our approach, we view every system as being composed of one or more components. We determine the component reliabilities from the design characteristics of the components, and we predict the reliability of a system based on the reliabilities of its components.

We focus on predicting the reliability of object-oriented programs. This choice is primarily motivated by the increasing commercial and academic importance of the object-oriented paradigm. Consequently, we consider classes as the basic components in our approach. First, we predict the reliabilities of the classes. However, accurate reliability prediction of classes early in the product development cycle is a major challenge, as neither actual field failure information nor failure information collected during testing is available. In this context, we construct a fault model to categorize the different kinds of possible code faults, and we determine the design metrics correlated with the different categories of faults. We construct a Bayesian belief network (BBN) to determine the reliability of a class from the identified design metrics. We determine the use case reliabilities based on an analysis of the class reliabilities. The system reliability is predicted based on the use case reliabilities and the operational profile.

The rest of this paper is organized as follows. Section 2 discusses some basic concepts on which our approach is based. In section 3, we present our approach. We discuss a case study in section 4. In section 5, we compare our approach with related work. Section 6 concludes this paper.

2 Background

In this section, we review a few basic concepts to enhance readability of the later sections of this paper.

2.1 Bayesian Belief Network

A Bayesian belief network (BBN) is a probability model structured as a directed acyclic graph (DAG). A node (vertex) of this graph represents a discrete or continuous random variable; here, we use the terms node and variable interchangeably. The directed edges represent causal dependencies between variables. When two vertices are connected by a directed edge, the vertex at the head of the edge is called the child and the vertex at the tail the parent. Nodes that have only outgoing edges and no incoming ones are called root nodes. An example of a BBN is shown in Figure 1. In this figure, A, B, and D are the parents of C, and D is also the parent of E. C and E are the parents of F. Nodes A, B, and D have only outgoing edges and no incoming edges, so they are the root nodes. The variable represented by a root node is not influenced by any other variable. The way in which a variable depends on other variables is determined by the conditional probability distribution of that node given its parent nodes.

[Figure 1: An example of a BBN]

Formally, a Bayesian network is defined as the set {D, S, P}, where

1. D is a set of random variables, D = {D_1, D_2, ..., D_n}. Each D_i can be discrete or continuous; D can also contain a mixture of discrete and continuous random variables.

2. S is a set of conditional probability distributions (CPDs), S = {p(D_1 | Parents(D_1)), ..., p(D_n | Parents(D_n))}, where Parents(D_i) denotes the parent nodes of D_i and p(D_i | Parents(D_i)) is the conditional distribution of D_i given its parents.

3. P is a set of probability distributions or marginal distributions of the random variables, P = {p(D_1), ..., p(D_n)}, where p(D_i) represents the probability distribution of D_i.

A BBN model is used in many applications to solve problems involving uncertainty arising from minimal or incomplete data sets. The BBN model for a given problem is developed using the following steps.

Construction of the DAG: In a BBN, nodes (variables) are used to express the events or objects of interest, and a given problem is modeled through the behaviour of these variables. First, the independent and dependent variables of the given problem are identified. A node is drawn for each dependent or independent variable, and these nodes are linked by directed edges based on the dependencies among the variables.

Assignment of Probability Distributions: In this step, we assign a probability distribution to each variable, that is, to each node of the constructed DAG. For this, a node probability table (NPT) is assigned to each node. An NPT contains the conditional probability distribution (CPD) of a node given its parents. An example of an NPT for node F of Figure 1 is shown in Table 1. The CPD for a node can either be derived from experimental observations or elicited from experts. The NPTs of root nodes, i.e., nodes without parents, simply contain the (unconditional) probability distributions of those nodes.

Table 1: An example of an NPT for node F

C | E | p(F = 0 | C, E) | p(F = 1 | C, E)
1 | 1 | 0.67911         | 0.32089
0 | 1 | 0.627968        | 0.372032
1 | 0 | 0.251           | 0.749
0 | 0 | 0.962458        | 0.0375415

Inference: Once the BBN model is constructed, the joint probability distribution of all variables in the BBN is determined using Bayesian theory. Given a BBN, the joint probability distribution over the variables (D_1, ..., D_n) is given by equation (1):

    p(D_1 \cap D_2 \cap \ldots \cap D_n) = \prod_{i=1}^{n} p(D_i \mid Parents(D_i))    (1)

The probability distribution of a variable of interest is then determined by marginalising the joint probability over the remaining variables, as in equation (2):

    p(D_i) = \sum_{D_j,\, j \neq i} p(D_1 \cap D_2 \cap \ldots \cap D_n)    (2)

The computation of the probability of a node of interest given a model is known as probabilistic inference.
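To make equations (1) and (2) concrete, the following sketch (ours, not part of the paper's tooling) computes the marginal p(F) for a reduced network containing only C, E, and F, using the NPT of Table 1. The priors p(C = 1) = p(E = 1) = 0.5 are illustrative assumptions, not values from the paper.

```python
from itertools import product

# Assumed (illustrative) priors for the root nodes C and E.
p_C = {1: 0.5, 0: 0.5}
p_E = {1: 0.5, 0: 0.5}

# NPT for node F given its parents C and E (Table 1): p(F = 0 | C, E).
p_F0_given = {(1, 1): 0.67911, (0, 1): 0.627968,
              (1, 0): 0.251,   (0, 0): 0.962458}

def p_F_given(f, c, e):
    # CPD of F given its parents; p(F = 1 | C, E) = 1 - p(F = 0 | C, E).
    p0 = p_F0_given[(c, e)]
    return p0 if f == 0 else 1.0 - p0

def joint(c, e, f):
    # Equation (1): joint probability as the product of each node's
    # CPD given its parents.
    return p_C[c] * p_E[e] * p_F_given(f, c, e)

# Equation (2): marginalise the joint over all variables other than F.
p_F = {f: sum(joint(c, e, f) for c, e in product([0, 1], repeat=2))
       for f in [0, 1]}
print(p_F)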

2.2 Design Metrics

A large number of investigations [ZL06, OEGQ07] have been reported in the literature on correlations between software design metrics and code faults, and researchers have proposed several metric suites to predict code faults. The metric suites proposed by Chidamber and Kemerer [CK94] and by Brito e Abreu et al. [AHSP+02] are frequently referred to in the literature.

The Chidamber and Kemerer (CK) suite includes object-oriented metrics such as Coupling Between Objects (CBO), Depth of Inheritance Tree (DIT), Lack of Cohesion of Methods (LCOM), Number of Children (NOC), Response for a Class (RFC), and Weighted Methods per Class (WMC). The metric suite proposed by Brito e Abreu et al. [AHSP+02], named MOOD (metrics for object-oriented design), contains metrics such as Method Hiding Factor (MHF), Attribute Hiding Factor (AHF), Method Inheritance Factor (MIF), Attribute Inheritance Factor (AIF), Polymorphism Factor (PF), and Coupling Factor (CF). In addition to the CK and MOOD metric suites, other suites such as QMOOD (quality metrics for object-oriented design) [BD02] and L&K [LK94] have been reported in the literature.

3 ESRA: Our Early Reliability Approach

In this section, we present our ESRA approach to predict the reliability of a system. As can be seen in Figure 2, ESRA consists of three parts: class level reliability prediction, use case reliability prediction, and system reliability prediction. We explain the important parts of our ESRA approach in the following subsections.

[Figure 2: Schematic representation of our Early Software Reliability Assessment (ESRA) approach. Artifacts (UML class diagrams, UML sequence diagrams, the fault model, the operational usage of scenarios, and the operational profile of the use cases) feed three functional blocks: class level reliability prediction (design metrics computation and BBN construction, yielding class reliabilities), use case reliability prediction (scenario reliability prediction followed by use case reliability prediction), and system reliability prediction, which yields the system reliability.]

3.1 Class Level Reliability Prediction

During development of the source code of a class, several types of faults may creep in. We construct a fault model of a class to categorise these different kinds of faults. In the recent past, researchers have investigated correlations between design metrics and software faults without trying to find the correlation of the metrics with specific categories of faults. We instead investigate the design metrics that correlate significantly with specific categories of faults, and we construct a BBN incorporating the identified design metrics to predict class reliability. These steps are elaborated in the following.

3.1.1 Fault Model Construction

A fault model for a class is a classification of the different types of faults that can occur in class code. At a very high level, faults occurring in class code can be classified as programming faults and algorithmic faults [Bin96]. Programming faults are caused by human errors and a programmer's unfamiliarity with the programming constructs. They can further be classified into structural faults and traceability related faults. Structural faults can be divided into procedural code faults and object-oriented code faults. Procedural code faults may occur due to large program size or an overly complex program structure. The fault model of a class is shown in Figure 3. In the following, we give a few examples of procedural faults; a code sketch illustrating some of them appears at the end of this subsection.

[Figure 3: A fault model for a class. Faults are classified into programming faults and algorithmic faults; programming faults into structural faults and traceability related faults; structural faults into procedural faults (faults related to size and complexity) and object-oriented faults (inheritance related faults, polymorphic faults, and state-transition related faults).]

Operator precedence error: This kind of fault occurs when a programmer either omits required parentheses or inserts parentheses in the wrong place.
Non-terminating loop: In this type of fault, a loop does not terminate, which can be caused when the logical conditions that govern looping are wrongly formulated.
Incorrect bounds: This kind of fault is caused when a developer tries to access an array item outside its bounds, or tries to access a memory address outside the valid address range.
Interface mismatch: This type of fault occurs during method calls with incorrect parameter values.
Faults due to inappropriate data structure usage: This kind of fault occurs when a large value is stored in a data structure incapable of storing it. For example, if a number exceeding the storage range of type integer is stored in a variable of type integer, the number will be stored incorrectly and produce wrong output.
Incorrect initialization of variables: This category of faults occurs when a programmer either initializes variables with an incorrect value or forgets to initialize them at all.
Improper conditions: This type of fault is caused by inappropriate use of the if-else construct; either an unrequired action is performed, or a required action is not performed.

Traceability related faults occur when a programmer misinterprets the design while developing the code from it. A few examples of traceability related faults are as follows:

Omission of a function: This kind of error may occur when a programmer overlooks some functionality defined in the SRS document.
Introduction of a spurious function: This may occur when a developer implements an extra functionality that is not defined in the SRS document.
Incorrect realisation of a function: This kind of fault may occur when a developer misinterprets some functionality of the SRS.

Algorithmic faults arise due to the use of incorrect logic or an incorrect data structure, or due to an incorrect understanding of the problem by the programmer. A few examples of algorithmic faults are discussed in the following.

Non-compliance with performance requirements: This type of fault may occur when unnecessary steps are included in an algorithm, which may increase its time complexity. Use of an algorithm with redundant steps degrades the performance of the program.
Setting up incorrect preconditions for an algorithm: This kind of fault occurs when an algorithm starts working even though some specific preconditions are not satisfied.
Typographic error in an algorithm: This category of faults may occur when a wrong or illogical statement is written in an algorithm. For example, an algorithmic statement is changed from n = 0, 1, 2, ... to n = 1, 2, 3, ... by mistake.

Object-oriented faults may occur due to a deep inheritance structure, polymorphism, or the complexity of the state model of a class. A UML state model depicts the various states that an object of a class may be in during its execution and the transitions between those states. Several examples of inheritance and polymorphic faults are discussed in [OAW+01, Bin96]. A few examples of state-transition related faults are given in the following.

Missing transition [Bin96]: This kind of fault occurs when the developers overlook certain transitions between states in the implementation.
Incorrect transition [Bin96]: This kind of fault may occur when transitions among certain states in the UML statechart diagram of the class are implemented incorrectly.
Transition to incorrect state [Bin96]: This kind of fault occurs when a transition in the state model leads to a state that is invalid or spurious.
Missing states [Bin96]: This kind of fault is caused when the developers overlook implementing certain states of the state model.
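As an illustration only (ours, not from the paper), the following sketch shows three of the procedural fault categories above in Python code; the function names and values are hypothetical.

```python
# Hypothetical examples of procedural code faults (illustration only).

def average_fault(a, b):
    # Operator precedence error: division binds tighter than addition,
    # so this computes a + (b / 2) instead of (a + b) / 2.
    return a + b / 2

def last_items_fault(xs, k):
    # Incorrect bounds: for k = 0 this returns the whole list instead of
    # an empty one, because xs[-0:] is the same as xs[0:].
    return xs[-k:]

def countdown_fault(n):
    # Non-terminating loop: the loop variable is never decremented, so the
    # wrongly formulated loop condition never becomes false.
    while n > 0:
        print(n)
        # missing: n -= 1
```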

3.1.2 Identification of Design Metrics Correlated to Specific Faults

We identify design metrics that have a bearing on the different categories of faults. The chances of occurrence of structural faults are high if the program is too large or too complex [FN99]. In the literature, various design metrics have been reported for estimating the size and complexity of software, and in the recent past a large number of studies have been carried out for the empirical validation of design metric suites as predictors of fault proneness. One such study was performed by Olague et al. [OEGQ07], who carried out a bivariate correlation analysis between defects and three design metric suites (CK, MOOD, and QMOOD), together with a logistic regression of design metrics versus faults, to determine the metrics that are significant indicators of faults. Based on their results, we chose the size and complexity metrics. According to [OEGQ07], number of methods, weighted methods per class, response for a class, class interface size, and the data access metric correlate strongly with faults. These design metrics are defined in the following.

Number of Methods (NOM) [BD02]: The count of the number of methods in a class.
Data Access Metric (DAM) [BD02]: The ratio of the number of private and protected attributes to the total number of attributes declared in a class.
Weighted Methods per Class (WMC) [CK94, CDK98]: The sum of the complexities of all methods in a given class.
Response for a Class (RFC) [CK94, CDK98]: The number of methods that an object of a given class can execute in response to a received message.
Class Interface Size (CIS) [BD02]: The number of public methods in a class.

A deeper inheritance structure and greater polymorphism among classes increase the chances of occurrence of inheritance and polymorphic faults [OAW+01, Bin96]. Following [OEGQ07], we identify the following inheritance metrics as significant indicators of faults:

Number of Child Classes (NOC) [CK94, CDK98]: The count of the total number of immediate child classes of a given class.
Depth of Inheritance Tree (DIT) [CK94, CDK98]: The length of the longest path from a given class to the root class in the inheritance hierarchy.
Method Inheritance Factor (MIF) [BD02]: The ratio of inherited methods to the total number of methods in a class.

The chances of occurrence of state-transition related faults increase with the complexity of the statechart model of a class. In the literature, several statechart metric suites [GMP02, HNP07] have been proposed. The investigation of Heidenberg et al. [HNP07] reveals a correlation between statechart metrics and the number of pre-release defects. We find that the number of states, number of transitions, and number of activities of the state model of a class correlate significantly with faults. A sketch of how such metrics can be computed follows this list.

Number of Transitions (Trans) [GMP02]: The total number of transitions in the UML state diagram of a given class.
Number of States (NS) [GMP02]: The total number of states in the UML state diagram of a given class.
Number of Activities (NA) [GMP02]: The total number of activities in a UML state diagram.
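To make the metric definitions concrete, here is a small sketch (ours; the class representation and all names are assumptions, not the paper's DMetrics tool) that computes NOM, CIS, DAM, NS, and Trans from a toy description of a class design.

```python
from dataclasses import dataclass

@dataclass
class ClassDesign:
    # A toy, assumed representation of a class design extracted from UML.
    methods: list        # (name, is_public) pairs
    attributes: list     # (name, visibility) pairs
    states: list         # state names from the UML statechart
    transitions: list    # (from_state, to_state) pairs

def nom(c):
    # Number of Methods: count of all methods in the class.
    return len(c.methods)

def cis(c):
    # Class Interface Size: number of public methods.
    return sum(1 for _, is_public in c.methods if is_public)

def dam(c):
    # Data Access Metric: ratio of private/protected attributes to all attributes.
    hidden = sum(1 for _, vis in c.attributes if vis in ("private", "protected"))
    return hidden / len(c.attributes) if c.attributes else 1.0

def ns(c):
    # Number of States in the statechart.
    return len(c.states)

def trans(c):
    # Number of Transitions in the statechart.
    return len(c.transitions)

order = ClassDesign(
    methods=[("place", True), ("cancel", True), ("recalc", False)],
    attributes=[("items", "private"), ("total", "private"), ("id", "public")],
    states=["created", "confirmed", "billed"],
    transitions=[("created", "confirmed"), ("confirmed", "billed")],
)
print(nom(order), cis(order), dam(order), ns(order), trans(order))  # 3 2 0.66... 3 2
```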

3.1.3 Construction of a BBN Model for Class Reliability Prediction

In this section, we construct a BBN to predict class reliability from the design metrics. The following steps are carried out to construct the BBN model.

Construction of the DAG: We assume that the number of code faults introduced in a class is influenced by certain design characteristics of the class, such as structural complexity, size, inheritance, and the complexity of the state model of the class. We make this assumption based on research findings [OEGQ07, HNP07] that show significant correlations between defects and design metrics. For the DAG construction, we draw a node for each design metric. We also draw a node labelled Number of faults. To represent the dependency relation between the design metrics and the number of faults in a class, we draw directed edges from the design metric nodes to Number of faults, as shown in Figure 4. The code faults that remain undetected after testing are the residual code faults. The number of residual faults in code can be determined by subtracting the number of faults found and fixed during testing from the total number of faults introduced during development. Therefore, we assume the number of residual faults depends on the testing quality and the number of faults that creep into the class during development. To represent this relation, we add two nodes labelled Testing quality and Number of residual faults to our DAG, and draw directed edges from the nodes Number of faults and Testing quality towards Number of residual faults. We assume that an increase in the residual faults of a class increases the chances of occurrence of failures of that class. On the other hand, the reliability of a class is defined as the probability of failure-free operation of the class. Hence, we assume the reliability of a class is influenced by the number of residual faults, i.e., the faults present in the class code after the testing phase. We model this relationship by drawing a node named Reliability of class and an edge from the node Number of residual faults to Reliability of class. As a result, the DAG shown in Figure 4 is obtained. After structuring the DAG, we assign conditional probability distributions to the individual nodes of this DAG in the following step.

[Figure 4: A BBN model for predicting the reliability of a class. The design metric nodes (DIT, DAM, WMC, NOM, RFC, NA, CIS, NS, MIF, NOC, NT) point to Number of faults; Number of faults and Testing quality point to Number of residual faults, which points to Reliability of a class.]

Assignment of Probability Distributions: The directed edges among the nodes of our DAG represent dependencies among the nodes. The strength of a dependency relation is expressed in terms of the conditional probability distribution (CPD) of a node given its parents. For this, every node of the DAG is associated with a node probability table (NPT) that stores the CPD of the node given its parent nodes. The CPD for each node is defined either based on expert judgement or from experimental results; we choose the latter to determine the CPD of each node of our constructed DAG. The CPD for the node labelled Number of faults is a linear regression equation in which Number of faults is the dependent variable and the design metrics are the independent variables. The functional form of the linear regression equation can be represented in matrix notation as

    y = X\beta + \epsilon    (3)

Here, y represents the Number of faults, X is a matrix of design metrics, and β is a vector of regression coefficients. The error in the regression is captured by ε, which we assume to be i.i.d. Gaussian noise, i.e., ε ∼ N(0, σ²), with zero mean and variance σ². To determine the coefficients β, we perform a linear regression analysis over design metrics and code fault counts collected from a large number of our own developed projects. A sketch of this fitting step is given below.
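The following sketch (ours; the data values are made up for illustration) shows how the coefficients β of equation (3) can be estimated by ordinary least squares from a table of per-class design metrics and observed fault counts.

```python
import numpy as np

# Toy training data (illustrative only): one row per class.
# Columns: WMC, RFC, DIT, NS (a subset of the metrics, for brevity).
X = np.array([[12.0,  8.0, 1.0,  4.0],
              [25.0, 20.0, 3.0, 10.0],
              [ 7.0,  5.0, 1.0,  2.0],
              [18.0, 14.0, 2.0,  6.0]])
y = np.array([3.0, 9.0, 1.0, 5.0])   # observed fault counts per class

# Add an intercept column and solve y = X beta + eps by least squares.
X1 = np.hstack([np.ones((X.shape[0], 1)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Predicted mean Number of faults for a new class's metric vector.
new_class = np.array([1.0, 15.0, 10.0, 2.0, 5.0])  # intercept + metrics
print(beta, new_class @ beta)
```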

The CPD for the node Number of residual faults follows a binomial distribution Binomial(n, p) with parameters n = Number of faults and p = 1 − Testing quality, where Testing quality indicates the probability of detecting a programming fault during testing. The probable number of failures can be predicted from the number of residual faults with the help of the fault exposure ratio (FER) [Mus04]. The FER measures the probability that a fault encountered during processing will cause a failure. FER values computed for thirteen software applications are summarised in [Mus04]; the average value in that table is 4.2 × 10⁻⁷. The probable number of failures of a class can therefore be predicted from the total Number of residual faults, with each fault having probability 4.2 × 10⁻⁷ of causing a failure. Hence, we assume the CPD for the number of failures of a class follows a binomial distribution Binomial(n, p) with parameters n = Number of residual faults and p = 4.2 × 10⁻⁷. The CPD for the node labelled Reliability of class is defined by the expression exp(−N), where N is the total number of probable failures of the class. A sketch of this probabilistic chain is given below. Once all the NPTs of our BBN model are filled in, we use the model to predict the reliability of a class in the following inference step.
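A small sketch of this chain (ours, with assumed inputs): residual faults follow Binomial(Number of faults, 1 − Testing quality), failures follow Binomial(residual faults, FER), and reliability is taken as exp(−N). For simplicity, the sketch propagates expected values rather than full distributions.

```python
import math

# Assumed illustrative inputs (not taken from the paper's case study).
n_faults = 26            # predicted Number of faults for a class
testing_quality = 0.5    # probability that testing detects a given fault
FER = 4.2e-7             # average fault exposure ratio from [Mus04]

# Mean of Binomial(n_faults, 1 - testing_quality): expected residual faults.
expected_residual = n_faults * (1 - testing_quality)

# Expected failures N = E[residual] * FER (law of total expectation
# applied to Binomial(residual, FER)).
N = expected_residual * FER

# Reliability of the class, per the node's CPD exp(-N).
reliability = math.exp(-N)
print(expected_residual, N, reliability)
```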

Inference: In this step, we assign values to the independent variables of the BBN model to determine the probable values of the dependent variables. In our BBN model, the design metrics of a class are the independent variables and the reliability of the class is the dependent variable. The design metrics WMC, RFC, DIT, NOC, MIF, NOM, CIS, DAM, NA, NS, and NT (the number of transitions) are computed from the UML class diagrams. These metric values are given as input to our constructed BBN, which then predicts the reliability of the class. We used the Netica tool (http://www.norsys.com/) to construct our BBN model and used its inference algorithm to calculate class reliability. Figure 5 shows the result obtained after entering the computed design metric values into the corresponding nodes. The value of the node labelled Testing quality can be determined with the help of various factors of software testing, such as the number of test cases, the percentage of test coverage, the testing environment, and the number of testers. However, this information is not available at the early design stages. A possible solution to this problem is to rely on experts' opinion, or to assume an average value for Testing quality, i.e., for the probability of finding defects.

3.2 Use Case Reliability Prediction

A use case typically consists of multiple scenarios. A scenario represents a possible sequence of steps in the execution of the use case. The reliability of a scenario depends on the reliabilities of the classes involved in its execution; the classes that participate in the execution of a scenario can be identified from the sequence diagram. If we assume that the reliabilities of the classes participating in the execution of a scenario are independent of each other, then the reliability of a scenario is calculated as the product of the reliabilities of all classes involved in its execution:

    R(S_i) = \prod_{j=1}^{N} R(Cl_j)    (4)

where R(S_i) is the reliability of the i-th scenario, R(Cl_j) is the reliability of the j-th class, and N is the number of classes that interact to implement the i-th scenario.

The reliability of a use case is estimated from the reliabilities of its scenarios and the usage frequencies of the scenarios. If the execution frequency of the i-th scenario is p_i, then the use case reliability is computed as

    R(U) = 1 - \prod_{i=1}^{M} ((1 - R(S_i)) \cdot p_i)    (5)

where R(S_i) is the reliability of the i-th scenario and M is the number of scenarios within the use case. The usage frequency p_i of a scenario can be determined from the operational profile [Mus93].

3.3 System Reliability Prediction

A system can be modeled as a set of use cases. Consequently, the reliability of a system can be predicted from the reliabilities of its use cases and their respective execution frequencies. If a system consists of K use cases and the reliability of the k-th use case is R(U_k) (as predicted in section 3.2), then the system reliability R(Sys) can be expressed as

    R(Sys) = \prod_{k=1}^{K} (1 - ((1 - R(U_k)) \cdot P_k))    (6)

where R(U_k) is the reliability of the k-th use case and P_k is the execution probability of the k-th use case. The execution probability of a use case can be determined from the operational profile.

4 Case Study

In this section, we illustrate the usage of our approach to predict the reliability of a system through a case study.

4.1 System Description

We consider the example of a simple restaurant automation system (RAS), which computerizes the order processing, billing, and accounting activities of a restaurant. The system also generates statistical reports about the monthly sales of different items and maintains the prices of all the items sold in the restaurant. The sales clerk of the restaurant uses this system to keep track of customer orders and to generate bills for sold food items. A use case model of the RAS is shown in Figure 6.

[Figure 6: Use case model of the restaurant automation system. The actor Owner is associated with the use cases "show monthly sales" and "maintain price"; the actor Sales Clerk is associated with "process order" and "generate bill".]

4.2 Reliability Prediction Using ESRA

It can be observed from Figure 6 that there are four use cases (show monthly sales, maintain price, process order, generate bill). For simplicity, we use U1, U2, U3, and U4 to denote the use cases "show monthly sales", "maintain price", "process order", and "generate bill", respectively. In addition to the four use cases, we have six scenarios and eight classes in this case study. Each use case has only one scenario, except U2 and U4, which have two scenarios each. For the sake of readability, we use Ck (where k = 1, 2, ..., 8) to denote the k-th class and Sij to denote the j-th scenario of the i-th use case.

The design metrics for each class Ck are computed using our own tool, DMetrics. DMetrics is developed in the Java programming language and computes the relevant design metrics from the XML files generated from the UML diagrams. We have used the MagicDraw tool (http://www.magicdraw.com/) to draw our UML diagrams. Once the design metrics for all the classes are calculated, these values are given as input to our BBN model. We have used the Netica tool and its underlying inference propagation algorithm to compute the marginal distribution of the child node Reliability of class in our model. The reliabilities of the classes predicted by our BBN model are tabulated in Table 2.

[Figure 5: The BBN model with values assigned to the independent nodes. With the entered metric values, the node Reliability of class is predicted with mean 0.83 ± 0.17.]

Table 2: Predicted class reliabilities

Class | Reliability
C1    | 0.935
C2    | 0.986
C3    | 0.836
C4    | 0.920
C5    | 0.865
C6    | 0.895
C7    | 0.956
C8    | 0.875

Use case U2 includes two scenarios (S21, S22) with usage probabilities p1 = 0.98 and p2 = 0.02, respectively. The usage frequencies of the scenarios are determined from the operational profile [Mus93]. Objects of classes C1, C2, C4, and C5 participate in the execution of scenario S21, and objects of C1, C2, C5, and C7 in S22. Hence, the reliabilities of the scenarios (S21, S22) can be estimated using equation (4):

R(S21) = 0.935 × 0.986 × 0.920 × 0.865 = 0.733
R(S22) = 0.935 × 0.986 × 0.865 × 0.956 = 0.762

Using equation (5), the reliability of U2 is estimated from the reliabilities of its scenarios (S21, S22) and their usage probabilities (p1 = 0.98 and p2 = 0.02) as follows:

R(U2) = 1 − [((1 − R(S21)) × p1) × ((1 − R(S22)) × p2)]
      = 1 − [((1 − 0.733) × 0.98) × ((1 − 0.762) × 0.02)]
      = 1 − 0.00124
      = 0.99876

The reliabilities of the other three use cases (U1, U3, U4) are estimated in a similar way. These use case reliabilities and their usage frequencies are shown in Table 3.

Table 3: Predicted use case reliabilities

Use Case | Reliability | Usage Frequency
U1       | 0.97958     | 0.11
U2       | 0.99876     | 0.12
U3       | 0.98107     | 0.45
U4       | 0.99328     | 0.32

The reliability of the system is predicted from the reliabilities of its use cases and their execution probabilities using equation (6). We predict the reliability of the restaurant automation system as follows:

R(Sys) = (1 − P1(1 − R(U1))) × (1 − P2(1 − R(U2))) × (1 − P3(1 − R(U3))) × (1 − P4(1 − R(U4)))
       = (1 − 0.11(1 − 0.97958)) × (1 − 0.12(1 − 0.99876)) × (1 − 0.45(1 − 0.98107)) × (1 − 0.32(1 − 0.99328))
       = (1 − 0.00221) × (1 − 0.00014) × (1 − 0.00851) × (1 − 0.00215)
       = 0.99779 × 0.99986 × 0.99149 × 0.99785
       = 0.98703
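As a cross-check (ours, not part of the paper), the following sketch implements equations (4)-(6) and reproduces the case study numbers from Tables 2 and 3 up to rounding.

```python
import math

# Predicted class reliabilities (Table 2).
R_cl = {"C1": 0.935, "C2": 0.986, "C3": 0.836, "C4": 0.920,
        "C5": 0.865, "C6": 0.895, "C7": 0.956, "C8": 0.875}

def scenario_rel(classes):
    # Equation (4): product of the reliabilities of the participating classes.
    return math.prod(R_cl[c] for c in classes)

def use_case_rel(scenarios):
    # Equation (5): scenarios given as (reliability, usage probability) pairs.
    return 1 - math.prod((1 - r) * p for r, p in scenarios)

def system_rel(use_cases):
    # Equation (6): use cases given as (reliability, execution probability) pairs.
    return math.prod(1 - (1 - r) * p for r, p in use_cases)

s21 = scenario_rel(["C1", "C2", "C4", "C5"])   # ~0.733
s22 = scenario_rel(["C1", "C2", "C5", "C7"])   # ~0.762
u2 = use_case_rel([(s21, 0.98), (s22, 0.02)])  # ~0.99876

# Use case reliabilities and usage frequencies (Table 3).
sys_rel = system_rel([(0.97958, 0.11), (0.99876, 0.12),
                      (0.98107, 0.45), (0.99328, 0.32)])
# sys_rel ~ 0.98698; the text's 0.98703 uses rounded intermediate factors.
print(round(u2, 5), round(sys_rel, 5))
```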

5 Comparison with Related Work

In the literature, many early software reliability prediction approaches [CSC02, YCA99, GT02, RSP03, CRMG08, MRDL92, TM05] have been reported. Most of them [YCA99, GT02] assume that the reliabilities of the components making up a system are already known, and predict system reliability by considering the interactions among the components. A few approaches [RSP03, CRMG08, CSC02] address reliability prediction of the components themselves. In the following, we compare our ESRA approach with these approaches.

Cheung et al. [CRMG08] have proposed a framework to predict the reliability of a system from the reliabilities of its components. To predict component reliability, they incorporate a failure state into the UML state model of each individual component and apply a discrete time Markov chain stochastic process to this model. They determine the reliability of an individual component as the probability of the component being in a normal state (not in the failure state), obtained by applying standard numerical techniques to solve the Markov chain model. However, this approach does not specify exactly how to determine the state-transition probabilities in the statechart model of a component; it only mentions probable sources for them, such as domain knowledge, the requirements document, simulation, and functionally similar components.

Singh et al. [CSC02] have proposed a reliability prediction framework for component-based systems using a Bayesian approach. They use prior information about the failure probabilities of components and the system usage to compute the reliabilities of the components. This approach can be used in the early phase of system design, as soon as the prior information about the failure probabilities of components and properly annotated use case and sequence diagrams become available. However, it does not consider the reliability of the interfaces that connect the components.

In [RSP03], the reliability of a component is measured as the average of the reliabilities of the services provided by the component, and the reliabilities of the component services are assumed to be known. However, this approach does not consider the design structure of the component, which is likely to reduce the accuracy of its reliability prediction.

Compared to the above approaches, our bottom-up approach first focuses on predicting the major categories of faults that may occur in a class, instead of predicting faults as a whole. We identify the likelihood of faults based on the design metrics correlated with these faults. The reliability of a class is predicted from the predicted number of faults with the help of a BBN, and the system reliability is predicted based on the class interactions and other UML artifacts. Results from the limited experiments that we have carried out show that our bottom-up approach yields approximately a 5 to 10% increase in the accuracy of reliability prediction compared to related approaches. Moreover, our ESRA approach allows evaluating different design choices with respect to their impact on the final product reliability.

6 Conclusion

We have proposed a bottom-up reliability prediction approach that determines the reliability of an object-oriented system from its class reliabilities. We determine the class reliabilities from the design metrics correlated with the various kinds of code faults of a class. To achieve this, we have constructed a BBN that predicts class reliability from the identified design metrics. The class reliabilities and the information from the sequence diagrams are used to predict the use case reliabilities, and the use case reliabilities along with the operational profile information are used to predict the system reliability. We are now applying our approach to real-life systems to investigate the accuracy with which it can actually predict system reliability. To achieve increased prediction accuracy, we also plan to extend our approach to include class relations in our reliability prediction technique.

ACM SIGSOFT Software Engineering Notes Page 9

References [AHSP+ 02] Fernando Brito e Abreu, Brian Henderson-Sellers, Mario Piattini, Geert Poels, and Houari A. Sahraoui. Quantitative approaches in object-oriented software engineering. In ECOOP ’01: Proceedings of the Workshops on ObjectOriented Technology, pages 174–183, London, UK, 2002. Springer-Verlag. [BD02]

J. Bansiya and C.G. Davis. A hierarchical model for object-oriented design quality assessment. IEEE Transactions on Software Engineering, 28:4–17, 2002.

[BHXN05]

C. G. Bai, Q. P. Hu, M. Xie, and S. H. Ng. Software failure prediction based on a markov bayesian network model. Journal of Systems and Software, 74(3):275 – 282, 2005.

[Bin96]

Robert V. Binder. Testing object-oriented software: a survey. Software Testing, Verification and Reliability, 6(34):125–252, 1996.

[CDK98]

Shyam R. Chidamber, David P. Darcy, and Chris F. Kemerer. Managerial use of metrics for object-oriented software: An exploratory analysis. IEEE Transactions on Software Engineering, 24:629–639, 1998.

[CHB+ 08]

[CK94]

November 2010 Volume 35 Number 6

[Mus93]

John D. Musa. Operational profiles in software-reliability engineering. IEEE Softw., 10(2):14–32, 1993.

[Mus04]

John D. Musa. Software Reliability Engineering: More Reliable Software Faster and Cheaper. Professional Software Engineering Series. Tata McGraw-Hill Publishing Company Limited, 2 edition, 2004.

[OAW+ 01] Jeff Offutt, Roger Alexander, Ye Wu, Quansheng Xiao, and Chuck Hutchinson. A fault model for subtype inheritance and polymorphism. In ISSRE ’01: Proceedings of the 12th International Symposium on Software Reliability Engineering, page 84, Washington, DC, USA, 2001. IEEE Computer Society.

Kai-Yuan Cai, De-Bin Hu, Cheng-Gang Bai, Hai Hu, and Tao Jing. Does software reliability growth behavior follow a non-homogeneous poisson process. Inf. Softw. Technol., 50(12):1232–1247, 2008. S. R. Chidamber and C. F. Kemerer. A metrics suite for object oriented design. IEEE Trans. Softw. Eng., 20(6):476–493, 1994.

[OEGQ07]

Hector M. Olague, Letha H. Etzkorn, Sampson Gholston, and Stephen Quattlebaum. Empirical validation of three software metrics suites to predict fault-proneness of objectoriented classes developed using highly iterative or agile software development processes. IEEE Trans. Softw. Eng., 33(6):402–419, 2007.

[RSP03]

Ralf H. Reussner, Heinz W. Schmidt, and Iman H. Poernomo. Reliability prediction for component-based software architectures. J. Syst. Softw., 66(3):241–252, 2003.

[TM05]

Rakesh Tripathi and Rajib Mall. Early stage software reliability and design assessment. In APSEC ’05: Proceedings of the 12th Asia-Pacific Software Engineering Conference, pages 619–628, Washington, DC, USA, 2005. IEEE Computer Society.

[YCA99]

Sherif M. Yacoub, Bojan Cukic, and Hany H. Ammar. Scenario-based reliability analysis of componentbased software. In ISSRE ’99: Proceedings of the 10th International Symposium on Software Reliability Engineering, page 22, Washington, DC, USA, 1999. IEEE Computer Society.

[ZL06]

Yuming Zhou and Hareton Leung. Empirical analysis of object-oriented design metrics for predicting high and low severity faults. IEEE Trans. Softw. Eng., 32(10):771–789, 2006.

[CRMG08] Leslie Cheung, Roshanak Roshandel, Nenad Medvidovic, and Leana Golubchik. Early prediction of software component reliability. In ICSE ’08: Proceedings of the 30th international conference on Software engineering, pages 111–120, New York, NY, USA, 2008. ACM. [CSC02]

Vittorio Cortellessa, Harshinder Singh, and Bojan Cukic. Early reliability assessment of uml based software models. In WOSP ’02: Proceedings of the 3rd international workshop on Software and performance, pages 302–309, New York, NY, USA, 2002. ACM.

[FN99]

Norman E. Fenton and Martin Neil. A critique of software defect prediction models. IEEE Trans. Softw. Eng., 25(5):675–689, 1999.

[GMP02]

M. Genero, D. Miranda, and M. Piattini. Defining and validating metrics for uml statechart diagrams. In 6th International ECOOP Workshop on Quantitative Approaches in Object-Oriented Software Engineering(QAOOSE 2002), pages 120 – 136, June 2002.

[GT02]

Swapna S. Gokhale and Kishor S. Trivedi. Reliability prediction and sensitivity analysis based on software architecture. In ISSRE ’02: Proceedings of the 13th International Symposium on Software Reliability Engineering, page 64, Washington, DC, USA, 2002. IEEE Computer Society.

[HNP07]

Jeanette Heidenberg, Andreas Nals, and Ivan Porres. Statechart features and pre-release defects in software maintenance. In VLHCC ’07: Proceedings of the IEEE Symposium on Visual Languages and Human-Centric Computing, pages 223–230, Washington, DC, USA, 2007. IEEE Computer Society.

[Hua05]

Chin-Yu Huang. Performance analysis of software reliability growth models with testing-effort and change-point. J. Syst. Softw., 76(2):181–194, 2005.

[LK94]

Mark Lorenz and Jeff Kidd. Object-oriented software metrics: a practical guide. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1994.

[MRDL92]

J. A. McCall, William Randell, Janet Dunham, and Linda Lauterbach. Software reliability, measurement, and testing software reliability and test integration. Technical report, Rome Laboratory, April 1992. Final Technical Report RLTR-92-52.

9

DOI: 10.1145/1874391.1874403

http://doi.acm.org/10.1145/1874391.1874403