Recalibrating Software Reliability Models

Sarah Brocklehurst, P Y Chan, Bev Littlewood
Centre for Software Reliability, City University, Northampton Square, London EC1V 0HB

John Snell
Computer Science Department, City University, Northampton Square, London EC1V 0HB

Abstract

In spite of much research effort, there is no universally applicable software reliability growth model which can be trusted to give accurate predictions of reliability in all circumstances. Worse, we are not even in a position to be able to decide a priori which of the many models is most suitable in a particular context. Our own recent work has tried to resolve this problem by developing techniques whereby, for each program, the accuracy of various models can be analysed. A user is thus enabled to select that model which is giving the most accurate reliability predictions for the particular program under examination. One of these ways of analysing predictive accuracy, which we call the u-plot, in fact allows a user to estimate the relationship between the predicted reliability and the true reliability. In this paper we show how this can be used to improve reliability predictions in a completely general way by a process of recalibration. Simulation results show that the technique gives improved reliability predictions in a large proportion of cases. However, a user does not need to trust the efficacy of recalibration, since the new reliability estimates produced by the technique are truly predictive and so their accuracy in a particular application can be judged using the earlier methods. The generality of this approach would therefore suggest that it be applied as a matter of course whenever a software reliability model is used.

(NASA-CR-186407) RECALIBRATING SOFTWARE RELIABILITY MODELS (City Univ.) 36 p CSCL 09B N90-19763 Unclas G3/61 0270364

1 Introduction

The earliest attempts to measure and predict the reliability of software occurred about twenty years ago. In spite of considerable research work in the intervening years, there is still no definitive method or model which can be universally recommended as 'best'. Perhaps this should not be surprising. Estimating and predicting software reliability is not easy. Perhaps the major difficulty is that we are concerned primarily with design faults.

This situation theory.

is very different

Here

the dramatic

concentration

on the random

have

understanding

a good

depend

upon,

reliabilities

the

theory,

failures

similar

It seems

quarter

century

processes

of physical

failure.

Thus,

hand,

likely,

of such flaws

of components designs.

they represent

on hardware

system

reliability

we now

hardware on

the

of design

a

systems other,

the

hardware

faults

to the

intelligent

strategies

to

results

in a higher

proportion

of

flaws

in hardware

systems

are

the result good

from

to use

Such

of this, that obtaining

come

of this physical

the importance ability

reliability

for example,

structure,

The very success

by flawed

have

of complex

system

Our

failure

faults:

as a result

detailed

systems.

caused

to software

the

reliabilities

is now revealing

of physical

being

the

components.

of complex

the effects

system

one

hardware

of the past

of how

however,

reliability

minimise

very

on

by the conventional

advances

of the constituent

reliability overall

from that tackled

of human

methods

misunderstandings.

for measuring

will be as difficult

the effect

as measuring

software

reliability.

Software

has

no significant

inherent

design

faults

circumstances.

These

in the

design

original

theories

of how

require

better

sciences,

rather

of these

sciences

or in subsequent faults

software;

in arriving

any dramatic

These

difficulties

come

for solutions.

in order

problem

failures

in the software We

are

appropriate

Presumably

the

look to social

theories social

good would

processes

and psychological

In view of the comparative understanding,

their creation

do not have such

and

merely

operational

since

currently

solving

perhaps

at quantitative

breakthrough

recently.

into being.

if so, we should

Software under

changes.

of human

lack of success

it would

be wise

not to

in the short term.

notwithstanding,

modelling

themselves

will have been resident

than physics,

expect

user can choose

faults

understanding

in writing

manifestation.

revealing

software

involved

reliability

physical

there

have

been

important

In fact, there is now a plethora

to make

reliability

estimates


advances

of models

and predictions.

in software

from which However,

the none

of these has been able to decide This

shown

in a particular

presents

measures

Our

recent

whereby data

work

source.

then be sensible,

selection

of a model,

Indeed,

this 'best'

model

predicted

methods and actual

two especially noise

of past

techniques

errors

will be shown cases,

particular

need

apply

such

models

case can be analysed,

means

on a particular

and select

for each the model

predictions.

to use that model

for courses'

approach

obviates

is provided

data which

It would for the next the need

with its 'best'

for

model.

as more data is collected.

work

by analysing

In particular, which

closeness

they provide

work

can be used

between

information

about and

is that this knowledge

to improve

general

future

of the

predictions.

The

and are not model-dependent.

predictive

this efficacy

the

we call bias (or ill-calibration)

idea in the present

in improving

not take

by devising

techniques,

reliability

each data source

here are quite

to be effective

but users

could

of departure

of prediction

to be described

to use.

in obtaining

of past predictions

by several

behaviour.

The key

model

interested

this problem

the accuracy

selection

types

appropriate

is solely

to tackle

This 'horses

may change

important

who

the most accurate

instead

failure

be the most

of any other information,

of model

(or variability).

nature

about

best by giving

in the absence

and we are not presently

can have confidence.

produced

on that data source.

new

user,

is that a user

a priori

These

would

attempted

to the results

has so far performed

prediction

[1] has

intention

(program),

which

he/she

can be made

The

in all circumstances,

for a potential

in which

judgements

source

context

difficulties

reliability

own

to be applicable

accuracy

on trust:

just like any other

their

model,

They

in a high proportion predictive

using

accuracy

our earlier

of in a

techniques

[11.

2 Reliability growth and predictive accuracy

In its simplest form, the software reliability growth problem concerns random variables $T_1, T_2, \ldots, T_n$, representing the execution times between successive failures as a program is being debugged. It is generally assumed that attempts are made at each failure to fix the fault which caused the failure. Models vary in the way that they represent this fault-finding and fixing operation: details of different approaches can be found elsewhere [1, 8, 15].

At stage $i$, when observations $t_1, t_2, \ldots, t_{i-1}$ have been made of the first $i-1$ inter-failure times, the objective is to predict future failure behaviour represented by the unobserved random variables $T_i, T_{i+1}, \ldots$ Informally, the prediction problem is solved if we can accurately estimate the joint distribution of any finite subset of $T_i, T_{i+1}, \ldots$ This statement, however, begs the question of what we mean by 'accurately', and it is this issue which forms a major part of our earlier work [1].

In practice, of course, a user will be satisfied with much less than a complete description of all future uncertainty. In many cases, for example, it will be sufficient to know the current reliability of the software under examination. This could be presented in many different forms: the reliability function, $P(T_i > t)$; the current rate of occurrence of failures (ROCOF), [3]; the mean (or median) time to next failure (mttf). Alternatively, a user may wish to predict when a target reliability, perhaps to be used as the criterion for termination of testing, will be achieved.

If we accept that prediction is our goal, it can be seen that the usual discussion of the relative merits of competing software reliability growth models is misleading. We should, instead, be comparing prediction systems. A prediction system which will allow us to predict the future $(T_i, T_{i+1}, \ldots)$ from the past $(t_1, t_2, \ldots, t_{i-1})$ comprises:

(i) the probabilistic model which specifies the distribution of any subset of the $T_j$'s conditional on a (unknown) parameter $\alpha$;

(ii) a statistical inference procedure for $\alpha$ involving use of available data (realisations of $T_j$'s);

(iii) a prediction procedure combining (i) and (ii) to allow us to make probability statements about future $T_j$'s.

Of course, the model is an important part of this triad and it seems unlikely that good predictions can be obtained if the model is not 'close to reality'. However, a good model is not sufficient: stages (ii) and (iii) are vital components of the prediction system. In fact disaster can strike at any of the three stages.

In principle, it ought to be possible to analyse each of the three stages separately so as to gain trust in (or to mistrust) the predictions. Unfortunately, it is our experience that this is not possible. There are several reasons.

In the In'stplace,themodels fit' approach does

to be attempted.

not allow

problem

are usually

the simplest

this kind of analysis.

for independent

of unknown

Even

identically

parameters.

too complicated

This

The reliability

exponential

should

distributed

for a traditional

not surprise

random

growth

order

statistic

model

[14]

us: the goodness-of-fit

variables

context

'goodness-of-

is hard in the presence

is much

worse

because

of non-

stationarity.

Secondly,

statistical

Bayesian

analysis

models

assume

an upper

of these

bound

asymptotic

are invariably

there is a proper

It involves

posterior for

for the popular

advances

in Bayesian

computers,

Finally,

of the greater

dubious

proposition.

cannot software when

it may be possible program,

or even

Their

small

framework.

this does

analytical

present

models.

coupled

some

However, with

with recent

powerful

of their

which

are 'obviously'

better

underlying

assumptions.

We

them.

of some However,

It is our belief

models

personal

this still leaves

others

which

even choose

model

under

to a program

development

this a naive

that we cannot

a reliability

find

overly

understanding

of the software

than others

seem

that

knowledge

of the software

estimators.

even

predictive

[I8],

are models

is so imperfect

to match

that we cannot

in the near future.

to discount

we have an intimate

is thus

at stage (ii) and Bayesian

growth

the assumptions

a priori.

engineering

reliability techniques

plausibility

be reasonable

be rejected

Unfortunately,

that there

Certainly,

(ML)

several

There

(ii) and (iii) in the Bayesian

of the parameters

may change

example,

of faults.

This implies

for a non-

hard to obtain.

to stages

software

be argued

because

and it might

likelihood

numerical

this picture

it could

for maximum

[2]).

For

number

Tj's.

approach

(see

only a finite

impossibly

parameters

not available.

of observable

distributions (iii)

of unknown

are usually

contains

theory

Of course,

difficulties

models

on the number

properties

distributions

of the estimators

that the software

trust the usual sample

properties

of the processes an appropriate

study.

At some

model

future

via the characteristics

methodology

used.

to obtain

trustworthy

of

time of that

This is not currently

the case.

Where does this leave a user, who merely wants reliability metrics for his current software project? Our view is that there is no alternative to a direct examination and comparison of the quality of the predictions emanating from different complete prediction systems.

In [1] we have described several ways in which this can be done, the most important tools being the u-plot and the prequential likelihood. The key idea in each case is that a comparison is made between what has been predicted and what is (later) actually observed. We believe that this emulates how a user would informally gain confidence in a sequence of predictions.

For simplicity we shall concentrate on prediction of the next time to failure $T_i$, based on observations $t_1, t_2, \ldots, t_{i-1}$. The u-plot uses the predictor $\hat{F}_i(t)$, the estimate of the distribution function $F_i(t) = P(T_i < t)$, via

$$u_i = \hat{F}_i(t_i)$$

where

ti is the

probability

later-observed

integral

function.

transform

{ui} should

are various we

types shall

distributed.

of the ui sequence.

u-plots

on 86 predictions:

(JM)

and 0.150 from

also performing

importantly

making

predictions

This can be seen (the true U(0,1) u values failure, almost

and

use

a U(0,1)

might

to see that

is the sample

[1].

show

{ui} sequence

ui is the

distribution

distribution

the

There

themselves;

looks

uniformly

cumulative

distribution

of this plot from the cdf of U(0,1), of the prediction

that is the maximum

standard

Thus

predictive

it is easy

which

the

which

of a departure distance,

tables

for Jelinski-Moranda

on a data

based

More

whether

Ti.

system vertical

to determine

the

from accuracy. deviation,

whether

as a

or not it is

significant.

predictions

prediction

from

The departure

is an indication

the Kolmogorov

1 shows

making

with

the

is good,

such an appearance

be concerned

of this departure

Figure

from

sample

variable

using

{_i(ti)}

We shall do this via the u-plot

can use

statistically

observation

of predictions

of departure

line of unit slope,

measure

of the

of the random

look like a random

only

(cdf) function

We

realisation

If the sequence

sequence

here

(1)

set, called

_51(t)

(LV).

The

first

JM; the second poorly

cdf), so there

The

are too many

These

whilst

suggesting

suggests

of the plots

LV predictions above

small ui values.

A similar

plots

are 0.205 very

poor

that this model

the chance argument

tells

us that

is

the line of unit slope

But consistently of small shows

JM is

are too pessimistic.

times

too small between

that a plot which

the line of unit slope, such as LV, is too pessimistic.


are each

to JM.

the shape

is underestimating

[13] models

distances

1% level,

JM plot is everywhere

is too optimistic.

below

at the

superior

purposes,

in [1].

The Kolmogorov

at 5%, which

are too optimistic,

as follows.

i.e. the model

_136(t).

is significant

for our present which

S1 [17], analysed

is significant

but is somewhat

tells us that the model

everywhere

through

[10] and Littlewood-Verrall

is

If we knew that these deviations between predicted and actual behaviour were consistent, we could attempt to measure the degree of optimism (or pessimism) and improve future predictions by taking account of this tendency. It is this idea which we shall develop in the next section. Before we do that, we shall briefly describe the prequential likelihood function (PL), which is a general mechanism for comparing the accuracy of prediction systems.

The PL is defined as follows. The predictive distribution $\hat{F}_i(t)$ for $T_i$ based on $t_1, t_2, \ldots, t_{i-1}$ will be assumed to have a probability density function (pdf)

$$\hat{f}_i(t) = \hat{F}_i'(t)$$

For predictions of $T_{j+1}, T_{j+2}, \ldots, T_{j+n}$, the prequential likelihood is

$$PL_n = \prod_{i=j+1}^{j+n} \hat{f}_i(t_i) \qquad (2)$$

A comparison of two prediction systems, A and B, over a range of predictions of $T_{j+1}, T_{j+2}, \ldots, T_{j+n}$, can be made via their prequential likelihood ratio

$$PLR_n = \frac{\prod_{i=j+1}^{j+n} \hat{f}_i^{A}(t_i)}{\prod_{i=j+1}^{j+n} \hat{f}_i^{B}(t_i)} \qquad (3)$$
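As an illustration of the prequential likelihood and the prequential likelihood ratio defined in equations (2) and (3), the following is a minimal sketch rather than the authors' implementation. It works with logarithms to avoid numerical underflow in long products, which is a choice made here and not something the text prescribes. The function names and the exponential predictor are hypothetical and serve only to make the example runnable.

```python
import math

def log_prequential_likelihood(predictive_pdfs, observed_times):
    """Log of PL_n: sum of log f_hat_i(t_i) over the predictions.

    predictive_pdfs[k] is the predictive pdf issued BEFORE observed_times[k]
    was seen; each pdf must be strictly positive at the observed time.
    """
    return sum(math.log(pdf(t)) for pdf, t in zip(predictive_pdfs, observed_times))

def log_plr(pdfs_a, pdfs_b, observed_times):
    """Log of PLR_n for prediction system A against prediction system B."""
    return (log_prequential_likelihood(pdfs_a, observed_times)
            - log_prequential_likelihood(pdfs_b, observed_times))

def exponential_pdf(rate):
    """Hypothetical one-step-ahead predictor: exponential density with the given rate."""
    return lambda t: rate * math.exp(-rate * t)

# Hypothetical inter-failure times and two fixed-rate prediction systems.
times = [1.0, 1.0]
system_a = [exponential_pdf(1.0)] * len(times)
system_b = [exponential_pdf(2.0)] * len(times)
print(log_plr(system_a, system_b, times))  # positive values favour A over B
```

A steadily increasing log PLR over successive predictions corresponds to the PLR itself increasing, so the sign and trend of this quantity can be read in the same way as the ratio in equation (3).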

Notice how, in a fashion analogous to the calculation of the u sequence, the individual contributions to the prequential likelihood are obtained by substitution into the predictor pdf for $T_i$ of the later-observed realisation $t_i$. Dawid [7] shows that if $PLR_n \to \infty$ as $n \to \infty$, prediction system B is discredited in favour of A. For the finite samples with which we inevitably have to deal, we shall argue that $PLR_n$ increasing consistently suggests the superiority of A over B.

In [1] we give intuitive reasons why the PL works. Specifically we show that consistent bias or noisiness of a prediction system will tend to give a smaller PL than would otherwise be the case.

To summarise, the PLR can be regarded as a general procedure for choosing the best prediction system for a particular data source. The u-plot is a means of indicating a particular kind of consistent inaccuracy of prediction which could be a contributory factor in poor predictive accuracy. Thus a poor u-plot might suggest that poor predictive accuracy (represented by a poor prequential likelihood) is due to consistent bias. For such a case, we shall show in the next section how it is possible to remove the bias and so improve the accuracy of reliability predictions.
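The u-plot check used in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each $u_i$ is the predictive distribution function evaluated at the later-observed time, and the Kolmogorov distance is the maximum vertical deviation between the sample cdf of the $u_i$ and the line of unit slope. The function names and the exponential predictor are hypothetical.

```python
import math

def u_sequence(predictive_cdfs, observed_times):
    """u_i = F_hat_i(t_i): each predictor evaluated at its later-observed time."""
    return [cdf(t) for cdf, t in zip(predictive_cdfs, observed_times)]

def kolmogorov_distance(us):
    """Maximum vertical deviation of the sample cdf of the u_i from the line of unit slope.

    The sample cdf is a step function, so both sides of every step are checked.
    """
    xs = sorted(us)
    n = len(xs)
    return max(max((k + 1) / n - u, u - k / n) for k, u in enumerate(xs))

def exponential_cdf(rate):
    """Hypothetical predictor: exponential distribution function with the given rate."""
    return lambda t: 1.0 - math.exp(-rate * t)

# Hypothetical inter-failure times, each predicted with the same fixed-rate predictor.
times = [0.5, 1.2, 0.3, 2.0]
predictors = [exponential_cdf(1.0)] * len(times)
us = u_sequence(predictors, times)
print(kolmogorov_distance(us))
```

The resulting distance would then be compared against standard Kolmogorov tables for the number of predictions made, in order to judge whether the departure from uniformity is statistically significant.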

3 Recalibration of predictions

Consider a prediction $\hat{F}_i(t)$ of the random variable $T_i$, when the true (unknown) distribution is $F_i(t)$. Let the relationship between these be represented by the function $G_i$ where

$$F_i(t) = G_i[\hat{F}_i(t)] \qquad (4)$$

Obviously, if we knew $G_i$ we could recover the true distribution of $T_i$ from the inaccurate predictor, $\hat{F}_i(t)$. The key notion in our recalibration approach is that in many cases the sequence $\{G_i\}$ is approximately stationary, i.e. it is only changing slowly in $i$.

If the sequence were completely stationary, i.e. $G_i = G$ for all $i$, we would have a more precise interpretation of the idea of 'consistent bias' used in the previous section. We would also have the possibility of estimating the common $G$ from past predictions and using it to improve the accuracy of future predictions.

Of course, in practice such complete stationarity is unlikely to be achieved. However, it does seem to be the case that the sequence changes only slowly in many cases. This opens up the possibility of approximating $G_i$ with an estimate $G_i^*$ and so forming a new prediction

$$\hat{F}_i^*(t) = G_i^*[\hat{F}_i(t)] \qquad (5)$$

A suitable estimator for $G_i$ is suggested [...] function of $U_i = \hat{F}_i(T_i)$. [...] calculated from [...] formed from predictions [...] the $u_j$s for $j$