Scenario-driven Specification-based Testing against Goals and Requirements

Thomas A. Alspaugh, Debra J. Richardson, Thomas A. Standish, and Hadar Ziv

Donald Bren School of Information and Computer Sciences
University of California, Irvine
{alspaugh,djr,standish,ziv}@ics.uci.edu

Abstract. We describe a new verification and validation (V&V) approach based on comparing actual system behavior, in the form of captured goal-annotated event traces, with expected behavior expressed by requirements scenarios tied to system requirements goals. We believe this V&V approach can fruitfully leverage requirements engineering work and lead to improved software quality by offering six potential benefits: (1) higher-yield testing, (2) distinguishing false positives, (3) defining test coverage metrics, (4) detecting domain-analysis errors, (5) validating top-level requirements, and (6) efficient control of retesting. We use examples to explain how our method can attain these benefits. If our goal/requirements-based V&V approach succeeds in realizing these six benefits, we believe it will lead to improved requirements that, in turn, improve software quality.

1 Introduction

Requirements Engineering (RE) helps provide better-quality software, but that is not enough: Verification and Validation (V&V) must be integral in order to demonstrate the improved quality. Moreover, since the majority of errors, and especially those with the highest impact, originate in early life-cycle phases such as RE (widely and consistently reported over several decades [Boe81, Dav90, Sta94, Dav00]), V&V's role in helping uncover requirements defects is vital. Traditional code-coverage or test-generation approaches typically result in intractably large test sets, because many distinct paths through the code have effects that are not significantly different. Specification-based testing allows greater efficiency and, when driven by requirements scenarios and guided by stakeholder goals, maximizes satisfaction of those goals. Thus, improved software quality is more likely to result from RE that is leveraged by improved V&V methods. In this paper, we describe how we can leverage RE work with a new goal/requirements-based V&V method:

1. We trace requirements goals down through a goal-refinement hierarchy to derive functional requirements goals, which can be realized as system components. We develop requirements scenarios (expressed as scenario schemata in a new scenario description language, ScenarioML [Als05]) that express the abstract plans of action that allegedly satisfy these functional requirements goals.

2. We implement system components by stepwise refinement of the abstract plans, leading ultimately to executable coded components while retaining goal annotations of the sub-goals used in the abstract plans from which they were refined. These goal annotations are later emitted during testing.

3. When we execute such goal-annotated components, event traces of their running behavior are collected, containing events, data results, and goal annotations of interest. The goal annotations in these event traces indicate what sub-goals the component was working on in its intrinsic "plan of action" while it was running.

4. The requirements scenarios are augmented with expected events, expected (intermediate and final) data results, and specification-based test oracles [Ric94] that can check event traces and actual data results for correctness.

5. During testing, we collect a set of goal-annotated event traces {Ti} by running an executable component C, and compare the actual system behavior captured in the Ti to the augmented requirements scenario S that shares the goal from which both C and S were derived. By comparing the actual behavioral events, data results, and goal annotations in Ti with the expected events, data results, and goals in instances of scenario S specifying what should occur, we test whether Ti shows that component C works both correctly and according to plan, in essence determining whether it did "the right things for the right reasons."

6. Because the goal annotations in Ti embody the designer's intentions, as built into C and enacted while it was executing, we can determine not only whether C got the correct expected result, but also whether it was trying to follow the expected plan of action, and whether it was trying to solve the expected subproblems called for in that plan of action. We will compactly refer to these human intentions, embodied and enacted by C, as C's intentions.

Including such "intention testing" in our method leads to some important benefits for V&V. For example, checking goals and intentions helps distinguish false positives, which are ordinarily a serious issue. A false positive is much less likely if the oracle requires both that the result pass and that the goal be satisfied. Therefore less testing is needed; without goals, one must perform more tests to gain the same level of confidence. Hence the addition of goals gives higher-yield testing: cost-effectiveness in testing is improved if, to obtain a complete test suite, we can use fewer, higher-yielding tests than traditional testing methods need. Also, if our testing method can determine when low-level requirements goals are satisfied, then under appropriate conditions we can propagate that satisfaction up the goal-refinement hierarchy to check whether top-level requirements goals are being satisfied. (Here, by "appropriate conditions" we mean that the goal-refinement hierarchy is structured so that goals are satisfied whenever their instrumental sub-goals are satisfied.) This is immensely helpful in validating that top-level requirements goals are being met by the actual behavior of the system, and shows how, by leveraging V&V with a new goal/requirements-based technique, we can improve both RE and software quality.

Section 2 explains how our approach works using three progressive examples, each of which explores different issues and reveals different benefits of our approach. Section 3 comments on related work.
Section 4 presents conclusions and describes our plans for validation of our approach and for future work.
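As an illustrative aside (not our actual tooling), the following minimal Java sketch suggests one way the data involved in steps 3 through 6 could be represented and checked; the names TraceEntry, ExpectedStep, and ScenarioChecker are hypothetical.

    // Sketch: a goal-annotated event trace is a sequence of entries that are either
    // goal annotations or observed events/data; a scenario check walks the trace
    // against the expected goals in order and applies an oracle to the observed data.
    import java.util.List;
    import java.util.function.Predicate;

    record TraceEntry(boolean isGoal, String label, Object data) {}

    record ExpectedStep(String goal, Predicate<Object> oracle) {}

    final class ScenarioChecker {
        /** Passes only if every expected goal is announced in order and its oracle accepts the observed data. */
        static boolean check(List<ExpectedStep> scenario, List<TraceEntry> trace) {
            int t = 0;
            for (ExpectedStep step : scenario) {
                // advance to the next goal annotation matching the expected goal
                while (t < trace.size()
                        && !(trace.get(t).isGoal() && trace.get(t).label().equals(step.goal()))) t++;
                if (t == trace.size()) return false;        // expected goal never announced
                // the observed event/data for that goal is the next non-goal entry
                int e = t + 1;
                while (e < trace.size() && trace.get(e).isGoal()) e++;
                if (e == trace.size() || !step.oracle().test(trace.get(e).data())) return false;
                t = e;
            }
            return true;                                    // right results, for the right reasons
        }
    }

A check structured this way fails either when an expected sub-goal is never pursued (wrong intentions) or when an observed result does not satisfy its oracle (wrong values), which is the distinction the steps above rely on.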

2 Explaining Our Approach using Three Progressive Examples

We explain our new approach by means of a sequence of three examples, each of which enables us progressively to reveal additional benefits of our basic ideas. The first example studies a method for computing the least common multiple (lcm) of two integers, the second examines an Automated Teller Machine (ATM) for a bank, and the third studies how to test whether a Tic-Tac-Toe game-playing program meets some requirements for playing at an expert skill level.

2.1 Example 1: Finding the lcm

This first example, least common multiple, illustrates how our approach collects goal-annotated event traces and compares them to a requirements scenario augmented with a specification-based test oracle.

Suppose that some developers have been given the task of implementing a pocket calculator that can add fractions on its screen. To do this, it is required to be able to find suitable common denominators. For example, to add 1/6 + 3/4 we could use 12 as a common denominator, and write: 1/6 + 3/4 = 2/12 + 9/12 = 11/12. One way to find a suitable common denominator for c/a + d/b is to use the least common multiple of the denominators a and b, denoted lcm(a, b). Thus, we might imagine that the original top-level system requirements goal "implement a calculator that can add fractions" has led to a derived intermediate requirements goal "find a common denominator for adding two fractions," which in turn has led to a low-level functional requirements goal, "find lcm(a, b) for two positive integers a and b." (We intend this short example of a requirements derivation chain to be emblematic of the way that top-level system requirements can trace down through a goal refinement hierarchy to low-level functional requirements goals that can be operationalized.)

To operationalize the requirements goal "find lcm(a, b)" we first need a suitable abstract plan to use as a precursor for stepwise refinement. In a book on elementary number theory [Eyn87], we discover a theorem that states: lcm(a, b)*gcd(a, b) = a*b, where gcd(a, b) is the greatest common divisor of a and b. We know that gcd(a, b) can be found by Euclid's method. This suggests a plan for finding lcm(a, b), where a and b are positive integers. In Fig. 1 we have denoted goals g by enclosing them in double angle brackets <<g>>. Thus, the plan for achieving the goal <<find lcm(a, b)>> consists of three sequential sub-goals. In fact, these three sub-goals are partially ordered by the prerequisite relationships between them. Namely, we must find the product a*b and the gcd(a, b) before using them to find lcm(a, b), but we can find the product and the gcd in either order. Thus, the producer-consumer relationships between data determine the partial order of the goal prerequisite relationships, and any sequential order for the sub-goals that is consistent with this goal-prerequisite partial order will give a correct instance of a plan for finding lcm(a, b). The fact that ScenarioML [Als05] allows us to express scenarios that are partially ordered sequences makes ScenarioML especially appropriate for expressing the requirements scenario schema for lcm(a, b).

<<find lcm(a, b)>> {
    <<find the product p = a*b>>
    <<find g = gcd(a, b)>>
    <<return p/g as the value of lcm(a, b)>>
}

Fig. 1. Plan for finding lcm(a, b)

We can use stepwise refinement to refine the plan for lcm(a, b) in Fig. 1 into the executable implementation shown in Fig. 2. In performing this stepwise refinement, we retain the top-level goal (find lcm(a, b)) and we retain the sub-goals of the plan that was refined as goal annotations in the refinement.

<<find lcm(a, b)>>
int lcm(int a, int b) {
    int p, g;
    <<find the product p = a*b>>
    p = a * b;
    <<find g = gcd(a, b)>>
    g = gcd(a, b);
    <<return p/g as the value of lcm(a, b)>>
    return p/g;
}

Fig. 2. Goal-annotated implementation for finding lcm(a, b)

To complete our stepwise refinement process, we can use Euclid's method for finding gcd(a, b), as shown in the plan in Fig. 3, and we can refine it into the goal-annotated implementation shown in Fig. 4.

<<find gcd(m, n)>> {
    <<while the second member of the pair (m, n) is not zero>> {
        <<replace the pair (m, n) with (n, m mod n)>>
    }
    <<return the first member of the pair as the gcd>>
}

Fig. 3. Plan for Euclid's method for finding gcd(a, b)

<<find gcd(m, n)>>
int gcd(int m, int n) {
    int r;
    <<while the second member of the pair (m, n) is not zero>>
    while (n != 0) {
        <<replace the pair (m, n) with (n, m mod n)>>
        r = m % n; m = n; n = r;
    }
    <<return the first member of the pair as the gcd>>
    return m;
}

Fig. 4. Goal-annotated implementation for finding gcd(a, b)
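As an illustrative aside, one plausible way (not necessarily the one our tooling uses) to emit the goal annotations of Figs. 2 and 4 at run time is to log each goal just before the statements refined from it, so that traces like the one in Fig. 5 can be collected; in the Java sketch below, Trace is a hypothetical logging facade.

    // Sketch of run-time emission of the goal annotations retained in Figs. 2 and 4.
    final class Lcm {
        static int lcm(int a, int b) {
            Trace.goal("find lcm(" + a + ", " + b + ")");
            Trace.goal("find the product p = a*b");
            int p = a * b;
            Trace.event("p = " + p);
            Trace.goal("find g = gcd(a, b)");
            int g = gcd(a, b);
            Trace.event("g = " + g);
            Trace.goal("return p/g as the value of lcm(a, b)");
            return p / g;
        }

        static int gcd(int m, int n) {
            Trace.goal("find gcd(" + m + ", " + n + ")");
            while (n != 0) {                   // while the second member of the pair is not zero
                Trace.goal("replace the pair (m, n) with (n, m mod n)");
                int r = m % n;
                m = n;
                n = r;
                Trace.event("(m, n) replaced by (" + m + ", " + n + ")");
            }
            Trace.goal("return the first member of the pair as the gcd");
            return m;
        }
    }

    // Hypothetical trace facade: appends goal annotations and observed events to the event trace.
    final class Trace {
        static void goal(String g)  { System.out.println("goal: " + g); }
        static void event(String e) { System.out.println("* " + e); }
    }

Calling Lcm.lcm(6, 4) under such instrumentation would produce a trace with the same goals and events as the outline in Fig. 5.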

Suppose that we execute the goal-annotated implementation for finding lcm(a, b) given in Fig. 2 by calling it with specific inputs, such as those used to evaluate lcm(6, 4). This leads to the goal-annotated event trace shown in Fig. 5, where numbered lines in the outline are goals and a line beginning with * indicates an observed event or data collected in the event trace.

top-level goal: find lcm(6, 4)
1. find lcm of 6 and 4
  1.1 find the product p of 6 and 4
    * p = 24
  1.2 find g, the gcd of 6 and 4
    1.2.1 second member of pair (6, 4) is not zero
      * 4 ≠ 0
    1.2.2 replace the pair (m, n) with (n, m mod n)
      * (m, n) replaced by (4, 2)
    1.2.3 second member of pair (4, 2) is not zero
      * 2 ≠ 0
    1.2.4 replace the pair (m, n) with (n, m mod n)
      * (m, n) replaced by (2, 0)
    1.2.5 second member of pair (2, 0) is zero
      * 0 = 0
    1.2.6 return first member of pair (2, 0) as the gcd
      * 2 returned as value of gcd(6, 4)
    1.2.7 assign value 2 = gcd(6, 4) to be value of g
      * g = 2
  1.3 return p / g as the lcm of 6 and 4
    * return 24 / 2
    * 12 returned as the value of lcm(6, 4)

Fig. 5. Goal-annotated event trace for finding lcm(6, 4)

We can now compare the event trace of Fig. 5 against a requirements scenario S, shown in Fig. 6, which is derived from the plan in Fig. 1. In general, checking only the events in an event trace (such as that of Fig. 5) against the expected events in a requirements scenario S (such as that of Fig. 6) can tell you whether the plan is being followed, but it cannot tell you whether the plan satisfies the requirements. For that, you need additional information that comes from running a test oracle [Ric94] embedded in the scenario, which checks whether the result of executing the plan satisfies the (precise) result specifications of the scenario's requirements goal. (Indeed, that is why we say that our scenario-driven testing method is "specification-based.") We assume the existence of an operationalized version of the test oracle for the lcm property. For instance, to test that result r satisfies the lcm property for lcm(a, b), we could first test whether r is a common multiple of a and b (true iff a|r ∧ b|r) and then test multiples of a of increasing size to see whether any of them are both divisible by b and smaller than r, stopping when we reach a multiple of a equal to r.

When using the requirements scenario in Fig. 6 to test the event trace of Fig. 5, we have a choice: we can check the attainment of each of S's three sub-goals using either oracles or pre-stored expected results, or we can choose to trust the components that implement these sub-goals. We may choose to trust them because, say, we believe: (i) that the product p of a and b is reliably computed by the hardware, (ii) that gcd(a, b) has been both verified and unit tested for correctness, and (iii) that the theorem that claims lcm(a, b) = a*b / gcd(a, b) is true (because it comes from a trusted source such as a number theory book [Eyn87]). In this case, we may choose to ignore the nested trace of details in Fig. 5 that compute gcd(6, 4), and to use an oracle only to test whether the final result lcm(6, 4) = 12 is correct. The point is that when doing integration testing [Jor94] of a plan that uses already tested or verified components, we can choose to trust but not test the components, or we can choose (selectively) to test the operation of the components in the context of the integration test. Similarly, when choosing what to retest during regression testing [Rot96], we can avoid the cost of retesting trusted components that have not changed.

Fig. 6. Requirements Scenario S for finding lcm(a, b), written in ScenarioML
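As a minimal sketch of the lcm oracle described above (the class and method names are hypothetical), the lcm property can be checked by confirming that r is a common multiple of a and b and that no smaller positive multiple of a is also divisible by b.

    // Sketch of the lcm test oracle: r must be a common multiple of a and b,
    // and no smaller positive multiple of a may also be divisible by b.
    final class LcmOracle {
        static boolean satisfiesLcm(int a, int b, int r) {
            if (a <= 0 || b <= 0) return false;          // property defined for positive integers
            if (r % a != 0 || r % b != 0) return false;  // r is not a common multiple
            for (int m = a; m < r; m += a) {             // any smaller multiple of a divisible by b?
                if (m % b == 0) return false;            // then r is not the *least* common multiple
            }
            return true;
        }
    }

Such an oracle would accept lcm(6, 4) = 12 and reject 24, since 12 is a smaller common multiple of 6 and 4.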

2.2 Example 2: Withdrawing cash from an ATM

Here we use an example Automated Teller Machine to illustrate the use of plan automata for achieving good requirements-based test coverage. Banks operate ATMs that automate common banking services primarily to increase the bank's business success. Because automation substitutes capital for labor, it increases productivity (by saving the cost of a human bank teller providing service to a customer who visits a bank in person). Moreover, ATMs can increase a bank's market share by increasing customer satisfaction, because ATMs provide access to banking services in more locations and during more hours of the day in a manner that is convenient, safe, secure, and easy to use. Fig. 7 shows a portion of a bank's requirements goal hierarchy, indicating how these high-level bank stakeholder goals begin to refine into goals that determine the required properties for ATMs.

Space restrictions prevent us from showing in detail how the top-level bank stakeholder goals in Fig. 7 ultimately refine into low-level functional requirements goals that can be operationalized in an ATM system implementation, but the reader can likely imagine that the top-level requirement to make ATMs safe and secure refines into requirements sub-goals to grant access only to users who pass authentication checks involving ATM cards and PINs (personal identification numbers), and to deny access to ATM services to users who can’t be authenticated.

Fig. 7. Part of a Bank’s Requirements Goal Hierarchy

By examining the goal-annotated event trace of using an ATM to withdraw cash, given in Fig. 8, the reader can further imagine what the ATM requirements scenario might include. For example, the top-level ATM requirements scenario includes sub-goals: (i) to authenticate the user, (ii) to provide each of four ATM services, and (iii) to terminate an ATM session in a secure manner and enter the ready state for a new ATM session. Here, the sub-goal for withdrawing cash corresponds to a separate abstract plan for dispensing cash provided the amount requested by the user is within limits prescribed by the bank's business rules (see sub-goals 2.2.2 and 2.2.3 in Fig. 8). This abstract cash-withdrawal plan can be stepwise refined into its own goal-annotated procedure implementation.

To define requirements-based test coverage metrics [Cla89], it is useful to model the ATM requirements scenario as a plan automaton, such as that shown in Fig. 9. A plan automaton is a transition diagram whose edges are labelled with goals; each transition in the diagram represents actions satisfying that edge's goal. In a plan automaton, when a state m has more than one successor state n1, ..., nk, the state transitions from m to each of the ni are labeled with predicates that are mutually exclusive and exhaustive. Thus, plan automata model sequential and branching control flow among sub-goals in a plan, and do so deterministically, because one and only one predicate can be true among the predicates for the multiple successor states. When a sub-goal g labels a state transition in plan automaton p, the intention is that p can call the plan automaton for g. A plan automaton provides a goal/requirements-based test coverage metric: to adequately test the requirements goals, each transition in the plan automaton must be covered by at least one test case.

Top-level Goal: Conduct an ATM User Session
(note 1: numbered items in the outline below are ATM requirements goals)
(note 2: * indicates an observed event collected in the event trace)
1. Authenticate User
  1.1 Get user to insert ATM card
    * ATM card inserted
  1.2 Check ATM card is valid
    * ATM card determined to be valid
  1.3 Get user to enter PIN
    * user enters PIN and presses "Enter" key
  1.4 Check that entered PIN is valid
    * entered PIN matches PIN on ATM card
    * Authentication successful
2. Grant User Access to ATM Services
  2.1 Get user to choose a transaction type: (1: withdraw cash, 2: make deposit, 3: get balance, 4: transfer money between accounts, or 5: terminate session)
    * user chooses (1) to withdraw cash
  2.2 Allow User to Withdraw Cash
    2.2.1 Get user to choose account to withdraw from (1: checking, 2: savings)
      * user chooses (1) to withdraw from checking
    2.2.2 Get user to enter amount, A, to withdraw
      * user enters A = $180
    2.2.3 Check that A is within limits (1: enough in user's account, 2: within daily withdrawal limit, 3: enough cash in machine, 4: even multiple of twenty dollars)
      * user's requested amount is within limits (1-4)
    2.2.4 Dispense requested cash
      * $180 in cash dispensed
    2.2.5 Deduct dispensed amount A from: (1: cash available in machine, 2: user's account, 3: user's remaining daily withdrawal limit)
      * deduction transactions (1-3) accomplished
    2.2.6 Ask if user wants a printed receipt (1: yes, 2: no)
      * user does not request printed receipt (2)
      * Cash withdrawal transaction successful
  2.3 Get user to choose a transaction type: (1: withdraw cash, 2: make deposit, 3: get balance, 4: transfer money between accounts, or 5: terminate session)
    * user chooses (5) to terminate session
3. Terminate ATM User Session
  3.1 Return user's ATM card
    3.1.1 Prompt user to expect ATM card ejection
      * ATM card ejected half-way
    3.1.2 Prompt user to withdraw ATM card completely
      * sensor indicates ATM card has been completely withdrawn
  3.2 Clear user account information from screen
    * screen cleared
  3.3 Enter ready state for new ATM session
    * ready state entered
    * ATM user session ended successfully

Fig. 8. Goal-annotated event trace for withdrawing cash from an ATM

[Fig. 9 is a state-transition diagram that cannot be reproduced faithfully in running text. Its states are numbered 0 through 10, and its edges carry the labels "authenticate user", "ATM card or PIN invalid", "ATM card and PIN valid", "deny access to ATM services", "c ← get user's choice of ATM service", the guards c=1 through c=5 on the user's choice, and the sub-goals "withdraw cash", "make deposit", "get balance", "transfer money between accounts", and "terminate ATM session".]

Fig. 9. Plan automaton for ATM requirements scenario
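As an illustrative sketch (the representation and names are hypothetical, not our tool's), a plan automaton such as Fig. 9's can be recorded as a set of labeled transitions, and the transition-coverage metric just defined can be computed over the transitions exercised by a set of event traces.

    // Sketch: a plan automaton as a set of labeled transitions, plus the coverage
    // metric described above -- every transition must be exercised by at least one trace.
    import java.util.List;
    import java.util.Set;
    import java.util.HashSet;

    record Transition(int from, int to, String label) {}

    final class PlanAutomatonCoverage {
        /** Fraction of the automaton's transitions covered by the given traces
            (each trace is the list of transitions it took). */
        static double transitionCoverage(Set<Transition> automaton, List<List<Transition>> traces) {
            Set<Transition> covered = new HashSet<>();
            for (List<Transition> trace : traces) covered.addAll(trace);
            covered.retainAll(automaton);            // ignore anything not in the automaton
            return (double) covered.size() / automaton.size();
        }
    }

With the 15 transitions of the Fig. 9 automaton and the single trace of Fig. 8, which exercises 7 of them, this measure yields roughly 47 percent; restricting the transition set to the 8 sub-goal-labeled edges gives the component-call metric discussed below.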

The event trace in Fig. 8 results from a test that does not cover all of the state transitions in the plan automaton of Fig. 9. For example, the transitions from states 1 to 8 and 8 to 9 are not covered (because that path is followed only by a user who fails to authenticate and is denied ATM services), and it also misses the state transitions from state 3 to states 5, 6, and 7 and then back to state 2 (because these correspond to ATM services not exercised by the trace in Fig. 8). The event trace in Fig. 8 covers only 7 of the 15 state transitions in the automaton of Fig. 9, amounting to only 47 percent coverage. To achieve full coverage, other event traces are needed that represent successful use of the other available ATM services, and that illustrate that failure to authenticate results in being denied access to ATM services. One could also use a test coverage metric that counts only edges labeled with sub-goals (representing the subroutine calls made by the top-level ATM plan automaton to the respective plan automata for the sub-goals). This coverage metric would measure the percentage of different ATM components exercised by a set of event traces. For example, the event trace in Fig. 8 exercises only 4 of the 8 sub-goals used as state-transition labels in Fig. 9, amounting to 50 percent of the available calls to component sub-goals in the ATM top-level plan. We conclude that plan automata can provide a useful basis for defining test coverage metrics.

Matching goal-annotated ATM event traces to the ATM requirements scenario turns out not to involve any mysterious, hard-to-verify oracles. Instead, it is rather straightforward, as one might suspect from the simple nature of the bank's ATM business rules encountered in Fig. 8. However, when we have a test suite consisting of a set of event traces that collectively demonstrate good requirements-goal coverage (according to the coverage metric just discussed), we are in a position to validate, with high confidence and thoroughness of coverage, that the bank's higher-level ATM requirements have been satisfied, because all of the supporting low-level instrumental goals in the requirements goal hierarchy have been met and, in fact, the system does "the right thing for the right reasons."

2.3 Example 3: Playing Expert-Level Tic-Tac-Toe

Tic-Tac-Toe (also known as noughts and crosses) is a simple children's game commonly played in many countries throughout the world. We assume (hopefully not erroneously) that the reader is familiar with Tic-Tac-Toe. Our Tic-Tac-Toe example is designed to shed light on the benefits of including goal annotations in our event traces. To this end, suppose some software developers have been assigned the task of developing a Tic-Tac-Toe game-playing program whose requirements state, among other things, that "it should be able to play at three skill levels: novice, intermediate, and expert." Suppose these developers have given us an implementation of such a Tic-Tac-Toe playing program that, they claim, succeeds in playing at the three required skill levels. We play a game of Tic-Tac-Toe in which the machine starts first and plays X, and in which we make our replying moves with O. Suppose further that the machine has concealed the level of skill at which it is currently playing, and that it generates an event trace of this game that is not annotated with goals. Both the game and the event trace are shown in Fig. 10.

New game began at 2:21:46 PM
Machine plays first with X
* Machine moved with X in upper left corner (1,1)
* You moved with O in lower right corner (3,3)
* Machine moved with X in lower left corner (3,1)
* You moved with O on left side (2,1)
* Machine moved with X in upper right corner (1,3)
* You moved with O in center (2,2)
* Machine moved with X on top side (1,2)
Result: X won the game
Transcript ended at 2:22:16 PM

Fig. 10. An Event-Trace Scenario of Tic-Tac-Toe Play — Machine Plays X

We now ask the question, “Was the machine playing at the expert level?” Indeed, it is entirely possible that the machine was playing flawlessly at the expert level. An argument that could be used to convince us that this was the case is given in the form of a game analysis in Fig. 11 that appears to have been written by a person who has substantial knowledge about how to play Tic-Tac-Toe.

1. X starts in the upper left corner
2. O's reply in the lower right corner is fatal because it's not in the center
3. X's forcing move in the lower left corner will lead to a future fork and a win
4. O is forced to move on the left side to block X's immediate win in the left column
5. X now moves in the upper right corner to create a fork that is guaranteed to win
6. O can block a win on only one branch of the fork, and chooses to move in the center
7. X can always complete the win on the other branch of the fork, and does so

Fig. 11. A Game Analysis of Expert-Level Play by X

On the other hand, we can’t really tell whether the game described in the event trace in Fig. 10 is unassailable evidence of expert-level play. In fact, suppose this game was actually played by a novice at the novice skill level specified in Fig. 12. L1.

A novice can complete an immediate win and block to prevent an immediate loss. But that’s all.

L2.

An intermediate can create and block forks and can do everything a novice can do. But that’s all.

L3.

An expert (i) never loses (i.e., always wins or draws), (ii) can recognize fatal second moves by an opponent and afterwards can force a win, (iii) always plays safe second moves with variety, and (iv) can do everything an intermediate can do without making any fatal moves.

Fig. 12. Specifications Defining Three Skill Levels

Now, suppose that goal annotations were turned on for event trace collection when this novice played the Tic-Tac-Toe game of Fig. 10, and suppose that the goal-annotated event trace of Fig. 13 was collected.

In fact, the event trace of Fig. 13 is entirely consistent with the hypothesis that the novice's first three random moves fortuitously created a fork, and that on its fourth move the novice spotted an opportunity for an immediate win on the top row and took advantage of it (because recognizing and making immediate winning moves is within the range of a novice's defined capabilities according to Fig. 12).

New game began at 2:21:46 PM
Machine plays first with X at Novice level
Goal: to play with variety, make a random first move
* Machine moved with X in upper left corner (1,1)
* You moved with O in lower right corner (3,3)
Goal: make a random move
* Machine moved with X in lower left corner (3,1)
* You moved with O on left side (2,1)
Goal: make a random move
* Machine moved with X in upper right corner (1,3)
* You moved with O in center (2,2)
Goal: make a winning move
* Machine moved with X on top side (1,2)
Result: X won the game
Transcript ended at 2:22:16 PM

Fig. 13. A Goal-Annotated Event-Trace of Novice-Level Play

Now suppose that goal annotation had been turned on for event trace collection and that the level of play had been set to Expert when the game of Fig. 10 was played, resulting in the goal-annotated event trace of Fig. 14. We can see from the goal annotations in Fig. 14 that there is evidence that the expert recognized O's fatal second move and, in response, took immediate action to make a forcing move leading to the creation of a fork, and ultimately to a win (because it announces its intention to do just that as its second goal in Fig. 14).

New game began at 2:21:46 PM
Machine plays first with X at Expert level
Goal: to play with variety, make a random first move
* Machine moved with X in upper left corner (1,1)
* You moved with O in lower right corner (3,3)
Goal: make forcing move that will lead to a fork and a win
* Machine moved with X in lower left corner (3,1)
* You moved with O on left side (2,1)
Goal: create a fork
* Machine moved with X in upper right corner (1,3)
* You moved with O in center (2,2)
Goal: make winning move
* Machine moved with X on top side (1,2)
Result: X won the game
Transcript ended at 2:22:16 PM

Fig. 14. A Goal-Annotated Event-Trace of Expert-Level Play

Thus, we conclude: (1) that the raw events in an event trace without goal annotations cannot alone be used to deduce the machine's skill level unambiguously; (2) that goal annotations let us identify false positives (runs that fortuitously got the right results through accidentally correct behavior of methods that would not normally produce such results); and (3) that the machine's intentions, as revealed by the goal annotations in the event trace, allow us to identify the skill level at which the machine was playing.
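One simple way to operationalize conclusion (3) is to check that every goal announced in a trace lies within the repertoire that Fig. 12 allows for the claimed skill level. The Java sketch below is illustrative only; its goal vocabularies are assumptions rather than the program's actual goal strings, and the check is a necessary rather than a sufficient condition for playing at the claimed level.

    // Sketch: the announced goals in a trace must all belong to the claimed level's repertoire.
    import java.util.List;
    import java.util.Set;

    final class SkillLevelCheck {
        static final Set<String> NOVICE =
            Set.of("make a random move", "make a winning move", "block an immediate loss");
        static final Set<String> INTERMEDIATE =
            union(NOVICE, Set.of("create a fork", "block a fork"));
        static final Set<String> EXPERT =
            union(INTERMEDIATE, Set.of("make forcing move that will lead to a fork and a win",
                                       "make a safe second move"));

        /** False if any announced goal lies outside the claimed level's repertoire. */
        static boolean consistentWithLevel(List<String> announcedGoals, Set<String> repertoire) {
            return repertoire.containsAll(announcedGoals);
        }

        private static Set<String> union(Set<String> a, Set<String> b) {
            var u = new java.util.HashSet<String>(a);
            u.addAll(b);
            return Set.copyOf(u);
        }
    }

Under this kind of check, the trace of Fig. 14 is inconsistent with a claim of novice-level play, because "make forcing move that will lead to a fork and a win" is outside the novice repertoire.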

Strategies Implemented for the Three Skill Levels. Fig. 15 sketches how some developers might have implemented strategies that, they allege, yield valid novice, intermediate, and expert-level play at the three required skill levels. A challenge for testing is to determine whether playing with these strategies at the three levels correctly fulfills their specifications (defined in Fig. 12) and validly meets the original top-level requirement to play at the novice, intermediate, and expert skill levels. In particular, it is challenging to test whether the expert-level strategy of Fig. 15 correctly meets the specification that "experts never lose," given in Fig. 12.

novice:
  make immediate win if you can; block opponent's immediate win if you must;
  else choose any available move with systematic variety.

intermediate:
  win if you can; block win if you must; fork if you can; block fork if you must;
  else choose any available move with systematic variety.

expert:
  case on move number (m)
    (m=1): choose with systematic variety any first move
    (m=2): choose with systematic variety any safe second move
    (m=3): if opponent made fatal second move then choose forcing move that leads to
           opportunity to make a fork and to win after forking;
           else choose with systematic variety any safe third move
    (m=4): block opponent's immediate win if you must; else prevent opponent from creating
           a fork, either by blocking the fork or by making a forcing move that prevents a fork.
           If no prevent-fork is required, use the augmented intermediate-level game strategy
           given for moves (m>4).
    (m>4): {augmented intermediate-level strategy}
           win if you can; block win if you must; fork if you can; block fork if you must;
           else make best scoring move, where move score is # lines newly occupied + 1 for
           moving in center
  end case

Fig. 15. Implemented Strategies for Tic-Tac-Toe Skill Levels
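To suggest how the Fig. 15 strategies might be operationalized, the following sketch implements only the novice entry, under an assumed nine-cell board encoding; the board representation, the variety parameter, and all names are hypothetical illustrations rather than the developers' actual code.

    // Sketch of the novice strategy of Fig. 15: make an immediate win if possible,
    // otherwise block the opponent's immediate win, otherwise take any available move;
    // "systematic variety" is approximated by rotating through free cells from a varying offset.
    final class NoviceStrategy {
        static final int[][] LINES = {
            {0,1,2},{3,4,5},{6,7,8},   // rows
            {0,3,6},{1,4,7},{2,5,8},   // columns
            {0,4,8},{2,4,6}            // diagonals
        };

        /** board: 9 cells holding 'X', 'O', or ' '; returns the chosen cell index, or -1 if the board is full. */
        static int chooseMove(char[] board, char me, char opponent, int variety) {
            int win = completingMove(board, me);
            if (win >= 0) return win;                     // make immediate win if you can
            int block = completingMove(board, opponent);
            if (block >= 0) return block;                 // block opponent's immediate win if you must
            for (int k = 0; k < 9; k++) {                 // else any available move, with variety
                int cell = (variety + k) % 9;
                if (board[cell] == ' ') return cell;
            }
            return -1;
        }

        /** Cell that would complete a line for player p, or -1 if none exists. */
        static int completingMove(char[] board, char p) {
            for (int[] line : LINES) {
                int mine = 0, emptyCell = -1;
                for (int cell : line) {
                    if (board[cell] == p) mine++;
                    else if (board[cell] == ' ') emptyCell = cell;
                }
                if (mine == 2 && emptyCell >= 0) return emptyCell;
            }
            return -1;
        }
    }

The intermediate and expert entries of Fig. 15 would layer fork creation, fork blocking, and safe/fatal second-move knowledge on top of the same primitives.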

Challenges in Domain Analysis. One of the startling facts about V&V is that domain analysis errors account for a greater share of errors uncovered during V&V than errors in any other category [Rey99]. These are not errors in the specification per se, or in the implementation of the specification, but rather mismatches between the assumptions that a specification, design, or implementation makes about the domain in which the system is used, and the actual properties of that domain. Neither the least common multiple example nor the ATM example involved difficult domain analyses, primarily because the domains involved (number theory and banking business rules) have been evolving for centuries and are presently both precisely understood and widely known. By contrast, concepts used in the domain analysis of Tic-Tac-Toe game playing programs are less commonly known, relatively unevolved, and likely to be error prone — traits they may well share with domain analyses for large new software systems, such as the FBI’s Trilogy system [NRC04] that is currently late, over-budget, and experiencing uncertainty that it can eventually meet its system requirements. The reader has probably noticed that, heretofore, we have not given precise definitions for technical words used in Figs. 11-15 to define specifications, event traces, and the implemented strategies for playing at the three required Tic-Tac-Toe skill levels. Examples of such undefined technical words and phrases are immediate win, fork, forcing move, fatal move, blocking a fork, safe move, blocking an immediate loss, etc.

Without understanding the precise meaning of these concepts, we cannot correctly implement the oracles contained in our requirements scenarios that use executable tests to verify whether goals have been correctly achieved according to the precise specifications of their meanings. For example, an oracle to test whether or not a given 5th move in a Tic-Tac-Toe game creates a fork is essential for testing the correctness of the expert-level strategy for achieving a sure win after the opponent has made a fatal second move, as shown in Figs. 11 and 14. That’s because the expert-level strategy for winning, in this case, is to make a forcing move that leads to the creation of a fork that guarantees a win. In particular, the expert-level strategy given in Fig. 15 calls for experts to know which second moves are safe and which are fatal. (Definitions: A fatal move is one for which the opponent can initiate a sequence of forcing moves leading to a loss for the player who made the fatal move. A safe move is one that is not fatal.) Fig. 16 shows the safe second moves that experts know about. Also, experts know how to initiate forcing moves leading to a win after an opponent makes a fatal second move.

Fig. 16. Experts Know About Safe and Fatal Second Moves
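A fork oracle of the kind called for above could be sketched as follows, under the usual understanding of a fork as a move that creates two or more simultaneous immediate winning threats; the board encoding and names are assumptions for illustration, not our implemented oracle.

    // Sketch of a fork oracle: a move by player p creates a fork if, after the move,
    // p threatens to complete two or more distinct lines on the very next move.
    final class ForkOracle {
        static final int[][] LINES = {
            {0,1,2},{3,4,5},{6,7,8},   // rows
            {0,3,6},{1,4,7},{2,5,8},   // columns
            {0,4,8},{2,4,6}            // diagonals
        };

        /** board: 9 cells holding 'X', 'O', or ' '; move: index 0-8 of the cell p just claimed. */
        static boolean createsFork(char[] board, int move, char p) {
            char[] after = board.clone();
            after[move] = p;
            int threats = 0;
            for (int[] line : LINES) {
                int mine = 0, empty = 0;
                for (int cell : line) {
                    if (after[cell] == p) mine++;
                    else if (after[cell] == ' ') empty++;
                }
                if (mine == 2 && empty == 1) threats++;   // p can win on this line next move
            }
            return threats >= 2;                          // two simultaneous threats = a fork
        }
    }

Applied to the game of Fig. 10, this check accepts X's third move in the upper right corner as fork-creating, which is exactly what the expert-level goal annotation in Fig. 14 claims.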

As an example of a problem in domain-analysis error detection, it is quite possible that the characterization of safe and fatal second moves given in Fig. 16 is incorrect and contains errors. If this is the case, we expect that such errors will be revealed when we analyze test cases consisting of event traces of expert-level Tic-Tac-Toe games. If the expert-level strategy uses the properties of safe and fatal second moves given in Fig. 16 to make its moves, and, say, one or more of the moves that Fig. 16 characterizes as safe is in reality fatal, then we would expect to find event traces of games in which the expert thought it was making safe second moves that turned out to be fatal and resulted in actual losses. By this means, our V&V method, based on testing goal-annotated event traces, should be able to identify the domain-analysis errors involved. Thus, we believe that an important additional benefit of our V&V method is an improved ability to detect domain-analysis errors.
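As a minimal sketch of how such domain-analysis errors might surface automatically (the types and names are hypothetical), one can scan a suite of goal-annotated game traces for outcomes that the Fig. 12 specification rules out.

    // Sketch: flag possible domain-analysis errors by finding expert-level traces that ended
    // in a loss, an outcome Fig. 12 says cannot occur if the safe/fatal-move analysis is right.
    import java.util.List;

    record GameTrace(String skillLevel, String result, List<String> announcedGoals) {}

    final class DomainAnalysisCheck {
        /** Expert-level traces that ended in a loss; each points at either an implementation
            fault or an error in the safe/fatal second-move characterization of Fig. 16. */
        static List<GameTrace> suspectTraces(List<GameTrace> traces) {
            return traces.stream()
                         .filter(t -> t.skillLevel().equals("expert") && t.result().equals("loss"))
                         .toList();
        }
    }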

3 Related Work

Our V&V approach compares actual system behavior, in the form of captured goal-annotated event traces, with expected behavior expressed by requirements scenarios that are tied to system requirements goals.

Specification-based testing has been an area of research for a considerable period [Ric81], and is actively investigated at present for a variety of specification forms [for example Bri04, Khu04]. Our work is distinct from others in this area in that our testing is driven by requirements scenarios, which are of immediate interest to stakeholders, connected back to stakeholder goals, and augmented using goals. While the specifications in question could be expressed using ordinary assertions, design by contract, Hoare logic, or other forms, our use of goals and scenarios provides a direct link to specifications that are important to, and understandable by, a wide range of stakeholders.

Work on relationships between goals and refinement of goals into operationalizable requirements has been carried out for at least a decade, including goals refined into both functional requirements (e.g., use cases [Coc97]) and non-functional ones. Of particular significance is the work on goal refinement by van Lamsweerde et al. [Lam98, Let02], Mylopoulos, Chung, Yu, et al. [Chu00], and Rolland et al. [Rol01, Sal03], specifically the work on mapping goals to requirements scenarios.

When we execute goal-annotated components, we collect event traces of their running behavior containing events, data results, and goal annotations of interest. The Perpetual Testing project [Ric02], residual testing [Nas04, Pav99], Software Tomography [Bow02], Gamma Technology [Ors02], and Expectation-Driven Event Monitoring (EDEM) [Hil02] are research projects that address this problem as well. However, these projects generally do not rely on goals, and specifically do not use goal annotations in event traces to indicate what sub-goals the component was working on during execution.

Work on goal-based planning and refinement of abstract plans has been carried out in Artificial Intelligence (AI) for several decades, including several applications of AI techniques to software engineering challenges. In the area of Requirements Engineering, Rolland et al. use maps to relate goals and strategies for achieving them [Rol01, Sal03]. In [Mem99], an AI planner is used to find sequences of operators that map an initial state into a solution state, and such operator sequences are then used as test cases in a test suite for testing a GUI. Our paper uses abstract plans, in the form of flowcharts with sub-goals in them, to derive implementations by stepwise refinement that retain the sub-goals used in the abstract plans as goal annotations. These implementations are, of course, instances of the abstract plans as well. Thus, the plans in our approach are derived by software developers rather than by an AI problem-solver trying to mimic the purposeful actions of GUI users. We use the event-trace capture approach to collect traces of actual system use by users, while annotating the captured event trace with the goals the system was working on during the user interaction. These goal-annotated event traces (of actual behavior) are compared with scenario schemata that express expected behavior. If the two match, the test case passes; if they do not match, the test case fails. Thus, the scenario schemata in our approach contain enough goal information to indicate whether the actual event trace was following the expected plan. Moreover, they contain an adequate test oracle to check both final and intermediate results, thereby determining not only whether correct results were achieved but also whether they were achieved by following the correct plan.

4 Conclusions and Plans for Future Work

We believe our novel goal/requirements-based V&V method offers the following potential benefits:

(1) higher-yield testing: driving specification-based testing from scenarios and goals produces tests with higher yields, in terms of the number of significant errors detected per number of tests executed, so that less testing is required;

(2) distinguishing false positives, by checking goals and intentions as well as values;

(3) defining test coverage metrics in terms of stakeholder goals and requirements: coverage of the plan automata and goal hierarchy provides one useful basis for defining test coverage metrics, and our future work includes additional approaches for linking tests to stakeholder confidence;

(4) detecting domain-analysis errors: comparing goal-annotated event traces to requirements scenarios augmented by oracles based on specifications derived from domain analyses can distinguish implementation faults from domain-analysis errors (either in the specifications themselves, or in the methods that fail to achieve results expected by the domain analyses, or both);

(5) leverage for validating top-level requirements: provided that the goal refinement hierarchy is structured so that goals are always satisfied whenever their instrumental sub-goals are satisfied, we can propagate the satisfaction of low-level requirements goals up the goal refinement hierarchy to check that top-level requirements are being met by actual system behavior; and

(6) efficient control of the degree of retesting: our method is structured to allow selective component retesting during integration testing and regression testing, so we can avoid retesting components that have not changed.

Goals, their interrelationships, and their relation to and distinction from other requirements entities are central to our approach. Our future work includes the possible use of i* [Myl97, Yu97] or the maps of Rolland et al. [Rol01, Sal03] in our approach.

Even though we believe our V&V approach has these six potential benefits, we still need to demonstrate convincingly that the benefits are real, and not merely potential. To confirm our belief and demonstrate how effective our approach is, we must gather quantitative data from experiments comparing our new V&V approach to traditional approaches [Cla89]. We plan to identify suitable medium- and large-scale systems to test, providing case studies that compare our approach to traditional methods.

5 Acknowledgements

The authors thank the anonymous referees for their valuable comments.

References

[Als05] Thomas A. Alspaugh. Temporally Expressive Scenarios in ScenarioML. Technical Report UCI-ISR-05-06, Institute for Software Research, University of California, Irvine, May 2005.
[Boe81] B. W. Boehm. Software Engineering Economics. Prentice-Hall, 1981.
[Bow02] J. Bowring, A. Orso, and M. J. Harrold. Monitoring Deployed Software using Software Tomography. In PASTE'02, Charleston, SC, USA, Nov. 2002.
[Bri04] L. C. Briand, Y. Labiche, and Y. Wang. Using Simulation to Empirically Investigate Test Coverage Criteria Based on Statechart. In 26th Intl Conf on Software Engg, pp. 86-95, 2004.
[Chu00] L. Chung, B. Nixon, E. Yu, and J. Mylopoulos. Non-functional Requirements in Software Engineering. Kluwer Academic, Boston, 2000.
[Cla89] L. Clarke, A. Podgurski, D. J. Richardson, and S. Zeil. A Formal Evaluation of Data Flow Path Selection Criteria. IEEE Trans. on Software Engineering, 15(11): pp. 1318-1332, 1989.
[Coc97] A. Cockburn. Structuring Use Cases with Goals. http://alistair.cockburn.us/crystal/articles/sucwg/structuringucswithgoals.htm (1997).
[Dav90] A. M. Davis. Software Requirements: Analysis and Specification. Prentice-Hall, 1990.
[Dav00] Alan M. Davis and Ann S. Zweig. The missing piece of software development. Journal of Systems and Software, 53(3): pp. 205-206, Sept. 2000.
[Eyn87] Charles Vanden Eynden. Elementary Number Theory. McGraw-Hill, Inc., 1987.
[Hil02] D. Hilbert. Expectation-driven event monitoring (EDEM). http://www.ics.uci.edu/~dhilbert/edem/ (2002).
[Jor94] P. C. Jorgensen and C. Erickson. Object-oriented integration testing. CACM, 37(9): pp. 30-38, Sept. 1994.
[Khu04] Sarfraz Khurshid and Darko Marinov. TestEra: Specification-Based Testing of Java Programs Using SAT. Automated Software Engg., 11(4): pp. 403-434, 2004.
[Lam98] A. van Lamsweerde and L. Willemet. Inferring Declarative Requirements Specifications from Operational Scenarios. IEEE Trans. on Software Engr., pp. 1089-1114, Dec. 1998.
[Let02] E. Letier and A. van Lamsweerde. Deriving Operational Software Specifications from System Goals. In 10th Symp. on Foundations of Software Engg (FSE02), Nov. 2002.
[Mem99] A. M. Memon, M. E. Pollack, and M. L. Soffa. Using a Goal-driven Approach to Generate Test Cases for GUIs. In 21st Int'l Conf. on Software Eng'g (ICSE99), pp. 257-266, 1999.
[Myl97] John Mylopoulos, Alex Borgida, and Eric Yu. Representing Software Engineering Knowledge. Automated Software Engg., 4(3): pp. 291-317, 1997.
[Nas04] L. Naslavsky, R. S. Silva Filho, C. R. B. De Souza, M. Dias, D. Richardson, and D. F. Redmiles. Distributed Expectation-Driven Residual Testing. In 2nd Int. Workshop on Remote Analysis and Measurement of Software Systems (RAMSS'04), Edinburgh, UK, May 2004.
[NRC04] National Research Council. A Review of the FBI's Trilogy Information Technology Modernization Program. National Academies Press, Washington, D.C., 2004.
[Ors02] D. Orso, M. Liang, M. J. Harrold, and R. Lipton. Gamma system: Continuous evolution of software after deployment. In Int. Symp. on Software Testing and Analysis (ISSTA), 2002.
[Pav99] C. Pavlopoulou and M. Young. Residual Test Coverage Monitoring. In 21st International Conference on Software Engineering (ICSE'99), May 1999.
[Rey99] A. A. Reyes and D. J. Richardson. Siddhartha: A Technique for Building Domain-Specific Test Synthesizers. In 14th Intl Conf on Automated Software Engg (ASE'99), 1999.
[Ric81] Debra J. Richardson and Lori A. Clarke. A partition analysis method to increase program reliability. In 5th Intl Conf on Software Engg (ICSE'81), pp. 244-253, 1981.
[Ric94] Debra J. Richardson. TAOS: Testing with Analysis and Oracle Support. In Proc. of the International Symposium on Software Testing and Analysis (ISSTA94), Aug. 1994.
[Ric02] D. J. Richardson. Perpetual Testing. http://www.ics.uci.edu/~djr/edcs/ (2002).
[Rol01] Colette Rolland and Naveen Prakash. Matching ERP System Functionality to Customer Requirements. In 5th Int'l Symp. on Requirements Eng'g (RE'01), pp. 66-75, 2001.
[Rot96] Gregg Rothermel and Mary Jean Harrold. Analyzing Regression Test Selection Techniques. IEEE Trans. Softw. Engr., 22(8): pp. 529-551, Aug. 1996.
[Sal03] Camille Salinesi and Colette Rolland. Fitting Business Models to System Functionality Exploring the Fitness Relationship. In CAiSE'03, pp. 647-664, 2003.
[Sta94] The Standish Group. CHAOS Report. http://www.standishgroup.com/sample_research/chaos_1994_1.php (1994).
[Yu97] Eric S. K. Yu. Towards Modeling and Reasoning Support for Early-Phase Requirements Engineering. In RE'97: 3rd Intl Symp on Reqs Engg (RE'97), pp. 226, 1997.