A Dependability-Focused Requirements Engineering Process

Sadaf Mustafiz and Jörg Kienzle
School of Computer Science, McGill University, Montreal, Canada
[email protected], [email protected]

Systems are developed to satisfy a set of requirements that meet a need. A requirement that is important in mission- and safety-critical systems is that they be highly dependable. Dependability [4] is that property of a computer system such that reliance can justifiably be placed on the service it delivers. Dependability involves satisfying several requirements: availability, reliability, safety, maintainability, confidentiality, and integrity. Dependability requirements vary with the target application, since a constraint that is essential in one environment may matter little in another. In this paper, we focus on the reliability and safety attributes of dependability.

Reliability. The reliability of a system measures its aptitude to provide service and remain operating as long as required [3]. Reliability of a service is typically measured as the probability of success of the service, once requested, or else as the mean time to failure. If the average time to complete a service is known, it might be possible to convert between the two values.

Safety. The safety of a system is determined by the lack of catastrophic failures it undergoes [3]. The seriousness of the consequences of a failure on the environment can range from benign to catastrophic, and can be measured with a safety index. For instance, the DO-178B standard for civil aeronautics defines safety index values from 0 to 4 with the following meaning:
0. Without effects;
1. Minor effects lead to upsetting the stakeholders or increasing the system workload;
2. Major effects lead to minor injuries of users, or minor physical damage or monetary loss;
3. Dangerous effects lead to serious injuries of users, or serious physical damage or monetary loss;
4. Catastrophic effects lead to loss of human lives, or destruction of the system.
Each application has different safety requirements. It is up to the developer, in consultation with all the stakeholders, to define the number of safety levels to consider and their exact definitions.

Requirements engineering involves several activities: discovery and elicitation of the system functionality, properties and qualities; specification of the requirements and precise definition of the system boundary; and analysis of the requirements to ensure their correctness and completeness. If the analysis reveals undesired properties or flaws, the specification has to be refined. Once the system is implemented, the requirements can be validated against the running system. Our use case driven process, which focuses on the development of dependable reactive systems, defines for each of these activities detailed steps that lead the developer to pay particular attention to system safety and reliability. The following subsections describe each activity in detail.
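The two reliability measures mentioned above can be related once a failure model is assumed. The short sketch below is purely illustrative and not part of the original process; it assumes exponentially distributed failures, and the function name and parameters are our own:

    import math

    def probability_of_success(mttf_hours: float, service_duration_hours: float) -> float:
        # Probability that a service of known average duration completes
        # before a failure occurs, assuming an exponential failure model.
        return math.exp(-service_duration_hours / mttf_hours)

    # Example: a component with an MTTF of 10,000 hours providing a 2-hour
    # service succeeds with probability of roughly 0.9998.
    print(probability_of_success(10_000.0, 2.0))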

1. Requirements Elicitation and Discovery

Task 1: Discovering Actors and Goals. The first activity in use case based requirements elicitation consists of establishing a list of actors and stakeholders, with a special emphasis on primary actors, i.e. external entities in the environment that interact with the system in pursuit of a well-defined goal. Secondary actors, i.e. external entities that are required by the system in order to achieve its functionality, are also documented during this brainstorming activity, provided their use is indeed part of the requirements and not already part of the solution domain. For each of the discovered goals, a use case outline is written. This outline consists of a textual summary of the goal, an explanation of the context in which the primary actor wants to achieve the goal, and a clear description of the value or service that the system has to provide to satisfy the primary actor. Complex goals can be split into several subgoals to form a hierarchy [1], in which case a use case outline is written for each of the subgoals.

Task 2: Eliciting Dependability Expectations. For dependable systems, it is important at this phase to discover the requirements with respect to safety and reliability. Mission- and safety-critical systems often have to comply with safety standards, but even if the requirements do not mandate compliance with a standard, stakeholders and primary actors explicitly or implicitly expect a certain degree of safety and reliability from a dependable system. To document the desired dependability, reliability and safety annotations have to be added to the use case outlines; for each goal, they specify the desired probability of successfully achieving the goal, as well as the maximum tolerable probability of occurrence of a safety violation. Also, for each service offered by a mandatory secondary actor, the developer has to document the service's reliability and safety properties. If the secondary actor is a piece of hardware, its reliability can typically be found in the specification manual.

Task 3: Discovering Environment-Related Exceptions. As a next step, the developer has to focus on environment-related exceptional situations that the system under development could be exposed to. To help discover these situations, the following questions should be answered:
• What environment-related situations prevent the entire system from providing safe service?
• What environment-related situations prevent the system from satisfying a primary actor's goal (or subgoal)?
• What environment-related situations take priority over the primary actor's goal?
• What environment-related situations could make the primary actor change his goal?
For each exceptional situation that is discovered, a named exception is defined, together with a short text that describes the situation in more detail. All discovered exceptions are documented in an exception table. This activity typically leads to the discovery of new exceptional goals. Often, the occurrence of the situation cannot be detected by the system without help from the environment, which means that new exceptional actors have to be introduced. For example, in an elevator system where safety is the main concern, in case of a fire outbreak in the building the elevator operator or a smoke detector, both exceptional actors, should activate the fire emergency mode of the elevator control software.

Task 4: Specifying Handler Use Cases. For each identified exception, a handler use case outline has to be established, describing how the system is supposed to react to or recover from that situation. These new exceptional goals should then again be annotated with desired safety and reliability values. A handler can be further classified as a safety or reliability handler depending on the concern it attempts to satisfy. A handler can also be linked to one or several contexts, i.e. use cases during which the exceptional situation can occur. Upon occurrence of the exception, the current interaction is interrupted and the exceptional interaction begins.
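To illustrate Tasks 2 to 4, the sketch below shows one possible machine-readable form of the dependability annotations and of an exception table entry, using the elevator example mentioned above. The field names and the numeric values are illustrative assumptions on our part, not values prescribed by the process:

    # Hypothetical dependability annotation for a user goal (Task 2).
    take_elevator_goal = {
        "goal": "TakeElevator",
        "desired_reliability": 0.999,        # probability of achieving the goal
        "max_safety_violation_prob": 1e-6,   # tolerable probability of a safety violation
    }

    # Hypothetical exception table entry (Task 3) with its handler (Task 4).
    exception_table = [
        {
            "exception": "FireOutbreak",
            "description": "Fire is detected in the building; safe evacuation "
                           "takes priority over normal elevator service.",
            "raised_by": ["ElevatorOperator", "SmokeDetector"],  # exceptional actors
            "handler": "FireEmergency",                          # safety handler
        },
    ]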

2. Requirements Specification

Now that the goals and subgoals have been identified, detailed use case descriptions have to be elaborated for each of them. A use case describes, without revealing the details of the system's internal workings, the system's responsibilities and its interactions with its environment as it performs work that, if successfully completed, satisfies the goal of the primary actor. Use cases are in general text-based, so that they can be used effectively as a means of communication between technical as well as non-technical stakeholders. We suggest describing use cases with a predefined template, as done by others [6], which forces the developer to explicitly document all relevant features.

Reactive systems only perform work or produce output after they receive an input event. Therefore, the main part of our use case template consists of a numbered list of individual base interaction steps, each one describing either an input interaction – an external actor decides to send a message/data/event to the system – or an output interaction – the system sends a message/data/event to an external actor. If a use case is decomposed into subfunction-level use cases, a step can also be a reference to a lower-level use case, which in turn describes the base interaction steps that lead to the completion of the subgoal. In any case, however, a use case describing a user goal could, if needed, be flattened into a sequence of base interaction steps.
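As an illustration of the template's step structure, the following sketch (class names, step texts and the elevator scenario are our own, not taken from the paper) represents a user-goal use case as a numbered list of input and output interaction steps:

    from dataclasses import dataclass
    from typing import Literal

    @dataclass
    class Step:
        number: int
        kind: Literal["input", "output", "subgoal"]  # base interaction or reference to a lower-level use case
        text: str

    # Hypothetical elevator user goal, flattened into base interaction steps.
    take_elevator = [
        Step(1, "input",  "User presses the call button at a floor."),
        Step(2, "output", "System requests the motor to move the cabin to that floor."),
        Step(3, "input",  "The floor sensor signals that the cabin has reached the floor."),
        Step(4, "output", "System requests the motor to stop and opens the door."),
    ]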

Task 5: Designing Interactions. The standard way of achieving a goal is described in the main success scenario part of the template. The ordering of the individual interaction steps is often dictated by logic, by required usage patterns, by user interfaces, or by protocols enforced by secondary actors interacting with the system. Where flexibility exists, the stakeholders should be consulted to choose the most adequate interaction pattern. Alternative sequences of interactions, or situations in which user goals are abandoned, are listed in the extensions section of the template. It is important to always specify whether a sequence of interactions ends in success, i.e. the user goal was satisfied, or whether the user goal was abandoned. Failure of the user goal is handled in the next task.

Task 6: Discovering Service-Related Exceptions and Effects on System Reliability and Safety. The successful completion of a user goal may be threatened by service-related exceptional situations. Service-related exceptions can be of many natures:
• the system state makes the provision of a service impossible (see footnote 1);
• failure of secondary actors that are necessary for the completion of the user goal;
• failure of communication links between the system and important secondary actors;
• actors violating the system interaction protocol, i.e. invoking system services in the wrong order, or at the wrong time.
Possible service-related exceptions can be discovered most effectively following a bottom-up approach. Our process requires the developer to examine each individual base step of a use case, sub-use case or handler, and reflect on the consequences that a failure of the step has on reliability, i.e. the achievement of the goal, and on system safety. The developer should answer the following questions:
• If this step is omitted, will the goal fail? If yes, the step should be annotated with a reliability tag, together with the probability of success of the step.
• If this step is omitted, is the safety of the system threatened? If yes, the step should be annotated with a safety tag, together with the corresponding safety level and the probability of success of the step.
A named exception should be defined for each service-related exceptional situation, together with a small text that describes the situation in more detail. For example, in an elevator system, motor failure, i.e. the situation in which the motor no longer reacts to commands because of a hardware or communication failure, is a serious threat to safety and reliability. The exception is then added to the exception table of environment-related exceptions for documentation reasons.
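Continuing the hypothetical elevator scenario sketched earlier, the per-step annotations produced by Task 6 could be recorded as shown below; the tags, safety levels and probabilities are invented for illustration only:

    # Step 2: if the motor never moves, the goal fails -> reliability tag.
    # Step 4: if the motor does not stop, users may be harmed -> safety tag.
    step_annotations = {
        2: {"tag": "reliability", "p_success": 0.999},
        4: {"tag": "safety", "safety_level": 4, "p_success": 0.9999},
    }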

3. Requirements Analysis

Task 7: Analyzing Safety and Reliability. In [5] we proposed a model-based approach for analyzing the safety and reliability of our use cases. Since each interaction step in a use case is annotated with a probability reflecting its chance of success, and with a safety tag if the failure of the step hampers system safety, it is possible to map the use case to a formalism that is well suited for dependability analysis. We proposed to use DA-Charts, a probabilistic extension of statecharts. We have implemented our formalism in the AToM3 tool [2] to provide support for automatic dependability analysis. The tool also verifies the formalism constraints and ensures that the mapping rules are adhered to. Based on path analysis, the tool quantitatively determines the probabilities of reaching safe or unsafe states, of achieving the goal, of providing a degraded success, or of failing. For details on the dependability analysis see [5].

The dependability determined by the tool can now be compared with the dependability required by the stakeholders as determined in Task 2. If the analysis reveals an acceptable level of reliability and safety, then the requirements engineering process is complete, and a summary specification can be established (see Tasks 10 and 11). Otherwise, the requirements need to be refined with handler use cases that address the service-related exceptions, as described in Tasks 8 and 9.
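The sketch below gives a much simplified flavour of this kind of quantitative analysis. The real DA-Chart analysis works on probabilistic statecharts; this linear-scenario version, with our own function and field names, only multiplies step success probabilities along a single path and is not the tool's algorithm:

    def analyze(steps):
        """steps: list of (p_success, is_safety_critical) pairs in scenario order."""
        p_reached = 1.0  # probability that execution reaches the current step
        p_unsafe = 0.0   # probability of ending in an unsafe state
        for p_success, safety_critical in steps:
            p_fail_here = p_reached * (1.0 - p_success)
            if safety_critical:
                p_unsafe += p_fail_here   # failing a safety-critical step is assumed unsafe
            p_reached *= p_success
        return {"goal_success": p_reached,
                "unsafe": p_unsafe,
                "safe_failure": 1.0 - p_reached - p_unsafe}

    # Example with the hypothetical elevator annotations from Task 6.
    print(analyze([(0.999, False), (0.9999, True)]))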

4. Dependability-based Refinement and Iteration

Based on the output of our analysis tool, the service-related exceptions that have a significant negative effect on the system's reliability and safety can be identified. To improve the situation, the following tasks should be performed for each of them.

Footnote 1: Addressing these situations is of course not new to our approach. Standard use case driven requirements engineering techniques usually specify the handling of such situations in an extension section of the use case.

Task 8: Specifying Detection Mechanisms. Before any recovery actions can be taken by the system, the exceptional situation has to be detected. The developer should investigate whether the current actors and their interactions make the detection possible, and if not, adapt the interaction pattern or even add secondary detection actors to the system's environment. Detection is usually done differently for input and output interactions. Omission of input to the system can usually be detected using timeouts; invalid input data can be detected with checksums, and so on. In both cases, no additional detection actors have to be introduced. The use case has to be updated by adding the discovered exception to the extension section of the template as an alternative to the essential input step.

Output failure is more difficult to handle. Whenever a system output triggers a critical action of an actor, the system must make sure that it can detect possible communication problems or the failure of an actor to execute the requested action. This very often requires additional hardware, e.g. a sensor, to be added to the system. The job of this new actor is to inform the system that the essential actor successfully executed the system's request. This new acknowledgement step has to be added to the main success scenario after the essential output step in the use case or handler, and an exception representing the failure of the output, detected by a timeout while waiting for the acknowledgement, is added to the extension section as an alternative to the acknowledgement step. For example, an elevator control software might request the motor to stop, but a communication failure or a motor misbehaviour might keep the motor going. Additional hardware, for instance a sensor that detects when the cabin has stopped at a floor, might be necessary to ensure safety or reliability.

Task 9: Specifying Handler Use Cases. If the exception puts the user in danger, then measures must be taken to put the system in a safe state. If the exception threatens the successful completion of the user goal, reliability is at stake; it should then be investigated whether the system can recover and meet the user goal in an alternative way. In any case, exceptional interaction steps with the environment are performed during recovery, and hence must be specified in a separate reliability or safety handler use case (see footnote 2). Very often, actors – especially humans – are "surprised" when they encounter an exceptional situation, and are subsequently more likely to make mistakes when interacting with the system. Exceptional interactions must therefore be as intuitive as possible, and respect the actor's needs. If the goal of the primary actor cannot be achieved, then it is of paramount importance to inform him of the situation by an appropriate output interaction. In some cases, it might not be possible to satisfy a user's goal completely, but a dependable system can instead offer a degraded form of service. The handler use case should then specify degraded success as the outcome of the interaction.

If omission of input from an actor can cause the goal to fail, then, once the omission has been detected, different options for handling the situation have to be considered. For instance, prompting the actor for the input again after a given time has elapsed, or using default input, are possible options. Safety considerations might even make it necessary to temporarily shut down the system in case of missing input. Invalid input data is another example of an input problem that might cause the goal to fail. Since most of the time actors are aware of the importance of their input, a reliable system should also acknowledge input from an actor, so that the actor realizes that he is making progress towards achieving his goal.

To complete this task, every interaction step of the newly defined handler must again be analyzed and the essential steps tagged with reliability probabilities and safety information. The developer must then go back to Task 7 and analyze the updated use cases to determine whether the required safety and reliability requirements can now be satisfied.
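A minimal sketch of the detection and handling patterns described in Tasks 8 and 9 is given below, again for the hypothetical elevator scenario; the class, function and actor names are illustrative assumptions:

    class MotorStopFailure(Exception):
        """Motor did not acknowledge a stop request in time."""

    def request_motor_stop(motor, stop_sensor, timeout_s=2.0):
        motor.stop()                                  # critical output interaction
        if not stop_sensor.wait_for_stop(timeout_s):  # acknowledgement step added in Task 8
            raise MotorStopFailure()                  # output failure detected by timeout

    def stop_at_floor(motor, stop_sensor, emergency_brake):
        try:
            request_motor_stop(motor, stop_sensor)
        except MotorStopFailure:
            emergency_brake.engage()   # safety handler: bring the system to a safe state
            return "degraded success"  # inform the user that the goal was only partially met
        return "success"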

5. Requirements Summary

Task 10: Use Case Summary. Whereas individual use cases are text-based, the UML use case diagram provides a concise, high-level view of the use cases of a system. It allows developers to graphically depict the use cases, the actors that interact with the system, and the relationships between actors and use cases. In a use case diagram, standard use cases appear as ellipses, associated with the actors whose goals they describe. In [7] we extended use case diagrams and proposed to identify handler use cases with a <<handler>> stereotype, or even a different symbol, in order to differentiate them from standard use cases. We also suggest classifying the handlers as a <<safety handler>> or a <<reliability handler>>. Having different types of handlers enables quick identification of the functionality that affects the safety or reliability of a system, as well as identification of safety-critical zones (or functionality). It allows the developer, in collaboration with the stakeholders, to decide, for instance, how many resources should be allocated to the development of the functionality defined in the handler use cases, or how to prioritize between safety and reliability in case of conflict.

Handler use cases are associated with a base use case, which may be any standard use case or another handler use case. We suggest depicting this association in the use case diagram by a directed relationship (dotted arrow) linking the handler use case to its base use case. This relationship is very similar to the standard UML <<extend>> relationship. It specifies that the behavior of the base use case may be affected by the behavior of the handler use case in case an exception is encountered. When an exceptional situation occurs, the base behavior is put on hold or terminated, and the interaction specified in the handler is started. A handler can temporarily take over the system interaction, for instance to perform some compensation activity, and then switch back to the normal interaction scenario; in this case, the relationship is tagged with an <<interrupt & continue>> stereotype. Some exceptional situations, however, cannot be handled smoothly, and cause the current goal to fail; such dependencies are tagged with <<interrupt & fail>>. The exceptions that activate the handler use case are added to the interrupt relationship in a UML comment, similar to what is done for extension points.

Footnote 2: Separation of handlers also enables subsequent reuse of handlers. Just like a subfunction-level use case can encapsulate a subgoal that is part of several user goals, a handler use case can encapsulate a common way of handling exceptions that might occur while processing different user goals. Sometimes, even different exceptions can be handled in the same way.

Task 11: Exception Summary Table. For traceability and documentation reasons, all discovered environment-related and service-related exceptions are recorded in a table during Tasks 3 and 6. As a summary, the entries in this table, which already contain a small textual description of the exceptional situation, should be complemented with the contexts (use cases) in which the exception can occur, the associated handler(s), and the mechanism for detecting the exception.
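As an illustration of Task 11, a completed entry of the exception summary table might look as follows (the entry is hypothetical, based on the elevator motor example used throughout the text):

    Exception:   MotorStopFailure
    Description: Motor does not react to a stop command because of a hardware or communication failure
    Context:     TakeElevator
    Handler(s):  EmergencyBrake (safety handler)
    Detection:   Timeout while waiting for the stop-sensor acknowledgement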

References

[1] A. Cockburn. Writing Effective Use Cases. Addison-Wesley, 2000.
[2] J. de Lara and H. Vangheluwe. AToM3: A tool for multi-formalism and meta-modelling. In ETAPS/FASE, volume 2306 of LNCS, pages 174-188. Springer, April 2002.
[3] J.-C. Geffroy and G. Motet. Design of Dependable Computing Systems. Kluwer Academic Publishers, 2002.
[4] J. C. Laprie, A. Avizienis, and H. Kopetz, editors. Dependability: Basic Concepts and Terminology. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1992.
[5] S. Mustafiz, X. Sun, J. Kienzle, and H. Vangheluwe. Model-driven assessment of use cases for dependable systems. In MoDELS, volume 4199 of LNCS, pages 558-573. Springer, 2006.
[6] S. Sendall and A. Strohmeier. UML-based fusion analysis. In UML'99, Fort Collins, CO, USA, October 28-30, 1999, volume 1723 of LNCS, pages 278-291. Springer, 1999.
[7] A. Shui, S. Mustafiz, J. Kienzle, and C. Dony. Exceptional use cases. In MoDELS, volume 3713 of LNCS, pages 568-583. Springer, 2005.