From Timed Automata to Timed Failure Propagation Graphs¹


Technical Report tr-ri-12-325

Claudia Priesterjahn, Christian Heinzemann, Wilhelm Schäfer
Software Engineering Group
Heinz Nixdorf Institute
University of Paderborn
Warburger Str. 100
D-33098 Paderborn, Germany
[cpr|c.heinzemann|wilhelm]@uni-paderborn.de

Paderborn, July 20, 2012

¹ This work was developed in the course of the Special Research Initiative 614 - Self-optimizing Concepts and Structures in Mechanical Engineering - University of Paderborn, and was published on its behalf and funded by the Deutsche Forschungsgemeinschaft.

Table of Contents

1 Introduction
2 System Model
3 Timed Failure Propagation Graphs
4 Generating Timed Failure Propagation Graphs
  4.1 Step 1 – Computing the Reachable Behavior of the Software Component
  4.2 Step 2 – Computing the Refined Context of the Reachable Behavior
  4.3 Step 3 – Computing the Reachable Behavior with Injected Failures
  4.4 Step 4 – Constructing the TFPGs
5 Related Work
6 Conclusions and Future Work


Abstract

Embedded real-time systems are increasingly applied in safety-critical environments like cars or aircraft. Even though the system design might be free from flaws, hazardous situations may still be caused at run-time by random faults due to the wear of physical components. Hazard analysis is based on fault trees or failure propagation models. These models are created at least partly manually. They are usually independent of the software models that are used for checking safety and liveness properties to avoid systematic faults. This is particularly problematic where the software model contains manually specified operations to deal with random faults that have been identified by hazard analysis. These operations include replacing the faulty components by reconfiguration. We propose to generate a failure propagation model automatically from the software model in order to check whether the results of hazard analysis have been properly accounted for in the specification of reconfiguration operations. In contrast to the few existing approaches, our approach considers the real-time properties of the system and adds explicit failure propagation times by using timed automata for model specification.

1 Introduction

Embedded real-time systems are increasingly interacting with the physical world. They are usually software intensive, i.e., their functionality and correctness depend on high-quality software. As they are often employed in safety-critical contexts, guaranteeing this software quality becomes an absolute must. Besides testing and simulation, formal verification of a software model and automatic code generation have become accepted approaches to improving software quality and guaranteeing correctness. Formal verification enables, in particular, checking safety and liveness properties of the system under development. This prevents systematic faults caused by malfunctioning software.

However, due to the interaction with the physical world, errors in such systems may also be caused by random faults² that may occur, for example, due to the wear of physical components. Dealing with random faults caused by malfunctioning hardware is based on hazard analysis [12, 17]. This analysis checks possible combinations of hardware faults that lead to a hazard and computes the probability of the hazard's occurrence. The system developer uses this information to implement the system and, in particular, its software such that the probability of the occurrence of a hazard is acceptable. That is, the software guarantees that the hazard occurs only with a given probability [17]. This can be achieved by, e.g., replacing faulty components such that their faults do not result in hazards. However, model checking for safety and liveness properties to avoid systematic faults is completely independent of hazard analysis to handle random faults.

² According to Laprie et al. [2], an error is the deviation from a correct system state. A fault is the cause of an error.


Specifying the reaction to certain faults, e.g., by the replacement of faulty software components, is usually a purely manual task. In particular, accounting for timing issues in the software implementation and its corresponding model is very difficult. In addition, these timing issues make the software and its corresponding model very complex. It is thus an error-prone task to check manually whether the results of hazard analysis are properly reflected in the software, i.e., whether the reconfiguration can prevent the hazard.

Take for example a modern car that brakes by wire. When the brake pedal is pushed, the brake controller sets the value of the braking force to be applied to each wheel individually. The value depends on the speed of each wheel, which is measured by a speed sensor. If a random fault occurs in a speed sensor, a wrong value is passed to the brake controller. In this case, the braking force might not be high enough to bring the car to a halt early enough, which might result in a collision. To prevent such a situation, a reconfiguration may be specified that replaces the faulty sensor by a spare that is still working. The fault may be detected, e.g., using model-based fault diagnosis [16]. This reconfiguration, however, only prevents the hazard if it is performed fast enough that the wrong speed value does not propagate to a critical point in the system, i.e., the brake controller.

Some approaches suggest analyzing the software model after development for a proper integration of the results of hazard analysis, i.e., checking whether hazards are prevented. These approaches propose to automatically extract information from the software model, such as a fault tree or a failure propagation model, which is then used for hazard analysis [10, ?]. These approaches do not take time into account. Consequently, they lack important information to guarantee correct system behavior when dealing with random faults in real-time systems, as illustrated by the example above.

In this paper, we present an approach for the automatic generation of timed failure propagation graphs (TFPGs) from timed automata [1, 4]. TFPGs are failure propagation models [7] annotated with the propagation times of failures. The benefit of TFPGs is their minimality concerning the information needed for hazard analysis. For the software specification of embedded real-time systems, various versions of timed automata have proven to be a very useful and adequate specification mechanism. For the scope of this paper, we use the definition of [4], but our approach can be easily adapted to other similar formalisms.

In the next section, we introduce our models for the system structure and behavior. We then present TFPGs in Section 3. The description of the generation of TFPGs from the system behavior model follows in Section 4. Section 5 discusses related work before we conclude the paper in Section 6.


2 System Model

Since hazard analysis is based on the propagation of failures through system components as described in [14], we model the system by means of components. For each component, we model its behavior by timed automata. Throughout this paper, we use the component model of MechatronicUML [3]. Components communicate via ports. Components and ports are instantiated in a component instance configuration [3]. The component instances are connected via their ports using connectors, which specify additional behavior that models message delays. Figure 1 shows an example of a component instance configuration. It contains two component instances c1 and c2 of types Component1 and Component2, respectively. c1 and c2 are connected via their ports p3 and p4. The ports are directed such that messages may only flow in the direction indicated by the triangles.

[Figure: component instances c1:Component1 and c2:Component2 with ports p1 to p5; c1 and c2 are connected via p3 and p4]

Fig. 1. Component Instance Configuration consisting of two Component Instances
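The component instance configuration described above can be sketched as a small data structure. The following Python fragment is only an illustration under stated assumptions and is not part of MechatronicUML: the class names `Port`, `Connector`, and `Configuration`, the direction of the connector (from p3 to p4), and the delay bound `max_delay=2` are hypothetical choices made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Port:
    """A port of a component instance, identified by owner and name."""
    owner: str   # component instance name, e.g. "c1"
    name: str    # port name, e.g. "p3"


@dataclass(frozen=True)
class Connector:
    """A directed connector; max_delay is a hypothetical message delay bound."""
    source: Port
    target: Port
    max_delay: int


@dataclass
class Configuration:
    """A component instance configuration: instances and their connectors."""
    instances: dict = field(default_factory=dict)   # instance name -> type name
    connectors: list = field(default_factory=list)

    def add_instance(self, name, type_name):
        self.instances[name] = type_name

    def connect(self, source, target, max_delay):
        # Both endpoints must belong to instances that are already known.
        for port in (source, target):
            if port.owner not in self.instances:
                raise ValueError(f"unknown instance {port.owner}")
        self.connectors.append(Connector(source, target, max_delay))

    def may_send(self, source, target):
        """Messages may only flow along the direction of a connector."""
        return any(c.source == source and c.target == target
                   for c in self.connectors)


# The configuration of Fig. 1, restricted to the two connected ports.
config = Configuration()
config.add_instance("c1", "Component1")
config.add_instance("c2", "Component2")
p3, p4 = Port("c1", "p3"), Port("c2", "p4")
config.connect(p3, p4, max_delay=2)   # direction and delay are assumptions
```

Keeping the direction in the connector rather than in the port makes the check in `may_send` a simple lookup; a model closer to the text would attach the direction to the ports themselves.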

We model the behavior of software components by timed automata as defined in [4]. A timed automaton is a finite automaton that is extended by a set of real-valued variables called clocks. By using timed automata, the developer defines time-dependent behavior. Thus, the behavior of the component depends not only on its inputs but also on the point in time at which these inputs are received. This is essential for modeling real-time systems.

For the definition of timed automata, we first require a definition of clock constraints. They are used to model conditions on the values of clocks in a timed automaton.

Definition 2.1 (Clock Constraint) Let C be a set of real-valued clocks. A clock constraint B is a conjunctive formula of atomic constraints of the form x ∼ n or x − y ∼ n for x, y ∈ C, ∼ ∈ {≤, <, =, >, ≥} and n ∈ ℕ. We use B(C) to denote the set of clock constraints. [4]

The set of clock constraints is now used in the definition of a timed automaton. Based on the clocks, a timed automaton specifies time guards, clock resets, and invariants. A time guard is a clock constraint that restricts the execution of a transition to a specific time interval. A clock reset sets the value of a clock back to 0 when a transition is fired. Invariants are clock constraints associated with locations; they forbid a timed automaton from staying in a location once the clock values violate the invariant.

Definition 2.2 (Timed Automaton) A timed automaton A is a tuple A = (L, l0, C, Σ, E, I) where

– L is a finite set of locations,
– l0 ∈ L is the initial location,
– C is a finite set of clocks,
– Σ = (D × p × {?, !}) ∪ {τ} is a finite set of actions where D is a set of action names, p a set of port names, and τ is the empty action,
– E ⊆ L × B(C) × Σ × 2^C × L is the set of transitions where ϕ ∈ B(C) is the transition guard and λ ∈ 2^C are the clock resets, and
– I : L → B(C) assigns clock constraints to locations, the invariants.

We shall write $l \xrightarrow{\varphi, a, \lambda} l'$ when (l, ϕ, a, λ, l′) ∈ E. (cf. [4])

Figure 2 shows an example of a timed automaton. The timed automaton consists of eight locations l0 to l7 and nine transitions connecting the locations. The invariant c1 < 15 of location l0 specifies that l0 may only be active while the value of c1 is less than 15. Accordingly, the time guard 5 ≤ c1 ≤ 10 restricts the firing of the transition from l0 to l1 to the corresponding time interval. The time intervals are interpreted with respect to the values of the clock c1 and not with respect to the global time that has passed since the system was started. The reset at the transition from l1 to l3 sets the clock c1 back to 0.

In addition to time guards and resets, a transition may carry an action symbol from Σ that specifies input actions and output actions of the timed automaton. Input actions are denoted by ?, output actions by !. In order to relate actions to ports of components, we prefix the action with the name of the port. We thus use actions of the form p.a, where p specifies the port name and a the action name. In our example, the input action p1.i1? specifies that input i1 is received via port p1. The output action p3.o1! specifies that the output o1 is sent via port p3.

To analyze the behavior of the timed automaton, we need to provide an environment model that sends the inputs of the timed automaton and receives its outputs. We call this environment the context of the timed automaton and its corresponding component. The context specifies timed automata for all inputs and outputs of the component's timed automaton. Figure 3 shows a context automaton of the component automaton of Figure 2. It consists of two locations that are connected by a transition. The time guard 5 ≤ c2 ≤ 10 restricts the firing of the transition to the given time interval. The transition sends the output action i1 via port p1.
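To make Definitions 2.1 and 2.2 more concrete, the following Python sketch encodes clock constraints as conjunctions of atomic constraints and a timed automaton as a set of guarded transitions with resets and invariants. It is a minimal illustration, not the authors' implementation. All class and function names are hypothetical. The example fragment takes the invariant c1 < 15 and the guard 5 ≤ c1 ≤ 10 from the text; placing p1.i1? on the transition from l0 to l1 and using an unguarded τ transition from l1 to l3 are assumptions of this sketch.

```python
import operator
from dataclasses import dataclass
from typing import Optional

# Comparison operators allowed in atomic clock constraints (Definition 2.1).
OPS = {"<=": operator.le, "<": operator.lt, "==": operator.eq,
       ">": operator.gt, ">=": operator.ge}


@dataclass(frozen=True)
class Atom:
    """Atomic constraint x ~ n, or x - y ~ n when y is given."""
    x: str
    op: str
    n: int
    y: Optional[str] = None

    def holds(self, clocks):
        value = clocks[self.x] - (clocks[self.y] if self.y else 0)
        return OPS[self.op](value, self.n)


def satisfied(constraint, clocks):
    """A clock constraint is a conjunction, encoded here as a tuple of atoms."""
    return all(atom.holds(clocks) for atom in constraint)


@dataclass(frozen=True)
class Transition:
    source: str
    guard: tuple        # clock constraint (phi)
    action: str         # e.g. "p1.i1?", or "tau" for the empty action
    resets: frozenset   # clocks (lambda) that are set back to 0
    target: str


@dataclass
class TimedAutomaton:
    locations: set
    initial: str
    clocks: set
    transitions: list
    invariants: dict    # location -> clock constraint

    def can_stay(self, location, clocks):
        """The automaton may remain in a location only while its invariant holds."""
        return satisfied(self.invariants.get(location, ()), clocks)

    def fire(self, t, clocks):
        """Take transition t: reset the clocks in t.resets, keep all others."""
        after = {c: (0 if c in t.resets else v) for c, v in clocks.items()}
        return t.target, after

    def enabled(self, location, clocks):
        """Transitions whose guard holds and whose target invariant holds after resets."""
        result = []
        for t in self.transitions:
            if t.source != location or not satisfied(t.guard, clocks):
                continue
            _, after = self.fire(t, clocks)
            if self.can_stay(t.target, after):
                result.append(t)
        return result


# Fragment of the automaton of Figure 2 (see the assumptions in the lead-in).
t01 = Transition("l0", (Atom("c1", ">=", 5), Atom("c1", "<=", 10)),
                 "p1.i1?", frozenset(), "l1")
t13 = Transition("l1", (), "tau", frozenset({"c1"}), "l3")
fragment = TimedAutomaton(locations={"l0", "l1", "l3"}, initial="l0",
                          clocks={"c1"}, transitions=[t01, t13],
                          invariants={"l0": (Atom("c1", "<", 15),)})
```

With this fragment, `fragment.enabled("l0", {"c1": 7})` contains the transition to l1, whereas a clock value of 3 enables nothing, and firing `t13` returns the location l3 with c1 reset to 0.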
Given the timed automata of Figures 2 and 3, they can now interact with each other using the actions of the joint set of actions Σ. Such a set of interacting timed automata is referred to as a network of timed automata [4]. We formalize networks of timed automata as follows.

[Figure: timed automaton; recoverable labels: c1 ≥ 40, c1 := 0, x := 0, l5, 5 ≤ c1 ≤ 10, p1.i1?, x := −3]