Performance Comparison of Techniques on Static Path Analysis of WCET

Mingsong Lv, Nan Guan, Qingxu Deng and Ge Yu
Institute of Computer Software and Theory, Northeastern University, China P.R.

Abstract

Static path analysis is a key process of Worst Case Execution Time (WCET) estimation, the objective of which is to find the execution path that has the largest execution time. There is an ongoing debate in the research community over whether model checking is a good alternative to ILP for WCET analysis. To our knowledge, no paper so far has addressed this question with real performance data. In this paper, we implement both ILP and model checking for static path analysis of WCET. The experimental results show that ILP yields very good performance, while model checking works well only for simple programs and is prone to scalability problems when dealing with programs that have complex structures and large loop counts.

Keywords: static path analysis, WCET, model checking, ILP, real-time

1. Introduction

WCET analysis is an essential process in the development of real-time embedded systems because it provides schedulability analysis with basic timing information [12]. Results of the analysis should be safe and accurate: no actual execution of the program should exceed the estimated time, and the estimate should be as close as possible to the real maximal execution time of the program. An unsafe WCET will cause scheduling failure, and the estimation accuracy affects the quality of schedulability analysis.

Classical WCET analysis generally consists of two phases: high-level analysis and low-level analysis. The objective of high-level analysis is to reconstruct the Control Flow Graph (CFG) from an executable file and find the path that leads to the longest execution time. The low-level analysis computes the execution time of each basic block by modeling the hardware architecture, such as the instruction set, cache and pipelines. The two phases of analysis can be done either separately [16] or in an integrated manner [11]. In this paper we focus on the former, i.e., high-level analysis, which is also called static path analysis.

Given the executable file of a program to be analyzed, static path analysis first slices the program into basic blocks, which are sequences of instructions with no jumps within the block, and then organizes the basic blocks into a CFG that maintains the control flow of the program. An upper bound is associated with each loop in the program to express the largest possible number of loop iterations. Additional constraints on the execution of the basic blocks can be added manually by the users or generated automatically by some analysis techniques. We call all these constraints, including the loop counts, flow facts [4].

Given the CFG, the flow facts and the execution time of each basic block obtained from the low-level analysis, there exist multiple execution paths, one of which has the largest execution time: we call this path the Worst Case Execution Path (WCEP). Finding the WCET corresponding to the WCEP can be formulated as an optimization problem. Implicit-Path Enumeration Techniques (IPET) [4, 13, 19] encode the structural information of the CFG, the flow facts and the linear constraints on execution counts of basic blocks into an ILP formulation, and the WCET value can be obtained by solving the ILP problem.

In [11], Metzner argued that model checking [10] can be used to improve the accuracy of WCET analysis, since a model checker checks every possible execution path and more accurate cache analysis can be performed alongside. The author also argued that model checking can compete with other techniques like ILP on performance. Wilhelm in [17] pointed out that model checking seems to encounter an exponential state-space explosion in WCET analysis. Neither of them provides concrete performance data to answer the question of whether model checking is suitable for such problems.
In this paper, we implement both the ILP formulation used in IPET and the model checking semantics proposed by Metzner to do static path analysis of WCET. The experimental results show that, in static path analysis, model checking works well for programs with simple structures but may still encounter scalability problems when dealing with complex program structures with large loop counts, while ILP outperforms model checking in terms of both run time and memory usage.

The rest of the paper is organized as follows: the problem statement and a motivating example are given in Section 2. Section 3 and Section 4 present the ILP formulation and the model checking solution to the static path analysis of WCET respectively. Experimental results on performance comparison are given in Section 5. Related work on WCET analysis is presented in Section 6, and Section 7 concludes the paper.

2. Problem Statement

In this section, we give the assumptions made in our analysis and the definitions of the CFG. Then a simple program is presented as the motivating example to illustrate the methods applied throughout the paper.

2.1 The Problem Statement

The assumptions of the problem are stated as follows:
– We assume an abstract processor model, which means that the duration of the execution of a basic block derived from the abstract model must not be shorter than the duration of its execution on the real processor. Since only high-level analysis is discussed in this paper, we assume this property is guaranteed by the tool that estimates the execution time.
– The programs analyzed in this paper are all statically linked executable files. This means that each invoked function is linked into the executable file and can be analyzed to generate the CFG.
– Loops are implemented in either for-style or while-style, and the loop count for each loop is given.
– No recursion is allowed.

The optimization problem is defined as follows: given the CFG of a program with every basic block annotated with its execution time, together with the flow facts, find the upper bound on the execution time of the program, i.e., the execution time of the Worst Case Execution Path.

2.2 A Motivating Example

Figure 1-(a) is the motivating example used to illustrate the methods in this paper. The main body of the example is a while-loop with variable i as the loop counter, and the loop body contains an if-then-else structure. The loop bound is set to 10.

Figure 1. A motivating example and the corresponding CFG: (a) a motivating example; (b) the CFG

Figure 1-(b) is the corresponding CFG generated by Chronos [8], a WCET tool developed at the National University of Singapore. Chronos takes C code as input and compiles it into a SimpleScalar binary, which is then disassembled to reconstruct the CFG. To get a binary that best resembles the structure of the C code, the optimization option for GCC is turned off. The integer annotated on each transition is the execution time of the destination node of the transition, as obtained by Chronos. These annotations are added as an extension to the traditional definition of a CFG.

The annotation method in our model differs from the classical way of expressing execution time, in which it is annotated on each node. We now briefly explain this difference. Chronos allows the user to configure the processor by turning the cache and branch prediction modules on or off, but pipeline modeling [9] is mandatory. The execution time of a basic block depends not only on the instructions inside the block, but also on the pipeline context of the basic block. For example, in Figure 1-(b), if BB1 executes after BB0, it takes 5 cycles; but if it executes after BB6, the execution time is 4 cycles. This is because BB0 and BB6 provide different pipeline contexts for BB1. In fact, the mandatory pipeline modeling leads to more accurate estimation without violating the safety property of WCET analysis.

Note that BB0, the first node of a CFG, has no incoming edges. For consistency of the specification, we add a fictitious node Sta and an edge from Sta to BB0, and annotate BB0's execution time on this edge.

2.3 The Formal Definition of CFG

A formal definition of a CFG can be given as follows:

Definition 1 (Control Flow Graph (CFG)). A CFG is a tuple $B = (BB, Sta, Ter, T_B, L)$ with:
– $BB$, the set of all basic blocks of the program; the $i$th basic block is denoted $BB_i$
– $Sta$, the fictitious starting basic block of the program, where $Sta \in BB$
– $Ter$, the only terminating basic block of the program, where $Ter \in BB$
– $T_B \subseteq BB \times BB$, the transition relation; we use $t_{i,j}$ to denote the transition from $BB_i$ to $BB_j$
– $L$, the set of all the loops in the CFG, with each loop $Loop_i$ defined in Definition 2

We also define an operation $cost(t_{i,j})$ that returns the annotation on transition $t_{i,j}$, i.e., the execution time of the destination node of $t_{i,j}$. The operations $src(t_{i,j})$ and $dst(t_{i,j})$ return the source and destination node of transition $t_{i,j}$ respectively. A loop in the program is defined as follows:

Definition 2 (Loop Structure). A loop in the CFG is a tuple $Loop = (BODY, Head, Tail, Tbj, BE, BL, lpb)$ with:
– $BODY \subseteq BB$, the set of all basic blocks of the loop body
– $Head \in BODY$, the first node of the loop body, where the loop counter is checked in each iteration
– $Tail \in BODY$, the last node of the loop body; when $Tail$ finishes execution, the program jumps to the $Head$
– $Tbj \in T_B$, the back-jump transition from $Tail$ to $Head$
– $BL \subseteq BB$, the set of all entry nodes to the loop $Head$
– $BE \subseteq BB$, the set of all exit nodes reached when leaving the loop body
– $lpb \in \mathbb{N}$, the loop bound

In Figure 1-(b), $BODY$ = {BB1, BB3, BB4, BB5, BB6}, BB1 is the $Head$, BB6 is the $Tail$, transition $t_{6,1}$ is the $Tbj$, set $BL$ has one node BB0, set $BE$ has one node BB2, and $lpb = 10$. Given the above definitions of the CFG and the loop, both ILP and model checking can be applied to analyze the program to find the WCET, as detailed in the next two sections.
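To make Definitions 1 and 2 concrete, the following is a minimal Python sketch of the two tuples, populated with the motivating example. The class and field names are illustrative assumptions, not part of Chronos or of the paper. The transition costs are read off the objective function of Figure 2, and taking BB7 as the terminating block is an assumption based on the constraint b7 = 1 in that figure.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    body: frozenset     # BODY: basic blocks of the loop body
    head: str           # Head: node where the loop counter is checked
    tail: str           # Tail: last node of the loop body
    back_jump: tuple    # Tbj: the transition (Tail, Head)
    entries: frozenset  # BL: entry nodes to the loop head
    exits: frozenset    # BE: exit nodes of the loop
    bound: int          # lpb: loop bound


@dataclass
class CFG:
    blocks: frozenset   # BB, including the fictitious node Sta
    start: str          # Sta
    terminal: str       # Ter (assumed to be BB7 for the example)
    costs: dict         # T_B with annotations: (src, dst) -> cycles
    loops: list         # L

    def cost(self, t):
        """Execution time of the destination node of transition t."""
        return self.costs[t]


def src(t):
    return t[0]


def dst(t):
    return t[1]


# Motivating example; costs taken from the objective function in Figure 2.
costs = {
    ("Sta", "BB0"): 11, ("BB0", "BB1"): 5, ("BB1", "BB2"): 1,
    ("BB1", "BB3"): 3, ("BB2", "BB7"): 5, ("BB3", "BB4"): 7,
    ("BB3", "BB5"): 5, ("BB4", "BB6"): 7, ("BB5", "BB6"): 8,
    ("BB6", "BB1"): 4,
}
loop = Loop(
    body=frozenset({"BB1", "BB3", "BB4", "BB5", "BB6"}),
    head="BB1", tail="BB6", back_jump=("BB6", "BB1"),
    entries=frozenset({"BB0"}), exits=frozenset({"BB2"}), bound=10,
)
cfg = CFG(
    blocks=frozenset(n for t in costs for n in t),
    start="Sta", terminal="BB7", costs=costs, loops=[loop],
)
```

Keeping the cost on the transition rather than on the node is what lets BB1 cost 5 cycles after BB0 but 4 cycles after BB6, as `cfg.cost(("BB0", "BB1"))` and `cfg.cost(loop.back_jump)` show.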

3. The ILP Formulation

ILP is widely adopted in IPET-based WCET analysis tools. The objective of WCET calculation is to solve the maximization problem specified in Equation (1), where $x_{i,j}$ is the execution count of transition $t_{i,j}$, and $cost(t_{i,j})$ is defined in Section 2.3. Note that the objective function has been modified according to our definition of the CFG.

$$wcet = \max \sum_{t_{i,j} \in T_B} \bigl(cost(t_{i,j}) \cdot x_{i,j}\bigr) \quad \text{s.t. flow facts} \tag{1}$$

There are three types of flow facts encoded in terms of ILP constraints. The first type of flow facts specifies the structural aspect of a CFG, i.e., for each node the total execution count of all its incoming transitions equals the total execution count of all its outgoing transitions. Equation (2) gives the formulation, where $b_i$ denotes the execution count of the CFG node $BB_i$.

$$\forall\, BB_i: \quad b_i = \sum_{dst(t_{k,i}) = BB_i} x_{k,i} = \sum_{src(t_{i,j}) = BB_i} x_{i,j} \tag{2}$$
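As a worked instance, consider the loop head BB1 of the motivating example. Reading the edges off the ILP listing of Figure 2, BB1 has the incoming transitions $t_{0,1}$ and $t_{6,1}$ and the outgoing transitions $t_{1,2}$ and $t_{1,3}$, so the structural constraint for this node reads:

$$b_1 = x_{0,1} + x_{6,1} = x_{1,2} + x_{1,3}$$

This corresponds to the two rows `b1 - d0_1 - d6_1 = 0` and `b1 - d1_2 - d1_3 = 0` of the listing, where the variable `di_j` plays the role of $x_{i,j}$.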

The second type of flow facts specifies that the program is entered and exited exactly once. This is done by setting the execution counts of BB0 and $Ter$ to 1. For some programs, Chronos can also infer additional constraints on the execution counts of the basic blocks, and we leave these constraints in the ILP specification.

The third type of flow facts are the loop bounds provided as user inputs. In the worst case, each time a transition to the loop head (excluding the back-jump) is taken, the loop body executes $lpb$ times. For well-structured programs, we can safely use the fact that the execution count of the loop body equals the execution count of the loop tail. So the loop count constraints can be formulated as in Equation (3).

$$\forall\, Loop_i: \quad b_{Tail_i} \le lpb_i \cdot \sum_{BB_j \in BL_i} b_j \tag{3}$$

Figure 2 is the ILP formulation for the motivating example generated by Chronos. The maximization objective and the three types of flow facts are clearly listed in the ILP formulation. We can submit it to CPLEX or lp_solve to get the results.
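The three types of flow facts can be derived mechanically from the CFG. The sketch below is a hypothetical generator, not the one used by Chronos: it emits the objective of Equation (1) and one row per constraint, following the naming of the listing in Figure 2 (`di_j` for transition counts, `bi` for node counts). The function name and argument layout are assumptions made for illustration, and a complete solver input would additionally need integrality declarations.

```python
def ilp_rows(costs, start, terminal, loops):
    """Return (objective, constraint rows) for a CFG.

    costs: {(src, dst): cycles}; loops: [(tail, entry_nodes, bound)].
    """
    def short(n):                 # "BB3" -> "3", "Sta" -> "Sta"
        return n.replace("BB", "")

    def x(t):                     # execution count of transition t
        return f"d{short(t[0])}_{short(t[1])}"

    def b(n):                     # execution count of node n
        return f"b{short(n)}"

    nodes = sorted({n for t in costs for n in t} - {start})
    objective = " + ".join(f"{c} {x(t)}" for t, c in costs.items())

    rows = []
    for t in costs:               # the fictitious start edge is taken once
        if t[0] == start:
            rows.append(f"{x(t)} = 1")
    for n in nodes:               # Equation (2): flow conservation
        outgoing = [x(t) for t in costs if t[0] == n]
        incoming = [x(t) for t in costs if t[1] == n]
        for group in (outgoing, incoming):
            if group:
                rows.append(f"{b(n)} - " + " - ".join(group) + " = 0")
    first = next(t[1] for t in costs if t[0] == start)
    rows += [f"{b(first)} = 1", f"{b(terminal)} = 1"]  # entered/exited once
    for tail, entries, bound in loops:                 # Equation (3)
        rhs = " - ".join(f"{bound} {b(e)}" for e in entries)
        rows.append(f"{b(tail)} - {rhs} <= 0")
    return objective, rows


# Motivating example; costs taken from the objective function in Figure 2.
COSTS = {
    ("Sta", "BB0"): 11, ("BB0", "BB1"): 5, ("BB1", "BB2"): 1,
    ("BB1", "BB3"): 3, ("BB2", "BB7"): 5, ("BB3", "BB4"): 7,
    ("BB3", "BB5"): 5, ("BB4", "BB6"): 7, ("BB5", "BB6"): 8,
    ("BB6", "BB1"): 4,
}
objective, rows = ilp_rows(COSTS, "Sta", "BB7", [("BB6", ["BB0"], 10)])
print("Maximize", objective, "Subject to", *rows, sep="\n")
```

With several entry nodes, the loop row expands to `b_tail - lpb b_e1 - lpb b_e2 <= 0`, which is Equation (3) with the sum moved to the left-hand side.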

    Maximize
      11 dSta_0 + 5 d0_1 + 1 d1_2 + 3 d1_3 + 5 d2_7
      + 7 d3_4 + 5 d3_5 + 7 d4_6 + 8 d5_6 + 4 d6_1
    Subject to
      \ === tcfg constraints ===
      dSta_0 = 1
      b0 - d0_1 = 0
      b0 - dSta_0 = 0
      b1 - d1_2 - d1_3 = 0
      b1 - d0_1 - d6_1 = 0
      b2 - d2_7 = 0
      b2 - d1_2 = 0
      b3 - d3_4 - d3_5 = 0
      b3 - d1_3 = 0
      b4 - d4_6 = 0
      b4 - d3_4 = 0
      b5 - d5_6 = 0
      b5 - d3_5 = 0
      b6 - d6_1 = 0
      b6 - d4_6 - d5_6 = 0
      b7 - d2_7 = 0
      b0 = 1
      b7 = 1
      b6 - 10 b0 <= 0

Figure 2. The ILP formulation for the motivating example

4. The Model Checking Solutions

In this section, we study how model checking techniques can be applied to find the execution path with the largest execution time. First, the formal semantics of constructing an automaton from the CFG is defined; then the concrete models for both the explicit state model checker SPIN and the symbolic model checker NuSMV are illustrated.

4.1 Basic Semantics

4.1.1 The Definition of BBA

The CFG basically gives the static structure of a program. In order to calculate the WCET using model checking techniques, a finite state machine that contains all the possible paths of the program is constructed from the CFG. Metzner in [11] gave the complete semantics for modeling the CFG and the cache. We basically adopt Metzner's modeling idea, but simplify the semantics by abstracting away the details of cache modeling, since we only discuss static path analysis. Some other modifications are also made according to our specific definition of the CFG. The semantics of the loops are transformed into guards and actions of the corresponding transitions. Similar to [11], we call this finite state machine the Basic Block Automaton (BBA) and define it as follows:

Definition 3 (Basic Block Automaton (BBA)). The Basic Block Automaton is a tuple $BBA = (s_0, s_T, S, V, T)$ with:
– $S$, the set of states, where $S = BB$
– $s_0 \in S$, the initial state, where $s_0 = Sta$
– $s_T \in S$, the termination state, where $s_T = Ter$
– $V$, the set of variables used for loop counters and consumed time
– $T \subseteq S \times S$, the transition relation, where $T = T_B$

4.1.2 Transition Semantics

The transition semantics is defined by translating the loops, the if-then-else structures and the execution of the basic blocks of a program into the guards and actions of the corresponding transitions. To record the time consumed along each path, we define a variable wcet, which is initialized to 0 when the program starts. Each time a transition is taken, i.e., some basic block is executed, wcet is incremented by the value annotated on the transition to imitate the advance of time.

In our analysis, the if-then-else structure is modeled nondeterministically, which means that at each branching node of the BBA, either path can be taken. The drawback is that all the information on the test conditions is lost in this model, which results in a looser WCET estimate and a larger state space. But in real programs, the test conditions are so diverse that it is hard to model them in either ILP or model checking. For example, in a test condition if (b > 0), b may be a double-precision real number, and before the if a lot of calculation is performed to obtain the value of b. To retain the concrete test information, an analysis tool would need to handle real numbers and simulate the calculation to obtain b's value, which is impossible for ILP and for some popular model checkers.

Figure 3. Semantic transformation of loops

For each loop, a variable $lpc_i$ is defined as the loop counter to record the number of finished iterations of $Loop_i$. On each transition that leads to the loop head, $lpc_i$ is initialized to 0, which marks the start of a loop. On each back-jump transition from the loop tail to the loop head, $lpc_i$ is incremented by 1 to indicate a new iteration. In our definition of a loop, the loop head is the location where the loop counter is checked to decide whether to leave the loop or not. For the transition from the loop head to the exit block, $(lpc_i \geq lpb_i)$ is set as the guard, and for the transition from the loop head to the next node of the loop body, $(lpc_i < lpb_i)$ is set as the guard. The above semantics implies the worst case execution of a loop: the loop body executes the maximal $lpb_i$ iterations and then exits.

Putting these together, we can give the formal definition of the transition semantics. For a transition $t_{i,j} = (BB_i, BB_j) \in T$, the guards and actions are defined as follows:

Definition 4 (Transition Semantics).

$$guard(t_{i,j}) = \begin{cases} (lpc_k < lpb_k) & \text{if } \exists k,\ Loop_k \in L \wedge BB_i = Head_k \wedge (BB_j \in BODY_k \wedge BB_j \neq Head_k) \\ (lpc_k \geq lpb_k) & \text{if } \exists k,\ Loop_k \in L \wedge BB_i = Head_k \wedge BB_j \in BE_k \\ TRUE & \text{else} \end{cases}$$

$$action_{lp}(t_{i,j}) = \begin{cases} \{lpc_k = 0\} & \text{if } \exists k,\ Loop_k \in L \wedge BB_i \in BL_k \wedge t_{i,j} \neq Tbj_k \\ \{lpc_k{+}{+}\} & \text{if } \exists k,\ Loop_k \in L \wedge t_{i,j} = Tbj_k \\ \emptyset & \text{else} \end{cases}$$

$$action(t_{i,j}) = action_{lp}(t_{i,j}) \cup \{wcet \mathrel{+}= cost(t_{i,j})\}$$

4.1.3 Optimization Procedure

The above Basic Block Automaton and the transition semantics can be easily translated to the models of most model checkers. We can ask the model checker to prove the LTL property [] φ_N, where φ_N = (wcet ≤ N). This LTL property specifies that "for all execution paths starting from the initial state, globally wcet is not greater than N". If N < actual WCET, the property is proved false; if N ≥ actual WCET, the property is proved true. Hence N = actual WCET if and only if [] φ_N is proved true and [] φ_(N−1) is proved false. We use this fact to decide whether N is the actual value of the WCET, and use binary search to find this value.
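The procedure can be illustrated without a model checker. The Python sketch below is a stand-in, not the SPIN or NuSMV models used in this paper: it explores the states of the BBA for the motivating example explicitly, applies the guards and actions of the transition semantics, checks the invariant wcet ≤ N, and wraps that check in the binary search of Algorithm 1. All function names are illustrative assumptions. The transition costs are read off Figure 2, and BB7 is assumed to be the terminating state.

```python
# Transition costs of the motivating example, read off Figure 2.
COST = {
    ("Sta", "BB0"): 11, ("BB0", "BB1"): 5, ("BB1", "BB2"): 1,
    ("BB1", "BB3"): 3, ("BB2", "BB7"): 5, ("BB3", "BB4"): 7,
    ("BB3", "BB5"): 5, ("BB4", "BB6"): 7, ("BB5", "BB6"): 8,
    ("BB6", "BB1"): 4,
}
HEAD, TAIL, LPB = "BB1", "BB6", 10
BODY = {"BB1", "BB3", "BB4", "BB5", "BB6"}
EXITS = {"BB2"}


def successors(state):
    """Yield the states reachable in one step from (node, lpc, wcet)."""
    node, lpc, wcet = state
    for (src, dst), cycles in COST.items():
        if src != node:
            continue
        # Guards on the two kinds of transitions leaving the loop head.
        if src == HEAD and dst in BODY and dst != HEAD and not lpc < LPB:
            continue
        if src == HEAD and dst in EXITS and not lpc >= LPB:
            continue
        if (src, dst) == (TAIL, HEAD):
            nxt = lpc + 1      # back-jump: one more finished iteration
        elif dst == HEAD:
            nxt = 0            # loop entry: reset the loop counter
        else:
            nxt = lpc
        yield (dst, nxt, wcet + cycles)


def holds_globally(n):
    """Check [] (wcet <= n) by exploring every reachable state."""
    stack, seen = [("Sta", 0, 0)], set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        if state[2] > n:       # counterexample found
            return False
        stack.extend(successors(state))
    return True


def find_wcet(lower, upper):
    """Binary search; the property must fail at lower and hold at upper."""
    while lower < upper - 1:
        middle = (lower + upper) // 2
        if holds_globally(middle):
            upper = middle
        else:
            lower = middle
    return upper


print(find_wcet(0, 1000))
```

For these costs the sketch prints 232: the entry costs 11 + 5 cycles, each of the ten iterations costs at most 3 + 7 + 7 + 4 = 21 cycles, and the exit costs 1 + 5 cycles. This value is the output of the sketch, not a result reported in this paper. Because wcet is part of the state, the number of explored states grows with the loop bound, which hints at the scalability concern discussed in this paper.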

Algorithm 1 Finding the WCET using binary search
    input: The model M of a model checker, initial value of N
    output: The optimal value found
    set the upper and lower bound of binary search
    while (lower bound < upper bound - 1)
        middle = (lower bound + upper bound) / 2;
        check the property [] φ(middle)
        if ([] φ(middle) is satisfied)
            upper bound = middle;
        else
            lower bound = middle;
    end while
    return upper bound

4.2 Modeling with SPIN

SPIN [2, 7] is an explicit state, on-the-fly model checker, which means that the system states are represented explicitly instead of using symbolic data structures, and the state space is constructed on-the-fly. Promela is the input language of SPIN, and only Linear Temporal Logic (LTL) is supported to specify properties.

Figure 4 illustrates the SPIN model for the motivating example, which is automatically generated from the BBA of the program. Integer variables lpc* are used as the loop counters, and wcet is introduced to record the time passed. proctype BBA() implements the BBA by simulating all the state transitions and performing the corresponding actions. Each block introduced by a label "Si:" represents a state of the BBA, and the statements wrapped in atomic{} implement the semantics of all possible transitions enabled in the current state. The init() proctype is mandatory in SPIN; it initializes the variables and starts all the other user-defined proctypes. The never claim in the example implements the LTL property [] φ_N, denoted by p in the SPIN model. If N ≥ actual WCET, [] φ_N evaluates to true and SPIN exits with error 0; otherwise, [] φ_N is violated, and SPIN writes the counterexample into the trace file and exits with error 1.

    int wcet, lpc1;

    proctype BBA() {
    Sta: atomic { wcet = wcet + 11; goto S0; }
    S0:  atomic { lpc1 = 0; wcet = wcet + 5; goto S1; }
    // S1 – S7 omitted due to limited space
    }

    init { atomic { wcet = 0; lpc1 = 0; run BBA(); } }

    #define p (wcet [...] -> goto accept_all;
        :: else -> goto T0_init;
        fi;
    accept_all:
        skip
    }

Figure 4. The SPIN model of the motivating example

4.3 Modeling with NuSMV

NuSMV [3, 14] is a reimplementation and extension of the original SMV model checker from Carnegie Mellon University. Unlike SPIN, NuSMV is a symbolic model checker that uses Binary Decision Diagrams (BDDs) to represent the state space symbolically. NuSMV permits multiple modeling styles, among them the direct specification of an FSM in terms of propositional formulas. In this paper, we use the latter style to model the BBA of a program.

Figure 5 is an automatically generated NuSMV model for the motivating example. The VAR section gives the declaration of variables. The variables in the NuSMV model for a BBA are similar to those of the SPIN model, but note that in NuSMV a value range must be assigned to each integer variable. The ASSIGN section specifies the initialization of the variables, and the following TRANS section declares the transition system of an FSM. For each state in the BBA, a (state=SSi)->next(...) structure is constructed to specify that "if the BBA is in state SSi, the values of the variables should change according to the next statements". All the state structures are connected by &. In each evaluation of TRANS, all the states are evaluated, and only the matching state is enabled and has its corresponding actions performed. The property specification language of NuSMV can be either Computation Tree Logic (CTL) or Linear Temporal Logic (LTL). In our model, we use the CTL formula AG (wcet