Two Control-flow Error Recovery Methods for Multithreaded Programs Running on Multi-core Processors

N. Khoshavi, H. R. Zarandi, M. Maghsoudloo

Abstract - In this paper, we present two software-based control-flow error (CFE) recovery methods with a rollback recovery mechanism for use in multithreaded architectures. Previous CFE recovery techniques disregard the interactions between threads, which makes them unsuitable for multithreaded architectures. Furthermore, their high memory and performance overheads may be problematic for real-time embedded systems with tight memory and performance budgets. Given the importance of handling CFEs and these shortcomings of earlier techniques, this paper presents two low-cost CFE recovery techniques: CFE Recovery using Data-flow graph Consideration (CRDC) and CFE Recovery using Macro block-level Checkpointing (CRMC). Each technique consists of two phases, control-flow error detection and control-flow error recovery. Both phases are implemented by inserting additional instructions into the program at compile time according to a dependency graph, which is extracted to model the control-flow and data dependencies among basic blocks and the interactions between the threads of a program. To evaluate the proposed techniques, five multithreaded benchmarks (Quick Sort, Matrix Multiplication, Bubble Sort, Linked List and Fast Fourier Transform) were run on a multi-core processor, and a total of 5000 transient faults were injected at several execution points of each program. The fault injection experiments show that the proposed techniques achieve tolerable performance and memory overheads with noticeable error recovery coverage.

N. Khoshavi, H. R. Zarandi and M. Maghsoudloo are with the Department of Computer Engineering and Information Technology, Amirkabir University of Technology, Tehran, Iran. E-mail: {navid.khoshavi, h_zarandi, m.maghsoudloo}@aut.ac.ir

I. INTRODUCTION

Recently, multi-core processors have been introduced as a viable way to sustain performance improvement rates within a given power budget [1]. Multithreaded programming boosts the performance of multi-core processors by extracting thread-level parallelism from the sequential program flow. When a sequential program is parallelized conventionally, the programmer or compiler needs to ensure that threads are free of data dependences. If data dependences do exist, threads must be carefully synchronized to ensure that no violations occur. Additionally, advances in CMOS technology have reduced transistor sizes and voltage levels, and this reduction has increased the sensitivity of microprocessors to transient faults. One of the major threats to modern microprocessors is transient faults induced by energetic particle strikes, such as high-energy neutrons from cosmic rays and alpha particles from decaying radioactive impurities in packaging and interconnect materials [2], [3]. It has been shown that a considerable fraction of transient faults, between 33% and 77%, manifest as control-flow errors, such as errors in the program counter (PC), address circuits, and steering and control logic [4]. A Control-flow Error (CFE) is said to have occurred if the processor executes an incorrect sequence of instructions [2].

Numerous software-based CFE detection techniques have been devised [2], [3], [5], [6], [8], [9]. In these approaches, the program code is first partitioned into basic blocks, and extra instructions are then added to each basic block to verify the flow of code execution. A basic block is a maximal sequence of ordered non-branching instructions, except possibly for the last instruction [2]. A unique signature is assigned to each basic block at design time. Signatures are also calculated at run-time and compared with the original ones. If the added instructions observe any mismatch, an error is detected and reported.

Unfortunately, only a few published works have concentrated on CFE correction [4], [7], [10]. Correcting the CFE alone is generally not sufficient, and the program may still fail because the CFE may have generated data errors [4]. Therefore, any data errors caused by a CFE should also be corrected during or after correction of the CFE. In addition, in multi-core systems, since all processors share a single view of data, a method that corrects CFEs and data errors should take into account the synchronization and communication dependencies between the threads of a multithreaded program. Previous CFE correction techniques disregard these thread interactions, which makes them unsuitable for multithreaded architectures.
Furthermore, the high memory and performance overheads of previous CFE recovery techniques may be problematic for real-time embedded systems, which have tight memory and performance budgets. Given the importance of handling CFEs, the unsuitability of conventional techniques for modern processors, and the high overheads of previous CFE recovery techniques, two backward error recovery techniques are proposed in this paper. The CRDC technique uses the control/data dependency graph to recover CFEs and data errors. In this technique, the signatures of the source and destination basic blocks are passed to the CFE_handler function as inputs, and control is transferred to the nearest basic block from which the variables modified between the source and the destination are re-initialized. To correct CFEs and data errors in the CRMC technique, we use distinct checkpoints according to the location of basic blocks in the dependency graph and the thread interaction instructions. Two new concepts, macro basic blocks and shadow variables, are introduced to decrease the overheads associated with checkpoint-based methods.

Simulation-based fault injection has been used to evaluate the recovery capability of the proposed techniques. Five modified multithreaded benchmarks have been used, and the GNU debugger, GDB [11], has been used to inject faults into the programs. The results show that the presented approaches achieve acceptable performance and memory overheads with noticeable error recovery coverage.

The structure of this paper is as follows: Section II introduces the control-flow error detection and recovery techniques. Section III presents the simulation environment and experimental results. Finally, Section IV concludes the paper.

II. CFE DETECTION AND RECOVERY

The CFE detection methods used in CRDC and CRMC are quite similar; the differences between them, shown in Fig. 1, arise only from the different types of recovery applied. After determining the control dependencies among the basic blocks of the program, each node of the dependency graph is labeled with a unique signature. The sequence of these signatures is checked at run-time by the instructions added at the end of each basic block. The checking instructions compare the value of the run-time signature with the pre-defined value assigned to each block at compile time.
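The end-of-block signature check described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the instruction sequence of Fig. 1: the block names, the signature values, the `LEGAL_PREDECESSORS` table and the functions `check_block_end` and `run` are all hypothetical, and the check is modeled as "the signature left by the previously completed block must belong to a legal predecessor of the current block".

```python
# Hypothetical compile-time data: one unique signature per basic block and
# the legal control-flow edges taken from the dependency graph.
SIGNATURE = {"B1": 0x11, "B2": 0x22, "B3": 0x33}
LEGAL_PREDECESSORS = {"B1": set(), "B2": {"B1"}, "B3": {"B2"}}


class ControlFlowError(Exception):
    """Raised when the run-time signature does not match a legal predecessor."""

    def __init__(self, source, destination):
        super().__init__(f"illegal transfer {source} -> {destination}")
        self.source = source
        self.destination = destination


def check_block_end(runtime_sig, block):
    """Model of the checking instructions appended to the end of `block`.

    `runtime_sig` is the signature of the previously completed block
    (None before the first block). Returns the updated run-time signature.
    """
    legal = {SIGNATURE[p] for p in LEGAL_PREDECESSORS[block]}
    entry_ok = (runtime_sig is None and not legal) or runtime_sig in legal
    if not entry_ok:
        # Map the stale signature back to the block that produced it.
        source = next((b for b, s in SIGNATURE.items() if s == runtime_sig), None)
        raise ControlFlowError(source, block)
    return SIGNATURE[block]


def run(trace):
    """Execute a sequence of block names, checking the signature after each one."""
    sig = None
    for block in trace:
        sig = check_block_end(sig, block)
    return sig
```

In this model, `run(["B1", "B2", "B3"])` completes normally, while `run(["B1", "B3"])` raises `ControlFlowError` carrying the source and destination blocks, which is the information a recovery handler would need.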
If an illegal jump occurs before the added instructions at the end of a basic block, so that control is transferred into the block illegally, the CFE is detected by comparing the value stored in SSj (the signature of the node) with the one calculated at compile time. Control is then transferred to the CFE_handler function of the corresponding CFE recovery method.

Figure 1. Illustration of Added Instructions for Methods: (a) CRDC, (b) CRMC

A. The Proposed CRDC Technique

When a CFE is detected by the added instructions in the CRDC technique, control is transferred to the CFE_handler function. This function is implemented by considering the DFG and CFG of the program at design time. The signatures of the source and destination basic blocks are given to the CFE_handler function as inputs. The function relocates control to the nearest block from which re-executing the program corrects the CFE, so that all of the affected variables between the source and the destination are re-initialized.

Fig. 2(a) shows three basic blocks from the set of basic blocks of a thread in a program, together with the DFG extracted from the data dependencies among the variables in these basic blocks. Fig. 2(b) illustrates the recovery process used by the proposed technique. If a CFE occurs in basic block 2 and control is transferred from basic block 2 to basic block 3 (step 1 in Fig. 2(b)), then the values stored in variables X and Z are no longer reliable. These variables are initialized in basic block 1 and basic block 2. To recover from the CFE and the data errors, control should be transferred to basic block 1 (step 3 in Fig. 2(b)). The modified variables are then re-initialized and their corresponding computations are re-executed. By re-executing the code from basic block 1, the first value stored in variable Z is re-loaded, and after basic block 1 completes, the first value of X is re-loaded in basic block 2.

Figure 2. (a) CFG and DFG Generated from Program Code, (b) Scheme of CRDC Method

B. The Proposed CRMC Technique

In CRMC, the shadow variables always contain the true values of the original ones, and they are used for recovering from data errors and CFEs. If the shadow variables were updated at the end of every basic block in which the corresponding original variables had been modified, noticeable performance and memory overheads would be imposed on the system. On the other hand, since thread interaction instructions such as synchronization or communication change some variables in different threads, these modifications must be considered in the proposed recovery technique. Therefore, the shadow variables are divided into two kinds:

1) Local shadows: The contents of the local shadows are chosen by the application programmer with respect to the information provided by the DFG of the program. To reduce the overheads of updating shadow variables, we define, for each thread, a macro block as a run of consecutively executed basic blocks that are free of thread interaction instructions. The local shadows are updated at the end of each macro block. The macro block size can be calculated by the following equation:

Macro block size = (1)

2) Global shadows: The places where the global shadows are updated should correspond to a consistent state of the application. A miniaturized snapshot of the entire system is saved in the global shadows and is used when global consistency is needed.

After the detection phase, control is transferred to the CFE_handler function of CRMC. At this time, the signatures of the source and destination basic blocks of each thread are already available in SSj and DSj, respectively. These two values are used to determine which shadows should be considered. If total consistency is needed, the program state is restored from the global shadows and execution resumes from that point. Otherwise, the function restores the affected original variables in the source and the destination from the local shadows.

III. EXPERIMENTAL RESULTS

To evaluate the proposed techniques, the GNU debugger GDB was used to inject transient faults into programs running on a multi-core system (CPU = i7-740QM, RAM = 6 GB, OS = Linux Ubuntu 10). We considered five multithreaded benchmarks (Quick Sort, Matrix Multiplication, Bubble Sort, Linked List and Fast Fourier Transform) executing on the multi-core processor system. About 5000 transient faults were injected at several points in the basic blocks of the programs. The fault models considered were [10]:

• Branch insertion: replacing a non-control instruction with a control instruction.
• Branch deletion: changing a control instruction to a NOP instruction.
• Branch target modification: modifying the target address of a control instruction.

Fig. 3(a) shows the CFE detection latency observed during fault injection, in terms of the number of instructions executed before the CFE is detected. The CFE detection latency of the proposed techniques is slightly higher than that of the other techniques because they use checking instructions only at the end of each basic block. Fig. 3(b) compares the CFE recovery latency of the different methods. The CFE recovery latency is the time between fault detection and the moment at which the CFE and the data errors caused by the injected fault have been recovered. Since the CRDC technique usually returns the program flow to a point several basic blocks before the location where the CFE occurred, the program takes more time to reach the point of the CFE again.

Table I compares the associated overheads and the error recovery coverage of the different methods. The error recovery coverage is calculated as the ratio of recovered faults to injected faults:

$$\text{Error recovery coverage} = \frac{\text{number of recovered faults}}{\text{number of injected faults}} \times 100 \qquad (2)$$

As shown in Table I, the CRDC and CRMC methods have high error recovery coverage for multithreaded programs compared to the other methods. This improvement is obtained thanks to the consideration of the multithread dependency graph and the new approach to CFE recovery. The BTMR technique [7] mainly concentrates on CFEs that occur on branch instructions. This limits its recovery ability because a large number of the instructions in a program are non-branch instructions. The memory and performance overheads of the proposed techniques are lower than those of the previous works ([7], [10]). The memory and performance overheads of ACCED [10] are comparatively higher than those of the proposed techniques because it adds duplicated instructions and executes the set of instructions used for comparing the results to obtain the correct output. Moreover, the memory and performance overheads of the proposed techniques increase only slightly as the number of running threads increases. This is due to the use of different checkpoint levels and the macro block concept in CRMC, and to the use of fewer recovery instructions in the CFE recovery phase of CRDC. According to Table I, the performance overhead of Quick Sort is higher than that of the other benchmarks for the different CFE recovery techniques. This may be because of its recursive structure and consecutive call and return instructions.
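As a rough illustration of the CRDC recovery step from Section II.A, the sketch below picks the block to restart from after an illegal jump. It is a toy model under stated assumptions, not the paper's CFE_handler: the tables `BLOCK_ORDER`, `SIGNATURE`, `INITIALIZES` and `MODIFIES` are invented so that they agree with the X/Z example in the text, the affected set is assumed to be the union of the variables written by the source and destination blocks, and the function name `cfe_handler_sketch` is hypothetical.

```python
# Hypothetical compile-time tables for one thread (invented for illustration,
# chosen to agree with the X/Z example in Section II.A).
BLOCK_ORDER = ["B1", "B2", "B3"]                      # program order
SIGNATURE = {"B1": 0x11, "B2": 0x22, "B3": 0x33}      # block signatures
INITIALIZES = {"B1": {"Z"}, "B2": {"X"}, "B3": set()}
MODIFIES = {"B1": {"Z"}, "B2": {"X", "Z"}, "B3": {"X"}}

BLOCK_OF = {sig: name for name, sig in SIGNATURE.items()}


def affected_variables(source, destination):
    """Variables treated as unreliable after an illegal source -> destination jump."""
    return MODIFIES[source] | MODIFIES[destination]


def cfe_handler_sketch(source_sig, destination_sig):
    """Return the block to restart from, given the two block signatures."""
    source = BLOCK_OF[source_sig]
    destination = BLOCK_OF[destination_sig]
    affected = affected_variables(source, destination)
    # Walk the blocks in program order: the first block that initializes an
    # affected variable is the earliest restart point, so re-executing from
    # it also re-runs every later initialization.
    for block in BLOCK_ORDER:
        if INITIALIZES[block] & affected:
            return block
    return source  # nothing to re-initialize: re-execute the source block
```

With these tables, `cfe_handler_sketch(0x22, 0x33)` (an illegal jump from block 2 to block 3) selects `"B1"`, matching the example in which control returns to basic block 1 so that Z and then X are re-loaded.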
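The local-shadow mechanism of Section II.B can be sketched in the same spirit. The class below is an assumption-laden illustration, not the authors' implementation: `MacroBlockCheckpointer` and its methods are hypothetical names, the macro block size is passed in as a plain count of basic blocks, and global shadows and thread interaction instructions are not modeled.

```python
import copy


class MacroBlockCheckpointer:
    """Toy model of local shadow variables updated once per macro block."""

    def __init__(self, variables, macro_block_size):
        self.variables = variables                      # live program variables
        self.macro_block_size = macro_block_size        # basic blocks per macro block
        self.local_shadow = copy.deepcopy(variables)    # last committed copy
        self.blocks_since_checkpoint = 0
        self.checkpoint_count = 0

    def end_of_basic_block(self):
        """Called after each basic block; commits only at macro block boundaries."""
        self.blocks_since_checkpoint += 1
        if self.blocks_since_checkpoint == self.macro_block_size:
            self.local_shadow = copy.deepcopy(self.variables)
            self.blocks_since_checkpoint = 0
            self.checkpoint_count += 1

    def recover(self):
        """Restore the live variables from the local shadows after a CFE."""
        self.variables.clear()
        self.variables.update(copy.deepcopy(self.local_shadow))
        self.blocks_since_checkpoint = 0


# Usage: four basic blocks form one macro block (the size 4 is illustrative).
cp = MacroBlockCheckpointer({"x": 0}, macro_block_size=4)
for i in range(1, 5):
    cp.variables["x"] = i
    cp.end_of_basic_block()      # the shadow is committed after the 4th block
cp.variables["x"] = 99           # a corrupted value written after a CFE
cp.recover()                     # roll back to the last macro block boundary
```

Committing once per macro block instead of once per basic block is what reduces the shadow-update overhead; the cost is that recovery rolls back to the last macro block boundary rather than to the last basic block.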

Figure 3. Comparison of (a) CFE detection latency (in number of instructions) and (b) CFE correction latency in different methods

IV. CONCLUSIONS

In this paper, two software-based CFE recovery techniques with tolerable memory and performance overheads and noticeable error recovery coverage for multithreaded programs have been proposed. To detect and recover from CFEs and data errors, the proposed techniques use a multithread dependency graph. After CFE detection, the program flow is transferred to the CFE_handler function. This function recovers from CFEs and data errors according to the type of CFE and the location of its source and destination basic blocks in the dependency graph. The results show that the memory and performance overheads of the proposed techniques are acceptable for most real-time embedded systems and that their error recovery coverage is considerable.

References

[1] D. Gizopoulos, M. Psarakis, S. V. Adve, P. Ramachandran, S. K. Hari, D. Sorin, A. Meixner, A. Biswas and X. Vera, “Architectures for Online Error Detection and Recovery in Multicore Processors,” Proceedings of Design, Automation and Test in Europe, 2011.
[2] N. Oh, P. Shirvani and E. J. McCluskey, “Control-Flow Checking by Software Signatures,” IEEE Transactions on Reliability, vol. 51, no. 2, pp. 111-122, 2002.
[3] O. Goloubeva, M. Rebaudengo, M. R. Sonza and M. Violante, “Soft-error Detection Using Control Flow Assertion,” 18th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, pp. 57-62, 2003.
[4] N. Khoshavi, H. R. Zarandi and M. Maghsoudloo, “Control-flow Error Recovery Using Commodity Multi-core Architecture Features,” 17th IEEE International On-Line Testing Symposium, pp. 190-191, 2011.
[5] R. Venkatasubramanian, J. P. Hayes and B. T. Murray, “Low-cost On-line Fault Detection Using Control Flow Assertions,” 9th IEEE International On-Line Testing Symposium, pp. 137-143, 2003.
[6] R. Vemu and J. A. Abraham, “CEDA: Control-flow Error Detection through Assertions,” 12th IEEE International On-Line Testing Symposium, July, pp. 151-158, 2006.
[7] N. F. Ghalaty, M. Fazeli, H. I. Radi and S. G. Miremadi, “Software-based Control-flow Error Detection and Correction Using Branch Triplication,” 17th IEEE International On-Line Testing Symposium, pp. 214-217, 2011.
[8] C. Bolchini, A. Miele, M. Rebaudengo, F. Salice, D. Sciuto, L. Sterpone and M. Violante, “Software and Hardware Techniques for SEU Detection in IP Processors,” Journal of Electronic Testing Theory and Application, vol. 24, no. 1-3, pp. 35-44, 2008.
[9] P. Bernardi, L. V. Bolzani, M. Rebaudengo, M. S. Reorda, F. Vargas and M. Violante, “On-line Detection of Control-Flow Errors in SoCs by means of an Infrastructure IP core,” 35th International Conference on Dependable Systems and Networks, pp. 50-58, 2005.
[10] R. Vemu, S. Gurumurthy and J. A. Abraham, “ACCE: Automatic Correction of Control-flow Errors,” IEEE International Test Conference, pp. 1-10, 2007.
[11] GNU debugger. http://www.gnu.org/software/gdb/.

TABLE I. Comparison of the memory and performance overheads and error recovery coverage

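| Category | Benchmark | ACCED [10] M.O.^a (%) | ACCED [10] P.O.^b (%) | ACCED [10] E.C.^c (%) | BTMR [7] M.O. (%) | BTMR [7] P.O. (%) | BTMR [7] E.C. (%) | CRDC M.O. (%) | CRDC P.O. (%) | CRDC E.C. (%) | CRMC M.B.S.^d (%) | CRMC M.O. (%) | CRMC P.O. (%) | CRMC E.C. (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dual-Threaded Programs | QS | 222.6 | 112.3 | 89.5 | 69.5 | 44.4 | 82.3 | 62.6 | 26.2 | 94.0 | 4 | 84.5 | 37.3 | 93.2 |
| Dual-Threaded Programs | MM | 219.2 | 101.0 | 91.3 | 58.2 | 33.7 | 83.8 | 45.9 | 23.0 | 94.3 | 5 | 67.8 | 34.1 | 94.0 |
| Dual-Threaded Programs | BS | 226.5 | 108.4 | 87.3 | 71.6 | 40.0 | 79.0 | 64.3 | 25.2 | 91.9 | 4 | 86.4 | 36.3 | 91.1 |
| Dual-Threaded Programs | LL | 228.0 | 104.4 | 84.9 | 74.4 | 39.1 | 80.5 | 65.5 | 24.9 | 92.2 | 4 | 87.6 | 35.7 | 92.8 |
| Dual-Threaded Programs | FF | 195.3 | 97.8 | 91.0 | 54.3 | 20.2 | 81.5 | 44.3 | 19.3 | 92.9 | 4 | 66.1 | 30.7 | 92.0 |
| Dual-Threaded Programs | Avg. | 218.3 | 104.7 | 88.8 | 65.6 | 35.2 | 81.4 | 56.5 | 23.7 | 93.0 | 4 | 78.4 | 34.8 | 92.6 |
| Quad-Threaded Programs | QS | 232.0 | 119.2 | 88.3 | 84.1 | 57.6 | 80.1 | 78.3 | 35.6 | 93.1 | 3 | 99.1 | 46.9 | 92.6 |
| Quad-Threaded Programs | MM | 231.7 | 117.5 | 90.6 | 71.5 | 38.3 | 81.3 | 57.2 | 26.5 | 93.2 | 4 | 78.7 | 37.5 | 92.6 |
| Quad-Threaded Programs | BS | 238.1 | 115.9 | 85.0 | 87.8 | 53.7 | 77.4 | 75.1 | 34.1 | 90.1 | 3 | 96.4 | 45.7 | 89.3 |
| Quad-Threaded Programs | LL | 242.6 | 113.8 | 83.1 | 90.7 | 51.5 | 78.7 | 73.7 | 33.7 | 91.3 | 3 | 94.2 | 44.1 | 91.9 |
| Quad-Threaded Programs | FF | 217.5 | 102.5 | 89.7 | 69.0 | 31.4 | 79.3 | 54.6 | 22.2 | 92.1 | 3 | 75.4 | 33.8 | 90.0 |
| Quad-Threaded Programs | Avg. | 232.3 | 113.7 | 87.3 | 80.6 | 46.5 | 79.3 | 67.7 | 30.4 | 91.9 | 3 | 88.7 | 41.6 | 91.2 |

a. Memory Overhead  b. Performance Overhead  c. Error recovery Coverage  d. Macro Block Size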