A Technique for Logic Fault Diagnosis of Interconnect Open Defects

Srikanth Venkataraman, Intel Corporation, Hillsboro, OR, [email protected]
Scott B. Drummonds, Intel Corporation, Santa Clara, CA, [email protected]

Abstract

A technique to perform logic diagnosis of defects that cause interconnects in a digital logic circuit to become open or highly resistive is presented. The novel features of this work include a diagnostic fault model that captures potential faulty behaviors in the presence of an open defect, and diagnosis algorithms that leverage the diagnostic model while circumventing the need for detailed circuit-level (SPICE) simulation and extraction of parasitic capacitance. Other aspects of the technique include a path-tracing procedure to limit the number of interconnects that need to be analyzed and extensions for multiple defects. Experimental results include simulation results on processor functional blocks and silicon results on a chipset from artificially induced defects and production fallout.

1. Introduction

Logic fault diagnosis, or fault isolation, is the process of analyzing the failing logic portions of an integrated circuit to isolate the cause of failure, so that the design or the fabrication process can be modified to avoid similar failures. Opens, or breaks in the conducting layers of an integrated circuit, are a common type of defect in advanced IC manufacturing processes. They are caused by missing conducting material, poorly formed vias or contacts, or reliability issues such as thermal stress or electromigration. This is especially true for current CMOS ICs with six or more metal interconnect layers. Further, vias and contacts are more susceptible to breaks. Thus, it is important to develop diagnostic procedures that address this defect mechanism.

While the problem of diagnosing bridging faults has been extensively investigated [1,2,3,4], no such procedures have been reported in the literature for the diagnosis of open defects. To the best of our knowledge, this work is the first reported result on the diagnosis of interconnect opens.

Opens can be classified into different categories depending upon their location within a circuit [5,6]. An open can occur within a cell, thereby affecting the connections to the drains and sources of the transistors within the cell [7], at the gate input of a transistor [8], or on the interconnect wiring connecting a set of logic gates to their drivers. In modern ICs, metal interconnects take up a significant area compared to the routing within a cell. Thus, there is a greater likelihood of a break on interconnect wiring than of a break within a cell. Therefore, the focus of this work is on the diagnosis of interconnect wiring opens.

Section 2 summarizes the issues with accurately simulating an interconnect open fault. Section 3 describes a diagnostic fault model that captures potential faulty behaviors in the presence of an open defect, and diagnosis algorithms that leverage the diagnostic model. A path-tracing procedure to limit the number of interconnects that need to be analyzed and extensions for multiple defects are explained in sections 4 and 5 respectively. Finally, experimental results are provided that include simulation results on processor functional blocks and silicon results on a chipset, covering both artificially induced defects and production fallout.

2. Interconnect open faults

In order to accurately simulate the effect of an open interconnect fault, the voltage of the floating interconnect wire needs to be determined under the application of any test vector. Konuk [5] described four factors that influence the voltage of an open interconnect wire, and they are summarized here: the total wiring capacitance from the floating wire to the supply, adjacent signals, and the substrate and wells; the transistor capacitances of the transistors in the cells driven by the floating wire; the trapped charge deposited on the wire during fabrication; and the die surface. The determination of wiring capacitance requires at least 2D capacitance extraction [5] and is a function of the exact location of the open in the floating interconnect. Each via in the interconnect is a potential location for an open. However, there is no easy way of predicting where an open may occur on other pieces of the interconnect. This makes accurate and efficient simulation of interconnect open faults a hard problem.

3. Interconnect open diagnostic fault model and algorithm

To circumvent the difficulties of accurate open fault simulation, a diagnostic fault model that captures potential faulty behaviors in the presence of an open defect and efficient diagnosis algorithms that leverage the diagnostic model are presented. The model and algorithm do not need detailed circuit-level (SPICE) simulation or extraction of parasitic capacitance, making them easy to integrate into standard gate-level logic test and simulation tools.

3.1. The net diagnostic model

Consider an open defect affecting an interconnect line as shown in Figure 1. The logic description of the interconnect is a net that has one stem (A) and three branches (B, C and D). Under the application of a test vector, the only logic errors that can be caused by an open on the interconnect are a 0/1 (good circuit logic value / faulty logic value) or a 1/0 error on one or more branches of the net. The logic value of a gate input is said to be controlling if it determines the gate’s output value regardless of other input values [11]. Other inputs of the gate are said to be non-controlling.

Definition 1: When the side inputs of the gates driven by the floating interconnect (inputs other than the branch inputs) are at non-controlling values, then the output of the gate is sensitized to the branch input. Such outputs are defined as sensitized branch outputs.

Figure 1. Logic errors from interconnect open (stem A; branches B, C and D; a 0/1 or 1/0 error on a subset of the branches propagates downstream)

For example, in Figure 1, the output of the AND gate driven by the branch D is sensitized to the branch D when the other side input of this gate has a non-controlling value of 1.

Observation 1: On the application of any test vector, due to an interconnect open, fault effects can propagate from a subset of the sensitized branch outputs.

Depending on the location of the open, only a subset of the branches may be disconnected from their driver. For example, in Figure 1, the open disconnects branches B and C from the driver driving stem A. On the application of any test vector, in the presence of an open interconnect, the voltage of the floating wire is determined by the factors summarized in section 2. This voltage may be interpreted as a faulty value by one or more of the sensitized branch outputs, and the faulty value propagates

downstream in the circuit. Alternatively, even if all driven gates are disconnected from the driver, the fault effect may propagate through only a subset of the sensitized branch outputs, depending on the voltage of the floating wire. For each vector simulated, all possible subsets of branches would need to be investigated to compute the potential erroneous responses that could be observed at the outputs. This would require considering $2^n - 1 = \binom{n}{n} + \binom{n}{n-1} + \binom{n}{n-2} + \dots + \binom{n}{1}$ possibilities for an n-fanout net. This may be large for large-fanout interconnects.

Figure 2. The net diagnostic model (0/1 errors at A, B and C give EO1, EO3 and EO5; 1/0 errors at A, B and C give EO2, EO4 and EO6; $EO = \bigcup_{i=1,\dots,6} EO_i$)

To address the above problem we propose a new diagnostic model, called the net diagnostic model, that captures potential errors that can be caused on the observation points (primary outputs, scan cells or observation registers) of a logic circuit in the presence of an open defect on an interconnect. Consider the logic net ABC shown in Figure 2. On each simulation cycle, the logic errors on the circuit outputs caused by a 0/1 error at the locations A, B and C are recorded in the erroneous observation (EO) sets EO1, EO3, and EO5 respectively. Similarly, the 1/0 errors at locations A, B and C are captured in sets EO2, EO4, and EO6 respectively. The diagnostic signature EO for the node A is then computed as the union of the sets EO1, EO2, EO3, EO4, EO5, and EO6.

The set EO captures possible erroneous outputs in the presence of an open fault on the net ABC. Note that not all outputs in the set EO will actually be faulty in the presence of an open on net ABC. Also, since multiple errors on branches are not simultaneously simulated, some errors that cause propagation through multiple reconvergent paths may not be captured. The set EO can be determined relatively easily by modifying standard logic and fault simulation techniques, without the need for parasitic capacitance information or circuit (transistor) level simulation. The approach used here is similar to the notion of a composite signature [1,3] used for bridging fault diagnosis. The idea behind the net model is to construct a superset of possible faulty behaviors when signal ABC is faulty.
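The construction of the signature EO can be illustrated with a small sketch. The netlist, signal names and helper functions below are hypothetical and are not taken from the tool described in this paper; the sketch only assumes a combinational circuit listed in topological order, with each branch modeled as a buffer so that it can be faulted independently of the stem.

```python
from itertools import product

# Hypothetical netlist in topological order: (signal, gate type, input signals).
# Stem A fans out to branches B and C, modeled as BUF gates.
NETLIST = [
    ("A", "AND", ("i0", "i1")),
    ("B", "BUF", ("A",)),
    ("C", "BUF", ("A",)),
    ("o0", "AND", ("B", "i2")),
    ("o1", "OR", ("C", "i3")),
]
PRIMARY_INPUTS = ("i0", "i1", "i2", "i3")
OBSERVATION_POINTS = ("o0", "o1")

GATE_FUNCS = {
    "AND": lambda vals: int(all(vals)),
    "OR": lambda vals: int(any(vals)),
    "BUF": lambda vals: vals[0],
}


def simulate(vector, forced=None):
    """Evaluate the netlist; `forced` maps a signal name to a forced logic value."""
    forced = forced or {}
    values = dict(zip(PRIMARY_INPUTS, vector))
    for out, gate, ins in NETLIST:
        val = GATE_FUNCS[gate]([values[s] for s in ins])
        values[out] = forced.get(out, val)
    return values


def erroneous_observations(vectors, location, faulty_value):
    """(vector index, observation point) tuples that fail when `location`
    carries `faulty_value` instead of its good value."""
    eo = set()
    for idx, vec in enumerate(vectors):
        good = simulate(vec)
        bad = simulate(vec, {location: faulty_value})
        eo |= {(idx, op) for op in OBSERVATION_POINTS if good[op] != bad[op]}
    return eo


def net_signature(vectors, stem, branches):
    """Union of the EO sets for 0/1 and 1/0 errors on the stem and each branch."""
    eo = set()
    for location in (stem, *branches):
        for faulty_value in (1, 0):  # 0/1 error, then 1/0 error
            eo |= erroneous_observations(vectors, location, faulty_value)
    return eo


vectors = list(product((0, 1), repeat=len(PRIMARY_INPUTS)))
EO = net_signature(vectors, "A", ("B", "C"))
```

For the two-branch net in this sketch, the signature needs six single-location error injections per vector (three locations, two error polarities), instead of an enumeration of every subset of branches.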

Figure 3. Improvement over stuck-at diagnosis (equivalence class of stuck-at faults: B s-a-0, C s-a-0, D s-a-1, E s-a-1, F s-a-1, G s-a-0)

Contrast this approach to a simple stuck-at diagnosis. Stuck-at faults can be simulated on the stem or branches of a net and only cause one type of binary error (a 0/1 or a 1/0 error) at the fault location. However, an open defect on a net can cause two types of binary errors (0/1 or 1/0) on a subset of branches. Thus, neither the stuck-at faults on the stem nor those on the branches would individually describe all observed failures. In the best case, a simple stuck-at diagnosis may identify either the stem faults, or one or more branch faults. However, the result would also include all other stuck-at faults that are in the same equivalence class as the identified fault. As will be shown in the experimental results, this grouping of equivalent faults may contain a large number of stuck-at faults, and inspecting the grouping by hand to find a combination of faults that explains all failing patterns is not feasible. This leaves the failure analysis engineer guessing where to restrict the physical inspection to locate the defect. For example, as shown in Figure 3, even if a stuck-at diagnosis identified the branch, it would also include five other equivalent faults.

A comprehensive diagnostic capability requires both high accuracy and high resolution. Since neither a stuck-at fault on the stem nor a stuck-at fault on the branches may explain all observed failures due to an open interconnect, a simple stuck-at diagnosis is not expected to be accurate. Further, even if a simple stuck-at diagnosis is accurate, it is not expected to have a high resolution due to fault equivalence. In contrast, it will be shown in the experimental results that the proposed net model has both high accuracy and high resolution.

The net model provides a mechanism to break fault equivalences. For example, consider an AND gate. The stuck-at 0 fault on the input is equivalent to the stuck-at 0 fault on the output. However, the stuck-at 1 fault on the input is not equivalent to the stuck-at 1 fault on the output and is distinguishable from it. Thus, errors on the outputs of the AND gate are distinguishable from errors on its inputs.
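The AND gate example can be checked exhaustively. The following is a minimal sketch with hypothetical function names; it compares the responses of a two-input AND gate over all four input combinations under each stuck-at fault.

```python
from itertools import product


def and_gate(a, b, fault=None):
    """Two-input AND gate with an optional stuck-at fault.

    `fault` is a (site, value) pair, where site is 'a', 'b' or 'out'.
    """
    if fault and fault[0] == "a":
        a = fault[1]
    if fault and fault[0] == "b":
        b = fault[1]
    out = a & b
    if fault and fault[0] == "out":
        out = fault[1]
    return out


def response(fault):
    """Gate output for every input combination under the given fault."""
    return tuple(and_gate(a, b, fault) for a, b in product((0, 1), repeat=2))


input_sa0 = response(("a", 0))
output_sa0 = response(("out", 0))
input_sa1 = response(("a", 1))
output_sa1 = response(("out", 1))

# Stuck-at 0 on the input and on the output give identical responses,
# while the two stuck-at 1 faults can be told apart.
print(input_sa0 == output_sa0, input_sa1 == output_sa1)
```

Because the net model simulates both error polarities at each location, the polarity that is not equivalent across the gate is what allows an input net to be separated from the output net.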

3.2. The diagnosis matching algorithm

A diagnostic fault under consideration is called a candidate. In addition to the diagnostic models, diagnostic algorithms that determine the likelihood of a candidate fault being associated with the defect are needed. This is performed by computing diagnostic counts. Figure 4 illustrates the concept. Consider the failing response observed on the tester (labeled as observed in the figure) and the simulated response of the candidate fault. The two sets of responses can be considered as sets of tuples, where each tuple is a pair consisting of the failing vector number and the failing observation point (scan latch, primary output or observe-only cell). The relationship between the two sets is captured as diagnostic counts. A tuple that fails during simulation but was not observed to fail on the tester is said to be a misprediction. A tuple that failed on the tester but did not fail in simulation is called a nonprediction. Failing tuples that agree between tester data and simulation data are called intersections [3].

The diagnostic counts are weighted based on the following argument. Since the candidate response is a superset of possible behavior, the matching between the observed and candidate responses looks for containment of the observed response within the candidate response [3]. Thus the intersection and nonprediction counts are weighted high, while the misprediction count is weighted low.

Figure 4. Set operations for diagnostic counts (candidate signature EO and observed failures EO’; regions: nonprediction, intersection, misprediction)

Two approaches to automated fault diagnosis are static techniques, which are based on a fault dictionary (a database of the stored responses of faults under the application of a test) [1,3,9], and dynamic techniques, which are simulation based [2,4,10]. The diagnostic model and algorithms that were described are applicable to both static and dynamic approaches. An optimization to enable dynamic diagnosis using a path-tracing procedure is described in the next section.
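The diagnostic counts in this section reduce to set operations on (failing vector, observation point) tuples, as sketched below. The scoring function and its weight values are assumptions made for illustration only: the text states that intersections and nonpredictions are weighted high and mispredictions low, but does not give a formula.

```python
def diagnostic_counts(candidate, observed):
    """Return (intersections, nonpredictions, mispredictions).

    Both arguments are sets of (failing vector number, observation point) tuples.
    """
    intersection = candidate & observed    # failed on tester and in simulation
    nonprediction = observed - candidate   # failed on tester only
    misprediction = candidate - observed   # failed in simulation only
    return len(intersection), len(nonprediction), len(misprediction)


def score(candidate, observed, w_int=10.0, w_non=10.0, w_mis=1.0):
    """Hypothetical ranking score: reward intersections, penalize nonpredictions
    heavily and mispredictions lightly (the weights are illustrative)."""
    n_int, n_non, n_mis = diagnostic_counts(candidate, observed)
    return w_int * n_int - w_non * n_non - w_mis * n_mis


observed = {(3, "o0"), (3, "o1"), (7, "o0")}
candidates = {
    "net_A": {(3, "o0"), (3, "o1"), (7, "o0"), (9, "o1")},  # superset of observed
    "net_B": {(3, "o0")},                                    # misses two failures
}
ranking = sorted(candidates, key=lambda c: score(candidates[c], observed), reverse=True)
```

In this example net_A contains every observed failure and mispredicts one tuple, so it ranks above net_B, which leaves two observed failures unexplained.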

4. Path-tracing to reduce candidate selection

The path-tracing procedure identifies logic nets in the circuit that are potentially associated with an open interconnect defect. It analyzes the logic values obtained from good circuit logic simulation, together with a list of failing observation points (primary outputs of the circuit or scan cells), to prune out defect locations. The procedure proceeds as follows [4,10] and is similar to the critical path tracing procedure [11] with some differences [4]. For each failing vector, trace back through controlling values from the failing observation point through the combinational logic and add those nets to the set of possible defect locations. Where there are no controlling values at a gate, all inputs must be traced and added to the potential defect location list. Where multiple inputs have controlling values, all of the inputs with controlling values must be traced. When a branch is reached, the stem is included.
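The back-trace rules above can be sketched for a single failing vector and a single failing observation point. The netlist representation and function names are hypothetical. The sketch assumes that fanout is implicit, meaning that a net feeding several gates is one named signal, so reaching any branch adds the stem net itself.

```python
# Controlling input value per gate type; gates absent from this table
# (for example NOT) have no controlling value.
CONTROLLING = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}

GATE_FUNCS = {
    "AND": lambda v: int(all(v)),
    "NAND": lambda v: int(not all(v)),
    "OR": lambda v: int(any(v)),
    "NOR": lambda v: int(not any(v)),
    "NOT": lambda v: 1 - v[0],
}

# Hypothetical netlist in topological order: net -> (gate type, input nets).
NETLIST = {
    "n1": ("AND", ("a", "b")),
    "n2": ("OR", ("c", "d")),
    "out": ("AND", ("n1", "n2")),
}


def simulate_good(netlist, inputs):
    """Good-circuit logic simulation; `inputs` maps primary inputs to 0 or 1."""
    values = dict(inputs)
    for net, (gate, ins) in netlist.items():
        values[net] = GATE_FUNCS[gate]([values[i] for i in ins])
    return values


def path_trace(netlist, good_values, failing_point):
    """Collect the nets reachable by tracing back from a failing observation point."""
    suspects, stack = set(), [failing_point]
    while stack:
        net = stack.pop()
        if net in suspects:
            continue
        suspects.add(net)
        if net not in netlist:  # primary input: nothing further to trace
            continue
        gate, ins = netlist[net]
        ctrl = CONTROLLING.get(gate)
        controlling_inputs = [i for i in ins if good_values[i] == ctrl]
        # Follow the controlling inputs if there are any, otherwise all inputs.
        stack.extend(controlling_inputs or ins)
    return suspects


good = simulate_good(NETLIST, {"a": 0, "b": 1, "c": 1, "d": 1})
single = path_trace(NETLIST, good, "out")

good2 = simulate_good(NETLIST, {"a": 1, "b": 1, "c": 0, "d": 1})
conservative = path_trace(NETLIST, good2, "out")
```

In the first vector each gate on the path has exactly one controlling input, so only out, n1 and a are kept. In the second vector the output gate has no controlling input, so both of its inputs are traced.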

4.1. Guaranteed potential fault location coverage

Observation 2: For any single-location or multiple-location open defect, the path-tracing procedure is guaranteed to include the location(s) of the faulty interconnect(s).

The above observation is explained with examples. When tracing from a gate’s failing output, there are three possible scenarios.

Figure 5. Single controlling value

Figure 5 demonstrates the case where the gate producing the failing output has exactly one input at its controlling value. In this case, the fault effect propagated through that input, and the path-tracing procedure follows it accordingly.

Figure 6. No controlling value

The path-tracing procedure is conservative in the case of Figure 6. Here, none of the failing gate’s inputs are at their controlling values. There is no way to know which path the fault effect propagated through; therefore, all inputs must be traced.

Figure 7. Multiple controlling values

The gate driving the failing output in the circuit in Figure 7 has multiple inputs at their controlling values. This situation can only occur when that gate is the point of reconvergent fanout from a fault at a single location, or when fault effects from multiple locations converge at the gate. In either case, tracing all controlling inputs guarantees reaching all the faulty locations.

5. Extensions for multiple faults

The following candidate set partition algorithm divides independent potential fault locations into separate candidate classes, with the added benefit of greatly reducing simulation time.

Definition 2: A Candidate Class is a subset of the candidate universe that matched the failing signature for one particular test vector.

Our notion of a candidate class is useful in reducing the simulated candidate list from the first failing pattern on. Once a candidate class is chosen from the first failing pattern, only the members of that class are simulated for the remainder of the test vectors. Using this technique, defects that closely match predicted behavior will be rapidly diagnosed.

Figure 8. Candidate set partitioning algorithm

Definition 3: A Candidate Cover exists when some number of candidate classes explains the entire set of failing patterns. A failing pattern is considered explained when one of the candidate class’s members matches the failing signature; that is, the intersection includes all observed failures for that test vector.

Through application of the algorithm in Figure 8, a set of candidates that explains all failing patterns is derived. The set of all candidates is partitioned into candidate classes, which enables the recognition of multiple independent faults. The algorithm divides the universe of possible open defects into classes of faults that behave similarly. Independent faults are left in separate groups in the class_list, and each class in the list contains independent defect locations.

Figure 9. Candidates simulated over patterns (rows: complete fault list, candidate class 1, candidate class 2; X denotes a failing pattern, O denotes a pattern explained by a class)

Figure 9 illustrates the speed-up that can be obtained using the candidate partitioning algorithm. Observe that the entire candidate list need only be simulated through the first failing pattern. At that point, the first candidate class is created from the candidates that have been observed to explain the first failing vector, and those candidates are simulated through the last failing test vector to see what other failures they can account for. In this case, there were two vectors that class 1 could not explain, and a second iteration of the algorithm obtained a second class that explained all remaining failures. Note that candidate classes tend to be a great deal smaller (multiple orders of magnitude) than the entire candidate universe. The fraction of patterns simulated using the entire fault list can be used to approximate the speed-up in simulation time. In the above case, the entire candidate list was simulated for only three out of twelve vectors (25% of the patterns), which corresponds to a speed-up of about 4x (400%). Additionally, two classes have been created, which recommends two separate defect locations to the FA engineer.

The entire diagnosis procedure is summarized as follows. Failures are collected from the defective chip using the tester. Good circuit simulation is performed on the gate-level circuit, and the path-tracing procedure reduces the potential defect location list. That list is simulated using the candidate set partition algorithm, which simulates only a small portion of all candidates for most test vectors. The end result is a list of candidates ranked by their ability to explain failing patterns. Independent defects are separated for the FA engineer to inspect.
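The partitioning loop described in this section can be sketched as follows. This is a reconstruction from the prose and Definitions 2 and 3, not a transcription of the algorithm in Figure 8. All names are hypothetical, the candidate signatures are assumed to be precomputed, and only failing patterns are considered when counting how often the complete candidate list is used.

```python
def partition_candidates(signatures, observed):
    """Partition candidates into classes that together cover the failing patterns.

    signatures: candidate -> {pattern number: set of failing observation points}
    observed:   failing pattern number -> set of failing observation points
    Returns (class_list, number of patterns on which the full list was used).
    """
    def explains(candidate, pattern):
        # A pattern is explained when every observed failure is contained
        # in the candidate's signature for that pattern.
        return observed[pattern] <= signatures[candidate].get(pattern, set())

    unexplained = sorted(observed)
    class_list, full_list_patterns = [], 0
    while unexplained:
        first = unexplained[0]
        full_list_patterns += 1  # the whole candidate universe is consulted here
        members = {c for c in signatures if explains(c, first)}
        if not members:
            unexplained.pop(0)   # no candidate explains this pattern; move on
            continue
        class_list.append(members)
        # Only class members are checked against the remaining failing patterns.
        unexplained = [p for p in unexplained
                       if not any(explains(c, p) for c in members)]
    return class_list, full_list_patterns


observed = {1: {"o0"}, 2: {"o0", "o1"}, 5: {"o2"}, 8: {"o2"}}
signatures = {
    "netA": {1: {"o0"}, 2: {"o0", "o1"}, 3: {"o1"}},
    "netB": {1: {"o0"}},
    "netC": {5: {"o2"}, 8: {"o2", "o3"}},
    "netD": {4: {"o3"}},
}
class_list, full_list_patterns = partition_candidates(signatures, observed)
```

In this example the first class (netA, netB) is formed at pattern 1 and accounts for patterns 1 and 2. A second pass over the complete list at pattern 5 yields a second class (netC) that covers the remaining failures, suggesting two independent defect locations.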

6. Experimental results

The above concepts were implemented in a diagnosis tool built on top of a gate-level sequential circuit differential fault simulator. Experiments have been performed on industrial circuits to evaluate the efficiency and correctness of the presented concepts.

Table 1. Simulation results from processor functional blocks

Circuit    Gate Count  Pattern Count  Avg. Sim. Time (s)
agadderc        13919          11064                  93
albindd           762          10350                  13
bbmid            3804          12550                 158
bttestd           397           9601                  23
fmdinald         5515          50652                 485
mold             3824          14722                 149
roipaddd         7906            378                  64
roptrd           1355          11598                 386

Table 1 shows the average diagnosis time on some functional blocks used in the Pentium II© microprocessor. Diagnostic runs were performed using single and multiple failures created by simulation. The table reports the average time to a correct diagnosis and shows the feasibility of applying the technique to real-life circuits. The pattern sets were created from functional tests used to test the blocks during design.

Table 2. 440BX diagnosis simulation time

Circuit  Gate Count  Pattern Count  Avg. Sim. Time (s)
440BX       235,572           2703                 393

Table 2 presents the average diagnosis time for diagnosing real silicon defects on the Intel® 440BX© chipset. The 440BX© chipset is tested using scan and has 2703 scan patterns in the test set. Five single net opens were injected using a focused ion beam (FIB), and the correct diagnosis was obtained in each case, as shown in Table 3.

Table 3. 440BX injected silicon defect experiments

Defect  Failing Patterns  Class Count  Top Candidates  Correct Diagnosis?
Open                 295            1               2  Yes
Open                  11            1               1  Yes
Open                 321            1               2  Yes
Open                  64            1               3  Yes
Open                 281            1               1  Yes

The technique has been employed on production defects. Table 4 shows a comparison with stuck-at diagnosis for open defects. Four chips were selected where stuck-at diagnosis followed by failure analysis identified the defect as an open. In the four cases shown, the net model produces a better resolution than a simple stuck-at diagnosis.

Table 4. Comparison with stuck-at diagnosis

              Simple Stuck-at              Net Diagnosis
Defect Found  #Nodes  % Fails Explained    #Nets  % Fails Explained
Via                6              100.0        1              100.0
Contact           10               96.8        1              100.0
Via                7               98.8        1              100.0
Break             28               85.7        1               85.7

7. Summary and conclusions

This work demonstrates the feasibility and applicability of performing automated failure analysis and fault isolation (FA/FI) of interconnect open defects in logic. A diagnostic fault model that captures potential faulty behaviors in the presence of an open defect on an interconnect, and diagnosis algorithms that leverage the diagnostic model, were presented. The technique circumvents the need for detailed circuit-level (SPICE) simulation and extraction of parasitic capacitances, and is easy to integrate into conventional test and simulation tools. Other aspects of the technique include a path-tracing procedure to limit the number of interconnects that need to be analyzed and extensions for multiple defects. The new techniques have been shown to work through extensive simulation on processor functional blocks and through diagnosis of real silicon, covering both production defects and artificially induced defects on a chipset product. The tool is being productized within Intel for wider application in high volume manufacturing.


Acknowledgements

The authors would like to thank Eric Thorne for his help in implementation, Bobby Feldhousen of Fab 15, Betty Buck and Mike DeVargas of Fab 11, Kam Komeyli of D2, Xiaoping Shao of STTD and Jose Pizano, Pete Johnson and Chris Wagner of PCG for their effort on the data collection and validation of diagnostics on the 440BX in Fab 15, Fab 11 and PCG.

References

[1] S. D. Millman, E. J. McCluskey, and J. M. Acken, “Diagnosing CMOS Bridging Faults with Stuck-at Fault Dictionaries,” in Proc. of the IEEE International Test Conference, pp. 860-870, Oct. 1990.
[2] S. Chakravarty and Y. Gong, “An Algorithm for Diagnosing Two-Line Bridging Faults in CMOS Combinational Circuits,” in Proc. of the Design Automation Conference, pp. 520-524, June 1993.
[3] D. B. Lavo, T. Larrabee, and B. Chess, “Beyond Byzantine Generals: Unexpected Behavior and Bridging Faults Diagnosis,” in Proc. of the IEEE International Test Conference, pp. 611-619, Oct. 1996.
[4] S. Venkataraman and W. Kent Fuchs, “A Deductive Technique for Diagnosis of Bridging Faults,” in Proc. of the IEEE/ACM International Conference on Computer-Aided Design, pp. 562-567, Nov. 1997.
[5] H. Konuk, “Fault Simulation of Interconnect Opens in Digital CMOS Circuits,” in Proc. of the IEEE International Test Conference, pp. 597-606, Nov. 1997.
[6] H. Konuk and F. J. Ferguson, “Oscillation and Sequential Behavior Caused by Interconnect Opens in Digital CMOS Circuits,” in Proc. of the IEEE/ACM International Conference on Computer-Aided Design, pp. 548-554, Nov. 1997.
[7] C. Di and J. A. G. Jess, “On Accurate Modeling and Efficient Simulation of CMOS Opens,” in Proc. of the IEEE International Test Conference, pp. 875-882, Oct. 1993.
[8] V. H. Champac, A. Rubio, and J. Figueras, “Electrical Model of the Floating Gate Defect in CMOS IC’s: Implications on IDDQ Testing,” in IEEE Transactions on Computer Aided Design, pp. 359-369, March 1994.
[9] V. Boppana, I. Hartanto, and W. K. Fuchs, “Full Fault Dictionary Storage Based on Labeled Tree Encoding,” in Proc. of the IEEE VLSI Test Symposium, pp. 174-179, Apr. 1996.
[10] S. Venkataraman, I. Hartanto, and W. K. Fuchs, “Dynamic Diagnosis of Sequential Circuits,” in Proc. of the IEEE VLSI Test Symposium, pp. 198-203, Apr. 1996.
[11] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital System Testing and Testable Design, AT&T Bell Laboratories and W. H. Freeman and Company, 1990.