IDDQ Trending as a Precursor to Semiconductor Failure

2008 INTERNATIONAL CONFERENCE ON PROGNOSTICS AND HEALTH MANAGEMENT

Guangfan Zhang1, Diganta Das2, Roger Xu1, and Michael Pecht1

Abstract— Airborne electronic systems are used virtually everywhere on board military and commercial aircraft. Since Field Effect Transistors (FETs) are building blocks for these electronic systems and their components, the diagnosis and prognosis of potential FET failures are critical to the flight and ground crew. In this paper, we develop a prognostic methodology for potential FET failures based on the Direct Drain Quiescent Current (IDDQ) testing technique. To predict the remaining useful life (RUL) of FET-based devices, we performed a thorough failure mechanism study for FETs in order to select a subset of failure mechanisms that cause progressive degradation and relate to IDDQ signals. With the selected failure mechanisms, we used a symbolic dynamics-based method to estimate the fault degradation status and a novel Uncertainty Adjusted Prognostics (UAP) method to predict the RUL with uncertainty management. Finally, the prognostic methodology was verified using 2-D and 3-D simulation models.

Index Terms— Field Effect Transistors, Direct Drain Quiescent Current, Prognostics, Remaining Useful Life

I. INTRODUCTION

The health status of metal oxide semiconductor field effect transistors (MOSFETs), including the current health condition, remaining useful life (RUL), and related prediction uncertainty, is required to assist in the operation of electronic systems. This information also helps in obtaining safety and cost benefits. To perform MOSFET health management, it is important to understand how and why a device fails. A failure mode is defined as what the user of the device will see, or what the device will exhibit, when it fails. There are many failure modes that are specific to MOSFET operation, in addition to those that apply to all semiconductor devices. The specific modes include those that arise within the MOSFET device itself and those that occur due to packaging. Many underlying mechanisms can contribute to the same failure mode.

1 Intelligent Automation, Inc., Rockville, MD 20855
2 Center for Advanced Life Cycle Engineering (CALCE), University of Maryland, College Park, MD 20742, USA ([email protected]).

9778-1-4244-1936-4/08/$25.00 © 2008 IEEE

A failure precursor is an event or series of events that is indicative of an impending failure [1]. Precursor parameters can be identified based on factors that are crucial for safety, that are likely to cause catastrophic failures, that are essential for mission success, or that can result in long downtimes. A systematic method for parameter selection is accomplished through the analysis of failure mechanisms [2].

Electrically, the MOSFET device can degrade over time. When this happens, the typical failure modes are an open circuit, a short circuit, or operation outside the range of application specifications. With opens and shorts, the output of the device changes severely and the result is an electrical failure. Operation outside the range of specifications is usually attributed to electrical parameter degradation. There are numerous modes associated with going out of specification, but most of them fall into the categories of loss of gate control, increased leakage current, or a change in the on-resistance (Rds(ON)).

Device failures can be broken down into four components: (1) the failure mode, (2) the cause of the failure mode, (3) the failure mechanism, and (4) the failure model. The complexity of the MOSFET device and its packaging gives rise to a number of possible mechanisms leading to failure. Table 1 gives examples of failure modes, causes, and mechanisms. Although it is not an exhaustive list, it serves as a useful guide to MOSFET failures.

Among the failure modes listed in Table 1, increased leakage current is frequently found in degraded FET-based electronics. In addition, environmental stresses, such as temperature, radiation, or age, can accelerate degradation or cause damage resulting in elevated leakage current [3][4][5][6]. There are two types of power supply current: quiescent current (IDDQ) and transient or dynamic current (Iddt). IDDQ is a leakage current drawn by a CMOS circuit in a stable (quiescent) state, while Iddt is a supply current produced during a transient period. Iddt testing has not been found practical in industry due to the difficulty of extracting useful information from fast-changing current responses. IDDQ testing techniques, on the other hand, are based upon the fact that defective circuits produce an abnormal, or at least significantly different, amount of current compared to fault-free circuits [7]. Therefore, IDDQ testing techniques have been widely adopted by semiconductor manufacturers as a method of detecting die-level failure mechanisms such as gate oxide shorts or breakdown, both of which are likely to become more important as device dimensions continue to shrink. IDDQ testing techniques have potential for electronics prognostics because an upswing in IDDQ is indicative of part degradation.

[Fig. 1 block labels: Temperature; Corrosion; Stresses; FET-based Electronics; IDDQ Testing; RUL Prediction]

Table 1. A non-exhaustive list of failure modes with causes and mechanisms

Failure Mode | Failure Cause | Failure Mechanisms
Electrical short | Temperature cycling, high speed operation, environment, defects/impurities | Intermetallic compound (IMC) formation, cracking/voiding, electromigration, single event gate rupture (SEGR), time dependent dielectric breakdown (TDDB)
Open circuit | Temperature cycling, high speed operation, flexing, environment | IMC formation, cracking/voiding, electromigration, conductor burn-out, wirebond lifting, thermal fatigue
Increased leakage current | Manufacture process, overvoltage, high current densities, device design | TDDB, hot electron
On-resistance change | Manufacture process, temperature cycling, high current densities | Cracks/voids, electromigration, thermal fatigue, IMC formation
Loss of gate control | Defects/impurities, high current densities, manufacture process | Breakdown, latch-up, single event breakdown (SEB), TDDB, electromigration, SEGR
Device burnout | Overvoltage, high current densities, manufacturing process | Latch-up, SEB, SEGR, low gate voltage device burnout

In this paper, we show that precursors to failure by different mechanisms can be related to IDDQ trends. We identified those failure mechanisms and related them to the corresponding IDDQ trends. We also used the symbolic dynamics method to estimate the fault degradation status and applied a novel Uncertainty Adjusted Prognostics (UAP) method to predict the RUL with uncertainty management. These prognostic methods address the critical failure mechanisms and match them with IDDQ signals, values, and trends.

This paper is organized as follows. Section II defines the prognostic architecture. Section III studies the IDDQ testing technique, including the failure mechanisms related to IDDQ, the failure models, and the testing theory. Section IV describes the prognostic methods. Section V explains the simulation model. Finally, Section VI illustrates the IDDQ-based prognostic framework with two examples, and Section VII contains some concluding remarks.

II. IDDQ TRENDING-BASED PROGNOSTIC ARCHITECTURE

We developed a novel prognostic methodology based on the IDDQ (Direct Drain Quiescent Current) testing technique for FET-based electronic systems and their components. The architecture is shown in Fig. 1.

[Fig. 1 block labels, continued: Feature Extraction/Selection; Hidden-Markov Model based Degradation Status Estimation (Symbolic Dynamics); Fault Size Estimation; Uncertainty Adjusted Prognostics (Prediction Adjustment/Uncertainty Management)]

Fig. 1. IDDQ trending-based prognostics

An increase in leakage current is a good indication of the degradation of FET-based electronic systems. This architecture is based on an IDDQ testing method that relies on the fact that defective circuits produce an abnormal, or at least significantly different, amount of current compared to fault-free circuits. An extensive library of IDDQ-related features has been built to enable accurate and reliable diagnosis and prognosis. This is necessary because decreasing semiconductor feature sizes (90 nm and below) and the corresponding increase in gate density are causing IDDQ levels to increase, making it more difficult to set absolute detection thresholds. Appropriate features must therefore be selected for prognosis, such as the in-circuit trend in IDDQ, the slope of the trend, and IDDQ ratios.

In this architecture, symbolic dynamics [8][9] is employed for anomaly detection and fault size estimation. Symbolic dynamics describes the dynamics of the system quantitatively (e.g., by coarse graining) in terms of symbol sequences [10][11]. As a result of this process, fault evolution trending information can be obtained.

A novel prognostic algorithm, Uncertainty Adjusted Prognostics (UAP) [17], has been implemented in this paper. UAP makes predictions based on the trending information from carefully selected fault signatures and integrates a unique uncertainty management method. UAP considers different types of uncertainty, such as model uncertainty, parameter uncertainty, threshold uncertainty, and health status uncertainty. By considering these, the prognostic method is able to reduce prediction uncertainty and improve prediction performance.
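The trend-based features named in this section (the slope of the IDDQ trend and IDDQ ratios) can be illustrated with a minimal Python sketch. It is not the feature library described in the paper: the function name `iddq_trend_features`, the least-squares slope, the choice of the first `baseline_count` readings as the healthy baseline, and the sample values are all assumptions made for illustration.

```python
from statistics import mean


def iddq_trend_features(times, iddq, baseline_count=3):
    """Compute simple trend features from an IDDQ time series.

    times, iddq    : equal-length sequences (e.g. hours, microamps)
    baseline_count : number of initial readings treated as the healthy
                     baseline (a hypothetical choice, not from the paper)
    """
    if len(times) != len(iddq) or len(times) < 2:
        raise ValueError("need two or more paired samples")

    # Ordinary least-squares slope of IDDQ versus time.
    t_mean, i_mean = mean(times), mean(iddq)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (i - i_mean) for t, i in zip(times, iddq))
    slope = sxy / sxx

    # Ratio of the latest reading to the early-life baseline, which does not
    # depend on an absolute detection threshold.
    baseline = mean(iddq[:baseline_count])
    ratio = iddq[-1] / baseline

    return {"slope": slope, "baseline": baseline, "ratio": ratio}


if __name__ == "__main__":
    hours = [0, 1, 2, 3, 4]
    current_ua = [2.0, 2.5, 3.0, 3.5, 4.0]  # made-up readings
    print(iddq_trend_features(hours, current_ua, baseline_count=1))
```

A relative feature such as the ratio is one way to sidestep the absolute-threshold problem mentioned above, since the baseline is taken from the same device.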

III. IDDQ RELATED TO FAILURE MECHANISMS

Different failure mechanisms affect MOSFETs. Table 2 lists failure mechanisms that can affect a MOSFET device.

Table 2. Failure Mechanisms for MOSFETs

Static latchup | Single Event Breakdown (SEB) | Intermetallics formation
Dynamic latchup | Single Event Gate Rupture (SEGR) | Thermo-mechanical stresses
Time Dependent Dielectric Breakdown (TDDB) | Electrostatic Discharge (ESD) | Electromigration
Conductor burn-out | Exceeding drain-source breakdown voltage | Low gate voltage device burnout
Latchup | Gate-source breakdown voltage | Hot carrier effects

Of these mechanisms, only those that can affect the current at the die level are of interest in merging the results with IDDQ. Among these, only the mechanisms that cause progressive degradation, rather than overstress due to a single surge event, are useful in prognostics. Under these constraints, only TDDB and hot carrier effects are of interest for evaluation with IDDQ. For proper prognosis, however, it is not useful to deal simply with times to failure. We need to be able to predict failure based on recorded observations and not simply make an a priori calculation of time to failure. For that reason, we need to assess how these mechanisms affect the currents in the transistor and what the failure models are for TDDB and hot carrier effects.

1) TDDB Model

Time-dependent dielectric breakdown (TDDB) is the gradual change in a transistor's gate control behavior. These changes can include leakage currents and voltage shifts. We will examine some of the physics behind the individual gate-modifying effects, as well as aggregate statistical models for MOSFET time to failure due to TDDB.

A. TDDB Physical Effects Models

First, we will consider the gate voltage shift due to fixed charges distributed (in one dimension) in the gate oxide (in this case accumulated through defects over time):

\[ \Delta V_G\,(\text{oxide charges}) \approx -\frac{1}{\varepsilon_{ox}} \int x \cdot \rho_{ox}(x) \cdot dx \tag{1} \]

where $\varepsilon_{ox}$ is the dielectric permittivity of the gate oxide, $\rho_{ox}$ is a one-dimensional approximate charge distribution in the gate, and $x$ is the distance from the gate electrode to the silicon. Thus, positive charges will result in a reduction in the apparent gate voltage, while negative charges will increase it. For certain charges that only build up at the oxide-semiconductor interface, the equation simplifies to:

\[ \Delta V_G\,(\text{interface charges}) \approx -\frac{Q_{ic}}{C_{ox}} \tag{2} \]

where $Q_{ic}$ is the total charge present, and $C_{ox}$ is the gate capacitance. Note that $\Delta V_G$ can also be viewed as an equivalent fixed change in the turn-on threshold voltage ($V_{th}$).
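As a numerical check of how (1) reduces to (2), the following Python sketch evaluates both expressions per unit gate area for a thin charge layer next to the oxide-semiconductor interface. The oxide thickness, the charge density, the layer width, and the function names are illustrative assumptions; only the SiO2 relative permittivity (3.9) and the vacuum permittivity are standard values.

```python
EPS0 = 8.854e-14          # vacuum permittivity, F/cm
EPS_OX = 3.9 * EPS0       # SiO2 permittivity, F/cm


def delta_vg_distributed(rho, t_ox, n=1000):
    """Eq. (1): -(1/eps_ox) * integral of x*rho(x) dx over 0..t_ox.

    rho  : charge density in C/cm^3 as a function of x (cm from the gate)
    t_ox : oxide thickness in cm; midpoint rule with n cells
    """
    dx = t_ox / n
    integral = 0.0
    for k in range(n):
        x = (k + 0.5) * dx
        integral += x * rho(x) * dx
    return -integral / EPS_OX


def delta_vg_interface(q_ic, t_ox):
    """Eq. (2) per unit area: -Q_ic / C_ox with C_ox = eps_ox / t_ox."""
    c_ox = EPS_OX / t_ox
    return -q_ic / c_ox


if __name__ == "__main__":
    t_ox = 5e-7                      # 5 nm oxide (assumed)
    sigma = 1e11 * 1.602e-19         # 1e11 charges/cm^2 (assumed), in C/cm^2
    width = t_ox / 100               # charge layer hugging the interface

    def thin_layer(x):
        # Uniform layer of total sheet charge sigma within `width` of the interface.
        return sigma / width if x >= t_ox - width else 0.0

    print(delta_vg_distributed(thin_layer, t_ox))   # close to the value below
    print(delta_vg_interface(sigma, t_ox))          # roughly -23 mV
```

Under these assumptions the two results differ only by the half-width of the layer, which is the sense in which (2) is the interface limit of (1).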

B. TDDB Time-to-Failure Models

There are two different models for describing TDDB time-to-failure (TF). The first model, denoted as the E-model (proportional to the electric field), is based on a dipolar field lowering of the activation energy required for thermal bond breakage. The E-model is [12]:

\[ TF = A_0 \exp\left[\frac{\Delta H_0 - a \cdot E_{ox}}{kT}\right] \tag{3} \]

where $A_0$ is a constant determined experimentally, $\Delta H_0$ is the activation energy associated with the E-model, $a$ is the effective dipole moment (~13 eÅ), $E_{ox}$ is the electric field across the dielectric layer (the dielectric in the case of a silicon power MOSFET is silicon dioxide), $k$ is Boltzmann's constant, and $T$ is temperature. The second model, denoted as the 1/E-model (proportional to the inverse of the electric field), is based on current-induced hole injection into the oxide (Fowler-Nordheim conduction). The time to failure using this model is shown in (4) [12].

\[ TF = t_0 \exp\left[\frac{G}{E}\right] \exp\left[\frac{Q}{kT}\right] \tag{4} \]

where $t_0$ is a constant determined empirically ($1\times10^{-11}$ s for the Si-O system), $G$ is the field acceleration factor (350 MV/cm for the Si-O system), $E$ is the electric field across the dielectric layer (typically oxide), and $Q$ is the activation energy associated with the 1/E-model. McPherson et al. showed that both models predict the time to failure well for power devices with electric fields above 7 MV/cm. Below 7 MV/cm, however, where typical MOSFETs operate, the E-model has better predictive power than the 1/E-model.

2) Hot Carrier Model

Many models exist today to predict the lifetime of MOSFETs that exhibit the hot electron effect. These models are typically based on the specific MOSFET topology, operating range, and environment. A more recent model, based on the reduction of the inversion layer mobility due to the generation of interface states, was developed by Chung et al. [13][12]. This empirical model is of the form in (5).

\[ \tau \times \left(\frac{I_{Drain}}{W_{eff}}\right) = A \left(\frac{I_{Sub}}{I_{Drain}}\right)^{-3}, \qquad A \approx \left(\frac{\Delta I_D / I_{DO}}{K\,(1/L_{eff})\,C}\right)^{1/n} \tag{5} \]

where τ is the measured device lifetime and ΔID/IDO is the amount of linear-current degradation, which ultimately determines the failure. The quantities C (which depends on the type of gate oxide, the quality of the Si-SiO2 interface, and the exact structure of the gate-drain overlap region) and n can be determined from the accelerated stressing of MOSFETs. IDO is the drain current of an unstressed device, ΔID is the difference in drain current between the stressed and unstressed devices, IDrain is the current of the stressed device, Leff is the effective gate length, and K is a set of measurable parameters. This model allows the hot-electron lifetime of devices with different Leff and Tox (gate oxide thickness) values to be predicted using a limited sample size.

Numerous simple and complex models have been developed to predict the effects of hot electrons. They vary on the basis of the device topology, materials, defects, operating range, and environment.

We observe that some operational parameters (e.g., electric field, gate current) and some environmental parameters (e.g., temperature) affect failure mechanisms. The closed-form equations used to estimate times to failure typically work with single, and often worst-case, values of these parameters. Prognostics that can monitor the actual values of those parameters during the operation of a FET will allow us to better estimate and continually update the expected time to failure.

IV. PROGNOSTIC METHODS

A. Symbolic Dynamics-based Fault Size Estimation

Before performing RUL prediction, the fault degradation status needs to be estimated. We used Symbolic Time Series Analysis (STSA) [14][15] to estimate the fault size based on the collected time series data. Model-based methods may not always be feasible due to unknown parametric or non-parametric uncertainties and noise. A convenient way of learning the dynamical behavior is to rely on the additional information provided by the time series data.
As shown in Fig. 2 [14], bursts of data are collected. The fast sampling during a burst captures the fast system dynamics, while comparison of data from burst to burst captures slowly developing anomaly dynamics. The tool for describing the behavior of nonlinear dynamical systems is based on the concept of formal languages for transitions from smooth dynamics to a discrete symbolic description. The phase space of the dynamical system is partitioned into a finite number of cells so as to obtain a coordinate grid of the space. A compact (i.e., closed and bounded) region Ω in the phase space is identified, within which the stationary part of the phase trajectories can be circumscribed. Encoding of Ω is accomplished by introducing a partition B ≡ {B0, …, Bm−1} consisting of m mutually exclusive and exhaustive cells. The dynamical system describes an orbit O ≡ {x0, x1, …, xn, …} in Ω, which passes through or touches the cells of the partition B.
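The partitioning step can be sketched in Python for the simplest case of a scalar signal. This is an illustration under stated assumptions, not the STSA implementation of [14]: the uniform partition of an interval, the clamping of out-of-range samples, and the names `symbolize`, `symbol_probabilities`, and `l1_distance` are all hypothetical choices, and the L1 distance is only one possible way to compare bursts.

```python
def symbolize(series, num_cells, lower, upper):
    """Map each sample to the index j of the cell B_j that contains it.

    The region [lower, upper] is split into num_cells equal-width cells
    (an assumed uniform partition); samples outside it are clamped to the
    nearest cell.
    """
    width = (upper - lower) / num_cells
    symbols = []
    for x in series:
        j = int((x - lower) / width)
        symbols.append(min(max(j, 0), num_cells - 1))
    return symbols


def symbol_probabilities(symbols, num_cells):
    """Fraction of samples falling in each cell of the partition."""
    counts = [0] * num_cells
    for s in symbols:
        counts[s] += 1
    return [c / len(symbols) for c in counts]


def l1_distance(p, q):
    """Illustrative burst-to-burst comparison of two probability vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    nominal_burst = [0.0, 0.1, 0.5, 0.9, 1.0]   # made-up samples
    later_burst = [0.6, 0.7, 0.8, 0.9, 1.0]     # made-up samples
    p0 = symbol_probabilities(symbolize(nominal_burst, 4, 0.0, 1.0), 4)
    p1 = symbol_probabilities(symbolize(later_burst, 4, 0.0, 1.0), 4)
    print(p0, p1, l1_distance(p0, p1))
```

Comparing the cell-occupancy vectors of a nominal burst and a later burst gives a scalar that grows as the slow anomaly dynamics shift the orbit to other cells.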

Fig. 2. Continuous dynamics to symbolic dynamics [14][16]

With the partition defined, the time-series data can be converted to a symbol sequence and used to construct a finite-state machine. We used the D-Markov machine introduced by Ray [16] to identify patterns in the time series data and generate the fault-size estimate, based on the assumption that the symbolic process can be represented to a desired level of accuracy as a Dth-order Markov chain.

B. Uncertainty Adjusted Prognostics

Because of the uncertainty involved, the prediction of RUL is the Achilles' heel of the Prognostics and Health Management (PHM) community. Many prognostic algorithms have been developed, including model-based, data-driven, and life-consumption-based approaches. However, very few have been successfully implemented in real systems. One of the most important reasons is a lack of understanding of the uncertainties in prognostics. To address these uncertainties, we implemented a novel approach, called Uncertainty Adjusted Prognostics (UAP) [17], which deals with model uncertainty, parameter uncertainty, threshold uncertainty, and health status uncertainty. By integrating the different types of uncertainty and using extrapolation techniques, the UAP algorithm performs RUL prediction; its output is a baseline prediction of fault evolution and a distribution of the predicted time of failure.

V. SIMULATION MODELS

The defects created in the gate oxide can be due to several mechanisms, including trap creation, anode hole injection, and impact ionization. All these mechanisms can cause electron traps, fast and slow interface states, and generation/recombination centers. In our simulation, we did not distinguish between the different mechanisms or the different types of defects.
We selected a simulation method that aggregates defect generation as being proportional to the total charge that has passed through the oxide in the form of gate current. Typically, the time to failure by TDDB is estimated as the time at which the number of defects reaches a “critical” value. However, it is well understood that the defects are distributed in the oxide in a “random” manner, and only when a path is created between the channel and the gate is there a sudden change in the overall conductivity of the oxide that leads to changes in gate current. We have assumed in this simulation that the gate current is a contributor to IDDQ.

We built the simulation models in both 2-D and 3-D. Since the 3-D simulation model is conceptually similar to the 2-D simulation model, we use the 2-D model to explain the simulation modeling process and present the results with the 3-D model. The two-dimensional model can be represented by a maze structure representing the oxide, which grows from a structure like the one at the top of Fig. 3 to the structure at the bottom. Each completed path is a path for gate current, which is also measured as IDDQ on the semiconductor.
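The idea of randomly placed defects eventually forming a gate-to-channel path can be illustrated with a small percolation sketch in Python. It is not the authors' simulation model: the rectangular grid, the one-defect-per-step generation rule (standing in for a fixed increment of charge through the oxide), the 4-neighbor connectivity, and the function names are assumptions made for illustration.

```python
import random
from collections import deque


def has_path(defects, rows, cols):
    """Breadth-first search for a chain of 4-connected defective cells
    linking row 0 (gate side) to row rows-1 (channel side)."""
    frontier = deque((0, c) for c in range(cols) if (0, c) in defects)
    seen = set(frontier)
    while frontier:
        r, c = frontier.popleft()
        if r == rows - 1:
            return True
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            cell = (nr, nc)
            if (0 <= nr < rows and 0 <= nc < cols
                    and cell in defects and cell not in seen):
                seen.add(cell)
                frontier.append(cell)
    return False


def steps_to_first_path(rows, cols, seed=0):
    """Add one random defect per step and return the step at which the
    first complete gate-to-channel path appears."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    rng.shuffle(cells)              # random order of defect creation
    defects = set()
    for step, cell in enumerate(cells, start=1):
        defects.add(cell)
        if has_path(defects, rows, cols):
            return step
    return None                     # not reached: a full grid always conducts


if __name__ == "__main__":
    # Distribution of steps-to-first-path over several random oxides.
    samples = [steps_to_first_path(5, 20, seed=s) for s in range(10)]
    print(samples)
```

Repeating the run over many seeds gives an empirical distribution of the step count, which is the kind of statistic the simulation in this section is used to study.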

There are differences and similarities between soft breakdown (SBD) and hard breakdown (HBD). Since a thick oxide becomes effectively thin as the number of defects increases, it is worthwhile to make some of the failures soft, based on conditions at the time of occurrence. This may lead to SBDs acting as precursors of HBD. There are three possible cases:

1. The first SBD is a precursor for the final HBD
   - HBD occurs at the same location as SBD
2. SBD and HBD are competing events with different origins
   - Different SBD and HBD statistics for 1st events
   - Different voltage acceleration and activation energy
3. SBD and HBD events have a common failure mechanism
   - SBD and HBD statistic distributions coincide, indicating the same defect generation and breakdown triggering process
   - SBD and HBD manifest themselves differently depending on breakdown spot size

In this simulation, we show that, depending on the stochastic nature of the path creation, all possible situations of soft breakdown (SBD, which we consider to be a precursor) and hard breakdown (HBD, the final failure) can occur. We have shown that a simple two-dimensional simulation can reveal the interrelation between the breakdowns. The goal of our simulation was to determine the number of steps it takes to get from the first soft breakdown to the hard breakdown and to determine the statistical properties of that time gap. Once the concept was proven, we assumed a three-dimensional structure in which path initiation and completion happen in a rectangular parallelepiped.

Fig. 3. Growth of defect structure [18]

We have evaluated all the inputs that go into the estimated Fowler-Nordheim current. For the sake of simplicity, and to make the case for precursor-based prediction of times to failure, we have not included the effects of variations in flat band voltage, the surface potential of the well, or the voltage drop on the polysilicon on the electric field across the oxide. At this stage, we have also kept the simulation two-dimensional and assumed that all the paths within the oxide are planar. In the future, we will evaluate the effects of three dimensions on path creation. However, since both defect creation and path creation are considered in the same plane, we estimate that this simplification will not introduce a major error.

Hard breakdown (HBD) is a low-ohmic post-breakdown path created by thermal runaway after a percolation path is completed. Soft breakdown (SBD) generally occurs for thin oxides