
IEEE TRANS. ELECTRON DEVICES, VOL. 50, NO. 5, MAY 2003

A Novel Algorithm for High-Throughput Programming of Multilevel Flash Memories

Marco Grossi, Massimo Lanzoni, and Bruno Riccò, Senior Member, IEEE

Abstract—This paper presents a new method to program multilevel (ML) Flash memories that combines ramped-gate programming with minimum verification of the sense transistor threshold voltage, in order to achieve high program throughput, i.e., a high number of bits programmed per second. The method is studied by means of extensive measurements on production-quality test chips and is found to allow a program throughput about three times as large as the state of the art presented in the literature. Furthermore, it is found adequate for 3-bit-per-cell multilevel schemes, while for the extension to the 4-bit-per-cell case the use of error correcting codes cannot be avoided.

Index Terms—Flash, memories, multilevel, programming.

I. INTRODUCTION

The ever-increasing demand for nonvolatile memories, in particular for mass-storage applications, has led to increased interest in multilevel (ML) storage techniques [1], which allow a significant reduction in cost per bit for the same cell dimension. As is well known, however, the ML approach is more critical than the conventional one (1 bit per cell) in terms of charge retention on the cell floating gate (FG), read and write disturbs, and sensing accuracy. This is particularly the case for accurate programming, i.e., the placement of the right amount of charge on the cell FG. Such placement must produce tight distributions, with 2^n mean values (where n hereafter denotes the number of bits per cell), of the sense transistor threshold voltage (V_T). These distributions must fit within a total voltage window (i.e., the separation between the highest and the lowest value of V_T) that tends to shrink with new technologies aimed at low-voltage operation. Accurate charge placement is normally obtained by means of program and verify (P&V) algorithms featuring a sequence of small program steps, each followed by a read operation to determine whether or not further programming is needed. This approach obviously leads to the required accuracy, provided that each individual program step is small enough (in practice, its effect on V_T must be smaller than the target width of the V_T distributions). On the other hand, precision is heavily paid for in terms of program time, hence program throughput (PT), i.e., the number of bits that can be programmed per second, since the number of P&V steps increases with decreasing V_T distribution widths.

Manuscript received October 29, 2002; revised January 13, 2003. The review of this paper was arranged by Editor S.-I. Kimura. The authors are with D.E.I.S., University of Bologna, 40136 Bologna, Italy. Digital Object Identifier 10.1109/TED.2003.813455

This, of course, is particularly true for increasing values of n (3, 4, ...), since the width of the V_T distributions must decrease essentially as 2^-n (for the same total voltage window). In the literature, few papers discuss programming of 2-bit-per-cell Flash memories featuring either NOR [2], [3] or NAND [4] architecture. In particular, [2] presents a 4-level NOR Flash memory achieving a PT of 0.17 MB/s, namely 0.68 Mcells/s programmed with a degree of parallelism (DOP) of 64, hence resulting in a program time of 95 µs for 64 cells. In this case, the program voltage on the cell control gate (CG) is increased by a fixed amount ΔV_CG at each P&V step. For n = 3 and 4, and for the same total voltage window, ΔV_CG should become approximately 150 and 75 mV, respectively, in order for all the V_T distributions to be correctly separated from one another. With a correspondingly reduced ΔV_CG, the number of P&V steps should become three times higher. Considering also the delay inherent in switching back and forth between program and read operations, this leads to (at least) a four-times increase in program time. In particular, with a program time at cell level of 400 µs (320 µs of which are dedicated to verify operations) and DOP = 256, the PT can be estimated to remain about the same as in the 2-bit-per-cell case (0.64 × 10^6 cells/s, i.e., 0.64 Mcells/s). More generally, the increase in the number of levels, hence in that of program steps, increases the fraction of program time dedicated to verification (hereafter denoted verify time). In the conventional 1-bit-per-cell case, such a fraction is about 20%, while for n = 4 (16 levels) it becomes 80%. In order to limit the negative effects of verify time on PT, [5] introduced a self-converging program technique based on the use of a ramped voltage on the cell CG that (in principle) does not require verification.
We have checked the actual capabilities of the method of [5] by means of extensive measurements on blocks of 512 × 10^3 cells (512 Kcells) on our test chips. We have verified that, in practice, it can achieve V_T distribution widths of about 500 mV, provided that suitably low values of the slope of the ramped voltage (0.04 V/µs) are used. Furthermore, in the case of ML programming, not only small distribution widths but also correct placement of the distribution mean values must be guaranteed. From this point of view, our experiments have shown a difference (displacement) of about 50 mV between the distribution central value and the target (see Fig. 8). This error is substantially worsened by program-erase cycling, because the program efficiency decreases with the number of cycles.
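The throughput figures quoted above for [2] follow from simple arithmetic, which the short Python sketch below reproduces. It also includes a toy P&V loop that illustrates why the number of program/verify pairs grows as the CG step shrinks. The idealized cell model (each pulse shifts V_T by exactly the CG step), the function names, and the reading of MB as 10^6 bytes are our assumptions, made only for illustration.

```python
def pv_steps(vt_start, vt_target, dv_cg):
    """Toy program-and-verify loop.

    Idealized assumption: every program pulse raises V_T by exactly dv_cg
    (the CG voltage increment per step), and a verify read follows every pulse.
    Returns the number of program/verify pairs needed to reach vt_target.
    """
    vt, steps = vt_start, 0
    while vt < vt_target - 1e-9:   # verify (small tolerance for float round-off)
        vt += dv_cg                # program pulse
        steps += 1
    return steps


def throughput_cells_per_s(dop, program_time_s):
    """PT in cells/s: number of cells programmed in parallel / program time."""
    return dop / program_time_s


# Reference design of [2]: 0.17 MB/s at 2 bits/cell (MB taken as 10^6 bytes)
ref_cells_per_s = 0.17e6 * 8 / 2                # 0.68 Mcells/s
ref_time_64 = 64 / ref_cells_per_s              # program time for 64 cells

# Extrapolated 4-bit/cell case: 400 us per cell (320 us of verify), DOP = 256
pt_4bit = throughput_cells_per_s(256, 400e-6)   # 0.64 Mcells/s
verify_fraction = 320e-6 / 400e-6               # 80 % of the time is verify

print(f"[2]: {ref_cells_per_s / 1e6:.2f} Mcells/s, {ref_time_64 * 1e6:.0f} us for 64 cells")
print(f"4 bits/cell: {pt_4bit / 1e6:.2f} Mcells/s, verify fraction {verify_fraction:.0%}")

# Halving the CG step doubles the number of pulse/verify pairs in this toy model
print([pv_steps(0.0, 3.0, dv) for dv in (0.3, 0.15, 0.075)])
```

The roughly 94 µs obtained for 64 cells matches, within rounding, the 95 µs quoted in the text.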

0018-9383/03$17.00 © 2003 IEEE

GROSSI et al.: NOVEL ALGORITHM FOR HIGH-THROUGHPUT PROGRAMMING

Thus, all considered, the program accuracy of the method of [5] can be described in terms of an effective V_T distribution width of about 600 mV. This value, however, is not suitable for ML memories with n = 3 or 4. Hence, suitable program algorithms are needed to tighten the V_T distributions and to reduce their displacement from their nominal position within the total voltage window. The use of verify operations in conjunction with ramped programming, however, requires a preliminary (accurate) analog determination of the cell V_T in order to set the initial value of the CG voltage. On the other hand, since accurate V_T determination is a lengthy operation, programming techniques combining ramped CG and verify can be competitive only if the number of verifies is minimized. In this context, this paper presents a novel technique combining ramped programming and verify. It is shown capable of achieving V_T distribution widths and displacements smaller than 150 and 20 mV, respectively, using only two program steps (and V_T determinations) for each cell. This method is adequate for 3-bit-per-cell ML schemes. For n = 4, the results of this work indicate that the separation between adjacent distributions is probably insufficient to guarantee the reliability requirements; the method could, however, still be used if error correcting codes are adopted. At cell level, the proposed algorithm leads to a program time six times lower than that of [2]. With a cell matrix scheme featuring DOP = 256, this results in a PT about three times as large as that of [2].
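To make the window argument concrete, the Python sketch below computes two quantities for a uniformly filled V_T window: the pitch between adjacent levels, and the free gap left by distributions of a given width. It is an idealized estimate under our own assumptions. It takes a 6-V total window (the value used in the experiments of Section V) and makes no allowance for reference-level margins, disturbs, or cycling. It compares the two widths discussed here: about 600 mV for the method of [5] and 150 mV for the proposed one. The function names are hypothetical.

```python
def level_pitch(window_v, bits_per_cell):
    """Spacing between adjacent V_T levels when 2**n levels fill the window uniformly."""
    return window_v / 2 ** bits_per_cell


def level_gap(window_v, bits_per_cell, dist_width_v):
    """Free space between adjacent distributions (a negative value means overlap)."""
    return level_pitch(window_v, bits_per_cell) - dist_width_v


WINDOW = 6.0  # total V_T window (V), as used in the experiments of Section V

for n in (2, 3, 4):
    for width in (0.6, 0.15):  # effective width of [5] vs. width targeted here
        print(f"n={n}, width={width * 1e3:.0f} mV: "
              f"pitch={level_pitch(WINDOW, n) * 1e3:.0f} mV, "
              f"gap={level_gap(WINDOW, n, width) * 1e3:+.0f} mV")
```

A negative gap means adjacent distributions would overlap. For n = 4, even 150-mV distributions leave only about 225 mV of free space in this idealized picture, which is consistent with the reliability concern raised above.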


Fig. 1. Schematic representation of the experimental setup used to program Flash memories by the proposed algorithm.

II. DEVICES AND EXPERIMENTS

The experiments of this work have been carried out using test chips of Flash memories with NOR, common-ground architecture, fabricated by ST-Microelectronics (Milan, Italy) in a production-quality 0.18-µm technology. These memories contain 4 × 10^6 cells (4 Mcells) divided into 8 sectors, each with 512 Kcells, organized in 256 rows of 2048 cells each. These test chips, configurable by means of suitable latches, allow us to select the operation to be performed as well as the number of cells to be simultaneously programmed. The cell array is realized in triple-well technology, allowing the application of negative substrate bias during programming and positive substrate bias during erasing [6]. Since substrate bias greatly improves PT [7], the maximum drain-substrate voltage compatible with drain-junction breakdown (5.5 V) has always been used for cell programming, for any given bit-line bias (i.e., drain-source voltage of the cell transistor). The experimental setup used for this work is schematically represented in Fig. 1. Two acquisition boards provide all the digital and analog signals used for memory address and control. The whole setup (acquisition boards and external instruments) is controlled via a GPIB bus by means of LabVIEW programs. The bit-line voltage V_BL is provided by a programmable DC power supply. Its floating ground terminal is connected to a shunt resistance and a converter that provide a measure of the current absorption during programming without affecting the memory bias conditions. The word-line voltage V_WL (i.e., the CG bias of the cell transistor) is provided by a circuit integrated on the memory board, schematically represented in Fig. 2. During programming, the initial digital state of the CG voltage is loaded into an 8-bit counter. The counter clock is generated by an external pulse generator, which allows us to set the signal period so as to obtain the required ramp slope. The output of the counter is then used as the input of an 8-bit digital-to-analog converter (DAC), which divides a 0–10 V interval into 256 steps of about 39 mV each and provides the analog value to the cell CG. During read operations, the procedure is the same as for programming, but the clock signal is always low; thus, the DAC output assumes a fixed value.

Fig. 2. Schematic representation of the on-board circuit that generates V_WL.
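The relation between the counter clock period and the ramp slope can be sketched as follows. Each clock edge advances the 8-bit counter by one code, i.e., the DAC output by one LSB (10 V/256 ≈ 39 mV), so the average slope is the LSB divided by the clock period. The Python below is a behavioral model only. The clamping at full scale and all names are our assumptions, not a description of the actual board.

```python
DAC_BITS = 8
V_FULL_SCALE = 10.0                     # DAC output range 0-10 V
LSB = V_FULL_SCALE / 2 ** DAC_BITS      # 0.0390625 V per counter step


def clock_period_us(slope_v_per_us):
    """Counter clock period (us) giving the requested average ramp slope."""
    return LSB / slope_v_per_us


def start_code(v_start):
    """Digital state loaded into the counter for a given initial CG voltage."""
    return min(2 ** DAC_BITS - 1, max(0, round(v_start / LSB)))


def staircase(code0, n_clocks):
    """CG voltages after 0..n_clocks clock edges.

    The counter is clamped at full scale here; this is an assumption,
    since the text does not say how overflow is handled.
    """
    top = 2 ** DAC_BITS - 1
    return [min(code0 + k, top) * LSB for k in range(n_clocks + 1)]


print(f"LSB = {LSB * 1e3:.2f} mV")
for slope in (0.04, 0.01):              # slopes (V/us) mentioned in the text
    print(f"{slope} V/us -> clock period {clock_period_us(slope):.3f} us")
print(staircase(start_code(5.0), 3))    # first levels of a ramp starting at 5 V
```

At 0.04 V/µs, for instance, the clock period is just under 1 µs, so each CG ramp is in fact a staircase of 39-mV steps.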


Fig. 3. Representation of the evolution of V_CG and V_T during the first program step. t_1 denotes the total programming time for the first step, while t_eq denotes the time required to reach the equilibrium condition where V_CG − V_T is constant.


III. PROGRAM ALGORITHM

As known from [5] and [8], for ramp programming to be really convenient (in terms of accuracy, current control and, ultimately, DOP), it is necessary to reach the (equilibrium) condition where the current of the cell transistor is constant in spite of the steadily increasing voltage on the CG. Under this condition, the cell V_T increases linearly with the same slope as V_CG. Thus: a) V_CG − V_T (namely, the equilibrium overdrive) is constant (see Fig. 3) at a value depending only on ramp slope and capacitive coupling between CG and FG; and b) the final value of V_T is determined by the time at which programming is stopped (i.e., the bit line is grounded). Because of the dispersion in cell characteristics, however, the equilibrium overdrive differs from cell to cell. Consider a number of cells to be programmed to the same final value of V_T. For the dispersion of the programmed V_T to be adequately small, the cell overdrive V_OV (i.e., V_CG − V_T at the beginning of programming) must be suitably adjusted in each cell. This paper presents an efficient algorithm to do so. In our method, cell programming is divided into two steps, and the obtained (intermediate) V_T is determined at the end of the first one. Then the necessary "overdrive adjustment," illustrated in detail just after point 4 below, is made to compensate for nonidealities in cell characteristics. In detail, and with reference to Figs. 4 and 5, the program algorithm proposed in this work consists of the following steps.

1) The initial value V_T0 of the cell is determined.
2) The cell is programmed from V_T0 to an intermediate target value V_T1 using a ramped CG voltage with slope R and the same overdrive V_OV for all cells.
3) The value V_T1* obtained after this program step is determined.
4) The cell is programmed from V_T1* to the final value V_TF with a CG voltage of slope R and overdrive V_OV + ΔV_OV, where ΔV_OV = V_T1 − V_T1*.
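The four steps above can be illustrated with a deliberately idealized behavioral model, sketched below in Python. The model rests on our own assumptions:

- Once equilibrium is reached, a cell's V_T at the end of a ramp equals the final CG voltage minus that cell's equilibrium overdrive V_eq, which varies from cell to cell.
- V_T determination returns the lowest 39-mV DAC level at which the cell conducts.
- The time to reach equilibrium (t_eq in Fig. 3) and pulse-duration errors are ignored.

The numerical values are hypothetical choices: V_OV = 0.35 V, V_T1 = 3.5 V, V_TF = 6 V, and a 0.1-V standard deviation of V_eq. The code sketches the overdrive-adjustment idea and is not the authors' implementation.

```python
import math
import random

LSB = 10.0 / 256  # V_T determination resolution: 8-bit DAC over 0-10 V


def determine_vt(vt):
    """Lowest DAC word-line level at which the cell conducts (V_WL >= V_T)."""
    return min(255, max(0, math.ceil(vt / LSB))) * LSB


def ramp(vt_now, vcg_end, v_eq):
    """Idealized equilibrium model: at the end of the ramp V_T = V_CG,end - V_eq,
    where V_eq is the cell's own equilibrium overdrive; V_T never decreases."""
    return max(vt_now, vcg_end - v_eq)


def single_ramp(vt0, v_eq, vtf, v_ov):
    """Reference case: one ramp with the nominal overdrive and no verify."""
    return ramp(vt0, vtf + v_ov, v_eq)


def two_step(vt0, v_eq, vt1, vtf, v_ov):
    # Steps 1-2: V_T0 would set the ramp start (quasi-equilibrium from the
    # beginning); in this end-point model only the final CG value matters.
    vt1_star = ramp(vt0, vt1 + v_ov, v_eq)
    # Step 3: determine the intermediate V_T and the overdrive compensation.
    dv_ov = vt1 - determine_vt(vt1_star)
    # Step 4: same slope, ramp end raised by the compensation (overdrive v_ov + dv_ov).
    return ramp(vt1_star, vtf + v_ov + dv_ov, v_eq)


def spread(values):
    return max(values) - min(values)


random.seed(0)
V_OV, VT1, VTF = 0.35, 3.5, 6.0  # hypothetical nominal overdrive and targets (V)
cells = [(random.uniform(1.0, 2.0), random.gauss(V_OV, 0.1)) for _ in range(16384)]
no_verify = [single_ramp(vt0, v_eq, VTF, V_OV) for vt0, v_eq in cells]
adjusted = [two_step(vt0, v_eq, VT1, VTF, V_OV) for vt0, v_eq in cells]
print(f"single ramp: {spread(no_verify) * 1e3:.0f} mV, "
      f"two-step: {spread(adjusted) * 1e3:.0f} mV")
```

In this model, the residual error after step 4 equals the difference between the true and the measured intermediate V_T, so the final spread is bounded by one DAC step. The single ramp, by contrast, inherits the full spread of V_eq. The real measurements discussed below include error sources that the model leaves out.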
In more detail, the first determination of threshold voltage (V_T0) is carried out to guarantee a quasi-equilibrium condition during the first program operation by modulating the initial value of the CG ramp. This avoids high current absorption at the beginning of programming and loss of accuracy in placing charge into the FG [8]. The second determination of threshold voltage, that of the value V_T1* reached after the first step, aims to adjust the program overdrive of each individual cell. It represents the essential element of our method for obtaining adequate program accuracy. In fact, as Fig. 3 shows, after the first write step the threshold value is V_T1* = V_T1 − ΔV_OV. Here V_T1 is the intermediate target and ΔV_OV is the deviation of the cell equilibrium overdrive from its nominal value. Of course, ΔV_OV is the result of the dispersion in cell characteristics that needs to be compensated. Since the CG voltage is obtained with a ramp of slope R, its final value is set by the ramp duration. Thus, the intermediate measure allows us to determine the duration t_2 of the second program step. Hereafter, the whole operation (shown in Fig. 5) is referred to as "overdrive adjustment." When the operation is carried out on a single cell, V_T determination is performed by dichotomous (or binary) reading, which divides the current voltage range into two equal subranges at each step. At the beginning, the entire voltage range (0–10 V) is divided into two subranges by applying a word-line voltage of 5 V. Then the same procedure is applied to the subrange where the cell V_T is located, and so on. If the entire voltage range is divided into N states, the number of steps needed to complete reading is log2 N. As for accuracy in the programmed V_T, errors introduced in the variables involved in the algorithm widen the final V_T distribution. This applies in particular to the errors introduced in the V_T1* determination and in the determination of t_2.

Fig. 4. Representation of the proposed P&V algorithm. Inside the boxes the CG voltage during the two program steps is shown.

Since, under the equilibrium condition, the cell V_T increases linearly with the same slope as the CG voltage, an error on the pulse duration produces an equal error on the final cell V_T multiplied by the ramp slope R. The final V_T distribution width can thus be expressed as the sum of three contributions: one due to the V_T determination error, one proportional to R due to the pulse-duration error, and a term accounting for other errors in the algorithm. The R-dependent term shows that the V_T distribution broadening increases with R, so a tradeoff between fast and accurate programming is in order. Moreover, in this paper V_T is determined with the resolution of the 8-bit DAC (10 V/256 ≈ 39 mV). Since V_T distribution widths for sets of 16 Kcells are about 120 mV, the error in V_T determination is responsible for more than 60% of the distribution width. Accurate V_T determination therefore represents the main element in obtaining tight distributions, at least when very high program velocities are not used.

Fig. 5. Representation of the steps needed by the proposed algorithm. Word-line and bit-line voltages (V_WL and V_BL, respectively) during the two program steps are shown. V_T0 modulates the initial CG value of the ramp during WRITE 1, while ΔV_OV modulates the final CG value of the ramp during WRITE 2. ΔV_OV represents the overdrive compensation and is the difference between the target threshold value V_T1 and the value V_T1* reached at the end of the first write step.

As for performance, the total time T_prog required to program a cell is the sum of the times needed for the two V_T determinations and for the two program steps. Denote by t_R the time required for a single step of dichotomous reading. The V_T0 and V_T1* determinations then require a time t_R·log2 N each, while each program step requires a time given by its CG voltage excursion divided by R. Assuming worst-case values for R, t_R, and the threshold voltages involved, we obtain a cell program time about six times smaller than that required by conventional P&V techniques (400 µs). Since our algorithm always features only two program steps, the time dedicated to verification is essentially 2·t_R·log2 N, where N is the number of DAC states. This time represents only a small fraction of T_prog.

IV. PROGRAM THROUGHPUT

Of course, as far as applications are concerned, the decisive factor is PT, which calls for high DOP. From this point of view, the P&V algorithm described in [2], here taken as a reference for comparison, features a parallelism of 64 and 8 cells during programming and reading, respectively. In the method of this work, a set of cells on the same word line can be programmed in parallel by controlling the bit-line voltage. The main problem therefore comes from the need to determine a high number of cell V_Ts in parallel. For classic NOR architecture, the cell V_Ts can be determined by means of a serial read operation with increasing word-line voltage, recognizing the value at which each cell starts to conduct. This is the method used to determine the V_T distributions presented in this paper. However, after the first writing step the possible values of the intermediate V_T cover a range of about 3 V (i.e., 77 steps of the DAC). With a reading parallelism of 8 and the read time t_R of [2], about 306 µs are required to determine the intermediate V_T for 256 cells, and this value is evidently incompatible with high PT. Dichotomous reading cannot be applied to many cells in parallel without a strong change in array architecture. A possible solution therefore features analog determination of cell V_Ts, which can be carried out in different, essentially equivalent ways. A circuit to be used for this purpose, presented here as an example, is illustrated in Fig. 6. In this scheme, M1 is the cell whose V_T must be determined, while M2 is a dummy cell whose V_T is set to the target threshold level to be obtained. If the bias conditions are such that both devices operate in saturation, relations (1)–(3) between their currents and gate voltages hold.
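The cost of the different V_T determination strategies discussed above can be compared with the Python sketch below. It covers three cases:

- a dichotomous read of one cell over N = 256 DAC states (first probe at 5 V, log2 N = 8 reads);
- the serial word-line staircase read of 256 cells with a reading parallelism of 8 over a 3-V range;
- the parallel analog determination estimated below for the scheme of Fig. 6 (16 levels × 1 µs).

The `conducts` callback and all function names are hypothetical. The read time t_R cannot be recovered from the text, so the sketch back-calculates the value implied by the 306-µs figure rather than quoting [2].

```python
import math

N_STATES = 256
LSB = 10.0 / N_STATES  # DAC step (V)


def dichotomous_read(conducts, n_states=N_STATES):
    """Binary V_T search on one cell, assuming 0 < V_T <= 10 V.

    `conducts(code)` is a hypothetical sense callback: True if the cell conducts
    with the word line at code * LSB (i.e. V_T <= code * LSB).
    Invariant: V_T lies in (lo * LSB, hi * LSB]. The first probe is code 128 (5 V).
    Returns (lo, number_of_reads).
    """
    lo, hi, reads = 0, n_states, 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        reads += 1
        if conducts(mid):
            hi = mid  # V_T is in the lower half
        else:
            lo = mid  # V_T is in the upper half
    return lo, reads


def serial_read_time(n_cells, parallelism, dac_steps, t_read):
    """Word-line staircase read: each group of `parallelism` cells needs one
    read per DAC step across the V_T range of interest."""
    return math.ceil(n_cells / parallelism) * dac_steps * t_read


steps_3v = round(3.0 / LSB)  # about 77 DAC steps cover a 3-V range
t_read_implied = 306e-6 / (math.ceil(256 / 8) * steps_3v)  # back-calculated t_R
parallel_analog = 16 * 1e-6  # 16 levels x 1 us, analog determination estimate

print(dichotomous_read(lambda code: code * LSB >= 3.0))
print(f"{steps_3v} steps, implied t_R = {t_read_implied * 1e9:.0f} ns")
print(f"serial: {serial_read_time(256, 8, steps_3v, t_read_implied) * 1e6:.0f} us, "
      f"parallel analog: {parallel_analog * 1e6:.0f} us")
```

The roughly 124-ns value for t_R is only what the quoted figures imply. It serves to show that the staircase read scales with the number of cells divided by the read parallelism, whereas the analog scheme scales with the number of levels.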


Fig. 6. Circuit used to make an analog determination of cell V_T.
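A minimal sketch of why a circuit of this kind yields the threshold difference is given below. It assumes that M1 and M2 are matched devices (same β, gate voltages referred to the CG, same source potential), both biased in saturation with the same drain current I and described by the simple square-law model. These assumptions and the symbols β, I, V_G1, V_G2, V_T,cell, and V_T,ref are ours; the derivation is not necessarily identical to the paper's relations (1)–(3).

```latex
\begin{aligned}
I &= \tfrac{\beta}{2}\,\bigl(V_{G1} - V_{T,\mathrm{cell}}\bigr)^2
   = \tfrac{\beta}{2}\,\bigl(V_{G2} - V_{T,\mathrm{ref}}\bigr)^2 \\
\Rightarrow\quad V_{G1} - V_{T,\mathrm{cell}} &= V_{G2} - V_{T,\mathrm{ref}}
   \qquad \text{(both overdrives positive in saturation)} \\
\Rightarrow\quad \Delta V_G \equiv V_{G2} - V_{G1} &= V_{T,\mathrm{ref}} - V_{T,\mathrm{cell}}
\end{aligned}
```

With M2 set to the intermediate target, ΔV_G in this sketch takes the form of the overdrive compensation (target minus actual threshold) used by the algorithm. It is positive when the cell threshold is still below the reference level.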

TABLE I
PERFORMANCE COMPARISON BETWEEN THE CONVENTIONAL P&V AND THE PROPOSED ALGORITHM. BOTH ALGORITHMS USE CHE FOR PROGRAMMING WITH A DOP OF 256 FOR A 16-LEVEL (4 BITS PER CELL) FLASH MEMORY OF THE SAME TECHNOLOGY

Fig. 7. Comparison between the V_T distribution widths obtained with RVPnoverify and with the proposed P&V algorithm. Measures are carried out with a ramp slope of 0.01 V/µs and voltage settings of 3.5 V, 2 V, 0.35 V, and 6 V.

Thus, the quantity given by (3) is just the difference between the reference and the actual cell thresholds, i.e., the overdrive compensation required to implement the scheme proposed in this article. SPICE simulations of the circuit have shown that the maximum time needed to reach the equilibrium value expressed by (2) is 1 µs. Therefore, if all cells of the same level are determined in parallel, 16 µs are needed to read the intermediate V_T of 256 cells in a 16-level Flash memory (16 × 1 µs). With the assumed program conditions (including a V_T distribution width of 0.5 V), the voltage ranges to be covered during the first and the second write step are 0.5–3.75 V and 1–6.5 V, respectively. Consequently, the time needed to write 256 cells in parallel is set by these voltage ranges and by the ramp slope. The total time needed to program 256 cells, including the parallel V_T determinations, is much smaller than the time required by conventional P&V (400 µs). With the assumptions made above, the resulting PT is almost three times higher than the 0.64 Mcells/s obtained with the programming algorithm of [2]. A performance comparison between the classic P&V algorithm described in the Introduction and the proposed one is shown in Table I.

V. EXPERIMENTAL RESULTS

In this work, an extensive set of measurements has been carried out in order to demonstrate that the proposed algorithm provides significantly higher accuracy than ramped voltage programming without verify (RVPnoverify). Fig. 7 compares the results obtained by programming 16 Kcells of the test chips described in Section II with the algorithm of this paper and with RVPnoverify. In both cases, the same program conditions (listed in the caption of Fig. 7) have been used. As can be seen, the width of the V_T distributions is approximately 400 mV with RVPnoverify, while with the algorithm presented in this work it is 120 mV. As for accuracy in the central value of the V_T distributions, the results of the comparison between RVPnoverify and the algorithm of this work are shown in Fig. 8. Here, 16 different blocks of 16 Kcells each are programmed with both RVPnoverify and the proposed algorithm, and the central value of the V_T distribution of each block is plotted for both methods. As can be seen, the former method produces a dispersion of about 100 mV, while with the method of this work such a dispersion is only 6.6 mV. A consequence of the improved accuracy of the method of this work is that the dispersion depending on the position of the memory cell, which results from the application of RVPnoverify, is almost eliminated. In order to demonstrate that the precision of the proposed algorithm is adequate for ML programming, whole sectors (512 Kcells) of the test chips have been programmed with 8 and 16 levels. Figs. 9 and 10 show the results obtained for 8 and 16 levels, respectively, with the same ramp slope in both cases. In particular, Fig. 9 represents the 8-level case. It shows that the minimum distance between the V_T of each distribution and the reference levels (located midway between the central values of the distributions) corresponds to 0.2 V. The separation of the different distributions and their placement with respect to the reference levels are adequate for 8-level programming. Fig. 10, instead, represents the case of 16 levels. In this case, the distributions are separated by about 100 mV from one another, and the minimum distance between distribution borders and reference levels is about 40 mV. These values are probably not compatible with reliable memory operation, particularly considering the effects of read and write disturbs due to repeated program-erase cycles. From this point of view, however, the limits are essentially due to the use of a 6 V total voltage window for 16 levels. The program algorithm itself seems capable of producing tight V_T distributions that uniformly occupy the available space. In any case, the adoption of suitable error correcting codes (with inherent data redundancy) could make it possible to use distributions such as those of Fig. 10 also with 16-level schemes.

Fig. 8. Comparison between the final V_T mean value dispersion obtained with RVPnoverify and with the proposed P&V algorithm. The measures are carried out with a ramp slope of 0.01 V/µs and voltage settings of 3.5 V, 2 V, 0.35 V, and 6 V. The proposed algorithm has the advantage of suppressing the dispersion due to the position of the cell in the array.

Fig. 9. V_T distributions for eight-level programming. The minimum distance between the V_T of each distribution and the reference levels is 200 mV.

Fig. 10. V_T distributions for 16-level programming. The minimum distance between the V_T of each distribution and the reference levels is 40 mV.

VI. CONCLUSIONS

This paper has presented a novel program scheme for ML Flash memories that combines ramped voltage programming with minimum verification of the obtained results. Its aim is to improve the accuracy in the placement and width of the distributions of the cell threshold voltages within the available voltage window. With respect to the conventional approach based on P&V procedures, the method of this work has a significant advantage in terms of program throughput, in that it allows programming almost three times as many bits per unit time. The proposed method has been studied experimentally by means of an extensive set of measurements performed on suitable test chips. It has been shown suitable for eight-level storage (i.e., 3 bits per cell) without the need for data redundancy and error correcting codes. In the case of 16 levels (4 bits per cell), the total voltage window considered in this work (6 V) leads to a separation between the threshold voltage distributions that is probably insufficient for direct use in real memory operation without data redundancy and error correcting codes. This limitation, however, is not essentially due to the proposed scheme, which seems able to adequately exploit the whole voltage window.

ACKNOWLEDGMENT

The authors would like to thank Dr. A. Modelli, ST Microelectronics, Agrate, Italy, for helpful discussions and for providing the devices used in this work.

REFERENCES

[1] B. Riccò, G. Torelli, M. Lanzoni, A. Manstretta, H. Maes, D. Montanari, and A. Modelli, “Nonvolatile multilevel memories for digital applications,” Proc. IEEE, vol. 86, pp. 2399–2423, Dec. 1998.


[2] A. Silvagni, S. Zanardi, A. Manstretta, and M. Scotti, “Modular architecture for a family of multilevel 256/192/128/64 Mbit 2-bit/cell 3 V only NOR flash memory devices,” IEEE Trans. Electron Devices, vol. 48, pp. 937–940, Jan. 2001.
[3] M. Bauer et al., “A multilevel-cell 32 Mb flash memory,” in IEEE ISSCC Tech. Dig., 1995, pp. 132–133.
[4] T.-S. Jung, Y.-J. Choi, and K.-D. Suh, “A 117-mm² 3.3-V only 128-Mb multilevel NAND flash memory for mass storage applications,” IEEE J. Solid-State Circuits, vol. 31, pp. 1575–1583, Nov. 1996.
[5] D. Esseni, A. Della Strada, P. Cappelletti, and B. Riccò, “A new and flexible scheme for hot-electron programming of nonvolatile memory cells,” IEEE Trans. Electron Devices, vol. 46, pp. 125–133, Jan. 1999.
[6] C. Auricchio, R. Bez, A. Losavio, A. Maurelli, C. Sala, and P. Zabberoni, “A triple well architecture for low-voltage operation in submicron CMOS devices,” in Proc. Eur. Solid-State Device Res., 1996, p. 613.
[7] R. Versari, D. Esseni, G. Falavigna, M. Lanzoni, and B. Riccò, “Bandwidth optimization of flash memories with the RGP technique,” IEEE Trans. Electron Devices, vol. 48, pp. 1737–1740, Aug. 2001.
[8] R. Versari, G. Falavigna, M. Lanzoni, and B. Riccò, “Optimized programming of multilevel flash EEPROMs,” IEEE Trans. Electron Devices, vol. 48, pp. 1641–1646, Aug. 2001.

Marco Grossi was born in Bologna, Italy, on May 31, 1973. He received the Laurea degree in electronic engineering from the University of Bologna in 2000. In 2001, he joined the D.E.I.S. Laboratory of the University of Bologna as a Ph.D. student, where he currently works. His research interests are focused on the characterization of nonvolatile memories. He is currently working in the field of Flash memories and the multilevel programming of these memories using the ramped-gate technique.

Massimo Lanzoni was born in Bologna, Italy, on August 9, 1961. He received the Laurea degree in electronic engineering from the University of Bologna in 1987. He has been with the Microelectronics Research Group at the Department of Electronics at the University of Bologna, working on research projects in the fields of the experimental characterization and simulation of EEPROM memory cells and MOS devices and the automatic test of VLSI devices. His scientific interests cover the characterization of thin-dielectric reliability, nonvolatile memory cell characteristics and reliability, MOS transistor experimental characterization, and new techniques for IC testing such as nonvolatile memory endurance testing and CMOS IC latch-up testing. He is now involved in projects concerning analog applications of nonvolatile memories and multilevel programming.

Bruno Riccò (SM’91) was born in Parma, Italy, on February 8, 1947. In 1971, he received the B.S. degree in electrical engineering from the University of Bologna, Bologna, Italy, and in 1975, he received the Ph.D. degree from the University of Cambridge, Cambridge, U.K. He has worked at the Cavendish Laboratory, Cambridge, U.K. In 1980, he became Full Professor of applied electronics at the University of Padova, Padova, Italy, and in 1983, he joined the University of Bologna. Since 1978, he has been holding courses on electron devices, digital integrated electronics, semiconductor technology, and IC reliability and testing. He has been a Visiting Professor with Stanford University, Stanford, CA; the IBM Thomas J. Watson Research Center, Yorktown Heights, NY; and the University of Washington, Seattle. He has been consulting for major companies interested in IC fabrication and evaluation and for the Commission of the European Union in the definition, evaluation, and review of research projects in microelectronics. He has worked in the field of solid-state devices and ICs, making many contributions to the understanding and modeling of electron transport, tunneling in heterostructures and thin insulating films, silicon dioxide physics, MOSFET physics, latch-up in CMOS structures, device modeling, and simulation. He is currently working in the field of IC design, evaluation, and testing. He has authored or coauthored over 270 publications, more than half of which have been published in major international journals. He has also written three books and holds six patents in the field of nonvolatile memories. Dr. Riccò received the G. Marconi Award from the Italian Association of Electrical and Electronics Engineers (AEI) in 1995 for his research in electronics. In 1996, he became President of the Group of Electron Devices, Technologies, and Circuits of AEI and, in 1998, became President of the Italian Group of Electronics Engineers.
In 1999, he was appointed European representative for the International Electron Device Meeting (IEDM). In 2000, he became Vice President of the North Italy Section of IEEE. He was the European Editor of the IEEE TRANSACTIONS ON ELECTRON DEVICES from 1986 to 1996.