Impact of Packaging Technology on System Partitioning: A Case Study

Peyman Dehkordi, Karthi Ramamurthi, Don Bouldin†, Howard Davidson‡ and Peter Sandborn§

Abstract

This paper emphasizes concurrent consideration of the partitioning of a microelectronic circuit design into multiple dies and the selection of the appropriate packaging technology for implementation of the entire system. Partitioning a large design into a multichip package is a non-trivial task. Similarly, selection of the MCM packaging technology to accommodate a multichip solution can also be puzzling. The interdependencies of these two problems afford the opportunity to achieve a global optimum when considered concurrently. In this paper we address the partitioning/MCM technology tradeoff, their interdependency, and previous work in this area. The SUN MicroSparc CPU is used as a demonstration vehicle and is partitioned for different MCM technologies. The preliminary results show that the optimum number of partitions and contents of each partition depend heavily on the choice of MCM technologies for a given application.

1 Introduction

Multi-chip modules (MCMs) have been gaining popularity and becoming more available during the past few years. The designer is now faced with a variety of MCM packaging technologies and has to understand and compare them for a given application. Conventionally, this has been performed by the package designer, mainly toward the end of the design cycle. However, to achieve a more nearly optimum system, packaging-related issues should be considered throughout the design cycle by system, IC, and package designers. About 40% of the product cost is determined by the decisions made in the first 10% of the design cycle [1]. This suggests that the choice of packaging should be explored at the early stages of the design for a more globally optimum system. Considering the critical packaging issues early in the design cycle is termed "Design for Packageability" (DFP) and is discussed in [2]. Comparisons between different MCM technologies cannot be made by just considering the physical and electrical parameters of the technologies. True comparisons should be made at the system level by understanding the impact of the different MCM technologies on the overall cost and performance of the final system.

This research was sponsored in part by the Advanced Research Projects Agency under ARPA order A879 and monitored by the Army Research Office under grant DAAH04-94-G-0004 to the University of Tennessee.
† University of Tennessee, Knoxville, TN
‡ Sun Microsystems, Mountain View, CA
§ MCC, Austin, TX

2 Problem Definition and Motivation

As a part of our research, we are trying to identify the stages in the design cycle which will benefit the most from taking advantage of the new solutions offered by MCM packaging. Figure 1 shows how partitioning drives all the system performance parameters such as system cost, size, thermal, power, packaging delay and simultaneous switching noise. Chip bonding and substrate technologies determine the physical constraints on the partitioning process. It can be seen that the choice of the packaging technology propagates through the partitioning process and then impacts system performance. Thus, there is a need to explore the effect of the various packaging technologies on system partitioning and hence system performance to achieve an optimum system. The main objective of our work is to study the interaction and impact of the various bonding and substrate technology alternatives on system partitioning and performance. Partitioning an ultra-large single die into multiple smaller dies housed on an MCM is used as an example; however, this approach can be applied to a larger class of applications using multiple levels of packaging hierarchy.

[Figure 1 diagram: PARTITIONING is the central block. Its inputs are bonding technology, substrate technology, logic and user-defined constraints. Through die size, number of dies, number of I/O per die, net length and substrate size, it drives system cost, system size, system power dissipation, allowed external thermal resistance, packaging delay and simultaneous switching noise.]

Figure 1: Interaction between System Performance Parameters and Partitioning using DFP.

The impact of bonding technologies such as area array (Flip-Chip, FC) and peripheral (Wire-Bond, WB) on the die size and layout of VLSI dies is discussed in [3]. That work focused on the physical die layout level rather than the system level. A more conceptual trade-off analysis between peripheral and area array bonding in MCMs is presented in [4]. There, the design is described in terms of gate counts, and Rent's rule is used to establish the number of I/Os for a given circuit size. Our work tries to establish the interaction of the various packaging alternatives, partitioning and the system performance when more information is available about the design. The SUN MicroSparc is used as a demonstration vehicle to illustrate the extent of the interaction. The design is described at the functional unit level and consists of the following: integer unit (IU), floating point unit (FU), memory management unit (MMU), data cache (D-CACHE), instruction cache (I-CACHE), S-bus controller (S-BUS-CTL), memory interface unit (MEM-INTF), clock control and buffers (CLK-CTL), and miscellaneous control logic (MISC), as shown in Figure 2.
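For readers unfamiliar with the gate-count-level estimate mentioned above, Rent's rule has the general form T = k * G^p, where G is the gate count and T the number of signal terminals. The sketch below only illustrates that form; the function name and the default values of k and p are assumptions chosen for illustration and are not taken from this paper or from [4].

```python
def rent_io_count(gates: int, k: float = 2.5, p: float = 0.6) -> float:
    """Estimate signal I/O count with Rent's rule, T = k * G**p.

    k (terminals per gate) and p (Rent exponent) are illustrative
    assumptions, not values from the paper.
    """
    if gates <= 0:
        raise ValueError("gate count must be positive")
    return k * gates ** p


if __name__ == "__main__":
    # I/O demand grows more slowly than gate count when p < 1.
    for gates in (1_000, 10_000, 100_000):
        print(f"{gates:>7} gates -> about {rent_io_count(gates):.0f} signal I/Os")
```

Because the exponent is below one, splitting a design into several dies raises the total I/O count across the die set, which is one reason partitioning and bonding technology interact.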

3 Approach

The intent of this work is not to promote a new partitioning scheme, since numerous techniques have already been described [5],[6]. Our approach is to develop a framework in which various packaging/partitioning choices can be explored and evaluated concurrently. Since performing a detailed evaluation at the system level for various packaging/partitioning choices can turn into a time-consuming task, we have employed an estimation-based, early-analysis technique. At this level there are only nine functional units, so the number of possible candidates is low enough (approx. 21,000) that it is possible to search the solution space exhaustively. This guarantees that the best solution will be found. At the register transfer level, where there are a large number of components, it is more appropriate to use partitioning techniques based on algorithms such as simulated annealing, which can find a good, though not necessarily optimal, solution among many possible solutions.

The steps involved in our concurrent analysis are shown in Figure 3. The user specifies the package- and die-related information (e.g., bonding technology, maximum die size). The constraint generator derives the actual constraints from these user specifications. The exhaustive partitioner then generates all possible candidate partitions. The algorithm used to generate partitions is described in [7]. The die size and die I/O estimates are then calculated for each of the partitions. Next, the partitions are verified against the constraints. These constraints are used to qualify a partition for further processing.
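The candidate count quoted above can be checked with a short enumeration. The sketch below lists every way of grouping the nine functional units into non-empty, unordered dies; it is a generic recursive set-partition generator written for illustration and is not claimed to be the algorithm of [7]. The unit names come from the text; the function names are hypothetical.

```python
from typing import Dict, Iterator, List

UNITS = ["IU", "FU", "MMU", "D-CACHE", "I-CACHE",
         "S-BUS-CTL", "MEM-INTF", "CLK-CTL", "MISC"]


def set_partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every split of items into non-empty, unordered groups (dies)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partial in set_partitions(rest):
        # Put `first` into each existing group in turn ...
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # ... or give it a group of its own.
        yield [[first]] + partial


def count_by_die_count(items: List[str]) -> Dict[int, int]:
    """Count candidate partitions, keyed by the number of dies they use."""
    counts: Dict[int, int] = {}
    for partition in set_partitions(items):
        counts[len(partition)] = counts.get(len(partition), 0) + 1
    return counts


if __name__ == "__main__":
    counts = count_by_die_count(UNITS)
    print("total candidates:", sum(counts.values()))  # 21147, the Bell number B(9)
    print("three-die candidates:", counts[3])         # 3025
```

The total of 21,147 matches the "approx. 21,000" figure, and the count grows so quickly with the number of components that exhaustive search stops being practical well before the register transfer level.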

[Figure 2 diagram: the MicroSparc functional units and the signal buses that interconnect them.]

Figure 2: Functional-Level Diagram of the MicroSparc.

Further processing involves estimating the system performance characteristics such as system cost, system size, module power, allowed external thermal resistance and total simultaneous switching noise in the module. The MSDA (Multichip System Design Advisor) tool developed by MCC is used to estimate these characteristics.

We have concurrently considered the following:
a) Wire-Bond/MCM-C
b) Wire-Bond/MCM-D
c) Wire-Bond/MCM-L
d) Flip-Chip/MCM-C
e) Flip-Chip/MCM-D
f) Flip-Chip/MCM-L

The exhaustive partitioner generated over 21,000 partitions for each type of packaging, but only those partitions that meet the die and package constraints were considered for analysis. The die parameters provided by the user are given in Table 1.

Property                            Value
Signal/Ground (peripheral)          4
Signal/Ground (area array)          6
Bond pad size (peripheral)          200 * 200 (microns)
Bond pad size (area array)          125 * 125 (microns)
Min bond pad pitch (peripheral)     200 (microns)
Min bond pad pitch (area array)     250 (microns)
Wafer diameter                      6 inches
Unusable wafer border               0.4 inches
Wafer defect density                3 defects per sq. inch
Processed wafer cost                $800
Wafer bumping cost                  $200
Defects due to wafer bumping        0.2 defects per inch
Die test cost                       $0.01 per I/O

Table 1: Die Assumptions Provided by the User.

4 Results

The result of the cost analysis is shown in Figure 4(a), which displays those partitions satisfying the given constraints with the lowest system cost for a particular number of dies in the die set. The FC/MCM-D design offers the lowest overall system cost for implementing this particular application in an MCM. The system cost comprises the die, bonding, substrate and assembly costs. The substrate and assembly cost estimates used in this analysis are discussed in [4]. The flip-chip design offers a higher I/O count and takes full advantage of the higher interconnect density of MCM-D. The combination of these two choices reduces the die area (and hence the die cost) considerably compared to the conventional peripheral wire-bond design. It should be noted that FC/MCM-D is not highly sensitive to the number of chips in the partition. The multichip designs implemented in WB/MCM-C and WB/MCM-D exhibit the highest overall system cost. The lower I/O count offered by the peripheral wire-bond design results in a larger die area, which in turn reduces yield and raises die cost.
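The relation between die area, yield and die cost that underlies such cost estimates can be sketched as follows. This is a minimal illustration and not the MSDA model: the Poisson yield formula, the gross-die approximation and the treatment of the unusable border as a ring around the wafer edge are all assumptions. The wafer diameter, border, defect density, wafer cost and per-I/O test cost are the user-supplied values reported for Table 1.

```python
import math

# User-supplied wafer parameters as reported for Table 1.
WAFER_DIAMETER_IN = 6.0
UNUSABLE_BORDER_IN = 0.4
DEFECTS_PER_SQ_IN = 3.0
WAFER_COST_USD = 800.0
TEST_COST_PER_IO_USD = 0.01
MM2_PER_SQ_IN = 25.4 ** 2


def die_yield(area_sq_in: float, defect_density: float = DEFECTS_PER_SQ_IN) -> float:
    """Poisson yield model Y = exp(-D * A) (an assumed model)."""
    return math.exp(-defect_density * area_sq_in)


def gross_dies_per_wafer(area_sq_in: float) -> int:
    """Usable wafer area divided by die area (ignores edge packing losses)."""
    usable_radius = WAFER_DIAMETER_IN / 2 - UNUSABLE_BORDER_IN
    return int(math.pi * usable_radius ** 2 / area_sq_in)


def die_cost(area_mm2: float, io_count: int) -> float:
    """Cost of one good, tested die in dollars under the assumptions above."""
    area_sq_in = area_mm2 / MM2_PER_SQ_IN
    gross = gross_dies_per_wafer(area_sq_in)
    if gross == 0:
        raise ValueError("die does not fit on the usable wafer area")
    good_dies = gross * die_yield(area_sq_in)
    return WAFER_COST_USD / good_dies + TEST_COST_PER_IO_USD * io_count


if __name__ == "__main__":
    # Hypothetical die sizes, chosen only to show the trend.
    for area_mm2, ios in ((50.0, 150), (100.0, 150), (200.0, 300)):
        print(f"{area_mm2:6.1f} mm^2, {ios} I/O -> ${die_cost(area_mm2, ios):.2f}")
```

Under this model the cost of a die grows faster than its area, because a larger die both yields fewer candidates per wafer and is more likely to contain a defect. That is the mechanism behind the advantage of several small dies over one large die.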

[Figure 3 flowchart: user inputs (package information, die information, maximum die size) feed the constraint generator. The design passes through design capture to the exhaustive partition generator, which produces candidate partitions. For each candidate, the number of power and ground I/Os, I/O area, size and power are estimated. If the constraints are satisfied, the MSDA early analysis tool performs the system performance calculation (size, cost, thermal, packaging delay, power, SSN). The loop repeats until all partitions have been generated.]

Figure 3: Block Diagram of Exhaustive Partitioning.

For this application, the die cost in the wire-bond case dominates the substrate and assembly costs. Therefore, the die set which offers the lowest system cost is the same for the wire-bond design using any of the three substrates. However, this is not true for the flip-chip designs, since their die costs are comparable to the substrate and assembly costs. Figure 4(b) shows the size of the partitions which offer the lowest system cost. The flip-chip design using MCM-D exhibits the smallest module size. This is due to the combination of the reduction in die area because of area-array bonding and the reduction in substrate size with the use of MCM-D interconnect. A measure of the simultaneous switching noise is shown in Figure 4(c). The noise data shown corresponds to the partitions having the lowest system cost. The flip-chip designs have lower inductance and hence provide lower switching noise.
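The effect of the bonding style on die area can be illustrated with a pad-limited estimate. The sketch below computes the smallest square die edge that can host a given number of bond pads when the pads sit in a single row along the perimeter versus on a full area-array grid. The single-row and full-grid layouts, the function names, and the pitches of 200 microns (peripheral) and 250 microns (area array) used in the usage lines are assumptions for illustration; core logic area is ignored.

```python
import math


def min_side_peripheral_mm(pads: int, pitch_um: float) -> float:
    """Smallest square die edge (mm) with `pads` in one row along four edges."""
    pads_per_edge = math.ceil(pads / 4)
    return pads_per_edge * pitch_um / 1000.0


def min_side_area_array_mm(pads: int, pitch_um: float) -> float:
    """Smallest square die edge (mm) with `pads` on a full square grid."""
    pads_per_edge = math.isqrt(pads)
    if pads_per_edge * pads_per_edge < pads:
        pads_per_edge += 1  # round the grid dimension up
    return pads_per_edge * pitch_um / 1000.0


if __name__ == "__main__":
    for pads in (100, 300, 500):
        wb = min_side_peripheral_mm(pads, pitch_um=200.0)  # assumed pitch
        fc = min_side_area_array_mm(pads, pitch_um=250.0)  # assumed pitch
        print(f"{pads} pads: wire-bond edge >= {wb:.1f} mm, flip-chip edge >= {fc:.2f} mm")
```

The peripheral edge grows linearly with the pad count while the area-array edge grows with its square root, so I/O-heavy partitions become pad-limited much sooner with wire bonding.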

Figure 4(d) shows the total power dissipation of the modules which have the lowest system cost. The flip-chip/MCM-D designs have higher power dissipation than the wire-bond designs since the flip-chip designs have more I/Os. The flip-chip/MCM-C design has the worst power dissipation due to the higher interconnect capacitance of the substrate. In this particular application, the power dissipation increases with the number of chips because the number of outputs in the die set increases.

The results from the thermal analysis are shown in Figure 4(e). The worst-case allowed external thermal resistance of the die in the partition is heavily dependent upon the total power dissipation in the module. Higher values of external thermal resistance indicate less power dissipation inside the MCM. Since the power dissipation grows with the number of chips, the external thermal resistance decreases as the number of chips in the partition increases. There are some versions of flip-chip/MCM-D where special process techniques (e.g., potting and lapping the completed assembly) result in better external thermal resistance characteristics.

Figure 4(f) shows a figure of merit for the packaging delay of these MCM systems. The delay was computed for a length equal to the diagonal length of the module. The interconnect line was modeled as either a lumped RLC circuit or a transmission line, depending on its length. Each line was terminated, and a total of eight receivers were assumed for each driver. The delay calculations include time-of-flight, RC charging and reflections and, therefore, are a function of the dielectric constant and size of the MCM module. For the monolithic case, the delay was calculated for an interconnect signal line within the die with a length equal to the diagonal length of the die.
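The dependence of the delay on dielectric constant and module size can be seen from the time-of-flight term alone. The sketch below covers only that term; RC charging and reflections, which the paper's figure of merit also includes, are left out. The square-module assumption and the relative permittivity values in the usage lines are illustrative assumptions, not parameters from the paper.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0
METERS_PER_INCH = 0.0254


def module_diagonal_in(area_sq_in: float) -> float:
    """Diagonal of a module of the given area, assuming a square outline."""
    return math.sqrt(2.0 * area_sq_in)


def time_of_flight_ns(length_in: float, rel_permittivity: float) -> float:
    """Propagation time over length_in for a line in a uniform dielectric."""
    velocity = SPEED_OF_LIGHT_M_PER_S / math.sqrt(rel_permittivity)
    return length_in * METERS_PER_INCH / velocity * 1e9


if __name__ == "__main__":
    # Module areas in square inches; permittivities are assumed typical values.
    for label, area, eps_r in (("larger module, eps_r 9.0", 1.34, 9.0),
                               ("smaller module, eps_r 3.5", 0.6, 3.5)):
        diag = module_diagonal_in(area)
        print(f"{label}: diagonal {diag:.2f} in, "
              f"time of flight {time_of_flight_ns(diag, eps_r):.3f} ns")
```

A smaller module shortens the diagonal and a lower dielectric constant raises the propagation velocity, so both effects push the delay of a compact, low-permittivity substrate down.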

5 Summary and Conclusions

Each type of MCM technology has a different cost/performance characteristic. It is important to evaluate these technologies for the specific application at hand to obtain the best price/performance. Evaluation and selection of these technologies should not be based solely on the physical and electrical characteristics of the technology itself but on the price/performance of the entire system, considering the interdependency of MCM technologies and partitioning at the system level. The performance parameters of cost, size, power, thermal, simultaneous switching noise and package delay for the six different packaging alternatives are shown in Table 2. The candidate ranking was arrived at by considering an overall figure of merit of the various system performance parameters. The best partition, consisting of three dies, is shown in Table 3.

Parameter                  Monolithic  WB/MCM-L  WB/MCM-C  WB/MCM-D  FC/MCM-L  FC/MCM-C  FC/MCM-D
System Cost ($)            400.05      330.70    365.17    364.94    147.46    66.18     57.45
System Size (in^2)         0.3488      1.34      1.34      1.34      0.9       0.91      0.6
Module Power (W)           4.9         5.0579    5.1162    5.0388    5.1946    5.7227    5.6963
Ext. Therm. Res. (degC/W)  12.69       11.45     11.05     11.59     10.52     8.06      9.63
SSN                        124         410       494.17    476.03    6.85      8.07      7.77
Pkg. Delay (ns)            0.7918      1.3229    2.1792    1.3806    1.2134    1.9289    1.1459
Ranking                    -           4         6         5         2         3         1

Table 2: Comparison of System Parameters for Bonding and Substrate Technologies.

Chip  Pins  Area (mm^2)  Modules
1     485   49.428059    D-CACHE, I-CACHE, MMU
2     298   45.510590    FU, IU
3     414   28.451555    MEM-INTF, SBC, CLK-CTL, MISC.

Table 3: Contents of the Best Overall Partition.
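The paper does not state how the overall figure of merit is formed. As one hypothetical way to combine the Table 2 parameters, the sketch below scores each packaging alternative by summing, over all parameters, the ratio of its value to the best value in that row (inverted for the allowed external thermal resistance, where higher is better). The equal weighting and the ratio-to-best normalization are assumptions, so the resulting order is not expected to reproduce the Ranking row of Table 2 exactly.

```python
from typing import Dict, List

# Values transcribed from Table 2 (monolithic case omitted, as it is unranked).
TABLE2: Dict[str, Dict[str, float]] = {
    "WB/MCM-L": {"cost": 330.70, "size": 1.34, "power": 5.0579, "theta": 11.45, "ssn": 410.0, "delay": 1.3229},
    "WB/MCM-C": {"cost": 365.17, "size": 1.34, "power": 5.1162, "theta": 11.05, "ssn": 494.17, "delay": 2.1792},
    "WB/MCM-D": {"cost": 364.94, "size": 1.34, "power": 5.0388, "theta": 11.59, "ssn": 476.03, "delay": 1.3806},
    "FC/MCM-L": {"cost": 147.46, "size": 0.9, "power": 5.1946, "theta": 10.52, "ssn": 6.85, "delay": 1.2134},
    "FC/MCM-C": {"cost": 66.18, "size": 0.91, "power": 5.7227, "theta": 8.06, "ssn": 8.07, "delay": 1.9289},
    "FC/MCM-D": {"cost": 57.45, "size": 0.6, "power": 5.6963, "theta": 9.63, "ssn": 7.77, "delay": 1.1459},
}

LOWER_IS_BETTER = ("cost", "size", "power", "ssn", "delay")
HIGHER_IS_BETTER = ("theta",)  # allowed external thermal resistance


def merit_scores(table: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Equal-weight sum of ratio-to-best penalties (hypothetical figure of merit)."""
    scores = {name: 0.0 for name in table}
    for metric in LOWER_IS_BETTER:
        best = min(row[metric] for row in table.values())
        for name, row in table.items():
            scores[name] += row[metric] / best
    for metric in HIGHER_IS_BETTER:
        best = max(row[metric] for row in table.values())
        for name, row in table.items():
            scores[name] += best / row[metric]
    return scores


def rank(table: Dict[str, Dict[str, float]]) -> List[str]:
    """Alternatives ordered from best (lowest score) to worst."""
    scores = merit_scores(table)
    return sorted(scores, key=scores.get)


if __name__ == "__main__":
    for position, name in enumerate(rank(TABLE2), start=1):
        print(position, name)
```

With unweighted ratios the switching-noise row dominates the score, since its values span almost two orders of magnitude. A practical figure of merit would probably weight or rescale the parameters to reflect the priorities of the application.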

For this particular application, the results indicate that the overall system cost would be reduced by a factor of seven if the single-chip CPU were divided into three chips, bonded using flip-chip technology and interconnected on an MCM-D substrate. To date, the functionality of the partitions has not been considered by the partitioning tool. There is still a need for an experienced system architect to compare the results and select the best design architecture. We plan to analyze the cost/performance of adding different cache sizes to the design and to perform a detailed analysis of the above candidate designs to verify the validity of the model used in the analysis. The methodology of partitioning with DFP in mind is applied here to a design described at the functional unit level. We plan to extend this concept to designs described at the behavioral and structural (RTL) levels as well.

References

[1] Kuk, D., "Examining the Impact of DFM on Product Development," Electronic Packaging and Production, vol. 33, no. 5, pp. 36-4 to 36-7, May 1993.
[2] Dehkordi, P. and D. Bouldin, "Design for Packageability: Early Consideration of Packaging from a VLSI Designer's Viewpoint," Computer, vol. 1, pp. 76-81, April 1993.
[3] Dehkordi, P. and D. Bouldin, "Design for Packageability: The Impact of Bonding Technology on the Size and Layout of VLSI Dies," Proc. IEEE Multichip Module Conf., pp. 153-159, March 1993.
[4] Sandborn, P., Abadir, M. and C. Murphy, "The Tradeoff Between Peripheral and Area Array Bonding of Components in Multichip Modules," IEEE Trans. on Components, Packaging and Mfg. Tech. - Part A, vol. 17, no. 2, pp. 249-256, June 1994.
[5] Shih, M., Kuh, E. and R. Tsay, "Performance Driven System Partitioning on Multichip Modules," Proc. 29th Design Automation Conference, pp. 53-56, June 1992.
[6] Vemuri, Ram., Kumar, N. and Ranga Vemuri, "Two Randomized Algorithms for Multichip Partitioning Under Multiple Constraints," Tech. Report TM-ECE-DDE-94-36, Univ. of Cincinnati, 1994.
[7] Even, S., Algorithmic Combinatorics, The Macmillan Company, 1973.
