Impact of Packaging Technology on System Partitioning: A Case Study

Peyman Dehkordi, Karthi Ramamurthi, Don Bouldin (University of Tennessee, Knoxville, TN), Howard Davidson (Sun Microsystems, Mountain View, CA) and Peter Sandborn (MCC, Austin, TX)

This research was sponsored in part by the Advanced Research Projects Agency under ARPA order A879 and monitored by the Army Research Office under grant DAAH0494-G-0004 to the University of Tennessee.

Abstract
This paper emphasizes concurrent consideration of the partitioning of a microelectronic circuit design into multiple dies and the selection of the appropriate packaging technology for implementation of the entire system. Partitioning a large design into a multichip package is a non-trivial task. Similarly, selection of the MCM packaging technology to accommodate a multichip solution can also be puzzling. The interdependencies of these two problems afford the opportunity to achieve a global optimum when considered concurrently. In this paper we address the partitioning/MCM technology tradeoff, their interdependency, and previous work in this area. The SUN MicroSparc CPU is used as a demonstration vehicle and is partitioned for different MCM technologies. The preliminary results show that the optimum number of partitions and the contents of each partition depend heavily on the choice of MCM technology for a given application.

1 Introduction
Multi-chip modules (MCMs) have been gaining popularity and becoming more available during the past few years. The designer is now faced with a variety of MCM packaging technologies and has to understand and compare them for a given application. Conventionally, this has been performed by the package designer, mainly toward the end of the design cycle. However, to achieve a more nearly optimum system, packaging-related issues should be considered throughout the design cycle by system, IC, and package designers. About 40% of the product cost is determined by the decisions made in the first 10% of the design cycle [1]. This suggests that the choice of packaging should be explored at the early stages of the design for a more globally optimum system. Considering the critical packaging issues early in the design cycle is termed "Design for Packageability" (DFP) and is discussed in [2]. Comparisons between different MCM technologies cannot be made by just considering the physical and electrical parameters of the technologies. True comparisons should be made at the system level by understanding the impact of the different MCM technologies on the overall cost and performance of the final system.
2 Problem Definition and Motivation
As part of our research, we are trying to identify the stages in the design cycle which will benefit the most from taking advantage of the new solutions offered by MCM packaging. Figure 1 shows how partitioning drives all the system performance parameters such as system cost, size, thermal, power, packaging delay and simultaneous switching noise. Chip bonding and substrate technologies determine the physical constraints on the partitioning process. It can be seen that the choice of the packaging technology propagates through the partitioning process and then impacts system performance. Thus, there is a need to explore the effect of the various packaging technologies on system partitioning, and hence on system performance, to achieve an optimum system. The main objective of our work is to study the interaction and impact of the various bonding and substrate technology alternatives on system partitioning and performance. Partitioning an ultra-large single die into multiple smaller dies housed on an MCM is used as an example; however, this approach can be applied to a larger class of applications using multiple levels of packaging hierarchy.
[Figure 1 diagram omitted. Recoverable labels: PARTITIONING at the center, with inputs Bonding Technology, Substrate Technology, Logic, and User Defined Constraints; intermediate quantities Number of Dies, # I/O per Die, Die Size, Logic Size, Net Length, Clock Frequency, Substrate Size, Substrate Cost, Substrate Testing Cost, Die Cost, Test Cost, Defect Density, Module Assembly & Testing, Thermal Budget, and Characteristic Impedance; and system parameters System Cost, System Size, System Power Dissipation, Packaging Delay, Allowed External Thermal Resistance, and Simultaneous Switching Noise.]
Figure 1: Interaction between System Performance Parameters and Partitioning using DFP.

The impact of bonding technologies such as area array (Flip-Chip, FC) and peripheral (Wire-Bond, WB) on the die size and layout of VLSI dies is discussed in [3]. That work focused on the physical die layout level rather than the system level. A more conceptual trade-off analysis between peripheral and area array bonding in MCMs is presented in [4]. There, the design is described in terms of gate counts, and Rent's rule is used to establish the number of I/Os for a given circuit size. Our work tries to establish the interaction of the various packaging alternatives, partitioning and the system performance when more information is available about the design. The SUN MicroSparc is used as a demonstration vehicle to illustrate the extent of the interaction. The design is described at the functional unit level and consists of the following: integer unit (IU), floating point unit (FU), memory management unit (MMU), data cache (D-CACHE), instruction cache (I-CACHE), S-bus controller (S-BUS-CTL), memory interface unit (MEM-INTF), clock control and buffers (CLK-CTL), and miscellaneous control logic (MISC), as shown in Figure 2.
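As a rough illustration of the gate-count style of analysis attributed to [4], Rent's rule relates the number of I/Os T of a block to its gate count G as T = k * G^p. The sketch below is a minimal example under stated assumptions: the constants k = 2.5 and p = 0.6 are illustrative placeholders and are not taken from this paper or from [4], and the even split of gates across dies is a simplification introduced here.

```python
def rent_io_count(gates: float, k: float = 2.5, p: float = 0.6) -> float:
    """Estimate the I/O count of a block with Rent's rule, T = k * G**p.

    k and p are illustrative assumptions, not values from the paper.
    """
    if gates <= 0:
        raise ValueError("gates must be positive")
    return k * gates ** p


def total_io_after_split(gates: float, n_dies: int,
                         k: float = 2.5, p: float = 0.6) -> float:
    """Total I/O over all dies when the gates are split evenly (an assumption)."""
    if n_dies < 1:
        raise ValueError("n_dies must be at least 1")
    return n_dies * rent_io_count(gates / n_dies, k, p)


if __name__ == "__main__":
    # Hypothetical 100,000-gate design split into 1 to 4 equal dies.
    for n in (1, 2, 3, 4):
        print(n, round(total_io_after_split(100_000, n)))
```

Because p is less than 1, the total I/O count grows as the design is cut into more dies, which is one reason the bonding technology (and the I/O density it supports) interacts with the partitioning choice.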
[Figure 2 diagram omitted: block diagram of the nine functional units (IU, FU, MMU, D-CACHE, I-CACHE, S-BUS-CTL, MEM-INTF, CLK-CTL, MISC) and the signal buses connecting them; the blocks are annotated with area and, for some blocks, gate, transistor and primary I/O counts.]
Figure 2: Functional-Level Diagram of the MicroSparc.

3 Approach
The intent of this work is not to promote a new partitioning scheme, since numerous techniques have already been described [5],[6]. Our approach is to develop a framework in which various packaging/partitioning choices can be explored and evaluated concurrently. Since performing a detailed evaluation at the system level for various packaging/partitioning choices can turn into a time-consuming task, we have employed an estimation-based, early-analysis technique. At this level there are only nine functional units, so the number of possible candidates is low enough (approx. 21,000) that it is possible to search the solution space exhaustively. This guarantees the best solution will be found. At the register transfer level, where there are a large number of components, it is more appropriate to use partitioning techniques based on algorithms such as simulated annealing, which can find a good solution among several possible solutions.

The steps involved in our concurrent analysis are shown in Figure 3. The user specifies the package and die related information (i.e. bonding technology, maximum die size). The constraint generator derives the actual constraints from these user specifications. The exhaustive partitioner then generates all possible candidate partitions. The algorithm used to generate partitions is described in [7]. The die size and die I/O estimates are then calculated for each of the partitions. Next, the partitions are verified against the constraints. These constraints are used to qualify a partition for further processing. Further processing involves estimating the system performance characteristics such as system cost, system size, module power, allowed external thermal resistance and total simultaneous switching noise in the module. The MSDA (Multichip System Design Advisor) tool developed by MCC is used to estimate the system performance characteristics.

We have concurrently considered the following:
a) Wire-Bond/MCM-C
b) Wire-Bond/MCM-D
c) Wire-Bond/MCM-L
d) Flip-Chip/MCM-C
e) Flip-Chip/MCM-D
f) Flip-Chip/MCM-L
The exhaustive partitioner has generated over 21,000 partitions for each type of packaging, but only those partitions that meet the die and package constraints have been considered for analysis. The die parameters provided by the user are given in Table 1.

Table 1: Die Assumptions Provided by the User.
Property                             Value
Signal/Ground (peripheral)           4
Signal/Ground (area array)           6
Bond pad size (peripheral)           200 * 200 (microns)
Bond pad size (area array)           125 * 125 (microns)
Min Bond Pad pitch (peripheral)      200 (microns)
Min Bond Pad pitch (area array)      250 (microns)
Wafer Diameter                       6 inches
Unusable Wafer Border                0.4 inches
Wafer defect density                 3 defects per sq. inch
Processed Wafer cost                 $800
Wafer Bumping cost                   $200
Defects due to Wafer Bumping         0.2 defects per inch
Die Test cost                        $0.01 per I/O

4 Results
The result of the cost analysis is shown in Figure 4(a), which displays those partitions satisfying the given constraints with the lowest system cost for a particular number of dies in the die set. The FC/MCM-D design offers the lowest overall system cost for implementing this particular application in an MCM. The system cost comprises the die, bonding, substrate and assembly costs. The substrate and assembly cost estimates used in this analysis are discussed in [4]. The flip-chip design offers a higher I/O count and takes full advantage of the higher interconnect density of MCM-D. The combination of these two choices reduces the die area (and hence the die cost) considerably compared to the conventional peripheral wire-bond design. It should be noted that FC/MCM-D is not highly sensitive to the number of chips in the partition. The multichip designs implemented in WB/MCM-C and WB/MCM-D exhibit the highest overall system cost. The lower I/O count offered by the peripheral wire-bond design results in a larger die area, which in turn results in reduced yield and higher die cost.
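The link between die area, yield and die cost can be sketched with a simple model driven by the wafer assumptions of Table 1 (6 inch wafer, 0.4 inch unusable border, 3 defects per sq. inch, $800 processed wafer cost). This is a hedged illustration only: the Poisson yield model, the area-ratio estimate of dies per wafer, the reading of the unusable border as a radial width, and the example die areas are all assumptions made here. The paper does not state which models MSDA uses, and the numbers below are not meant to reproduce its results.

```python
import math


def dies_per_wafer(die_area_in2: float, wafer_diameter_in: float = 6.0,
                   border_in: float = 0.4) -> int:
    """Gross dies per wafer as usable area / die area (ignores edge losses)."""
    usable_radius = wafer_diameter_in / 2.0 - border_in
    usable_area = math.pi * usable_radius ** 2
    return math.floor(usable_area / die_area_in2)


def poisson_yield(die_area_in2: float, defects_per_in2: float = 3.0) -> float:
    """Poisson yield model Y = exp(-D * A); the model choice is an assumption."""
    return math.exp(-defects_per_in2 * die_area_in2)


def cost_per_good_die(die_area_in2: float, wafer_cost: float = 800.0) -> float:
    """Processed wafer cost spread over the good dies only."""
    gross = dies_per_wafer(die_area_in2)
    if gross == 0:
        raise ValueError("die does not fit on the usable wafer area")
    return wafer_cost / (gross * poisson_yield(die_area_in2))


# Hypothetical comparison: one 0.30 sq. inch die versus three 0.10 sq. inch dies.
one_large = cost_per_good_die(0.30)
three_small = 3 * cost_per_good_die(0.10)
print(f"one large die: ${one_large:.2f}, three small dies: ${three_small:.2f}")
```

Under these assumptions the yield loss grows exponentially with die area, so the same silicon is cheaper when split across several smaller dies, which is consistent with the trend described for the larger wire-bond dies.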
[Figure 3 flowchart omitted. Recoverable flow: the user inputs (package information, die information, maximum die size) feed the constraint generator; the captured design feeds the exhaustive partition generator; for each candidate partition, size, power, I/O count, number of power and ground I/Os, and I/O area are estimated; if the constraints are satisfied, the MSDA early analysis tool calculates system performance (size, cost, thermal, packaging delay, power, SSN); the loop repeats until all partitions have been generated.]
Figure 3: Block Diagram of Exhaustive Partitioning.

For this application, the die cost in the wire-bond case dominates the substrate and assembly costs. Therefore, the die set which offers the lowest system cost is the same for the wire-bond design using any of the three substrates. However, this is not true for the flip-chip designs, since their die costs are comparable to the substrate and assembly costs. Figure 4(b) shows the size of the partitions which offer the lowest system cost. The flip-chip design using MCM-D exhibits the smallest module size. This is due to the combination of the reduction in die area because of area-array bonding and the reduction in substrate size with the use of MCM-D interconnect. A measure of the simultaneous switching noise is shown in Figure 4(c). The noise data shown correspond to the partitions having the lowest system cost. The flip-chip designs have lower inductance and hence provide lower switching noise.
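The dependence of switching noise on bond inductance can be illustrated with the usual first-order delta-I estimate, V = N * L * di/dt, where N is the number of drivers sharing one ground connection. The sketch below is not the MSDA noise model; the inductance values (5 nH for a wire bond, 0.1 nH for a flip-chip bump) and the current slew rate are illustrative order-of-magnitude assumptions introduced here, not data from the paper.

```python
def ssn_voltage(signals_per_ground: float, bond_inductance_h: float,
                di_dt_a_per_s: float) -> float:
    """First-order simultaneous switching noise, V = N * L * di/dt.

    signals_per_ground: drivers that share one ground connection (N)
    bond_inductance_h:  inductance of that ground connection in henries (L)
    di_dt_a_per_s:      current slew rate of a single driver in A/s
    """
    return signals_per_ground * bond_inductance_h * di_dt_a_per_s


# Illustrative assumptions only: same signal/ground ratio and slew rate,
# different bond inductance.
DI_DT = 10e-3 / 1e-9          # 10 mA per ns
wire_bond = ssn_voltage(4, 5e-9, DI_DT)
flip_chip = ssn_voltage(4, 0.1e-9, DI_DT)
print(f"wire bond: {wire_bond * 1e3:.0f} mV, flip chip: {flip_chip * 1e3:.0f} mV")
```

With everything else held equal, the noise scales linearly with the bond inductance, which is the mechanism behind the lower switching noise reported for the flip-chip designs.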
Figure 4(d) shows the total power dissipation of the modules which have the lowest system cost. The flip-chip/MCM-D designs have higher power dissipation than the wire-bond designs since the flip-chip designs have more I/Os. The flip-chip/MCM-C design has the worst power dissipation due to the higher interconnect capacitance of the substrate. In this particular application, the power dissipation increases with the number of chips due to the increase in the number of outputs in the die set. The results from the thermal analysis are shown in Figure 4(e). The worst-case external thermal resistance of the die in the partition is heavily dependent upon the total power dissipation in the module. Higher values of external thermal resistance indicate less power dissipation inside the MCM. Thus, the external thermal resistance decreases as the number of chips in the partition increases. There are some versions of flip-chip/MCM-D where special process techniques (e.g. potting and lapping the completed assembly) result in better external thermal resistance characteristics. Figure 4(f) shows a figure-of-merit for the packaging delay of these MCM systems. The delay was computed for a length equal to the diagonal length of the module. The interconnect line was modeled as either a lumped RLC circuit or a transmission line, depending on its length. Each line was terminated, and a total of eight receivers were assumed for each driver. The delay calculations include time-of-flight, RC charging and reflections and, therefore, are a function of the dielectric constant and size of the MCM module. For the monolithic case, the delay was calculated for an interconnect signal line within the die with a length equal to the diagonal length of the die.
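The time-of-flight part of the packaging delay can be sketched directly from the module size and the dielectric constant, since a signal travels at c / sqrt(eps_r). The example below is a minimal sketch under stated assumptions: the module is taken to be square, the dielectric constants are typical textbook-style placeholders chosen here (not values from the paper), and the RC charging and reflection terms that the paper's model includes are ignored, so the results are not expected to match the reported delays.

```python
import math

C_M_PER_S = 299_792_458.0   # speed of light in vacuum
INCH_M = 0.0254             # metres per inch


def diagonal_length_m(area_in2: float) -> float:
    """Diagonal of a square module with the given area (square shape assumed)."""
    return math.sqrt(2.0 * area_in2) * INCH_M


def time_of_flight_ns(length_m: float, eps_r: float) -> float:
    """Propagation time over length_m in a medium with relative permittivity eps_r."""
    return length_m * math.sqrt(eps_r) / C_M_PER_S * 1e9


# Hypothetical (area in sq. inch, eps_r) pairs; eps_r values are assumptions.
examples = {
    "smaller module, low-k dielectric": (0.6, 3.5),
    "larger module, ceramic-like dielectric": (1.34, 9.0),
}
for name, (area, eps_r) in examples.items():
    tof = time_of_flight_ns(diagonal_length_m(area), eps_r)
    print(f"{name}: {tof:.3f} ns")
```

The sketch shows why both the module size and the dielectric constant enter the delay figure-of-merit: a smaller substrate shortens the diagonal, and a lower dielectric constant speeds up propagation along it.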
5 Summary and Conclusions
Each type of MCM technology has a different cost/performance characteristic. It is important to evaluate these technologies for the specific application at hand to obtain the best price/performance. Evaluation and selection of these technologies should not be based solely on the physical and electrical characteristics of the technology itself, but on the price/performance of the entire system, by considering the interdependency of MCM technologies and partitioning at the system level. The performance parameters of cost, size, power, thermal, simultaneous switching noise and package delay for the six different packaging alternatives are shown in Table 2. The candidate ranking was arrived at by considering an overall figure of merit of the various system performance parameters. The best partition, consisting of three dies, is shown in Table 3.

Table 2: Comparison of System Parameters for Bonding and Substrate Technologies.
Parameter                   Monolithic  WB/MCM-L  WB/MCM-C  WB/MCM-D  FC/MCM-L  FC/MCM-C  FC/MCM-D
System Cost ($)             400.05      330.70    365.17    364.94    147.46    66.18     57.45
System Size (sq. in)        0.3488      1.34      1.34      1.34      0.9       0.91      0.6
Module Power (W)            4.9         5.0579    5.1162    5.0388    5.1946    5.7227    5.6963
Ext. Therm. Res. (degC/W)   12.69       11.45     11.05     11.59     10.52     8.06      9.63
SSN                         124         410       494.17    476.03    6.85      8.07      7.77
Pkg. Delay (ns)             0.7918      1.3229    2.1792    1.3806    1.2134    1.9289    1.1459
Ranking                     -           4         6         5         2         3         1

Table 3: Contents of the Best Overall Partition.
Chip  Pins  Area (sq. mm)  Modules
1     485   49.428059      D-CACHE, I-CACHE, MMU
2     298   45.510590      FU, IU
3     414   28.451555      MEM-INTF, SBC, CLK-CTL, MISC.
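The size of the search space behind this result can be checked with a short enumeration. The sketch below uses a standard recursive set-partition generator; it is an illustration written for this purpose and is not claimed to be the algorithm of [7]. The unit names follow the functional-unit list of the MicroSparc given earlier, with S-BUS-CTL standing for the S-bus controller that Table 3 abbreviates as SBC.

```python
from typing import Iterator, List

UNITS = ["IU", "FU", "MMU", "D-CACHE", "I-CACHE",
         "S-BUS-CTL", "MEM-INTF", "CLK-CTL", "MISC"]


def set_partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every way to split items into non-empty, unordered groups (dies)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partial in set_partitions(rest):
        # Place `first` into each existing group in turn ...
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # ... or open a new group (a new die) for it.
        yield [[first]] + partial


def canonical(partition: List[List[str]]) -> frozenset:
    """Order-independent form of a partition, used for comparison."""
    return frozenset(frozenset(group) for group in partition)


all_partitions = list(set_partitions(UNITS))
by_die_count = {}
for partition in all_partitions:
    by_die_count[len(partition)] = by_die_count.get(len(partition), 0) + 1

best = [["D-CACHE", "I-CACHE", "MMU"], ["FU", "IU"],
        ["MEM-INTF", "S-BUS-CTL", "CLK-CTL", "MISC"]]

print(len(all_partitions))     # 21147 candidate partitions of nine units
print(by_die_count[3])         # 3025 of them use exactly three dies
print(canonical(best) in {canonical(p) for p in all_partitions})  # True
```

The total of 21147 is consistent with the roughly 21,000 candidates quoted for the exhaustive search, and the three-die partition of Table 3 is one of the 3025 three-die candidates.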
For this particular application, the results indicate that the overall system cost would be reduced by a factor of seven if the single-chip CPU were divided into three chips, bonded using flip-chip technology and interconnected on an MCM-D substrate. To date, the functionality of the partitions has not been considered by the partitioning tool. There is still a need for an experienced system architect to compare the results and select the best design architecture. We plan to analyze the cost/performance of different cache sizes added to the design and to perform a detailed analysis of the above candidate designs to verify the validity of the model used in the analysis. The methodology of partitioning with DFP in mind is applied here to a design described at the functional unit level. We plan to extend this concept to designs described at the behavioral and structural (RTL) levels as well.
References
[1] Kuk, D., "Examining the Impact of DFM on Product Development," Electronic Packaging and Production, Vol. 33, No. 5, pp. 36-4 to 36-7, May 1993.
[2] Dehkordi, P. and D. Bouldin, "Design for Packageability: Early Consideration of Packaging from a VLSI Designer's Viewpoint," Computer, vol. 1, pp. 76-81, April 1993.
[3] Dehkordi, P. and D. Bouldin, "Design for Packageability: The Impact of Bonding Technology on the Size and Layout of VLSI Dies," Proc. IEEE Multichip Module Conf., pp. 153-159, March 1993.
[4] Sandborn, P., Abadir, M. and C. Murphy, "The Tradeoff Between Peripheral and Area Array Bonding of Components in Multichip Modules," IEEE Trans. on Components, Packaging and Mfg. Tech. - Part A, vol. 17, no. 2, pp. 249-256, June 1994.
[5] Shih, M., Kuh, E. and R. Tsay, "Performance Driven System Partitioning on Multichip Modules," Proc. 29th Design Automation Conference, pp. 53-56, June 1992.
[6] Vemuri, Ram., Kumar, N. and Ranga Vemuri, "Two Randomized Algorithms for Multichip Partitioning Under Multiple Constraints," Tech. Report TM-ECE-DDE-94-36, Univ. of Cincinnati, 1994.
[7] Even, S., Algorithmic Combinatorics, The Macmillan Company, 1973.