MASSIVELY PARALLEL PROCESSING FOR FAST AND ACCURATE STAMPING SIMULATIONS

Jeffrey J. Gress, Siguang Xu, Ramesh Joshi, Chuan-tao Wang, Sabu Paul*

General Motors Corp., Manufacturing Engineering Die Center, 2000 Centerpointe, Pontiac, MI 48341
* Electronic Data Systems, Manufacturing Engineering Die Center, 2000 Centerpointe, Pontiac, MI 48341

Abstract. The competitive automotive market drives automotive manufacturers to speed up vehicle development cycles and reduce lead-time. Fast tooling development is one of the key areas supporting fast and short vehicle development programs (VDPs). Over the past ten years, stamping simulation has become the most effective validation tool for predicting and resolving potential formability and quality problems before the dies are physically made. Stamping simulation and formability analysis have become a critical business segment in GM's math-based die engineering process. As simulation has become one of the major production tools in the engineering factory, simulation speed and accuracy are two of the most important measures of stamping simulation technology. The speed and time-in-system of forming analysis become even more critical in supporting fast VDPs and tooling readiness. Since 1997, the General Motors Die Center has been working jointly with our software vendor to develop and implement a parallel version of the simulation software for mass production analysis applications. By 2001, this technology had matured in the form of distributed memory processing (DMP) of draw die simulations in a networked, distributed-memory computing environment. In 2004, the technology was refined to massively parallel processing (MPP) and extended to line die forming analysis (draw, trim, flange, and associated spring-back) running on a dedicated computing environment. The evolution of this technology, the insight gained through the implementation of DMP/MPP technology, and performance benchmarks are discussed in this paper.

INTRODUCTION

Beginning in the early 1990's, GM has been committed to the application of stamping simulation to engineer formability and robustness into its stamping dies. In addition to the engineering benefits, this stamping technology has dramatically reduced the lead time required to engineer and try out the dies.

In the early 1990’s, the longest lead time part was the body side outer draw die, which took literally several days for a single stamping simulation to complete. In 1997, GM Die Center began its foray into the parallel computing world in an attempt to reduce the analysis lead times of these largest parts to within a single day. Today, with a matured parallel stamping technology product, a draw analysis of a whole body side outer is accomplished well within a few hours.

However, with ever-shortening design cycles, the lead-time is being reduced further, and the new target for achieving total manufacturability and quality demands predicting and resolving the surface quality and spring-back problems originating not only from the draw die but from the line dies as well.

This new target (i.e., line die formability and quality) is now the new benchmark at GM Die Center for further reduction in lead-time through MPP stamping technology.

A LOOK BACK…

In the 'old days' of stamping simulation, the computational world at GM had transitioned from centralized computing (i.e., CRAY supercomputing) to distributed computing via networked UNIX workstations. At that time (1997), approximately 200 engineering and CAD workstations were available to be utilized in a moderately sized distributed computing environment. Hence, a distributed memory processing (DMP) form of parallel computing made perfect sense from an infrastructure cost point of view.

However, DMP computing in a general production environment presented its own set of problems. First, DMP computing resources are shared, not dedicated. Secondly, the networks are utilized for many different forms of communication, particularly served applications and served file systems. Finally, workstation resources are shared by multiple users, whose various applications put heavy demands on memory and CPU resources. The task of developing and implementing a DMP stamping simulation technology within such a complex production computing environment at GM Manufacturing Engineering Die Center was therefore quite daunting. To underscore this point, consider that nearly five years elapsed (1997-2002) before the technology was fully matured.

FAST FORWARD TO TODAY…

Today, the computational world at GM Die Center is transitioning back to a centralized computing environment. With the advent of high-end PCs costing nearly half as much as UNIX workstations, CAD UNIX workstations are rapidly being replaced across the corporation. In addition, with the centralization of all manufacturing engineering at a single site, and with the infrastructure of terabyte networks, all applications and heavy computing are now served from a single location. With this changing computational landscape, DMP stamping technology is no longer as viable or as cost-effective as MPP. And, with the availability of high-end multi-node UNIX processors, massively parallel processing is now the 'in' technology.

Today, the DMP stamping technology of yesterday, running on a cluster of 200 UNIX workstations, has been replaced by a centralized LAN of several 8-node UNIX multi-processors running an MPP form of the stamping simulation software.

MPP OR DMP - WHAT'S THE BUZZ?

Two of the most used, abused (and confused!) acronyms in parallel computing are MPP and DMP. Between the two, the primary difference is the communication link between the processors. In MPP, or Massively Parallel Processing, the communication link is the hardware bus. In DMP, or Distributed Memory Processing, the communication link is the network, consisting of network interface cards (NICs), high-speed cabling, switches, and gateways.

Whether DMP or MPP, the components of this stamping technology consist of a parallel equation solver, a message passing interface, a communication link, CPUs, and a job scheduler (see Figure 1). Each of these components is discussed below.

PARALLEL EQUATION SOLVER

The parallel equation solver is the heart of the parallel computing environment. In a typical single-processor (i.e., sequential) explicit finite element application (e.g., PAMSTAMP V2000), Newton's second law is applied in a forward-stepping algorithm to analyze a single analysis domain. In the parallel computing environment, the equation solver is parallelized to process the same analysis domain in a distinctly different way.
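For reference, the forward-stepping algorithm in a typical explicit code is the central-difference update of Newton's second law; the sketch below is the generic textbook scheme, not necessarily the exact formulation implemented in PAMSTAMP:

\[
\mathbf{a}^{n} = \mathbf{M}^{-1}\!\left(\mathbf{F}_{ext}^{\,n} - \mathbf{F}_{int}^{\,n}\right), \qquad
\mathbf{v}^{n+1/2} = \mathbf{v}^{n-1/2} + \mathbf{a}^{n}\,\Delta t, \qquad
\mathbf{x}^{n+1} = \mathbf{x}^{n} + \mathbf{v}^{n+1/2}\,\Delta t .
\]

Here M is the (lumped) mass matrix and F_ext, F_int are the external and internal nodal force vectors; in the sequential case, a single process advances these updates for the entire domain.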

In the parallel computing environment, the analysis domain is sub-divided into smaller analysis domains, called sub-domains. The process of sub-dividing the analysis domain is called domain decomposition. Several domain decomposition strategies exist; the simplest involves sub-divisions along a principal direction (see Figure 2). Slicing a loaf of bread is a good analogy for this approach.
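The following sketch illustrates the "loaf of bread" strategy described above: elements are binned into sub-domains of equal width along one principal direction. It is an illustrative toy only (the data structure and function names are hypothetical), not code from the production solver.

```c
/* Illustrative "loaf of bread" decomposition: bin elements into nsub
 * slices of equal width along the x direction. */
typedef struct { double xc; } Element;   /* element centroid x-coordinate */

void decompose_along_x(const Element *elem, int n, int nsub, int *subdomain)
{
    double xmin = elem[0].xc, xmax = elem[0].xc;
    for (int i = 1; i < n; ++i) {
        if (elem[i].xc < xmin) xmin = elem[i].xc;
        if (elem[i].xc > xmax) xmax = elem[i].xc;
    }
    double width = (xmax - xmin) / nsub;
    if (width <= 0.0) width = 1.0;        /* degenerate case: one slice */
    for (int i = 0; i < n; ++i) {
        int s = (int)((elem[i].xc - xmin) / width);
        if (s >= nsub) s = nsub - 1;      /* clamp the right-most boundary */
        subdomain[i] = s;
    }
}
```

In practice, a decomposition would also be balanced so that each slave receives a comparable number of elements.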

The paradigm of the analysis domain/sub-domain relationship is a master-slave configuration (Figure 3). Each sub-domain is treated as a slave, in which the forward-stepping algorithm is applied. The master is the overriding process that manages the slaves and controls the various parameters (e.g., time steps) used by the slaves. Communication between master and slave, and from slave to slave, is handled by the message passing interface (MPI) and inter-process control (IPC).
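As a concrete (and purely illustrative) example of this master-slave control, the sketch below shows how a master process might set the time step used by every slave in one increment: each slave reports its local stable time step, the master takes the global minimum and applies a safety factor, and the result is broadcast back. The function name and the 0.9 factor are assumptions for illustration, not taken from the production code.

```c
#include <mpi.h>

/* Negotiate the time step for one increment in a master-slave setup:
 * rank 0 acts as the master, every rank owns one sub-domain (slave). */
double negotiate_time_step(double dt_local, MPI_Comm comm)
{
    int rank;
    double dt_min = 0.0, dt_use = 0.0;

    MPI_Comm_rank(comm, &rank);

    /* Gather the global minimum of the per-sub-domain stable time steps. */
    MPI_Reduce(&dt_local, &dt_min, 1, MPI_DOUBLE, MPI_MIN, 0, comm);

    if (rank == 0)
        dt_use = 0.9 * dt_min;   /* master applies a safety factor (illustrative) */

    /* Master broadcasts the time step that every slave must use. */
    MPI_Bcast(&dt_use, 1, MPI_DOUBLE, 0, comm);
    return dt_use;
}
```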

The problems associated with a parallel equation solver involve information that must be shared among sub-domains. This includes the state variables at sub-domain boundaries, tool contact, and draw bead information. This information is processed through the message passing interface. The sequencing of the messages that are passed, as well as their volume, can greatly impact the performance of the parallel equation solver.
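A minimal sketch of such an exchange is shown below, assuming each slave packs the state variables of its shared boundary nodes into contiguous buffers and swaps them with its left and right neighbours once per increment (a one-dimensional decomposition is assumed; buffer and rank names are hypothetical).

```c
#include <mpi.h>

/* Exchange packed boundary-state buffers with the left/right neighbour
 * sub-domains.  Outer slices pass MPI_PROC_NULL for a missing neighbour. */
void exchange_boundary_state(double *send_left, double *recv_left,
                             double *send_right, double *recv_right,
                             int nvals, int left, int right, MPI_Comm comm)
{
    MPI_Sendrecv(send_left,  nvals, MPI_DOUBLE, left,  0,
                 recv_right, nvals, MPI_DOUBLE, right, 0,
                 comm, MPI_STATUS_IGNORE);
    MPI_Sendrecv(send_right, nvals, MPI_DOUBLE, right, 1,
                 recv_left,  nvals, MPI_DOUBLE, left,  1,
                 comm, MPI_STATUS_IGNORE);
}
```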

MESSAGE PASSING INTERFACE

The message passing interface is the software layer of the communication process between master and slaves, and among slaves. In parallel computing, two competing message passing interfaces are commonly in use: PVM (Parallel Virtual Machine) and MPI (Message Passing Interface). PVM is a public-domain software product that is available free of charge, while MPI is now a standardized, licensed product. At GM Die Center, PVM was used successfully, in varying degrees, in draw die simulations from 1997 to 2000, but bugs in the software combined with a lack of support ultimately led to its replacement with MPI in 2001. After overcoming some initial installation difficulties, MPI is now the only message passing interface in use at MFD-DES.

COMMUNICATION LINK

While the parallel equation solver is considered the heart of the parallel computing environment, the communication link is surely its backbone. And, as in the case of DMP, this backbone sets the upper bound on performance.

When DMP was the primary parallel environment in use at DES, from 1997 to 2003, the network was the communication link used by all applications. As in any typical engineering computing environment of the time, all applications (i.e., UG, PAMSTAMP, NETSCAPE, etc.) were served by a dedicated on-site application server. In addition, NFS auto-mounted UNIX file systems were also served by a dedicated on-site NFS server. Messages passed by a parallel equation solver application therefore had to compete daily with this network traffic.

The limiting aspect of the network is its inherent latency in communicating messages. The latency effect is exacerbated by the extent of the network that is involved in the message passing. To understand this phenomenon, consider the analogy of using a straw to drink water from a cup. The longer the straw, the harder it is to initially draw the water from the cup. This is the latency effect, the overhead of using this apparatus to perform this function. Now consider the effect of replacing the straw with a garden hose to do the same thing, and it becomes clear how latency can actually overwhelm the ability to pass information across an expansive network.

While increasing the bandwidth can increase the potential speed and volume of the communication messages being processed, it can also have detrimental effects on latency. Returning to the straw analogy, consider replacing the straw with a drain pipe of the same length, and again the latency effect is apparent.

So, in a DMP parallel environment, increasing the number of slaves performing the computations will ultimately lead to a degradation in performance. This degradation typically varies with the size of the problem: the larger the problem, the larger the number of slaves required before the degradation sets in. By comparison, the hardware bus in an MPP environment is approximately a thousand times faster than the communication speed of a typical 100BASE-T (i.e., 100 Mb/sec) network. As a result, the hardware bus has virtually no latency to impact the performance of the parallel equation solver.
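This behavior can be summarized with the usual simplified communication-time model; the symbols below are generic and are offered only as a reading aid for the straw analogy, not as measured data:

\[
t_{msg} \approx \alpha + \frac{n}{\beta},
\]

where \(\alpha\) is the per-message latency (the length of the straw), \(\beta\) is the bandwidth (its diameter), and \(n\) is the message size. Increasing the bandwidth shrinks only the second term; the \(\alpha\) term, paid on every one of the many small messages exchanged per time increment, is what ultimately limits DMP scaling, and it is this term that the roughly thousand-fold faster MPP hardware bus effectively removes.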

CENTRAL PROCESSING UNITS (CPU)

CPUs are the cogs that keep the whole system turning; they are the worker bees of the hive. Whether DMP or MPP, CPUs are what make parallel computing what it is today.

In 1997, we started with a mix of HP and SUN workstations running HP-UX and SunOS, respectively. At that time, processor speeds were 300 MHz and memory was 0.5 GB per processor. By the end of 2001, we had a DMP environment of approximately 200 SUN BLADE 1000s (dual processor) rated at 900 MHz per processor, with 2 GB of memory per processor. Today, one can have a centralized LAN of IBM P655 8-node multi-processors rated at 1.7 GHz with 16 GB of memory (2 GB per processor). How quickly technology changes the computational landscape!

JOB SCHEDULER

Parallel stamping simulations can consume large numbers of processors and large amounts of processor memory. A parallel stamping simulation can use as few as 2 processors or as many as 8. Keeping track of all these resources around the clock can be mind-boggling.

In 1997, when the computational paradigm shifted from centralized to distributed, GM Die Center invested in a job scheduler called LSF (Load Sharing Facility) by Platform Computing. This software is used to create arrangements of processors, called clusters. Each node (processor) in a cluster is a host, to which certain resources are attributed (such as CPU time, memory, etc.). This software technology is very efficient at handling the resources of hundreds of processors.

As computational landscapes change, as they have, LSF accommodates the changes transparently to the end user. Today, LSF is used to manage multiple clusters of processors: the off-site multi-CPU MPP HPC LAN, and the on-site DMP network of UNIX workstations.

PERFORMANCE

The performance of this type of technology is measured in terms of accuracy and completion time relative to a benchmark. When DMP parallel computing matured in 2001-2002, extensive benchmarking was performed to verify the robustness of the DMP installation. For this benchmark, the sequential PAMSTAMP V2000 version was used.

To assess accuracy, comparisons of strains, thinning, and draw-in measurements were made between MPI-PAMSTAMP and PAMSTAMP V2000. These comparisons showed good agreement between the two software products. To measure the performance in terms of reduced completion times for a DMP application, the speed-up factor is used. The speed-up factor is the ratio of the elapsed time of the benchmark to the elapsed time of the MPP application. For both DMP and MPP applications, the speed-up factor varies with the number of processors used to run the parallel application. Speed-up factors obtained for simulations of larger draw dies at GM Die Center using MPI-PAMSTAMP are summarized in Table 1.

TABLE 1. MPI-PAMSTAMP PERFORMANCE SPEED-UP FACTORS FOR TYPICAL DRAW SIMULATIONS

STAMPING SIMULATION    NUMBER OF PROCESSORS    ELAPSED TIME (HOURS)    SPEED-UP FACTOR
FENDER OUTER           5                       1.77                    3.25
                       7                       1.35                    4.29
                       9                       1.28                    4.53
                       11                      1.22                    4.73
BODY SIDE OUTER        5                       10.56                   3.46
                       7                       7.99                    4.57
                       9                       6.02                    6.06
                       10                      5.63                    6.48
DOOR FRAME             5                       10.43                   2.75
                       -                       7.40                    3.88
                       -                       8.04                    3.57
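To make the table concrete, the speed-up factor for a run on p processors is simply

\[
S_p = \frac{T_{benchmark}}{T_p},
\]

so, for example, the fender outer on 5 processors (elapsed time 1.77 hours, speed-up 3.25) implies a sequential benchmark time of roughly \(3.25 \times 1.77 \approx 5.75\) hours; the benchmark times themselves are not listed in the table, and this back-calculation is shown only to illustrate how the columns relate.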

As is evident from the Elapsed Time column of this table, our largest draw panel is indeed the body side outer draw panel. With the matured DMP stamping simulation technology, our goal of completing a stamping simulation in under 8 hours was achieved as early as 2001 (see Figure 4).

Today, a body side outer draw simulation is completed in several hours. A fender line die simulation (see Figure 5), which used to take on the order of weeks to complete, can now be achieved in several days.

SUMMARY

Parallel computing is now at the forefront of all line die stamping simulations at GM. Through a thorough and comprehensive approach, GM Die Center has successfully advanced the state of the art of parallel computing through the experience gained in developing DMP and MPP parallel computing environments for stamping simulations. DMP stamping simulations with MPI-PAMSTAMP show the benefits of this technology through a three- to six-fold reduction in simulation time. Today, GM continues to push the envelope forward, applying the MPP form of this technology to shorten the lead time for line die engineering and validations.

Figure 1. Hardware and software layers for DMP and MPP environments.

Figure 2. Domain decomposition of a typical stamping analysis domain (door frame).

Figure 3. Analysis sub-domains overlaid onto the processing layer for a door frame.

Figure 4. Body side outer speed-up curve.

Figure 5. Line die forming simulations for a fender.
