How to Make LS-DYNA Run Faster

4th European LS-DYNA Users Conference

MPP / Linux Cluster / Hardware II

IBM Deep Computing Group

Guangye Li ([email protected])
Jeff Zais ([email protected])
IBM Deep Computing Team

May, 2003 | IBM Deep Computing Group

© 2002 IBM Corporation

IBM Deep Computing Team – 2003 LS-DYNA Conference

Topics

pSeries POWER4 Performance Topics
- Recent SMP Optimization
- Effect of Parallel Repeatability Flag
- Effect of Parallel Force Assembly
- Version 970 Tuning

xSeries IA-32 Xeon Performance Topics
- Faster Processors
- Faster Frontside Bus
- Version 970 Tuning
- Interconnect Options

Comparisons and Summary


IBM pSeries Performance

- POWER4 and AIX product line
- Clusters of individual SMP nodes
- SP Switch 2 high performance interconnect
- Individual nodes range up to an SMP of 32 processors
- Entire product line in transition from POWER4 to POWER4+ processor
- Interconnect Option: Gigabit Ethernet


Recent Optimization of version 960 SMP LS-DYNA

[Figure: elapsed time (sec) on 4, 8, 16, and 32 CPUs, comparing Revision 1488 with Revision 1647. p690, Dec 2002; para flag on; repeatability flag on; refined Neon model, 535k elements.]


Improved Performance from Use of the PARA Flag

[Figure: elapsed time (sec) on 2, 4, 8, 16, and 32 CPUs, comparing para=0 with para=1. p690, Dec 2002; repeatability flag on; refined Neon model, 535k elements.]


Effect of the Repeatability Flag

[Figure: elapsed time (sec) on 1, 2, 4, 8, 16, and 32 CPUs, comparing repeatability on with repeatability off. p690, Dec 2002; para flag on; refined Neon model, 535k elements.]


Recent MPI LS-DYNA Optimization

[Figure: elapsed time (sec) on 4, 8, 16, and 32 CPUs, before tuning and after tuning. p655, Jan 2003; version 970 revision 3535; refined Neon model, 535k elements.]


Comparison of v960 and v970 Performance

[Figure: elapsed time (sec) on 4, 8, 16, and 32 CPUs, comparing v960 r1647, v970 r3535, and v970 r3535 tuned. p655, Jan 2003; MPI LS-DYNA; refined Neon model, 535k elements.]


IBM xSeries Performance

- Linux clusters
- One or two processor nodes (Intel IA-32 Xeon)
- Interconnect Options: Gigabit Ethernet or Myrinet
- Several decisions regarding LS-DYNA (LAM/MPI, MPICH, …)


Interconnect – Effect on Performance

[Figure: elapsed time (sec) on 2, 4, 8, 16, and 32 CPUs, comparing Fast Ethernet, Gigabit Ethernet, and Myrinet. 2.2 GHz IntelliStation cluster, June 2002; MPI LS-DYNA; refined Neon model, 535k elements.]


Performance Improvement with Version 970

[Figure: elapsed time (sec) on 2, 4, 8, 16, and 32 CPUs, comparing version 960 with version 970. 2.8 GHz x335 cluster, Gigabit Ethernet, March 2003; LAM/MPI LS-DYNA; refined Neon model, 535k elements.]


Performance Improvement with Faster Processors

[Figure: elapsed time (sec) on 2, 4, 8, 16, 32, and 64 CPUs, comparing 2.4 GHz with 2.8 GHz processors. V960 r1488 LS-DYNA, Gigabit Ethernet, Jan-March 2003; LAM/MPI; refined Neon model, 535k elements.]


Speedup from Faster 533 MHz Frontside Bus

Model Size (elements)    Speedup: 400 MHz to 533 MHz Frontside Bus
 12000                   1.10
 32000                   1.08
155000                   1.20
430000                   1.18

V960 r1488 LS-DYNA, March 2003, LAM/MPI, 2.8 GHz x335 node, 2-processor runs
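
The frontside-bus speedups on this slide can also be read as a reduction in elapsed time. The sketch below assumes that "speedup" means the ratio of elapsed time at 400 MHz to elapsed time at 533 MHz (the slide does not define it), in which case the fraction of time saved is 1 - 1/speedup. The function names are illustrative only; the speedup values are the four listed on the slide.

```python
# Speedup factors copied from the frontside-bus slide (400 MHz -> 533 MHz),
# keyed by model size in elements.
FSB_SPEEDUP = {12_000: 1.10, 32_000: 1.08, 155_000: 1.20, 430_000: 1.18}


def time_saved_fraction(speedup: float) -> float:
    """Fraction of elapsed time removed by a speedup.

    Assumption: speedup = t_old / t_new, so t_new = t_old / speedup.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


def saved_percent_table(speedups: dict) -> dict:
    """Map each model size to the percentage of elapsed time saved."""
    return {size: round(100.0 * time_saved_fraction(s), 1)
            for size, s in speedups.items()}


if __name__ == "__main__":
    for elements, pct in saved_percent_table(FSB_SPEEDUP).items():
        print(f"{elements:>7} elements: {pct:.1f}% less elapsed time")
```

Under that assumption, the two larger models (155000 and 430000 elements) save roughly twice the fraction of elapsed time that the two smaller models do.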


Configuring Each Node with One Processor

[Figure: elapsed time (sec) on 2, 4, 8, 16, 32, and 64 CPUs, comparing 2 CPUs per node with 1 CPU per node. V960 r1488 LS-DYNA, Gigabit Ethernet, x335 2.8 GHz, March 2003; LAM/MPI; front crash model, 430k elements.]


POWER4 and IA-32 Xeon Performance Compared

[Figure: elapsed time (sec) on 2, 4, 8, 16, and 32 CPUs, comparing 2.4 GHz Xeon + Myrinet with 1.3 GHz POWER4 p655. V960 LS-DYNA, Jan 2003; refined Neon model, 535k elements.]


Interconnect Performance Compared

[Figure: parallel speedup on 2, 4, 8, 16, and 32 CPUs, comparing x335 + Fast Ethernet, x335 + Gigabit Ethernet, x335 + Myrinet, and p655 + SP Switch2. V960 LS-DYNA, Jan 2003; refined Neon model, 535k elements.]
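
Parallel speedup, the quantity plotted on this slide, is normally derived from elapsed times. The sketch below shows one common way to compute it, together with parallel efficiency. It assumes speedup is measured against the smallest CPU count available (the slide does not state its baseline), and the timings in the example are hypothetical placeholders, not values read from the chart.

```python
def parallel_speedup(elapsed_by_cpus: dict, baseline_cpus: int = None) -> dict:
    """Return {cpus: (speedup, efficiency)} relative to a baseline run.

    speedup    = T_baseline / T_cpus
    efficiency = speedup * baseline_cpus / cpus  (1.0 means ideal scaling)

    Assumption: the baseline defaults to the smallest CPU count measured.
    """
    if not elapsed_by_cpus:
        raise ValueError("need at least one timing")
    base = min(elapsed_by_cpus) if baseline_cpus is None else baseline_cpus
    t_base = elapsed_by_cpus[base]
    result = {}
    for cpus in sorted(elapsed_by_cpus):
        speedup = t_base / elapsed_by_cpus[cpus]
        efficiency = speedup * base / cpus
        result[cpus] = (speedup, efficiency)
    return result


if __name__ == "__main__":
    # Hypothetical elapsed times in seconds, for illustration only.
    timings = {1: 32000.0, 2: 16000.0, 4: 8000.0, 8: 5000.0}
    for cpus, (s, e) in parallel_speedup(timings).items():
        print(f"{cpus:>2} CPUs: speedup {s:.2f}, efficiency {e:.0%}")
```

Plotting speedup rather than elapsed time makes interconnects easier to compare: a curve that flattens as CPUs are added shows communication cost growing relative to computation.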


Summary

- IBM continues to work with LSTC on improving the performance of LS-DYNA
- IBM pSeries still provides top performance and the advantages of the AIX user environment
- IBM xSeries platforms offer very cost-effective Linux cluster solutions for LS-DYNA customers
- Users today can customize their system in order to pick the features which serve them best:
  - Processors
  - Operating system
  - Interconnect
