
Some Performance Metrics for Heterogeneous Distributed Systems

D. Grosu
Department of Electronics and Computers
Transilvania University
Brasov, Romania

Abstract

In this paper we present several definitions of heterogeneity and their relation to other performance metrics such as speedup and efficiency. We also introduce a new metric for heterogeneous systems called efficacy. This metric is useful in system tuning studies. To validate and support the performance modeling results, we conducted a set of experimental measurements to evaluate the performance of a heterogeneous distributed system.

Keywords: Heterogeneous Distributed System, PVM, Speedup, Efficiency, Efficacy

1 Introduction

Heterogeneous computing offers new challenges and opportunities to research communities. In recent years heterogeneous computing has enjoyed unprecedented attention from researchers, government agencies, and industry. This attention is mainly due to the fact that most institutions have several LANs and high performance computers, and these machines can be used to build a heterogeneous computing environment [1].

A heterogeneous distributed system can have heterogeneity at various levels. More specifically, the heterogeneity may be: in the processors, with processors of different speeds; in the memory, with different amounts of available memory on different machines; in the network, with varying cost of communication among pairs of processors.

Heterogeneity makes the traditional performance metrics of homogeneous computing unsuitable for heterogeneous computing. In practice the main performance measures are the execution time and the speedup. These performance metrics are determined for fixed-size or scaled-size problems [2, 3]. They compare the parallel performance with the performance of a single processor node. This reference base does not exist in heterogeneous systems. The speedup is defined in several papers [4, 5] as the ratio of the time of a program running on the fastest machine to the time of the program running on the heterogeneous system. In this paper we try to relate the speedup to the metrics that quantify the system heterogeneity. We also define efficacy as a new performance measure. We have used PVM [6] as a framework to conduct some experimental measurements. The underlying system is a network of workstations that includes SPARC workstations.

The paper is organized as follows. In Section 2 we present the concepts of heterogeneous systems and develop some metrics for heterogeneity. In Section 3 we show the relation between heterogeneity and the proposed performance metrics. In Section 4 we report measurement results and conclusions.

2 Basic concepts

2.1 System's model

A distributed heterogeneous computing system (DHCS) can be modeled as a connected graph HS(M, N). $M = \{M_1, M_2, \ldots, M_n\}$ is the set of machines that form the vertices of the graph, and n is the number of machines. These machines can differ in computation power. The computation power of a machine depends on its CPU speed, I/O capabilities and amount of memory. N is a standard network. This network can include different standard networks such as Ethernet or ATM.

In order to quantify the relative computing power among the machines in the DHCS, we define for each machine $M_i$ a power weight $W_i$ for running program P on $M_i$, in two ways. The first way is to refer the speed to the speed of the fastest machine on the network [5]:

$$W_i^f = \frac{S_P^i}{\max_j S_P^j}, \quad i = 1, \ldots, n \tag{1}$$

where $S_P^i$ is the speed of machine $M_i$ when it solves P in dedicated mode. In terms of execution time the power weight becomes:

$$W_i^f = \frac{\min_j t_P^{M_j}}{t_P^{M_i}}, \quad i = 1, \ldots, n \tag{2}$$

where $t_P^{M_i}$ is the execution time of program P on machine $M_i$. Another way is to refer the speed to the slowest machine in the system:

$$W_i^s = \frac{\min_j S_P^j}{S_P^i}, \quad i = 1, \ldots, n \tag{3}$$

or, in terms of execution time:

$$W_i^s = \frac{t_P^{M_i}}{\max_j t_P^{M_j}}, \quad i = 1, \ldots, n \tag{4}$$
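The two power weight definitions can be illustrated with a minimal Python sketch. It assumes the time-based forms (2) and (4); the function name `power_weights` and the three execution times are hypothetical and are not measurements from this paper.

```python
def power_weights(times):
    """Return (W_f, W_s) for a list of dedicated execution times t_P^{M_i}.

    W_f follows eq. (2): fastest machine as reference (its weight is 1).
    W_s follows eq. (4): slowest machine as reference (its weight is 1).
    """
    if not times or any(t <= 0 for t in times):
        raise ValueError("execution times must be positive")
    t_min, t_max = min(times), max(times)
    w_fast = [t_min / t for t in times]  # eq. (2)
    w_slow = [t / t_max for t in times]  # eq. (4)
    return w_fast, w_slow


# Hypothetical execution times (seconds) of the same program on three machines.
times = [10.0, 20.0, 40.0]
w_fast, w_slow = power_weights(times)
print("W_f:", w_fast)
print("W_s:", w_slow)
```

With the fastest machine as reference the weights decrease with execution time, whereas with the slowest machine as reference they increase with it.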

2.2 Heterogeneity

There are several ways to quantify the heterogeneity of the system. We use four definitions of heterogeneity and study how changes in the system influence each of them. One way to define heterogeneity is as a standard deviation:

$$H_1 = \sqrt{\frac{\sum_{j=1}^{n} (W_{med} - W_j)^2}{n}} \tag{5}$$

Heterogeneity can also be defined as a mean absolute deviation:

$$H_2 = \frac{\sum_{j=1}^{n} |W_{med} - W_j|}{n} \tag{6}$$

where $W_{med}$ is the average power weight. Another way of defining heterogeneity is to use the power weight of the fastest or of the slowest machine as the comparison reference ($W_i = 1$):

$$H_3 = \frac{\sum_{j=1}^{n} (1 - W_j^f(P))}{n} \tag{7}$$

$$H_4 = \frac{\sum_{j=1}^{n} (1 - W_j^s(P))}{n} \tag{8}$$

In fact $H_3$ and $H_4$ can be expressed as follows:

$$H_3 = 1 - W_{med}^f \tag{9}$$

$$H_4 = 1 - W_{med}^s \tag{10}$$

where $W_{med}^f$ and $W_{med}^s$ are the averages of the weights $W_j^f$ and $W_j^s$. We define a dual metric named homogeneity:

$$H_{hom} = W_{med} \tag{11}$$

so that, with the fastest machine as the reference, $H_{hom} = 1 - H_3$.

$H_1$ and $H_2$ use the average value of the data set as the base reference for comparison when calculating the heterogeneity. Thus $H_1$ and $H_2$ cannot precisely capture the critical system changes. The most suitable way to quantify the heterogeneity is to use $H_3$, $H_4$ or $H_{hom}$. These metrics are statistically equivalent. We propose the use of $H_3$ as a metric for heterogeneity and of $H_{hom}$ as a dual metric, because they are simple and capture system changes reasonably well. However, these metrics reflect only an average balance of the power distribution.

Let us take an example. Consider a DHCS with 10 workstations in which the fastest workstation has $W^f = 1$ and each of the other 9 workstations has a power weight of 0.5. The speed of the fastest workstation is then doubled while the other workstations keep their speed, so their power weights drop to 0.25.

Initially: $W_1 = 1$, $W_i = 0.5$, $i = 2, \ldots, n$
After changes: $W_1 = 1$, $W_i = 0.25$, $i = 2, \ldots, n$

The heterogeneity metrics for our example are presented in Table 1. The change in $H_1$ and $H_2$ is less than 10% (in absolute terms), whereas the change in $H_3$, $H_4$ and $H_{hom}$ is greater than 20%. Thus the last three metrics are the most suitable to quantify the heterogeneity (homogeneity) of the system.
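The worked example can be checked numerically. The following Python sketch assumes that definitions (5), (6) and (7) sum over the n machines and that all weights are taken relative to the fastest machine. The function name `heterogeneity_metrics` is hypothetical. $H_4$ is left out because the example specifies only the weights relative to the fastest machine.

```python
import math


def heterogeneity_metrics(weights):
    """Compute H1 (eq. 5), H2 (eq. 6), H3 (eq. 7) and Hhom (eq. 11)
    from power weights given relative to the fastest machine."""
    n = len(weights)
    w_med = sum(weights) / n  # average power weight
    h1 = math.sqrt(sum((w_med - w) ** 2 for w in weights) / n)
    h2 = sum(abs(w_med - w) for w in weights) / n
    h3 = sum(1.0 - w for w in weights) / n
    return {"H1": h1, "H2": h2, "H3": h3, "Hhom": w_med}


# The 10-workstation example: one fastest machine and nine identical slower ones.
initial = [1.0] + [0.5] * 9
after = [1.0] + [0.25] * 9

m_initial = heterogeneity_metrics(initial)
m_after = heterogeneity_metrics(after)
for name, m in (("Initial", m_initial), ("After changes", m_after)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Under these assumptions the computed values of $H_1$, $H_2$ and $H_3$ match the first three columns of Table 1 up to rounding.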

Table 1: Heterogeneity

Configuration    H_1     H_2     H_3     H_4
Initial          0.15    0.09    0.45    0.68
After changes    0.23    0.14    0.68    0.45

3 Performance metrics

3.1 Speedup

One of the most frequently used performance metrics in parallel and distributed processing is speedup. It is defined as the sequential execution time over the parallel execution time. For heterogeneous distributed systems this type of speedup must be reformulated. Let us consider a heterogeneous distributed system comprising n machines. $t_P^{M_i}$ is the wall clock time required to execute code P on machine $M_i$ ($i = 1, \ldots, n$), and $t_P^H$ is the time required to execute P on the heterogeneous distributed system. In [4] the speedup is defined as:

$$S_P = \frac{\min\{t_P^{M_1}, t_P^{M_2}, \ldots, t_P^{M_n}\}}{t_P^H} \tag{12}$$

Let us assume that code P can be decomposed into tasks $P_1, P_2, \ldots, P_n$ which can execute on the machines $M_i$ in parallel. $t_P^H$ can then be reformulated as the longest time required by the individual tasks:

$$t_P^H = \max\{t_{P_1}^{M_1}, t_{P_2}^{M_2}, \ldots, t_{P_n}^{M_n}\} \tag{13}$$

where each task is assigned to the architecture best suited for it. $S_P$ becomes:

$$S_P = \frac{\min\{t_P^{M_1}, t_P^{M_2}, \ldots, t_P^{M_n}\}}{\max\{t_{P_1}^{M_1}, t_{P_2}^{M_2}, \ldots, t_{P_n}^{M_n}\}} \tag{14}$$

If

$$\max\{t_{P_1}^{M_1}, t_{P_2}^{M_2}, \ldots, t_{P_n}^{M_n}\} \ll \min\{t_P^{M_1}, t_P^{M_2}, \ldots, t_P^{M_n}\} \tag{15}$$

then the speedup of the heterogeneous distributed computing system is superlinear.

We will try to relate the speedup of a heterogeneous system to its heterogeneity and homogeneity. For this we need one additional metric, the parallelism degree. The parallelism degree was defined in [5] as:

$$P_{deg}(P) = \frac{\sum_{i=1}^{n} t_i^a}{t_P^H} \tag{16}$$

where $t_i^a$ is the active computation time on workstation i and $t_P^H$ is the total time across the heterogeneous system. The speedup has the following property:

$$S_P = H_{hom} \cdot P_{deg} \tag{17}$$

$$S_P = (1 - H_3) \cdot P_{deg} \tag{18}$$

To demonstrate this we start from the definition of speedup. Zhang in [5] demonstrates that $S_P = (1 - H_3) \cdot P_{deg}$, and because $H_{hom} = 1 - H_3$ the speedup becomes $S_P = (1 - H_3) \cdot P_{deg} = H_{hom} \cdot P_{deg}$.

The speedup decreases if the heterogeneity $H_3$ increases, and increases if the homogeneity increases, assuming $P_{deg}$ remains unchanged.
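The relation between the speedup definition (12) and the property $S_P = H_{hom} \cdot P_{deg}$ can be illustrated with a small Python sketch. It assumes a hypothetical two-machine system, a perfectly divisible workload split in proportion to machine speed, and no communication overhead. The speeds, the amount of work and all function names are illustrative assumptions, not taken from the experiments.

```python
def speedup(single_times, t_hetero):
    """Eq. (12): best single-machine time over the heterogeneous time."""
    return min(single_times) / t_hetero


def parallelism_degree(active_times, t_hetero):
    """Eq. (16): total active computation time over the heterogeneous time."""
    return sum(active_times) / t_hetero


# Hypothetical system: two machines with speeds 2 and 1 work units per second.
speeds = [2.0, 1.0]
work = 6.0  # total work units of program P (assumed divisible)

# Time of the whole program on each machine alone.
single_times = [work / s for s in speeds]

# Split the work in proportion to speed, so both machines finish together.
shares = [work * s / sum(speeds) for s in speeds]
task_times = [share / s for share, s in zip(shares, speeds)]
t_hetero = max(task_times)  # eq. (13), communication ignored

s_direct = speedup(single_times, t_hetero)

# The same speedup obtained from homogeneity and parallelism degree, eq. (17).
weights = [s / max(speeds) for s in speeds]  # eq. (1)
h_hom = sum(weights) / len(weights)          # eq. (11)
p_deg = parallelism_degree(task_times, t_hetero)
s_metric = h_hom * p_deg

print("speedup from eq. (12):", s_direct)
print("speedup from eq. (17):", s_metric)
```

In this idealized case both expressions give the same value, which equals the sum of the power weights. With communication or load imbalance the active times shrink relative to $t_P^H$, and $P_{deg}$ drops accordingly.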

Table 2: Power weights of the three types of workstation

                   SPARC10    SPARC20    SPARCclassic
Matrix Multiply    1          0.64       0.36

3.2 Efficiency and efficacy

In homogeneous parallel computing the efficiency $E_P$ is a measure of the fraction of the time the processors are busy. The classical definition of efficiency is:

$$E_P = \frac{S_P}{n} \tag{19}$$

where $S_P$ is the speedup and n is the number of processors. For heterogeneous systems we use the $E_P$ defined by Zhang in [5] as the ratio of the total effective computing time to the total available cycle time in the system:

$$E_P = \frac{\sum_{i=1}^{n} W_i \, \dfrac{size(P_i)}{S_P^i}}{\sum_{i=1}^{n} (t_P^H - t_i^{own}) \, W_i} \tag{20}$$

where $size(P_i)$ is the size of task $P_i$ of the application and $t_i^{own}$ is the time machine $M_i$ spends executing the owner workload.

We propose a new metric called efficacy. This metric reflects how well the processors are used. The efficacy $EF_P$ is defined as follows:

$$EF_P = \frac{E_P}{\sum_{i=1}^{n} W_i} \tag{21}$$

$$EF_P = \frac{E_P}{n \cdot H_{hom}} \tag{22}$$

$EF_P$ increases to a maximum and then decreases. The configuration that attains the maximum $EF_P$ is the one for which the ratio of the benefit (increase in speedup) to the cost (decrease of efficiency) is maximal. The characterization of systems through their optimal configuration is useful in design studies or in system tuning studies.
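A minimal Python sketch of these definitions follows. It rests on two assumptions about the formulas. First, (20) is read as the weighted computing time $W_i \cdot size(P_i)/S_P^i$ summed over the machines, divided by the weighted available time $(t_P^H - t_i^{own}) \cdot W_i$. Second, (21) is read as $E_P$ divided by the sum of the power weights. The input values and function names are hypothetical.

```python
def heterogeneous_efficiency(weights, compute_times, t_hetero, owner_times):
    """Eq. (20) as read above.

    compute_times[i] stands for size(P_i) / S_P^i, the computing time of
    task P_i on machine M_i; owner_times[i] is the owner workload time.
    """
    useful = sum(w * t for w, t in zip(weights, compute_times))
    available = sum(w * (t_hetero - t_own)
                    for w, t_own in zip(weights, owner_times))
    return useful / available


def efficacy(e_p, weights):
    """Eq. (21) as read above: efficiency over total power weight."""
    return e_p / sum(weights)


# Hypothetical two-machine run: the slower machine is busy only half the time.
weights = [1.0, 0.5]
compute_times = [2.0, 1.0]  # seconds of useful computation per machine
owner_times = [0.0, 0.0]    # no owner workload in this example
t_hetero = 2.0              # wall clock time of the heterogeneous run

e_p = heterogeneous_efficiency(weights, compute_times, t_hetero, owner_times)
ef_p = efficacy(e_p, weights)

# Eq. (22): the same value written with n and the homogeneity Hhom.
n = len(weights)
h_hom = sum(weights) / n
ef_p_alt = e_p / (n * h_hom)

print("efficiency:", e_p)
print("efficacy (21):", ef_p)
print("efficacy (22):", ef_p_alt)
```

The two efficacy expressions agree because the sum of the power weights equals $n \cdot W_{med} = n \cdot H_{hom}$.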

4 Experimental results and conclusions

4.1 Experimental system

We have conducted experiments on a heterogeneous distributed system consisting of four SPARC 20 workstations, one SPARC 10 and four SPARCclassic workstations connected by a 10 Mb/s Ethernet network. We have used matrix multiplication (600 x 600 matrices) implemented with PVM [6] as the application program for studying the performance metrics. In our experiments the execution time was measured for different system sizes. The power weights of the three types of workstations for matrix multiplication are presented in Table 2.

Fig. 1 Speedup vs. power weight (vertical axis: speedup; horizontal axis: weight)

Fig. 2 Efficiency and efficacy vs. power weight (two panels, efficacy and efficiency; horizontal axis: weight)

4.2 Experimental results

We have scaled the system power by increasing the number of workstations from 2 to 9. Initially the system had one SPARC 10 and one SPARC 20 workstation. Then four SPARCclassic workstations were added, and finally three SPARC 20 workstations, which gives the nine-machine system described in Section 4.1. Using this scaling, we measured the speedup, efficiency and efficacy. The performance results are shown in Fig. 1 and 2. We notice that matrix multiplication reaches its maximal speedup on 6 workstations and its maximal efficacy on 3 workstations. Usually the maxima of efficacy and speedup are obtained for different values of the total power weight. Thus we can conclude that the size of the system is optimal when the system comprises 3 workstations. This means that the benefit of allocating more than 3 workstations (with the considered problem size) is less than the cost associated with the additional machines. The proposed metric (efficacy) can be used for selecting the optimal configuration of the system. In our case the optimal configuration comprises 3 workstations.
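The total power weight of each configuration in the scaling sequence can be tabulated from Table 2. The Python sketch below assumes that the machines are added in the order given in the text and that the last three machines added are SPARC 20 workstations, which matches the inventory of Section 4.1. The variable names are hypothetical.

```python
# Power weights for matrix multiply, taken from Table 2.
TABLE_2 = {"SPARC10": 1.0, "SPARC20": 0.64, "SPARCclassic": 0.36}

# Assumed order in which workstations join the system (see the lead-in).
sequence = (["SPARC10", "SPARC20"]
            + ["SPARCclassic"] * 4
            + ["SPARC20"] * 3)

# totals[k] is the total power weight of the first k + 1 workstations.
totals = []
running = 0.0
for machine in sequence:
    running += TABLE_2[machine]
    totals.append(running)

# The experiments start from the two-workstation configuration.
for n in range(2, len(sequence) + 1):
    print(f"{n} workstations: total power weight {totals[n - 1]:.2f}")
```

Under these assumptions the 3-workstation configuration, where efficacy peaks, has a total power weight of 2.0. The 6-workstation configuration, where speedup peaks, has 3.08, and the full system has 5.0.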

4.3 Conclusions

In this paper we characterize the performance of heterogeneous computing systems and propose efficacy as a metric useful for possible optimization. We have shown that the heterogeneity metrics and the homogeneity are effective concepts for describing a heterogeneous distributed system quantitatively. We are further investigating the following important issues:

- developing a theoretical model for performance prediction of heterogeneous distributed systems;
- developing a tool for parallel performance measurement in heterogeneous distributed systems.

References

[1] A. A. Khokhar, V. K. Prasanna, M. E. Shaaban, C. L. Wang. Heterogeneous Computing: Challenges and Opportunities. IEEE Computer, 26(6):18-27, June 1993.
[2] A. L. Couch. Locating Performance Problems in Massively Parallel Executions. Proceedings of the IEEE, 81(8):1116-1126, Aug. 1993.
[3] M. Calzarossa, G. Serazzi. Workload Characterization: A Survey. Proceedings of the IEEE, 81(8):1136-1150, Aug. 1993.
[4] C. R. Mechoso, J. D. Farrara, J. A. Spahr. Achieving Superlinear Speedup on a Heterogeneous, Distributed System. IEEE Parallel & Distributed Technology, 2(2):57-61, Summer 1994.
[5] X. Zhang, Y. Yan. Modeling and Characterizing Parallel Computing Performance on Heterogeneous Networks of Workstations. Proceedings of the Seventh IEEE Symposium on Parallel and Distributed Processing, 25-34, October 1995.
[6] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, V. Sunderam. PVM 3 User's Guide and Reference Manual. Technical Report ORNL/TM-12187, Oak Ridge National Laboratory, Oak Ridge, Tennessee, May 1993.