Hybrid Resource Allocation Method for Grid Computing

Second International Conference on Computer Research and Development

Syed Nasir Mehmood Shah, Ahmad Kamil Bin Mahmood and Alan Oxley
Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, 31750 Tronoh, Perak, Malaysia

978-0-7695-4043-6/10 $26.00 © 2010 IEEE. DOI 10.1109/ICCRD.2010.86

Abstract: The resource management system is the ‘brain’ of a Grid. It manages the shared resources of the Grid and maps user jobs to those resources. Grid resource allocation is one of the critical functions affecting the performance of a Grid, because the number of jobs and the amount of required resources are massive and quick responses to users are necessary in a real Grid computing environment. Many methods have been developed for Grid resource allocation. In this paper, we propose the Hybrid Resource Allocation method, based on the Least Cost Method (LCM) and the Divisible Load Theory (DLT) method. The Hybrid Resource Allocation method is an improved form of the DLT method: we have developed a modified assignment strategy for DLT by integrating it with LCM. We describe and evaluate the Hybrid Resource Allocation method for a Grid resource allocation scenario. This paper also proposes a Grid scheduling model and includes a comparative performance analysis of the proposed Hybrid Resource Allocation method against existing methods.

Keywords: distributed systems, Grid computing, Grid scheduling, load balancing, task synchronization, transportation problems, linear programming, parallel processing

I. INTRODUCTION
Grid computing is an important approach to distributed computing because it promises to change the way organizations solve complex computational problems. A large number of powerful computers are capable of being connected to a Grid. The problem of assigning data and computation to them in such a way as to maximize the overall performance is a challenging one. This is particularly true for parallel jobs that require the use of multiple resources available on time-sharing operating systems.

There are three main phases of Grid resource scheduling. Phase one is resource discovery, which provides a list of available resources. Phase two is resource allocation, which involves the selection of feasible resources and the mapping of jobs to resources. Phase three is job execution, which includes file staging and cleanup. In the second phase, the selection of the best match of jobs to resources is an NP-complete problem [1].

Grid resource allocation is a special case of the linear programming transportation problem. It deals with the assignment of tasks from a number of ‘sources’ (jobs) to a number of ‘destinations’ (processors) at minimum ‘transportation’ cost [2]. Previous Grid resource allocation research has focused mainly on the ordering of jobs and the matching of resources to jobs. We believe that there is potential for improving the technique for resource allocation. Our research aims to develop a Grid resource allocation method that makes efficient use of resources in such a way as to minimize the computational cost.

This paper focuses on resource allocation for Grid jobs. Its contributions are the design of the Hybrid Resource Allocation method and its comparison with four other well-known resource allocation methods for solving the Grid resource allocation problem. The methods are evaluated using a Grid resource allocation scenario.

The structure of the paper is as follows. Section 2 is a literature review of Grid allocation methods. Section 3 describes the linear programming model and Section 4 describes the new Grid resource allocation method. Section 5 presents the results and a discussion, and Section 6 concludes the paper.

II. RELATED RESEARCH
A computational Grid is a high performance computational system which consists of heterogeneous distributed resources. One of the main issues is to schedule separate tasks of an application on distributed resources with the objective of maximizing the users’ benefits [3, 4]. Grids are complex structures. The resources in a Grid are heterogeneous in nature and the structure of a Grid is dynamic: resource availability changes whilst an application is underway. This may include the failure of resources or communication links, some of which will have already been allocated to jobs. A good Grid resource scheduling method should be distributable, scalable, and fault tolerant [5].

Grid resource allocation can be one of two types: static and dynamic. In static allocation, all tasks are assigned to processing nodes immediately prior to executing a job. In dynamic resource allocation, each task is assigned immediately prior to its execution, which may be after the job has commenced. With static resource allocation, the assignment of tasks to processing nodes tends to be done on the basis of minimizing a job’s turnaround time, which is made up of the communication time and the execution time. With dynamic resource allocation, a task tends to be assigned on the basis of load balancing of the nodes [6].

In [7] the authors looked at the centralization and decentralization of the duties of the Grid resource scheduler by referring to tree structures. A centralized scheduler, where a single node (master node) is in control, can be represented by a flat tree. They argued that with a flat structure the master node easily becomes overloaded as demand for resources becomes high, so a flat structure is not suitable. A decentralized scheduler, where several nodes (master nodes) are in control, can be represented by a non-flat tree. Each master node can control a subset of processors in order to avoid bottlenecks, but multiple nodes might cause additional processing and transfer delays. In that paper a simple heuristic approach was adopted for the dynamic reconfiguration of the tree structure. Experiments showed that an optimal tree structure minimizes the overall average response time for a given set of parameters and job distribution policy. The deficiency of this approach is that no consideration has been given to the synchronization issues associated with a job distribution policy.

In Grid resource scheduling, some means of estimating a task’s execution time must be used. Furthermore, information about each node’s capability and availability must be gathered, tasks must be matched to nodes, and the tasks must be monitored. The software that performs these management functions could be located on a central computer (centralized) or on several computers (decentralized) [8, 9].

Reference [10] focuses on Data Grids, i.e. Grids involving large amounts of data. As an example, physicists at CERN generate vast quantities of data which are processed by many researchers located around the world. As the jobs being processed require access to a lot of data, the location of the data (termed ‘data locality’) plays a part in deciding on which site a job should run. When the number of users and resources is large there is a case for using decentralized strategies rather than centralized ones. Furthermore, the replication of data to several sites needs to be considered, as does caching. Reference [10] refers to a proposed Grid having three schedulers at each site: an External Scheduler (ES), a Local Scheduler (LS), and a Data Scheduler (DS). If it is decided to process a job locally then LS is used; otherwise the site(s) to which the job is sent is chosen by ES. DS makes decisions about whether data needs to be replicated and whether data files need to be deleted. Five ES and four DS algorithms were considered. A simulator, ChicagoSim, was developed in order to analyze the algorithms. The paper concludes that it is preferable to process a job at a site where the bulk of the data is, and that frequently used data should be replicated across the Grid. Both of these conclusions are as one would expect.

Reference [11] proposes a compensation based resource scheduling approach for a Grid environment. This approach allocates resources dynamically and has been implemented and evaluated using the ALiCE Grid system. Experimental results show that compensation based scheduling is effective in reducing execution time estimation misses and the total execution times of Grid applications. The authors also highlighted future work, which includes multi-resource compensation, resource partitioning and allocation.

In [12] the authors proposed an adaptive resource scheduling system that uses a Max-min algorithm. The experimental results show that the proposed model can schedule tasks efficiently. The proposed system is particularly good at detecting and using idle processors. It dynamically selects the proper scheduling strategy according to the accuracy of the predictor. It also considers the dynamic characteristics of Grid applications and makes the scheduling adaptive to the Grid environment.

An ‘Ant algorithm’ has been proposed for efficient resource management and task scheduling in Grids [13]. An Ant algorithm is a new type of heuristic algorithm. In our context, the algorithm includes a resource state prediction mechanism for proper dynamic task scheduling. It also has inherent parallelism and scalability. Simulation results showed that the algorithm performs well as regards response time, average resource utilization and task parallel proportion. A shortcoming of this algorithm is that it has not been tested in a real-time Grid environment.

The divisible load mechanism [14] is one solution for optimal allocation and scheduling of a divisible load to computing nodes and links. It divides the computing load into partitions and distributes the partitioned load among static processors depending upon their capacity. This approach is inappropriate for Grid applications which include indivisible data in a dynamic Grid environment.

G. Murugesan proposed a resource scheduling model using the divisible load theory (DLT) method. The scheduling strategy divides the load equally into portions and allocates each portion to a separate processor in such a way as to minimize the processing time. The author also formulated an LP model for resource scheduling and conducted the experiment using the LINDO software package [15]. However, there are a few shortcomings with this resource scheduling model. It does not support the dynamic nature of the loads and resources. A random number method has been used for the division of a load, from multiple sources, into equal amounts. The author also suggested that the work can be extended so that the division of the load depends upon the resource capacity.

TORA is Windows based software that offers modules for solving different types of problem, including resource allocation problems. TORA can be executed in automated as well as in tutorial mode. The automated mode gives the final solution of the problem, usually in the standard format. The tutorial mode is a unique feature that provides instant feedback to test the reader's understanding of the computational details of each algorithm [2, 16]. The following methods, relevant to resource allocation, can be used in TORA [2, 16]:
1. Northwest Corner Method (NWCM)
2. Least Cost Method (LCM)
3. Vogel’s Approximation Method (VAM)

This paper is significant for proposing the Hybrid Resource Allocation method for multiple resource scheduling problems. We deal with multiple sources (jobs) and multiple resources (processors); part of the work is based on the optimal distribution of loads [15]. Each source has its own job, and the entire job of each source is divided into tasks of equal size. Each task is assigned to a processor using LCM while satisfying the set of constraints discussed in the next section.

III. LINEAR PROGRAMMING MODEL
A Grid is a computing system on which a number of jobs are processed by a number of resources. Let m be the number of jobs and n the number of processors in a Grid computing domain. More formally:

$$J = \{J_1, J_2, \ldots, J_m\}, \qquad P = \{P_1, P_2, \ldots, P_n\}$$

where n >= 1 and m >= 1; J is the set of m jobs and P is the set of n processors. Each job is split into k tasks:

$$J_i = \{T_{i1}, T_{i2}, \ldots, T_{ik}\}, \qquad k \ge 1$$

The Grid resource allocation problem itself is an LP problem. The objective function and the set of constraints are as follows:

Minimize

$$Z_{\min} = \sum_{i=1}^{m} \sum_{j=1}^{n} \sum_{k=1}^{h} c_{ij}\, T_{ik}\, X_{ijk}$$

Subject to

$$X_{ijk} \in \{0, 1\}; \quad m > 0;\ n > 0;\ h > 0; \quad c_{ij} \ge 0;\ T_{ik} \ge 0 \qquad (1)$$

$$\sum_{j=1}^{n} \sum_{k=1}^{h} T_{ik}\, X_{ijk} = a_i; \quad i = 1, 2, 3, \ldots, m \qquad (2)$$

$$\sum_{i=1}^{m} \sum_{k=1}^{h} T_{ik}\, X_{ijk} \le b_j; \quad j = 1, 2, 3, \ldots, n \qquad (3)$$

The objective function is the computational cost, which we wish to minimize. With reference to constraint (1), the value of Xijk shows whether or not the kth task of job i is executed by processor j. Constraint (2) ensures that all the tasks for each job have been allocated. Constraint (3) ensures that total quantity of processing units allocated to each processor does not exceed its availability.
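As a minimal sketch of how the model can be read, the following Python function evaluates the objective Z and checks constraints (1) to (3) for a given 0/1 assignment. The function name `evaluate_allocation`, the nested-list layout `X[i][j][k]`, and the small two-job, two-processor instance are illustrative assumptions and do not come from the paper.

```python
def evaluate_allocation(c, T, X, a, b):
    """Return (Z, feasible) for a 0/1 assignment X[i][j][k].

    c[i][j]: allocation cost of one unit of job i on processor j
    T[i][k]: size (units of processing) of task k of job i
    a[i]:    workload of job i
    b[j]:    availability of processor j
    """
    m, n = len(c), len(c[0])
    cells = [(i, j, k) for i in range(m) for j in range(n)
             for k in range(len(T[i]))]
    # Objective: Z = sum over i, j, k of c_ij * T_ik * X_ijk
    Z = sum(c[i][j] * T[i][k] * X[i][j][k] for i, j, k in cells)
    # Constraint (1): every X_ijk is 0 or 1
    binary = all(X[i][j][k] in (0, 1) for i, j, k in cells)
    # Constraint (2): the units allocated for job i add up to a_i
    jobs_ok = all(
        sum(T[i][k] * X[i][j][k]
            for j in range(n) for k in range(len(T[i]))) == a[i]
        for i in range(m))
    # Constraint (3): the load placed on processor j does not exceed b_j
    procs_ok = all(
        sum(T[i][k] * X[i][j][k]
            for i in range(m) for k in range(len(T[i]))) <= b[j]
        for j in range(n))
    return Z, binary and jobs_ok and procs_ok


# Hypothetical instance: job 0 has two tasks of size 2, job 1 one task of size 3.
c = [[1, 3], [2, 1]]
T = [[2, 2], [3]]
a = [4, 3]
b = [4, 4]
X_good = [[[1, 1], [0, 0]], [[0], [1]]]  # job 0 on processor 0, job 1 on processor 1
X_bad = [[[1, 1], [0, 0]], [[1], [0]]]   # everything on processor 0 (overloads it)
```

With these inputs `evaluate_allocation(c, T, X_good, a, b)` gives a cost of 7 and a feasible allocation, while `X_bad` violates constraint (3) because 7 units are placed on a processor with 4 units available.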

Figure 1. Task Allocation Model

Where
xij is the number of units of processing from job i that is to be executed on processor j;
ai is the number of units of processing making up job i;
bj is the number of units of processing that processor j has available;
cij is the unit cost associated with allocating job i to processor j. This is made up of the processing time and the cost to execute one unit of processing. In what follows, we will refer to cij as the ‘allocation cost’;
Tik is the number of units of processing making up the kth task of job i;
Xijk is a Boolean variable which is set to 1 if the kth task of job i is executed by processor j and to 0 otherwise.

The main variables relevant to the Grid resource allocation problem can be represented in the form of a table (see Table I).

TABLE I. REPRESENTATION OF MAIN VARIABLES

Jobs\Processors                P1    P2    ……   Pn    Size of Job (Workload), (W)
J1                             c11   c12   ……   c1n   a1
J2                             c21   c22   ……   c2n   a2
:                              :     :          :     :
Jm                             cm1   cm2   ……   cmn   am
Processor Availability (PA)    b1    b2    ……   bn

IV. HYBRID RESOURCE ALLOCATION METHOD
The Hybrid Resource Allocation method divides the jobs into tasks of equal size and allocates the tasks to available processors using the Least Cost Method (LCM). The detailed steps of the Hybrid Resource Allocation method are as follows:
1. For each workload, divide it into portions (called ‘tasks’) of equal size. The task size is chosen at random from the factors of the workload.
2. Select the cell of the table with the least allocation cost. If there is a tie, resolve it by selecting the cell which can host the task of maximum size.
3. For the selected cell, compare the processor availability with the task demand. If the task demand is less than or equal to the processor availability then allocate the task to this cell. Reduce the processor availability by the task size and eliminate the task demand. If either the workload or the processor availability has been reduced to zero (or both have been reduced to zero), then ignore the relevant job and/or processor from now on.
4. If no more allocations can be made then stop, as an initial feasible solution has been found. If more allocations are still to be made, go to step 2.
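The four steps of the Hybrid Resource Allocation method can be sketched in Python as below. This is an illustrative reading, not the authors' implementation, and it rests on several assumptions that the text does not state: the number of subdivisions per job is passed in rather than drawn at random; each (job, processor) cell hosts at most one task, which is inferred from the allocation sequence reported in Section V, where every job's tasks land on distinct processors; and remaining ties are broken by lowest job index, then lowest processor index. The name `hybrid_allocate` is hypothetical. The inputs in the usage lines are the workloads, subdivisions, allocation costs and availabilities of the scenario in Section V.

```python
def hybrid_allocate(workloads, subdivisions, costs, availability):
    """Greedy least-cost allocation of equal-sized tasks (illustrative sketch)."""
    avail = list(availability)
    # Step 1: split each job into equal-sized tasks.
    size = [w // s for w, s in zip(workloads, subdivisions)]
    remaining = list(subdivisions)   # tasks of each job still to be placed
    used_cells = set()               # assumption: one task per (job, processor) cell
    sequence, total_cost = [], 0
    while True:
        # Step 2: rank feasible cells by cost, then by largest task size,
        # then (assumption) by job index and processor index.
        candidates = [
            (costs[i][j], -size[i], i, j)
            for i in range(len(size)) if remaining[i] > 0
            for j in range(len(avail))
            if (i, j) not in used_cells and size[i] <= avail[j]
        ]
        if not candidates:
            break                    # Step 4: no further allocation is possible
        cost, _, i, j = min(candidates)
        # Step 3: allocate the task and reduce the processor availability.
        k = subdivisions[i] - remaining[i] + 1
        remaining[i] -= 1
        avail[j] -= size[i]
        used_cells.add((i, j))
        sequence.append((f"T{i + 1}{k}", f"P{j + 1}"))
        total_cost += cost * size[i]
    return sequence, total_cost, avail


# Scenario of Section V: four jobs, five processors, identical cost rows.
costs = [[8, 9, 2, 8, 8] for _ in range(4)]
sequence, total_cost, left = hybrid_allocate(
    [6, 9, 6, 8], [3, 3, 2, 2], costs, [12] * 5)
```

Under these assumptions the sketch places T41, T21, T31 and T11 on P3 in that order, as in the sequence reported in Section V, and reaches the same overall cost of 160. The placement of the remaining tasks on the cost-8 processors differs from the paper's, because the paper does not state how ties between equally cheap processors are resolved. Without the one-task-per-cell assumption, the same greedy rule would place both tasks of J4 on P3 and the total would be higher.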


V. RESULTS AND DISCUSSION
In this paper we describe a comparative analysis of our proposed Hybrid Resource Allocation method with other well-known resource allocation methods. We continue to use some of the notation defined above: Ji represents job i; Pj represents processor j; ai denotes the size of job i, i.e. its workload; bj represents the availability of processor j. We consider a Grid resource allocation scenario taken from [15]. In this scenario, the Grid system consists of five processors (resources), namely P1, P2, P3, P4 and P5, with four jobs (sources) J1, J2, J3 and J4 trying to utilize the Grid system. The total workload and the division of workload of each job are shown in Table II.

TABLE II. TOTAL WORKLOAD AND ITS DIVISION

Sources   Total Workload   No of Subdivision
J1        6                3
J2        9                3
J3        6                2
J4        8                2

In Table III we give an example of how the allocation costs (cij) might be calculated.

TABLE III. ALLOCATION COSTS

Processors   Processing time   Processing Cost   Allocation Cost
P1           2                 4                 8
P2           3                 3                 9
P3           1                 2                 2
P4           2                 4                 8
P5           4                 2                 8

Let us consider a unit of processing as being a fixed amount of processing, independent of the particular processor that executes it. The column called ‘processing time’ refers to the time taken to execute one unit of processing on a particular processor. The column called ‘processing cost’ refers to the monetary cost of using a particular processor per unit time. The cost of allocating one unit of processing to a processor is the product of ‘processing time’ and ‘processing cost’. We formulated the Resource Allocation Table IV from Tables II and III, arbitrarily setting the availability of each processor to 12 units of processing.

TABLE IV. RESOURCE ALLOCATION TABLE

J\P   P1   P2   P3   P4   P5   W
J1    8    9    2    8    8    6
J2    8    9    2    8    8    9
J3    8    9    2    8    8    6
J4    8    9    2    8    8    8
PA    12   12   12   12   12

To recap, the allocation cost (cij) is the cost associated with allocating one unit of processing from job i to processor j. These costs are shown in Table IV. The jobs have workloads of 6, 9, 6 and 8, as shown in column ‘W’. The processors have capacities of 12 each, as shown in row ‘PA’. The problem is to find out which tasks should be allocated to which processors so as to minimize the overall computational cost. There are several algorithms which attempt to do this. The results of one of these, the DLT method, are shown in Table V, taken from [15].

TABLE V. PROCESSOR ALLOTMENT

Source/Jobs   Task   Processor Allotted
J1            T11    P2
              T12    P3
              T13    P4
J2            T21    P1
              T22    P1
              T23    P3
J3            T31    P3
              T32    P5
J4            T41    P3
              T42    P4

Table V shows the mapping of the tasks of each job to processors. We also show the detailed resource allocation by the DLT method in Table VI.

TABLE VI. RESOURCE ALLOCATION BY DLT

J\P   P1         P2      P3      P4      P5      W   S
J1    8          9 (2)   2 (2)   8 (2)   8       6   3
J2    8 (3)(3)   9       2 (3)   8       8       9   3
J3    8          9       2 (3)   8       8 (3)   6   2
J4    8          9       2 (4)   8 (4)   8       8   2
PA    12         12      12      12      12
PAR   6          10      0       6       9

Table V shows that J1 is equally divided into 3 tasks, namely T11, T12 and T13. T11 is allotted to P2, T12 to P3 and T13 to P4. J2 is partitioned into 3 tasks: T21 and T22 are allotted to P1 and T23 is allotted to P3. Tasks T31 and T32 of J3 are allotted to P3 and P5, respectively. J4 is partitioned into tasks T41 and T42, which are allotted to P3 and P4, respectively.

Table VI shows the detailed allocation of each job’s tasks to processors for the DLT method. Two units of J1 have been allocated to P2, as indicated by the number in brackets. The remaining four units of J1 are equally distributed between P3 and P4. Six units of J2 have been allocated to P1, whilst the remaining three units have been allocated to P3. The six units of job J3 are equally distributed between P3 and P5. The eight units of J4 are equally assigned to P3 and P4. The overall computation cost is calculated as follows:

9 × 2 + 2 × 2 + 8 × 2 + 8 × 3 + 8 × 3 + 2 × 3 + 2 × 3 + 8 × 3 + 2 × 4 + 8 × 4 = 162

Allocation scenarios can be solved by three other methods: NWCM, LCM and VAM. We have experimented with these methods using the TORA optimization system. Finally, we can use our new Hybrid Resource Allocation method to solve the problem. The overall computational cost of using each method to solve our example is shown in Table VII.
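The DLT figures above can be re-derived with a few lines of Python. The sketch below recomputes the allocation costs as processing time multiplied by processing cost (Table III), then totals the cost of the task-to-processor mapping of Table V and the units left on each processor. The variable names are illustrative only; the numbers are those given in Tables II, III and V.

```python
# Table III: time per unit of processing and cost per unit time.
processing_time = {"P1": 2, "P2": 3, "P3": 1, "P4": 2, "P5": 4}
processing_cost = {"P1": 4, "P2": 3, "P3": 2, "P4": 4, "P5": 2}
# Allocation cost of one unit of processing = time x cost.
allocation_cost = {p: processing_time[p] * processing_cost[p]
                   for p in processing_time}

# Table II: task size = total workload / number of subdivisions.
task_size = {"J1": 6 // 3, "J2": 9 // 3, "J3": 6 // 2, "J4": 8 // 2}

# Table V: processor allotted to each task by the DLT method.
dlt_mapping = {
    "T11": "P2", "T12": "P3", "T13": "P4",
    "T21": "P1", "T22": "P1", "T23": "P3",
    "T31": "P3", "T32": "P5",
    "T41": "P3", "T42": "P4",
}

dlt_cost = 0
remaining = {p: 12 for p in processing_time}  # availability of 12 units each
for task, proc in dlt_mapping.items():
    units = task_size["J" + task[1]]          # "T23" belongs to job "J2"
    dlt_cost += allocation_cost[proc] * units
    remaining[proc] -= units
```

The recomputed allocation costs are 8, 9, 2, 8 and 8 for P1 to P5, the total is 162, and the remaining availabilities are 6, 10, 0, 6 and 9, which agrees with the figures reported for the DLT method.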

TABLE VII. COMPARATIVE RESULTS

Resource Allocation Methods           Computational Cost
DLT (Divisible load theory method)    162
NWCM (Northwest Corner method)        214
LCM (Least Cost method)               202
VAM (Vogel Approximation method)      160
Hybrid Resource Allocation            160

As we stated above, Table VI shows the result of applying the DLT method to our task allocation problem. Tables VIII to XI show the result of applying the four other methods to the same problem. The tables correspond to NWCM, LCM, VAM, and the Hybrid Resource Allocation method, respectively.

TABLE VIII. RESOURCE ALLOCATION BY NWCM USING TORA

J\P   P1      P2      P3      P4   P5   W
J1    8 (6)   9       2       8    8    6
J2    8 (6)   9 (3)   2       8    8    9
J3    8       9 (6)   2       8    8    6
J4    8       9 (3)   2 (5)   8    8    8
PA    12      12      12      12   12
PAR   0       0       7       12   12

TABLE IX. RESOURCE ALLOCATION BY LCM USING TORA

J\P   P1   P2   P3      P4      P5      W
J1    8    9    2 (5)   8 (1)   8       6
J2    8    9    2       8 (9)   8       9
J3    8    9    2       8 (2)   8 (4)   6
J4    8    9    2       8       8 (8)   8
PA    12   12   12      12      12
PAR   12   12   7       0       0

TABLE X. RESOURCE ALLOCATION BY VAM USING TORA

J\P   P1   P2   P3      P4      P5      W
J1    8    9    2 (6)   8       8       6
J2    8    9    2 (6)   8       8 (3)   9
J3    8    9    2       8       8 (6)   6
J4    8    9    2       8 (5)   8 (3)   8
PA    12   12   12      12      12
PAR   12   12   0       7       0

TABLE XI. RESOURCE ALLOCATION BY HYBRID RESOURCE ALLOCATION METHOD

J\P   P1      P2   P3      P4      P5      W   S
J1    8       9    2 (2)   8 (2)   8 (2)   6   3
J2    8 (3)   9    2 (3)   8       8 (3)   9   3
J3    8       9    2 (3)   8       8 (3)   6   2
J4    8       9    2 (4)   8 (4)   8       8   2
PA    12      12   12      12      12
PAR   9       12   0       6       4

Table XI shows the detailed resource allocation by the Hybrid Resource Allocation method. Firstly, this method divides each job into equal sized tasks. Its division mechanism (described in step 1 of Section IV) is the same as that employed by the DLT method. Its allocation strategy differs from that of the DLT method in that it is based on LCM: it searches for the cell with the least allocation cost. For example, ‘2’ is the least allocation cost, as shown in column P3. This number appears four times in the resource allocation table. The Hybrid Resource Allocation method favors selection of the cell which can host the task of maximum size. Firstly task T41 is mapped to P3, then T21 is assigned to P3, and so on. The allocation sequence of each task to a processor is shown below:

T41 → P3, T21 → P3, T31 → P3, T11 → P3, T42 → P4, T22 → P1, T23 → P5, T32 → P5, T12 → P4, T13 → P5

The overall computation cost is calculated as follows:

2 × 2 + 8 × 2 + 8 × 2 + 8 × 3 + 2 × 3 + 8 × 3 + 2 × 3 + 8 × 3 + 2 × 4 + 8 × 4 = 160

Fig. 2 shows a graph derived from Table VII. It compares the different Grid resource allocation methods. VAM and the Hybrid Resource Allocation method produce the least computational cost. VAM is a good choice; however, it is iterative in nature and, because of this, is computationally slow.
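For comparison, the NWCM entry of Table VII can be reproduced with the textbook Northwest Corner rule. The sketch below is not TORA's implementation; how TORA balances the problem (total workload 29 against total availability 60) is not described in the text, so the sketch simply stops once every job has been placed. The function name `northwest_corner` is an illustrative assumption.

```python
def northwest_corner(supply, demand, costs):
    """Textbook Northwest Corner rule; returns (allocation, total_cost, unused)."""
    supply, demand = list(supply), list(demand)
    i = j = 0
    allocation = {}
    while i < len(supply) and j < len(demand):
        # Allocate as much as possible to the current top-left cell.
        qty = min(supply[i], demand[j])
        if qty > 0:
            allocation[(i, j)] = qty
        supply[i] -= qty
        demand[j] -= qty
        if supply[i] == 0:
            i += 1   # job i fully placed: move down to the next job
        else:
            j += 1   # processor j is full: move right to the next processor
    total_cost = sum(costs[r][c] * q for (r, c), q in allocation.items())
    return allocation, total_cost, demand


# Scenario of Table IV: workloads 6, 9, 6, 8 and five processors of capacity 12.
costs = [[8, 9, 2, 8, 8] for _ in range(4)]
nw_alloc, nw_cost, nw_unused = northwest_corner([6, 9, 6, 8], [12] * 5, costs)
```

On this scenario the rule fills P1 and P2 before touching the cheap processor P3, giving a total of 214, the value reported for NWCM. This illustrates why a cost-blind rule compares poorly with the least-cost selection used by the Hybrid Resource Allocation method.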

Fig. 2. Comparison of Grid resource allocation methods

VI. CONCLUSIONS
In this paper, we formulated an LP model for Grid resource allocation. We also proposed the Hybrid Resource Allocation method for Grid resource allocation and performed a comparative analysis of it with DLT and other well-known resource allocation methods. The Hybrid Resource Allocation method favors the equal distribution of jobs. We conclude that the Hybrid Resource Allocation method produces encouraging results as compared to other well-known resource allocation strategies. The Hybrid Resource Allocation method is a potential candidate for resource allocation in a computational Grid environment.

REFERENCES
[1] D. Fernandez-Baca, “Allocating modules to processors in a distributed system”, IEEE Trans. Software Eng., 1989.
[2] H. A. Taha, Operations Research: An Introduction, 8th Edition, Prentice Hall, Inc., 2007.
[3] I. Foster, C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1999.
[4] S. Chapin, J. Karpovich, A. Grimshaw, “The Legion resource management system”, Proceedings of the Fifth Workshop on Job Scheduling Strategies for Parallel Processing, Springer, Berlin, 1999.
[5] Q. Snell, K. Tew, J. Ekstrom, M. Clement, “An Enterprise-Based Grid Resource Management System”, Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11), 2002.
[6] T. Loukopoulos, P. Lampsas, P. Sigalas, “Improved Genetic Algorithms and List Scheduling Techniques for Independent Task Scheduling in Distributed Systems”, Eighth International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT 2007), 2007.
[7] J. Palmer, I. Mitrani, “Optimal Tree Structures for Large-Scale Grids”, Proceedings of the UK e-Science All Hands Meetings, Nottingham, 2004.
[8] M.K. Dhodhi et al., “An integrated technique for task matching and scheduling onto distributed heterogeneous computing systems”, J. of Parallel and Distributed Computing, 2002.
[9] S.Y. Lee, C.H. Cho, “Load balancing for minimizing execution time of a target job on a network of heterogeneous workstations”, in D.G. Feitelson and L. Rudolph, editors, JSSPP’00, 2000.
[10] K. Ranganathan, I. Foster, “Simulation studies of computation and data scheduling algorithms for data Grids”, Journal of Grid Computing, 2003.
[11] Y.M. Teo, X. Wang, J.P. Gozali, “A Compensation-based Scheduling Scheme for Grid Computing”, Proceedings of the Seventh International Conference on High Performance Computing and Grid in Asia Pacific Region (HPCAsia’04).
[12] Liang-Teh Lee, Chin-Hsiian Liang, Hung-Yuan Chang, “An Adaptive Task Scheduling System for Grid Computing”, Proceedings of the Sixth IEEE International Conference on Computer and Information Technology (CIT'06).
[13] Zhihong Xu, Xiangdan Hou, Jizhou Sun, “Ant Algorithm-based Task Scheduling in Grid Computing”, CCECE 2003 - CCGEI 2003, Montreal, IEEE, May 2003.
[14] D. Yu, T.G. Robertazzi, “Divisible load scheduling for grid computing”, Proc. Int'l Conf. on Parallel and Distributed Computing Systems, Nov. 2003.
[15] G. Murugesan, C. Chellappan, “An Economical Model for Optimal Distribution of Loads for Grid Applications”, International Journal of Computer and Network Security, Vol. 1, No. 1, October 2009.
[16] TORA Optimization System, Windows®-version 1.00, available via http://prenhall.com/taha/