
Int. J. Grid and Utility Computing, Vol. 5, No. 2, 2014

Deadline constraint heuristic-based genetic algorithm for workflow scheduling in cloud

Amandeep Verma*
Department of Information Technology, University Institute of Engineering and Technology, Panjab University, Chandigarh 160014, India
Email: [email protected]
*Corresponding author

Sakshi Kaushal
Department of Computer Science and Engineering, University Institute of Engineering and Technology, Panjab University, Chandigarh 160014, India
Email: [email protected]

Abstract: Task scheduling and resource allocation are the key challenges of cloud computing. Compared with grid environment, data transfer is a big overhead for cloud workflows. So, the cost arising from data transfers between resources as well as execution costs must also be taken into account during scheduling based upon user’s Quality of Service (QoS) constraints. In this paper, we present Deadline Constrained Heuristic based Genetic Algorithms (HGAs) to schedule applications to cloud resources that minimise the execution cost while meeting the deadline for delivering the result. Each workflow’s task is assigned priority using bottom-level (b-level) and top-level (t-level). To increase the population diversity, these priorities are then used to create the initial population of HGAs. The proposed algorithms are simulated and evaluated with synthetic workflows based on realistic workflows. The simulation results show that our proposed algorithms have a promising performance as compared to Standard Genetic Algorithm (SGA).

Keywords: cloud computing; workflow; scheduling; DAG; genetic algorithm; grid computing.

Reference to this paper should be made as follows: Verma, A. and Kaushal, S. (2014) ‘Deadline constraint heuristic-based genetic algorithm for workflow scheduling in cloud’, Int. J. Grid and Utility Computing, Vol. 5, No. 2, pp.96–106.

Biographical notes: Amandeep Verma is an Assistant Professor (IT) in UIET, Panjab University, Chandigarh, India. She is currently pursuing her PhD in workflow scheduling in cloud computing. She received her BTech from Punjab Technical University, Jalandhar (Punjab), India in 2002 and MTech degree in Computer Science Engineering from Punjabi University, Patiala (Punjab), India in 2004. Her area of interest includes parallel and distributed computing, computer networks and cloud computing.

Sakshi Kaushal is currently an Associate Professor at Department of Computer Science and Engineering in UIET, Panjab University, Chandigarh, India. She received her PhD in Computer Science and Engineering from Thapar University, Patiala, India in 2009. Her research interests include computer networks and cloud computing. She has published various papers in the field of mobility in wireless networks, analysis and implementation of mechanisms to optimise network performance in high speed networks and cloud environment.

1 Introduction

Cloud computing is emerging as the latest distributed computing paradigm. It provides a dynamically scalable service delivery and consumption platform, facilitated through virtualisation of hardware and software, in which various services are consumed on demand over the Internet (Foster et al., 2008). It attracts increasing interest from researchers in the areas of distributed and parallel computing and service-oriented computing (Buyya et al., 2009). Cloud computing delivers hardware infrastructure and software applications as services (Verma and Kaushal, 2011). It adopts a market-oriented business model in which users are charged, on a pay-as-you-go basis, for consuming cloud services such as computing, storage and network services, much like conventional utilities in everyday life (e.g., water, electricity, gas and telephony) (Gabriel et al., 2011). Meanwhile, cloud service providers are obligated to provide satisfactory Quality of Service (QoS) based on business service contracts (Geelan, 2008).

Workflow scheduling is one of the key issues in workflow management, especially in grid and cloud workflow systems. It is a process that maps and manages the execution of inter-dependent tasks on distributed resources (Taylor et al., 2007). It allocates suitable resources to workflow tasks such that the execution can be completed to satisfy objective functions imposed by users. Proper scheduling can have a significant impact on the performance of the system (Deelman et al., 2008). However, in general, the problem of mapping tasks on distributed services belongs to the class of NP-complete problems (Yu and Buyya, 2005a). For such problems, no known algorithm is able to generate the optimal solution within polynomial time (Wieczorek et al., 2007).

There are two major types of workflow scheduling, best-effort based and QoS constraint based scheduling, primarily for grid workflow management systems (Yu and Buyya, 2008). These scheduling algorithms attempt to minimise the execution time, ignoring other factors such as the monetary cost of accessing resources. However, because cloud computing adopts a ‘market-oriented business model’, there is another important parameter besides the execution time, namely the cost of accessing resources (Liu, 2009; Pandey, 2010). Usually, faster resources are more expensive than slower ones. Therefore, scheduling algorithms designed for grid workflows cannot be applied directly to cloud workflow applications (Fida, 2008; Pandey et al., 2011).

Copyright © 2014 Inderscience Enterprises Ltd.
In this paper, we propose deadline constraint Heuristic-based Genetic Algorithms (HGAs) to schedule applications to cloud resources that minimise the execution cost while meeting the deadline for delivering the result. The remainder of the paper is organised as follows: Section 2 reviews the related work in the area of workflow scheduling. Section 3 presents the workflow scheduling model. The Standard Genetic Algorithm (SGA) and the proposed HGAs are discussed in Sections 4 and 5, respectively. The proposed HGAs are evaluated and compared with SGA in Section 6, and Section 7 concludes the paper.

2 Related work

Workflow scheduling is a classical NP-complete problem (Yu and Buyya, 2005a). The major grid workflow scheduling algorithms have been classified into two basic categories: best-effort based scheduling and QoS constraint based scheduling (Yu and Buyya, 2005b). In traditional community-based computing paradigms, best-effort based scheduling strategies are often applied to minimise only the execution time, without considering the monetary cost, since resources are shared freely among system users. Many heuristic algorithms such as Minimum Execution Time, Minimum Completion Time, Min-min and Max-min are used as candidates for best-effort based scheduling strategies (Yu and Buyya, 2008). Yong et al. (2011) proposed Deadline and Budget Constrained (DBC) scheduling heuristics to deal with sequential workflow applications in grids, considering completion time, budget and Relative Cost (RC) together as QoS constraints. Yu and Buyya (2005a) proposed a cost-based workflow scheduling algorithm to schedule scientific workflow applications on a utility grid that minimises execution cost while meeting the deadline for delivering results. It also handled the delays of service executions by rescheduling unexecuted tasks. A budget constraint, priority rule-based iterative heuristic, Maximum Profit with Serial Complexity (MPSC), has also been proposed for grid workflows to improve the initial feasible solution iteratively within a few iterations and with less computation time (Yuan et al., 2009).

Many researchers have used the Genetic Algorithm (GA) for task assignment. Yu and Buyya (2006a, 2006b) proposed a modified GA because the crossover and mutation operations of SGAs, which focus on homogeneous and non-reservation-enabled multiprocessor systems, cannot be applied to the grid directly. The fitness function of the GA is designed to encourage the formation of solutions that minimise the workflow execution time in grids under the budget and deadline constraints (Florin et al., 2009). Similarly, Xue and Wenhua (2010) proposed an improved genetic algorithm for grid workflows in which chromosomes of poor fitness undergo secondary preferential hybridisation and mutation with the overall best individual to increase the convergence rate of the population.
Fatima and Mona (2010) developed the Critical Path Genetic Algorithm (CPGA), based on rescheduling Critical Path Nodes (CPNs) in the chromosome through different generations, and the Task Duplication Genetic Algorithm (TDGA), based on task duplication techniques, to overcome the communication overhead and improve the scheduling performance. Andrew et al. (2010) proposed a multi-heuristic evolutionary task algorithm that dynamically maps tasks to processors in a heterogeneous distributed system, utilising a GA combined with eight heuristics in an effort to minimise the total execution time. A Hierarchic Genetic Scheduler (Joanna and Samee, 2012) was developed to improve the effectiveness of the single-population genetic-based scheduler in the dynamic grid environment for scheduling independent jobs. The authors considered the bi-objective independent batch job scheduling problem with makespan and flowtime minimised in hierarchical mode. Furthermore, a two-phase algorithm called H2GS (Mohammad and Nawwaf, 2011) has been proposed for task scheduling in heterogeneous processor networks. The first phase implements a heuristic list-based algorithm, called LDCP, to generate a high-quality schedule. In the second phase, this schedule is injected into the initial population of a GA, which proceeds to evolve shorter schedules. Amir and Mohammad (2008) embedded the elitism method into the GA to generate shorter schedules and to decrease the computation time needed to find a sub-optimal schedule compared with the basic GA. The authors further extended their work by improving the sub-optimal results using Simulated Annealing (SA) for multiprocessor systems (Amir and Mohammad, 2009).

Only a few studies have addressed workflow scheduling on clouds. For cloud workflow systems, mainly QoS-constraint based scheduling strategies based on the market-oriented business model are required. A compromised-time-cost scheduling algorithm has been proposed to accommodate transaction-intensive (Yang et al., 2008) and instance-intensive (Ke et al., 2010) cost-constrained workflows in the cloud, respectively, by compromising between execution time and cost with user input enabled on the fly. The algorithm cuts the mean execution cost by over 15% whilst meeting the user-designated deadline, or shortens the mean execution time by over 20% within the user-designated execution cost. A market-oriented hierarchical scheduling strategy (Zhangjun et al., 2011) has been developed for instance-intensive workflow applications to perform workflow scheduling at two levels in the cloud environment. A package-based random scheduling algorithm was presented as the candidate service-level scheduling algorithm, and three representative metaheuristic-based scheduling algorithms, namely GA, Ant Colony Optimisation (ACO) and Particle Swarm Optimisation (PSO), were adapted, implemented and analysed as the candidate task-level scheduling algorithms. However, none of these works considers the different pricing models of the cloud environment. Saeid et al. (2013) proposed two workflow scheduling algorithms for the cloud environment: a one-phase algorithm, IC-PCP, and a two-phase algorithm, IC-PCPD2. Both algorithms have polynomial time complexity for scheduling large workflows under deadline constraints. The authors considered different types of pricing models in their simulation.
We therefore use a genetic algorithm to schedule workflow applications on cloud resources while taking the different pricing models into account. In this paper, we focus on minimising the execution cost while meeting the deadline for delivering the result.

3 Workflow scheduling model

A workflow application is modelled by a Directed Acyclic Graph (DAG), defined by a tuple G = (T, E), where T is the set of n tasks {t1, t2, ..., tn} and E is a set of e edges representing the dependencies. Each ti ∈ T represents a task in the application and each edge (ti, tj) ∈ E represents a precedence constraint, such that the execution of tj ∈ T cannot be started before ti ∈ T finishes its execution (Verma and Kaushal, 2012). If (ti, tj) ∈ E, then ti is the parent of tj and tj is the child of ti. A task with no parent is known as an entry task and a task with no children is known as an exit task.

3.1 Basic definitions

Bottom level (b-level): The b-level of a task is the length of the longest path from the task to a leaf task (Amir and Mohammad, 2009). The b-level of a task is calculated as:

$$\mathrm{blevel}(t_i) = w_i + \max_{t_j \in \mathrm{succ}(t_i)} \{ d_{ij} + \mathrm{blevel}(t_j) \} \tag{1}$$

where $w_i$ is the average execution time of task $t_i$ on the different computing machines, $\mathrm{succ}(t_i)$ is the set of all child tasks of $t_i$ and $d_{ij}$ is the data transmission time from task $t_i$ to task $t_j$. If a task has no children, its b-level is equal to its average execution time on the different computing machines.

Top-level (t-level): The t-level of a task in the DAG is defined as the length of the longest path from an entry task to the task, without considering the execution time of that task (Amir and Mohammad, 2009), and is given by the following equation:

$$\mathrm{tlevel}(t_i) = \max_{t_j \in \mathrm{pred}(t_i)} \{ \mathrm{tlevel}(t_j) + w_j + d_{ji} \} \tag{2}$$

where $w_j$ is the average execution time of parent task $t_j$ on the different computing machines, $\mathrm{pred}(t_i)$ is the set of all parent tasks of $t_i$ and $d_{ji}$ is the data transmission time from task $t_j$ to task $t_i$. For an entry task, i.e. a task with no parent, the t-level is equal to zero.

Estimated completion time (ECT): The ECT is an n × m matrix in which ECTi,j is the estimated completion time of task ti on machine mj.

The users are charged based upon the number of time intervals for which they have used a particular machine. All computation and storage services of the service provider are assumed to be in the same physical region, so the average bandwidth between the different available machines is roughly equal (Saeid et al., 2013).
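The two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-task DAG, its weights and the names `w`, `d`, `succ`, `pred`, `blevel` and `tlevel` are hypothetical and are not the example of Figure 1. The t-level is taken to accumulate each parent's execution time, which is what "without considering the execution time of that task" implies.

```python
from functools import lru_cache

# Hypothetical 4-task DAG (not the DAG of Figure 1).
w = {"A": 3, "B": 2, "C": 2, "D": 5}                  # average execution times
d = {("A", "C"): 2, ("B", "C"): 3, ("C", "D"): 3}     # (parent, child) -> transfer time

succ = {t: [c for (p, c) in d if p == t] for t in w}
pred = {t: [p for (p, c) in d if c == t] for t in w}

@lru_cache(maxsize=None)
def blevel(t):
    # Equation (1): own execution time plus the longest path down to an exit task.
    return w[t] + max((d[(t, c)] + blevel(c) for c in succ[t]), default=0)

@lru_cache(maxsize=None)
def tlevel(t):
    # Equation (2): longest path from an entry task, excluding t's own execution time.
    return max((tlevel(p) + w[p] + d[(p, t)] for p in pred[t]), default=0)

for task in w:
    print(task, blevel(task), tlevel(task))
```

For this toy graph the exit task D has a b-level equal to its own execution time, and both entry tasks have a t-level of zero, as the definitions require.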

3.2 An illustrative example

Consider a DAG with 11 tasks (T1 to T11) as shown in Figure 1. Each edge weight of the DAG represents the data transmission time between the tasks.

Figure 1 A sample DAG

Table 1 shows the estimated completion time of the various tasks on three different machines. The b-level and t-level of all tasks are calculated using equations (1) and (2), respectively. The tasks are then sorted in descending order of b-level and in ascending order of t-level to decide the order of execution of all the tasks (Table 2).

Table 1 ECT matrix

Tasks/Machines   M1   M2   M3
T1               3    5    1
T2               2    3    1
T3               3    5    1
T4               2    3    1
T5               2    3    1
T6               2    3    1
T7               2    3    1
T8               4    6    2
T9               3    5    1
T10              2    3    1
T11              5    7    3

Table 2 b-level and t-level of DAG

Task   AvgECT   b-level   t-level   Order of execution     Order of execution
                                    according to b-level   according to t-level
T1     3        16        0         2                      1
T2     2        17        0         1                      2
T3     3        14        0         3                      3
T4     2        13        0         4                      4
T5     2        11        5         5                      5
T6     2        8         6         7                      6
T7     2        10        7         6                      7
T8     4        4         12        9                      10
T9     3        3         11        10                     9
T10    2        2         10        11                     8
T11    5        5         12        8                      11

The tasks are sent to different machines according to their order of execution for completion of the workflow application. Figures 2 and 3 show the schedules generated according to the b-level and t-level of the DAG, respectively.

Figure 2 Schedule according to b-level (tasks T1 to T11 placed on machines M1, M2 and M3)

Figure 3 Schedule according to t-level (tasks T1 to T11 placed on machines M1, M2 and M3)
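As a quick check of the two orderings in Table 2, the sort can be reproduced from the tabulated b-level and t-level values. The sketch below assumes that ties (for example the four entry tasks, which all have a t-level of zero) are broken by task index; the source does not state a tie-breaking rule, and the variable names are illustrative.

```python
# b-level and t-level values copied from Table 2.
blevel = {"T1": 16, "T2": 17, "T3": 14, "T4": 13, "T5": 11, "T6": 8,
          "T7": 10, "T8": 4, "T9": 3, "T10": 2, "T11": 5}
tlevel = {"T1": 0, "T2": 0, "T3": 0, "T4": 0, "T5": 5, "T6": 6,
          "T7": 7, "T8": 12, "T9": 11, "T10": 10, "T11": 12}

tasks = list(blevel)  # T1..T11 in index order

# sorted() is stable, so equal keys keep their task-index order (assumed tie-break).
by_blevel = sorted(tasks, key=lambda t: -blevel[t])   # descending b-level
by_tlevel = sorted(tasks, key=lambda t: tlevel[t])    # ascending t-level

print(by_blevel)
print(by_tlevel)
```

The position of each task in `by_blevel` and `by_tlevel` corresponds to the two order-of-execution columns of Table 2.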

4 Standard genetic algorithm (SGA)

Genetic algorithms provide robust search techniques that allow a high-quality solution to be derived from a large search space in polynomial time, by applying the principle of evolution. GA is a specific class of evolutionary algorithms inspired by evolutionary biology. A genetic algorithm combines the exploitation of the best solutions from past searches with the exploration of new regions of the solution space. Any solution in the search space of the problem is represented by an individual or chromosome (Yu and Buyya, 2006a). A genetic algorithm maintains a population of individuals that evolves over generations towards better solutions through the repeated application of genetic operators such as crossover, mutation and selection (Yu and Buyya, 2006b). The quality of an individual in the population is determined by a fitness function. The fitness value indicates how good the individual is compared with others in the population (Kumar and Verma, 2012).

4.1 Representation of individual in the population

A two-dimensional coding scheme has been employed by many researchers (Yu and Buyya, 2006a; Yu and Buyya, 2006b) for scheduling tasks in distributed systems. As illustrated in Figures 2 and 3, each schedule is simplified by representing it as a 2D string. One dimension represents the machines while the other dimension shows the order of tasks on each machine.

4.2 Fitness function

A fitness function is used to measure the quality of the individuals in the population. The fitness function should encourage the formation of solutions that achieve the objective function. For the deadline constrained scheduling problem, the GA defines the fitness function as:

$$F(I) = \frac{t(I)}{D} \tag{3}$$

where t(I) is the completion time of an individual I and D is the user-specified deadline for scheduling the workflow application. An individual is fit if F(I) < 1; otherwise the individual is not included in the population.
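A minimal sketch of how the fitness of a 2D-encoded individual might be evaluated is given below. It is not the authors' simulator: the function names, the four-task workflow and its numbers are hypothetical, and the sketch assumes that a data transfer time is paid only when parent and child run on different machines.

```python
def completion_time(individual, ect, pred, d):
    """Makespan t(I) of a 2D-encoded schedule: {machine: [tasks in run order]}."""
    where = {t: m for m, queue in individual.items() for t in queue}
    finish = {}                                # task -> finish time
    free = {m: 0 for m in individual}          # machine -> time it becomes idle
    nxt = {m: 0 for m in individual}           # machine -> index of next queued task
    remaining = len(where)
    while remaining:
        progressed = False
        for m, queue in individual.items():
            if nxt[m] == len(queue):
                continue
            t = queue[nxt[m]]
            parents = pred.get(t, [])
            if any(p not in finish for p in parents):
                continue                       # wait until every parent has finished
            # Assumption: transfer time is paid only across different machines.
            ready = max((finish[p] + (d[(p, t)] if where[p] != m else 0)
                         for p in parents), default=0)
            finish[t] = free[m] = max(free[m], ready) + ect[t][m]
            nxt[m] += 1
            remaining -= 1
            progressed = True
        if not progressed:
            raise ValueError("task order violates the precedence constraints")
    return max(finish.values())

def fitness(individual, ect, pred, d, deadline):
    return completion_time(individual, ect, pred, d) / deadline   # equation (3)

# Hypothetical workflow: A and B are entry tasks, C needs both, D needs C.
ect = {"A": {"M1": 3, "M2": 5}, "B": {"M1": 2, "M2": 3},
       "C": {"M1": 2, "M2": 3}, "D": {"M1": 5, "M2": 7}}
pred = {"C": ["A", "B"], "D": ["C"]}
d = {("A", "C"): 2, ("B", "C"): 3, ("C", "D"): 3}
individual = {"M1": ["A", "C", "D"], "M2": ["B"]}

print(completion_time(individual, ect, pred, d))
print(fitness(individual, ect, pred, d, deadline=15))
```

An individual whose fitness value is below 1 meets the deadline and stays in the population; the `ValueError` branch corresponds to an ordering that can never execute because a task is queued ahead of one of its ancestors.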



4.3 Crossover operation

The idea behind crossover is that combining two of the fittest individuals may result in an even better individual. Crossover creates new individuals in the current population by combining and rearranging parts of existing individuals, which are selected with a crossover probability Cr, as shown in Figure 4. A crossover point is selected according to the height of tasks (Edwin et al., 1994); it cuts the ordered list of tasks for a particular machine into two halves.

4.4 Mutation operation

There are two types of mutation operations, swapping mutation and replacing mutation. Swapping mutation randomly selects a resource and swaps the positions of two randomly selected tasks on the resource. Replacing mutation reallocates an alternative resource to a task in an individual: it randomly selects a task and replaces its current resource assignment with a resource randomly selected from the resources that are able to execute the task (Edwin et al., 1994). The mutation operation is applied using the mutation probability Mr. A random number between 0 and 1 is generated. If that number is less than Mr, swapping mutation is applied; otherwise replacing mutation is applied.

4.5 Selection operation

The selection operation is implemented using the roulette wheel (Edwin et al., 1994).
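The mutation and selection operators described above can be sketched as follows. The function names are illustrative, and two details are assumptions rather than statements from the source: every machine is taken to be able to run every task, and the roulette wheel gives each individual a slice proportional to 1/F(I), since a lower F(I) means an earlier completion. A mutated individual may violate precedence constraints, which is why the algorithm validates children afterwards.

```python
import random

def swap_mutation(individual, rng):
    """Swap two randomly chosen tasks on one randomly chosen machine."""
    machines = [m for m, queue in individual.items() if len(queue) >= 2]
    if machines:
        queue = individual[rng.choice(machines)]
        i, j = rng.sample(range(len(queue)), 2)
        queue[i], queue[j] = queue[j], queue[i]

def replace_mutation(individual, rng):
    """Move one randomly chosen task to a randomly chosen machine."""
    source = rng.choice([m for m, queue in individual.items() if queue])
    task = individual[source].pop(rng.randrange(len(individual[source])))
    # Assumption: any machine can run the task; it joins the end of that queue.
    individual[rng.choice(list(individual))].append(task)

def mutate(individual, mr, rng):
    # Rule as stated in Section 4.4: draw r in [0, 1); swap if r < Mr, else replace.
    if rng.random() < mr:
        swap_mutation(individual, rng)
    else:
        replace_mutation(individual, rng)

def roulette_select(population, fitness_values, rng, k=2):
    # Assumption: lower F(I) is better, so the slice is proportional to 1 / F(I).
    weights = [1.0 / f for f in fitness_values]
    return rng.choices(population, weights=weights, k=k)

rng = random.Random(42)
individual = {"M1": ["T2", "T4"], "M2": ["T1", "T5"], "M3": ["T3"]}
mutate(individual, mr=0.1, rng=rng)
parents = roulette_select(["I1", "I2", "I3"], [0.6, 0.8, 0.95], rng)
print(individual, parents)
```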

Figure 4 Applying crossover to two fittest individuals: (a) two fittest individuals (Parent 1 and Parent 2) along with their crossover points; (b) two children (Child 1 and Child 2) after applying the crossover operation on (a)

The pseudocode for SGA is given as:

Pseudocode for SGA

BEGIN
  Create an initial population consisting of randomly generated solutions.
  While termination criteria are not met do
    Evaluate the fitness of each individual in the population using equation (3).
    Apply the selection operator to select the parents from the population.
    Apply the crossover operator on the selected parents using crossover probability Cr to create the children.
    Apply the mutation operator with probability Mr on the newly created children.
    Validate each child according to the fitness function.
    Add the valid children to create the new population.
  end while
END

To increase the population diversity of SGA, we propose deadline constraint heuristic-based genetic algorithms.

5 Proposed algorithms

5.1 Bottom-level GA (BGA)

A deadline constraint heuristic-based BGA is proposed in which the priority of each workflow task is assigned using its bottom level. These priorities are then used to create the initial population of BGA. The pseudocode for BGA is given below:

Pseudocode for BGA

BEGIN
  Calculate the b-level of all the tasks of the workflow using equation (1).
  Create the initial population of BGA: the first individual is created by assigning the tasks according to their b-level in descending order to the available machines. The rest of the individuals are created using random assignment of the tasks to the available machines. Each individual is encoded using the 2-d encoding.
  While termination criteria are not met do
    Evaluate the fitness of each individual in the population using equation (3).
    Apply the selection operator to select the parents from the population.
    Apply the crossover operator on the selected parents using crossover probability Cr to create the children.
    Apply the mutation operator with probability Mr on the newly created children.
    Validate each child according to the fitness function.
    Add the valid children to create the new population.
  end while
END

5.2 Top-level GA (TGA)

In TGA, the first individual of the initial population is created by assigning the tasks according to their t-level in ascending order to the available machines. The rest of the individuals are created using random assignment of the tasks to the available machines. The pseudocode for TGA is given below:

Pseudocode for TGA

BEGIN
  Calculate the t-level of all the tasks of the workflow using equation (2).
  Create the initial population of TGA: the first individual is created by assigning the tasks according to their t-level in ascending order to the available machines. The rest of the individuals are created using random assignment of the tasks to the available machines. Each individual is encoded using the 2-d encoding.
  While termination criteria are not met do
    Evaluate the fitness of each individual in the population using equation (3).
    Apply the selection operator to select the parents from the population.
    Apply the crossover operator on the selected parents using crossover probability Cr to create the children.
    Apply the mutation operator with probability Mr on the newly created children.
    Validate each child according to the fitness function.
    Add the valid children to create the new population.
  end while
END

5.3 Bottom-level and top-level GA (BTGA)

BTGA creates the first individual of the initial population by assigning the tasks according to their b-level in descending order to the available machines. For the rest of the individuals, the priority of each task is first set equal to the sum of its b-level and a random number generated in the range (–t-level/2, t-level/2). All the tasks are then assigned to the available machines according to their priority. The pseudocode for BTGA is given below:

Pseudocode for BTGA

BEGIN
  Calculate the b-level and t-level of all the tasks of the workflow using equations (1) and (2).
  Create the initial population of BTGA: the first individual is created by assigning the tasks according to their b-level in descending order to the available machines. For the rest of the individuals, the priority of each task is first set equal to the sum of its b-level and a random number generated in the range (–t-level/2, t-level/2). All the tasks are then assigned to the available machines according to their priority. Each individual is encoded using the 2-d encoding.
  While termination criteria are not met do
    Evaluate the fitness of each individual in the population using equation (3).
    Apply the selection operator to select the parents from the population.
    Apply the crossover operator on the selected parents using crossover probability Cr to create the children.
    Apply the mutation operator with probability Mr on the newly created children.
    Validate each child according to the fitness function.
    Add the valid children to create the new population.
  end while
END
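The BTGA initialisation step can be illustrated with the b-level and t-level values of Table 2. The sketch below rests on two assumptions that the source leaves open: tasks are placed on machines in round-robin fashion once their order is fixed, and the perturbed priorities are sorted in descending order, mirroring the b-level rule. The function names are hypothetical.

```python
import random

# b-level and t-level values copied from Table 2.
BLEVEL = {"T1": 16, "T2": 17, "T3": 14, "T4": 13, "T5": 11, "T6": 8,
          "T7": 10, "T8": 4, "T9": 3, "T10": 2, "T11": 5}
TLEVEL = {"T1": 0, "T2": 0, "T3": 0, "T4": 0, "T5": 5, "T6": 6,
          "T7": 7, "T8": 12, "T9": 11, "T10": 10, "T11": 12}
MACHINES = ["M1", "M2", "M3"]

def assign(order, machines):
    # Assumption: round-robin placement; the paper does not fix the placement rule.
    individual = {m: [] for m in machines}
    for i, task in enumerate(order):
        individual[machines[i % len(machines)]].append(task)
    return individual

def btga_initial_population(blevel, tlevel, machines, size, rng):
    # First individual: tasks in descending b-level order.
    by_blevel = sorted(blevel, key=lambda t: -blevel[t])
    population = [assign(by_blevel, machines)]
    # Remaining individuals: b-level perturbed by a draw from (-t-level/2, t-level/2).
    while len(population) < size:
        priority = {t: blevel[t] + rng.uniform(-tlevel[t] / 2, tlevel[t] / 2)
                    for t in blevel}
        order = sorted(priority, key=lambda t: -priority[t])  # assumed descending
        population.append(assign(order, machines))
    return population

population = btga_initial_population(BLEVEL, TLEVEL, MACHINES, 10, random.Random(1))
print(population[0])
```

Entry tasks have a t-level of zero, so their priority is never perturbed, while tasks deep in the graph can move by up to half their t-level. This is where the extra diversity in the initial population comes from.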

6 Performance evaluation

In this section, we present our simulations of all the proposed algorithms.

6.1 Experiment setup

To evaluate the workflow scheduling algorithms, we used five synthetic workflows based on realistic workflows from diverse scientific applications, which are:

• Montage: Astronomy
• CyberShake: Earthquake
• Epigenomics: Biology
• LIGO: Gravitational physics
• SIPHT: Biology

The detailed characterisation of each workflow, including its structure and its data and computational requirements, can be found in Bharathi et al. (2008). Figure 5 shows the approximate structure of a small instance of each workflow. These workflows have different structural properties in terms of their basic components, i.e. pipeline, data aggregation, data distribution and their composition. For each workflow, tasks with the same colour are of a similar type. The Directed Acyclic Graph in XML (DAX) descriptions of all these workflows are available at the website (http://confluenece.peagasus.isi.edu/display.peagasus/WorkflowGenerator), from which we have chosen three sizes for our experiment, i.e. small (about 50 tasks), medium (about 100 tasks) and large (about 1000 tasks).

For our experiment, we have simulated a cloud environment in Java which consists of a data centre. The Virtual Machines (VMs) are created over the physical resources in the data centre, so the VMs include resources with different processing speeds and hence with different prices. The processor speeds of the VMs are selected randomly in the range of 1000–5000 MIPS, and the price of using these VMs is set within a range of 2–10 basic units such that the fastest VM is roughly five times more expensive than the slowest one. The average bandwidth between these VMs is assumed to be 20 Mbps (Saeid et al., 2013). The network transfer within a data centre is assumed to be free, and only the cost for storage is considered during the experiment.

Figure 5 Structure of various workflows (see online version for colours): (a) Montage; (b) Epigenomics; (c) SIPHT; (d) LIGO; (e) CyberShake

The other important parameter of the experiments is the time interval. Most current commercial clouds, such as Amazon (www.amaazon.com), charge users based on a long time interval equal to one hour. Here a user has to pay for the whole of the last time interval even if only a fraction of it has been used. Users therefore prefer shorter time intervals, as offered by CloudSigma (www.cloudsigma.com), in order to pay close to what they have really used. To evaluate the impact of short and long time intervals on our algorithms, we consider two different time intervals: a long one equal to one hour and a short one equal to five minutes (Saeid et al., 2013).
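The effect of the billing interval can be made concrete with a small sketch. The function name and the prices are hypothetical (12 units per hour and 1 unit per five minutes, i.e. the same hourly rate), chosen only to show how rounding up to whole intervals changes the bill.

```python
def billed_cost(busy_seconds, interval_seconds, price_per_interval):
    """Cost when every started interval is charged in full."""
    intervals = -(-busy_seconds // interval_seconds)   # ceiling division on integers
    return intervals * price_per_interval

usage = 65 * 60   # a VM that is busy for 65 minutes

print(billed_cost(usage, 60 * 60, 12))   # one-hour interval
print(billed_cost(usage, 5 * 60, 1))     # five-minute interval
```

With the one-hour interval the user pays for two full hours, whereas the five-minute interval charges thirteen intervals, which is much closer to the actual usage.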

6.2 Performance metrics

The performance metric chosen for the comparison is the Normalised Schedule Cost (NSC). The NSC of a schedule is calculated as:

$$\mathrm{NSC} = \frac{\text{Total schedule cost}}{C_c} \tag{4}$$

where $C_c$ is the cost of executing the same workflow with the cheapest strategy, i.e. scheduling all workflow tasks on the cheapest VM according to their precedence constraints.

To assign the deadline, we first define the fastest schedule as scheduling each workflow task on the fastest VM according to their precedence constraints, with all data transmission times taken as zero. The makespan of this fastest schedule, denoted by $M_f$, is a lower bound for the makespan of executing the workflow. The deadline for the whole workflow is therefore defined as:

$$\text{Deadline} = \alpha \cdot M_f$$

where α is a deadline factor in the range from 1.5 to 5 (Saeid et al., 2013).

For GA, the following parameters are set:

Parameter                    Value
Initial population           10
Crossover probability (Cr)   0.7
Mutation probability (Mr)    0.1
Maximum generation           50
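The deadline rule and equation (4) translate directly into code. The sketch below uses hypothetical numbers (a fastest makespan of 400 seconds, a schedule cost of 180 units and a cheapest-strategy cost of 120 units) and illustrative function names; it only restates the two formulas.

```python
def deadline(fastest_makespan, alpha):
    # Deadline = alpha * Mf, with the deadline factor restricted to [1.5, 5].
    if not 1.5 <= alpha <= 5:
        raise ValueError("deadline factor must lie between 1.5 and 5")
    return alpha * fastest_makespan

def normalised_schedule_cost(total_schedule_cost, cheapest_cost):
    # Equation (4): cost relative to running everything on the cheapest VM.
    return total_schedule_cost / cheapest_cost

print(deadline(400, 1.5), deadline(400, 5))
print(normalised_schedule_cost(180, 120))
```

A larger deadline factor loosens the constraint, which in general leaves the scheduler more room to use slower, cheaper VMs.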

6.3 Experiment result

As GA is a stochastic algorithm, each algorithm is run ten times for each workflow, and the average value of NSC is used to compare BGA, TGA, BTGA and SGA. Figure 6 shows the average NSC of scheduling large workflows with BGA, TGA, BTGA and SGA with a time interval of 5 minutes. It shows that BTGA outperforms the other three algorithms in all cases. It is clear from Figures 6(a) and (b) that the average NSC for Montage and CyberShake is high compared with the other workflow structures, as both of these workflow structures contain, in their second row, a large number of tasks with small execution times on the fastest VM, which increases the overall execution cost of the workflow. In the case of SIPHT, the performance of SGA and TGA is very similar, as its structure consists of a large number of top-level tasks having priority zero.

Figure 6 The NSC of scheduling workflows with BGA, TGA, BTGA and SGA with the time interval equal to 5 min (see online version for colours): (a) Montage; (b) CyberShake; (c) Epigenomics; (d) LIGO; (e) SIPHT

Source: http://confluenece.peagasus.isi.edu/display.peagasus/WorkflowGenerator

Figure 7 shows the average NSC of scheduling large workflows with BGA, TGA, BTGA and SGA with a time interval of one hour. The overall results are almost the same as in Figure 6, except that the value of NSC is increased, as expected. It shows that BTGA outperforms the other three algorithms in all cases. As Montage and CyberShake consist of a larger number of smaller jobs, their NSC value is very high for a time interval of one hour compared with the other three workflows. For small and medium size workflows, we obtain similar graphs.

Figure 7 The NSC of scheduling workflows with BGA, TGA, BTGA and SGA with the time interval equal to one hour: (a) Montage; (b) CyberShake; (c) Epigenomics; (d) LIGO; (e) SIPHT

7 Conclusion and future work

In this paper, we have presented deadline constrained heuristic-based genetic algorithms (HGAs) to schedule applications to cloud resources that minimise the execution cost while meeting the deadline for delivering the result. Each workflow task is assigned a priority using its bottom level and top level. These priorities are then used to create the initial population of BGA, TGA and BTGA. The proposed algorithms are evaluated with synthetic workflows that are based on realistic workflows with different structures and different sizes. These algorithms also consider the pay-as-you-use pricing model of current commercial clouds. The proposed algorithms are compared with SGA under the same deadline constraint and pricing model. The simulation results show that our proposed algorithms have a promising performance compared with SGA. Of the three proposed algorithms, BTGA performs best in all cases. In future, we intend to extend our work to a real cloud environment, to include other QoS constraints and to compare it with other meta-heuristic techniques such as PSO and ACO.

References

Amir, M.R. and Mohammad, A.V. (2008) ‘A novel task scheduling in multiprocessor systems with genetic algorithm by using elitism stepping method’, INFOCOMP – Journal of Computer Science, Vol. 7, No. 2, pp.58–64.

Amir, M.R. and Mohammad, A.V. (2009) ‘A novel genetic algorithm for static task scheduling in distributed systems’, Journal of Computer Theory and Engineering, Vol. 1, No. 1, pp.1–6.

Andrew, J.P., Thomas, M.K. and Thomas, J.N. (2010) ‘Multi-heuristic dynamic task allocation using genetic algorithms in a heterogeneous distributed system’, Journal of Parallel and Distributed Computing, Vol. 70, No. 7, pp.758–766.

Bharathi, S., Lanitchi, A., Deelman, E., Mehta, G., Su, M.H. and Vahi, K. (2008) ‘Characterization of scientific workflows’, 3rd Workshop on Workflows in Support of Large Scale Science, 17 November, Austin, TX, USA, pp.1–10.

Buyya, R., Yeo, C.S., Venugopal, S., Broberg, J. and Brandic, I. (2009) ‘Cloud computing and emerging it platforms: vision, hype, and reality for delivering computing as the 5th utility’, Journal of Future Generation Computer Systems, Vol. 25, No. 6, pp.599–616.

Deelman, E., Gannon, D., Shields, M. and Taylor, I. (2008) ‘Workflows and e-science: an overview of workflow system features and capabilities’, Journal of Future Generation Computer Systems, Vol. 25, No. 5, pp.528–540.

Edwin, S.H., Nirwan, A. and Hong, R. (1994) ‘A genetic algorithm for multiprocessor scheduling’, IEEE Transactions on Parallel and Distributed Systems, Vol. 5, No. 2, pp.113–120.

Fatima, A.O. and Mona, M.A. (2010) ‘Genetic algorithms for task scheduling problem’, Journal of Parallel and Distributed Computing, Vol. 70, No. 1, pp.13–22.

Fida, A. (2008) Workflow Scheduling For Service Oriented Cloud Computing, Unpublished PhD Thesis, University of Saskatchewan, Saskatoon, Canada.

Florin, P., Ciprian, D. and Valentin, C. (2009) ‘Genetic algorithm for DAG scheduling in grid environments’, IEEE ICCP 2009: Proceeding of International Conference on Intelligent Computer Communications and Processing, 27–29 August, Cluj-Napoca, Romania, pp.299–305.

Foster, I., Zhao, Y., Raicu, L. and Lu, S. (2008) ‘Cloud computing and grid computing 360-degree compared’, Proceeding of Grid Computing Environment Workshop, 12–16 November, Austin, TX, USA, pp.1–10.

Gabriel, M., Wolfgang, G. and Calvin, J.R. (2011) ‘Hybrid computing – where hpc meets grid and cloud computing’, Journal of Future Generation Computer Systems, Vol. 27, No. 5, pp.440–453.

Geelan, J. (2008) ‘Twenty one experts define cloud computing: virtualization’, Electronic Magazine. Available online at: http://virtualization.sys-con.com/node/612375

Joanna, K. and Samee, U.K. (2012) ‘Multi-level hierarchic genetic-based scheduling of independent jobs in dynamic heterogeneous grid environment’, Journal of Information Sciences, Vol. 214, No. 12, pp.1–19.

Ke, L., Hai, J., Jinjun, C., Xiao, L., Dong, Y. and Yun, Y. (2010) ‘A compromised-time-cost scheduling algorithm in SwinDeW-C for instance-intensive cost-constrained workflows on cloud computing platform’, International Journal of High Performance Computing Applications, Vol. 24, No. 4, pp.445–456.

Kumar, P. and Verma, A. (2012) ‘Scheduling using improved genetic algorithm in cloud computing for independent tasks’, Proceeding of International Conference on Advances in Computing, Communications and Informatics, 3–5 August, Chennai, India, pp.137–142.

Liu, K. (2009) Scheduling Algorithms for Instance Intensive Cloud Workflows, Unpublished PhD Thesis, Swinburne University of Technology, Melbourne, Australia.

Mohammad, I.D. and Nawwaf, K. (2011) ‘A hybrid heuristic-genetic algorithm for task scheduling in heterogeneous processor networks’, Journal of Parallel and Distributed Computing, Vol. 71, No. 11, pp.1518–1531.

106


Pandey, S. (2010) Scheduling And Management Of Data Intensive Application Workflows In Grid And Cloud Computing Environment, Unpublished PhD Thesis, University of Melbourne, Melbourne, Australia. Pandey, S., Karunamoorthy, D. and Buyya, R. (2011) ‘Workflow engine for clouds’, Cloud Computing: Principles and Paradigms, Chapter 12, Weliy STM. Saeid, A., Mahmoud, N. and Dick, H.J.E. (2013) ‘Deadlineconstrained workflow scheduling algorithms for infrastructure as a service clouds’, Journal of Future Generation Computer Systems, Vol. 29, No. 1, pp.158–169. Taylor, I., Deelman, E., Gannon, D. and Shields, M. (2007) Workflows for E-Science: Scientific Workflows for Grid, 1st ed., Springer. Verma, A. and Kaushal, S. (2011) ‘Cloud computing security issues and challenges: a survey’, Proceeding of 1st International Conference on Advances in Computing and Communications, Part-IV, 22–24 July, Kochi, India, pp.445–454. Verma, A. and Kaushal, S. (2012) ‘Deadline and budget distribution based cost-time optimization workflow scheduling algorithm for cloud’, IJCA Proceeding of International Conference on Recent Advances and Future Trends in IT, iRAFIT(7), 13–17 April, Patiala, India, pp.1–4. Wieczorek, M., Prodan, R. and Hoheisel, A. (2007) Taxonomies of the Multi-Criteria Grid Workflow Scheduling Problem, CoreGRID Technical Report Number TR-0106, 30 August. Xue, Z. and Wenhua, Z. (2010) ‘Grid workflow scheduling based on improved genetic algorithm’, Proceeding of International Conference on Computer Design and Applications, 25–27 June, Qinhuangdao, China, Vol. 5, pp.270–273. Yang, Y., Liu, K., Chen, J., Liu, X., Yuan, D. and Jin, H. (2008) ‘An algorithm in swindew-c for scheduling transactionintensive cost-constrained cloud workflows’, e-Science08: Proceeding of 4th IEEE International Conference on eScience, 7–12 December, Indianapolis, IN, USA, pp.374–375. Yong, W., Bahati, R.M. and Bauer, M. 
(2011) ‘A novel deadline and budget constrained scheduling heuristics for computational grids’, Journal of Central. South University of Technology, Vol. 18, No. 2, pp.465−472.

Yu, J. and Buyya, R. (2005a) ‘Cost based scheduling of scientific workflow application on utility grid’, Proceeding of 1st IEEE International Conference on e-Science and Grid Computing, 1 July, Melbourne, Australia, pp.8–147. Yu, J. and Buyya, R. (2005b) ‘A taxonomy of workflow management systems for grid computing’, Journal of Grid Computing, Vol. 3, Nos. 1/2, pp.171–200. Yu, J. and Buyya, R. (2006a) ‘A budget constraint scheduling of workflow application on utility grid using genetic algorithm’, HPDC 2006: Proceeding of 15th IEEE International Symposium on High Performance Distributed Computing, 19–23 June, Paris, pp.1–10. Yu, J. and Buyya, R. (2006b) ‘Scheduling scientific workflow applications with deadline and budget constraints using genetic algorithms’, Scientific Programming Journal, Vol. 14, No. 3, pp.217–230. Yu, J. and Buyya, R. (2008) ‘Workflow scheduling algorithms for grid computing’, in Xhafa, F. and Abraham, A. (Eds): Metaheuristics for Scheduling in Distributed Computing Environment, Springer, Berlin. Yuan, Y-C., Wang, K-J., Sun, X-S. and Guo, T. (2009) ‘An iterative heuristic for scheduling grid workflows with budget constraints’, Proceeding of 8th International Conference on Machine Learning and Cybernetics, 12–15 July, Baoding, China, pp.1700–1705. Zhangjun, W., Xiao, L., Zhiwei, N., Dong, Y. and Yun, Y. (2013) ‘A market-oriented hierarchical scheduling strategy in cloud workflow systems’, Journal of Supercomputing, Vol. 63, No. 1, pp.256–293.

Websites http://confluenece.peagasus.isi.edu/display.peagasus/WorkflowGe nerator (accessed 20 July 2012). www.amazon.com (accessed on 10 August 2012). www.cloudsigma.com (accessed on 17 August 2012).