Hybrid Genetic Algorithm for Cloud Computing Applications

Kai Zhu†, Huaguang Song†, Lijing Liu‡, Jinzhu Gao†, Guojian Cheng‡

†School of Engineering and Computer Science, University of the Pacific, Stockton, CA 95211
Email: kzhu, hsong, [email protected]

‡School of Computer Science, Xi'an Shiyou University, Dian Zi 2nd Road 18, Xi'an, Shaanxi, P.R. China 710065
Email: [email protected], [email protected]

Abstract—In a cloud computing system, the scheduling of computing resources is a critical part of cloud computing research. An effective load balancing strategy can markedly improve the task throughput of cloud computing. Virtual machines are chosen as the fundamental processing unit of cloud computing. Because virtualization technology makes the resources in the cloud grow sharply and vary dynamically, implementing load balancing in cloud computing has become complicated and difficult to achieve. The multi-agent genetic algorithm (MAGA) is a hybrid of GA whose performance is far superior to that of the traditional GA. This paper demonstrates the advantage of MAGA over traditional GA, and then applies MAGA to the load balancing problem in cloud computing by designing a load balancing model on the basis of virtualized resource management. Finally, by comparing MAGA with the Min_min strategy, the experimental results show that MAGA achieves better load balancing performance.

Keywords—cloud computing, load balance, multi-agent genetic algorithm, virtualization technology

I. INTRODUCTION

Cloud computing is an inevitable trend in the future development of computing technology. Its critical significance lies in its ability to provide all users with high-performance, reliable computation. Cloud computing is the evolution of distributed computing, grid computing, and several other techniques. One of the primary differences between cloud computing and previous large-scale cluster computing is the fundamental processing unit: in cloud computing, virtualization technology [8] allows one physical host to be virtualized into multiple virtual hosts, which then serve as the basic computing units. By adopting virtualization technology, cloud computing in tandem with conventional cluster computing greatly improves hardware utilization and also enables automatic monitoring of all hosts. Virtualization technology has not only brought a great deal of convenience to cloud computing, but has also made a large number of virtual resources available in the cloud. The quantity of these virtual resources is both enormous and dynamically changing. Therefore, load balancing of hosts is one of the primary concerns in cloud computing research.

Proposed by Professor J. H. Holland of the University of Michigan in the early 1960s, GA was the first evolutionary computation algorithm [7], extracting, simplifying, and abstracting the basic ideas of Darwin's theory of evolution and Mendel's laws of inheritance. Using the theory of biological evolution as a reference, the algorithm uses computers to simulate the natural mechanisms of parent gene recombination and "survival of the fittest" in the process of species reproduction. In recent years, research on and application of GA have developed rapidly. GA can be applied to solve complicated problems in science and engineering, and it has had a remarkable impact on many growing fields, such as artificial intelligence, knowledge discovery, pattern recognition, image processing, decision analysis, product process design, resource scheduling, and stock market analysis. However, some restrictive conditions in solving high-dimensional function optimization problems render classic GA less effective in cloud computing. When classic GA is used to solve coarse-grained, high-dimensional, large-data-set optimization problems, issues such as premature convergence, slow convergence, or non-convergence are inevitable. Therefore, scholars have proposed a variety of improved genetic algorithms.

This paper focuses on the multi-agent genetic algorithm (MAGA) [10], a hybrid algorithm combining GA and multi-agent techniques that was originally proposed by Professor Licheng Jiao. MAGA is an improved hybrid GA; in execution it demonstrates greatly improved convergence time and optimization results compared to traditional GA, and its superiority is especially apparent when handling very large-scale, high-dimensional, complex, and dynamic optimization problems. Therefore, this paper first introduces the strengths of MAGA by comparing it with GA. Then a model is built to convert load balancing of virtualized cloud computing into a mathematical problem. Finally, we use MAGA to solve the load-balancing strategy and compare the result with the common Min_min algorithm [11].

II. RELATED WORK

The basic principle of cloud computing is to distribute computing tasks across a large number of distributed computers rather than a local computer. Hu et al. [1] proposed a scheduling strategy for VM load balancing in cloud computing based on GA, which can effectively improve overall system reliability and availability, one of the primary concerns in cloud computing. Gong et al. [2] analyzed the features of cloud computing, explored applications of virtualization technology stemming from the advanced virtual host, and then applied virtualization technology to resource management and virtualized storage.

In recent years, artificial intelligence methods such as evolutionary computation, and especially its branch of genetic algorithms, have gradually drawn attention due to their intelligence and implicit parallelism [13]. GA has been widely applied to resource scheduling in large-scale, nonlinear cluster systems and has achieved good results [3]. The core of resource scheduling technology lies in the scheduling algorithm. At present, numerous algorithms exist for cluster resource scheduling, such as the round-robin algorithm, least-connection scheduling, the minimum-number-of-tasks algorithm, and the minimum-response-time algorithm [4]. Later, other scholars proposed a series of dynamic algorithms, such as resource scheduling based on task priority, dynamically weighted resource scheduling, and queue-based resource scheduling [5]. Some scholars have also applied AI methods to resource scheduling of large-scale cluster systems, such as particle swarm optimization (PSO) [9] and genetic algorithms (GA) [6]. Experiments show that these artificial intelligence methods can achieve better load balancing than traditional approaches.

III. MULTI-AGENT GENETIC ALGORITHM

A. Agent Genetic Algorithm

From the agent perspective, each individual in GA is treated as an agent. Such an agent is capable of local perception, competition, cooperation, and self-learning, and it reaches the goal of global optimization through interactions between agent and environment and between agent and agent. This is the idea of MAGA [10]. The implementation mechanism of MAGA is quite different from GA's, and is mainly manifested in the interaction, collaboration, and self-learning among individuals.

B. Individual Survival Environment

Like GA, MAGA still performs its manipulations on individuals. In MAGA, each individual is considered an agent, capable of sensing, changing, and impacting its environment autonomously, and thus possessing its own characteristics. All of the agents live in an Lsize × Lsize agent grid environment, with the agent at row i and column j denoted (i, j), as shown in Figure 1.

[Figure 1 omitted: an Lsize × Lsize lattice of agents labeled (1,1), (1,2), ..., (Lsize, Lsize).]

Figure 1. Agent Grid

C. Genetic Operator

In MAGA, the genetic operators mainly include the neighborhood competition operator, the neighborhood orthogonal crossover operator, the mutation operator, and the self-learning operator. Among these, the neighborhood competition operator realizes competition among agents; the neighborhood orthogonal crossover operator achieves collaboration among agents; and the mutation and self-learning operators implement the behavior by which agents exploit their own knowledge [10].

D. Comparison between MAGA and GA

Table 1 illustrates the differences in genetic operation between GA and MAGA.

                                 GA                               MAGA
Individual                       Isomorphic                       Isomerous
Information interaction method   Crossover after selection        Obtain information from four neighbors, then self-update
Genetic operator                 Selection, crossover, mutation   Neighborhood competition, orthogonal crossover, mutation, self-learning
Self-learning                    No                               Yes
Evolution                        Evolves without purpose          Evolves with purpose
Competition                      Roulette selection               Interaction with neighborhood

Table 1. The Operation Difference
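The neighborhood competition listed in Table 1 can be illustrated with a small sketch. This is a simplified, hypothetical rendering (the real operator in [10] uses an occupying strategy with probability Po and heuristic recombination of the loser): here each agent on a wrap-around grid is compared with its best 4-neighbor and, if it loses, is replaced by a perturbed copy of the winner.

```python
import random

def neighbors(i, j, L):
    # 4-neighborhood (up, down, left, right) on a toroidal agent grid
    return [((i - 1) % L, j), ((i + 1) % L, j), (i, (j - 1) % L), (i, (j + 1) % L)]

def neighborhood_competition(grid, energy, L, perturb=0.1):
    """One sweep of a simplified neighborhood competition operator:
    an agent losing to its best neighbor is replaced by a perturbed copy of it."""
    new = [row[:] for row in grid]
    for i in range(L):
        for j in range(L):
            bi, bj = max(neighbors(i, j, L), key=lambda p: energy(grid[p[0]][p[1]]))
            if energy(grid[i][j]) < energy(grid[bi][bj]):
                winner = grid[bi][bj]
                new[i][j] = [x + random.uniform(-perturb, perturb) for x in winner]
    return new
```

Because the globally best agent never loses to a neighbor, it survives every sweep, which is what gives the operator its elitist flavor.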

To illustrate performance on function optimization, we compare MAGA and GA on the following optimization function, with n = 20:

f(x) = Σ_{i=1}^{n} x_i · sin(√|x_i|),  S = [−500, 500]^n  (1)

Figure 2 shows the experimental results, comparing the optimal function values over ten runs of MAGA and GA.
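Equation (1), reconstructed here as the standard Schwefel-type benchmark (the inner √|x_i| term is our reading of the garbled source, though it matches the [−500, 500]^n domain), is straightforward to evaluate:

```python
import math

def f(x):
    # Eq. (1): f(x) = sum_i x_i * sin(sqrt(|x_i|)) over S = [-500, 500]^n
    return sum(xi * math.sin(math.sqrt(abs(xi))) for xi in x)

# n = 20 as in the paper's setting; x ~= 420.97 per coordinate is the
# well-known per-dimension maximizer of x * sin(sqrt(|x|)) on [-500, 500].
x_star = [420.9687] * 20
print(f(x_star))
```

Each coordinate contributes about 418.98 at the maximizer, so the global maximum over S is roughly 8379.7 for n = 20, which is the kind of optimum the two algorithms are racing toward in Figure 2.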

[Figure 2 omitted: optimal function values for runs 1-10 of GA and MAGA; vertical axis "Optimizing value" (0-0.8), horizontal axis "number of times".]

Figure 2. The Performance Difference

On the basis of Figure 2, the optimization results achieved by MAGA are far superior to those of the traditional GA.

IV. THE ESTABLISHMENT OF THE LOAD BALANCING MODEL

The main parameters required from a single user are: User(ReqPerHrPerUser, ReqSize, ReqCPU, ReqMemory, Count). Among them, ReqPerHrPerUser refers to the average number of online users per hour in a user group. ReqSize represents the size of the request sent by each user in the user group. ReqCPU indicates the amount of CPU needed to execute the request, relative to a 2.4 GHz single-core CPU, expressed as a percentage. ReqMemory is the size of the memory (in MB) consumed to execute the request. Count indicates the number of requests sent per minute.

In order to solve the issue of exploding dimensionality, a grouping strategy is used to set up the resource scheduling model. The grouping strategy is based on the user request parameters, and each parameter inside a group has a maximum value; the sum of each parameter over all users inside a group must not exceed the maximum value set for the group. According to the time sequence of user requests, we divide all user requests satisfying first arrival and the maximum-value constraint into one group, and reset the parameters of the group. The resetting rule is shown in the following formula:

g_j = (Σ_{i=1}^{n} User_ij) / n,  0 ≤ j ≤ 4  (2)

A. Establish the Load Balancing Model

The establishment of the load balancing model mainly refers to the design of the fitness function. On the basis of the grouping strategy, all of the VM (virtual machine) virtual resources on a physical host correspond to the grouped user requests. One host contains several VMs, and each VM can be allocated several groups. Each group can be described as: Group(ReqPerHrPerUser, ReqSize, ReqCPU, ReqMemory, Count).

The third parameter within Group is the average size of memory consumed to execute each request. Therefore, after completing a group of tasks, the memory load of the ith VM, VM_i, is:

Ml_i = M_i + Vmp_i / Vm_i × 100%  (3)

In this formula, Vmp_i and Vm_i are constants, and M_i is the remaining memory percentage before VM_i executes the tasks.

The fourth parameter within Group is the average CPU consumption needed to execute each request. Therefore, after accomplishing a group of tasks, the CPU load of the ith VM, VM_i, is:

Cl_i = C_i + Vmc_i / Vc_i × 100%  (4)

In this formula, Vmc_i and Vc_i are constants, and C_i is the remaining CPU percentage before executing the tasks. On the basis of the memory and CPU consumption, the overall load Vl_i on VM_i can be calculated by the following formula:

Vl_i = w × Ml_i + v × Cl_i  (5)

In this formula, w and v are weighting factors satisfying w + v = 1. Thus, the overall load Hl_j on the jth host is:

Hl_j = Σ_{i=0}^{m_j} Vl_ji = Σ_{i=0}^{m_j} (w × Ml_ji + v × Cl_ji)  (6)

In this formula, Vl_ji represents the load of the ith VM on the jth host, Ml_ji indicates the memory load of the ith VM on the jth host, Cl_ji is the CPU load of the ith VM on the jth host, and m_j represents the number of VMs activated on the jth physical host. The average load El of all of the hosts within the data center is:

El = (Σ_{j=0}^{o} Hl_j) / o = (Σ_{j=0}^{o} Σ_{i=0}^{m_j} Vl_ji) / o = (Σ_{j=0}^{o} Σ_{i=0}^{m_j} (w × Ml_ji + v × Cl_ji)) / o  (7)

In this formula, o is the number of host physical resources, and m_j represents the number of VM resources. The load difference between each host and the system average load is |Hl_j − El|. Therefore, the fitness function can be set as:

f = Σ_{j=0}^{o} |Hl_j − El|  (8)

Restriction condition:

Ml_ji < 1  &  Cl_ji < 1  (9)
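The load model of Eqs. (3)-(8) can be sketched directly in code. The function and variable names below are illustrative, and loads are expressed as fractions in [0, 1] rather than percentages:

```python
def vm_load(M_i, Vmp_i, Vm_i, C_i, Vmc_i, Vc_i, w=0.5, v=0.5):
    """Eqs. (3)-(5): memory load, CPU load, and weighted overall VM load (w + v = 1)."""
    Ml = M_i + Vmp_i / Vm_i   # Eq. (3): prior memory load plus the group's demand
    Cl = C_i + Vmc_i / Vc_i   # Eq. (4): prior CPU load plus the group's demand
    return w * Ml + v * Cl    # Eq. (5)

def fitness(host_vm_loads):
    """Eqs. (6)-(8): f = sum_j |Hl_j - El|; smaller f means better balance."""
    Hl = [sum(vm_loads) for vm_loads in host_vm_loads]  # Eq. (6): per-host load
    El = sum(Hl) / len(Hl)                              # Eq. (7): average host load
    return sum(abs(h - El) for h in Hl)                 # Eq. (8)
```

A perfectly balanced assignment gives f = 0: for example, `fitness([[0.3, 0.2], [0.25, 0.25]])` returns 0.0, while the lopsided `fitness([[0.6], [0.2]])` returns 0.4 (up to floating-point rounding).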

The goal is to make the function f as small as possible under the restriction condition.

B. Encoding

There are several encoding methods in GA, such as one-dimensional encoding, multi-dimensional encoding, binary encoding, decimal encoding, and floating-point encoding. All of these encoding approaches are also suitable for MAGA. For convenience of operation, we adopt binary encoding [12], which is the simplest and most commonly used. Suppose we have 10 user groups {Group0, Group1, …, Group9} and 30 VMs {VM_0, VM_1, …, VM_29}. Each user group is treated as one dimension, and these ten dimensions are denoted {x0, x1, …, x9}, with xi corresponding to Group i. There are then 30 alternative VMs available for each xi, and since 30 < 2^5, we take the encoding length of each xi as five, with 00001→VM_0, 00010→VM_1, …, 11110→VM_29. Adopting this binary multi-dimensional encoding, the initial encoding of the entire system is {00000, 00000, …, 00000}, and a possible solution for a system with 10 user groups and 30 VMs is:

{00001, 00100, 10010, 00110, 10010, 11000, 11100, 00010, 01000, 10000}

Therefore, for a system with n user groups and M virtual VM resources, the number of solution dimensions is n, and the encoding length of each dimension of an individual is ⌈log2 M⌉.
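The encoding scheme above can be sketched as a pair of helpers, using the paper's mapping in which the 5-bit value k + 1 denotes VM_k (00001 → VM_0, …, 11110 → VM_29); the helper names are ours:

```python
def encode(assignment, bits=5):
    """Encode a list of per-group VM indices as fixed-width binary genes."""
    return [format(vm_index + 1, '0{}b'.format(bits)) for vm_index in assignment]

def decode(genes):
    """Recover the per-group VM indices from the binary genes."""
    return [int(g, 2) - 1 for g in genes]

# The sample solution from the text: Group0 -> VM_0, Group1 -> VM_3, ...
genes = ['00001', '00100', '10010', '00110', '10010',
         '11000', '11100', '00010', '01000', '10000']
print(decode(genes))
```

Round-tripping through `encode` and `decode` is lossless, which is what makes this representation usable as a chromosome for the scheduling search.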

V. ALGORITHM PROCEDURE

The algorithm execution flow is as follows:

Step 1: Randomly generate Lsize^2 agents to initialize L0, update Best 0, and set t ← 0.

Step 2: Execute the neighborhood competition operator for each agent in Lt to obtain L(t+1/3).

Step 3: If U(0, 1) < Pc, apply the neighborhood orthogonal crossover operator to each agent in L(t+1/3) to generate L(t+2/3).

Step 4: If U(0, 1) < Pm, apply the mutation operator to each agent in L(t+2/3) to obtain L(t+1).

Step 5: Determine CBest(t+1) from L(t+1) and apply the self-learning operator to CBest(t+1).

Step 6: If Energy(CBest(t+1)) > Energy(Best t), then set Best(t+1) ← CBest(t+1); otherwise, set Best(t+1) ← Best t and CBest(t+1) ← Best t.

Step 7: If the termination conditions are met, output Best t and terminate; otherwise, set t ← t+1 and resume at Step 2.

Here Lt represents the tth-generation agent network, and L(t+1/3) and L(t+2/3) are the intermediate generations between Lt and L(t+1). Best t is the optimal agent among L0, L1, …, Lt, and CBest t is the optimal agent within Lt. The parameters Pc and Pm are preset and represent the execution probabilities of the neighborhood orthogonal crossover operator and the mutation operator, respectively.

VI. SIMULATION EXPERIMENT RESULT AND ANALYSIS

In the experiment, the parameters of MAGA are set as follows: Lsize = 5 is the agent grid size (population size); Po = 0.25 is the occupation probability of the neighborhood competition operator; Pc = 0.1 is the execution probability of the neighborhood orthogonal crossover operator; Pm = 0.1 is the execution probability of the mutation operator; in self-learning, sLsize = 2 is the population size, the search radius is 0.2, sPm = 0.05 is the mutation rate, and the number of iterations is 10.

The experiment is divided into three parts. Each part applies Min_min scheduling and MAGA scheduling respectively, and then compares and analyzes the utilization rates of CPU and memory. In the first part of the experiment, a total of 20 heterogeneous VMs are assigned to 10 hosts; the number of user groups is 100, with weighting factors w = 0.5 and v = 0.5. In the second part, we adjust the weighting factors to w = 0.01 and v = 0.99. The last part of the experiment tests the single-point failure rate, with weighting factors w = 0.01 and v = 0.99. Figures 3-6 show the first part of the experiment.

[Figure 3 omitted: MAX/MIN/AVG CPU utilization over sampling points 1-21 under Min_min.]

Figure 3. Sampling Result of CPU by Using Min_min

[Figure 4 omitted: MAX/MIN/AVG CPU utilization over sampling points 1-21 under MAGA.]

Figure 4. Sampling Result of CPU by Using MAGA
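The Step 1-7 procedure of Section V can be sketched as a single loop. The operators below are deliberately toy stand-ins (random-rival competition, arithmetic crossover, Gaussian mutation) for the neighborhood-based operators of [10], so this illustrates only the control flow, not MAGA itself:

```python
import random

def maga(energy, dim, Lsize=5, Pc=0.1, Pm=0.1, iters=10, lo=-1.0, hi=1.0):
    """Skeleton of the Section V procedure (Steps 1-7) with toy operators."""
    # Step 1: randomly generate Lsize^2 agents and record the initial best.
    grid = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(Lsize * Lsize)]
    best = max(grid, key=energy)[:]
    for t in range(iters):
        # Step 2: competition (toy version: each agent vs. a random rival).
        for k in range(len(grid)):
            rival = random.choice(grid)
            if energy(rival) > energy(grid[k]):
                grid[k] = rival[:]
        # Steps 3-4: crossover and mutation, each fired with its preset probability.
        for k in range(len(grid)):
            if random.random() < Pc:
                mate = random.choice(grid)
                grid[k] = [(a + b) / 2 for a, b in zip(grid[k], mate)]
            if random.random() < Pm:
                i = random.randrange(dim)
                grid[k][i] = min(hi, max(lo, grid[k][i] + random.gauss(0, 0.1)))
        # Steps 5-6: pick the generation best (self-learning omitted in this
        # sketch) and keep the best-so-far elite.
        cbest = max(grid, key=energy)
        if energy(cbest) > energy(best):
            best = cbest[:]
    # Step 7: terminate after the preset number of iterations and return the elite.
    return best
```

In the load balancing setting, `energy` would be the negated fitness f of Eq. (8), so that maximizing energy minimizes the load imbalance.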

[Figure 5 omitted: MAX/MIN/AVG memory utilization over sampling points 1-21 under Min_min.]

Figure 5. Sampling Result of Memory by Using Min_min

[Figure 6 omitted: MAX/MIN/AVG memory utilization over sampling points 1-21 under MAGA.]

Figure 6. Sampling Result of Memory by Using MAGA

When w = 0.5 and v = 0.5, MAGA has a significant advantage over the Min_min algorithm in the load balancing of CPU utilization, but not in the load balancing of memory usage. For this reason, we adjust the weights to w = 0.01 and v = 0.99 in part 2. Figures 7 and 8 show the sampling results of CPU and memory usage under MAGA in this setting.

[Figure 7 omitted: MAX/MIN/AVG CPU utilization over sampling points 1-21 under MAGA with w = 0.01, v = 0.99.]

Figure 7. Sampling Result of CPU by Using MAGA

[Figure 8 omitted: MAX/MIN/AVG memory utilization over sampling points 1-21 under MAGA with w = 0.01, v = 0.99.]

Figure 8. Sampling Result of Memory by Using MAGA

From Figures 7 and 8 we can see that MAGA achieves effective load balancing of both CPU and memory usage when the weighting factors are w = 0.01 and v = 0.99, and its degree of load balancing is still better than the Min_min algorithm's. In practical applications, we often focus on CPU utilization and disregard memory usage, or consider only the memory utilization rate and disregard CPU utilization. In such cases, we can adjust the corresponding parameter values according to the actual situation to achieve the desired load balancing state. For high-performance cloud computing, which mainly considers the efficiency of requests and can ignore the influence of memory, the value of w can be set larger. Some cloud computing systems do not need high computing power but consume a large amount of memory; in that case, the value of v should be larger.

Figure 9 shows a comparison of the single-point failure rates of the two algorithms when the weighting factors are w = 0.01 and v = 0.99. It can be seen from Figure 9 that the single-point failure rate of MAGA is much smaller than that of the Min_min algorithm.

[Figure 9 omitted: number of failures (0-80) of the two scheduling policies, MAGA vs. Min_min.]

Figure 9. Number of Single-Point Failures

VII. CONCLUSION

This paper experimentally shows that MAGA is more appropriate than GA for handling high-dimensional function optimization problems. After establishing a cloud computing load balancing model, the Min_min and MAGA algorithms were each applied for resource scheduling. With adjusted parameters, the scheduling results show that both CPU utilization and memory load balancing under MAGA are, on average, much better than under Min_min scheduling, and a comprehensive load balancing effect can be achieved by adjusting the weighting factors. Moreover, the MAGA scheduling algorithm results in a smaller single-point failure rate. This shows that this method for solving the load balancing strategy of virtualized cloud computing is feasible and effective.

REFERENCES

[1] Jinhua Hu, Jianhua Gu, Guofei Sun, and Tianhai Zhao, "A Scheduling Strategy on Load Balancing of Virtual Machine Resources in Cloud Computing Environment," Parallel Architectures, Algorithms and Programming (PAAP), 2010 Third International Symposium on, pp. 89-96, 18-20 Dec. 2010.

[2] Chunye Gong, Jie Liu, Qiang Zhang, Haitao Chen, and Zhenghu Gong, "The Characteristics of Cloud Computing," Parallel Processing Workshops (ICPPW), 2010 39th International Conference on, pp. 275-279, 13-16 Sept. 2010.

[3] Zhongni Zheng, Rui Wang, Hai Zhong, and Xuejie Zhang, "An approach for cloud resource scheduling based on Parallel Genetic Algorithm," Computer Research and Development (ICCRD), 2011 3rd International Conference on, vol. 2, pp. 444-447, 11-13 March 2011.

[4] S. Shirero, M. Takashi, and H. Kei, "On the schedulability conditions on partial time slots," Real-Time Computing Systems and Applications, 1999. RTCSA '99. Sixth International Conference on, pp. 166-173, 1999.

[5] V. Kant Soni, R. Sharma, and M. Kumar Mishra, "An analysis of various job scheduling strategies in grid computing," Signal Processing Systems (ICSPS), 2010 2nd International Conference on, vol. 2, pp. V2-162-V2-166, 5-7 July 2010.

[6] G. Alizadeh, M. Baradarannia, P. Yazdizadeh, and Y. Alipouri, "Serial configuration of genetic algorithm and particle swarm optimization to increase the convergence speed and accuracy," Intelligent Systems Design and Applications (ISDA), 2010 10th International Conference on, pp. 272-277, Nov. 29-Dec. 1, 2010.

[7] John H. Holland, Adaptation in Natural and Artificial Systems. MIT Press, Cambridge, MA, USA, 1992.

[8] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield, "Xen and the art of virtualization," in Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (SOSP '03), ACM, New York, NY, USA, pp. 164-177, 2003.

[9] K. Deb and H. G. Beyer, "Self-adaptive genetic algorithms with simulated binary crossover," Evolutionary Computation, 9(2): 197-221, 2001.

[10] Zhong Weicai, Liu Jing, Xue Mingzhi, and Jiao Licheng, "Global numerical optimization using multi-agent genetic algorithm," Computational Intelligence and Multimedia Applications, 2003. ICCIMA 2003. Proceedings. Fifth International Conference on, pp. 165-170, 27-30 Sept. 2003.

[11] Yuxia Du, Fangai Liu, and Lei Guo, "Research and improvement of Min-Min scheduling algorithm," Computer Engineering and Applications, 2010, 46(24): 107-109.

[12] R. Caruana and J. D. Schaffer, "Representation and hidden bias: Gray vs. binary coding for genetic algorithms," in Proceedings of the 5th International Conference on Machine Learning (ICML), 1988, pp. 153-161.

[13] Baowen Xu, Yu Guan, Zhenqiang Chen, and K. R. P. H. Leung, "Parallel genetic algorithms with schema migration," Computer Software and Applications Conference, 2002. COMPSAC 2002. Proceedings. 26th Annual International, pp. 879-884, 2002.