2013 IEEE Congress on Evolutionary Computation, June 20-23, Cancún, México

Computational Intelligence for Cloud Management: Current Trends and Opportunities

Alexandru-Adrian Tantar, Anh Quan Nguyen, Pascal Bouvry

Bernabé Dorronsoro, El-Ghazali Talbi

Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg, 6, rue Richard Coudenhove-Kalergi, L-1359 Luxembourg {alexandru.tantar, anh.nguyen, pascal.bouvry}@uni.lu

LIFL – UMR LILLE 1 / CNRS 8022, University of Lille 1, Cité Scientifique, M3, 59655 Villeneuve d'Ascq, France {bernabe.dorronsoro diaz, el-ghazali.talbi}@inria.fr

Abstract—The development of large scale data center and cloud computing optimization models has led to a wide range of complex issues, such as scaling, operation cost and energy efficiency. Different approaches were proposed to this end, including classical resource allocation heuristics, machine learning and stochastic optimization. No consensus exists, but a trend towards many-objective stochastic models has become apparent over the past years. This work briefly reviews some of the more recent studies on cloud computing modeling and optimization, and points at notions of stability and convergence, as well as definitions and results that could serve to analyze, respectively build, accurate cloud computing models. A very brief discussion of simulation frameworks that include support for energy-aware components is also given.

I. INTRODUCTION

The expanding need for large scale computing, available at all times, has led to new challenges. Whether used in an indirect way, e.g. via a search engine, social media or web services, or in a direct manner, as in trading and in the simulation of complex systems (banks, research centers), large scale computing became an important part of our everyday life. Without entering into all the details and implications of Moore's vs Koomey's law [1], [2], [3], or Landauer's limit [4], it became obvious that the increasing power and, inherently, increasing energy consumption hit limitations we previously ignored. Beyond the concerns about its consequences on the environment, energy consumption is ultimately a major economic factor in operating large scale computing infrastructures like data centers or clouds. Different studies indicated an increasing trend in the greenhouse gas emissions associated with computing, with numbers comparable to the ones listed for the airline industry [5], [6], [7]. Other problems stem from the intrinsic nature of this new wave of computing, including reliability, scalability, data integrity or security issues. An economic impact study on clouds is available in [8].

We focus in this paper on the different algorithms and models that address data center and cloud related problems, with particular attention paid to optimization issues in infrastructure as a service. The study is built on our strong belief that, in addition to the in-depth analysis of, among many others, load balancing, scalability or security aspects, i.e. as independent criteria, a significant improvement can be attained by adopting a (more) holistic perspective; our understanding of holistic, in this context, is about dealing, in a single model, with all or most of the issues related to prediction, scheduling or resource allocation. The stochastic nature of a cloud, in particular, also calls for solutions that combine results from queuing theory, optimization

978-1-4799-0454-9/13/$31.00 ©2013 IEEE


TABLE I. NOTATION SUMMARY.

n       number of available physical machines or servers
m       number of virtual machine instance types
k       number of distinct resources available, e.g. memory, storage, etc.
t       time line index, slot based system

i       virtual machine instance type or type-i job index, 1 ≤ i ≤ m
j       physical machine or server index, 1 ≤ j ≤ n
l       resource type index, 1 ≤ l ≤ k

r_il    amount of type-l resource required by a type-i instance
c_jl    maximum amount of type-l resource allowed by a server j
w_i^t   time needed to complete (stochastic) all the type-i incoming jobs
d_i^t   number of type-i jobs served by the cloud at a given time slot t
q_i^t   number of type-i backlogged jobs, new arrivals not counted

α_j     lower utilization threshold at server j
β_j     upper utilization threshold at server j
γ_j^t   computational load of server j at a given time slot t
γ_j^t(v ∈ V_j^t)    processing power allocated to a VM v ∈ V_j^t
γ̃_j^t(v ∈ V_j^t)    processing power initially requested by v ∈ V_j^t

y       load configuration, y = (y_i)_{1≤i≤m}, specifying the number of type-i instances to deploy; feasible if it fits the available resources

V       set of all virtual machine sets
V_j^t   set of VMs allocated to a server j at time t, V_j^t ∈ V
P(V_j^t) ⊂ V    power set of V_j^t, i.e. set of all subsets of V_j^t
Y_j     set of all feasible configurations of a given server j

and machine learning, only to name a few areas. Understanding what factors are worth considering, what an algorithm can deal with and to what extent sub-optimal solutions suffice, is thus a first step to investigate. This article provides an introduction to some of the main directions looked at in the literature for cloud management, and reviews recent research on algorithms for large scale computing resource optimization. We also offer a discussion on what are, from our perspective, the relevant factors to look at in future studies.

The rest of the paper is organized as follows. A discussion of algorithms and models is given in Section II, including topics from the studies already cited above, i.e. on heuristics, machine learning, stochastic optimization and nature inspired paradigms. Section III looks at frameworks in cloud computing, with a short outline of the enclosed models and algorithms. Last, concluding remarks are given, offering our view on future directions of interest, with particular consideration for evolutionary and nature inspired paradigms.

TABLE II. A SUMMARY OF THE DIFFERENT STUDIES DEALING WITH OPTIMIZATION IN CLOUD COMPUTING.

Beloglazov et al., 2012 [9]
  Model: semi-deterministic, static snapshots
  Approach: modified best fit decreasing; selection policies for minimizing migrations, ensuring potential growth, or random
  Environment: academic, CloudSim [10]
  Results: empirical analysis of three reallocation schemes, trade-offs between, e.g., energy consumption and SLA violations

Berl et al., 2010 [7]
  Model: survey article
  Approach: low-level to infrastructure details; energy efficient cloud and data-center operation; wireless and wired network power minimization; energy efficient hardware; energy-aware scheduling in multiprocessor and grid systems; virtualization; impact of communication oriented applications

Berral et al., 2011 [11]
  Model: stochastic, heuristic
  Approach: integer linear programming exact solver; first fit and ordered best fit; ad-hoc λ-Round-Robin; machine learning for adaptive cloud scheduling
  Environment: academic, simulation based
  Results: statistical analysis for implemented algorithms, improvement for VM migrations

Calheiros et al., 2011 [10]
  Model: framework
  Approach: simulation of large scale data centers and (federated) clouds; stop and resume simulations; dynamic changes; user-defined allocation and provisioning policies; support for energy-aware computational resources

Goudarzi et al., 2012 [12]
  Model: non-linear mixed integer
  Approach: dynamic programming, convex optimization, heuristic similar to first fit decreasing
  Environment: simulation
  Results: comparative results with per client parameters derived from the Amazon EC2, incremental approach

Kliazovich et al., 2012 [50]
  Model: framework
  Approach: energy-aware cloud computing packet-level simulator; consumption of servers, switches, and links; realistic packet-level communication patterns; two-tier, three-tier, and three-tier high-speed designs with dynamic shutdown, voltage and frequency scaling

Lee et al., 2012 [13]
  Model: stochastic
  Approach: analytic model for long-term optimal resource reservation; dynamic subscription model; hidden Markov model for resource demand prediction
  Environment: academic, simulation
  Results: derivation of an upper bound for optimal long-term resource reservation by defining an analytic model for the expected penalty

Mezmaz et al., 2011 [14]
  Model: integer array, evolutionary
  Approach: energy-aware scheduling in clouds; bi-objective hybrid metaheuristic with execution time and energy consumption as objectives; energy conscious heuristic [15]
  Environment: academic, simulation
  Results: scheduling in precedence-constrained parallel applications on heterogeneous systems; deployments with (asynchronous migration) island and multi-start models; experiments on 9000 instances with up to 120 tasks and 64 processors; improvements in both energy consumption and makespan

Phan et al., 2012 [16]
  Model: integer decision vector, evolutionary
  Approach: NSGA-II based evolutionary multi-objective optimization algorithm
  Environment: academic, simulation
  Results: analysis of the algorithm on a realistic setup; part of the Green Monster framework, aiming to operate a federation of data centers as to maximize renewable energy consumption and savings from cooling; DOE/ORNL heat pump design model, free-cooling; simulations for different centers across Europe, temperature variations modeled by data from the European Climate Assessment & Dataset

Prevost et al., 2011 [17]
  Model: auto-regressive
  Approach: multi-layer feedforward neural network; prediction based on a linear auto-regressive model
  Environment: NASA and the US Environmental Protection Agency logs
  Results: network load prediction exclusively based on data extracted from logs

Qiang Li, 2012 [18]
  Model: stochastic
  Approach: integer programming, Gröbner bases
  Environment: academic, simulation based
  Results: resource scheduling; four basic simulation based case studies; exponential time for even small size instances

Romanov et al., 2011 [19], [20]
  Model: stochastic
  Approach: branching processes, data flow model; it is assumed that clients inside a cloud could exchange information about services in an epidemic like manner
  Environment: Russian IaaS market
  Results: model verified by analyzing logs

Theja et al., 2012 [21], [22]
  Model: stochastic
  Approach: heuristic scheduling and resource allocation: best-fit, max weight allocation, join-the-shortest-queue, power-of-two-choices, pick-and-compare
  Environment: academic, simulation based
  Results: capacity of clouds, analysis of heuristics and throughput optimal scheduling

Ying et al., 2012 [23]
  Model: direct representation
  Approach: genetic algorithm dealing with makespan and energy consumption; mono-objective approach
  Environment: academic, CloudSim [10]
  Results: not a classical multi-objective approach; objectives are aggregated in a first approach or, second, selection screens solutions by independently looking at makespan or at energy consumption alone

Younge et al., 2010 [24]
  Model: framework
  Approach: scheduling and management of virtual machines, custom images
  Environment: academic, OpenNebula
  Results: improvements in power consumption and boot time with custom-designed virtual images


II. ALGORITHMS AND MODELS IN CLOUD COMPUTING

A look at the different algorithms and models used in large scale computing resource optimization and management problems is given in the following. We specifically aim at providing an introductory outline of the more recent research in the field of data centers and cloud computing, on modeling and optimization, with references given for further reading. To this end, only a few major directions are discussed, the study being knowingly limited and to no extent exhaustive. Several surveys available in the literature already deal with specific aspects, covering from the intrinsic nature and structure of a cloud to particular topics in, for instance, mobile cloud computing, virtualization vulnerabilities, federation of clouds, data access and integrity or low level security issues [25], [26], [27], [28], [29], [30]. A taxonomy is also available in [31].

A very concise study on details that run from low-level to infrastructure related aspects is given in Berl et al. [7]; among others, it reviews recent advancements and research on methods or technologies for energy efficient cloud and data-center operation. Naming only a few of the areas looked at, the authors point at wireless and wired network power minimization, energy efficient hardware, energy-aware scheduling in multiprocessor and grid systems, low-power states, e.g. slowing down the processor or powering off different parts of the chip or hardware, dynamic voltage and frequency scaling, virtualization, consolidation and protocols. Full, hosted and operating system layer virtualization are also mentioned. Last, an overview of energy efficient cloud computing and energy-aware data centers is given, ending with the impact communication oriented applications have.

When looking at large scale computing from an algorithmic and model oriented perspective, a wide range of studies can be found in the literature on, among many others, heuristics [9], [12], machine learning, e.g. neural networks, auto-regressive stochastic state transition models [17], branching processes [19], [20], stochastic programming and optimization [18], [21], [22], [13], or nature inspired algorithms [23], [32]. Among the different criteria taken into account, one can mention performance, availability, energy efficiency, elasticity, bandwidth or Quality of Service (QoS). A large part of the existing studies rely on discrete models alone [18], [14], [12], [16], with very few exceptions bringing continuous attributes into discussion [19], [20]. A trend towards focusing more on high-level aspects in queuing, load balancing or scheduling can be observed, also with the additional support of technologies such as frequency scaling, e.g. SpeedStep [33], Cool'n'Quiet [34], or virtualization.

Complementary to these works, we aim at having a closer look at the interface between algorithms and models, as well as at identifying elements that could be part of a holistic model, e.g. one including prediction, thermal distribution, scheduling and load balancing, among others, in a more unitary manner. The notation proposed in the following is built to offer a unified view, trading rigor at times for clarity of presentation. Also, where considered necessary, operators are introduced, i.e. in addition to the original model or specifications, for a better overall integration. Last but not least, no reproduction or discussion of proofs, where applicable, is made; for details, please refer to the corresponding works. Note also that it was not always possible to offer a


clear separation with respect to paradigms; a few heuristic and machine learning based studies are first presented, creating a first connection with multi-objective optimization, followed by optimization and stochastic processes, and ending with nature inspired algorithms.

For the remainder of this section, we consider a setup where a set of annotated virtual machine instances needs to be placed on a given number of physical machines; resource constraints may apply. Information about demands and utilization of resources is available, the goal being to ensure a low overhead, (close to) optimal resource allocation. A notation summary can be found in Table I. A list and outline of the different articles discussed in the following is given in Table II.

A. Heuristics and Machine Learning

The need to serve (almost) real time incoming requests, depending on the specific application type, and also the NP-hard nature of provisioning, scheduling or scaling in cloud computing, pushed research towards an extensive analysis of heuristics. Stochasticity and the need to cope with a large number of objectives or constraints, to name only a few aspects, made exact approaches impossible to use within a real-life setup. A few studies look at optimal solutions in academic benchmarks but, as will be described later, even in such an analysis environment it quickly becomes prohibitive to deal (in an exact manner) with anything beyond small instances. An important question one needs to address when using heuristics is the extent to which the considered algorithm approaches the optimum, i.e. convergence and performance guarantees. And, with even more impact, one may question stability. An important number of studies rely on empirical results to answer these questions, despite the numerous theoretical results in queuing theory on, for example, convergence.
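To make the scale gap concrete, a toy comparison (not from the paper; the CPU demands and unit server capacity are made up) between an exhaustive assignment search and the classical first fit decreasing heuristic, with VM placement viewed as bin packing:

```python
from itertools import product

def brute_force_bins(items, cap):
    """Exact minimum number of servers, by enumerating every possible
    item-to-server assignment: O(n^n), hopeless beyond toy instances."""
    n = len(items)
    best = n
    for assign in product(range(n), repeat=n):  # server index per item
        loads = [0.0] * n
        for item, srv in zip(items, assign):
            loads[srv] += item
        if all(l <= cap for l in loads):
            best = min(best, sum(1 for l in loads if l > 0))
    return best

def first_fit_decreasing(items, cap):
    """Polynomial-time heuristic: place each item, largest first, in the
    first server where it fits."""
    bins = []
    for item in sorted(items, reverse=True):
        for b in bins:
            if sum(b) + item <= cap:
                b.append(item)
                break
        else:
            bins.append([item])
    return len(bins)
```

Already for five VMs the exhaustive search visits 5^5 = 3125 assignments, while first fit decreasing runs in low polynomial time and is known to stay within 11/9·OPT + 6/9 bins of the optimum.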
To this end, we first look at a modified Best Fit Decreasing (mBFD) energy-aware resource allocation algorithm [9] that draws inspiration from previous bin packing results [35], [36]. The algorithm implements a fast, almost optimal reallocation scheme, offering a polynomial time approximation; virtual machines are sorted in decreasing CPU utilization order and allocated to hosts such as to ensure a minimal power consumption increase. Furthermore, depending on host specific thresholds, the model requires virtual machines to allow migration, e.g. when activity falls below or peaks above some pre-specified limit, with a reallocation procedure that works by (i) selecting a set of machines to migrate based on some specific policy, and (ii) redeploying the selected machines using the mBFD algorithm. An outline of the different selection policies is given in the following.

Assuming V to be an abstract set that describes all virtual machine sets, let V_j^t ∈ V be the set of virtual machines allocated to a server j at time t, and P(V_j^t) ⊂ V the set of all subsets of V_j^t. Furthermore, let α_j and β_j stand for the lower, respectively the upper utilization thresholds at server j, and γ_j^t, γ_j^t(v ∈ V_j^t), the current computational load of j, respectively the amount of processing power allocated to a given virtual machine v ∈ V_j^t, e.g. a fraction of the total computational power of the physical machine. Also, let γ̃_j^t(v ∈ V_j^t) be the amount of processing power initially requested by a virtual machine v ∈ V_j^t, γ_j^t(v) ≤ γ̃_j^t(v). Next, let Λ : V \ {∅} → V be an

operator which, given A = ∪_{r=1..z} {A_r} ∈ V \ {∅}, recursively selects a set with minimal cardinality:

  Λ(A) = {A_z}                       if z = 1 ∨ |{A_z}| < |Λ(∪_{r=1..z−1} {A_r})|,
  Λ(A) = Λ(∪_{r=1..z−1} {A_r})       if z > 1 ∧ |{A_z}| ≥ |Λ(∪_{r=1..z−1} {A_r})|.

A migration minimization policy, as defined in [9], acts by relocating virtual machines from hosts where utilization is outside bounds, while ensuring that a minimal number of machines are moved. Given a specific server j, the set of machines to migrate can be constructed as:

  Λ({V ∈ P(V_j^t) : γ_j^t − Σ_{v∈V} γ_j^t(v) < β_j})    if γ_j^t > β_j,
  V_j^t                                                 if γ_j^t < α_j,
  ∅                                                     if α_j ≤ γ_j^t ≤ β_j.

The first case corresponds to a violation of the upper bound threshold, i.e. the server is overloaded; all machines are selected in the second case (underused resources); in the last case no action is taken, as utilization is within the specified bounds.

A second policy, highest potential growth, is intended to minimize the impact of a (potential) increase in computational demand. Considering an operator δ : V → ℝ, with δ(V ∈ V) = Σ_{v∈V} γ_j^t(v)/γ̃_j^t(v), and denoting V̄ = {V ∈ P(V_j^t) : γ_j^t − Σ_{v∈V} γ_j^t(v) < β_j}, V̄ ⊂ V, a definition similar to migration minimization is obtained by adapting the first case:

  argmin_{V ∈ V̄} δ(V)    if γ_j^t > β_j,
  V_j^t                  if γ_j^t < α_j,
  ∅                      if α_j ≤ γ_j^t ≤ β_j.
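A minimal sketch of the reallocation step, pairing the mBFD placement rule with a greedy stand-in for the migration minimization policy (the quadratic power curve, host tuples and thresholds below are illustrative assumptions, not the model of [9]; choosing the largest VMs first only approximates the minimal-cardinality operator Λ):

```python
def mbfd_place(vms, hosts, power):
    """Modified best fit decreasing (sketch): sort VMs by decreasing CPU
    demand, put each on the feasible host with the smallest estimated
    power increase. hosts maps name -> (current load, capacity)."""
    placement = {}
    for vm, demand in sorted(vms.items(), key=lambda kv: -kv[1]):
        best, best_dp = None, float('inf')
        for h, (load, cap) in hosts.items():
            if load + demand <= cap:
                dp = power(load + demand) - power(load)
                if dp < best_dp:
                    best, best_dp = h, dp
        if best is None:
            raise RuntimeError('no host can accommodate ' + vm)
        placement[vm] = best
        load, cap = hosts[best]
        hosts[best] = (load + demand, cap)
    return placement

def select_migrations(load, alloc, alpha, beta):
    """Greedy stand-in for the migration minimization policy: offload
    the largest VMs until the load drops below beta; if the server is
    underutilized, offload everything so it can be powered off."""
    if alpha <= load <= beta:
        return []                  # utilization within bounds, no action
    if load < alpha:
        return list(alloc)         # underused: migrate all VMs away
    chosen = []
    for vm, g in sorted(alloc.items(), key=lambda kv: -kv[1]):
        if load < beta:
            break
        chosen.append(vm)
        load -= g
    return chosen
```

With a convex power curve, e.g. `power = lambda u: 100 + 150 * u * u`, the placement step favors less loaded hosts, since the marginal power increase grows with utilization.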

A third and last policy the authors analyzed, called random choice, works by selecting in uniform random manner a subset of the virtual machines hosted on a given overloaded server. A detailed presentation and discussion can be found by referring to [9]. A comparison study is provided, relying on a CloudSim [10] environment. Four performance metrics are included: overall energy consumption, number of virtual machine migrations, Service Level Agreement (SLA) violations as a percentage of the number of processed time frames, and average SLA violation (failing to allocate computational power on request with a subsequent performance degradation). The results, depending on the specific selection policy in use, show different trade-offs, e.g. raising the lower utilization threshold in migration minimization mode leads to a decrease of energy consumption but also results in more SLA violations. Energy consumption and average SLA violation metrics are found to be similar across policies with significantly more SLA violations when operating in a highest potential growth mode. Last, based on a set of comparative tests, the authors conclude that migration minimization offers the best trade-off, i.e. best energy consumption with the lowest number of SLA violations and virtual machine migrations. A similar study on power and migration costs with soft


SLA constraints in data centers is available in Goudarzi et al. [12]. The model is intended to deal with computing-intensive scenarios where applications can be deployed over a large number of (heterogeneous) servers. The data center is assumed to be capable of operating in either (i) a dynamic mode, i.e. migration, admission or removal of Virtual Machines (VMs) under performance, power allocation and thermal constraints, or (ii) a semi-static mode, considering the whole set of active VMs, with workload prediction inputs and power, thermal and performance sensor readings. A generalized processor sharing and multiple queue model dispatches (in a probabilistic manner) the incoming requests; servers can also be switched on or off. Furthermore, a per client description is used to specify expected response times, amount of memory needed, migration costs, request and service rates. A mixed integer non-linear programming model is used to describe the per (selected) client and the overall (to be minimized) costs. A second, convex optimization problem is formulated for minimizing energy costs and SLA violations. With respect to these two problems, the authors propose a dynamic programming approach where VMs are individually assigned to servers. A final local search step is run to further optimize the resulting allocation, i.e. improve sharing and utilization, where servers with underutilized resources are examined in an iterative manner. The simulation setup, addressing the semi-static mode only, considers an arbitrary number of heterogeneous servers with different processor and memory specifications. In addition, different types of clients are modeled, with penalty factors derived from the on-demand rates of the Amazon EC2. A comparative study shows the improvements obtained when analyzing the algorithms with respect to a lower bound on the overall cost and against two other heuristics.
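The final local search step can be pictured as a consolidation pass over underutilized servers; a rough sketch under simplifying assumptions (a single CPU resource, a made-up 30% utilization threshold, no migration cost accounting):

```python
def consolidate(placement, demands, capacity, threshold=0.3):
    """One pass of a consolidation-style local search (sketch): try to
    empty servers running below `threshold` of their capacity by moving
    their VMs onto other already-active servers, so the emptied servers
    can be switched off. placement: vm -> server, demands: vm -> CPU
    share, capacity: server -> CPU capacity."""
    loads = {s: 0.0 for s in capacity}
    for vm, s in placement.items():
        loads[s] += demands[vm]
    for s in list(capacity):
        if not 0 < loads[s] < threshold * capacity[s]:
            continue                    # empty or sufficiently utilized
        trial_loads = dict(loads)
        trial_moves = {}
        feasible = True
        for v in [v for v, h in placement.items() if h == s]:
            # only already-active targets: moving a VM onto an off
            # server would defeat the purpose of consolidation
            targets = [t for t in capacity
                       if t != s and trial_loads[t] > 0
                       and trial_loads[t] + demands[v] <= capacity[t]]
            if not targets:
                feasible = False
                break
            t = max(targets, key=lambda x: trial_loads[x])  # pack tightly
            trial_moves[v] = t
            trial_loads[t] += demands[v]
            trial_loads[s] -= demands[v]
        if feasible:                    # commit only if s can be emptied
            for v, t in trial_moves.items():
                placement[v] = t
            loads = trial_loads
    return placement
```

The all-or-nothing commit keeps a server powered on unless every one of its VMs can be relocated, mirroring the iterative examination of underutilized servers described above.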
For each simulation, the authors note, at least 1000 trials were performed, with up to 4000 clients.

A common missing point where most studies in the data center or cloud computing optimization field meet is performance (un)predictability: no information is considered about the likely evolution of future events, e.g. incoming requests, duration of tasks, failure rate, etc. A first way of dealing with these aspects is to rely on machine learning techniques; a more thorough discussion of stochastic models is given in the next section. Assuming that most of the processes associated with, for example, operation, profile of requests or cost fluctuation depend on time homogeneous parameters, learning some underlying structure to describe these parameters should be a straightforward task. An example of such an approach can be found in the work of Prevost et al. [17] on load prediction for task allocation and optimal processor state transition. A neural network model is used to anticipate network traffic with look ahead time intervals ranging from 1 to 90 seconds; in addition, an auto-regressive filter based linear predictor is implemented by regarding the observed data as the convolution of an impulsive source with a given channel, in this case the number of accesses per discrete time index. A brief discussion and prediction results based on HTTP logs from NASA and the United States Environmental Protection Agency are given.

A machine learning based approach can also be found in Berral et al. [11]. Several paradigms are combined into an adaptive and autonomic power-aware scheduling algorithm, allowing to maximize profit while trading off revenue,

Quality of Service (QoS) and power consumption. A mixed integer linear programming exact solver is used to analyze the performance of several heuristics, i.e. first-fit, ordered best-fit and a power-aware greedy λ-Round-Robin algorithm [37], with results showing that ordered best-fit stands as a good candidate in terms of results vs execution time. An M5P regression tree algorithm is used to predict computational load and memory usage, allowing to fix constraints inside the linear programming model. The experimental part, however, discards memory usage, as modeling it would require addressing, in addition to past observed states, caching or session pooling in, for example, web servers; this aspect is postponed for later study. Among future work directions, the authors mention dealing with non-linear power consumption, predictive models and SLA constraints. A few other examples on machine learning and prediction in clouds, e.g. for content locality, operator or data flow, can be found by referring to [38], [39], [40], [41].

B. Stochastic Processes and Optimization

The nature of a data center or of a cloud implicitly requires dealing with a mixture of stochastic processes. Discarding or not taking into account, for example, the arrival rate of tasks, failures and so forth implies a high risk of implementing (in the long run) sub-optimal configurations, with overprovisioning, performance penalties or denial of service as hidden downsides. Similarly, energy efficient computing is strongly connected to being able to anticipate utilization, the evolution of energy supplies or cost fluctuations, among others. Forecast data on meteorological conditions, for example, can be used to determine the likely amount of solar energy produced (and used as an alternative source) at some specific moment in time. A few examples where stochastic models are used are given in the following, with a brief look at results on stability and convergence.

Theja et al.
[21], [22] discuss resource allocation and scheduling, the study providing a set of stability results previously introduced for constrained queuing systems [42]; an adaptation is made to analyze throughput and different scheduling policies. After defining cloud capacity and stability with respect to available resources, respectively waiting queues under given traffic patterns, the authors revisit the best-fit and max weight heuristics, with a short example proving the latter to be throughput-optimal. Working assumptions require (i) machines to allow being reconfigured at arbitrary moments, and (ii) jobs to support preemption and migration among servers. A non-preemptive algorithm is also proposed, ending with a decentralized, no throughput loss queuing model. A simulation based environment is used for algorithm analysis, including Amazon EC2 (http://aws.amazon.com/ec2) like instances, i.e. virtual machines described in terms of memory, CPU and storage: first generation standard extra large (15 GB, 8 EC2 units, 1690 GB), high-memory extra large (17.1 GB, 6.5 EC2 units, 420 GB), and high-CPU extra large (7 GB, 20 EC2 units, 1690 GB). A summary of the main results and basic definitions is given in the following; the notation was adapted for a more simplified formalism; please refer to [21], [22] for additional details. Given m virtual machine instance types, n servers, k distinct resources, and r_il, c_jl, the amount of type-l resource required by a type-i instance, respectively the maximum allowed by server j, a configuration, defined as a vector y = (y_i)_{1≤i≤m},


is said to be valid for a given server if it simultaneously allows hosting y_1 type-1 VMs up to, respectively, y_m type-m VMs. Formalized, the allocation of VMs to a server j is subject to the following constraint:

  Σ_{i=1}^{m} y_i · r_il ≤ c_jl,   ∀ l ∈ {1, …, k}
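The validity constraint translates directly into code; a small check, with resource rows loosely matching the EC2-like instance types above (memory GB, EC2 compute units, storage GB) and a made-up server capacity vector:

```python
def feasible(y, r, c_j):
    """Check that configuration y = (y_1, ..., y_m) fits server j, i.e.
    sum_i y_i * r_il <= c_jl for every resource type l.
    y: instance counts, r: per-type resource demands, c_j: capacities."""
    return all(sum(y[i] * r[i][l] for i in range(len(y))) <= c_j[l]
               for l in range(len(c_j)))

# per-type demands: standard-xl, high-memory-xl, high-cpu-xl
r = [(15.0, 8.0, 1690.0), (17.1, 6.5, 420.0), (7.0, 20.0, 1690.0)]
c_j = (64.0, 40.0, 5000.0)   # assumed server capacities, for illustration
```

For example, `feasible((2, 1, 0), r, c_j)` holds (47.1 GB memory, 22.5 units, 3800 GB storage), while `feasible((0, 0, 3), r, c_j)` fails on the compute dimension (60 > 40 units).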

A discrete time slot based simulation environment is considered. Given the overall amount of time w_i^t required for all the type-i jobs to complete execution, defined as a stochastic process independent and identically distributed across time slots, and given d_i^t, the number of type-i jobs served by the cloud at time slot t, the number of backlogged jobs (new arrivals not included) can be recursively defined as follows:

  q_i^{t+1} = q_i^t + w_i^t − d_i^t

A direct stability condition is afterwards derived by imposing a finite overall number of backlogged jobs as time goes to infinity, i.e.

  lim sup_{t→∞} E[Σ_{i=1}^{m} q_i^t] < ∞.

Furthermore, by assuming Y_j to be the set of feasible configurations of a given server j, and Conv(Y_j) its convex hull, a cloud capacity region can be defined by considering the set of all valid configurations for the given cloud:

  C = { y : y = Σ_{j=1}^{n} y_j, with y_j ∈ Conv(Y_j) }
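The backlog recursion and the stability condition can be illustrated with a toy single-type simulation (uniform i.i.d. arrivals and a fixed per-slot service capacity are simplifying assumptions): when the mean arrival load exceeds the service capacity, the average backlog grows without bound, the empirical analogue of the unbounded expectation for loads outside the capacity region.

```python
import random

def simulate_backlog(max_arrivals, served_per_slot, T=20000, seed=1):
    """Average backlog of q[t+1] = q[t] + w[t] - d[t] for a single job
    type: w[t] ~ Uniform{0..max_arrivals} i.i.d., and at most
    served_per_slot jobs leave per slot (never more than are present)."""
    rng = random.Random(seed)
    q, acc = 0, 0
    for _ in range(T):
        w = rng.randint(0, max_arrivals)      # i.i.d. arrivals
        d = min(q + w, served_per_slot)       # capacity-bounded service
        q = q + w - d
        acc += q
    return acc / T
```

With a mean of 2 arrivals per slot, a service capacity of 3 keeps the average backlog small, while a capacity of 1 makes the backlog grow roughly linearly with t.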

Note: y_j ∈ Conv(Y_j) stands for one of the different configurations of server j, i.e. its load in terms of instance types. A job scheduling algorithm, as defined by the authors, is called throughput optimal if, for any given load y ∈ C, it allows a scheduling of (1 + ε)y such that (1 + ε)y ∈ C, for some ε > 0. Also, by relying on previous results from [42], it is shown that, for any y ∉ C, lim_{t→∞} E[Σ_{i=1}^{m} q_i^t] = ∞, i.e. the scheduling algorithm is not stable. Subsequent results show that different queuing and scheduling policies, e.g. centralized dispatching, non-preemptive algorithms or load balancing, are throughput optimal when specific conditions are met.

A similar study based on a stochastic integer programming approach, presented in [18], makes use of Gröbner bases to address the problem of optimal resource allocation inside clouds, i.e. minimizing costs within fixed throughput, latency and SLA bounds. A brief discussion of Gröbner bases is given along with several definitions, allowing to later introduce an optimal resource scheduling algorithm, convergence proof included. A series of four basic simulation-based case studies is presented, showing an exponential increase of the time required to find an optimal solution as the number of services or resources grows.

A look at hidden Markov models for resource demand prediction is given in a study conducted by Lee et al. [13] on optimal cloud resource subscription policies under different pricing models. An approach based on (i) a long-term reservation of resources, as a first phase, and (ii) a dynamic subscription model, as a second phase, is proposed, additionally taking provisioning delays into account. An upper

bound on the optimal number of long-term resources to reserve is derived, taking the minimization of the expected operation cost as criterion. An analytic solution to the optimal long-term resource reservation is next given. A straightforward reactive provisioning strategy is also defined, followed by an extension that relies on a hidden Markov model to predict resource demand. The last part of the article gives some numerical results for a simulation setup where pricing is specified by looking at the model in use at Amazon.

From a quite different perspective, an introductory study on how branching processes can be used to model cloud computing demand and performance dynamics is given in Romanov et al. [19], [20]. A data flow prediction model is proposed where initial and secondary queries follow binomial distributions with exponentially distributed arrival times. A very specific assumption the study makes is that clients exchange information about the cloud in an epidemic-like manner, ultimately fostering cloud utilization. At the same time, the authors note, large cloud systems like the ones operated by Amazon, less sensitive to local or short-term perturbations, are not a direct target of the study.

C. Nature Inspired Algorithms

The main difficulty one needs to face when dealing with optimization in data centers or cloud computing is the high complexity of the model. A specific trait of these problems comes from the very nature of the environment to deal with: complex systems subject to a large number of interdependent, ill-defined factors and uncertainties. What is more, these systems do not always allow the construction of a coherent, simple mathematical model, so purely academic formulations can often be discarded from the start. A complete discussion of what makes cloud computing difficult could make the object of an article per se; see [25], [26], [27], [28], [29], [30] for additional information. At this point, evolutionary and nature inspired algorithms, extensively studied and refined over the past few decades to efficiently solve a wide range of problems [43], [44], [45], offer a simple answer to most of these concerns.

A simplistic example of energy-aware task scheduling using evolutionary computing can be found in Ying et al. [23]. The authors propose a basic model where makespan and energy consumption need to be minimized. A description of the resources, tasks, energy consumption model and scheduling is first given, followed by an outline of the algorithm. A particular aspect of this work is that, in a first approach, the two objectives are aggregated. The main drawback of a linear fitness aggregation, or of any weighting method, resides in the fact that it does not allow finding unsupported solutions [46], assumed to provide a better compromise between objectives than supported solutions. A double fitness approach, in Ying et al.'s terminology, is also considered, where a selection procedure screens solutions by independently looking at makespan or energy consumption alone. No archiving technique is employed and a single solution is provided as result, i.e. not a front of best compromise solutions. Experiments and results, based on the CloudSim [10] simulator, are described last.

A more realistic bi-objective hybrid metaheuristic for energy-aware scheduling in clouds is described in Mezmaz et al. [14]. The article deals with task scheduling for precedence-constrained parallel applications in heterogeneous systems; a compromise between execution time and energy consumption is sought, a hybrid design being proposed to this end. First, a cloud computing model is provided, with heterogeneous Dynamic Voltage Scaling (DVS) enabled processors and applications described by directed acyclic task (data flow) graphs. Energy consumption is estimated using a standard power dissipation model. Solutions are encoded as decision vectors where tasks are associated to processors which, in turn, are run at a specified voltage. The proposed approach relies on a multi-objective genetic algorithm to handle tasks, with an Energy Conscious Heuristic (ECS) [15] dealing with processor and voltage specifications. An island deployment model is used, with asynchronous migrations and with multi-start independent ECS runs. Experiments on 9000 instances are discussed, with up to 120 tasks and 64 processors, using the Fast Fourier Transformation task graph for evaluation. Improvements in both energy consumption and makespan are observed, with better results obtained when the number of islands is increased. A few other interesting examples can be found in [47], [48] or [49], only to name a few.

III. CLOUD SIMULATION FRAMEWORKS

Analyzing algorithms can be a tedious process and one may want to use simulations before moving to a real-world testing environment. We offer a few pointers in the following, with a bias for recent works on simulation frameworks that address energy efficiency as a main objective. Younge et al. [24] describe an energy-efficient resource management framework called Green Cloud. The study looks at two major topics in the efficient management of cloud computing resources: (i) scheduling systems for virtual machines, and (ii) the design of virtual machine images based on cloud computing service-oriented models. Several results on energy efficiency are presented, including improvements in both power consumption (when using the proposed scheduling technique) and boot time, for custom-designed virtual images.

Phan et al. [16] propose a novel framework, called Green Monster, to operate a federation of Internet data centers from a sustainability standpoint, i.e. maximizing renewable energy consumption and savings from cooling energy. A DOE/ORNL Heat Pump Design Model is used to calculate performance coefficients, also assuming a free-cooling system whenever the outdoor temperature is below indoor conditions. An NSGA-II based evolutionary algorithm is used to assign and migrate workloads while taking capacity constraints into account. The framework is evaluated by running simulations for centers in different locations in Europe, with temperature variations modeled by data from the European Climate Assessment & Dataset (http://eca.knmi.nl); additional data on the renewable energy ratio used in each center is also considered. A single run spans one year of (simulated) time, with the algorithm being executed on a bi-weekly basis. The results of the simulation, as the authors note, show that the framework outperforms conventional placement algorithms with respect to the conflicting objectives.

GreenCloud [50] is a packet-level simulator for energy-aware cloud computing data centers. The simulator models the consumption of servers, switches and links in a data center, also allowing to set up realistic packet-level communication patterns. Two-tier, three-tier and three-tier high-speed designs are available, furthermore allowing the use of, e.g., dynamic shutdown or voltage and frequency scaling for computing and networking components. Last but not least, CloudSim [10] offers support for the simulation of energy-aware computational resources in large scale data centers and (federated) clouds. The framework is also capable of stopping and resuming simulations, and includes features that allow performing dynamic changes, specifying user-defined allocation and provisioning policies, or analyzing different network topologies.
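To make the energy accounting performed by such simulators more tangible, the following toy sketch (a hypothetical model of our own, not the CloudSim or GreenCloud API) places VM CPU demands on identical hosts via first-fit and charges each active host according to a linear utilization-to-power model, with empty hosts shut down; the power figures and capacity units are illustrative assumptions.

```python
HOST_CAPACITY = 1.0            # normalized CPU capacity per host (assumed)
P_IDLE, P_PEAK = 100.0, 250.0  # Watts at 0% and 100% load (illustrative)

def first_fit(vm_loads, n_hosts):
    """Place each VM demand on the first host with enough spare capacity."""
    hosts = [0.0] * n_hosts
    for load in vm_loads:
        for i, used in enumerate(hosts):
            if used + load <= HOST_CAPACITY + 1e-9:
                hosts[i] = used + load
                break
        else:
            raise ValueError("placement failed: not enough hosts")
    return hosts

def power(utilization):
    """Linear power model; empty hosts are shut down and draw nothing."""
    if utilization == 0.0:
        return 0.0
    return P_IDLE + (P_PEAK - P_IDLE) * utilization

def energy_kwh(host_utilizations, hours):
    """Energy drawn by the data center over an interval, in kWh."""
    return sum(power(u) for u in host_utilizations) * hours / 1000.0
```

Consolidating the same demands onto fewer hosts reduces the idle-power share of the total, which is precisely the effect energy-aware placement heuristics and the shutdown mechanisms modeled by these simulators exploit.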

IV. CONCLUSION

An outline of the more recent research on large scale data center and cloud computing management using computational intelligence was given in this article. Among the important aspects to consider when dealing with, for example, optimization in data centers or clouds, one could mention (i) the large number of (not always explicit) parameters and internal or external factors, e.g. user activity profiles, renewable energy usage and resources, (ii) additional complexity issues related to stochasticity and predictability, or (iii) the essentially applied, real-life nature of the problem. Exact paradigms are not a feasible solution; static designs lack realism and fail, in the long run, to effectively deal with even a simple evolution pattern of requests, task arrivals or failures; looking only at load balancing, scheduling or the optimization of one very specific resource does not solve the (real, overall) problem, and using different solutions that independently address one or more of these aspects may limit efficiency and lead to sub-optimal results. Last but not least, performance guarantees are desirable; where too difficult or impossible to achieve, statistical risk assessment should, in our view, be sufficient for most cases.

A holistic model, in terms of the different issues being looked at as a target, and the use of approximation algorithms should alleviate most of the problems we pointed at. Even if only relying on out-of-the-box classical paradigms from, e.g., machine learning, statistical modeling, scheduling or nature inspired computing, one should be able to provide a simple and yet comprehensive toolbox for tackling, in a unified manner, large scale data center or cloud computing platforms as a whole.

Some other areas that need investigation, not covered in this article and left for future work, include the use of parallel and distributed algorithms, automated decision making, robustness and sensitivity analysis, or the construction of strong, representative simulation benchmarks.

ACKNOWLEDGEMENT

This study is supported by the CNRS, France, and the National Research Fund, Luxembourg (http://www.fnr.lu), Multi-Objective Metaheuristics for Energy-Aware Scheduling in Cloud Computing Systems, INTER/CNRS/11/03 Green@Cloud. The main expected outcome of this project is to construct an energy-aware scheduling framework able to reduce the energy needed for high-performance computing and networking operations, namely for large-scale distributed systems. B. Dorronsoro also acknowledges the support offered by the National Research Fund, Luxembourg, AFR contract no 4017742.

REFERENCES

[1] G. E. Moore, "Lithography and the future of Moore's law," IEEE Solid-State Circuits Newsletter, vol. 11, no. 5, pp. 37–42, 2006 (reprint of Proc. SPIE, vol. 2437, pp. 2–17, 1995).
[2] J. Koomey, S. Berard, M. Sanchez, and H. Wong, "Implications of historical trends in the electrical efficiency of computing," IEEE Annals of the History of Computing, vol. 33, no. 3, pp. 46–54, 2011.
[3] R. Schaller, "Moore's law: past, present and future," IEEE Spectrum, vol. 34, no. 6, pp. 52–59, 1997.
[4] S. Moore, "Computing's power limit demonstrated," IEEE Spectrum, vol. 49, no. 5, pp. 14–16, 2012.
[5] Accenture, "Data Center Energy Forecast Report," Congress on Server Data Center Energy Efficiency, Tech. Rep., 2008. Available: http://www.accenture.com/us-en/Pages/insight-data-centerenergy-forecast-report.aspx
[6] J. M. Kaplan, W. Forrest, and N. Kindler, "Revolutionizing Data Center Energy Efficiency," McKinsey & Company, Tech. Rep., Jul. 2008.
[7] A. Berl, E. Gelenbe, M. Di Girolamo, G. Giuliani, H. De Meer, M. Q. Dang, and K. Pentikousis, "Energy-efficient cloud computing," The Computer Journal, vol. 53, no. 7, pp. 1045–1051, Sep. 2010.
[8] F. Etro, "The economic impact of cloud computing on business creation, employment and output in Europe: an application of the endogenous market structures approach to a GPT innovation," Review of Business and Economics, vol. LIV, no. 2, pp. 179–208, 2009.
[9] A. Beloglazov, J. Abawajy, and R. Buyya, "Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing," Future Generation Computer Systems, vol. 28, no. 5, pp. 755–768, 2012.
[10] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. De Rose, and R. Buyya, "CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms," Software: Practice and Experience, vol. 41, no. 1, pp. 23–50, Jan. 2011.
[11] J. L. Berral, R. Gavaldà, and J. Torres, "Adaptive scheduling on power-aware managed data-centers using machine learning," in Proc. 12th IEEE/ACM International Conference on Grid Computing (GRID '11), 2011, pp. 66–73.
[12] H. Goudarzi, M. Ghasemazar, and M. Pedram, "SLA-based optimization of power and migration cost in cloud computing," in Proc. 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), 2012, pp. 172–179.
[13] W.-R. Lee, H.-Y. Teng, and R.-H. Hwang, "Optimization of cloud resource subscription policy," in Proc. 4th IEEE International Conference on Cloud Computing Technology and Science (CloudCom), Dec. 2012, pp. 449–455.
[14] M. Mezmaz, N. Melab, Y. Kessaci, Y. C. Lee, E.-G. Talbi, A. Y. Zomaya, and D. Tuyttens, "A parallel bi-objective hybrid metaheuristic for energy-aware scheduling for cloud computing systems," Journal of Parallel and Distributed Computing, vol. 71, no. 11, pp. 1497–1508, Nov. 2011.
[15] Y. C. Lee and A. Y. Zomaya, "Minimizing energy consumption for precedence-constrained applications using dynamic voltage scaling," in Proc. 9th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID '09), 2009, pp. 92–99.
[16] D. H. Phan, J. Suzuki, R. Carroll, S. Balasubramaniam, W. Donnelly, and D. Botvich, "Evolutionary multiobjective optimization for green clouds," in Proc. 14th Genetic and Evolutionary Computation Conference Companion (GECCO Companion '12), 2012, pp. 19–26.
[17] J. Prevost, K. Nagothu, B. Kelley, and M. Jamshidi, "Prediction of cloud data center networks loads using stochastic and neural models," in Proc. 6th International Conference on System of Systems Engineering (SoSE), Jun. 2011, pp. 276–281.
[18] Q. Li, "Applying stochastic integer programming to optimization of resource scheduling in cloud computing," Journal of Networks, vol. 7, no. 7, pp. 1078–1084, 2012.
[19] V. Romanov, A. Varfolomeeva, and A. Koryakovskiy, "Branching processes theory application for cloud computing demand modeling based on traffic prediction," in CAiSE Workshops, ser. Lecture Notes in Business Information Processing, vol. 112, Springer, 2012, pp. 502–510.
[20] M. Bajec and J. Eder, Eds., Advanced Information Systems Engineering Workshops, CAiSE 2012 International Workshops, Gdańsk, Poland, June 25-26, 2012, ser. Lecture Notes in Business Information Processing, vol. 112, Springer, 2012.
[21] S. T. Maguluri, R. Srikant, and L. Ying, "Stochastic models of load balancing and scheduling in cloud computing clusters," in Proc. IEEE INFOCOM 2012, pp. 702–710.
[22] A. G. Greenberg and K. Sohraby, Eds., Proceedings of the IEEE INFOCOM 2012, Orlando, FL, USA, March 25-30, 2012, IEEE, 2012.
[23] Y. Chang-tian and Y. Jiong, "Energy-aware genetic algorithms for task scheduling in cloud computing," in Proc. 7th ChinaGrid Annual Conference, Sep. 2012, pp. 43–48.
[24] A. Younge, G. von Laszewski, L. Wang, S. Lopez-Alarcon, and W. Carithers, "Efficient resource management for cloud computing environments," in Proc. International Green Computing Conference, Aug. 2010, pp. 357–364.
[25] I. Foster, Y. Zhao, I. Raicu, and S. Lu, "Cloud computing and grid computing 360-degree compared," in Proc. Grid Computing Environments Workshop, 2008, pp. 1–10. Available: http://arxiv.org/abs/0901.0131
[26] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "A view of cloud computing," Communications of the ACM, vol. 53, no. 4, pp. 50–58, Apr. 2010.
[27] S. Subashini and V. Kavitha, "A survey on security issues in service delivery models of cloud computing," Journal of Network and Computer Applications, vol. 34, no. 1, pp. 1–11, 2011.
[28] R. Buyya, R. Ranjan, and R. Calheiros, "InterCloud: utility-oriented federation of cloud computing environments for scaling of application services," in Algorithms and Architectures for Parallel Processing, ser. Lecture Notes in Computer Science, vol. 6081, Springer, 2010, pp. 13–31.
[29] Q. Zhang, L. Cheng, and R. Boutaba, "Cloud computing: state-of-the-art and research challenges," Journal of Internet Services and Applications, vol. 1, pp. 7–18, 2010.
[30] K. Kumar and Y.-H. Lu, "Cloud computing for mobile users: can offloading computation save energy?" Computer, vol. 43, no. 4, pp. 51–56, 2010.
[31] B. Rimal, E. Choi, and I. Lumb, "A taxonomy and survey of cloud computing systems," in Proc. 5th International Joint Conference on INC, IMS and IDC (NCM '09), Aug. 2009, pp. 44–51.
[32] S. Benedict, R. Rejitha, and C. Bright, "Energy consumption-based performance tuning of software and applications using particle swarm optimization," in Proc. CSI 6th International Conference on Software Engineering (CONSEG), Sep. 2012, pp. 1–6.
[33] Intel, "Wireless Intel SpeedStep power manager: optimizing power consumption for the Intel PXA27x processor family," White Paper 30057701, Tech. Rep., 2004.
[34] AMD, "Cool'n'Quiet Technology," Tech. Rep. Available: http://www.amd.com/us/products/technologies/cooln-quiet/Pages/cool-n-quiet.aspx
[35] D. S. Johnson, "Fast algorithms for bin packing," Journal of Computer and System Sciences, vol. 8, no. 3, pp. 272–314, 1974.
[36] A. Lodi, S. Martello, and D. Vigo, "Recent advances on two-dimensional bin packing problems," Discrete Applied Mathematics, vol. 123, no. 1-3, pp. 379–396, 2002.
[37] J. L. Berral, I. Goiri, R. Nou, F. Julià, J. Guitart, R. Gavaldà, and J. Torres, "Towards energy-aware scheduling in data centers using machine learning," in Proc. 1st International Conference on Energy-Efficient Computing and Networking (e-Energy '10), 2010, pp. 215–224.
[38] J. M. Tirado, D. Higuero, F. Isaila, and J. Carretero, "Multi-model prediction for enhancing content locality in elastic server infrastructures," in Proc. 18th International Conference on High Performance Computing (HiPC '11), 2011, pp. 1–9.
[39] M. Maggio, H. Hoffmann, M. D. Santambrogio, A. Agarwal, and A. Leva, "Decision making in autonomic computing systems: comparison of approaches and techniques," in Proc. 8th ACM International Conference on Autonomic Computing (ICAC '11), 2011, pp. 201–204.
[40] P. Bodik, "Automating datacenter operations using machine learning," Ph.D. dissertation, EECS Department, University of California, Berkeley, Aug. 2010. Available: http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010114.html
[41] J. Tirado, D. Higuero, F. Isaila, and J. Carretero, "Predictive data grouping and placement for cloud-based elastic server infrastructures," in Proc. 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May 2011, pp. 285–294.
[42] L. Tassiulas and A. Ephremides, "Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks," IEEE Transactions on Automatic Control, vol. 37, no. 12, pp. 1936–1948, Dec. 1992.
[43] T. Bäck, D. B. Fogel, and Z. Michalewicz, Eds., Handbook of Evolutionary Computation, 1st ed. Bristol, UK: IOP Publishing, 1997.
[44] K. A. De Jong, Evolutionary Computation: A Unified Approach. MIT Press, 2006.
[45] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, 1st ed. Boston, MA, USA: Addison-Wesley, 1989.
[46] R. Steuer, Multiple Criteria Optimization: Theory, Computation and Application. John Wiley & Sons, 1986.
[47] S. Pandey, L. Wu, S. Guru, and R. Buyya, "A particle swarm optimization-based heuristic for scheduling workflow applications in cloud computing environments," in Proc. 24th IEEE International Conference on Advanced Information Networking and Applications (AINA), Apr. 2010, pp. 400–407.
[48] A. K. M. K. A. Talukder, M. Kirley, and R. Buyya, "Multiobjective differential evolution for scheduling workflow applications on global grids," Concurrency and Computation: Practice and Experience, vol. 21, no. 13, pp. 1742–1756, 2009.
[49] A. Gorbenko and V. Popov, "Task-resource scheduling problem," International Journal of Automation and Computing, vol. 9, pp. 429–441, 2012.
[50] D. Kliazovich, P. Bouvry, and S. U. Khan, "GreenCloud: a packet-level simulator of energy-aware cloud computing data centers," The Journal of Supercomputing, vol. 62, no. 3, pp. 1263–1283, 2012.