IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 10, NO. 5, SEPTEMBER/OCTOBER 2013, p. 253

A Hierarchical Approach for the Resource Management of Very Large Cloud Platforms

Bernardetta Addis, Danilo Ardagna, Barbara Panicucci, Mark S. Squillante, Fellow, IEEE, and Li Zhang

Abstract—Worldwide interest in the delivery of computing and storage capacity as a service continues to grow at a rapid pace. The complexities of such cloud computing centers require advanced resource management solutions that are capable of dynamically adapting the cloud platform while providing continuous service and performance guarantees. The goal of this paper is to devise resource allocation policies for virtualized cloud environments that satisfy performance and availability guarantees and minimize energy costs in very large cloud service centers. We present a scalable distributed hierarchical framework, based on a mixed-integer nonlinear optimization of resource management, acting at multiple timescales. Extensive experiments across a wide variety of configurations demonstrate the efficiency and effectiveness of our approach.

Index Terms—Performance attributes, performance of systems, quality concepts, optimization of cloud configurations, QoS-based scheduling and load balancing, virtualized system QoS-based migration policies

1 INTRODUCTION

Modern cloud computing systems operate in a new and dynamic world, characterized by continual changes in the environment and in the system and performance requirements that must be satisfied. Continuous changes occur without warning and in an unpredictable manner, outside the control of the cloud provider. Therefore, advanced solutions need to be developed that manage the cloud system in a dynamically adaptive fashion, while continuously providing service and performance guarantees. In particular, recent studies [18], [9], [25] have shown that the main challenges faced by cloud providers are to 1) reduce costs, 2) improve levels of performance, and 3) enhance availability and dependability. Within this framework, it should first be noted that, at large service centers, the number of servers is growing significantly and the complexity of the network infrastructure is also increasing. This leads to an enormous spike in electricity consumption: IT analysts predict that by the end of 2012, up to 40 percent of the budgets of cloud service centers will be devoted to energy costs [13], [14]. Energy efficiency is, therefore, one of the main focal points on

B. Addis is with the Dipartimento di Informatica, Università degli Studi di Torino, C.so Svizzera 185, Torino 10149, Italy. E-mail: [email protected]. D. Ardagna is with the Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Via Golgi 42, Milano 20133, Italy. E-mail: [email protected]. B. Panicucci is with the Dipartimento di Elettronica e Informazione, Politecnico di Milano, and the Dipartimento di Scienze e Metodi dell'Ingegneria, Università di Modena e Reggio Emilia, Via Amendola 2, Reggio Emilia 42122, Italy. E-mail: [email protected]. M.S. Squillante and L. Zhang are with the IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598. E-mail: {mss, zhangli}@us.ibm.com.

Manuscript received 15 July 2012; revised 16 Nov. 2012; accepted 2 Jan. 2013; published online 9 Jan. 2013. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TDSCSI-2012-07-0166. Digital Object Identifier no. 10.1109/TDSC.2013.4. 1545-5971/13/$31.00 © 2013 IEEE

which resource management should focus. In addition, providers need to comply with service-level agreement (SLA) contracts that determine the revenues gained and penalties incurred on the basis of the level of performance achieved. Quality of service (QoS) guarantees have to be satisfied despite workload fluctuations, which could span several orders of magnitude within the same business day [23], [14]. Moreover, if end-user data and applications are moved to the cloud, infrastructures need to be as reliable as phone systems [18]: Although 99.9 percent up-time is advertised, several hours of system outages have been recently experienced by some cloud market leaders (e.g., the Amazon EC2 2011 Easter outage and the Microsoft Azure outage on 29 February 2012, [9], [44]). Currently, infrastructure as a service (IaaS) and platform as a service (PaaS) providers include only availability in SLA contracts, while performance is neglected. In the case of availability violations (with current figures, even for large providers, being around 96 percent, much lower than the values stated in their contracts [12]), customers are refunded with credits to use the cloud system for free. The concept of virtualization, an enabling technology that allows sharing of the same physical machine by multiple end-user applications with performance guarantees, is fundamental to developing appropriate policies for the management of current cloud systems. From a technological perspective, the consolidation of multiple user workloads on the same physical machine helps to reduce costs, but this also translates into higher utilization of the physical resources. Hence, unforeseen load fluctuations or hardware failures can have a greater impact among multiple applications, making cost-efficient, dependable cloud infrastructures with QoS guarantees of paramount importance to realize broad adoption of cloud systems.
A number of self-managing solutions have been developed that support dynamic allocation of resources among running applications on the basis of workload predictions.

Most of these approaches (e.g., [15]) have implemented centralized solutions: A central controller manages all resources comprising the infrastructure, usually acting on an hourly basis. This timescale is often adopted [8] because it represents a compromise in the tradeoff between the frequency of decisions and the overhead of these decisions. For example, self-managing frameworks [35], [8] may decide to shut down or power up a server, or to move virtual machines (VMs) from one server to another by exploiting the VM live-migration mechanism. However, these actions consume a significant amount of energy and cannot be performed too frequently. As previously noted, cloud systems are continuously growing in terms of size and complexity: Today, cloud service centers include up to 10,000 servers [9], [25], and each server hosts several VMs. In this context, centralized solutions are subject to critical design limitations [9], [7], [53], including a lack of scalability and expensive monitoring communication costs, and cannot provide effective control within a 1-hour time horizon. In this paper, we take the perspective of a PaaS provider that manages the transactional service applications of its customers to satisfy response time and availability guarantees and minimize energy costs in very large cloud service centers. We propose a distributed hierarchical framework [39] based on a mixed-integer nonlinear optimization of resource management across multiple timescales. At the root of the hierarchy, a central manager (CM) partitions the workload (applications) and resources (physical servers) over a 24-hour horizon to obtain clusters with homogeneous workload profiles one day in advance. For each cluster, an application manager (AM) determines the optimal resource allocation in a centralized manner across a subsystem of the cloud platform on an hourly basis.
More precisely, AMs can decide the set of applications executed by each server of the cluster (i.e., application placement), the request volumes at various servers inside the cluster (i.e., load balancing), and the capacity devoted to the execution of each application at each server (i.e., capacity allocation). AMs can also decide to switch servers into a low-power sleep state depending on the cluster load (i.e., server switching) or to reduce the operational frequency of servers (i.e., frequency scaling). Furthermore, because load balancing, capacity allocation, and frequency scaling require a small amount of time to be performed, they are executed by AMs every few (5-15) minutes. The periodicity of the placement decisions is investigated within the experimental analysis in Section 6. Extensive experiments across a wide variety of configurations demonstrate the effectiveness of our approach, where our solutions are very close to those achieved by a centralized method [3], [8] that requires orders of magnitude more time to solve the global resource allocation problem based on a complete view of the cloud platform. Our solution further renders net-revenue improvements of around 25-30 percent over other heuristics from the literature [41], [50], [61], [57] that are currently deployed by IaaS providers [6]. Moreover, our algorithms are suitable for parallel implementations that achieve nearly linear speedup. Finally, we have also investigated the mutual influences and interrelationships among the multiple timescales (24 hours/1 hour/

5-15 minutes) by varying the characteristics of the prediction errors and identifying the best fine-grained timescale for runtime control. The remainder of the paper is organized as follows: Section 2 reviews previous work from the literature, while Section 3 introduces our hierarchical framework. The optimization formulations and their solutions are presented in Sections 4 and 5, respectively. Section 6 is dedicated to experimental results, with concluding remarks in Section 7.

2 RELATED WORK

Many solutions have been proposed for the self-management of cloud service centers, each seeking to meet application requirements while controlling the underlying infrastructure. Five main problem areas have been considered in system allocation policy design:

1. application/VM placement,
2. admission control,
3. capacity allocation,
4. load balancing, and
5. energy consumption.

While each area has often been addressed separately, it is noteworthy that these problem solutions are closely related. For example, the request volume determined for a given application at a given server depends on the capacity allocated to that application on the server. A primary contribution of this paper is to integrate all five problem areas within a unifying framework, providing very efficient and robust solutions at multiple timescales. Three main approaches have been developed: 1) control theoretic feedback loop techniques, 2) adaptive machine learning approaches, and 3) utility-based optimization techniques. A main advantage of a control theoretic feedback loop is system stability guarantees. Upon workload changes, these techniques can also accurately model transient behavior and adjust system configurations within a transitory period, which can be fixed at design time. A previous study [35] implemented a limited look-ahead controller that determines the servers in the active state, their operating frequency, and the allocation of VMs to physical servers. However, this implementation considers the VM placement and capacity allocation problems separately, and the scalability of the proposed approach is not considered. Recent studies [55], [53] have proposed hierarchical control solutions, in particular a cluster-level control architecture that coordinates multiple server power controllers within a virtualized server cluster [55].
The higher layer controller determines capacity allocation and VM migration within a cluster, while the inner controllers determine the power level of individual servers. However, application and VM placement within each service center cluster is assumed to be given. Similarly, coordination among multiple power controllers acting at rack enclosures and at the server level is done without providing performance guarantees for the running applications [53]. Machine learning techniques are based on live system learning sessions, without a need for analytical models of applications and the underlying infrastructure. A previous study [30] applied machine learning to coordinate multiple

Fig. 1. Hierarchical resource management framework and interrelationship among the timescales T1, T2, and T3.

autonomic managers with different goals, integrating a performance manager with a power manager to satisfy performance constraints while minimizing energy expenses by exploiting server frequency scaling. Recent studies provide solutions for server provisioning and VM placement [36] and propose an overall framework for autonomic cloud management [59]. An advantage of machine learning techniques is that they accurately capture system behavior without any explicit performance or traffic model and with little built-in system-specific knowledge. However, training sessions tend to extend over several hours [30], retraining is required for evolving workloads, and existing techniques are often restricted to separately applying actuation mechanisms to a limited set of managed applications. Utility-based approaches have been introduced to optimize the degree of user satisfaction by expressing user goals in terms of user-level performance metrics. Typically, the system is modeled by means of a performance model embedded within an optimization framework. Optimization can provide global optimal solutions or suboptimal solutions by means of heuristics, depending on the complexity of the optimization model. Optimization is typically applied to each one of the five problems separately. Some research studies address admission control for overload protection of servers [28]. Capacity allocation is typically viewed as a separate optimization activity that operates under the assumption that servers are protected from overload. VM placement has recently been widely studied [15], [13], [20]. A previous study [61] has presented a multilayer and multiple-timescale solution for the management of virtualized systems, but the tradeoff between performance and system costs is not considered.
A hierarchical framework for maximizing cloud provider profits, taking into account server provisioning and VM placement problems, has been similarly proposed [14], but only very small systems have been analysed. The problem of VM placement within an IaaS provider has been considered [31], providing a novel stochastic performance model for estimating the time required for VMs startup. However, each VM is considered as a black box and performance guarantees at the application level are not provided. Finally, frameworks for the colocation of VMs into clusters based on an analysis of 24-hour application workload profiles have been proposed [26], [23], but only the server provisioning and capacity allocation problems

have been solved without considering the frequency scaling mechanism available with current technology. In this paper, we propose a utility-based optimization approach, extending our previous work [39], [3], [8] through a scalable hierarchical implementation across multiple timescales. In particular, [39] provided a theoretical framework investigating the conditions under which a hierarchical approach can provide the same solution as a centralized one. The hierarchical implementation we propose here is built on that framework and includes the CM and the AMs, whereas in [3], [8], only a CM was considered, which implied more time to solve the global resource allocation problem as well as a complete view of the cloud platform. To the best of our knowledge, this is the first integrated framework that also provides availability guarantees to running applications, because all foregoing work considered only performance and energy management problems.

3 HIERARCHICAL RESOURCE MANAGEMENT

This section provides an overview of our approach for hierarchical resource management across multiple timescales in very large scale cloud platforms. The main components of our runtime management framework are described in Section 3.1, while our performance and power models are discussed in Section 3.2.

3.1 Runtime Framework We assume the PaaS provider supports the execution of multiple transactional services, each representing a different customer application. The hosted services can be heterogeneous with respect to resource demands, workload intensities, and QoS requirements. Services with different quality and workload profiles are categorized into independent request classes, where the system overall serves a set R of request classes. Fig. 1 depicts the architecture of the service center under consideration. The system includes a set S of heterogeneous servers, each running a virtual machine monitor (VMM) (such as IBM POWER Hypervisor, VMWare or Xen) configured in non-work-conserving mode (i.e., computing resources are capped and reserved for the execution of individual VMs). The physical resources (e.g., CPU, disks, communication network) of a server are partitioned among

multiple VMs, each running a dedicated application. Among the many physical resources, we focus on the CPU and RAM as representative resources for the resource allocation problem, consistent with [47], [50], [15]. Any class can be supported by multiple application tiers (e.g., front end, application logic, and data management). Each VM hosts a single application tier, where multiple VMs implementing the same application tier can be run in parallel on different hosts. Under light load conditions, servers may be shut down or placed in a stand-by state and moved to a free server pool to reduce energy costs. These servers may be subsequently moved back to an active state and reintroduced into the running infrastructure during peaks in the load. In more detail, we assume that each server has a single CPU supporting dynamic voltage and frequency scaling (DVFS) by choosing both its supply voltage and operating frequency from a limited set of values, noting that this single-CPU assumption is without loss of generality in heavy traffic [56] and can be easily relaxed in general. The adoption of DVFS is very promising because it does not introduce any system overhead, while hibernating and restoring a server both require time and energy. Following [43], [46], we adopt full system power models and assume that the power consumption of a server depends on its operating frequency/voltage as well as the current CPU utilization. Moreover, as in [35], [30], [24] but without limiting the generality, server performance is assumed to be linear in the operating frequency. Finally, each physical server s has availability a_s, and to provide availability guarantees, the VMs running the application tiers are replicated on multiple physical servers and work under the so-called load-sharing configuration [10]. Our resource management framework is based on a hierarchical architecture [39].
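Since the VMs implementing a tier are replicated across multiple servers in a load-sharing configuration, the resulting tier availability can be reasoned about with the standard parallel-redundancy identity. The sketch below is illustrative only — the function name and numeric availabilities are assumptions, not the paper's exact availability model.

```python
# Availability of a tier replicated across several servers in a load-sharing
# configuration: the tier is unavailable only if every replica is unavailable.
# Standard reliability identity; a sketch, not the paper's exact formulation.

def tier_availability(server_availabilities):
    unavailable = 1.0
    for a_s in server_availabilities:
        unavailable *= (1.0 - a_s)   # all replicas down simultaneously
    return 1.0 - unavailable

# Example: two replicas on servers with availability 0.99 and 0.98.
print(tier_availability([0.99, 0.98]))  # ~0.9998
```

Replicating a tier on a second, even less reliable, server thus raises availability well above that of either server alone, which is why the availability constraint interacts with the placement decisions.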
At the highest level of the hierarchy, a CM acts on a timescale of T1 (i.e., every 24 hours) and is responsible for partitioning the classes and servers into clusters one day in advance (long-term problem). The objective is to obtain a set of clusters C, each element of which has a homogeneous profile to reduce server switching at finer-grained timescales. Furthermore, we denote by R_c the set of request classes assigned to cluster c ∈ C and by S_c^t the set of physical servers in cluster c at time interval t. At a lower level of the hierarchy, AMs centrally manage individual clusters acting on a timescale of T2 (i.e., on an hourly basis). Each AM can decide (medium-term problem):

1. application placement within the cluster,
2. load balancing of requests among parallel VMs supporting the same application tier,
3. capacity allocation of competing VMs running on each server,
4. switching servers into active or low-power sleep states, and
5. increasing/decreasing the operating frequency of the CPU of a server.

Furthermore, because the load balancing, capacity allocation, and frequency scaling optimizations do not add significant system overhead, they are performed by AMs more frequently (short-term problem) on a timescale of T3 (i.e., every 5-15 minutes).
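The nesting of the three decision levels can be sketched as a schedule of control actions over one 24-hour period. The interval counts assume T2 = 1 hour and T3 = 5 minutes, and the action labels are placeholders for the long-, medium-, and short-term problems formulated later — none of this is from the paper itself.

```python
# Sketch of the control-action schedule over one T1 = 24 h period, assuming
# T2 = 1 h and T3 = 5 min, so n_T2 = 24 and n_T3 = 12. Action names are
# placeholders for the long-, medium-, and short-term optimization problems.

N_T2 = 24   # hourly intervals per day (T1 / T2)
N_T3 = 12   # 5-minute intervals per hour (T2 / T3)

def schedule_one_day():
    actions = ["CM: partition classes and servers into clusters"]  # long-term
    for _ in range(N_T2):
        actions.append("AM: placement + server switching")         # medium-term
        for _ in range(N_T3):
            actions.append("AM: balancing + capacity + DVFS")      # short-term
    return actions

acts = schedule_one_day()
print(len(acts))  # 1 + 24 + 24*12 = 313 control actions per day
```

The point of the sketch is the ratio: the cheap short-term actions fire two orders of magnitude more often than the expensive placement and switching decisions.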

The decisions that take place at each timescale are obtained according to a prediction of the incoming workloads, as in [61], [4], [16], [39], [7]. Concerning this prediction, several methods have been adopted in many fields for decades [16] (e.g., ARMA models, exponential smoothing, and polynomial interpolation), making them suitable to forecast seasonal workloads, common at timescale T1, as well as the runtime and nonstationary request arrivals characterizing timescale T3. In general, each prediction mechanism is characterized by several alternative implementations, where the most significant choices are whether to filter the input data—a runtime measure of the metric to be predicted—and whether to choose the best model parameters in a static or dynamic way. By considering multiple timescales, we are able to capture two types of information related to cloud sites [61]. The fine-grained workload traces at timescale T3 (see also the following section) exhibit a high variability due to the short-term variations of typical web-based workloads [16]. For this reason, the finer-grained timescale provides useful information to react quickly to workload fluctuations. At the coarser-grained timescales T1 and T2, the workload traces are smoother and not characterized by the instantaneous peaks typical at the finer-grained timescale. These characteristics allow us to use the medium- and long-timescale algorithms to predict the workload trend, which represents useful information for resource management. Note that a hierarchical approach is also beneficial in reducing the monitoring data required to compute workload predictions. Indeed, a centralized solution would require the central node to collect application workloads at timescale T3. Conversely, with a hierarchical solution, AMs need to provide only aggregated data (i.e., the overall incoming workload served for applications) of the individual managed partitions to the CM at timescale T1.
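As a concrete instance of the classical predictors mentioned above, simple exponential smoothing produces a one-step-ahead forecast from a workload trace. The smoothing factor and the toy arrivals trace below are illustrative assumptions, not values from the paper.

```python
# One-step-ahead workload forecast via simple exponential smoothing, one of
# the classical methods cited above (alongside ARMA models and polynomial
# interpolation). The factor alpha and the toy trace are illustrative only.

def exp_smoothing_forecast(trace, alpha=0.5):
    level = trace[0]
    for x in trace[1:]:
        level = alpha * x + (1 - alpha) * level  # blend new sample with history
    return level

# Hypothetical requests/sec samples observed at the fine-grained timescale T3:
print(exp_smoothing_forecast([100.0, 120.0, 110.0, 130.0]))  # 120.0
```

A small alpha filters out the instantaneous peaks typical of timescale T3 (tracking the trend, as needed at T1 and T2), whereas a large alpha reacts quickly to fluctuations — the static-versus-dynamic parameter choice discussed above.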
Dependability and fault tolerance are also important issues in developing resource managers for very large infrastructures. There has been research employing peer-to-peer designs [2] or using gossip protocols [58] to increase dependability. In [2], a peer-to-peer overlay is constructed to disseminate routing information and retrieve states/statistics. Wuhib et al. [58] propose a gossip protocol to compute, in a distributed and continuous fashion, a heuristic solution to a resource allocation problem for a dynamically changing resource demand. In our solution, to provide better dependability for the management infrastructure, each cluster maintains a primary and a backup AM. The primary and the backup do not need to be tightly synchronized to the second level, as the decisions they are making are at a timescale of 5-15 minutes. Similarly, a backup CM is also maintained in our solution. We will investigate the peer-to-peer management infrastructure as part of future work. Finally, an SLA contract, associated with each service class, is established between the PaaS provider and its customers. This contract specifies the QoS levels that the PaaS provider must satisfy when responding to customer requests for a given service class, as well as the corresponding pricing scheme. Following [52], [11] (but without limiting the generality of our approach), for each class

Fig. 2. Request utility functions, system performance, and power models.

r ∈ R, a linear utility function specifies the per-request revenue (or penalty) U_r incurred when the average end-to-end response time R_r realizes a given value:

U_r = -α_r · R_r + u_r

(see Fig. 2a). The slope of the utility function is -α_r, with α_r = u_r / R̄_r > 0, where u_r is the linear utility function y-axis intercept and R̄_r is the threshold identifying the revenue/penalty region; i.e., if R_r > R̄_r, the SLA is violated and penalties are incurred. SLA contracts also involve multiclass average response time metrics for virtualized systems (whose evaluation is discussed in the next section) and the minimum level of availability A_r required for each service class r.
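The linear utility just described can be evaluated directly. The sketch below uses illustrative parameter values; only the functional form comes from the text.

```python
# Per-request linear utility U_r = -alpha_r * R_r + u_r, with
# alpha_r = u_r / Rbar_r, so the utility is positive (revenue) when the average
# response time R_r is below the SLA threshold Rbar_r and negative (penalty)
# above it. Parameter values are illustrative, not from the paper.

def per_request_utility(R_r, Rbar_r, u_r):
    alpha_r = u_r / Rbar_r          # slope of the utility function
    return -alpha_r * R_r + u_r

print(per_request_utility(R_r=0.5, Rbar_r=1.0, u_r=2.0))  # 1.0  (revenue)
print(per_request_utility(R_r=1.5, Rbar_r=1.0, u_r=2.0))  # -1.0 (penalty)
```

Note that the utility crosses zero exactly at R_r = R̄_r, which is what makes the threshold the boundary between the revenue and penalty regions.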

3.2 System Performance and Power Models

The cloud service center is modeled as a queuing network composed of a set of multiclass single-server queues and a delay center (see Fig. 2b). Each layer of queues represents the collection of applications supporting the execution of requests at each tier. The delay center represents the client-based delays, or think time, between the completion of one request and the arrival of the subsequent request within a session. There is a different set of application tiers A_r that support the execution of class r requests. This section focuses on the medium-term timescale T2, noting that similar considerations can be drawn for the timescales T1 and T3. The relationships among the timescales T1, T2, and T3 are illustrated in Fig. 1. We denote by n_T2 (n_T3) the number of time intervals at granularity T2 (T3) within the time period T1 (T2), which can be expressed as n_T2 = T1/T2 (n_T3 = T2/T3) and is assumed to be an integer. In a given time interval, user sessions begin with a class r request arriving to the service center from an exogenous source with average rate λ_r^t(T2). Upon completion, the request either returns to the system as a class r' request with probability p_{r,r'} or departs with probability 1 - Σ_{l∈R} p_{r,l}. The aggregate arrival rate for class r requests, Λ_r^t(T2), is given by the exogenous arrival rates and transition probabilities: Λ_r^t(T2) = λ_r^t(T2) + Σ_{r'∈R} Λ_{r'}^t(T2) · p_{r',r}. Let μ_{r,ν} denote the maximum service rate of a unit capacity server dedicated to the execution of an application of class r at tier ν. We assume requests on every server at different application tiers within each class are executed according to the processor sharing scheduling discipline (a common model of time sharing among web servers and application containers [4]), where the service time of class r requests at server s follows a general distribution having mean (C_{s,f} · μ_{r,ν})^{-1}, with C_{s,f} denoting

the capacity of server s at frequency f [32], [19]. The service rates μ_{r,ν} can be estimated at runtime by monitoring application execution (see, e.g., [40]). They can also be estimated at a very low overhead by a combination of inference and filtering techniques developed in [60], [34], [33]. Let λ_{s,r,ν}^t denote the execution rate for class r requests at tier ν on server s in time interval t, and let φ_{s,r,ν}^t denote the fraction of VM capacity for class r requests at tier ν on server s in time interval t. Once a capacity assignment has been made, VMMs guarantee that a given VM and hosted application tier will receive their assigned resources (CPU capacity and RAM allotment), regardless of the load imposed by other applications. Under the non-work-conserving mode, each multiclass single-server queue associated with a server application can be decomposed into multiple independent single-class single-server M/G/1 queues with capacity equal to C_{s,f} · φ_{s,r,ν}^t. Then, the average response time for class r requests, R_r, is equal to the sum of the average response times for class r requests executed at tier ν on server s working at frequency f, R_{s,r,ν}^f, each of which can be evaluated as:

R_{s,r,ν}^f = 1 / (C_{s,f} · μ_{r,ν} · φ_{s,r,ν}^t - λ_{s,r,ν}^t),    (1)

R_r = (1 / Λ_r^t(T2)) · Σ_{s∈S, ν∈A_r} λ_{s,r,ν}^t · R_{s,r,ν}^f.
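The pieces of the performance model — the traffic equations for the aggregate rates and the M/G/1 processor-sharing response time of Eq. (1) — can be sketched numerically. All rates, transition probabilities, and capacity shares below are made-up illustrative values, and the fixed-point solver is a convenience, not the paper's method.

```python
# Sketch of the medium-term performance model: (i) aggregate arrival rates from
# the traffic equations Lambda_r = lambda_r + sum_{r'} Lambda_{r'} * p[r'][r],
# via fixed-point iteration; (ii) the per-tier response time of Eq. (1); and
# (iii) the Lambda-normalized, rate-weighted class average. Toy numbers only.

def aggregate_rates(exo, p, iters=100):
    n = len(exo)
    lam = list(exo)
    for _ in range(iters):  # converges when requests eventually depart
        lam = [exo[r] + sum(lam[r2] * p[r2][r] for r2 in range(n))
               for r in range(n)]
    return lam

def tier_response_time(C_sf, mu, phi, lam):
    rate = C_sf * mu * phi                # effective service rate of the VM
    if lam >= rate:
        raise ValueError("unstable: arrival rate >= allotted service rate")
    return 1.0 / (rate - lam)             # Eq. (1)

def class_response_time(Lambda, contributions):
    """contributions: iterable of (lam, R) pairs over (server, tier)."""
    return sum(lam * R for lam, R in contributions) / Lambda

# Class-0 completions re-enter as class 1 with probability 0.5; class 1 departs.
Lam = aggregate_rates([10.0, 0.0], [[0.0, 0.5], [0.0, 0.0]])      # [10.0, 5.0]
R01 = tier_response_time(C_sf=1.0, mu=30.0, phi=0.5, lam=Lam[0])  # 1/(15-10)
print(Lam, class_response_time(Lambda=Lam[0], contributions=[(Lam[0], R01)]))
```

The stability guard mirrors the implicit constraint in Eq. (1): the allotted capacity C_{s,f} · μ · φ must strictly exceed the execution rate λ, otherwise the M/G/1 response time diverges.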

Although system performance metrics have been evaluated by adopting alternative analytical models (e.g., [41], [1], [54]) and accurate performance models exist in the literature for web systems (e.g., [45], [28]), there is a fundamental tradeoff between the accuracy of the models and the time required to estimate system performance metrics. Given the strict time constraints for evaluating performance metrics in our present setting, the high complexity of analyzing even small-scale instances of existing performance models has prevented us from exploiting such results here. It is important to note, however, that highly accurate approximations for general queuing networks [22], [21], having functional forms similar to those above, can be directly incorporated into our modeling and optimization frameworks. Energy-related costs are computed using the full system power models adopted in [43], [46], which yield server energy consumption estimate errors within 10 percent. Servers in a low-power sleep state still consume energy (cost parameter c_s) [35], whereas the dynamic operating cost of a server when

TABLE 1 Multitime Scale Resource Management Framework Parameters

running depends on a constant base cost c_s + p_{s,f} (a function of the operating frequency f) and an additional cost component that varies linearly with the CPU utilization U_s with slope b_{s,f} [43] (see Fig. 2c). Following [35], [50], we further consider the switching cost c_s^s, which includes the power consumption costs incurred when starting up the server from a low-power sleep state and, possibly, the revenue lost while the computer is unavailable to perform any useful service for a period of time. Finally, to limit the number of VM movements in the system, we add a cost associated with VM movement, c^m, which avoids system instability and minimizes the total number of application starts and stops [35], [50]. The cost c^m has been evaluated by considering the cost of using a server to suspend, move, and start up a VM according to the values reported in [49], [29]. Table 1 summarizes the notation adopted in this paper.
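The power-cost structure described above can be written down directly. The coefficient values below are illustrative assumptions, not figures from [43], [46].

```python
# Full-system power cost of one server: a sleeping server still costs c_s; a
# running server at frequency f costs c_s + p_{s,f} plus a component linear in
# CPU utilization U_s with slope b_{s,f}. Coefficient values are illustrative.

def server_power_cost(active, U_s=0.0, c_s=50.0, p_sf=100.0, b_sf=120.0):
    if not active:
        return c_s                       # low-power sleep state
    return c_s + p_sf + b_sf * U_s       # base cost + utilization-linear part

print(server_power_cost(active=False))          # 50.0
print(server_power_cost(active=True, U_s=0.6))  # 50 + 100 + 120*0.6 = 222.0
```

The jump from c_s to c_s + p_{s,f} at zero utilization is what makes server switching attractive under light load, while the switching cost c_s^s discourages toggling servers too frequently.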

4 FORMULATION OF OPTIMIZATION PROBLEMS

In this section, we introduce resource management optimization problems at multiple timescales (T1, T2, T3). The general objective is to maximize profit, namely the difference between revenues from SLA contracts and the costs associated with server switching and VM migrations, while providing availability guarantees. We formulate the optimization problem at the timescale of T2 (medium-term problem), from which the formulations at the other timescales follow straightforwardly at the end of this section. As previously explained, the CM determines one day in advance the class partition R_c and the server

assignment S_c^t for each cluster c and every time interval t = 1, ..., n_T2. The AM controlling cluster c has to determine at timescale T2:

1. the servers in the active state, introducing for each server s a binary variable x_s^t equal to 1 if server s is running in interval t and 0 otherwise;
2. the operating frequency of the CPU of each server, given by the binary variable y_{s,f}^t equal to 1 if server s is working at frequency f in interval t and 0 otherwise;
3. the set of applications executed by each server, introducing the binary variable z_{s,r,ν}^t equal to 1 if the application tier ν for class r is assigned to server s in interval t and 0 otherwise;
4. the rate of execution for class r at tier ν on server s in interval t (i.e., λ_{s,r,ν}^t); and
5. the fraction of capacity devoted to executing the VM at tier ν for class r at server s in interval t (i.e., φ_{s,r,ν}^t).

We also introduce an auxiliary binary variable w_{s,r}^t equal to 1 if at least one application tier ν for class r is assigned to server s in interval t and 0 otherwise. The revenue for executing class r requests is given by T2 · U_r · Λ_r^t(T2) = T2 · (u_r - α_r · R_r) · Λ_r^t(T2), whereas the total cost is given by the sum of the costs for servers in the low-power sleep state, for using servers according to their operating frequency, and for the penalties incurred when switching servers from the low-power sleep state to the active state and when migrating VMs. Since the utilization of server s when it is operating at frequency f is


U_s = Σ_{ν∈A_r, r∈R_c} λ_{s,r,ν}^t / (μ_{r,ν} · Σ_{f∈F_s} C_{s,f} · y_{s,f}^t),
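The utilization expression can be evaluated as a sketch, assuming the selected frequency is encoded by the binary indicator y_{s,f} with exactly one entry set to 1; all numeric values below are illustrative.

```python
# Server utilization as in the formula above: each hosted (class, tier) pair
# contributes lam / mu, normalized by the capacity picked out by the frequency
# indicator y_{s,f} (exactly one entry equals 1). All values are illustrative.

def server_utilization(loads, caps, y):
    """loads: list of (lam, mu) per hosted (class, tier) pair;
    caps[f]: capacity C_{s,f}; y[f]: binary frequency indicator."""
    C = sum(c * yf for c, yf in zip(caps, y))   # capacity at selected frequency
    return sum(lam / (mu * C) for lam, mu in loads)

# Two hosted application tiers on a server at its higher frequency (C = 2.0):
print(server_utilization([(4.0, 10.0), (3.0, 5.0)], caps=[1.0, 2.0], y=[0, 1]))
# -> 4/(10*2) + 3/(5*2) = 0.5
```

Raising the frequency (selecting a larger C_{s,f}) lowers U_s for the same execution rates, which is exactly the lever the frequency-scaling decision variable y_{s,f}^t gives the AM in the cost/performance tradeoff.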