
IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 33, NO. 12, DECEMBER 2015

DREAM: Dynamic Resource and Task Allocation for Energy Minimization in Mobile Cloud Systems

Jeongho Kwak, Member, IEEE, Yeongjin Kim, Student Member, IEEE, Joohyun Lee, Member, IEEE, and Song Chong, Member, IEEE

Abstract—To cope with increasing energy consumption in mobile devices, mobile cloud offloading has received considerable attention for its ability to offload processing tasks of mobile devices to cloud servers; previous studies, however, have focused on single-type tasks in fixed network environments. Real network environments are spatio-temporally varying, and typical mobile devices have not only various types of tasks, e.g., network traffic and cloud-offloadable/non-offloadable workloads, but also the capabilities of CPU frequency scaling and network interface selection between WiFi and cellular. In this paper, we jointly consider, for the first time, the following three dynamic problems in real mobile environments: 1) the cloud offloading policy, i.e., determining whether to use local CPU resources or cloud resources; 2) the allocation of tasks to transmit through networks and to process in the local CPU; and 3) CPU clock speed and network interface controls. We propose the DREAM algorithm by invoking Lyapunov optimization and mathematically prove that it minimizes CPU and network energy for given delay constraints. Trace-driven simulation based on real measurements demonstrates that DREAM can save over 35% of total energy compared with existing algorithms under the same delay. We also design the DREAM architecture and demonstrate the applicability of DREAM in practice.

Index Terms—Mobile cloud offloading policy, CPU/network speed scaling, resource and task allocation, energy minimization.

I. INTRODUCTION

AS high processing-demand applications become popular and networking demands increase in mobile devices, energy consumption in processing and networking is consistently growing. The maximum CPU clock frequency keeps increasing (e.g., from 1.0 GHz to 2.5 GHz between the Samsung Galaxy S1 and S5) to meet the increasing demands of applications. Operation at a higher CPU clock frequency results in higher energy consumption because CPU power is a superlinearly increasing function of clock frequency [1]. Also, according to the forecasts of Cisco [2], mobile traffic will increase nearly tenfold between 2014 and 2019, which implies that mobile devices will consume much more networking energy.

To reduce such energy consumption in mobile devices, mobile cloud computing and CPU/network speed scaling of mobile devices have recently attracted a lot of attention. Mobile cloud computing (MCC), also called mobile code offloading, offloads computation workloads from local devices to cloud servers with the objective of minimizing a cost, e.g., energy consumption or processing delay. MCC is already extensively utilized in commercial cloud services such as Windows Azure [3]. However, because mobile cloud offloading consumes network energy in transferring computing workloads to cloud servers whereas local computing consumes local CPU energy on the mobile device, an offloading policy is typically determined by comparing the energy efficiency of the CPU and of networking, defined as the amount of workloads (in bits) processed (CPU) or transmitted (networking) per unit of energy consumed.

CPU energy efficiency in processing the same amount of data is influenced by two factors: the type of workload and the CPU clock frequency. (i) Since each workload generally requires a different number of cycles to process the same amount of data in bits¹, the energy consumed in processing the same amount of workloads (in bits) differs across workload types. For example, because the computation required for a face recognition application is very large compared to the size of the images [4], applications with high processing density lead mobile devices to consume high local CPU energy per unit workload (in bits). (ii) Because CPU processing energy increases superlinearly with clock frequency [1], the energy consumed in processing the same amount of workloads varies with the clock frequency. The energy efficiency of networking is affected not only by the type of network interface (cellular/WiFi) but also by the wireless channel state. For example, suppose a mobile device is located within the coverage of a WiFi network with higher data rates than the cellular network. Then, the device consumes much lower networking energy per bit through the WiFi network than through the slow cellular network.

¹ We denote this notion by processing density, in units of cycles/bit; we define it formally in Section III.
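To make this comparison concrete, the following minimal sketch (ours, not from the paper; all constants are hypothetical placeholders rather than measured values) computes the energy per bit of local processing versus offloading, using the superlinear CPU power model and a constant transmit power:

# Illustrative sketch: energy per bit of local processing vs. offloading.
# All parameter values below are hypothetical placeholders.

def cpu_energy_per_bit(clock_hz, density_cycles_per_bit,
                       alpha=1e-27, x=3.0, beta=0.1):
    """Energy (J/bit) to process locally: CPU power alpha*f^x + beta
    (superlinear in f) divided by the processing rate f / density."""
    power_w = alpha * clock_hz ** x + beta
    bits_per_sec = clock_hz / density_cycles_per_bit
    return power_w / bits_per_sec

def net_energy_per_bit(tx_power_w, rate_bps):
    """Energy (J/bit) to transmit at a constant transmit power."""
    return tx_power_w / rate_bps

# A high-processing-density workload (~1000 cycles/bit) under a fast
# WiFi link favors offloading:
local = cpu_energy_per_bit(clock_hz=1.5e9, density_cycles_per_bit=1000)
remote = net_energy_per_bit(tx_power_w=0.7, rate_bps=3e6)
print("offload" if remote < local else "compute locally")

Note that the decision flips as the processing density, clock speed or available data rate changes, which is exactly the motivation for a dynamic policy developed below.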


Fig. 2. Framework for a mobile cloud system.

Fig. 1. Plots for motivation of dynamic cloud offloading policy and CPU/network speed scaling.

Previous studies on mobile cloud offloading tried to minimize the energy consumption and processing delay of mobile devices under a fixed size of workloads, static network environments and outdated device functionalities [4]–[6]. Under real network environments, however, workload arrivals and wireless channels are not static but temporally dynamic, and WiFi networks are only partially available depending on user mobility. Moreover, mobile users can in practice dynamically control the CPU clock frequency of their devices and select the network interface between WiFi and cellular. Therefore, the cloud offloading policy and CPU/network speed scaling (CPU clock speed control and network interface selection) should be jointly controlled by capturing the dynamics of real mobile network environments.

A dynamic cloud offloading and CPU/network speed scaling policy exploits variations in workload arrivals, network conditions and the processing density of workloads to maximize the energy efficiency of mobile devices for given processing delay constraints. For example, at one extreme, if the network condition is bad (e.g., only a 3G connection with low data rates is available), the processing density of a target workload is low and the workload arrival rate is low, then an offloading policy may prioritize local CPU resources with a low clock frequency to reduce energy consumption. At the other extreme, if the network condition is good (e.g., WiFi networks with high data rates are available), the processing density of a target workload is high and the workload arrival rate is high, then an offloading policy is likely to prefer cloud computing resources, transmitting the workloads through the energy-efficient WiFi network.

Then, how effective are a dynamic mobile cloud offloading policy and CPU/network speed scaling in practical regimes of CPU/network speed and energy consumption? To examine this, we depict the average power consumption of the CPU and network interfaces (LTE/WiFi) as functions of their speeds (in Mbits/sec)² in Fig. 1(a), based on [7] (for network power) and our measurements in Section V (for CPU power). Fig. 1(b) illustrates temporal variations in energy efficiency (in bits per Watt), defined as the processing speed (CPU) or transmitting speed (networking) divided by the energy consumption, for different CPU clock speeds, processing densities and network interfaces. From these graphs, we observe that the most energy-efficient selection among the various CPU and network resources varies with several factors, such as the processing density of workloads, the CPU clock speed and temporal channel/network connectivity (LTE or WiFi availability) variations. Therefore, there is no static optimal cloud offloading and CPU/network speed scaling policy; a resource-efficient policy should dynamically change the resource allocation by considering workload arrivals, the processing density of workloads and the variations of network conditions. Moreover, the CPU speed control and network interface selection problems are coupled with the cloud offloading problem in the practical range of CPU/network speed and energy consumption.

Although dynamic resource allocation (i.e., a cloud offloading policy together with selection of the CPU speed and active network interface) can save more energy, considering only the offloadable workloads in the offloading policy may cause a resource interference problem with other types of workloads, such as non-offloadable workloads and network traffic (see Fig. 2)³. For example, suppose that a mobile device simultaneously processes or transfers offloadable workloads (e.g., processing for a chess game or face recognition [8]), non-offloadable workloads (e.g., OS-level processing on the device) and uplink network traffic (e.g., transferring files to a cloud server such as Dropbox [9])⁴. If the cloud offloading policy selects local CPU resources, the offloadable workloads interfere with the non-offloadable workloads because the local CPU attempts to process two tasks simultaneously. If it selects network resources, the offloadable workloads interfere with the network traffic for a similar reason. Therefore, the interference problem should be addressed in dynamic resource allocation.

In this paper, we propose a dynamic CPU/network resource and task allocation algorithm for mobile cloud systems, called DREAM, which minimizes the energy consumption of a mobile device for given delay constraints while considering network traffic and offloadable/non-offloadable workloads in a unified framework. To the best of our knowledge, the DREAM algorithm is the first to jointly optimize the cloud offloading policy and CPU/network speed scaling by invoking the Lyapunov optimization technique [11] in a dynamic mobile network environment, so as to answer how much additional energy can be saved by the joint optimization.

² The widely used unit of CPU clock speed is cycles/sec. By dividing the CPU clock speed (in cycles/sec) by a given processing density (in cycles/Mbit), we transform the unit of CPU clock speed into Mbits/sec in Fig. 1.
³ From now on, we refer to these workloads and network traffic as tasks, for unified terminology.
⁴ We assume that uplink and downlink are completely separated, as in LTE systems [10], so there is no interference between uplink and downlink traffic.


More specifically, DREAM controls (i) task (traffic/workload) type selection for CPU and networking resources, and (ii) clock speed scaling in the CPU and interface selection in networking. We run trace-driven simulations over real measurements of 3G/LTE/WiFi coverage and data rates and the energy consumption of the CPU and 3G/LTE/WiFi interfaces in popular smartphone models (e.g., Galaxy Note and Nexus S). We also design the DREAM architecture, implement it on an Android platform and run experiments using our server installed in the commercial Windows Azure mobile cloud system [3]. The contributions of this paper are summarized as follows.

• The proposed DREAM algorithm is the first to jointly optimize the dynamic cloud offloading policy and CPU/network speed scaling in a dynamic mobile network environment by invoking the Lyapunov optimization method, and is shown to be near-optimal in the sense that it minimizes CPU and network energy consumption for given delay constraints.

• The DREAM algorithm is the first to address the resource interference problem among different types of traffic/workloads arising from dynamic mobile cloud offloading. We mitigate the interference problem by introducing a joint scheduler that considers all types of traffic/workloads.

• Trace-driven simulations based on real measurements of data rates and energy consumption demonstrate that the DREAM algorithm not only saves more energy than existing algorithms (over 35% more energy saving for 9 min. delay) but also guarantees fairness among different types of tasks in terms of processing or transferring delay.

• We design and implement the DREAM architecture on an Android platform and run it over a real mobile cloud system, Windows Azure. Experimental results using a popular smartphone elucidate why the DREAM algorithm achieves energy and delay gains in real network environments.

In the rest of this paper, we begin with related work in Section II. In Section III, we describe the system model. We propose the DREAM algorithm in Section IV. Next, in Section V, we evaluate the DREAM algorithm by measurement and trace-driven simulation. In Section VI, we design and implement the DREAM architecture and show the experimental results. Finally, we conclude this paper in Section VII.

II. RELATED WORK

Mobile cloud computing architecture. There have been extensive studies on energy-efficient mobile cloud computing [4]–[6]. For example, in [4], Cuervo et al. suggested a mobile architecture that offloads thread-level computing code to a cloud server. The main idea of this work is that CPU-intensive, high-power-consuming workloads on mobile devices, such as interactive games, can be offloaded to a cloud server by spending less networking energy than the processing energy required to serve the computation workloads locally. However, they suggested offloading policies under static network environments and did not deal with all types of tasks (i.e., offloadable/non-offloadable workloads and network traffic) on the mobile devices. Moreover, these studies did not take into account the physical distance between the cloud server and the mobile devices, so many works have suggested cloudlet solutions (see, e.g., [12] and references therein) in which computing capabilities are brought closer to the access point to reduce the latency between the cloud server and the mobile devices.

Optimal mobile cloud computing policy. A recent work [13] suggested an optimal mobile cloud computing algorithm whose main objective is to minimize smartphone energy consumption under given delay constraints. However, it took into account neither the real CPU energy model of modern smartphones nor real wireless environments; it only considered a simple probability-based good/bad channel model. There have also been studies that minimize the costs (e.g., mobile energy or electricity bill) of mobile subscribers and cloud service providers by adopting pricing mechanisms between them [14], [15]. In particular, Kim et al. [14] revealed that if the two sides cooperate, dual-side optimization for mobile users and a cloud service provider reduces their total costs compared to the non-cooperation scenario.

Practical mobile cloud systems. Some papers [16], [17] studied the impact of mobile cloud systems on the performance and energy efficiency of contemporary smartphones in current network environments. For example, by carrying out experiments in real mobile environments, Barbera et al. [17] found that there is a tradeoff between the freshness of synchronization and battery lifetime due to the tail energy effect of smartphones, where tail energy is the energy consumed for a few seconds after transmitting or receiving data [18]. They made a contribution toward integrating synchronization and computation applications into a single mobile cloud architecture. However, they optimized neither the synchronization intervals nor the computation offloading policy in terms of energy minimization.

CPU speed scaling and network interface selection. There have been extensive studies on energy-efficient control of the CPU clock speed and network interface [19]–[22]. For example, Ra et al. [19] suggested a delayed network interface selection algorithm that exploits the tradeoff between the transmit power of network interfaces such as 3G or WiFi and the transmission delay. Also, Kwak et al. [22] developed a joint CPU clock speed control and network interface selection scheme to minimize CPU and network interface energy. They demonstrated that joint optimization of CPU clock speed and network interface selection saves 42% more total energy than independent optimization. However, they took into account neither mobile cloud computing environments in the system model nor the latest network technology (4G LTE) and service platforms in the simulation and experiment.

III. SYSTEM MODEL

A. Task and Traffic Arrival Model

Fig. 2 illustrates our framework for a mobile cloud system. We classify smartphone tasks into three types: non-offloadable (processing) workload (PA), cloud-offloadable workload (CA) and network traffic (NA). PA and NA can exploit only one type of resource (CPU for PA and network for NA), whereas CA can exploit either CPU resources or network resources⁵.


We assume a time-slotted system indexed by $t \in \{0, 1, \ldots\}$, where the length of a time slot is $\Delta t$ (in sec). At each time slot, $A(t) = (A_P(t), A_C(t), A_N(t))$ (in bits) of workloads or traffic arrives for PA, CA and NA; $A(t)$ is independent and identically distributed over time slots with $\mathbb{E}[A_P(t)] = \lambda_P$, $\mathbb{E}[A_C(t)] = \lambda_C$ and $\mathbb{E}[A_N(t)] = \lambda_N$. We assume that all arrivals in every time slot are bounded as follows:

$$A_P(t) \le A_{P,\max}, \quad A_C(t) \le A_{C,\max}, \quad A_N(t) \le A_{N,\max}. \quad (1)$$


B. Processing and Networking Model

We consider a smartphone with a single-core CPU which handles multiple workloads sequentially⁶. Modern smartphone processors typically have DVFS (Dynamic Voltage and Frequency Scaling) capability, so the processor can adjust its CPU clock speed $r_c(t) \in \{r_{c,1}, r_{c,2}, \ldots, r_{c,\max}\}$ (in cycles/$\Delta t$) every time slot $t$. Each workload (in bits) requires a different amount of CPU processing resources per bit; for instance, a chess game application [4] requires far more computation per bit than an image retrieval application [8]. We denote this notion by the processing density, $\gamma_P$ for PA and $\gamma_C$ for CA, in units of cycles/bit⁷. Either PA or CA is processed in the CPU at each time slot, and we denote the CA scheduling indicator for the CPU at time slot $t$ by $\theta_c(t) \in \{0, 1\}$, i.e., $\theta_c(t) = 1$ when CA is scheduled and $0$ when PA is scheduled.

We consider multiple network interfaces in a smartphone, say cellular and WiFi. The smartphone can select an interface to transfer data, cellular ($L$) or WiFi ($W$), or use no interface ($N$) to save energy. We assume that cellular networks are always available whereas WiFi networks are intermittently available depending on user mobility. Therefore, the smartphone chooses a decision among $(W, L, N)$ or $(L, N)$ depending on the location of the device, where we denote the network availability at slot $t$ by $B(t) \in \{\{W, L, N\}, \{L, N\}\}$. The network speed at slot $t$, $r_n(l(t), t)$ (bits/$\Delta t$), is determined by the selected network interface $l(t) \in B(t)$ and the channel conditions⁸. Note that the network speed $r_n(l(t), t)$ can vary with dynamic channel conditions even for the same network interface $l(t)$.

If CA workloads are served by network resources, the workloads are transmitted to the cloud data center⁹. We assume that the computing speed of a cloud server is much faster than the maximum computing speed of a mobile device ($r_{c,\max} \ll r_{\mathrm{cloud}}$), as in [6], so the delay for CA workloads is the queueing delay in the smartphone¹⁰. We assume that the destination of the results after processing the workloads can be either the local smartphone or the cloud server, and that the application code is already present in both the mobile device and the cloud server. We denote the CA scheduling indicator for the network at time slot $t$ by $\theta_n(t) \in \{0, 1\}$, i.e., $\theta_n(t) = 1$ when CA is scheduled and $0$ when NA is scheduled. We assume that the smartphone cannot schedule CA on both CPU and network resources at the same time, i.e., $\theta_c(t) + \theta_n(t) \le 1$; this condition can be relaxed if CA allows parallel computation. For the given model, we determine four control knobs $(\theta_c(t), \theta_n(t), r_c(t), l(t))$ every time slot $t$.

⁵ CA can use cloud computing resources by transferring the computing workloads through wireless networks such as cellular or WiFi.
⁶ The model can easily be extended to a multi-core CPU.
⁷ We use a homogeneous processing density for PA and CA for simplicity.
⁸ Note that the smartphone is not able to transmit data through the cellular and WiFi networks simultaneously, i.e., it does not support multi-homing.
⁹ To simplify the system model from the mobile device's perspective, we ignore the round-trip delay between the mobile device and the cloud server; for example, one can consider cloudlet systems [12].
¹⁰ We can replace the infinite-capacity assumption on the cloud server with a finite-capacity assumption by adding the processing delay in the cloud server as a cost, but this would scarcely affect the cloud offloading results because we already consider the processing delay in local computing.

C. Task Queue Model

As illustrated in Fig. 2, there are three types of task queues (in bits), evolving as follows:

$$Q_P(t+1) = \Big[Q_P(t) - \frac{(1-\theta_c(t))\, r_c(t)}{\gamma_P} + A_P(t)\Big]^+, \quad (2)$$

$$Q_C(t+1) = \Big[Q_C(t) - \Big(\frac{\theta_c(t)\, r_c(t)}{\gamma_C} + \theta_n(t)\, r_n(l(t), t)\Big) + A_C(t)\Big]^+, \quad (3)$$

$$Q_N(t+1) = \big[Q_N(t) - (1-\theta_n(t))\, r_n(l(t), t) + A_N(t)\big]^+, \quad (4)$$

where $Q_P(t)$, $Q_C(t)$ and $Q_N(t)$ denote the queue lengths of PA, CA and NA at time slot $t$, respectively, and $[x]^+ = \max(x, 0)$. The amount of workloads served from each queue (i.e., the departure) at time slot $t$ is determined by the four control knobs $(\theta_c(t), \theta_n(t), r_c(t), l(t))$. Because the unit of $r_c(t)$ in the CPU part is cycles/$\Delta t$ while queue lengths are in bits, the amount of workloads served from the scheduled queue is $r_c(t)$ divided by the processing density $\gamma_P$ or $\gamma_C$ for unit agreement.

D. CPU and Network Energy Model

The typical CPU energy consumption model is

$$E_c(r_c(t)) = \big(\alpha\, r_c(t)^x + \beta\big)\, \Delta t, \quad (5)$$

where the exponent $x$ ranges from 2 to 3, and $\alpha$ and $\beta$ are parameters determined by the CPU model [23], [24]. To obtain realistic CPU power model parameters, we measure the real CPU power consumption of contemporary 3G and LTE smartphones for various CPU clock frequencies in Section V.

Most smartphones have both cellular and WiFi interfaces by default, so the transmit energy of the cellular or WiFi interface can be modeled as follows: $E_n(l(t)) = P_{L,tx}\, \Delta t$ for the cellular interface ($l(t) = L$), $E_n(l(t)) = P_{W,tx}\, \Delta t$ for the WiFi interface ($l(t) = W$), and $E_n(l(t)) = 0$ for no transmission ($l(t) = N$), where $P_{L,tx}$, $P_{W,tx}$ and $\Delta t$ denote the cellular transmit power, the WiFi transmit power and the duration of one time slot, respectively. In reality, the transmit power of the cellular or WiFi interface varies somewhat with the channel state. However, because it is tricky to model the exact power consumption of the network interfaces as a function of the channel state, most related studies on network interface selection use a constant transmit power model (see [19] and references therein).
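As a quick illustration of how the four control knobs drive the queue dynamics (2)-(4) and the energy models above, the following sketch (ours; parameter values are hypothetical, and a 1-second slot is assumed so that cycles/slot coincide with Hz) advances the three queues by one slot:

# Minimal sketch of one slot of the queue dynamics (2)-(4) and the
# energy models of Section III-D. Parameter values are hypothetical.

P_TX = {'L': 1.2, 'W': 0.7, 'N': 0.0}  # assumed transmit powers (W)

def step_queues(Q_P, Q_C, Q_N, theta_c, theta_n, r_c, r_n,
                A_P, A_C, A_N, gamma_P, gamma_C):
    """Advance the three task queues (bits) by one slot; [x]^+ = max(x,0)."""
    pos = lambda x: max(x, 0.0)
    Q_P = pos(Q_P - (1 - theta_c) * r_c / gamma_P + A_P)   # eq. (2)
    Q_C = pos(Q_C - (theta_c * r_c / gamma_C
                     + theta_n * r_n) + A_C)               # eq. (3)
    Q_N = pos(Q_N - (1 - theta_n) * r_n + A_N)             # eq. (4)
    return Q_P, Q_C, Q_N

def cpu_energy(r_c, slot_sec=1.0, alpha=1e-27, x=3.0, beta=0.1):
    """CPU energy per slot, eq. (5): (alpha * r_c^x + beta) * slot length."""
    return (alpha * r_c ** x + beta) * slot_sec

def net_energy(l, slot_sec=1.0):
    """Constant-transmit-power network energy per slot, l in {L, W, N}."""
    return P_TX[l] * slot_sec

# e.g., a slot with CA on the CPU (theta_c=1) and NA on WiFi (theta_n=0):
q = step_queues(2e6, 5e6, 1e6, theta_c=1, theta_n=0,
                r_c=1.2e9, r_n=3e6, A_P=5e5, A_C=5e5, A_N=5e5,
                gamma_P=1000, gamma_C=1000)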


IV. DYNAMIC RESOURCE AND TASK ALLOCATION ALGORITHM

In this section, we formulate an optimization problem considering energy minimization with queue stability. Then, we develop an energy-efficient dynamic CPU and network resource and task allocation algorithm for mobile cloud systems, called DREAM.

A. Problem Formulation

Our objective for the mobile cloud system in Fig. 2 is to develop a joint control algorithm of the scheduling indicators for CPU and network resources, the CPU clock speed and the network interface selection, $(\theta_c(t), \theta_n(t), r_c(t), l(t))$, so as to minimize the total energy consumption while processing or transmitting all workload arrivals within the capacity region¹¹. We formally state a long-term optimization problem as follows:

$$\text{(P):} \quad \min_{(\boldsymbol{\theta}_c, \boldsymbol{\theta}_n, \mathbf{r}_c, \mathbf{l})} \; \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\{E_c(r_c(t)) + E_n(l(t))\}, \quad (6)$$

$$\text{s.t.} \quad \limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{Q_P(\tau) + Q_C(\tau) + Q_N(\tau)\} < \infty, \quad (7)$$

$$\theta_c(t) + \theta_n(t) \le 1, \quad \forall t, \quad (8)$$

where $(\boldsymbol{\theta}_c, \boldsymbol{\theta}_n, \mathbf{r}_c, \mathbf{l}) \triangleq (\theta_c(t), \theta_n(t), r_c(t), l(t))_{t=0}^{\infty}$. The expectation in (6) is defined as the average value of the sum energy consumed in the CPU and network interface during time slot $t$. The constraint (7) means that the average PA, CA and NA queue lengths should be kept finite¹², i.e., all arrived workloads should be served within finite time [11].

B. Algorithm Design

We design the DREAM algorithm by invoking the Lyapunov drift-plus-penalty technique [11], which has the advantage that it requires no information about the distribution of workload arrivals or future network states, but only the current queue lengths and throughput.

Making a slot-by-slot objective. Our original objective in (P) is to minimize CPU and network energy consumption while stabilizing all task queues. First, we define the Lyapunov function and Lyapunov drift as follows:

$$L(t) \triangleq \frac{1}{2}\big[Q_P(t)^2 + Q_C(t)^2 + Q_N(t)^2\big], \quad (9)$$

$$\Delta(L(t)) \triangleq \mathbb{E}\{L(t+1) - L(t) \,|\, \mathbf{Q}(t)\}, \quad (10)$$

where $\mathbf{Q}(t) = \{Q_P(t), Q_C(t), Q_N(t)\}$. The Lyapunov function (9) is designed to fairly stabilize the PA, CA and NA queues, i.e., the three queue lengths should be maintained at comparable levels.

Next, we consider the long-term energy minimization problem and define the Lyapunov drift-plus-penalty function, where the penalty is the expected CPU and network energy consumption during time slot $t$:

$$\Delta(L(t)) + V\, \mathbb{E}\{E_c(r_c(t)) + E_n(l(t)) \,|\, \mathbf{Q}(t)\}, \quad (11)$$

where $V$ is an energy-delay tradeoff parameter. Our single objective is then to minimize (11) every time slot $t$.

Deriving an upper bound. We derive an upper bound using the queueing dynamics (2)-(4) and the bounds on workload arrivals (1), CPU speed and network speed assumed in Section III.

Lemma 1: Under any possible control variables $(\theta_c(t), \theta_n(t)) \in \{(0,0),(1,0),(0,1)\}$, $r_c(t) \in \{r_{c,1}, \ldots, r_{c,\max}\}$ and $l(t) \in B(t)$, we have:

$$\Delta(L(t)) + V\,\mathbb{E}\{E_c(r_c(t)) + E_n(l(t)) \,|\, \mathbf{Q}(t)\} \le J + V\,\mathbb{E}\{E_c(r_c(t)) + E_n(l(t)) \,|\, \mathbf{Q}(t)\}$$
$$- \mathbb{E}\big\{\big((1-\theta_c(t))\,r_c(t)/\gamma_P - A_P(t)\big)\, Q_P(t) \,|\, \mathbf{Q}(t)\big\}$$
$$- \mathbb{E}\big\{\big(\theta_c(t)\,r_c(t)/\gamma_C + \theta_n(t)\,r_n(l(t),t) - A_C(t)\big)\, Q_C(t) \,|\, \mathbf{Q}(t)\big\}$$
$$- \mathbb{E}\big\{\big((1-\theta_n(t))\,r_n(l(t),t) - A_N(t)\big)\, Q_N(t) \,|\, \mathbf{Q}(t)\big\}, \quad (12)$$

where $J = \frac{1}{2}\big(r_{c,\max}^2/\gamma_P^2 + r_{c,\max}^2/\gamma_C^2 + 2\, r_{n,\max}^2 + A_{P,\max}^2 + A_{C,\max}^2 + A_{N,\max}^2\big)$.

Proof: The proof is presented in Appendix A. ∎

Deriving the DREAM algorithm. We show that the slot-by-slot problem (11) has an optimal policy $\pi^*$ by the following Theorem 1.

Theorem 1: For any mean arrivals $\mathbb{E}\{A_P(t)\} = \lambda_P$, $\mathbb{E}\{A_C(t)\} = \lambda_C$ and $\mathbb{E}\{A_N(t)\} = \lambda_N$ within the capacity region, $(\lambda_P, \lambda_C, \lambda_N) \in \Lambda$, there exists a stationary randomized control policy $\pi^*$ which selects the task schedules $\theta_c(t), \theta_n(t)$, CPU speed $r_c(t)$ and network interface $l(t)$ every time slot $t$ satisfying the following:

$$\mathbb{E}\{A_P(t)\} = \mathbb{E}\Big\{\frac{(1-\theta_c(t)^{\pi^*})\, r_c(t)^{\pi^*}}{\gamma_P}\Big\}, \quad (13)$$

$$\mathbb{E}\{A_C(t)\} = \mathbb{E}\Big\{\frac{\theta_c(t)^{\pi^*}\, r_c(t)^{\pi^*}}{\gamma_C} + \theta_n(t)^{\pi^*}\, r_n(l(t)^{\pi^*}, t)\Big\}, \quad (14)$$

$$\mathbb{E}\{A_N(t)\} = \mathbb{E}\big\{(1-\theta_n(t)^{\pi^*})\, r_n(l(t)^{\pi^*}, t)\big\}, \quad (15)$$

$$\mathbb{E}\{E_c(r_c(t)^{\pi^*})\} + \mathbb{E}\{E_n(l(t)^{\pi^*})\} = \bar{E}(\lambda_P, \lambda_C, \lambda_N), \quad (16)$$

where $\bar{E}(\lambda_P, \lambda_C, \lambda_N)$ denotes the minimum average CPU and network energy required to serve $\lambda_P$, $\lambda_C$ and $\lambda_N$.

Proof: It can be proven similarly to [22] using Caratheodory's theorem [25]. ∎

Then, we develop the DREAM algorithm by finding the controls $(\theta_c(t), \theta_n(t), r_c(t), l(t))$ that minimize the right-hand side of (12) every time slot, i.e., the algorithm drives the right-hand side of (12) to the smallest value obtainable by any stationary randomized control policy.

¹¹ Note that the capacity region is the set of all workload arrivals that the system is able to process or transmit within finite time.
¹² In this formulation, we consider the delay tolerance of each task to be the same for simplicity. Several types of delay tolerance can easily be taken into account by modifying this constraint.
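To see how minimizing this bound yields the closed-form controls of Section IV-C below, note that for a fixed schedule $(\theta_c(t), \theta_n(t))$ only two terms of the right-hand side of (12) depend on the remaining controls, so the per-slot problem decouples (our restatement) into one CPU subproblem and one network subproblem:

$$\min_{r_c(t)} \Big[V E_c(r_c(t)) - \frac{r_c(t)}{\gamma}\, Q(t)\Big] \;+\; \min_{l(t) \in B(t)} \Big[V E_n(l(t)) - r_n(l(t), t)\, Q'(t)\Big],$$

where $(\gamma, Q) = (\gamma_P, Q_P)$ if PA holds the CPU and $(\gamma_C, Q_C)$ if CA does, giving (17) or (20), and $Q' = Q_N$ if NA holds the network and $Q_C$ if CA does, giving (18) or (19).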


C. Dynamic REsource and task Allocation for energy minimization in Mobile cloud systems (DREAM)

Our algorithm does not require knowledge of any statistics about future wireless network states (available networks or channel states) or workload arrivals. The DREAM algorithm jointly controls the task schedules on CPU and network resources, the CPU speed and the network interface selection, $(\theta_c^*(t), \theta_n^*(t), r_c^*(t), l^*(t))$, every time slot $t$ as follows.

DREAM Algorithm: At each time slot $t$:
1: if $Q_P(t)/\gamma_P > Q_C(t)/\gamma_C$ then
2:   if $Q_N(t) > Q_C(t)$ then
3:     Schedule PA and NA, $(\theta_c^*(t), \theta_n^*(t)) = (0, 0)$.
4:     $r_c^*(t) = \arg\min_{r_c(t)} \big\{V E_c(r_c(t)) - \frac{r_c(t)}{\gamma_P}\, Q_P(t)\big\}$.  (17)
5:     $l^*(t) = \arg\min_{l(t)} \big\{V E_n(l(t)) - r_n(l(t), t)\, Q_N(t)\big\}$.  (18)
6:   else if $Q_N(t) \le Q_C(t)$ then
7:     Schedule PA and CA, $(\theta_c^*(t), \theta_n^*(t)) = (0, 1)$, and select the CPU speed $r_c^*(t)$ by (17).
8:     $l^*(t) = \arg\min_{l(t)} \big\{V E_n(l(t)) - r_n(l(t), t)\, Q_C(t)\big\}$.  (19)
9: else if $Q_P(t)/\gamma_P \le Q_C(t)/\gamma_C$ then
10:   if $Q_N(t) > Q_C(t)$ then
11:     Schedule CA and NA, $(\theta_c^*(t), \theta_n^*(t)) = (1, 0)$.
12:     $r_c^*(t) = \arg\min_{r_c(t)} \big\{V E_c(r_c(t)) - \frac{r_c(t)}{\gamma_C}\, Q_C(t)\big\}$.  (20)
13:     Select the network interface $l^*(t)$ by (18).
14:   else if $Q_N(t) \le Q_C(t)$ then
15:     if $Y_c(t) + Y_n(t) \ge Z_c(t) + Z_n(t)$ then
16:       Schedule PA and CA, $(\theta_c^*(t), \theta_n^*(t)) = (0, 1)$.
17:       Select the CPU speed $r_c^*(t)$ by (17) and the network interface $l^*(t)$ by (19).
18:     else if $Y_c(t) + Y_n(t) < Z_c(t) + Z_n(t)$ then
19:       Schedule CA and NA, $(\theta_c^*(t), \theta_n^*(t)) = (1, 0)$.
20:       Select the CPU speed $r_c^*(t)$ by (20) and the network interface $l^*(t)$ by (18).
21:     end if
22:   end if
23: end if

where $Y_c(t)$, $Y_n(t)$, $Z_c(t)$ and $Z_n(t)$ are defined as follows:

$$Y_c(t) = \min_{r_c(t)} \big(V E_c(r_c(t)) - (r_c(t)/\gamma_C)\, Q_C(t)\big),$$
$$Y_n(t) = \min_{l(t)} \big(V E_n(l(t)) - r_n(l(t), t)\, Q_N(t)\big),$$
$$Z_c(t) = \min_{r_c(t)} \big(V E_c(r_c(t)) - (r_c(t)/\gamma_P)\, Q_P(t)\big),$$
$$Z_n(t) = \min_{l(t)} \big(V E_n(l(t)) - r_n(l(t), t)\, Q_C(t)\big).$$
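For concreteness, here is a compact executable sketch (ours, not the authors' implementation) of the per-slot DREAM decision. The caller supplies the DVFS set, the currently available interfaces $B(t)$, the rate estimates $r_n(l, t)$ and the energy models $E_c$ and $E_n$ of Section III:

# Per-slot DREAM decision (Section IV-C), transcribed from the pseudocode.
# Inputs: cpu_speeds = {r_c,1, ..., r_c,max} (cycles/slot); interfaces =
# B(t), a subset of {'L','W','N'}; rate(l) = r_n(l, t) in bits/slot;
# E_c, E_n = the energy models; V = energy-delay tradeoff parameter.

def argmin_cpu(V, E_c, cpu_speeds, Q, gamma):
    """Solve (17)/(20): min over r_c of V*E_c(r_c) - (r_c/gamma)*Q."""
    return min(cpu_speeds, key=lambda r: V * E_c(r) - (r / gamma) * Q)

def argmin_net(V, E_n, interfaces, rate, Q):
    """Solve (18)/(19): min over l of V*E_n(l) - r_n(l,t)*Q."""
    return min(interfaces, key=lambda l: V * E_n(l) - rate(l) * Q)

def dream_slot(V, Q_P, Q_C, Q_N, gamma_P, gamma_C,
               E_c, E_n, cpu_speeds, interfaces, rate):
    if Q_P / gamma_P > Q_C / gamma_C:
        if Q_N > Q_C:                      # lines 2-5: schedule (PA, NA)
            theta = (0, 0)
            r_c = argmin_cpu(V, E_c, cpu_speeds, Q_P, gamma_P)
            l = argmin_net(V, E_n, interfaces, rate, Q_N)
        else:                              # lines 6-8: schedule (PA, CA)
            theta = (0, 1)
            r_c = argmin_cpu(V, E_c, cpu_speeds, Q_P, gamma_P)
            l = argmin_net(V, E_n, interfaces, rate, Q_C)
    else:
        if Q_N > Q_C:                      # lines 10-13: schedule (CA, NA)
            theta = (1, 0)
            r_c = argmin_cpu(V, E_c, cpu_speeds, Q_C, gamma_C)
            l = argmin_net(V, E_n, interfaces, rate, Q_N)
        else:                              # lines 14-22: CA is most urgent;
            # compare (CA,NA) cost Yc+Yn against (PA,CA) cost Zc+Zn
            Yc = min(V * E_c(r) - (r / gamma_C) * Q_C for r in cpu_speeds)
            Yn = min(V * E_n(l) - rate(l) * Q_N for l in interfaces)
            Zc = min(V * E_c(r) - (r / gamma_P) * Q_P for r in cpu_speeds)
            Zn = min(V * E_n(l) - rate(l) * Q_C for l in interfaces)
            if Yc + Yn >= Zc + Zn:         # (PA, CA) is no worse
                theta = (0, 1)
                r_c = argmin_cpu(V, E_c, cpu_speeds, Q_P, gamma_P)
                l = argmin_net(V, E_n, interfaces, rate, Q_C)
            else:                          # (CA, NA)
                theta = (1, 0)
                r_c = argmin_cpu(V, E_c, cpu_speeds, Q_C, gamma_C)
                l = argmin_net(V, E_n, interfaces, rate, Q_N)
    return theta, r_c, l

Each call evaluates only (# of CPU clock speeds + # of network interfaces) candidates, which is why the per-slot overhead is negligible on a device with roughly a dozen DVFS levels and a handful of interfaces.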


The DREAM algorithm first schedules tasks on the CPU and network resources, and then determines the CPU clock speed and network interface. Task scheduling divides into two cases:

(Case I) $\{Q_P(t)/\gamma_P > Q_C(t)/\gamma_C,\, Q_N(t) > Q_C(t)\}$ or $\{Q_P(t)/\gamma_P > Q_C(t)/\gamma_C,\, Q_N(t) \le Q_C(t)\}$ or $\{Q_P(t)/\gamma_P \le Q_C(t)/\gamma_C,\, Q_N(t) > Q_C(t)\}$: The CA queue is not greater than both the PA and NA queues¹³. The CA queue shares CPU resources and networking resources, whereas PA exploits only CPU resources and NA only networking resources, as shown in Fig. 2. The DREAM algorithm considers fairness among tasks in terms of queue lengths; in this case, therefore, the two most urgent tasks in terms of queue lengths are scheduled on the CPU and network resources. For example, if the PA queue is greater than the CA queue and the CA queue is greater than the NA queue, PA is scheduled on CPU resources and CA on network resources.

(Case II) $\{Q_P(t)/\gamma_P \le Q_C(t)/\gamma_C,\, Q_N(t) \le Q_C(t)\}$: The CA queue is greater than both the PA and NA queues. Unlike Case I, CA must certainly be scheduled because it is the most urgent task in terms of queue lengths. However, because CA can exploit either CPU resources or networking resources, we still need to choose one of the two resource types for CA. Because our objective is to minimize the objective function (11) at time slot $t$, we compare $Y_c(t) + Y_n(t)$ and $Z_c(t) + Z_n(t)$, whose values come from the objective function under the (CA,NA)¹⁴ schedule and the (PA,CA) schedule, respectively. As a result, if $Y_c(t) + Y_n(t) \ge Z_c(t) + Z_n(t)$, we select the (PA,CA) allocation, and otherwise the (CA,NA) schedule.

After the task schedule, the CPU clock speed and network interface are selected depending on the scheduled tasks. For example, if PA is scheduled on CPU resources and NA on network resources, the CPU speed is determined by the simple optimization problem in (17). The first term of (17), weighted by $V$, strives to reduce CPU energy consumption, while the second term, without $V$, strives to reduce the PA queue. For the same $V$, if the queue of processing workloads is relatively long, i.e., the processing delay is long, the algorithm in (17) increases the CPU clock speed to reduce the queue length quickly. Similarly, the network interface is selected by the optimization problem in (18); the network controls mirror the CPU controls. The first term of (18) strives to reduce network energy consumption and the second term strives to reduce the NA queue. For the same $V$, if the queue of network traffic is relatively short, the algorithm in (18) selects the energy-efficient interface rather than the throughput-efficient one. The energy-delay tradeoff can thus be controlled by the single parameter $V$: as $V$ becomes larger, energy consumption gets lower at the cost of longer delay¹⁵.

The proposed DREAM algorithm outperforms existing policies in terms of both energy consumption and delay because


of the following facts: (i) the DREAM algorithm jointly takes into account cloud offloading and CPU/network speed scaling (i.e., it considers all possible combinations of cloud offloading policies, CPU speeds and network interfaces) together with the dynamics of task arrivals, whereas (ii) existing cloud offloading policies determine the offloading decision based only on a pre-determined CPU speed and network interface selection for fixed task arrivals.

¹³ We assume that the processing densities of PA and CA are the same ($\gamma_P = \gamma_C$) for ease of explanation.
¹⁴ (CA,NA): CA is assigned CPU resources, NA network resources.
¹⁵ We refer readers to [19], [26] for practical control of $V$; the authors of these works dynamically control $V$ depending on the instantaneous delay.

TABLE I E NERGY M ODEL PARAMETERS IN N ETWORK AND CPU

TABLE II P ROCESSING D ENSITIES OF VARIOUS A PPLICATIONS

D. Theoretical Analysis

The sum of the PA, CA and NA queue lengths and the sum of the average CPU and network energy consumption under the DREAM algorithm can be upper bounded by the following Theorem 2.

Theorem 2: Let $t \in \{0, 1, \ldots, T-1\}$. Suppose there exists $\epsilon > 0$ such that $(\lambda_P + \epsilon, \lambda_C + \epsilon, \lambda_N + \epsilon) \in \Lambda$, where $\Lambda$ denotes the capacity region¹⁶ of PA, CA and NA arrival rates. Then, under the DREAM algorithm, we have:

$$\limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\{Q_P(t) + Q_C(t) + Q_N(t)\} \le \frac{J + V E^*(\epsilon)}{\epsilon}, \quad (21)$$

$$\limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\{E_{c,D}(t) + E_{n,D}(t)\} \le E^*(\epsilon) + \frac{J}{V}, \quad (22)$$

where $E_{c,D}(t)$ and $E_{n,D}(t)$ denote the CPU and network energy consumption over time slot $t$ when the proposed DREAM algorithm is adopted, and $E^*(\epsilon)$ denotes the optimal lower bound of the sum of CPU and network energy consumption.

Proof: The proof is presented in Appendix B. ∎

¹⁶ The set of PA, CA and NA arrival rates which the mobile device can transmit or process within finite time.
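A direct reading of these bounds (our restatement of the standard drift-plus-penalty tradeoff): to come within a target gap $\delta$ of the optimal energy, one can set $V = J/\delta$, at the price of an average backlog bound that grows linearly in $V$:

$$\bar{E}_D \le E^*(\epsilon) + \frac{J}{V} = E^*(\epsilon) + \delta, \qquad \bar{Q} \le \frac{J + V E^*(\epsilon)}{\epsilon} = O(V),$$

where $\bar{E}_D$ and $\bar{Q}$ denote the left-hand sides of (22) and (21), respectively.

Equations (21) and (22) can thus be read as an energy-delay tradeoff. As the tradeoff parameter $V$ becomes smaller, the sum of average queue lengths gets shorter whereas the average energy consumption gets higher; conversely, as $V$ becomes larger, the average energy consumption gets lower whereas the average queue lengths get longer. In practice, however, the cellular interface consumes tail energy for a few seconds after transferring data whereas the WiFi interface does not. Therefore, in the following simulation and experiment sections we demonstrate not only the performance of the proposed DREAM algorithm under a real energy model including tail energy, but also the impact of various parameters on the tail energy.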

V. TRACE AND DATASET DRIVEN SIMULATION

A. Measurement, Traces and Datasets

Real energy measurement. To obtain realistic parameters for the CPU and network energy models in Section III, we measure the energy consumption of four smartphones (two LTE and two 3G smartphones). We measure the energy consumption of the CPU at various clock speeds¹⁷ and of the network interfaces, namely 3G, LTE and WiFi, using a Monsoon power monitor [27], which is widely used for energy measurements of mobile devices [7], [18], [22]. Because the power monitor can only measure the total power consumption of a mobile device, we disable other components such as GPS and the display (and the network, in the CPU measurements) when measuring the energy consumption of the CPU or network interfaces. In the CPU measurement, we keep the CPU at 100% utilization by injecting infinite-loop computing workloads. We then fit the measurements to the CPU energy function in (5) and summarize the parameters in Table I. We also summarize the transmit energy, tail energy and tail time of 3G, LTE and WiFi in Table I.

Trace 1 (Throughput and WiFi availability). We develop a measurement application for smartphones to acquire network throughputs and WiFi availability. This application transfers 2 MB files to our server every 20 seconds and records WiFi availability; we record only WiFi APs through which users can transmit data to our server. The server calculates uplink throughputs by dividing the file size by the elapsed time. We moved around metropolitan areas for two weeks carrying three smartphones (3G-only, LTE-only and WiFi-only enabled) and obtained network throughput and WiFi availability traces. The average uplink throughputs of 3G, LTE and WiFi are 0.76 Mbps, 5.85 Mbps and 3.01 Mbps, respectively, and the average WiFi temporal coverage is 63% during active hours (9:00 AM to 9:00 PM).

Dataset 1 (File size of workloads and processing densities). To generate the workload/traffic arrivals of each task, we use a YouTube video size distribution dataset [28], where the average file size is 6.249 MB. We create the workload/traffic arrivals by generating Bernoulli arrivals with sizes drawn from the YouTube dataset. We also calculate the processing densities of various computing applications in Table II, which are used in several studies [4]–[6], [22].

¹⁷ To manually control the CPU clock frequencies, the smartphone OS attains privileged control (i.e., root access) within the Android subsystem.

B. Simulation Setup We consider two scenarios: multi-type, where three types of tasks (PA, CA and NA) are simultaneously running on a smartphone, and single-type, where a single type of task (CA or NA) is running. We set the average arrival rates of PA, CA and NA as


TABLE III C HARACTERISTICS OF E VALUATION A LGORITHMS

1 Mbps or 0.5 Mbps and assume that files arrive as a Bernoulli process. The minimum offloading unit for offloadable workloads is set to 100 KB. We run simulations under several arrival rate and file size distribution scenarios based on Dataset 1. We use the Galaxy Note for LTE and the Galaxy Nexus for 3G in Table I, and assume that both smartphones can use WiFi networks under WiFi coverage. The control intervals are set to one second for task scheduling ($\theta_c(t), \theta_n(t)$) and CPU clock speed selection ($r_c(t)$), and 20 seconds for network interface selection ($l(t)$)¹⁸. This setting is reasonable because the associated network cannot change quickly due to the vertical handover delay between the WiFi and cellular interfaces. We assume that the smartphone knows the exact uplink throughput of the current time slot in this simulation; we refer readers to [19] for an example of rate estimation, and a practical method to estimate the current uplink throughput is addressed in our experiment in Section VI. We run the simulation of the DREAM algorithm by taking into account the networking energy consumed only in the current time slot, including tail energy.

We consider various metrics to analyze the performance of the DREAM algorithm: average energy consumption and delay in processing or transferring one file, delay fairness among different types of tasks, and the proportions of energy consumption. As a metric of delay fairness, we use Jain's fairness index [30], defined as

$$J(a_1, a_2, \ldots, a_n) = \frac{\big(\sum_{i=1}^{n} a_i\big)^2}{n \cdot \sum_{i=1}^{n} a_i^2}. \quad (23)$$

¹⁸ Continuously scanning for WiFi networks would itself consume a significant amount of energy [19], [29]; with this 20-second interval for network interface selection, we can ignore the WiFi scanning energy.
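For reference, eq. (23) is a one-liner; it returns 1 for perfectly equal values and approaches $1/n$ in the most unfair case (example values below are illustrative, not from the paper):

# Jain's fairness index, eq. (23), applied here to per-task-type delays.

def jain_index(values):
    n = len(values)
    return sum(values) ** 2 / (n * sum(v * v for v in values))

# Near-equal delays of the three task types score close to 1:
print(jain_index([9.8, 10.1, 10.0]))   # ~0.9998
print(jain_index([1.0, 1.0, 28.0]))    # ~0.38 (one task starved)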

Fig. 3. Comparison with existing algorithms in the multi-type case: avg. arrival rates (PA,CA,NA) = 0.5 Mbps, LTE+WiFi, processing densities (PA,CA) = 1000 cycles/bit, tradeoff parameter V = 0 ∼ 105 .

We compare existing mobile cloud offloading and mobile cloud backup algorithms [4], [6], [13], [17] with the DREAM algorithm. The characteristics of DREAM and the existing algorithms are summarized in Table III¹⁹. The energy and delay constraints in Table III indicate that an algorithm offloads workloads only when the constraints are satisfied. All algorithms except DREAM select the CPU speed using a conventional DVFS scheme [1]²⁰, with the default DVFS threshold set to 10 MB. For MAUI, network resources are selected only when all current workloads can be served within a delay constraint. For ThinkAir (one of the four policies in [6]), network resources are selected only when both the delay and the energy of network resources are less than those of CPU resources. OAEP selects whichever resource, CPU or network, has the greater energy efficiency (speed/energy). CDroid transfers the accumulated traffic after each backup duration. Because the DREAM algorithm only compares (# of CPU clock speeds + # of network interfaces) values every time slot²¹, the overhead on a mobile device (e.g., energy consumption) from this computation is negligible. We run simulations during activity time (6:00 AM to 9:00 PM) every day from Trace 1 and take the average of the results.

C. Simulation Results

We present our results by summarizing the key observations.

1) Comparison with existing algorithms. We compare existing schemes with the DREAM algorithm under both multi-type and single-type scenarios. Fig. 3 depicts the energy-delay tradeoff for DREAM and the existing algorithms in the LTE+WiFi (i.e., an LTE smartphone) multi-type scenario²². We find that the average energy saving of DREAM is 50% (60% for PA, 40% for CA, 40% for NA) at the cost of only 10 minutes of delay. The energy saving comes from the facts that (a) CPU energy is a superlinearly increasing function of CPU clock speed, so smoothing the CPU speed over time slots saves CPU energy, and (b) wireless channel conditions and network availability are time-varying, so waiting for better channel conditions or energy-efficient networks saves network energy.

¹⁹ We denote by OAEP the optimal task execution policy in [13].
²⁰ In this scheme, the CPU speed is set to the maximum when the amount of workloads is greater than a threshold, and proportional to the workloads when it is below the threshold, which can be manually controlled.
²¹ Indeed, a typical smartphone has at most 4-5 network interfaces (e.g., WiFi, LTE, 3G, Bluetooth, etc.) and 11 discrete levels of CPU clock speed [22].
²² For ThinkAir and OAEP, we run the simulation for different DVFS thresholds (the ranges of threshold values are shown in Fig. 3).


Fig. 4. Impact of dynamic resource selection on the energy-delay tradeoff in the single-type case (CA): avg. arrival rate CA=1Mbps, LTE+WiFi, V = 0 ∼ 105 .

Fig. 5. Impact of processing density on energy and delay performance, LTE+WiFi.

However, because these mechanisms make the system less sensitive to queue variations, the average delay becomes longer. DREAM outperforms the other algorithms in terms of total energy saving, where the additional energy savings of DREAM compared to the existing algorithms are 35% (at 9 min. delay) for MAUI, 46% (39 min. delay) for ThinkAir, 35% (29 min. delay) for OAEP, and 40% (12 min. delay) for CDroid. The reason DREAM outperforms the other algorithms is that it jointly optimizes the task schedules, CPU clock speed and network interface selection considering the time-varying wireless channel states, network availability and all types of workload arrivals. Moreover, because DREAM tries to fairly stabilize the three task queues, delays are more fairly distributed across task types in DREAM than in the existing algorithms, as shown in Fig. 3(b). Even in the single-type scenario, DREAM outperforms the other algorithms in terms of energy and delay, as shown in Fig. 6; this is mainly due to fact (b) above. We obtain similar results in the 3G+WiFi case and omit them for brevity.

2) Impact of dynamic resource selection. To analyze the impact of dynamic resource selection on performance, we consider two additional policies: (i) CA only selects CPU resources (i.e., $\theta_n(t) = 0, \forall t$), and (ii) CA only selects networking resources (i.e., $\theta_c(t) = 0, \forall t$); the other controls are the same as in DREAM. As shown in Fig. 4, DREAM outperforms the two modified policies for various processing densities in terms of energy and delay. This is due to the fact that in DREAM, CA workloads are dynamically served by either CPU or network

resource. It can be also seen in Fig. 5. As the processing density increases, the energy consumption of CPU part gets higher. Therefore, CA would more frequently select network resources than CPU resources because of energy-efficient selection, thus the CA consumes higher networking energy than CPU energy. The reason why the performance of DREAM with only network crosses over that of DREAM with only CPU for higher processing density is that higher processing density makes the smartphone consume higher CPU energy in processing unit bit whereas network energy does not be influenced by the processing density. For the same reason, the network energy even becomes lower than CPU energy for higher processing density in Fig. 5(b). We also observe that the performance of DREAM with only network crosses over that of DREAM with only CPU for higher processing density in 3G + WiFi case due to the higher network energy consumption per bit. 3) Impact of processing density. We verify an impact of processing density in multi-type and single-type scenarios (only CA is running) in Fig. 5. The description of Fig. 5(a) implies that as processing density is higher, PA and CA consumes higher CPU energy in processing same amount of workloads, thus the total energy and delay increase for all energy-delay tradeoff parameters. 4) Impact of file size and tradeoff parameter. Fig. 7(a) depicts the impact of arrival file size on the consumed proportions of energy types under the same arrival rate in the 3G + WiFi. As the normalized average file size becomes larger and the inter-arrival time becomes shorter, the proportion of network energy gets larger due to the increment of tail energy.


Fig. 6. Comparison with existing algorithms in the single-type case (CA/NA): avg. arrival rate (CA,NA) = 0.5 Mbps, LTE+WiFi, processing density (CA) = 1000 cycles/bit, tradeoff parameter V = 0 ∼ 105 .


Fig. 8. Impact of WiFi temporal coverage on average energy consumption in the multi-type case: avg. arrival rate (PA,CA,NA) = 0.5 Mbps, processing density (PA,CA) = 1000 cycles/bit, 3G + WiFi, V = 0.

Fig. 7. Impact of file size and the tradeoff parameter on energy proportion in the multi-type case: avg. arrival file size (PA,CA,NA)= 6.249 MB when normalized avg. file size (S) is 1, avg. arrival rate (PA,CA,NA) = 0.5 Mbps, 3G + WiFi.

However, DREAM reduces the portion of tail energy by trading delay (i.e., using a larger $V$), as shown in Fig. 7(b), and by reducing the number of tails through file bundling. We obtain similar simulation results in the LTE+WiFi case and omit them for brevity.

5) Impact of WiFi temporal coverage. Fig. 8 demonstrates the impact of WiFi temporal coverage on energy consumption in the 3G+WiFi case. (i) As the WiFi temporal coverage increases, NA consumes less network energy (3G tx. + 3G tail + WiFi tx.), because the WiFi interface is more energy efficient than the 3G interface in transferring the same amount of workloads. (ii) As the WiFi temporal coverage increases, CA consumes more network energy relative to CPU energy. This implies that CA offloads computation workloads to the cloud server more frequently, because transferring the same amount of computation workloads through the WiFi interface is more energy efficient than computing on the local CPU. Because the transfer energy per bit of LTE is much lower than that of 3G, the impact of WiFi temporal coverage is smaller in the LTE+WiFi case than in the 3G+WiFi case. Note that the tail energy can vary with WiFi coverage or simulation traces because our algorithm does not optimize tail energy consumption.

VI. IMPLEMENTATION AND EXPERIMENT

In this section, we design and implement the DREAM architecture on an Android platform and analyze the performance of DREAM through real experiments using a Galaxy Note 2 smartphone on the Windows Azure cloud system.

Fig. 9. DREAM Architecture. TABLE IV E XPERIMENTAL PARAMETERS

A. Implementation of the DREAM Architecture

We design the DREAM architecture, which enables the proposed DREAM algorithm to run on a real smartphone in a mobile cloud system, as shown in Fig. 9. The DREAM architecture consists of four major components: the task manager (task classifier and task queues), the CPU manager (CPU module), the network manager (WiFi/cellular modules, tail-time monitor and rate/connectivity monitor) and the DREAM algorithm itself. The task manager classifies tasks into PA, CA and NA, and manages their queues. The CPU manager adjusts the CPU clock speed. The network manager measures rate and connectivity information and switches the wireless connection between cellular and WiFi. Data rates are estimated by weighted averaging of throughputs


Fig. 10. Comparison of experimental traces: queue lengths for all apps., network selection, CPU clock and measured power using a power monitor.

from the past history of transmissions. After receiving the input parameters (e.g., processing density, queue information and rate estimates) from the other components, the DREAM algorithm computes a solution and passes it to the other components.

We implement the DREAM architecture on an Android platform. We use a Galaxy Note 2 (Android 4.4.2 and Linux kernel 3.0.31) with a 1.6 GHz quad-core CPU and 2 GB RAM. The smartphone attains root access to manually control the CPU clock frequency. We first implement the network manager (e.g., the wireless connection selector and the tail-time and rate/connectivity monitors), and then integrate it with the other components, including the DREAM algorithm, the task manager and the CPU manager, using the Android software development kit (SDK).

Fig. 11. Experimental results.
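The paper does not spell out the estimator; one plausible reading of "weighted average throughputs from past history" is an exponentially weighted moving average kept per interface, as in this sketch of ours (the smoothing factor and the fallback value are assumptions, not from the paper):

# Hypothetical rate estimator for the network manager: per-interface
# EWMA over recently observed transmission throughputs.

class RateEstimator:
    def __init__(self, smoothing=0.3, initial_bps=1e6):
        self.smoothing = smoothing     # weight of the newest sample (assumed)
        self.initial_bps = initial_bps # fallback before any observation
        self.est = {}                  # interface -> estimated bits/sec

    def update(self, interface, bytes_sent, elapsed_sec):
        """Fold one finished transmission into the running estimate."""
        sample = 8 * bytes_sent / elapsed_sec
        prev = self.est.get(interface, sample)
        self.est[interface] = (1 - self.smoothing) * prev \
                              + self.smoothing * sample

    def rate(self, interface):
        """Estimated r_n(l, t) handed to the DREAM algorithm."""
        return self.est.get(interface, self.initial_bps)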

B. Real Experiment

In order to show how much energy and delay gain is practically achieved compared to the highest-energy-consumption case, we run a real experiment using an Android smartphone (Galaxy Note 2) that adopts our DREAM architecture.

Setup. We install the DREAM architecture on an LTE-equipped Galaxy Note 2 smartphone and implement a virtual cloud server on the Windows Azure cloud system [3]. We connect the smartphone to a private WiFi AP, and the WiFi interface turns on and off periodically (e.g., every 10 min.) to capture intermittent availability. The smartphone can transfer the computation-offloadable workloads or network traffic to the cloud server. We use the CPU and network energy models in Table I. Because the DREAM algorithm is designed for a single-core system, we activate only one CPU core in our experiments. The experimental parameters are summarized in Table IV. We measure the power consumption using a Monsoon power monitor. For all experiments, the display is turned off. Through the experiments, we analyze the performance gains of the DREAM algorithm by comparing with a baseline algorithm: (i) the CPU alternately schedules PA and CA in round-robin and the network schedules NA without delaying transmissions, (ii) the CPU speed is always

set to the maximum (1.6 GHz), and (iii) WiFi is selected whenever a WiFi network is available, and otherwise cellular (LTE) is selected. We run each experiment 10 times and take the average.

Observations. We present our results by summarizing the key observations. Fig. 10 illustrates the traces of queue lengths, network selection, CPU clock and power measurements in four cases (baseline and DREAM with $V = 0$, $3 \times 10^3$, $10^4$) for one hour, selecting traces with similar throughputs. (i) The DREAM algorithm with any $V$ value ($V = 0$, $3 \times 10^3$, $10^4$) outperforms the baseline algorithm in terms of average queue lengths, queue stability and fairness among different types of tasks. This is due to the fact that DREAM tries not only to fairly reduce queue lengths but also to fully exploit the variations of network conditions and workload arrivals. (ii) As $V$ becomes larger, average queue lengths increase whereas the CPU clock speed gets lower and no wireless interface ($N$) is selected more frequently. As a result, the measured power consumption becomes lower for larger $V$. These results demonstrate that our DREAM implementation on an Android platform operates properly with the commercial cloud server, which validates our design.

Fig. 11 depicts the experimental results, including power meter traces, average power and queue lengths of all tasks for


the baseline and DREAM algorithms with $V = 0$, $V = 3 \times 10^3$ and $V = 10^4$. Fig. 11(a) re-plots the power consumption traces in Fig. 10. (i) DREAM with $V = 3 \times 10^3$ significantly reduces the average energy consumption (by 24.4%) and delay (by 83%) compared to the baseline algorithm. This is because the task schedules, CPU speed control and network interface selection are adaptively adjusted by taking into account the dynamics of workload arrivals and network environments. (ii) As the energy-delay tradeoff parameter $V$ becomes larger, more energy can be saved. In particular, we observe a good energy-delay tradeoff point where the energy consumption is drastically reduced (by 27%) with a small increase in delay (about 20 sec.) at $V = 3 \times 10^3$. This is for the same reason as simulation result 1) in Section V.

As a way to reduce high CPU and network energy consumption of contemporary smartphones, we studied energy efficient mobile cloud offloading policy and CPU/network speed scaling in mobile cloud systems. To tackle the complex resource interference problem due to the resource sharing in dynamic mobile cloud offloading, we proposed a joint CPU and network resource and task allocation, called DREAM, considering all types of tasks (processing, cloud-offloadable, and networking tasks). Through trace-driven simulation, we demonstrated that DREAM not only has high energy saving by trading small delay but also fairly schedules different types of tasks in terms of delay. We also demonstrated the applicability of DREAM in practice by implementing it on an Android platform. We believe that the joint optimization of cloud offloading and CPU/network speed scaling would be imperative, especially in terms of mobile energy saving, as mobile cloud computing gets prevalent and wireless networks become more heterogeneous in the future; moreover, the optimization can be an effective way toward making green network infrastructure by offloading the cloud-offloadable workloads and network traffic from energy-inefficient network to energy-efficient network in time and space. A PPENDIX A. Proof of Lemma 1. Proof: Let us consider queueing dynamics of PA in (2). By taking square on (2) and using the fact that ([X ]+ )2 ≤ X 2 , we have:

$$Q_P^2(t+1) = \left(\left[Q_P(t) - \frac{(1-\theta_c(t))r_c(t)}{\gamma_P}\right]^+ + A_P(t)\right)^2$$
$$\le Q_P^2(t) - 2\left(\frac{(1-\theta_c(t))r_c(t)}{\gamma_P} - A_P(t)\right)Q_P(t) + \left(\frac{(1-\theta_c(t))r_c(t)}{\gamma_P}\right)^2 + A_P^2(t)$$
$$\le Q_P^2(t) - 2\left(\frac{(1-\theta_c(t))r_c(t)}{\gamma_P} - A_P(t)\right)Q_P(t) + \frac{r_{c,\max}^2}{\gamma_P^2} + A_{P,\max}^2, \tag{24}$$

where the second inequality of (24) follows from the bounds on the arrival rate of PA and the CPU speed. Then, by rearranging (24), we have:

$$Q_P^2(t+1) - Q_P^2(t) \le -2\left(\frac{(1-\theta_c(t))r_c(t)}{\gamma_P} - A_P(t)\right)Q_P(t) + \frac{r_{c,\max}^2}{\gamma_P^2} + A_{P,\max}^2. \tag{25}$$

Similarly, by repeating the argument for CA and NA, we obtain:

$$Q_C^2(t+1) - Q_C^2(t) \le -2\left(\frac{\theta_c(t)r_c(t)}{\gamma_C} + \theta_n(t)r_n(l(t),t) - A_C(t)\right)Q_C(t) + \frac{r_{c,\max}^2}{\gamma_C^2} + r_{n,\max}^2 + A_{C,\max}^2, \tag{26}$$

$$Q_N^2(t+1) - Q_N^2(t) \le -2\left((1-\theta_n(t))r_n(l(t),t) - A_N(t)\right)Q_N(t) + r_{n,\max}^2 + A_{N,\max}^2. \tag{27}$$

By summing (25)–(27) and adding the CPU and network energy consumption at time slot $t$, we obtain the following upper bound on the Lyapunov drift-plus-penalty function:

$$\Delta(L(t)) + V\,\mathbb{E}\{E_c(r_c(t)) + E_n(l(t),t)\,|\,\mathbf{Q}(t)\} \le J + V\,\mathbb{E}\{E_c(r_c(t)) + E_n(l(t),t)\,|\,\mathbf{Q}(t)\}$$
$$- \mathbb{E}\left\{\left(\frac{(1-\theta_c(t))r_c(t)}{\gamma_P} - A_P(t)\right)Q_P(t)\,\Big|\,\mathbf{Q}(t)\right\}$$
$$- \mathbb{E}\left\{\left(\frac{\theta_c(t)r_c(t)}{\gamma_C} + \theta_n(t)r_n(l(t),t) - A_C(t)\right)Q_C(t)\,\Big|\,\mathbf{Q}(t)\right\}$$
$$- \mathbb{E}\left\{\left((1-\theta_n(t))r_n(l(t),t) - A_N(t)\right)Q_N(t)\,\Big|\,\mathbf{Q}(t)\right\}, \tag{28}$$

where
$$J = \frac{1}{2}\left(\frac{r_{c,\max}^2}{\gamma_P^2} + \frac{r_{c,\max}^2}{\gamma_C^2} + 2r_{n,\max}^2 + A_{P,\max}^2 + A_{C,\max}^2 + A_{N,\max}^2\right).$$
From $A_P(t) \ge 0$, $A_C(t) \ge 0$, and $A_N(t) \ge 0$, this completes the proof of Lemma 1. $\blacksquare$
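As a quick numerical check of the per-queue inequality behind (24)–(27) (our addition, not part of the paper), the following sketch verifies that for the generic dynamics $Q(t+1) = [Q(t) - b]^+ + a$ with $a, b \ge 0$, the drift bound $Q^2(t+1) - Q^2(t) \le -2(b-a)Q(t) + a^2 + b^2$ holds on random samples:

```python
import random

# Numerical spot-check of the per-queue drift bound used in Lemma 1:
# for Q(t+1) = max(Q - b, 0) + a with a, b, Q >= 0,
#   Q(t+1)^2 - Q^2 <= -2*(b - a)*Q + a**2 + b**2.
# Pure sanity check with random inputs; not part of the paper.

random.seed(1)
for _ in range(100_000):
    q = random.uniform(0, 100)   # current queue length
    b = random.uniform(0, 10)    # service offered this slot
    a = random.uniform(0, 10)    # arrivals this slot
    q_next = max(q - b, 0.0) + a
    drift = q_next ** 2 - q ** 2
    bound = -2 * (b - a) * q + a ** 2 + b ** 2
    assert drift <= bound + 1e-9, (q, a, b)
print("drift bound held on all samples")
```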



E{E c (rc (t)π )} + E{E n (l(t)π )} = E ∗ ( ),   ∗ ∗ (1 − θc (t)π )rc (t)π − , E{A P (t)} = E γP  ∗ ∗ θc (t)π rc (t)π E{AC (t)} = E γC  ∗



+ θn (t)π rn (l(t)π , t) − , ∗



E{A N (t)} = E{(1 − θn (t)π )rn (l(t)π , t)} − ,

(29) (30)

(31) (32)

where E ∗ ( ) = E c (λ P + φλC + (1 + φ) ) + E n ((1 − φ)λC + λ N + (2 − φ) ) is the optimal average sum of CPU and network energy to serve (1 − φ)λC + λ N + (2 − φ) amount of

2522

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 33, NO. 12, DECEMBER 2015

arrival rate where φ denotes average CPU service rate portion of cloud-offloadable workloads (CA). The sum of processing rate of CPU part is λ P + + φ(λC + ) and the sum of networking rate of network part is λ N + + (1 − φ)(λC + ). By applying the above equations (29)–(32) in Lemma 1 and arranging, we have:
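As a quick arithmetic check (our addition), expanding these CPU-side and network-side service requirements recovers exactly the two arguments of $E^*(\epsilon)$:

$$\begin{aligned}
\underbrace{(\lambda_P + \epsilon)}_{\text{PA}} + \underbrace{\phi(\lambda_C + \epsilon)}_{\text{CA on CPU}} &= \lambda_P + \phi\lambda_C + (1+\phi)\epsilon,\\
\underbrace{(\lambda_N + \epsilon)}_{\text{NA}} + \underbrace{(1-\phi)(\lambda_C + \epsilon)}_{\text{CA offloaded}} &= (1-\phi)\lambda_C + \lambda_N + (2-\phi)\epsilon.
\end{aligned}$$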

By applying equations (29)–(32) in Lemma 1 and rearranging, we have:

$$\Delta(L(t)) + V\,\mathbb{E}\{E_{c,D}(t) + E_{n,D}(t)\,|\,\mathbf{Q}(t)\} \le J + V E^*(\epsilon) - \epsilon\,\mathbb{E}\{Q_P(t) + Q_C(t) + Q_N(t)\,|\,\mathbf{Q}(t)\}. \tag{33}$$

By taking expectations on (33) and using the law of iterated expectations, $\mathbb{E}\{Y\} = \mathbb{E}\{\mathbb{E}\{Y|X\}\}$, we have:

$$\mathbb{E}\{L(t+1) - L(t)\} + \epsilon\,\mathbb{E}\{Q_P(t) + Q_C(t) + Q_N(t)\} \le J + V E^*(\epsilon) - V E_D(t), \tag{34}$$

where $E_D(t) = \mathbb{E}\{E_{c,D}(t) + E_{n,D}(t)\}$. By dividing both sides of (34) by $\epsilon$ and summing over $t \in \{0, 1, \ldots, T-1\}$, we have:

$$\frac{\mathbb{E}\{L(T) - L(0)\}}{\epsilon} + \sum_{t=0}^{T-1}\mathbb{E}\{Q_P(t) + Q_C(t) + Q_N(t)\} \le \frac{JT + V E^*(\epsilon)T - V\sum_{t=0}^{T-1}E_D(t)}{\epsilon}. \tag{35}$$

By dividing both sides of (35) by $T$, using the fact that $L(t) \ge 0$, and taking the limit as $T \to \infty$, this completes the proof of the queue bound (21).

Next, we prove the average energy bound (22). By applying $\epsilon\,\mathbb{E}\{Q_P(t) + Q_C(t) + Q_N(t)\,|\,\mathbf{Q}(t)\} \ge 0$ to (33), we obtain the following inequality:

$$\Delta(L(t)) + V\,\mathbb{E}\{E_{c,D}(t) + E_{n,D}(t)\} \le J + V E^*(\epsilon). \tag{36}$$

By summing (36) over $t \in \{0, 1, \ldots, T-1\}$, using the fact that $L(t) \ge 0$, and dividing by $VT$, this completes the proof of the energy bound (22). $\blacksquare$
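To illustrate the energy-backlog tradeoff that Theorem 2 formalizes (energy gap shrinking like $O(1/V)$ at the cost of $O(V)$ backlog), the following is a minimal single-queue simulation sketch; the speed set, arrival process and power curve are our own toy choices, not the paper's experimental setup:

```python
import random

# Toy single-queue illustration (not the paper's setup) of the Theorem 2
# tradeoff: larger V -> lower time-average energy, larger backlog.

def power(s):
    """Convex (cubic) toy power curve."""
    return s ** 3

def simulate(V, T=50_000, seed=0):
    rng = random.Random(seed)
    speeds = (0.0, 1.0, 2.0, 3.0)    # available service rates per slot
    q = e_sum = q_sum = 0.0
    for _ in range(T):
        arrivals = rng.uniform(0, 2)  # mean rate 1 < max speed (stable)
        # Drift-plus-penalty rule: minimize V*power(s) - Q(t)*s over speeds.
        s = min(speeds, key=lambda x: V * power(x) - q * x)
        q = max(q - s, 0.0) + arrivals
        e_sum += power(s)
        q_sum += q
    return e_sum / T, q_sum / T

for V in (0.1, 1.0, 10.0, 100.0):
    avg_e, avg_q = simulate(V)
    print(f"V={V:>6}: avg energy={avg_e:.3f}, avg queue={avg_q:.1f}")
```

Sweeping $V$ in this sketch shows the average energy approaching its minimum while the average queue grows with $V$, mirroring bounds (21) and (22).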



Jeongho Kwak (S’11–M’15) received the B.S. degree (summa cum laude) in electrical and computer engineering from Ajou University, Suwon, South Korea, and the M.S. and Ph.D. degrees in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2008, 2011, and 2015, respectively. He joined the INRS-EMT, Montréal, QC, Canada, and the Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC, Canada, in October 2015, where he is currently a Postdoctoral Researcher. Previously, he was a Postdoctoral Researcher at KAIST in 2015. His research interests include mobile cloud offloading systems, energy efficiency in mobile systems, green cellular networks, and radio resource management in wireless networks. He was the recipient of the Samsung HumanTech Thesis Prizes in 2013, 2014, and 2015 (Bronze, Silver, and Gold Prizes in the Communication and Networks Area, respectively), the KAIST-LG Electronics 5G Best Paper Award in 2014, and the Qualcomm Innovation Award in 2015.

Yeongjin Kim (S’12) received the B.S. and M.S. degrees in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2011 and 2013, respectively. He is currently pursuing the Ph.D. degree at KAIST. His research interests include mobile opportunistic networks, collaborative sensing, and mobile cloud computing.

Joohyun Lee (S’11–M’14) received the B.S. and integrated M.S./Ph.D. degrees in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2008 and 2014, respectively. He is currently a Postdoctoral Researcher with the Department of Electrical and Computer Engineering, Ohio State University, Columbus, OH, USA. His research interests include context-aware networking and computing, mobility-driven cellular traffic offloading, energy-efficient mobile networking, protocol design and analysis for delay-tolerant networks, and network economics and pricing.


Song Chong (S’93–M’95) received the B.S. and M.S. degrees from Seoul National University and the Ph.D. degree from the University of Texas at Austin, Austin, TX, USA, all in electrical engineering. He is a Professor with the School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, and is the Head of the Computing, Networking, and Security Group, School of Electrical Engineering. He is the Founding Director of the KAIST-LGE 5G Mobile Communications and Networking Research Center funded by LG Electronics. Prior to joining KAIST, he was with the Performance Analysis Department, AT&T Bell Laboratories, Holmdel, NJ, USA, as a Member of Technical Staff. His research interests include wireless networks, mobile networks and systems, network data analytics, distributed algorithms, and cross-layer control and optimization. He is an Editor of the IEEE/ACM Transactions on Networking, the IEEE Transactions on Mobile Computing, and the IEEE Transactions on Wireless Communications. He was the Technical Program Committee Co-Chair of the IEEE SECON 2015 and has served on the Technical Program Committee of a number of leading international conferences including the IEEE INFOCOM, the ACM MobiCom, the ACM CoNEXT, the ACM MobiHoc, and the IEEE ICNP. He serves on the Steering Committee of WiOpt and was the General Chair of WiOpt 2009. He was the recipient of the IEEE William R. Bennett Prize in 2013, given to the best original paper published in the IEEE/ACM Transactions on Networking in 2011–2013, and the IEEE SECON Best Paper Award in 2013.