An Interposed 2-Level I/O Scheduling Framework for Performance Virtualization ∗

Jianyong Zhang∗ Anand Sivasubramaniam∗ Alma Riska† Qian Wang∗ Erik Riedel†
∗ The Pennsylvania State University, University Park, PA
† Seagate Research Center, Pittsburgh, PA
{jzhang,anand}@cse.psu.edu, [email protected], {alma.riska,erik.riedel}@seagate.com

Categories and Subject Descriptors: D.4.2 [Operating Systems]: Storage Management; D.4.8 [Operating Systems]: Performance: Operational analysis, Monitors, Measurements.

Our approach achieves virtualization by introducing a layer on top of the storage utility that uses very little information about its underlying implementation. It treats the storage utility as a black box and is referred to as an "interposed" [1, 3, 4, 5] scheduler between the user requests and the underlying storage utility, as depicted in Figure 1(a). This additional layer acts as a QoS gateway between the user requests and the storage utility, and it can be located in a gateway that already exists in a large storage system for traffic management; examples include the Logical Volume Manager and the SAN virtualization switch [1]. In our framework, the higher level mechanism, a credit-based rate controller called SARC, regulates the streams of requests so that they are insulated from each other. In addition, it obtains from the lower level mechanism (called AVATAR) an estimate of whether the utility is under-utilized (its spareness status) and tries to distribute this spare bandwidth in a reasonably fair manner across the classes. The lower level scheduler AVATAR uses feedback from monitoring the underlying storage utility to provide latency guarantees by regulating the number of requests dispatched to the storage utility. AVATAR uses an EDF queue to first order requests by their latency requirements, and then, based on the relative criticality of the latency versus throughput optimization criteria, dispatches requests from the EDF queue to the storage utility. The mechanisms used in AVATAR are rigorous, based on a careful combination of statistical evaluation of the system and queueing theory results, which increases their flexibility and adaptability and sets AVATAR apart from similar approaches [5]. Furthermore, AVATAR provides SARC with an estimate of the storage utilization level (i.e., the spareness status). Figure 1(b) depicts the architecture of our framework.

General Terms: Algorithms, Performance, Management, Design
Keywords: Storage Systems, I/O Scheduling, Quality of Service, Virtualization, Performance Isolation, Fairness

1. INTRODUCTION

∗ [6] contains the full-length version of this poster.
Networked storage and large disk arrays are enabling consolidated storage systems. While such consolidation is attractive economically, sharing the underlying storage infrastructure can lead to interference between different applications/users and possible violation of performance-based service level objectives (SLOs). Data centers aim to insulate users from each other (referred to as performance virtualization), giving each user the impression that the storage utility is dedicated to them. Several approaches exist for performance virtualization. We categorize them as: (1) schemes that use a proportional bandwidth sharing paradigm, such as SFQ(D), FSFQ(D) [3], and CVC [2]; and (2) schemes that use feedback-based control, such as Triage [4], SLEDS [1], and Facade [5]. These schemes have various advantages, but suffer from one or more of the following drawbacks: (i) they rely on a fairly detailed performance model of the underlying storage system to estimate the service time of an individual request [2, 3]; (ii) they couple rate and latency allocation in a single scheduler, making them less flexible [2, 3]; (iii) they may not always exploit the full bandwidth offered by the storage system [1, 4]; or (iv) they may not provide good performance isolation under overload [5]. Our scheme aims to provide both throughput and latency guarantees, because each is critical to many (if not all) applications. We propose an interposed 2-level scheduling framework that separates rate allocation from latency control rather than coupling them. In our system, incoming I/O requests are classified into classes, with an SLO pre-determined for each class. Without loss of generality, we assume that each user belongs to a different class. The SLO for each class is a tuple <R, D>, where R represents the maximum arrival rate with a latency guarantee D for that class. If the arrival rate of a class exceeds R, its throughput should still be at least R, but there is no latency guarantee. We use a statistical latency guarantee: the SLO requires x% (95% in our evaluation) of all requests to have latency bounded by D. This SLO is enforced in each time interval of a pre-determined length (1 second in our evaluation).
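The per-interval SLO check described above can be sketched as follows. This is an illustrative reading of the <R, D> contract, not the paper's implementation; the function name and signature are our own.

```python
# Hypothetical sketch of checking a statistical latency SLO <R, D> over one
# enforcement interval. `slo_satisfied` and its parameters are illustrative
# names, not taken from the paper's code.
def slo_satisfied(latencies, rate, R, D, percentile=0.95):
    """Return True if this interval met the SLO <R, D>.

    While a class stays within its contracted rate R, at least
    `percentile` of its requests must finish within latency D.
    Above rate R, only a throughput floor of R is promised, so
    the latency clause is vacuously satisfied.
    """
    if rate > R:
        # Over-rate: no latency guarantee applies this interval.
        return True
    if not latencies:
        return True
    within = sum(1 for l in latencies if l <= D)
    return within / len(latencies) >= percentile

# Example: 100 requests, 96 under the bound -> the 95% SLO is met.
lat = [0.01] * 96 + [0.5] * 4
print(slo_satisfied(lat, rate=50, R=100, D=0.1))  # True
```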


Figure 1: (a) Overall System Model and (b) architecture of the 2-level scheduling framework

Copyright is held by the author/owner. SIGMETRICS’05, June 6–10, 2005, Banff, Alberta, Canada. ACM 1-59593-022-1/05/0006.


2. THE 2-LEVEL FRAMEWORK

In the higher level of our 2-level framework (Figure 1(b)), each I/O request is tagged upon arrival with its class id. If the corresponding FIFO queue has an available credit for the request's class, the request is dispatched without delay to the low-level EDF queue and consumes one credit; otherwise it waits in the FIFO queue until the next credit replenishment time. In the lower level, AVATAR decides when, and how many, requests to dispatch from the EDF queue to the storage utility. Finally, the storage utility services the outstanding requests according to its own service discipline. Note that our scheme operates outside the storage utility and requires no changes to it.
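The admission path just described can be sketched in a few lines. The class name and method names below are our own illustrative choices; the paper does not prescribe data structures beyond per-class FIFOs and a shared EDF queue.

```python
from collections import deque
import heapq

# Illustrative sketch of the 2-level dispatch path: per-class FIFO queues
# with SARC credits feed a shared EDF (min-heap on deadline) queue.
class TwoLevelScheduler:
    def __init__(self, classes):
        self.fifo = {c: deque() for c in classes}   # high level: per-class FIFO
        self.credits = {c: 0 for c in classes}      # SARC credits per class
        self.edf = []                               # low level: EDF min-heap

    def arrive(self, cls, req, deadline):
        if self.credits[cls] > 0:
            self.credits[cls] -= 1                  # consume one credit
            heapq.heappush(self.edf, (deadline, req))
        else:
            # No credit: wait for the next replenishment event.
            self.fifo[cls].append((req, deadline))

    def replenish(self, cls, max_credits):
        # Reset to the class maximum, then drain waiting requests
        # into the EDF queue while credits last.
        self.credits[cls] = max_credits
        while self.fifo[cls] and self.credits[cls] > 0:
            req, deadline = self.fifo[cls].popleft()
            self.credits[cls] -= 1
            heapq.heappush(self.edf, (deadline, req))
```

The heap keeps the earliest deadline at the root, so the low-level dispatcher always sees the most urgent request first.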

2.1 The High Level Controller: SARC

SARC controls rate allocation for each class via a credit-based approach. For each class, the maximum credit amount is TSARC · R, where R is the class's rate requirement and TSARC is the maximum length of the time interval between two credit replenishment events. Upon arrival, a request consumes one of its class's available credits. During a replenishment event, every class gets its maximum credit amount, regardless of whether any class has exhausted its credits. SARC initiates credit replenishment if:
- a time interval of length TSARC has elapsed since the last replenishment event,
- the storage utility has spare bandwidth and a new request arrives from a class with no available credit, or
- AVATAR indicates that the storage utility's spareness status has changed and spare bandwidth is available.
The synchronous replenishment provides fairness in the allocation of any spare bandwidth across the classes, and the spareness-awareness makes the scheme work-conserving.
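The replenishment conditions above reduce to a simple predicate. The function below is an illustrative sketch; the flag names and signature are our own, not SARC's actual interface.

```python
# Hedged sketch of SARC's three replenishment triggers. All parameter names
# are illustrative assumptions, not identifiers from the paper.
def should_replenish(now, last_replenish, T_SARC,
                     spare_bandwidth, arrival_from_empty_class,
                     spareness_changed):
    """Return True if SARC should initiate a credit replenishment event."""
    if now - last_replenish >= T_SARC:
        return True      # periodic trigger: T_SARC elapsed
    if spare_bandwidth and arrival_from_empty_class:
        return True      # work-conservation: spare capacity, starved class
    if spareness_changed and spare_bandwidth:
        return True      # AVATAR reported newly available spare bandwidth
    return False
```

Because all classes are topped up together at each event, spare bandwidth is redistributed synchronously, which is what yields the fairness property claimed above.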

2.2 The Low Level Controller: AVATAR


We depict the architecture of the low level controller of our scheme in Figure 2. AVATAR controls the flow between the EDF queue and the storage utility queue. The deadline of a request is set to its class's latency requirement D. AVATAR combines feedback-based control and Little's law-based bound analysis to periodically reset system parameters. The most critical parameter set by AVATAR is the threshold on the storage utility queue length, which controls the load on the storage utility. It also serves as an indicator of the spareness status at the storage utility.
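As a rough illustration of the Little's law bound behind the threshold: with arrival rate λ and latency bound D, the number of requests resident in the utility cannot, on average, exceed λ·D without violating D. The paper's actual algorithm [6] is considerably more elaborate; the sketch below shows only this core bound, with an illustrative function name.

```python
# Minimal illustration of the Little's-law bound (N = lambda * W) that
# motivates the queue-length threshold. With arrival rate lam (req/s) and
# latency bound D (s), keeping more than lam * D requests resident in the
# storage utility implies an average latency above D.
def queue_threshold_bound(lam, D):
    return max(1, int(lam * D))

print(queue_threshold_bound(lam=200.0, D=0.1))  # 20 outstanding requests
```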


Here, we give an intuitive explanation of how AVATAR controls the storage utility queue length threshold. If the storage utility has abundant bandwidth to accommodate the deadlines of all outstanding requests, we optimize the system for maximal throughput and increase the queue length threshold, because deadlines will be met regardless of the service order at the storage utility queue. On the other hand, when the system starts missing deadlines, the emphasis shifts to meeting latency requirements by shortening the storage utility queue, so that requests with earlier deadlines in the EDF queue are given higher priority and dispatched to the storage utility for service. However, if the system becomes overloaded 1 , a shorter queue at the storage utility decreases its throughput, which causes further cascading of deadline misses. In such extreme cases, the preference is to increase the queue threshold considerably; in the overloaded state, we set the queue threshold to infinity so that the system is optimized for throughput and can quickly transition back to the under-loaded state. The algorithm that sets the queue length threshold is based on bound analysis guided by Little's law, and it explicitly detects overloaded periods (refer to [6] for details). Careful setting of the queue threshold allows AVATAR to quickly detect the overloaded state and better adapt to highly fluctuating workloads. As mentioned in Section 2.1, it is critical for AVATAR to maintain the spareness status so that our approach remains work-conserving. AVATAR sets the queue threshold so that the storage utility is fully utilized while the SLO requirements are not violated. We treat the queue threshold as the degree of concurrency at the storage utility: if the number of actual outstanding requests is less than this degree of concurrency, we consider the storage utility to have spare bandwidth. If the system is in the overloaded state, the storage utility clearly has no spare bandwidth.
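The adaptation policy just described can be sketched as follows. The state variables, step size, and function names are illustrative assumptions; the paper's tuned algorithm [6] uses a more rigorous bound analysis.

```python
INF = float("inf")

# Hedged sketch of AVATAR's threshold adaptation: grow the threshold when
# deadlines are safe, shrink it when deadlines are missed, and open it to
# infinity under overload to drain the backlog at full throughput.
def adapt_threshold(threshold, deadline_miss_rate, overloaded, step=4):
    if overloaded:
        return INF                         # optimize for throughput
    if deadline_miss_rate > 0:
        return max(1, threshold - step)    # shorten queue: favor EDF order
    return threshold + step                # headroom: favor throughput

def has_spare_bandwidth(outstanding, threshold, overloaded):
    # The threshold acts as the degree of concurrency at the storage
    # utility; fewer outstanding requests than that means spare capacity.
    return (not overloaded) and outstanding < threshold
```

The second function is the spareness status that AVATAR exports to SARC, which keeps the two levels work-conserving together.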


3. EVALUATION RESULTS

We evaluate our framework via simulation-based analysis driven by both synthetic and real workloads. We use Disksim 2.0 as the detailed storage system simulator. The simulated storage utility is a RAID 5 system with 8 Seagate Cheetah9LP 10K RPM disks. Our experiments show [6]:
- SARC effectively provides performance isolation for highly fluctuating workloads;
- AVATAR detects overloaded periods in a timely manner and quickly recovers from transient overloaded conditions;
- SARC+AVATAR achieves higher utilization of the underlying storage utility than several similar approaches;
- SARC+AVATAR provides fairness among classes.


Figure 2: The architecture of the low level scheme - AVATAR

AVATAR consists of three components: (i) the Monitor, which collects various statistics from the underlying storage utility and the EDF queue; (ii) the Controller, which makes the periodic decision to update the queue threshold at the storage utility; and (iii) the Dispatcher, which forwards requests from the EDF queue to the underlying storage utility, guided by the queue threshold.

4. REFERENCES
[1] D. Chambliss, G. Alvarez, P. Pandey, D. Jadav, J. Xu, R. Menon, and T. Lee. Performance virtualization for large-scale storage systems. In Proceedings of the Symposium on Reliable Distributed Systems (SRDS), October 2003.
[2] L. Huang, G. Peng, and T.-C. Chiueh. Multi-dimensional storage virtualization. In Proceedings of SIGMETRICS'04, June 2004.
[3] W. Jin, J. Chase, and J. Kaur. Interposed proportional sharing for a storage service utility. In Proceedings of SIGMETRICS'04, June 2004.
[4] M. Karlsson, C. Karamanolis, and X. Zhu. Triage: Performance isolation and differentiation for storage systems. In Proceedings of the 9th International Workshop on Quality of Service (IWQoS 04), 2004.
[5] C. Lumb, A. Merchant, and G. Alvarez. Facade: Virtual storage devices with performance guarantees. In Proceedings of the Conference on File and Storage Technology (FAST'03), pages 89–102, April 2003.
[6] J. Zhang, A. Sivasubramaniam, A. Riska, Q. Wang, and E. Riedel. An interposed 2-level I/O scheduling framework for performance virtualization. Technical Report CSE-05-003, CSE, PSU. http://www.cse.psu.edu/~jzhang/tr-2level.pdf.

1 During an overloaded period, the storage utility cannot serve, within their SLO latency bounds, all requests whose deadlines lie in that period.
