

Caching Architectures and Optimization Strategies for IPTV Networks

Bill Krogfoss, Lev Sofman, and Anshul Agrawal

This paper examines the optimization of caching architectures for unicast video in Internet Protocol (IP) router networks. Designing a caching architecture poses several optimization challenges: where the caches should be located, how much memory is needed per cache, which services should be cached where, and whether hierarchical or single-level caching should be used. This paper investigates topology issues and the difficulty of caching long-tail content, and proposes some solutions for cache architecture optimization and cache partitioning. © 2008 Alcatel-Lucent.

Introduction

Today’s Internet Protocol (IP) networks are being driven by unicast video applications, both on the Internet and on private IP networks like cable and telco Internet Protocol television (IPTV). The consumer hunger for on-demand or time-shifted TV content is creating a beneficial effect for network vendors and operators—putting more traffic on the network. However, as operators face substantial growth of their IP networks, they are looking to reduce transport costs, and distributed caching is now being considered. Moving video content closer to the subscriber reduces the cost of transporting video from centralized storage, at the cost of storing that content closer to the subscriber. Note that there is far less multicast traffic than unicast traffic in the aggregation network (because of the traffic replication feature of multicast), and possibly even less in the access network because of the limited number of multicast channels any subscriber is tuned to concurrently. Caching can provide a significant reduction in traffic and economic costs; however, it is not obvious for a given network which caching architecture will produce the most economic benefit. This paper will investigate the many considerations and complex problems associated with optimizing caching designs that result in a low-cost network. The location of caches and memory per cache, deploying hierarchical caching, and even methods to partition cache memory for multiple services are not easy problems. The benefit of caching depends on the cache effectiveness or hit rate—that is, the percentage of requests likely to be served from the cache. The popularity distribution of objects (e.g., channels or video titles) within a service governs which objects are most suitable for caching. Since popularity distributions will likely be different for each service, caching the same number of objects for two services will result in two different cache hit rates. The popularity of objects within a service is also a function of time—what is most popular today may not be as popular tomorrow—thus effective caching systems will need to measure the relative popularity of objects constantly. Nielsen* Media Research measures viewing behavior for

Bell Labs Technical Journal 13(3), 13–28 (2008) © 2008 Alcatel-Lucent. • DOI: 10.1002/bltj.20320

Panel 1. Abbreviations, Acronyms, and Terms

CAPEX—Capital expenditure
CDF—Cumulative distribution function
CMF—Cumulative mass function
CO—Central office
DSLAM—Digital subscriber line access multiplexer
DVR—Digital video recorder
FCC—Fast channel change
IO—Intermediate office
IP—Internet Protocol
IPTV—Internet Protocol television

United States (U.S.) broadcast TV channels and measures popularity in 15-minute intervals (thus intra-show). In an IPTV network, however, the inherent availability of real-time popularity measurements can be leveraged to predict more accurately (and in a timely manner) the most suitable objects of each service to cache, which in turn will lead to the highest possible hit rate. Ultimately, we hope to provide insight into designing and dimensioning networks with caches and identify the key considerations.
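One way such real-time popularity measurement could be sketched is a sliding-window request counter that continually reports the currently most popular objects. Everything here (the class name, the window size, the API) is our own illustration, not part of any IPTV system described in this paper:

```python
from collections import Counter, deque


class PopularityTracker:
    """Sliding-window request counter: an illustrative way a caching
    system could measure relative object popularity in near real time."""

    def __init__(self, window=10_000):
        self.window = deque(maxlen=window)  # most recent requests, oldest first
        self.counts = Counter()

    def record(self, object_id):
        # Evict the oldest request from the counts before the deque drops it.
        if len(self.window) == self.window.maxlen:
            oldest = self.window[0]
            self.counts[oldest] -= 1
            if self.counts[oldest] == 0:
                del self.counts[oldest]
        self.window.append(object_id)
        self.counts[object_id] += 1

    def top(self, k):
        """The k most requested objects in the current window."""
        return [obj for obj, _ in self.counts.most_common(k)]


t = PopularityTracker(window=5)
for obj in ["a", "b", "a", "c", "a"]:
    t.record(obj)
assert t.top(1) == ["a"]  # "a" dominates the recent window
```

Because only the window contents are counted, yesterday's hits age out automatically, which is the property the text argues an IPTV cache needs.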

IPTV Network Video Traffic Growth Is Driving Caching

Traffic forecasts for IPTV operators and other video operators (e.g., cable) show a dramatic growth of unicast video, such that it dominates all other traffic, at rates greater than 90 percent. Unicast video includes services such as fast channel change (FCC), video-on-demand (VoD), network-based personal video recording (NPVR), and pause live television (PLTV). Broadcast TV can be delivered very efficiently (in terms of bandwidth consumption) via the multicast mechanism. Since unicast video services offer subscribers the opportunity to watch what they want when they want, as opposed to being forced to choose from a channel programming lineup determined by broadcasters, such growth forecasts for unicast video are not surprising. This growth of unicast video will be costly for operators, creating scalability issues with routers and exhausting interoffice fiber


NPVR—Network-based personal video recording
OTA—Over the air
P2P—Peer-to-peer
PDF—Probability density function
PLTV—Pause live television
PMF—Probability mass function
Sc—Scenario
STB—Set-top box
VHO—Video hub office
VoD—Video-on-demand
ZM—Zipf-Mandelbrot

spans. A typical metropolitan area of 500,000 subscribers can generate 100 Gbps of traffic between aggregation offices, and multiple terabits per second of traffic at the video hub office (VHO), as shown in Figure 1. Today’s IPTV networks are deployed in a tree fashion with multiple aggregation levels that converge at a central location called the video hub office. This centralized location maintains all the video content and streams all unicast video requests. The centralized nature of this architecture is beneficial for operational reasons, but the obvious drawback of it is that all the downstream links are burdened with this video. Figure 1 illustrates the typical IPTV architecture, with three aggregation levels—digital subscriber line access multiplexer (DSLAM), central office (CO), and intermediate office (IO). Today, 100 percent of the video requests are served centrally from the VHO. However, carriers recognize that with the growing popularity of on-demand video, a distributed service model (as opposed to the current centralized one) will be needed. Such a distributed service model may use caches deployed farther out from the VHO to serve the most popular content. Caching can really be deployed at any level in the network from the IO to the DSLAM, but the trade-off between transport link and cache module costs must be carefully considered. As the cache is moved closer to the subscriber, the economic benefits grow due to the reduction in traffic carried by all links upstream

[Figure 1. Typical IPTV architecture: subscribers attach through three aggregation levels (1000's of DSLAMs, 100s of COs, 10s of IOs) converging on the VHO, which holds the video storage and pushes content to distributed caches (Ci); link loads grow from Gbps near the access to 10's and 100's of Gbps at the aggregation levels and multiple Tbps at the VHO. CO—Central office; DSLAM—Digital subscriber line access multiplexer; IO—Intermediate office; VHO—Video hub office.]

of it. However, this comes at the price of increasing the number of cache modules—in some cases, operators have about 100 DSLAMs connected to each CO, leading to thousands of DSLAMs in a metro area served by a single VHO. Conversely, moving the caches back toward the VHO decreases caching costs rapidly but requires more traffic to be carried by all the links downstream of the caches. There are many possibilities to consider: single-level caches deployed at DSLAM, CO, or IO versus caches deployed at multiple levels; the amount to cache at each location; and which service (or services) to cache. This “cache optimization” problem will be examined later when we investigate a method for determining the optimal cache solution for a given network.

IPTV Unicast Video Services

Four unicast services are considered in our subsequent analysis: fast channel change (FCC), network-based personal video recording (NPVR), video-on-demand (VoD), and pause live TV (PLTV). FCC is a new service required for IPTV networks in order to provide a channel surfing experience comparable to that of viewing over the air (OTA) terrestrial TV channels. Cable and satellite subscribers may experience response times of several seconds to channel change requests. FCC responds to channel change requests “instantly,” putting the image

of the requested channel on the screen within a subsecond time frame, as a goal. The quick response is achieved with a unicast burst (which presents the “I” frame first) of the requested channel. This burst lasts a few to tens of seconds and consumes more bandwidth (e.g., 1.3 times a channel’s multicast stream), until the set-top box (STB) decoder can resynch with the channel’s multicast stream. So while this is an attractive feature, it does have a cost: the high bandwidth generated by the service. Unlike the broadcast TV service which is delivered by the very efficient multicast mechanism—where bandwidth consumed is fixed and proportional only to the total number of channels offered—the high bandwidth required by the FCC service is due to the likely high concurrency of requests for it as people tend to change channels at the same time (e.g., during 30 minute or 1 hour boundaries or during commercials). NPVR records broadcast content similarly to the way consumer digital video recorders (DVRs) operate today. Operators believe that centralizing the storage for NPVR will reduce operational costs and potentially capital expenditure (CAPEX) costs. A typical NPVR service will record all channels for a period of 7 to 14 days; this is contrasted with a consumer DVR that can only record 100 hours of programming. VoD is self-explanatory; in our analysis


we have assumed 5,000 titles are offered to the subscriber. PLTV is really a variant of NPVR. PLTV allows subscribers arriving home 15 minutes after a show begins to rewind to the beginning of the program or to interrupt their program for a phone call or popcorn run. PLTV enables subscribers to “pause” live shows, rewind, and/or fast-forward. However, it is different from NPVR in that shows are flushed from the buffer every 1 to 2 hours. Implementation of PLTV requires a circular buffer which keeps 1 to 2 hours of stored broadcast content for all channels. Conceptually, it does not look remarkably different from NPVR, which has a 7 or 14 day circular buffer. Of course PLTV storage requirements would be much smaller—about 500 GB are needed per hour of recording. It should be noted that the ability to rewind a 1 hour program would require 2 hours of buffering. For example, a subscriber who arrives home toward the end of a show and restarts the buffer will require an additional 1 hour of storage to complete.

Modeling popularity of IPTV video using Zipf and Zipf-Mandelbrot. Cache effectiveness is defined as the hit rate, i.e., the ratio of traffic that is served from the cache to the total amount of requested traffic. Hit rate depends on size and throughput of the cache, statistical characteristics of the traffic (e.g., popularity distribution of different titles, their bit rate, size in memory), and caching algorithm. If an ideal caching algorithm is assumed, the hit rate can be directly obtained from the popularity distribution. Popularity is an important aspect in understanding the usage of services and how they will impact the network, and impact caching in particular. Broadcast television has a well-established popularity model which is based on the Zipf distribution.
This is simply P(k) = C / k^α, where k is the numerical ranking of the objects (in this case channels), α (alpha) is a power value which describes the steepness of the curve, and C is a normalization constant which depends on alpha and on the number of objects. Many studies in the research community look at the issue of popularity and how to model it properly; we have tried to correlate them to describe our services of interest. For examples, see references [2, 3, 5, 8].
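For concreteness, the Zipf PMF and its normalization constant can be computed in a few lines. This sketch is ours, with illustrative parameters:

```python
def zipf_pmf(n_objects, alpha):
    """Zipf PMF: P(k) = C / k^alpha for ranks k = 1..n_objects.

    C is the normalization constant, which (as the text notes)
    depends on alpha and on the number of objects.
    """
    weights = [1.0 / (k ** alpha) for k in range(1, n_objects + 1)]
    c = 1.0 / sum(weights)  # normalization constant C
    return [c * w for w in weights]


pmf = zipf_pmf(300, 1.0)           # e.g., 300 broadcast channels, alpha = 1
assert abs(sum(pmf) - 1.0) < 1e-9  # probabilities sum to one
assert pmf[0] > pmf[1] > pmf[-1]   # popularity falls with rank
```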

16

Bell Labs Technical Journal

DOI: 10.1002/bltj

While Zipf has worked well for describing traditional broadcast, as well as Web pages, other services such as video rentals and peer-to-peer (P2P) files based on real-world measurements do not follow the Zipf curve. This is explained by a behavior disincentive with video rental and P2P. Since subscribers have to pay for video rentals, they are selective and do not tend to watch movies more than once, while with P2P files there is no incentive to download them more than once [4]. However, broadcast channels or Web pages may be viewed multiple times as viewers “surf” or “browse” through before selecting. This latter behavior tends to increase the popularity of the more popular files, while the former behavior tends to flatten out the popularity of these most popular titles. While the browse services are adequately described by the Zipf distribution, the “watch or get once” services are better described by a modification of the Zipf distribution called the Zipf-Mandelbrot (ZM). ZM adds an additional “shift” factor q which accounts for the watch/get once behavior for the most popular titles. We describe the ZM probability mass function (PMF) using a normalization constant C: Zipf-Mandelbrot PMF = C / (k + q)^α. Another factor that impacts the popularity profile (or curve) is the total number of objects to choose from. As the number of items to select from increases, the curve tends to be flatter as increasing user choice results in a more distributed selection. This has been described as the long-tail effect [1, 7]. Intuitively, if we imagine 100 subscribers are given 10 movies to choose from and then the same 100 subscribers are allowed to choose from 90 more titles (100 total, including the original 10), the latter popularity distribution is likely to be flatter than the former. We have categorized several services of interest (FCC, PLTV, NPVR, and VoD) in Figure 2.
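A minimal sketch of the Zipf-Mandelbrot PMF shows how the shift factor q flattens the head of the curve while barely disturbing the tail; the parameter values are illustrative only:

```python
def zm_pmf(n_objects, alpha, q=0.0):
    """Zipf-Mandelbrot PMF: P(k) = C / (k + q)^alpha.

    q = 0 reduces to plain Zipf; q > 0 models the "watch/get once"
    flattening of the most popular titles described in the text.
    """
    weights = [1.0 / ((k + q) ** alpha) for k in range(1, n_objects + 1)]
    c = 1.0 / sum(weights)
    return [c * w for w in weights]


zipf = zm_pmf(5000, 0.5)        # "browse" behavior, plain Zipf
zm = zm_pmf(5000, 0.5, q=100)   # "watch/get once" behavior, shifted

# The shift factor flattens the head of the distribution ...
assert zm[0] < zipf[0]
# ... while changing the tail far less:
assert abs(zm[-1] / zipf[-1] - 1.0) < 0.25
```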
Note that the Zipf distribution will be used to describe the services on the left half and ZM will describe the services in the right half. The characteristics of each service are summarized in Table I.

Cache hit rate and popularity distributions. Cache hit rate is defined by the popularity cumulative mass function (CMF) for each service and the total number

[Figure 2. Classification of video services along two axes: “behavior” (browse versus watch/get once) and number of objects. Browse services (e.g., FCC, WWW, PLTV, NPVR) are described by Zipf; watch/get-once services (e.g., VoD, Blockbuster/Netflix rentals, P2P files) are described by Zipf-Mandelbrot, PMF ∝ 1/(q + x)^α; cache hit rate decreases with decreasing α and with growing catalogs. Inset: Zipf versus Zipf-Mandelbrot PMF on a log-log scale. BB/NF—Blockbuster/Netflix; FCC—Fast channel change; HR—Hit rate; NPVR—Network-based personal video recording; P2P—Peer-to-peer; PLTV—Pause live television; PMF—Probability mass function; VoD—Video-on-demand; ZM—Zipf-Mandelbrot.]

(percentage) of objects cached. If services have a different popularity distribution they will have a different hit rate for the same percentage of objects cached. Figure 3 illustrates this point with the cache hit rates for two different service popularities, a = 1 and a = 0.5. In this example, we have assumed 5,000 total objects—e.g., movies for a VoD service. Also shown is the

percentage of total storage required in gigabytes based on the 5,000-object service and 2.7 GB of storage per object. Total storage for this service is then 13.5 terabytes (TB). We can see for this service that caching 1,000 objects is about 20 percent of the total, and this results in a cache hit rate of about 61 percent for one service (a = 1) and 38 percent for the other (a = 0.5).
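With an ideal cache, the hit rate is simply the popularity CMF evaluated at the number of objects cached. The sketch below uses a Zipf-Mandelbrot popularity with parameters loosely following the VoD column of Table I (a = 0.5, q = 100); the function name and setup are ours:

```python
def hit_rate(n_objects, n_cached, alpha, q=0.0):
    """Ideal-cache hit rate: the popularity CMF of a Zipf-Mandelbrot
    distribution (q = 0 gives plain Zipf) evaluated at the n_cached
    most popular objects."""
    weights = [1.0 / ((k + q) ** alpha) for k in range(1, n_objects + 1)]
    return sum(weights[:n_cached]) / sum(weights)


# 5,000-title VoD service, caching the top 20 percent (1,000 titles):
hr = hit_rate(5000, 1000, alpha=0.5, q=100)  # roughly 0.38 here
assert 0.0 < hr < 1.0
# Caching everything serves every request:
assert hit_rate(5000, 5000, 0.5, q=100) == 1.0
```

The same function makes the text's point directly: for a fixed fraction cached, a steeper curve (larger alpha) yields a higher hit rate.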

Table I. Characteristics of each service.

                                             FCC      VoD      NPVR (7d)   PLTV
Number of objects (e.g., channels/titles)    300      5,000    50,000      300
Storage per object—CODEC mean 4 Mbps (GB)    0.005    2.7      1.8         1.8
Storage required (GB)                        1.5      13,500   91,000      540
Popularity parameter (a)                     1        0.5      0.4         1
Shift parameter (q)                          NA       100      NA          NA

CODEC—Coder decoder  FCC—Fast channel change  NPVR—Network-based personal video recording  PLTV—Pause live television  VoD—Video-on-demand


[Figure 3. Cumulative mass function for two different popularity curves (Alpha = 0.5 and Alpha = 1, 5,000 objects, logarithmic hit-rate axis): hit-rate markers at 38, 58, 61, and 77 percent are shown against the 20 and 40 percent total-storage points.]

Network Cache Optimization

In this section we will demonstrate that there is an “optimal” cache solution for any given network. This is significant as all operators will have different topologies, network architectures, costs, and service offerings. Determining a caching architecture that is most economical for a given network will require consideration of all these factors. To illustrate the problem analytically, we will have to simplify the problem by taking some liberties with equipment costs and configurations. In addition, the problem is solved for a single service. We present a simplified network cost model that illustrates the trade-off between transport and memory cost. The total network cost is the sum of transport cost and memory cost, and, assuming that transport cost is proportional to the amount of traffic that traverses the equipment and memory cost is proportional to the memory size, we have:

Ntwk_Cost = Eq_Cost + Memory_Cost = C_t T_e + C_m M    (1)

where C_t is a cost per unit of traffic in dollars per megabit per second, T_e is a total amount of traffic


(in Mb/s) that traverses equipment, C_m is a cost per unit of memory in dollars per gigabyte, and M is the total size of memory (in GB) used. Note that the impact of linearization of transport and cache cost depends on the value of fixed transport and cache cost. Now, let us assume that we have a multilevel network with N levels and use M_i of memory per equipment shelf on the i-th level, i = 1, 2, . . . , N. Consider first the equipment cost. This cost depends on the total amount of traffic that traverses all equipment shelves. If T is the total amount of unicast traffic requested by subscribers, then the downstream traffic from all equipment shelves on the first level will be T, and the upstream traffic for these shelves will be k_1 T, where k_1 = 1 − f(M_1/S), f is the hit rate function, and S is the average content size (in GB). For equipment on the second level, total downstream traffic will be k_1 T and total upstream traffic will be k_2 T, where k_2 = 1 − f((M_1 + M_2)/S). This is shown in Figure 4. We assume that VHO equipment does not use cache memory, so upstream and downstream traffic are the same (= k_N T). The total equipment cost may be computed as

[Figure 4. Traffic flow in a multilevel network with cache memory: level i holds memory M_i per shelf; the traffic upstream of level i is k_i T, with k_0 = 1, k_1 = 1 − f(M_1/S), k_2 = 1 − f((M_1 + M_2)/S), . . . , and k_N = 1 − f(Σ M_i /S) at the VHO. VHO—Video hub office.]

Eq_Cost = C_t T_e = C_t T [ Σ_{i=1}^{N} (k_{i−1} + k_i) + 2 k_N ]
        = C_t T [ 1 + 2 Σ_{i=1}^{N} k_i + k_N ]    (2)

As mentioned before, coefficients k_i depend on memory M_i and the hit rate function f:

k_n = 1 − f( Σ_{i=1}^{n} M_i / S )    (3)

By substituting k_n from equation 3 into equation 2, we have

Eq_Cost = C_t T [ 2N + 2 − 2 Σ_{n=1}^{N} f( Σ_{i=1}^{n} M_i / S ) − f( Σ_{i=1}^{N} M_i / S ) ]    (4)

The total memory cost may be computed as

Memory_Cost = C_m M = C_m Σ_{i=1}^{N} L_i M_i    (5)

where L_i is the number of equipment shelves in the i-th level. Parameters L_i also depend on the amount of allocated memory, hit rate function, traffic, and equipment capacity. Summarizing equations 1, 4, and 5, we have the following simplified expression for network cost:

Ntwk_Cost(M) = C_t T [ 2N + 2 − 2 Σ_{n=1}^{N} f( Σ_{i=1}^{n} M_i / S ) − f( Σ_{i=1}^{N} M_i / S ) ] + C_m Σ_{i=1}^{N} L_i M_i    (6)

The network cost minimum is obtained when the M_i values are on the boundary of the domain (when, e.g., M_1 = 0 and we do not use cache at the corresponding level) or at stable points, where

d(Ntwk_Cost)/dM_i = 0  for  i = 1, 2, . . . , N    (7)

Consider the case when N = 3 (DSLAM, CO, and IO levels). If we assume that the hit rate has a Zipf distribution (df/dx = p x^{−a}, where the normalization coefficient p depends on the number of titles and on a) and that the numbers L_i of equipment shelves do not depend on memory M_i, the solution of equation 7 is:


M_1 = [ 2 C_t T p / ( S^{1−a} C_m (L_1 − L_2) ) ]^{1/a}    (8)

M_2 = [ 2 C_t T p / ( S^{1−a} C_m (L_2 − L_3) ) ]^{1/a} − [ 2 C_t T p / ( S^{1−a} C_m (L_1 − L_2) ) ]^{1/a}    (9)

M_3 = [ 3 C_t T p / ( S^{1−a} C_m L_3 ) ]^{1/a} − [ 2 C_t T p / ( S^{1−a} C_m (L_2 − L_3) ) ]^{1/a}    (10)

In the case when N = 1 (DSLAM caching only), similar calculations give us

M_1 = [ 3 C_t T p / ( S^{1−a} C_m L_1 ) ]^{1/a}    (11)

Equations 8 through 11 show that when memory becomes much cheaper compared with equipment (C_m << C_t), the optimal memory distribution will be obtained with a larger memory size. We illustrate the previous results with the following example. Let us assume that we have a VoD service that has 5,000 titles to be offered, each object has a size of 2.7 GB, and the Zipf power parameter a is assumed equal to 0.5. We will assume a topology where we have 1,000 DSLAMs total, 100 COs, and 10 IOs; in other words, L_1 = 1,000, L_2 = 100, L_3 = 10. Solving equation 7 for a single-level cache solution at level 2, we plot the results in Figure 5. The cost of memory (C_m in the figure) is varied from $22/GB to $2.5/GB to illustrate the impact of changing one parameter. The memory costs are linear and are drawn in the set of gray curves (from C_m = $22/GB to $2.5/GB), the solutions are shown in the purple set of curves, and the equipment costs are shown in black. So as expected, there is a unique low-cost solution for each value of C_m. For C_m = $22/GB, the optimal solution is 347 objects cached at each CO, and for C_m = $2.5/GB the solution is 3,050 objects cached per CO. We make a few observations here; the first is that the optimal solution can be far better than the “intuitive” solution, where we assign a great deal of memory per cache. We also note that changing one parameter (in this case price) results in a very different solution. Finally, we note that as memory price decreases, the

[Figure 5. Central office cache solutions showing varying memory costs: total network cost versus number of objects cached per CO for memory costs C_m of $22, $11, $5, and $2.5 per GB, with cost minima at 347, 697, 1,530, and 3,050 objects, respectively; equipment costs and the linear memory costs are also plotted.]
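The single-level optimum illustrated in Figure 5 can be reproduced qualitatively by minimizing the network cost (equation 6) numerically. The sketch below uses a pure Zipf hit-rate function and invented cost figures (C_t, T), so the absolute numbers are not the paper's; it only demonstrates that cheaper memory moves the optimum toward caching more objects:

```python
def optimal_objects_cached(cm_per_gb, ct_per_mbps=1.0, total_traffic_mbps=2_000_000,
                           n_titles=5000, object_gb=2.7, l2_shelves=100, alpha=0.5):
    """Brute-force cost minimizer for a single cache at level 2 (the CO)
    of an N = 3 network.  With M1 = M3 = 0, equation 6 reduces to
    Ct*T*(8 - 5*f(m)) + Cm*L2*m*S, where m is the number of objects
    cached and f the ideal Zipf hit rate.  All costs are illustrative."""
    weights = [1.0 / (k ** alpha) for k in range(1, n_titles + 1)]
    total_w = sum(weights)
    best_m, best_cost, hit = 0, float("inf"), 0.0
    for m in range(n_titles + 1):
        cost = (ct_per_mbps * total_traffic_mbps * (8.0 - 5.0 * hit)
                + cm_per_gb * l2_shelves * m * object_gb)
        if cost < best_cost:
            best_cost, best_m = cost, m
        if m < n_titles:
            hit += weights[m] / total_w  # hit rate f(m + 1) for next step
    return best_m


# Cheaper memory pushes the optimum toward caching more objects,
# as in Figure 5:
assert optimal_objects_cached(cm_per_gb=2.5) > optimal_objects_cached(cm_per_gb=22.0)
```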

[Figure 6. Variability of VoD popularity by days of week and seasonal events: cache hit rate versus number of titles cached (out of 500) for alpha values of 0.2, 0.5, 1, and 1.5, with curves labeled off season, weekday, typical Friday, and blockbuster. VoD—Video-on-demand.]

total memory used increases and our total network costs continue to decrease. We note that this solution was based on a single service and a single level; for multiservice the problem becomes more complex. Ultimately, computer modeling is required to solve the problem for multiservice with hierarchical caching; this is discussed a bit later.

Network Dimensioning With Caches

Carriers are dimensioning their network on the basis of busy hour peak, generally with some confidence level—that is, they accept some “blocking” in their network based on a commercial decision. The exercise of dimensioning their network is one where the operator looks for the worst-case busy hour, and then network resources are applied to support this. For operators considering using caches, dimensioning becomes a complex exercise. Busy hour from a traffic standpoint is not necessarily the busy hour from a caching architecture standpoint. Caches add the uncertainty of cache hit rate, and subsequently, how much traffic will be served from the cache and how much needs to be dimensioned for in the network. We will

illustrate this point with an example of an online video-on-demand service. As mentioned, popularity will vary with time—daily, weekly, and seasonally—as shown in Figure 6. This range represents the release of a blockbuster that everyone will want to watch versus an off-season day when there has not been anything major released in a while. Figure 6 represents the cache hit rate as a function of the number of titles cached; it is based on a VoD rental service that has 500 titles. This graph illustrates that the varying popularity can have a substantial impact on cache hit rate. We see for 10 percent of objects cached (50) the hit rate will vary from 16 percent to over 90 percent (alpha = 0.2 to alpha = 1.5). Next, using this varying popularity distribution we will take a look at a dimensioning example. Let us define a high load as a Friday or Saturday night and posit that on these evenings we have five times the load of the next busy weekday night. Associated parameters are illustrated in Table II. This table is split into high load and medium load; the high load busy hour is 100 requests per minute versus 20 during the medium load busy hour, which translates to a

DOI: 10.1002/bltj

Bell Labs Technical Journal

21

Table II. Load parameters.

                                        Medium load                        High load
Busy hour requests per minute           20                                 100
Requests/duration of download           2,000                              10,000
Transit busy hour VoD traffic (Mbps)    8,000                              40,000
Alpha                                   0.2        0.5        1            1.5
Total percent objects cached            10%                                10%
Total objects cached                    50                                 50
Cache hit rate                          15.6%      24.0%      66.2%        92.4%
Traffic served from cache (Mbps)        1,249.58   1,920.97   5,298.78     36,955.42
Traffic served at VHO (Mbps)            6,750.42   6,079.03   2,701.22     3,044.58
Worst case dimensioning (Mbps)          6,750.42

VHO—Video hub office  VoD—Video-on-demand

40 Gbps peak high load and 8 Gbps medium load. Now we see that in the high load case, 92 percent of the requests will be served from the cache (37 Gbps); that means only 3 Gbps would come from the VoD store in the VHO. In the medium load example, we see in the worst case that only 16 percent of the requests are served from the cache (1.2 Gbps), with the remaining 6.7 Gbps served from the VHO. This is the worst case for dimensioning. It illustrates that for network dimensioning the worst case busy hour is not necessarily the hour when most movies are ordered (Friday/Saturday night). To dimension the carrier’s network properly, we need to consider the traffic patterns throughout the week as well as during the off-season and combine this with the traffic loads.
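The dimensioning logic above can be checked in a few lines. The hit rates below follow Table II (with the high-load rate read as 92.4 percent, matching the roughly 92 percent quoted in the text); the residual VHO load is simply load × (1 − hit rate):

```python
# (load in Mbps, ideal-cache hit rate) pairs following Table II:
scenarios = {
    "medium, alpha 0.2": (8_000, 0.156),
    "medium, alpha 0.5": (8_000, 0.240),
    "medium, alpha 1.0": (8_000, 0.662),
    "high, alpha 1.5": (40_000, 0.924),
}

# Residual traffic the VHO must serve is whatever the cache misses:
vho_traffic = {name: load * (1.0 - hr) for name, (load, hr) in scenarios.items()}

# The dimensioning case is the largest residual VHO load -- a medium-load
# evening with a flat popularity curve, not the high-load Friday night:
worst = max(vho_traffic, key=vho_traffic.get)
assert worst == "medium, alpha 0.2"
```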

Modeling of Hierarchical, Multi-Service Caching Architectures

At this point, we have described the challenges of optimizing cache deployment in IPTV networks, the impact of service popularity distribution on cache hit rates, and network dimensioning with caches. To investigate these issues further, a computer modeling tool has been developed to solve the complex problem of optimizing cache placement and memory configuration for a given network topology and given set of


services. This tool considers equipment dimensioning, cache constraints (memory and throughput), all economic costs, service popularity distributions and storage requirements, and traffic per service. As output, the tool will provide the cache location, dimensioning, and distribution of memory between services at each level of the network. The tool also provides information about how much traffic is served from each cache. This provides a perspective on how all these factors influence cache architectures and ultimately economics. Two different topologies are used in our analysis—one with small DSLAMs and one with larger DSLAMs. Typically metropolitan areas with longer local loops will have smaller DSLAMs and more of them, and metros with shorter loops will have larger DSLAMs (colocated at the central office with the aggregation router). In many countries in Europe, the loops are shorter than in the United States and as a consequence they use larger DSLAMs; however, this is not universally the case. Table III summarizes the two topologies that are considered for this exercise. The results for the two scenarios (Sc) are displayed in Table IV. Note that the values shown are in GBs and reflect the capacity used by each service at each cache location. We see that the long LL scenario

Table III. Topology scenarios.

                       Sub/DSLAM   DSLAM   CO    IO
Longer local loops     70          100     12    8
Shorter local loops    500         10      6     24

                       Longer local loops    Shorter local loops
Total subscribers      672,000               720,000
Total DSLAM            9,600                 1,440
Total CO               96                    144

CO—Central office  DSLAM—Digital subscriber line access multiplexer  IO—Intermediate office

(Sc1) is a hierarchical two-level solution, with caches deployed at the CO and IO, while the short LL scenario (Sc2) is a three-level solution, with caches deployed in the DSLAM, CO, and IO. This is intuitive and is clear when we refer to formula 8 in the earlier section on cache optimization. From equation 8, the optimal memory (M1) for level 1 (DSLAM) decreases as L1 (the number of DSLAMs) increases for a fixed total traffic (T). Another observation we make is that the items that are cached closer to the subscriber are FCC and PLTV, followed by VoD. We note from our earlier discussion on services that the storage requirement and popularity were also in this order. That is, the items cached closer to the subscriber are the ones with smaller storage and higher alpha values (steeper popularity distribution curves). Next, we run a sensitivity analysis on the total traffic generated by each service and on the alpha values for VoD. In Table V, we see the sensitivity to traffic volume for the short local loop scenario, and in Table VI, the sensitivity to traffic volume for the long local loop scenario. The total traffic per service is varied from 2.5 Gbps per CO to 40 Gbps per CO, as seen in the far left column. We make two observations: First, as the traffic increases, so does the total savings for both scenarios. This is intuitive since the cache costs do not change, but more demand is served by the cache. The cache throughput increases as the traffic per service increases, thus resulting in greater reduction in upstream transport bandwidth and overall costs.

Table IV. Long versus short local loop (traffic per service per CO at 10 Gbps).

               DSLAM                      CO                              IO                                Total NW
        Sc     FCC   VoD   NPVR  PLTV    FCC   VoD     NPVR   PLTV      FCC   VoD       NPVR    PLTV      savings
Long LL  1     0.0   0.0   0.0   0.0     1.5   818.1   37.8   540.0     0.0   8,488.8   509.4   0.0       35%
Short LL 2     1.5   0.0   0.0   7.2     0.0   731.7   34.2   532.8     0.0   12,185.1  813.6   0.0       49%

CO—Central office  DSLAM—Digital subscriber line access multiplexer  IO—Intermediate office  FCC—Fast channel change  LL—Local loop  NPVR—Network-based personal video recording  NW—Network  PLTV—Pause live television  Sc—Scenario  VoD—Video-on-demand


Table V. Short local loop traffic per service, per CO sensitivity.

             DSLAM                      CO                              IO                              Total network
T (Gbps)     FCC    VoD   NPVR  PLTV    FCC    VoD     NPVR   PLTV     FCC   VoD      NPVR   PLTV      savings
40           1.46   0     0     0       0.05   1,731   25     540      0     11,759   239    0         55%
20           1.49   0     0     0       0.01   724     34     540      0     8,473    526    0         50%
10           1.50   0     0     7       0.00   732     34     533      0     12,185   814    0         49%
2.5          1.50   0     0     7       0.00   0       0      0        0     2,352    113    533       23%

CO—Central office  DSLAM—Digital subscriber line access multiplexer  IO—Intermediate office  FCC—Fast channel change  NPVR—Network-based personal video recording  PLTV—Pause live television  VoD—Video-on-demand

Second, for the long local loop, as total traffic increases, the optimal caching solution includes caches at the DSLAM. This is interesting as it suggests that traffic not only impacts the network savings but can change the optimal caching architecture. This is again consistent with the results discussed in the cache optimization section, formula 8: as L increases, the memory M1 decreases; however, if T (total traffic) is great enough, it can overcome the large L. Finally, we look at the alpha sensitivity analysis for VoD, which is shown in Table VII, and we make two observations. First, as the alpha of the service increases there is a corresponding increase in network savings.

Second, counterintuitively, the number of VoD items cached decreases as alpha increases. As alpha increases, the popularity of the first few items increases, and thus to achieve a certain hit rate, fewer items are required. Plotting formula 9, which is the memory solution for the second level (CO), yields a bell-shaped curve in which the total memory M2 at first increases with alpha up to a maximum (the “rising part of the bell”), then steadily decreases (the “declining part of the bell”). It appears that the range of alpha values shown in Table VII coincides with the declining part of the bell-shaped curve. Thus, these modeling results appear to be consistent with our analytical model.
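This alpha effect can be checked with a short numerical sketch (our own illustration, not the paper's model): for a Zipf-like popularity distribution with exponent alpha over a fixed catalog, count how many of the most popular titles must be cached to reach a fixed hit rate. The catalog size and target hit rate below are invented.

```python
# Illustrative sketch: with Zipf-like popularity p(k) ~ k^(-alpha), a
# steeper alpha concentrates demand on the head of the catalog, so fewer
# cached titles are needed to reach the same hit rate.

def titles_for_hit_rate(alpha, n_titles, target):
    weights = [k ** -alpha for k in range(1, n_titles + 1)]
    total = sum(weights)
    cumulative = 0.0
    for k, w in enumerate(weights, start=1):
        cumulative += w / total
        if cumulative >= target:
            return k          # titles needed for the target hit rate
    return n_titles

for alpha in (0.3, 0.7, 1.1, 1.5):
    print(alpha, titles_for_hit_rate(alpha, 10_000, 0.8))
```

The printed counts fall monotonically as alpha rises, consistent with the trend of fewer cached VoD items at higher alpha in Table VII.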

Table VI. Long local loop traffic per service, per CO sensitivity.

| T (Gb/s) | DSLAM FCC | DSLAM VoD | DSLAM NPVR | DSLAM PLTV | CO FCC | CO VoD | CO NPVR | CO PLTV | IO FCC | IO VoD | IO NPVR | IO PLTV | Total network savings |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 40 | 1.5 | 0.0 | 0 | 18.0 | 0.0 | 0 | 0 | 200 | 0 | 10,708 | 968 | 322 | 55% |
| 20 | 1.5 | 0 | 0 | 7.2 | 0.0 | 640 | 29 | 533 | 0 | 8,494 | 506 | 0 | 46% |
| 10 | 0.0 | 0 | 0 | 0.0 | 1.5 | 818 | 38 | 540 | 0 | 8,489 | 509 | 0 | 35% |
| 2.5 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 2 | 2,352 | 108 | 540 | 15% |

CO—Central office, DSLAM—Digital subscriber line access multiplexer, FCC—Fast channel change, IO—Intermediate office, NPVR—Network-based personal video recording, PLTV—Pause live television, VoD—Video-on-demand.


Table VII. Long local loop VoD alpha sensitivity.

| VoD α | DSLAM FCC | DSLAM VoD | DSLAM NPVR | DSLAM PLTV | CO FCC | CO VoD | CO NPVR | CO PLTV | IO FCC | IO VoD | IO NPVR | IO PLTV | Total network savings |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.3 | 0 | 0 | 0 | 0 | 1.50 | 1,461 | 97 | 540 | 0 | 8,721 | 277 | 0 | 33% |
| 0.7 | 0 | 0 | 0 | 0 | 1.50 | 543 | 14 | 540 | 0 | 8,165 | 835 | 0 | 36% |
| 1.1 | 0 | 0 | 0 | 0 | 1.50 | 354 | 4 | 344 | 0 | 5,692 | 1,314 | 196 | 37% |
| 1.5 | 0 | 0 | 0 | 0 | 1.50 | 286 | 0 | 214 | 0 | 4,487 | 2,689 | 326 | 39% |

CO—Central office, DSLAM—Digital subscriber line access multiplexer, FCC—Fast channel change, IO—Intermediate office, NPVR—Network-based personal video recording, PLTV—Pause live television, VoD—Video-on-demand.

Introduction of the Concept of "Cacheability"
In our modeling exercise, we discovered several dependencies that determine the overall benefit of caching. In general, we concluded that the caching benefit is a function of service usage: if a service is not used, it does not generate much traffic, and there is little economic benefit to caching it. Additionally, we see that the service popularity distribution has a great deal of impact on where items are stored in the network; the most popular titles can command a large share of the subscribers, and these titles tend to move closer to the subscriber. Finally, storage requirements (file size and total service storage) also affect which services are cached closer to the customer and which are cached farther back in the network. Thus, traffic per service, the probability density function (PDF) of the popularity distribution of titles per service, and object size are the key contributors to the caching solution, which leads us to our discussion of "cacheability." How do we decide, for a particular cache with a given memory size, which items from several services to put into the cache? The problem, as formulated in [6], is the following: for a given set of services, maximize the total cache effectiveness subject to the limits of the available cache memory M and the cache traffic throughput T, i.e., the maximum traffic demand that can be served by the cache. Total cache effectiveness is defined as the total amount of traffic served from the cache relative to

the amount requested. This may be formulated as a constrained optimization problem. Maximize the total amount of traffic served from the cache:

$$\max \sum_{i=1}^{N} T_i F_i\!\left(\lfloor M_i / S_i \rfloor\right)$$

subject to the cache memory constraint

$$\sum_{i=1}^{N} M_i \le M$$

and the cache throughput constraint

$$\sum_{i=1}^{N} T_i F_i\!\left(\lfloor M_i / S_i \rfloor\right) \le T$$

where N is the total number of services, M is the available cache memory, T is the maximum cache traffic throughput, T_i is the traffic for the i-th service, F_i(n) is the cache hit rate as a function of the number of cached titles n (assuming all titles are ordered by decreasing popularity), M_i is the cache memory occupied by titles of the i-th service, S_i is the size per title of the i-th service (i = 1, 2, ..., N), and ⌊x⌋ is the largest integer ≤ x. Note that F_i(n) is the fraction of traffic for the i-th service that may be served from the cache if n items


(titles) of this service are cached. This function is closely related to the cumulative distribution function (CDF) of content popularity; in particular, F_i(0) = 0 and F_i(n_i) = 1, where n_i is the total number of titles for the i-th service. The method of Lagrange multipliers may be applied to a continuous version of this problem. The method of Lagrange multipliers is used to find the extrema of a function of several variables subject to one or more constraints; it is one of the basic tools in constrained optimization. Lagrange multipliers compute the stationary points of the constrained function; extrema occur at these points, or on the boundary, or at points where the function is not differentiable. Assuming that the functions F_i are differentiable and applying the method of Lagrange multipliers to our problem, we have

$$\frac{d}{dM_i}\left[\sum_{i=1}^{N} T_i F_i\!\left(\frac{M_i}{S_i}\right) - \lambda_1\left(\sum_{i=1}^{N} M_i - M\right) - \lambda_2\left(\sum_{i=1}^{N} T_i F_i\!\left(\frac{M_i}{S_i}\right) - T\right)\right] = 0$$

or

$$\frac{T_i}{S_i}\,F_i'\!\left(\frac{M_i}{S_i}\right) = \frac{\lambda_1}{1 - \lambda_2} \quad \text{for } i = 1, 2, \ldots, N.$$

These equations describe the stationary points of the constrained function. The optimal solution may be achieved at stationary points or on the boundary (e.g., where M_i = 0 or M_i = M). According to the last equation, at a stationary point, two or more services that share the memory should be "balanced," that is, have the same value of the functions

$$f_i(m) = \frac{T_i}{S_i}\,F_i'\!\left(\frac{m}{S_i}\right)$$

These functions, called "cacheability" functions, quantify the benefit of caching the i-th service per unit of memory used. Note that the derivative F_i' is closely related to the content popularity PDF for the i-th service and decreases as m increases. Therefore, for given parameters T_i and S_i, the cacheability function f_i(m) also decreases as m increases.
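For a concrete shape, a Zipf popularity model yields a decreasing cacheability curve directly. The sketch below is our own illustration; the Zipf exponent, traffic, and title size are invented (the traffic figure merely echoes the order of magnitude of the CO VoD entries in Table IV).

```python
# Sketch of a discrete cacheability function for one service: the marginal
# traffic served per GB of memory when the k-th most popular title is
# added to the cache. Zipf exponent and T_i, S_i values are illustrative.

def cacheability(T_i, S_i, alpha, n_titles):
    weights = [k ** -alpha for k in range(1, n_titles + 1)]
    z = sum(weights)
    return [T_i * (w / z) / S_i for w in weights]   # traffic per GB

f_vod = cacheability(T_i=818.0, S_i=4.5, alpha=0.7, n_titles=5000)
# f_vod[0] is the benefit of caching the most popular title; the sequence
# is strictly decreasing, as the analysis above requires.
```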

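A discrete version of the constrained problem stated above lends itself to a greedy marginal-benefit rule: repeatedly cache the title with the largest traffic gain per byte until the memory or throughput limit binds, which is equivalent to equalizing cacheability across the services. The sketch below is our own illustration; the two services, their traffic and title sizes, and the popularity shares are all invented.

```python
import heapq

# Greedy sketch of the discrete cache-partitioning problem (illustrative).
# Each service i has traffic Ti, per-title size Si, and a descending list
# of per-title popularity shares; caching title k of service i serves
# Ti * pop[k] of traffic and consumes Si of memory.

def partition(services, M, T):
    heap = []  # (-traffic gain per GB, service index, title index)
    for i, s in enumerate(services):
        heapq.heappush(heap, (-s['traffic'] * s['pop'][0] / s['size'], i, 0))
    mem = [0.0] * len(services)    # Mi per service
    used = served = 0.0
    while heap:
        neg_density, i, k = heapq.heappop(heap)
        s = services[i]
        gain = -neg_density * s['size']        # traffic served by this title
        if used + s['size'] <= M and served + gain <= T:
            used += s['size']
            served += gain
            mem[i] += s['size']
            if k + 1 < len(s['pop']):          # expose the next title
                heapq.heappush(
                    heap, (-s['traffic'] * s['pop'][k + 1] / s['size'], i, k + 1))
    return mem, served

services = [
    {'traffic': 100.0, 'size': 2.0, 'pop': [0.4, 0.3, 0.2, 0.1]},  # VoD-like
    {'traffic': 50.0, 'size': 1.0, 'pop': [0.5, 0.3, 0.2]},        # PLTV-like
]
print(partition(services, M=5.0, T=1000.0))  # prints ([4.0, 1.0], 95.0)
```

Note the stopping point: when memory runs out, the marginal gains of the two services are equal (15 traffic units per GB), the "balanced" condition derived above.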

To illustrate how cacheability functions may be used to find the optimal solution to this problem, consider an example with two services whose cacheability functions are f_1(m) and f_2(m), as shown in Figure 7. Suppose we use M_1 units of cache memory for the first service and M_2 units for the second. The total caching benefit of both services, i.e., the amount of traffic T_c served from the cache, may be computed as follows:

$$T_c = \int_0^{M_1} f_1(m)\,dm + \int_0^{M_2} f_2(m)\,dm$$

First, consider the case when T_c < T and M_1 + M_2 = M (the cache-memory-limited case). If f_1(M_1) > f_2(M_2) (as shown in the figure), then we may increase the caching benefit T_c by "trading" a small amount of memory Δm of the second service for the same amount of memory for the first service. We would then use M_1 + Δm units of cache memory for the first service and M_2 − Δm units for the second; the total cache memory used is unchanged, but the caching benefit becomes

$$T_c' = \int_0^{M_1 + \Delta m} f_1(m)\,dm + \int_0^{M_2 - \Delta m} f_2(m)\,dm$$

which is more than T_c (for small Δm), because f_1(m_1) > f_2(m_2) for m_1 ∈ [M_1, M_1 + Δm] and m_2 ∈ [M_2 − Δm, M_2]. This reasoning demonstrates that if an optimal solution has both services sharing the cache memory, then that solution must be balanced: the cacheability values of the services in the optimal solution are equal. If, on the other hand, T_c = T, M_1 + M_2 < M, and f_1(M_1) > f_2(M_2) (the cache-throughput-limited case), then similar reasoning demonstrates that by "trading" a small amount of memory we can obtain a "better" optimal solution, one that serves the same cache throughput T_c with a smaller amount of cache memory. This approach allows implementation of the following algorithm for finding optimal cache partitioning. Using the same example with two services, for

[Figure 7. Cacheability: example for two services. The figure plots the decreasing cacheability curves f1(m) and f2(m), in units of s⁻¹, against cache memory in bytes; horizontal lines H1 and H2 mark two horizon levels, and M1 and M2 mark the corresponding memory allocations.]

every horizontal line (horizon) that intersects the cacheability curve(s), we determine the amount of cache memory used for the services as well as the corresponding traffic throughput. As the horizon moves down, the amount of cache memory used increases and the traffic throughput increases. As soon as the memory or traffic limit is reached (whichever occurs first), the optimal solution is obtained. Depending on the curves, the optimal solution may be achieved when the horizon intersects a) one curve (horizon H1) only or b) both curves (horizon H2). In case a, cache memory would be allocated entirely to the first service, while in case b, both the first and the second services would share the cache memory in some proportion. A discrete version of this approach has been used for developing a cache partitioning tool that optimally configures cache memory for a given set of services.
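The horizon search above can be sketched directly for two services. The exponential cacheability curves f_i(m) = a_i·e^(−m/b_i) below are our own choice (any decreasing curves work); for these curves, the memory and the traffic served at a given horizon have closed forms.

```python
import math

# Horizon-search sketch (illustrative): lower a horizontal line h from the
# top of the cacheability curves; each service takes the memory where its
# curve crosses h. Stop when total memory reaches M or traffic reaches T.

def horizon_partition(a1, b1, a2, b2, M, T, steps=100_000):
    def mem(a, b, h):      # memory at which f(m) = a*exp(-m/b) falls to h
        return b * math.log(a / h) if h < a else 0.0
    def traffic(a, b, h):  # area under f from 0 to mem(a, b, h)
        return b * (a - h) if h < a else 0.0
    h_top = max(a1, a2)
    for s in range(1, steps + 1):
        h = h_top * (1 - s / steps)              # horizon moving down
        if h <= 0.0:
            break
        m1, m2 = mem(a1, b1, h), mem(a2, b2, h)
        if m1 + m2 >= M or traffic(a1, b1, h) + traffic(a2, b2, h) >= T:
            return m1, m2                        # a limit binds; stop
    return mem(a1, b1, h_top / steps), mem(a2, b2, h_top / steps)
```

In the memory-limited case the returned allocation is balanced, f_1(M_1) = f_2(M_2) = h, matching the stationarity condition derived earlier; if the horizon stops above the lower curve's peak, only one service is cached (case a).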

Conclusions
We have demonstrated analytically and through computer modeling that cache architectures depend on several key variables, including traffic per service, service popularity, network topology (i.e., the number of levels and nodes per level), and total storage per service. Other factors affecting cache architectures include memory and equipment costs, which ultimately compete to drive more or less storage in the

network. During optimization, services with small storage footprints move closer to the edge of the network while large-storage services move back; i.e., more cacheable items move to the edge, less cacheable items move back. However, cacheability is determined on a per-object basis, and thus a large-storage service with a few very popular items may see those items move to the edge during optimization. We have proposed a solution to cache optimization based on the cacheability function, which maximizes the traffic served subject to the throughput and storage limits of each cache. We note that the cost of memory is falling much faster than equipment (transport) costs, so the trend over time will be toward more storage; in some cases this causes the optimal solution to use multiple-level, or hierarchical, cache architectures. Finally, we note that a large number of nodes per level can make it difficult to find an economical caching solution at that level (as in the long local loop scenario); however, if traffic is large enough, this can be overcome.

*Trademark
Nielsen is a trademark of CZT/ACN Trademarks, L.L.C.

References
[1] C. Anderson, "The Long Tail," Wired Mag., 12.10, Oct. 2004, http://www.wired.com/wired/archive/12.10/tail.html.
[2] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, "Web Caching and Zipf-Like Distributions: Evidence and Implications," Proc. 18th Annual Joint Conf. of IEEE Comput. and Commun. Soc. (INFOCOM '99) (New York, NY, 1999), vol. 1, pp. 126–134.
[3] M. Chesire, A. Wolman, G. M. Voelker, and H. M. Levy, "Measurement and Analysis of a Streaming-Media Workload," Proc. 3rd USENIX Symposium on Internet Technol. and Syst. (USITS '01) (San Francisco, CA, 2001), vol. 3, paper 1.
[4] K. P. Gummadi, R. J. Dunn, S. Saroiu, S. D. Gribble, H. M. Levy, and J. Zahorjan, "Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload," Proc. 19th ACM Symposium on Operating Syst. Principles (SOSP '03) (Bolton Landing, NY, 2003), pp. 314–329.
[5] S. Saroiu, K. P. Gummadi, R. J. Dunn, S. D. Gribble, and H. M. Levy, "An Analysis of Internet Content Delivery Systems," Proc. 5th ACM Symposium on Operating Syst. Design and Implementation (OSDI '02) (Boston, MA, 2002), pp. 315–327.
[6] L. Sofman, B. Krogfoss, and A. Agrawal, "Optimal Cache Partitioning in IPTV Networks," Proc. 11th Commun. and Networking Simulation Symposium (CNS '08) (Ottawa, Ont., Can., 2008).
[7] F. Thouin and M. Coates, "Video-on-Demand Networks: Design Approaches and Future Challenges," IEEE Network, 21:2 (2007), 42–48.
[8] S. Vanichpun and A. M. Makowski, "Comparing Strength of Locality of Reference – Popularity, Majorization, and Some Folk Theorems," Proc. 23rd Annual Joint Conf. of IEEE Comput. and Commun. Soc. (INFOCOM '04) (Hong Kong, China, 2004), vol. 2, pp. 838–849.

(Manuscript approved June 2008)

BILL KROGFOSS is a strategy director in Alcatel-Lucent's corporate Chief Technology Office (CTO) in Plano, Texas. With a focus on content distribution, he leads a small team utilizing modeling and simulation to help drive strategy in video networking and to examine performance and scalability issues for Internet Protocol television (IPTV) and Internet video. He has been with the company over 10 years, serving in strategy and product management roles, and has over 20 years of experience in data and telecommunications. Prior to joining Alcatel-Lucent, he was in international product management and strategy for Tellabs, managing asynchronous transfer mode (ATM) product lines and the evolution of traditional transport and data networks. He has authored


many papers in the areas of video distribution, transport, and data networking. Mr. Krogfoss received a BSEE from St. Cloud State University in Minnesota.

LEV SOFMAN is a senior research scientist in Alcatel-Lucent's corporate Chief Technology Office in Plano, Texas. His areas of interest include network/traffic modeling and cost/performance analysis and optimization. Recent contributions include IPTV traffic engineering and network modeling for the Lightspeed project for AT&T, Dallas–Fort Worth metro network modeling, and AT&T signaling network optimization. Before joining Alcatel-Lucent, Dr. Sofman led activities on softswitch performance and cost analysis at Westwave Communications. Prior to that, at MCI, he was responsible for cost, performance, and traffic modeling of core, metro, and access networks, and for network modeling and simulation for dynamically controlled routing. He also participated in software development for the automated provisioning systems that control all multiplex activity for the MCI networks. He has served as a Visiting Member at the Courant Institute of Mathematical Sciences at New York University and as a Research Scientist at the National Patent Institute, Moscow, Russia. He is the author of over 50 publications and presentations in international journals and conferences and holds seven U.S. patents and two patents from Russia. He is a member of the Alcatel-Lucent Technical Academy and the Institute for Operations Research and the Management Sciences (INFORMS). He received his M.Sc. and Ph.D. degrees in mathematics from Moscow University, Russia, and from the Institute of Information Transmission Problems, Russian Academy of Sciences.

ANSHUL AGRAWAL is a network analyst in the Network and Technology Strategy group at Alcatel-Lucent in Plano, Texas. He is currently involved in modeling various enhancements in IPTV networks, including caching and targeted ad insertion, and investigating the scalability of IPTV middleware and content distribution networks.
His prior research included projects on multiprotocol label switching (MPLS) and active networks, optical wavelength switching, and optical burst switching technologies. He has several publications and a patent. Dr. Agrawal holds a Ph.D. in telecommunications and computer networking and an M.S. in computer science, both from the University of Missouri, Kansas City. He received his B.Eng. in electrical/electronics engineering from the National University of Singapore. ◆