IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 64, NO. 6, JUNE 2016


Bandwidth-Efficient Multipath Transport Protocol for Quality-Guaranteed Real-Time Video Over Heterogeneous Wireless Networks Jiyan Wu, Member, IEEE, Chau Yuen, Senior Member, IEEE, Bo Cheng, Member, IEEE, Yuan Yang, Ming Wang, and Junliang Chen

Abstract— Recent technological advancements in wireless infrastructures and handheld devices enable mobile users to concurrently receive multimedia contents with different radio interfaces (e.g., cellular and Wi-Fi). However, multipath video transport over the resource-limited and error-prone wireless networks is challenged with key technical issues: 1) conventional multipath protocols are throughput-oriented, and video data are scheduled in a content-agnostic fashion and 2) high-quality real-time video is bandwidth-intensive and delay-sensitive. To address these critical problems, this paper proposes a bandwidth-efficient multipath streaming (BEMA) protocol featured by the priority-aware data scheduling and forward error correction-based reliable transmission. First, we present a mathematical framework to formulate the delay-constrained distortion minimization problem for concurrent video transmission over multiple wireless access networks. Second, we develop a joint Raptor coding and data distribution framework to achieve target video quality with minimum bandwidth consumption. The proposed BEMA is able to effectively mitigate packet reordering and path asymmetry to improve network utilization. We conduct performance evaluation through extensive emulations in Exata involving real-time H.264 video streaming. Compared with the existing multipath protocols, BEMA achieves appreciable improvements in terms of video peak signal-to-noise ratio, end-to-end delay, bandwidth utilization, and goodput. Therefore, BEMA is recommended for streaming high-quality real-time video to multihomed terminals in heterogeneous wireless networks.

Index Terms— Multipath transport, heterogeneous wireless networks, bandwidth-efficiency, real-time video communication.

Manuscript received September 15, 2015; revised January 26, 2016 and March 27, 2016; accepted April 5, 2016. Date of publication April 12, 2016; date of current version June 14, 2016. The research reported in this paper is supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61132001, 61550110244; National High-tech R&D Program of China (863 Program) under Grant No. 2013AA102301; Fundamental Research Funds for the Central Universities under Grant No. 3222006403; Program for New Century Excellent Talents in University (Grant No. NCET-11-0592); Project of New Generation Broadband Wireless Network under Grant No. 2014ZX03006003; National Research Foundation, Prime Minister’s Office, Singapore under its IDM Futures Funding Initiative. The associate editor coordinating the review of this paper and approving it for publication was B. Liang. (Corresponding author: Bo Cheng.)

J. Wu is with the State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China, and also with the Department of Research and Development, OmniVision Technologies Singapore Pte. Ltd., Singapore 068898 (e-mail: [email protected]; [email protected]).

C. Yuen is with the Engineering Product Development Pillar, Singapore University of Technology and Design, Singapore 487372 (e-mail: yuenchau@sutd.edu.sg).

B. Cheng, M. Wang, and J. Chen are with the State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China (e-mail: [email protected]; [email protected]; [email protected]).

Y. Yang is with the School of Instrument Science and Engineering, Southeast University, Nanjing 211189, China (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCOMM.2016.2553138

I. INTRODUCTION

THE MASSIVE deployment of various network infrastructures enables mobile users to surf the rich Internet contents with ubiquitous access options, e.g., wireless local area networks (802.11 family), cellular networks (UMTS, HSDPA, LTE), and broadband wireless networks (LTE and WiMAX). The state-of-the-art mobile terminals (e.g., the Samsung S5 smart phones [1] and Mushroom products [2]) are equipped with multiple radio interfaces to concurrently receive data through parallel wireless access networks. With the popularity of such multihomed mobile terminals, future wireless networking is expected to incorporate heterogeneous access options for providing high-quality mobile services [3]–[7].

Supported by the technological progress in wireless infrastructures and hand-held devices, mobile video streaming has become the dominant application on the Internet. The portion of video traffic in global mobile data usage has already reached 55% in 2014 and will exceed 72% by the end of 2019 [8]. In parallel, global mobile data is expected to increase 10-fold in the next five years. This tremendous growth imposes heavy loads on the capacity-limited wireless platforms. The resource restrictions of single wireless networks prompt the integration of heterogeneous access media for concurrent video transmission to multihomed devices as shown in Fig. 1. Therefore, many end-to-end solutions [9]–[15], [41] (at application/transport layer) have been proposed to enable multipath video streaming. In particular, an effective transport-layer protocol is able to guarantee the application-layer QoS (Quality-of-Service) and enhance the network-level utilization. Furthermore, transport protocols only require modifications at communication terminals (e.g., Linux kernel implementation) and reduce the burden of application programs.

A. Problem Statement

In this paper, we investigate the multipath transport of high-quality real-time video (e.g., online gaming, video call,

0090-6778 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


Fig. 1. Multipath video transport with the proposed BEMA protocol over both LTE and Wi-Fi networks to multihomed terminals.

live sports program, etc.) over heterogeneous wireless networks. The challenging issues are summarized as follows.

1) Stringent QoS requirements. High-quality live video streaming is bandwidth-intensive and delay-sensitive. The throughput demand for high-definition video distribution mainly ranges from 2 to 6 Mbps. Besides, the one-way delay is limited to be less than 150 ms to achieve excellent real-time video quality [15].

2) Network bandwidth limitation. The radio resources in wireless platforms are scarce and time-varying. Recent studies [4], [16] reveal that the available bandwidth for individual mobile users in 4G LTE networks generally ranges from 1.5 to 2.5 Mbps.

3) Path asymmetry. The different physical properties and time-varying network status result in the path asymmetry of heterogeneous access networks [4]. The involvement of an unreliable communication path in multipath video transport only degrades the average quality.

4) Problems of current multipath protocols. Multipath TCP (MPTCP) [17] and Stream Control Transmission Protocol (SCTP) [18] are the transport-layer protocols recommended by IETF (Internet Engineering Task Force) to enable parallel data transfer over multiple communication paths. However, both MPTCP and SCTP are ineffective for real-time video delivery since the packet retransmission mechanism incurs large end-to-end delay. Multipath RTP (MPRTP) [13] is a light-weight protocol but unable to provide reliable data transmission over the error-prone wireless networks. Although there are some Forward Error Correction (FEC) based multipath protocols [19], these solutions are throughput-oriented and video data is scheduled in a content-agnostic fashion. Thus, it still remains problematic to deliver high-quality real-time video with the existing multipath transport protocols.

B. Contributions

This research advances the state-of-the-art by developing a Bandwidth-Efficient Multipath streAming (BEMA) protocol


with the following solutions: 1) employs the FEC coding (i.e., systematic Raptor codes [20]) for reliable data transfer against network packet losses; 2) performs priority-aware video frame protection to enhance bandwidth utilization; 3) delivers video streaming over the available communication paths based on the quality level. BEMA is able to effectively mitigate the packet reordering, identify congestion losses, and overcome the path asymmetry to enhance streaming video quality. The detailed scheduling algorithms of the proposed protocol will be introduced in Section IV. Specifically, this paper makes the following contributions.

• Present an analytical framework to model the multipath video transport with quality requirement and delay constraint over heterogeneous wireless networks.

• Propose a multipath transport protocol dubbed BEMA that effectively integrates the following key components:
1) A Raptor coding scheme with dynamic code rate and symbol size adaptation to provide differentiated frame protection under delay constraint.
2) A path quality based data distribution mechanism to adjust video traffic load and minimize total distortion over multiple communication paths.

• Perform extensive semi-physical emulations over the Exata platform involving H.264 video streaming. Evaluation results demonstrate that:
1) BEMA improves the average video PSNR by up to 5.9 (18.2%), 7.8 (23.3%), 9.7 (31.7%), and 11.8 (37.5%) dB compared to the CMT-DA [4], FMTCP [21], CMT-QA [9], and MPLOT [19], respectively.
2) BEMA reduces the average end-to-end delay by up to 22.3 (22.7%), 34.5 (27.2%), 49.2 (34.2%), and 65.1 (41.6%) ms compared to the CMT-DA, FMTCP, CMT-QA and MPLOT, respectively.
3) BEMA reduces the percentage of overdue video frames by up to 6.3%, 12.5%, 16.3%, and 20.2% compared to the CMT-DA, FMTCP, CMT-QA and MPLOT, respectively.
4) BEMA increases the average goodput values by up to 0.13 (7.9%), 0.19 (12.9%), 0.25 (17.5%), and 0.35 (22.9%) Mbps compared to the CMT-DA, FMTCP, CMT-QA and MPLOT, respectively.
5) BEMA reduces the bandwidth consumption by up to 18.9%, 22.1%, 25.3%, and 28.6% compared to the CMT-DA, FMTCP, CMT-QA and MPLOT, respectively, while achieving the same video quality.

C. Paper Organization

The rest of this paper is organized as follows. In Section II, we briefly review and discuss the work related to this research. Section III presents the protocol design and problem formulation. The detailed scheduling algorithms (FEC coding and data distribution) of the proposed BEMA protocol are described in Section IV. Section V provides the performance evaluation and concluding remarks are given in Section VI.

II. RELATED WORK

The related work to this study can be generally classified into three categories: 1) multihomed video communication;

WU et al.: BANDWIDTH-EFFICIENT MULTIPATH TRANSPORT PROTOCOL


TABLE I DIFFERENCES OF THE PROPOSED BEMA WITH OUR EARLIER WORKS [4], [10], [11], [22]

2) multipath transport protocol; 3) video priority aware FEC coding schemes.

A. Multihomed Video Communication

Han et al. [5] design and implement a live video streaming system over heterogeneous wireless networks based on fountain code. This system aims at optimizing the live video streaming quality by taking advantage of the multipath diversity to maximize encoding source rate. In literature [10], the authors propose a sub-frame level scheduling approach, which splits large-size video frames to optimize the delay performance of High-Definition (HD) video streaming over heterogeneous wireless networks. In [21], the authors introduce a dynamic rate allocation algorithm into Joint Source-Channel Coding (JSCC) to optimize the mobile video quality over heterogeneous networks. Bui et al. [23] propose the GreenBag that includes the load balancing, segment management and energy-aware mode control to deliver mobile video over heterogeneous wireless networks. Xing et al. [6] propose a real-time adaptive algorithm for video streaming over multiple access networks. The video streaming process is formulated as a Markov Decision Process (MDP) and a reward function is designed to consider the QoS requirements.

B. Multipath Transport Protocol

1) SCTP Solutions: In reference [24], Wallace et al. review the recent progress and research issues in SCTP. Iyengar et al. [25] study three negative effects in CMT, i.e., unnecessary fast retransmissions, overly conservative congestion window growth, and increased acknowledgment traffic. In literature [4], the authors propose a Distortion-Aware Concurrent Multipath Transfer scheme (CMT-DA) to minimize the end-to-end distortion in mobile video delivery over heterogeneous wireless networks. However, CMT-DA provides data protection with retransmissions and is not appropriate for the real-time video applications with stringent delay constraint. Wu et al. [14] propose a Content-Aware Concurrent Multipath Transfer (CMT-CA) scheme that performs priority-aware chunk scheduling in SCTP to optimize the streaming quality. Xu et al. [9] propose a Quality-Aware Adaptive Concurrent Multipath Transfer (CMT-QA) scheme that includes the components of data distribution, path quality estimation and optimal retransmission. Specifically, the path quality is estimated with the sending buffer size and transmission delay.

2) MPTCP Solutions: The Fountain Code-based Multipath TCP (FMTCP) proposed by Cui et al. [21] leverages the rateless fountain coding to overcome channel erasures and path heterogeneity in multipath data transfer. However, the video quality is not considered in [21] and this metric is significantly different from the network-level parameters (e.g., throughput, delay, packet loss rate, etc.). Peng et al. [26] propose an energy-efficient MPTCP scheme that leverages the throughput-energy tradeoff for path selection and congestion control. Chen et al. [27] conduct a measurement study of MPTCP performance over cellular and Wi-Fi networks to investigate the impact of path diversity on application-level metrics. In reference [59], a multipath TCP solution named ADMIT is developed for streaming high-quality mobile videos to multihomed terminals in heterogeneous wireless networks. However, ADMIT may suffer from serious delay performance degradations should packet loss occur, due to the retransmission mechanism in MPTCP. The BEMA protocol developed in this paper improves multipath video transmission with a frame-level video distortion model and rateless Raptor coding. Peng et al. [60] design a fluid model for MPTCP algorithms and characterize parameters for TCP-friendliness. Singh et al. [13] propose a multipath real-time transport protocol (MPRTP) that extends RTP to the multipath communication scenario. In literature [19], the authors propose a Multipath Loss Tolerant (MPLOT) protocol that exploits multipath diversity in wireless networks based on block erasure code.

The differences of the proposed BEMA with our earlier works [4], [10], [11], [22] are summarized in Table I. BEMA is different from the existing studies [4], [10], [11], [22] in the protocol type, data protection mechanism, data allocation scheme, priority-aware scheduling and FEC coding algorithms.

C. Video Priority Aware FEC Coding

Literature [50] generally reviews the recent works on FEC coding for video data protection. Sgardoni and Nix [51] present an APP/PHY cross-layer design that performs Raptor code-aware link adaptation for unicast video streaming over mobile broadband networks. Tournoux et al. [52] propose a rateless coding scheme that keeps on sending the encoded symbols until receiving an acknowledgement or passing the deadline. Ahmad et al. [53] propose an on-the-fly erasure coding scheme called Tetrys that considers the feedback information during the FEC coding process. Huang et al. [54] propose a hybrid FEC/ARQ protocol built on a packet streaming code. In [55], a cross-layer FEC scheme using Raptor and rate-compatible punctured convolutional (RCPC) codes is developed for video transmission over wireless channels. Xiao et al. [56] propose a sub-GoP level FEC coding scheme to optimize the delay performance for real-time

Fig. 2. System diagram of the proposed BEMA (Bandwidth-Efficient Multipath Streaming) protocol.

video applications. In [57], a randomized expanding FEC coding scheme is proposed to append parity packets for current and previous frames for enhanced data protection. Cui et al. [58] propose to leverage end-to-end coding for improving TCP performance and develop a multipath coding model for TCP-based data transmission. This model is important to integrate end-to-end coding with TCP and provides insights for the future work.

In summary, the existing multipath protocols are ineffective for real-time video communication due to the retransmission mechanism or content-agnostic scheduling. To the best of our knowledge, the proposed BEMA is the first multipath transport protocol that employs the Raptor coding and priority-aware scheduling to stream high-quality real-time video over heterogeneous wireless networks.

III. PROTOCOL DESIGN AND PROBLEM FORMULATION

A. System Overview

Fig. 2 presents the system diagram of the proposed BEMA framework, which is a purely transport-layer protocol. The goal of this protocol is to effectively integrate the available resources in heterogeneous wireless networks to stream high-quality real-time video. The working components are implemented at both sender and receiver sides. We consider the multipath data transfer of a single video flow over multiple access networks and each wireless network is modeled as an independent end-to-end communication path. Assume the video is encoded in H.264 format [28], which is pervasively employed in industry for video compression. For real-time video applications, distortion and delay are two primary QoS parameters to guarantee the user-perceived quality [15], [29]. Therefore, we assume the quality constraint $d_{\min}$ and delay requirement $T$ are imposed by the video application while invoking the socket. The default value of the quality constraint is $d_{\min} = 0$ (i.e., distortion minimization). Each video frame is expected to arrive at the destination within its playout duration (e.g., the deadline $T$ is 40 ms if the video is encoded at 25 frames per second). To ensure the compatibility and portability of the proposed system, BEMA employs UDP (User Datagram Protocol) to transmit video data and TCP for control information exchange (e.g., connection establishment and feedback information).

1) Sender Components: The working procedures at the sender side include the Raptor coding, data distribution and path status estimation. In particular, the scheduling algorithms for these decision processes are described in Section IV (Algorithms 1 and 2). The main control parameters in the FEC coding process are the code rate and encoded symbol size. For each decision epoch, the video frames are converted into encoded symbols and encapsulated in different data packets. The data distributor partitions the coded video traffic into multiple sub-flows based on the allocation vector over the available paths. These sub-flows are transmitted to the destination using the UDP sockets. The data packets allocated for each path are temporarily queued at the send buffers before being delivered. To guarantee the fairness with other TCP flows, a TCP-friendly rate control (TFRC) [30] unit is adopted to regulate the transmission rate of each subflow. However, the TFRC may unnecessarily reduce the transmission rate if wireless losses occur. Therefore, the ZigZag scheme [31] is employed for the packet loss differentiation (i.e., distinguishing congestion losses from wireless losses).

2) Receiver Components: At the receiver side, BEMA reorders the received packets from multiple paths and checks whether they have passed the decoding deadline. The overdue packets are dropped since they cannot contribute to the video display process. If any data packet is lost during transmission, the Raptor decoding is performed to recover the lost packets. The information feedback unit is implemented at the receiver side to periodically provide the network status information (available bandwidth, round trip time, packet loss rate) to the sender side through the most reliable communication path. This information is used in the decision process of FEC coding and data distribution at the sender side. Each data packet is associated with a transmission sequence number for the traffic reassembly at the receiver side.

BEMA aims at minimizing the sum of the total distortion of the video frames. This analytical framework includes the mathematical models of communication path [4], video traffic and systematic Raptor codes [20]. These models will be introduced in the rest of this section. The mathematical notations used throughout this paper are summarized in Table II.

B. Model Description

1) Communication Path Model: Each communication path is characterized by the physical properties of round trip time $RTT_p$, packet loss rate $\pi_p^B$ and available bandwidth $\mu_p$. We model the burst loss behavior based on the Gilbert model [32]

TABLE II DEFINITIONS OF BASIC NOTATIONS

and continuous time Markov chain. In particular, the average loss burst length is $1/\xi_p^G$ and $\pi_p^B = \xi_p^B/(\xi_p^B + \xi_p^G)$. The detailed descriptions of these physical properties are presented in [4]. In this research, we employ the path status estimation model developed in [49] to capture the physical properties of the communication paths. This estimation model is able to achieve high accuracy with low network overhead [49].

2) Video Traffic Model: In the analytical framework of BEMA, we employ the frame-level distortion model developed in [3] and [33] to analyze the objective quality of real-time video streaming. Assume a GoP (Group of Pictures) consists of $M$ frames and each of them is identified by an index $m$ ($1 \le m \le M$) and length $L_m$. Let $\lambda$ denote the video encoding rate and we consider the constant bit rate (CBR) stream in this work. In this paper, each video frame is modeled with the properties of total distortion, distortion impact and decoding dependence. These terminologies are defined as follows.

Definition 1 (Total Distortion $d_m$): The quality degradation of the $m$-th frame after the data transmission and video decoding process. This distortion is caused by both the transmission impairments and decoding dependency on the parent frames.

Definition 2 (Distortion Impact $I_m$): The distortion caused by the lost data/packets belonging to the $m$-th video frame. The distortion impact values for different video contents and frame types are investigated in [34].

Definition 3 (Decoding Dependence $A_m$): Since some video frames are encoded based on the prediction from other frames in the same GoP, there are decoding dependencies among these frames. Therefore, each frame $m$ is associated with its parent frames (ancestors) $A_m$.

The actual total distortion of a video frame is also dependent on the video coding algorithms, error concealment, content complexity, etc. These factors are at the application layer and out of the control scope of BEMA. It is also infeasible to estimate the complex content parameters online (e.g., truncation and drifting parameters [33]). The drifting distortion caused by parent frames can be expressed as $\sum_{m' \in A_m} I_{m'} \cdot \Pi_{m'}$. According to the affine models in [3] and [33], we use the following mathematical expression to estimate the total distortion $d_m$

$$d_m = \Pi_m + \sum_{m' \in A_m} I_{m'} \cdot \Pi_{m'}, \qquad (1)$$

in which $\Pi_m$ represents the effective data loss rate and the expression is derived with the following proposition.

Proposition 1 (Effective Data Loss Rate): The effective data loss rate $\Pi_m$ represents the percentage of lost video data after the systematic Raptor decoding process. This loss probability includes both the channel losses and expired packet arrivals. The equation of $\Pi_m$ is presented as follows

$$\Pi_m = \begin{cases} 0, & \text{if } \pi_m^t + \left(1 - \pi_m^t\right) \cdot \pi_m^o < \dfrac{n-k}{n}, \\ \pi_m^t + \left(1 - \pi_m^t\right) \cdot \pi_m^o, & \text{otherwise,} \end{cases} \qquad (2)$$

in which $\pi_m^t$ denotes the transmission loss rate and $\pi_m^o$ represents the overdue loss rate. These loss rates are mathematically expressed as follows.

$$\pi_m^t = \frac{1}{n_m} \sum_{p \in P} \sum_{\forall c_p} \left[ \sum_{i=1}^{n_{m,p}} 1_{\{c_p^i = B\}} \right] \times \pi_p^{c_p^1} \cdot \prod_{i=1}^{n-1} F_p^{\left(c_p^i,\, c_p^{(i+1)}\right)}(\theta_p),$$

$$\pi_m^o = \exp\left[ - \frac{T}{\max_{p \in P} \left\{ \dfrac{n_{m,p} \cdot S}{\mu_p \cdot (1 - \pi_p^B)} + \dfrac{\nu_p \cdot RTT_p}{2 \times \nu_p} \right\}} \right]. \qquad (3)$$
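As a rough illustration of how Eqs. (1) and (2) combine, the following Python sketch computes the stationary Gilbert loss rate, the effective loss rate after Raptor decoding, and the resulting frame distortion. The function names, the toy GoP and all numerical values are assumptions made only for this example; the transmission and overdue loss rates are passed in as given numbers rather than derived from Eq. (3).

```python
from typing import Dict, List


def gilbert_loss_rate(xi_b: float, xi_g: float) -> float:
    """Stationary probability of the Bad state: pi_B = xi_B / (xi_B + xi_G)."""
    return xi_b / (xi_b + xi_g)


def effective_loss_rate(pi_t: float, pi_o: float, n: int, k: int) -> float:
    """Eq. (2): zero when the combined loss stays below the redundancy share (n - k) / n."""
    combined = pi_t + (1.0 - pi_t) * pi_o
    return 0.0 if combined < (n - k) / n else combined


def total_distortion(
    m: int,
    eff_loss: Dict[int, float],
    impact: Dict[int, float],
    ancestors: Dict[int, List[int]],
) -> float:
    """Eq. (1): the frame's own effective loss plus the drift inherited from its ancestors."""
    drift = sum(impact[a] * eff_loss[a] for a in ancestors[m])
    return eff_loss[m] + drift


# Toy GoP (illustrative values): frame 1 has no parents, frames 2 and 3 depend on earlier frames.
eff_loss = {1: 0.0, 2: 0.1, 3: 0.2}
impact = {1: 5.0, 2: 2.0, 3: 1.0}
ancestors = {1: [], 2: [1], 3: [1, 2]}

d3 = total_distortion(3, eff_loss, impact, ancestors)  # 0.2 + 5.0 * 0.0 + 2.0 * 0.1
print(d3)
```

The sketch makes the threshold behaviour of Eq. (2) visible: as long as the combined loss stays below the share of redundant symbols, the frame contributes no distortion to itself or to its descendants.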

Proof: The proof is provided in the Appendix.

3) Systematic Raptor Code: Raptor codes are an extension of the Luby Transform (LT) codes and belong to the family of block erasure codes. The main motivations for choosing Raptor codes are their high coding efficiency, low processing time, and strong error-correction capability [20], [35]. Besides, Raptor codes exhibit higher flexibility than the traditional Reed-Solomon codes in the number of source symbols (i.e., rateless coding). The encoding and decoding algorithms of systematic Raptor codes are specified in [35]. In the FEC scheme of BEMA, the Raptor coder partitions the video frames in a GoP into several source blocks. A source block is composed of $k$ source symbols of predefined size $S$. The Raptor encoding process is

divided into two stages, i.e., code constraint processor and LT encoder. Raptor code supports a variable number of source symbols ($k$) in a source block. Fig. 3 presents a conventional algorithm that maps each source symbol to some encoded symbols. The Raptor decoder is able to recover all the source data if any $k \cdot (1 + \delta)$ ($\delta > 0$) of the $n$ encoded symbols are successfully received. If the number of received encoding symbols is less than $k \cdot (1 + \delta)$, the received data can still be used for the decoding process. For the systematic Raptor codes, $k$ encoded symbols are identical to the $k$ original source symbols in the coding block while $n - k$ redundant symbols are generated. The number of encoded symbols ($n$), symbol size ($S$) and code rate ($R = k/n$) can be dynamically determined. The code rate $R$ is critical to the FEC performance since it affects both the recoverability and delay performance. Decreasing the code rate indicates higher resilience against channel losses but also induces larger end-to-end delay. Conversely, a higher code rate degrades the error robustness but improves the delay performance. To strike an effective tradeoff between delay performance and error resilience, an efficient coding algorithm is desirable for online adaptation of code rate and symbol size.

Fig. 3. Systematic Raptor coding for the video frames in a GoP.

C. Problem Formulation

It is difficult to guarantee deterministic (strict) delay bounds of all the video frames over the unreliable and time-varying wireless networks [29]. Therefore, the delay constraint in the optimization problem is formulated as minimizing the probability of deadline violations, i.e., $\arg\min \{P(D_m > T)\}$. As multiple video frames may be allocated to the same FEC block, $R_m = 0$ indicates the $m$-th frame is not at the end of a specific block. The problem of delay-constrained distortion minimization can be stated as: Given the feedback channel status, video frames $\{L_m, I_m, A_m\}_{1 \le m \le M}$, quality requirement $d_{\min}$ and delay constraint $T$, find the data allocation vector $\mathbf{n}_m = \{n_{m,p}\}_{p \in P}$ and Raptor coding parameters $\{R_m, S_m\}$ to minimize the absolute difference ($|\cdot|$) between the expected total distortion and quality requirement $d_{\min}$. Mathematically, the problem of average distortion minimization with statistical delay guarantees can be formulated as follows

$$(P1): \quad \{\mathbf{n}_m, R_m, S_m\}_{1 \le m \le M} = \arg\min \left| \frac{\sum_{m=1}^{M} d_m}{M} - d_{\min} \right|,$$

$$\text{s. t.} \quad \arg\min \{P(D_m > T)\}, \quad 1 \le m \le M, \qquad (4a)$$

$$\lambda_p \le \mu_p \cdot \left(1 - \pi_p^B\right), \qquad (4b)$$

where

$$d_m = \Pi_m + \sum_{m' \in A_m} I_{m'} \cdot \Pi_{m'},$$

$$\Pi_m = \begin{cases} 0, & \text{if } \pi_m^t + \left(1 - \pi_m^t\right) \cdot \pi_m^o < \dfrac{n-k}{n}, \\ \pi_m^t + \left(1 - \pi_m^t\right) \cdot \pi_m^o, & \text{otherwise,} \end{cases}$$

with $\pi_m^t$ and $\pi_m^o$ from Equation (3),

$$\lambda_p = \frac{\sum_{m=1}^{M} n_{m,p}}{\sum_{m=1}^{M} n_m} \cdot \frac{\sum_{m=1}^{M} (L_m / R_m)}{\sum_{m=1}^{M} L_m} \cdot \lambda,$$

$$D_m = \max_{p \in P} \left\{ \frac{n_{m,p} \cdot S}{\mu_p \cdot (1 - \pi_p^B)} + \frac{\nu_p \cdot RTT_p}{2 \times \nu_p} \right\}.$$

The term (4a) states the statistical delay guarantee $T$ for the video frames in a GoP [29]. The term (4b) requires that the video transmission rate allocated to each communication path be no larger than the available bandwidth. This optimization problem can be decomposed into the subproblems of multipath frame scheduling ($\mathbf{n}_m$) and Raptor coding adaptation ($R_m$, $S_m$). The multipath frame scheduling problem can be converted to the precedence constrained multiple knapsack problem and such problems are proven to be NP-hard [12], [36]. The FEC coding adaptation problem is also NP-hard since it is computationally prohibitive to consider all the possible values of code rate $R_m$ and symbol size $S_m$ for each video frame. For instance, if the GoP size is $M = 15$ and the number of redundant symbols is $N = 80$, there are approximately $\binom{84}{15} \approx 1.5 \times 10^{16}$ possible combinations for the redundant symbol allocation. Therefore, no polynomial-time algorithm for the optimal solution is known. The exhaustive search to obtain the global optimal solution for frame scheduling and FEC coding is not feasible for online operations in real-time video applications because of the limitation in execution time. Thus, we introduce two heuristic scheduling algorithms in the BEMA protocol to achieve sub-optimal performance.

IV. SCHEDULING ALGORITHMS OF BEMA

This section presents the scheduling algorithms of the proposed BEMA protocol for optimization problem (P1). The challenging issues can be summarized as follows. 1) In the code rate adaptation, there is an inherent tradeoff between the delay performance and error robustness to minimize the sum of total distortion. 2) In the data distribution process, it is necessary to consider the path asymmetry because involving an unreliable communication path in multipath video transport will degrade the streaming quality and network utilization. The scheduling algorithms in BEMA are interdependent.
In each decision epoch, the FEC coding scheme is invoked in the data distribution algorithm in order to estimate the total traffic rate. First, we present the FEC coding algorithm for code rate and symbol size. The second subsection describes the quality-based data distribution scheme over multiple paths.
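To make the motivation for heuristics concrete, the short Python sketch below checks the search-space estimate quoted in Section III-C and evaluates the per-path rate $\lambda_p$ against constraint (4b). The function names and the toy numbers are assumptions introduced for illustration, and the sketch additionally assumes every code rate is strictly positive so that $L_m / R_m$ is defined.

```python
from math import comb
from typing import Sequence


def path_rate(
    n_mp: Sequence[int],
    n_m: Sequence[int],
    lengths: Sequence[float],
    code_rates: Sequence[float],
    video_rate: float,
) -> float:
    """lambda_p: share of packets sent on path p times the FEC-expanded video rate."""
    share = sum(n_mp) / sum(n_m)
    expansion = sum(L / R for L, R in zip(lengths, code_rates)) / sum(lengths)
    return share * expansion * video_rate


def within_capacity(rate: float, bandwidth: float, loss_rate: float) -> bool:
    """Constraint (4b): lambda_p <= mu_p * (1 - pi_p^B)."""
    return rate <= bandwidth * (1.0 - loss_rate)


# Size of the exhaustive search space quoted in the text (binomial coefficient 84 choose 15).
print(f"{comb(84, 15):.2e}")

# Toy example: two frames, half of the packets on this path, rates in Mbps.
rate = path_rate([4, 6], [10, 10], [1000, 500], [0.8, 0.5], 2.0)
print(rate, within_capacity(rate, 2.0, 0.05), within_capacity(rate, 1.5, 0.05))
```

A feasibility check of this kind is cheap to evaluate, which is why a heuristic can afford to call it once per candidate allocation while an exhaustive search over all combinations cannot.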


A. FEC Coding 1) Code Rate: It is a challenging issue to determine the code rate value due to the tradeoff between delay performance and error robustness. We propose to introduce just enough redundant symbols to achieve the optimal tradeoff between endto-end delay and error-correction capability, and thus achieve the target quality dmin . In particular, the balance between delay performance and error robustness is achieved with a fast search algorithm in the code rate adaption. Assume the effective data loss rate of the m-th video frame will approximate the tolerable m after the code rate adaption. In this case, the loss rate nmk−k m video frame data can be successfully recovered at the receiver side. Therefore, the goal of the proposed algorithm is to m , minimize the absolute difference between the m and nmk−k m i.e., (P2) :

{Rm }1≤m≤M     n m − km  = arg min  − m , km M ρ (L m /Rm ) m=1 s. t. + ≤ T, μ · (1 − π B ) μ−λ M (Lm /Rm ) λ · m=1 ≤ μ · (1 − π B ). M m=1 Lm

To determine the value of $n_m$, the expression of $\Pi_m(n_m)$ can be obtained with Equations (2) and (3). First, we assume that an identical encoded symbol size $S$ is used across all the Raptor coding blocks. Therefore, the upper bound ($n_{\max}$) of the symbol number satisfies

$$\frac{n_{\max} \cdot S}{\mu_p \cdot (1 - \pi_p^B)} + \frac{\rho_p}{\mu_p - \lambda_p} \le T, \qquad \lambda_p \cdot \frac{n_{\max} \cdot S}{\sum_{m=1}^{M} L_m} \le \mu_p \cdot (1 - \pi_p^B).$$

Then, we have the following result:

$$n_{\max} = \min\left\{ \frac{\left[T \cdot (\mu_p - \lambda_p) - \rho_p\right] \cdot \mu_p \cdot (1 - \pi_p^B)}{(\mu_p - \lambda_p) \cdot S},\ \frac{\mu_p \cdot (1 - \pi_p^B) \cdot \sum_{m=1}^{M} L_m}{\lambda_p \cdot S} \right\}.$$

To determine the number of source symbols $k_m$, we choose the encoded symbol size ($S$) in the interval of [32, MSS] Bytes [37]. For instance, there are six possible values (i.e., 32, 64, 128, 256, 512, and 1024) if the MSS is 1024 Bytes.

$$S = \arg\min_{S \in [32, MSS]} \sum_{m=1}^{M} d_m.$$
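The two inequalities above can be rearranged directly into a bound on the symbol count. The sketch below does that in Python for a single path and also enumerates the candidate symbol sizes. It is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, rates are taken in bytes per second, lengths in bytes, the delay bound in seconds, and `rho` is treated as an already known path parameter in bytes.

```python
import math

def candidate_symbol_sizes(mss, smallest=32):
    """Powers of two from `smallest` up to the MSS, e.g. 32, 64, ..., 1024."""
    sizes = []
    size = smallest
    while size <= mss:
        sizes.append(size)
        size *= 2
    return sizes

def max_encoded_symbols(T, mu, lam, rho, pi_b, S, total_len):
    """Largest symbol count satisfying both the delay and the capacity inequality.

    Assumes mu > lam, i.e. the path is not already saturated.
    """
    loss_free = mu * (1.0 - pi_b)                       # loss-free bandwidth
    delay_bound = (T - rho / (mu - lam)) * loss_free / S
    capacity_bound = loss_free * total_len / (lam * S)
    # Clamp at zero: a negative bound means no symbol fits within the deadline.
    return max(0, math.floor(min(delay_bound, capacity_bound)))

sizes = candidate_symbol_sizes(1024)
n_max = max_encoded_symbols(T=0.25, mu=250_000, lam=100_000, rho=1_500,
                            pi_b=0.02, S=256, total_len=40_000)
print(sizes, n_max)
```

In this made-up configuration the delay inequality is the binding one; with a heavier video load `lam` the capacity inequality would take over.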

The maximum number $N = \sum_{m=1}^{M} N_m$ of redundant symbols for all the video frames can be obtained with

$$N = n_{\max} - \left\lceil \frac{\sum_{m=1}^{M} L_m}{S} \right\rceil, \tag{5}$$

in which $\lceil x \rceil$ denotes the smallest integer not less than $x$. The goal of the sub-group level Raptor coding is to appropriately allocate the $N$ redundant packets to all the $M$ frames to minimize the absolute difference between total distortion and quality requirement [Equation (6)]:

$$\{N_m\}_{1 \le m \le M} = \arg\min \left| \sum_{m=1}^{M} \underbrace{\left[ \Pi_m(N_m, S) + \sum_{m' \in A_m} I_{m'} \cdot \Pi_{m'}(N_{m'}, S) \right]}_{d_m(N_m, S)} \Big/ M - d_{\min} \right|, \quad \text{s.t.}\ \sum_{m=1}^{M} N_m \le N. \tag{6}$$

Then, the redundant symbols can be appended to the Raptor coding block based on the results of $\{N_m\}_{1 \le m \le M}$. Fig. 3 depicts the redundant symbol allocation process for sub-group level Raptor coding. The term sub-group indicates one or multiple video frames within the same video GoP. It is computationally prohibitive to obtain the optimal redundant symbol allocation for Equation (6) in online operation as there are in total $\binom{M+N-1}{M}$ possible combinations. Therefore, we introduce a fast search algorithm to obtain a sub-optimal solution with polynomial time complexity for the redundant symbol allocation. In each iteration, the proposed Raptor coding algorithm tentatively allocates one redundant symbol to each of the $M$ frames in turn, and estimates the resulting total distortion $\sum_{m=1}^{M} d_m$ with Algorithm 1. The goal of this fast search algorithm is to allocate the $N$ redundant symbols to the $M$ video frames so as to achieve the minimal $\sum_{m=1}^{M} d_m$. After allocating the redundant symbols, the code rate $R_m$ for each frame can be obtained with $L_m / (N_m \cdot S)$.

2) Symbol Size: As shown in Fig. 2, BEMA employs the TFRC unit to regulate the transmission rate of each subflow. The goal of the symbol size adjustment is to minimize, as far as possible, the TCP congestion control (AIMD) delay, and thus reduce the probability of overdue frames. Specifically, Brosh et al. [38] reveal that TCP exhibits delay performance bias towards small-size packets because AIMD is a packet-based congestion control algorithm. Therefore, it is necessary to identify the loss state and reduce the symbol size to mitigate the AIMD delay. The symbol size adjustment for each Raptor coding block is to minimize such TCP-level delay while not degrading the goodput performance of video traffic, i.e.,

$$(\mathrm{P3}):\quad \{S_m\}_{1 \le m \le M} = \arg\min \left\{ P\big(E(D_m) > T\big) \right\},$$

$$\text{s.t.}\quad \lambda \cdot \sum_{m=1}^{M} \left[ 1 - \Pi_m(R_m, S_m) \right] \le \lambda \cdot \sum_{m=1}^{M} \left[ 1 - \Pi_m(R_m, S) \right].$$
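As a small worked instance of Equation (5), the following Python sketch computes the redundancy budget and the per-frame source symbol counts for a made-up GoP. The frame lengths, the value of `n_max`, and the function names are illustrative assumptions; the clamp at zero is also an addition for robustness rather than part of Equation (5).

```python
import math

def redundancy_budget(n_max, frame_lengths, symbol_size):
    """Equation (5): symbols left for redundancy after the source data is covered."""
    source_symbols = math.ceil(sum(frame_lengths) / symbol_size)
    # Clamp at zero (assumption): no budget if the source data alone exceeds n_max.
    return max(0, n_max - source_symbols)

def source_symbols_per_frame(frame_lengths, symbol_size):
    """Number of source symbols k_m needed to carry each frame of L_m bytes."""
    return [math.ceil(length / symbol_size) for length in frame_lengths]

# Hypothetical GoP: one large I frame followed by two smaller P frames (bytes).
frames = [12_000, 3_000, 2_500]
budget = redundancy_budget(n_max=100, frame_lengths=frames, symbol_size=256)
per_frame = source_symbols_per_frame(frames, 256)
print(budget, per_frame)
```

Halving the symbol size roughly doubles every count, which is one way to see why the symbol size and the redundancy allocation have to be chosen together.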


Note that reducing the symbol size also induces larger overhead (in coding time and packet header), which in turn results in lower video quality and goodput performance [42]. Fig. 4 presents the relationship between symbol size, coding delay and average video PSNR. As shown in this figure, the coding time decreases as the symbol size becomes larger. This is because of the smaller number of XOR operations when the symbol size increases. While the encoded symbol size is less than 512 Bytes, the video PSNR increases as the value of $S$ becomes larger. The quality improvement in this interval is due to the reduction in coding time. On the other hand, a larger symbol size also indicates smaller Raptor block size and lower resilience against packet losses. Therefore, the average PSNR value gradually decreases when the symbol size is larger than 512 Bytes. We perform the symbol size adjustment with a greedy search for each of the $M$ video frames. In each loop, the symbol size is also selected from the interval of [32, MSS] Bytes.

Fig. 4. Relationship between the encoded symbol size, coding time and average video PSNR values.

Algorithm 1 FEC Coding
Input: $\{RTT_p, \mu_p, \pi_p^B, \xi_p^B\}_{p \in P}$, $T$, $L_m$;
Output: $\{R_m, S_m\}_{1 \le m \le M}$;
 1  Code Rate Adaptation:
 2  $n_{\max} = \min\left\{ \frac{[T \cdot (\mu_p - \lambda_p) - \rho_p] \cdot \mu_p \cdot (1 - \pi_p^B)}{(\mu_p - \lambda_p) \cdot S},\ \frac{\mu_p \cdot (1 - \pi_p^B) \cdot \sum_{m=1}^{M} L_m}{\lambda_p \cdot S} \right\}$;
 3  $\mathcal{S} = \{32, 64, 128, \ldots, MSS\}$;
 4  $S = \arg\min_{S \in \mathcal{S}} \sum_{m=1}^{M} d_m$;
 5  $N = n_{\max} - \left\lceil \sum_{m=1}^{M} L_m / S \right\rceil$;
 6  for $j = 1$ to $N$ do
 7      $index = 1$, $d_{temp} \Leftarrow \infty$;
 8      for $i = 1$ to $M$ do
 9          $N_i = N_i + 1$;
10          $\sum_{m=1}^{M} d_m = \sum_{m=1}^{M} \left[ \Pi_m(N_m) + \sum_{m' \in A_m} I_{m'} \cdot \Pi_{m'} \right]$;
11          for each $S \in \mathcal{S}$ do
12              $S = \arg\min_{S \in \mathcal{S}} \sum_{m=1}^{M} d_m$;
13          if $\sum_{m=1}^{M} d_m \le d_{temp}$ then
14              $index = i$, $d_{temp} = \sum_{m=1}^{M} d_m$;
15          $N_i = N_i - 1$;
16      $N_{index} = N_{index} + 1$;
17  $R_m = \frac{L_m}{N_m \cdot S}$, $1 \le m \le M$;
18  Symbol Size Adjustment:
19  for $m = 1$ to $M$ do
20      if $R_m == 0$ then
21          continue;
22      for each $S_m \in \mathcal{S}$ do
23          $k_m = \left\lceil \frac{L_m}{S_m} \right\rceil$, $n_m = \left\lceil \frac{L_m}{R_m \cdot S_m} \right\rceil$;
24          $\pi_m^t(n_m, k_m)$ = Equation (3);
25          $\pi_m^o(n_m) = \exp\left( -\frac{T}{E(D_m)} \right)$;
26          $\Pi_m(S_m)$ = Equation (2);
27          $S_m = \arg\min_{S_m \in \mathcal{S}} \{P(E(D_m) > T)\}$ s.t. $\sum_{m=1}^{M} [1 - \Pi_m(S_m)] \le \sum_{m=1}^{M} [1 - \Pi_m(S)]$;
28  Return $\{R_m, S_m\}_{1 \le m \le M}$;

Algorithm 1 summarizes the process of the proposed Raptor coding scheme and the time complexity is concluded in Proposition 2. The detailed explanations for Algorithm 1 are presented as follows.
• Line 1-17: The code rate adaptation procedure to dynamically allocate the redundant symbols for the video frames in a GoP to minimize the total distortion.
• Line 2: Determine the maximum number of encoded symbols that can be transmitted within the delay constraint.
• Line 3: Define the set of symbol sizes ($\mathcal{S}$) ranging from 32 to MSS.
• Line 4: Determine the initial symbol size $S$ that minimizes the total distortion.
• Line 5: Calculate the number of redundant symbols ($N$) to be allocated for all the $M$ video frames.
• Line 6-17: The for loop to allocate the $N$ redundant symbols frame by frame.
• Line 7: Initialize the values of the frame index and total distortion ($d_{temp}$).
• Line 8-15: The for loop to determine the number of redundant symbols for each of the $M$ frames.
• Line 9-10: Allocate one redundant symbol to the $m$-th video frame and estimate the sum of total distortion for all the video frames.
• Line 11-12: Find the symbol size to minimize the sum of total distortion.
• Line 18-27: The symbol size adjustment procedure.
• Line 19-27: The for loop to dynamically determine the symbol size for each FEC block.
• Line 20-21: Continue with the next iteration of the for loop if the $m$-th video frame is not the tail of a FEC block.
• Line 22-27: The for loop to select the symbol size from the set $\mathcal{S}$ to minimize the possibility of overdue packets.
• Line 23: Estimate the number of source ($k_m$) and encoded ($n_m$) symbols based on the value of the selected symbol size ($S_m$).
• Line 24-25: Calculate the transmission and overdue loss rates based on the estimated numbers of source ($k_m$) and encoded ($n_m$) symbols.
• Line 26: Estimate the effective data loss rate ($\Pi_m$) with Equation (2).
• Line 27: Select the symbol size (from $\mathcal{S}$) that minimizes the possibility of deadline violation of the $m$-th video frame while not increasing the effective data loss rate.
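The greedy loop of Lines 6-16 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the per-frame distortion model is a deliberately simple stand-in (a frame weight times an effective loss rate whose transmission part shrinks geometrically with each added redundant symbol), because Equations (2) and (3) are defined elsewhere in the paper. The weights, the channel loss value, and all function names are assumptions.

```python
import math

def effective_loss(pi_t, deadline, expected_delay):
    """Combine transmission loss with overdue loss, pi_o = exp(-T / E(D))."""
    pi_o = math.exp(-deadline / expected_delay)
    return pi_t + (1.0 - pi_t) * pi_o

def greedy_redundancy(weights, budget, channel_loss=0.2,
                      deadline=0.25, expected_delay=0.05):
    """Hand out `budget` redundant symbols one at a time (Lines 6-16 in spirit)."""
    alloc = [0] * len(weights)

    def total_distortion(candidate):
        # Toy model (assumption): residual transmission loss is
        # channel_loss ** (redundant symbols + 1) for each frame.
        return sum(
            w * effective_loss(channel_loss ** (n + 1), deadline, expected_delay)
            for w, n in zip(weights, candidate)
        )

    for _ in range(budget):
        best_index, best_value = 0, math.inf
        for i in range(len(alloc)):
            alloc[i] += 1                      # tentatively give frame i one symbol
            value = total_distortion(alloc)
            if value <= best_value:            # keep the best frame seen so far
                best_index, best_value = i, value
            alloc[i] -= 1                      # undo the tentative allocation
        alloc[best_index] += 1                 # commit the symbol to the winner
    return alloc

# Hypothetical importance weights: an I frame followed by two P frames.
allocation = greedy_redundancy([10.0, 3.0, 1.0], budget=4)
print(allocation)
```

Each of the `budget` rounds evaluates the distortion once per frame, which is where the polynomial rather than combinatorial cost of the heuristic comes from.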


Proposition 2: The Raptor coding scheme is a polynomial-time heuristic solution with the complexity of $O\left(M \cdot \left(\frac{N \cdot M^2}{2} + 2 \times n \cdot k\right)\right)$, in which $N$ is presented in Equation (5) and $M$ denotes the number of video frames in a GoP.

Proof: The analysis of time complexity for Algorithm 1 is presented as follows.
• Line 6-17 are executed at most $N$ times and thus the time complexity is $O(N)$.
• Line 8-15: $O(M)$. There are $M$ potential positions for each of the $N$ redundant symbols.
• Line 10: the time complexity for the video distortion estimation is $O(M \cdot (M/2 + n \cdot k))$.
• Line 19-27: $O(M \cdot n \cdot k)$. In the worst case, there are $M$ calculations in this loop, each with the complexity of $n \cdot k$ for the approximate estimation of $\pi_m^t$.
Therefore, the total time complexity of the proposed Raptor coding adaptation algorithm is $O\left(M \cdot \left(\frac{N \cdot M^2}{2} + 2 \times n \cdot k\right)\right)$.

B. Data Distribution

We seek to solve the data distribution problem by allocating the encoded symbols (belonging to different frames) according to the quality level of the communication paths. This is because involving an unreliable path in multipath transport will degrade the video quality and induce higher bandwidth consumption [39]. In particular, the path quality is defined as follows.

Definition 4 (Path Quality $Q_p$): The capability of a communication path to successfully deliver real-time traffic. Its mathematical expression is

$$Q_p = \frac{\mu_p \cdot (1 - \pi_p^B)}{RTT_p} \cdot \exp\left\{ \frac{\exp\left[1 - (1 - \Theta_p)/10\right] - 1}{\lambda_p} \right\}. \tag{7}$$

This quality parameter is based on the availability model developed in [40] and the numerator $\exp[1 - (1 - \Theta_p)/10] - 1$ in Equation (7) denotes the probability that path $p$ is available. $\Theta_p$ represents the resource availability of the communication path and we estimate $\Theta_p$ with the latest five values of the ratio of video traffic rate to available bandwidth. Equation (7) indicates that the quality of path $p$: 1) increases with the resource availability $\Theta_p$ and available bandwidth $\mu_p$; 2) decreases with the assigned video traffic rate $\lambda_p$, packet loss rate $\pi_p^B$, and round trip time $RTT_p$. Therefore, the frame scheduling algorithm is inclined to assign more video frames to the communication paths with higher reliability.

In the context of heterogeneous networks, load imbalance problems may frequently occur due to the path asymmetry [4]. To mitigate severe imbalance among different paths, we employ a load imbalance parameter $L_p$ to indicate whether path $p$ is overloaded and it can be expressed as [4]

$$L_p = \frac{\mu_p \cdot (1 - \pi_p^B) - \lambda_p}{\left[ \sum_{i=1}^{P} \left( \mu_i \cdot (1 - \pi_i^B) - \lambda_i \right) \right] / P}.$$

This parameter stems from the principle of load balancing in data distribution over multipath networks [18]. The value of $L_p$ is constrained to be smaller than a threshold limit value (TLV) and we set this upper limit to be 1.2 in the emulations [4]. For each GoP, the proposed frame-to-path mapping algorithm obtains the allocation vector to minimize the sum of total distortion subject to the constraints in delay and capacity:

$$(\mathrm{P4}):\quad \{\Psi_m\}_{1 \le m \le M} = \arg\min \left\{ \sum_{m=1}^{M} \left( \Pi_m + \sum_{m' \in A_m} \Pi_{m'} \cdot I_{m'} \right) \right\},$$

$$\text{s.t.}\quad \begin{cases} \dfrac{\mu_p \cdot (1 - \pi_p^B) - \lambda_p}{\left[ \sum_{i=1}^{P} \left( \mu_i \cdot (1 - \pi_i^B) - \lambda_i \right) \right] / P} \le \mathrm{TLV}, \\[2ex] \dfrac{\sum_{m=1}^{M} n_{m,p} \cdot S}{\mu_p \cdot (1 - \pi_p^B)} < T. \end{cases}$$

The initial rate allocation for each communication path is proportional to the loss-free bandwidth [19], i.e.,

$$\lambda_p = \lambda \cdot \frac{\mu_p \cdot (1 - \pi_p^B)}{\sum_{p \in P} \mu_p \cdot (1 - \pi_p^B)}.$$

Then, the video frame data is distributed over the available communication paths according to the quality level $Q_p$ in each iteration. With regard to the bandwidth constraint (4b), the video frames with the highest distortion will be dropped to conserve bandwidth, and thus improve the overall streaming quality. The iteration terminates when the sum of total distortion cannot be further reduced. Algorithm 2 describes the path-quality-based multipath data distribution scheme and the line-by-line explanations are as follows.
• Line 1: Initialize the values of frame scheduling vector $\Psi_m$, rate allocation $\lambda_p$ and total distortion $D$.
• Line 2-9: The while loop to distribute data across multiple paths. This loop will terminate if the total traffic rate does not exceed the available bandwidth and the total distortion cannot be further reduced.
• Line 3-4: Drop the video frame with the largest distortion.
• Line 5: Invoke the Packet Allocation procedure to determine the values of $\{\Psi_m\}_{1 \le m \le M}$.
• Line 6: Invoke Algorithm 1 (FEC coding) to estimate the coding parameters.
• Line 7-9: Update the total distortion $D$ of all the video frames in the current GoP.
• Line 11-25: The for loop to determine the scheduling vector for all the video frames.
• Line 12-16: The for loop to exclude the overloaded communication paths from $P$, which indicates the set of all the paths involved in data distribution.
• Line 13-14: Estimate the quality $Q_p$ and load balancing $L_p$ parameters for each communication path.
• Line 15-16: Remove overloaded communication paths.
• Line 17-18: Concurrently transmit the I frame data through all the available paths.
• Line 20-22: Schedule the P frame over the path with highest quality.
• Line 23-24: Or else, schedule the P frame over the path with least load.
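The rate initialization, the load imbalance parameter, and the per-frame path choice described above can be sketched as follows. This is a hedged illustration with hypothetical function names and made-up path parameters (rates in Mbps); the path quality values are passed in as plain numbers rather than computed from Equation (7), since the resource availability estimate is not reproduced here.

```python
def initial_rates(total_rate, bandwidth, loss):
    """Split the video rate in proportion to each path's loss-free bandwidth."""
    loss_free = [mu * (1.0 - pb) for mu, pb in zip(bandwidth, loss)]
    total = sum(loss_free)
    return [total_rate * lf / total for lf in loss_free]

def load_imbalance(bandwidth, loss, rates):
    """L_p: residual loss-free capacity of a path relative to the mean residual."""
    residual = [mu * (1.0 - pb) - lam
                for mu, pb, lam in zip(bandwidth, loss, rates)]
    mean_residual = sum(residual) / len(residual)
    return [r / mean_residual for r in residual]

def i_frame_shares(rates):
    """I frame data is spread over all paths in proportion to their rates."""
    total = sum(rates)
    return [lam / total for lam in rates]

def pick_path_for_p_frame(quality, imbalance, rates, bandwidth, loss, tlv=1.2):
    """Prefer the highest-quality path; otherwise fall back to the smallest L_p."""
    best = max(range(len(quality)), key=quality.__getitem__)
    within_limits = (imbalance[best] < tlv
                     and rates[best] < bandwidth[best] * (1.0 - loss[best]))
    if within_limits:
        return best
    return min(range(len(imbalance)), key=imbalance.__getitem__)

# Three hypothetical paths: bandwidth in Mbps and packet loss rates.
bandwidth, loss = [2.0, 1.0, 1.0], [0.02, 0.05, 0.01]
rates = initial_rates(2.0, bandwidth, loss)
imbalance = load_imbalance(bandwidth, loss, rates)
chosen = pick_path_for_p_frame([3.0, 1.0, 2.0], imbalance, rates, bandwidth, loss)
print(rates, imbalance, i_frame_shares(rates), chosen)
```

In this example the highest-quality path exceeds the TLV of 1.2, so the P frame falls back to the path with the smallest imbalance value, mirroring the two branches of the packet allocation procedure.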


Algorithm 2 Data Distribution
Input: $\{RTT_p, \mu_p, \pi_p^B, \xi_p^B\}_{p \in P}$, $\lambda$, $T$, $\{S_m\}$, TLV = 1.2;
Output: $\Psi_m = \{\psi_{m,p}\}_{1 \le m \le M, 1 \le p \le P}$;
 1  Initialize: $\psi_{m,p} = \{0\}_{M \times P}$, $\lambda_p = \frac{\lambda \cdot \mu_p \cdot (1 - \pi_p^B)}{\sum_{p \in P} \mu_p \cdot (1 - \pi_p^B)}$, $D = 0$;
 2  while $\left(\lambda > \sum_{p \in P} \mu_p \cdot (1 - \pi_p^B)\right)$ || $\left(\sum_{m=1}^{M} d_m < D\right)$ do
 3      $m = \max_{1 \le m \le M} \{d_m\}$;
 4      Drop frame $m$ and $M = M - 1$;
 5      Determine $\{\Psi_m\}_{1 \le m \le M}$ with Packets Allocation;
 6      Estimate $(n_m, k_m)_{1 \le m \le M}$ using Algorithm 1;
 7      $\Pi_m(n_m) = \pi_m^t(n_m) + \left(1 - \pi_m^t(n_m)\right) \cdot \pi_m^o(n_m)$;
 8      $d_m = \hat{\delta}_m + \Pi_m(\Psi_m, n_m) \cdot \delta_m + f_m$;
 9      $D = \sum_{m=1}^{M} d_m$;
10  Procedure: Packets Allocation
11  for $m = 1$ to $M$ do
12      for each communication path $p$ in $P$ do
13          $Q_p = \frac{\mu_p \cdot (1 - \pi_p^B)}{RTT_p} \cdot \exp\left\{ \frac{\exp[1 - (1 - \Theta_p)/10] - 1}{\lambda_p} \right\}$;
14          $\lambda_p = \lambda \cdot \frac{R_p}{\sum_{p \in P} R_p}$, $L_p = \frac{\mu_p \cdot (1 - \pi_p^B) - \lambda_p}{\left[\sum_{i=1}^{P} (\mu_i \cdot (1 - \pi_i^B) - \lambda_i)\right] / P}$;
15          if $(L_p > \mathrm{TLV})$ || $(\lambda_p > \mu_p \cdot (1 - \pi_p^B))$ then
16              Remove $p$ from $P$;
17      if $m == 1$ then
18          $\psi_{m,p} = \frac{\lambda_p}{\sum_{i=1}^{P} \lambda_i}$ for $1 \le p \le P$;
19      else
20          $p = \max_{p \in P} \{Q_p\}$;
21          if $(L_p < \mathrm{TLV})$ && $(\lambda_p < \mu_p \cdot (1 - \pi_p^B))$ then
22              $\psi_{m,p} = 1$
23          else
24              $p = \min_{p \in P} \{L_p\}$, $\psi_{m,p} = 1$;
25      Update the path parameters $\{L_p, Q_p\}_{p \in P}$;
26  Return $\Psi_m = \{\psi_{m,p}\}_{1 \le m \le M, 1 \le p \le P}$;

The time complexity of Algorithm 2 is presented in Proposition 3.

Proposition 3: The worst-case time complexity for Algorithm 2 is $O\left(M^2 \cdot (P^2 + 1) + M \cdot \left(\frac{N \cdot M^2}{2} + 2 \times n \cdot k\right)\right)$, where $M$ denotes the GoP length, $P$ is the number of communication paths, and $N$ is the number of parity packets.

Proof: There are at most $M$ iterations in the while loop to reduce the sum of total distortion. In each iteration, there are $M(P^2 + 1)$ operations to determine the frame allocation vector $\Psi_m$, as the complexity of reliability estimation and path selection is $P^2$. The time complexity for the FEC coding is $O\left(M \cdot \left(\frac{N \cdot M^2}{2} + 2 \times n \cdot k\right)\right)$ (Proposition 2). Thus, the worst-case time complexity of Algorithm 2 is $O\left(M^2 \cdot (P^2 + 1) + M \cdot \left(\frac{N \cdot M^2}{2} + 2 \times n \cdot k\right)\right)$.

V. PERFORMANCE EVALUATION

We evaluate the efficacy of the proposed BEMA framework by carrying out semi-physical emulations in Exata. The semi-physical emulations are different from the traditional trace-driven simulations (e.g., using NS-2 or OPNET). As shown in Fig. 5, the sender and receiver are connected to the emulation server through the Exata connection manager. The emulation setup and network environment are configured in the emulation server. The transmission ends in the emulation topology are mapped to the communication terminals in local networks to mimic real data transfer with high fidelity. First, we describe the evaluation methodology that includes the emulation setup, reference schemes, performance metrics, and emulation scenarios. Then, we depict and discuss the evaluation results.

TABLE III
PARAMETER CONFIGURATIONS OF WIRELESS NETWORKS

A. Evaluation Methodology

Fig. 5. System architecture and emulation topology for performance evaluation.

TABLE IV
VIDEO ENCODING PARAMETERS

The Exata and JM software are adopted as the network emulator and video codec, respectively. The system architecture designed for performance evaluation is presented in Fig. 5 and the main configurations are set as follows.

1) Emulation Setup:

a) Network emulator: Exata 2.1 [43] is used as the network emulator. Exata is an advanced edition of QualNet [44] in which we can perform semi-physical emulations. The proposed BEMA protocol is developed using C++ and included in the transport-layer stack of Exata. In order to implement the emulations using real-time video, we integrate the source code of JM with Exata and adopt BEMA as the transport protocol for video transmission. The development steps are presented in the Exata Programmer's Guide [43]. In the emulated network topology as shown in Fig. 5, the sender has one wired network interface and the mobile terminal is equipped with three wireless network interfaces, i.e., the cellular, Wi-Fi (802.11a/g) and WiMAX (802.16). The parameter configurations of different wireless access networks are summarized in Table III [4], [22]. As depicted in the figure, each router in the core networks is connected to four different traffic generators, which are used to generate background traffic and emulate network dynamics. The packet sizes of

the Pareto traffic generator are adapted to mimic the real traces collected on the Internet: 43% are large (>1400 Bytes), 17% are medium [(144, 1400) Bytes], and 40% are small (