ADAPTIVE ERROR CONTROL FOR PACKET VIDEO IN THE INTERNET

Jean-Chrysostome Bolot
INRIA
2004, route des Lucioles
06902 Sophia Antipolis Cedex, France
[email protected]

ABSTRACT

Anecdotal evidence suggests that the quality of many videoconferences in the Internet is mediocre because of high packet loss rates. This makes it important to design and implement mechanisms that minimize packet loss and its impact in video (and audio) applications. There are two such types of mechanisms. Rate control mechanisms attempt to minimize the amount of packet loss by matching the bandwidth requirements of a video flow to the capacity available in the network. However, they do not prevent packet loss altogether. Error control mechanisms attempt to minimize the visual impact of lost packets at the destinations. In this paper, we provide motivation for using error control mechanisms based on forward error correction (FEC) and packet reconstruction. We examine a specific mechanism, and evaluate its cost as well as the benefit expected from using it. This mechanism can be augmented to obtain a joint source/channel coding scheme suitable for both the current and the future integrated services Internet.

1. INTRODUCTION

The current Internet provides a single class best effort service. From an application's point of view, this service amounts in practice to providing channels with characteristics such as delay and loss distributions that vary with time. These variations are not known in advance, and they make it essentially impossible for the network to provide end users of a video connection with guaranteed video quality (expressed in terms of a minimum frame rate or a minimum SNR). One way to provide good quality video in such a network is to have video applications adapt to the service provided by the network, which in practice amounts to adapting the coding, transmission, reception, and decoding processes based on the characteristics of the channels. This

Thierry Turletti
MIT LCS
45 Technology Square
Cambridge, MA 02139, USA
[email protected]

adaptation is typically done by control mechanisms. In this paper, we focus on the adaptation of video applications to the characteristics of the loss process of video packets in the network. This adaptation can be done using two types of mechanisms. Rate control mechanisms (whether source-based [4] or destination-based [10]) attempt to minimize the amount of packet loss by matching the bandwidth requirements of a video flow to the capacity available in the network. However, they do not prevent packet loss altogether. Error control mechanisms then attempt to minimize the visual impact of loss at the destinations. In Section 2, we describe approaches used in current applications. In Section 3, we provide motivation for using open-loop error control mechanisms based on forward error correction (FEC). Such mechanisms add redundant information in the video packets sent by the source so that (some of) the original data can be recovered from the redundant data. We examine and evaluate a specific mechanism in Section 4. We briefly describe in Section 5 how this mechanism can be extended into a joint source/channel coding scheme suitable for both the current and the future integrated services Internet.

2. ERROR CONTROL USING ROBUST CODING

The quality of the video delivered from a source to a destination depends in an essential way on the coding scheme used at the source and on the packet loss process in the network. With compression algorithms such as H.261 or MPEG, the loss of a single packet might degrade video quality over a large number of frames, specifically until the next intracoded frame is received [11], [12]. Intracoded mode is used at periodic intervals. However, the interval between the receipt of two intracoded blocks of image can be very large in congested networks (since intracoded blocks can themselves be lost with non negligible probability) and/or in low bandwidth networks (since the interval between two such blocks being sent by the source is large). Three approaches can be used to tackle this problem.

One approach is to reduce the time between intracoded blocks of image, in the extreme case to a single frame. This approach is used for example in Motion-JPEG. However, it clearly has large bandwidth requirements.

Another approach is to intra-code and transmit only those blocks in a frame that change more than some threshold. This approach, referred to as conditional replenishment, is used in nv [5] and vic [8]. In vic, blocks from the conditional replenishment stage are transformed using a cosine basis (vic refers to this coding algorithm as intra-H.261). In nv, they are transformed using a cosine or Haar wavelet basis. In both coders, a background process also transmits at a low rate all the blocks in the image to ensure that all blocks are eventually transmitted. Note that intraframe coding has several advantages over interframe coding beyond higher robustness to packet loss, including lower CPU requirements (since there is no need to include a decoder at the source for the inverse quantization and inverse cosine transform computations) and no need for a NACK-based scheme for error recovery. However, its bandwidth requirements are higher by about 30% than those of interframe coding [15].

Thus, another approach is to use both intra and interframe coding. The problem however, as we mentioned earlier, is to achieve good resilience to packet loss. This is done by having the rate at which intracoded frames are sent depend on the loss rate observed in the network. This approach is used in the H.261 coder of ivs [15]. If the number of destinations is small, a NACK-based scheme is used. As the number of destinations increases, and/or as network congestion increases, the scheme in ivs becomes closer to that used in conditional replenishment. However, the ivs coder still includes a decoder (unlike the optimized coder in vic). Furthermore, ivs keeps track for each macroblock (MB) of the time at which this MB was last coded. The background refreshment process then scans MBs and ensures that each MB is encoded at least once within some interval, the length of which is also a function of the packet loss rate measured in the network (note that this background process can thus be included in the movement detection procedure; we could also pick and refresh MBs at random).

The approaches above all adjust the mixture of inter and intracoded frames to minimize the impact of packet loss on image quality. An altogether different approach, which applies to both inter and intracoded frames, uses open-loop error control mechanisms such as Forward Error Correction (FEC). In these mechanisms, redundant information is transmitted along with the original information so that (at least some of) the lost original data can be recovered from the redundant information. These mechanisms are attractive because they provide resilience to loss without increasing latency. Furthermore, FEC schemes for packet audio have been found to considerably decrease the perceived loss rate at destinations and hence to improve audio quality [3, 7]. We examine FEC schemes for packet video in the remainder of the paper.
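As a concrete illustration, the conditional replenishment idea described above (transmit only blocks that changed beyond a threshold, plus a low-rate background refresh so that every block is eventually sent) can be sketched as follows. The flat block layout, mean-absolute-difference detector, threshold, and refresh interval are our illustrative assumptions, not values taken from nv, vic, or ivs.

```python
# Sketch of conditional replenishment with a background refresh process.
# Frames are modeled as lists of blocks; each block is a list of pixels.

def blocks_to_send(cur, prev, last_sent_at, frame_no, threshold=10, interval=30):
    """Return indices of blocks to (re)transmit for this frame.

    cur, prev    -- current and previous frame (lists of pixel blocks)
    last_sent_at -- per-block frame number of the last transmission (updated)
    threshold    -- mean absolute difference that triggers replenishment
    interval     -- background refresh: resend any block older than this
    """
    send = []
    for i, (cb, pb) in enumerate(zip(cur, prev)):
        mad = sum(abs(c - p) for c, p in zip(cb, pb)) / len(cb)
        if mad > threshold or frame_no - last_sent_at[i] >= interval:
            send.append(i)
            last_sent_at[i] = frame_no
    return send
```

A block that never changes is still retransmitted every `interval` frames by the background refresh, which is what bounds the damage of a lost intracoded block.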

3. ERROR CONTROL USING FEC

The potential of FEC mechanisms to recover from losses depends on the characteristics of the packet loss process in the network. Clearly, FEC mechanisms are more effective when lost packets are dispersed throughout the stream of packets received at a destination. Measurements of audio packet losses in the Internet have shown that most loss periods are short (i.e. the average number of consecutively lost packets is small) [2], and that the loss process can be modeled reasonably accurately with the familiar 2-state Gilbert model. We have not yet analyzed the loss process of video packets, but we expect a somewhat similar result to hold (this assumes of course that the source does not send packets in bursts), and indeed the results shown in the table of Section 4 tend to support this claim. This makes FEC schemes attractive not only for audio, but for video applications as well.

The simplest way to add redundancy to a video packet is to add no redundancy at all. Indeed, it is possible to recover at the destination from packet losses using simple loss concealment techniques such as spatial and temporal interpolation. Spatial interpolation reconstructs a missing piece in a frame from its adjacent (presumably non missing) regions in the same frame. Its performance is mediocre for block transform coders such as the H.261 coder used in IVS because several consecutive lines are damaged when a packet is lost. Temporal interpolation replaces the missing region in a frame with the corresponding region in a previous frame. If motion vectors are available, temporal interpolation replaces the missing region with the region specified by the vectors [6]. Spatial and temporal interpolation can be used together, and combined with interleaving to yield good robustness to loss [16].

Many FEC mechanisms proposed in the literature

involve exclusive-OR operations, the idea being to send after every k data packets a redundant packet obtained by exclusive-ORing those k packets [14]. These mechanisms increase the send rate of the source by a factor of 1/k, and they add latency since k packets have to be received before a lost packet can be reconstructed.
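A minimal sketch of this XOR-based scheme follows: after k data packets, one parity packet equal to their byte-wise exclusive-OR is sent, and any single loss within the group can be rebuilt from the k packets that did arrive. The helper names are ours, and packets are assumed to be of equal length.

```python
def xor_packets(packets):
    """Byte-wise XOR of a list of equal-length byte strings."""
    out = bytearray(len(packets[0]))
    for p in packets:
        for i, b in enumerate(p):
            out[i] ^= b
    return bytes(out)

def make_group(data_packets):
    """k data packets -> k + 1 packets on the wire (data plus parity)."""
    return list(data_packets) + [xor_packets(data_packets)]

def recover_missing(received):
    """Rebuild the single missing packet of a group: XORing the k
    received packets (including the parity packet) yields the lost one."""
    return xor_packets(received)
```

With k = 3, the send rate grows by 1/3, and a packet lost early in a group cannot be rebuilt until the parity packet arrives, which is exactly the latency cost mentioned above.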

4. DESCRIPTION OF OUR SCHEME

We have developed a new scheme in which video packet number n includes redundant information about previous packets {n - i}. We refer to the maximum value of i as the order of the scheme. For example, packet n might include, in addition to the information it would include in the absence of FEC, additional information about packet n - 1 in a scheme of order 1, or about both packets n - 1 and n - 2 in a scheme of order 2, etc. If packet n is lost, the destination waits for packet n + 1, decodes the redundant information, and displays the reconstructed information. The amount of redundant information (and hence possibly the order of the scheme) is adjusted over time depending on the measured loss process in the network. For example, it might change from including no information (i.e. no redundant information is sent by the source) to including information about packet n - 1 if the loss rate in the network has been found to increase.

Different kinds of redundant information can be used. In our scheme, the redundant information about packet n - k included in packet n is made up of the MBs that were sent in packet n - k, however typically encoded with a lower definition (i.e. at a higher compression rate obtained using for example a coarser quantizer). Furthermore, note that only those MBs encoded in packet n - k but not encoded in packets n - i (i in [0, k[) have to be present in the FEC information in packet n. Thus, the bandwidth requirements of our scheme are expected to be relatively low because a MB encoded in packet n will likely be encoded again in packet n + 1 (since movement in a region of the picture is likely to last over several frames). Its buffer requirements depend on the order of the scheme. Indeed, the coder has to keep a copy of the DCT coefficients of all the MBs sent in packets {n - i, i <= k}, where k denotes the current order of the FEC scheme. However, the coder does not need to compute the DCT coefficients again for these redundant MBs; it only needs to carry out the quantization and Huffman encoding calculations.

In practice, the redundant information can be included in the flow of packets sent by the source using the extension mechanism in RTPv2 [13]. The extension header includes 2 bytes (the defined-by-profile bytes) that identify the use of the FEC scheme, and 2 bytes
for the length of the extension. The extension includes the redundant information proper with appropriate additional information such as the quantizer value used for each redundant information block, the number of such blocks, etc. We do not describe the details of the packet format for lack of space.

Our FEC scheme is designed to run in parallel with the rate control mechanism already implemented in ivs [15, 4]. Consider as an example the case when the source uses MBs from packets n - 1 and n - 2 as redundant information in packet n. Since the redundant information corresponds to old frames, it is reasonable to encode the MBs from the older packets, i.e. redundant packets n - 1 and n - 2, using quantizer values larger than those used for packet n. Let Q_n denote the quantizer value for packet n. The coder typically goes through the following steps:

1. Mark the MBs that have to be encoded in the new frame using the movement detection algorithm.
2. Identify duplicate MBs in frames n, n - 1, n - 2 (i.e. MBs already marked in these frames). Keep only the latest DCT coefficients from these MBs.
3. Quantize MBs from frame n - 2 with quantizer value Q_{n-2} chosen so that Q_{n-2} > Q_{n-1}; do Huffman encoding.
4. Quantize MBs from frame n - 1 with quantizer value Q_{n-1} > Q_n; do Huffman encoding.
5. Quantize MBs from frame n with quantizer value Q_n; do Huffman encoding.

Of course, redundant information from flows n - 1 and n - 2 is decoded at the receiver only in case of packet loss.

For convenience, we use the notation (n, {n - i}) to indicate that video packet n includes as redundant information MBs from packets in {n - i}. In the example above, we considered the combination (n, n - 1, n - 2). To each combination of main and redundant information we can associate a cost (we briefly discussed bandwidth and storage cost above) and a perceived loss rate, which is the loss rate after reconstructing or concealing the missing packets using the redundant information. We compute this perceived loss rate for a two-state Gilbert loss process. The process is in state 1 at step n if packet n is lost, and in state 0 if packet n is received correctly. We denote by p the transition probability from state 0 to state 1, and by q the transition probability from state 1 to state 0. The average loss rate without reconstruction is equal to p/(p + q). The table below shows the perceived loss rate for different combinations, obtained from the Gilbert model, and from measures collected between INRIA in France
and University College London in the UK. For the measures, the values of the Gilbert parameters are p = 0.08 and q = 0.76.
Combination      Theory                        Measured   Predicted
(n)              p/(p+q)                       9.5%       9.5%
(n, n-1)         p(1-q)/(p+q)                  1.4%       2.2%
(n, n-2)         (p^2 q + p(1-q)^2)/(p+q)      1.3%       1.2%
(n, n-1, n-2)    p(1-q)^2/(p+q)                0.4%       0.5%
The agreement between theory and experiments is quite good. As expected, adding redundant information decreases the perceived loss rate. We note that the last combination in the table might be overkill when network load is low and/or losses are rare. Thus, we need a mechanism to adjust the amount of redundancy added at the source based on the loss process in the network as measured at the destination. One such mechanism has already been implemented in an audio tool developed at INRIA [3].
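One plausible shape for such an adjustment mechanism is a loss-driven controller with hysteresis, so the FEC order does not oscillate when the measured loss rate hovers near a threshold. This is a hypothetical sketch; the threshold values are ours and are not taken from the INRIA audio tool [3].

```python
def adjust_order(current_order, measured_loss,
                 raise_at=0.05, lower_at=0.01, max_order=2):
    """Raise the FEC order when the loss rate reported by the destination
    exceeds raise_at; lower it when the loss rate falls below lower_at.
    The gap between the two thresholds provides hysteresis."""
    if measured_loss > raise_at and current_order < max_order:
        return current_order + 1
    if measured_loss < lower_at and current_order > 0:
        return current_order - 1
    return current_order
```

The destination's measured loss rate would in practice reach the source through periodic receiver reports, and the chosen order maps directly onto the combinations of the table above (order 0 = (n), order 1 = (n, n-1), order 2 = (n, n-1, n-2)).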

5. CONCLUSION

The results above show that the combination of FEC and packet reconstruction/concealment yields good results. The next step is to use a subband video coder (we are using a wavelet coder loosely based on the wavelet decomposition described in [1]) that would allow the development and evaluation of efficient joint source/channel coding schemes. The goal then is to make sure that the visually more important bands are received without error or loss at the destinations. One way to do this in the current single class best effort service Internet is to use FEC schemes to provide unequal error protection proportional to the importance of each band. This amounts in practice to allocating more FEC information to these bands. The problem then is a bit allocation problem: how many bits of main and redundant information should be allocated to each band given a model for the network and a distortion function that takes into account the impact on visual quality of packet loss and of the value of the quantizer used for the band? This problem can be expressed and solved using standard optimization techniques. In practice, two issues are difficult to handle, namely i) finding an appropriate expression for the distortion function and ii) finding an appropriate network model that captures the essential features of both the video connection under study (such as the increased bandwidth requirements, and hence possibly loss rate, of a connection using a FEC scheme of high order) and of the other connections (possibly other video connections using the FEC scheme) sharing resources with the connection under study. This makes it necessary to consider loss models more complex than the Gilbert model above. We are examining Gilbert-like models with state-dependent transition probabilities.

6. REFERENCES

[1] M. Antonini, M. Barlaud, P. Mathieu, I. Daubechies, Image coding using wavelet transform, IEEE Trans. Image Processing, vol. 1, no. 2, pp. 205-221, 1993.
[2] J-C. Bolot, A. Vega Garcia, The case for FEC-based error control for packet audio in the Internet, to appear in ACM Multimedia Systems.
[3] J-C. Bolot, A. Vega Garcia, Control mechanisms for packet audio in the Internet, Proc. IEEE Infocom '96, San Francisco, CA, pp. 232-239, April 1996.
[4] J-C. Bolot, T. Turletti, Experience with control mechanisms for packet video in the Internet, INRIA report, March 1996.
[5] R. Frederick, Experiences with real-time software video compression, Proc. 6th Packet Video Workshop, Portland, OR, Sept. 1994.
[6] M. Ghanbari, V. Seferidis, Cell loss concealment in ATM video codecs, IEEE Trans. Circuits Syst. Video Tech., vol. 3, no. 3, pp. 238-247, June 1993.
[7] V. Hardman, A. Sasse, M. Handley, A. Watson, Reliable audio for use over the Internet, Proc. INET '95, Honolulu, HI, pp. 171-178, June 1995.
[8] S. McCanne, V. Jacobson, vic: A flexible framework for packet video, Proc. ACM Multimedia '95, Nov. 1995.
[9] S. McCanne, M. Vetterli, Joint source/channel coding for multicast packet video, Proc. ICIP '95, Washington, DC, Oct. 1995.
[10] S. McCanne, M. Vetterli, Receiver-driven layered multicast, to appear in Proc. ACM Sigcomm '96, Stanford, CA, Sept. 1996.
[11] Video codec for audiovisual services at p x 64 kb/s, ITU-T Recommendation H.261, 1993.
[12] MPEG-2 Video standard, ISO/IEC 13818-2.
[13] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, RTP: A transport protocol for real-time applications, Internet draft, Audio-video transport working group, June 1995.
[14] N. Shacham, P. McKenney, Packet recovery in high-speed networks using coding and buffer management, Proc. IEEE Infocom '90, San Francisco, CA, pp. 124-131, May 1990.
[15] T. Turletti, C. Huitema, IVS videoconferencing over the Internet, to appear in IEEE/ACM Trans. Networking.
[16] Q-F. Zhu et al., Coding and cell loss recovery in DCT-based packet video, IEEE Trans. Circuits Syst. Video Tech., vol. 3, no. 3, pp. 248-258, June 1993.