
COMPLEXITY/PERFORMANCE TRADE-OFFS FOR ROBUST DISTRIBUTED VIDEO CODING

Abhik Majumdar, Rohit Puri, Prakash Ishwar† and Kannan Ramchandran
University of California, Berkeley; †Boston University
{abhik, rpuri}@eecs.berkeley.edu

ABSTRACT

In this work, we analytically study the complexity-performance trade-offs associated with video codecs based on the principle of source coding with side information at the decoder. We address three important aspects. First, we quantify the theoretical performance gains of side-information-based codecs over prediction-based coders such as MPEG in a lossy transmission scenario where drift is present. Second, we show that it is possible to closely approach MPEG's compression performance using side-information video coding principles with accurate DFD (Displaced Frame Difference) modeling, even without sophisticated channel codes. Third, we analytically show that the value of accurately estimating DFD statistics diminishes as the channel gets noisier.

1. INTRODUCTION

A number of practical distributed video coding algorithms have been proposed recently [1, 2, 3]. In this work, we present a theoretical study of the performance trade-offs inherent in video codecs based on the principle of source coding with side information. While we use simple, analytically tractable models in this analysis, even these simple models provide valuable insight into the trade-offs among complexity¹, compression performance and robustness associated with distributed video coding algorithms. The contributions of this work are three-fold. First, we analyze the performance gains of a distributed source coding approach over a classical predictive coding approach in a joint compression/lossy transmission setting. When transmitting over lossy channels, the encoder and decoder are often out of sync (due to drift), and for this case we quantify the gains of the distributed coding approach over the predictive one. Second, we show that it is possible to achieve near MPEG-like² performance using distributed video coding with accurate DFD modeling and the same motion search as MPEG, without using sophisticated channel codes. Third, we discuss the complexity-compression performance trade-offs of distributed video coding. In particular, we discuss the relative importance of correctly modeling the motion and the DFD statistics in a lossy transmission environment.

This work was funded in part by NSF grants CCR-0330514 and CCR-0219722.
¹ In this work, by complexity we refer to motion search complexity.
² In this paper, by MPEG we refer to state-of-the-art predictive video coding solutions.

Previous theoretical work [4] on distributed video coding relies on the assumption of perfect knowledge of the DFD statistics. In that scenario, it was shown that, in principle, the motion search operation can be transferred completely from the encoder to the decoder without any loss in compression performance, leading to a low-complexity encoding solution. In this paper, we recognize that for practical video coding algorithms the DFD statistics are not known a priori. In fact, the motion estimation module finds both the best predictor and the DFD statistics used for encoding. Thus there is a natural trade-off between the complexity available to the encoder and the attainable compression performance, since the absence of a motion search operation at the encoder may imply poor modeling of the DFD statistics. This is why low-encoding-complexity distributed video coding algorithms have so far been unable to match the compression performance of full-complexity MPEG. In this paper, we show that, given the right modeling of the DFD statistics, a distributed video coding approach can nearly match full-complexity MPEG without resorting to sophisticated channel coding techniques. Further, we investigate the importance of correctly modeling the DFD statistics when the data is sent over a lossy channel. In particular, we analytically show that as the channel degrades, a full motion search gives diminishing marginal utility over a coarse motion search.

2. ROBUSTNESS ANALYSIS

In this section, we study the superior robustness properties of the distributed source coding (dsc) approach over the classical predictive coding (pc) approach.
Let the random variable X be associated with the current data to be encoded. As in Figure 1, which shows the generic set-up of the problem, Y denotes the predictor at the encoder and Y + W denotes the decoder predictor, where W denotes the drift noise that cannot be observed at the encoder. When W = 0, the encoder and decoder predictors are identical and there is no drift between the encoder and the decoder. X = Y + Z corresponds to the current data to be encoded, where Z is the innovations noise between the current data X and the predictor data Y. Y, Z, W are independent random variables, and their joint statistics are assumed known at the encoder and decoder. Let H and I denote the Shannon entropy and mutual information, respectively [5]. In the following, we compare the performance of the two approaches for the discrete/lossless case and for the continuous/Gaussian case.

Fig. 1. Problem set-up.

2.1. The Perfect Reconstruction Case

Here we are interested in communicating X losslessly to the decoder, such that $\hat{X} = X$ with high probability. The encoder does not have access to the realization of the drift random variable W; it merely knows the joint statistics of Y, Z, W. Using an extension of distributed source coding techniques [6], it can be shown [7] that the encoder needs a rate

$$R_{dsc} = H(X \mid Y + W).$$

We now derive a bound on the rate required by the predictive coding approach. The predictive system first encodes the source innovations Z, which incurs a rate of H(Z). If there is no drift between the encoder and the decoder (W = 0), the decoder can recover Z and use Y to recover X. In general, however, there is drift between the encoder and the decoder, and encoding the innovations Z alone is not sufficient for recovering X; an additional drift-correction rate must be spent. Since the encoder does not have access to the realization of W (at best it knows the joint statistics), the best it can do to correct the drift is to use a distributed source coding approach, which costs an extra rate of H(X | Z, Y + W). In fact, distributed source coding approaches to drift correction have been proposed in [3, 2, 8]. Thus the total rate of the predictive coding system is lower bounded by

$$R_{pc} = H(Z) + H(X \mid Z, Y + W).$$

Using the independence of Y, Z, W (given Z, recovering X = Y + Z is equivalent to recovering Y, and Z is independent of (Y, W)), this rate is

$$R_{pc} = H(Z) + H(Y \mid Y + W),$$

where the first term represents the innovations rate and the second term can be interpreted as the rate required to "correct" the side information, i.e., to re-synchronize the encoder and decoder frame memories. So, the penalty due to drift for the predictive coding case as compared to the distributed source coding case is

$$R_{pc} - R_{dsc} = H(Z \mid X, Y + W) \geq 0.$$

This rate penalty is zero for W = 0 (no drift), since Z = X − Y is then determined by (X, Y), but it is nonzero in general.

2.2. The Gaussian Case

We now present a rate-distortion analysis of the two approaches in a jointly Gaussian setting. The random variables Y, Z, W are independent Gaussian random variables with variances $\sigma_y^2$, $\sigma_z^2$, $\sigma_w^2$, respectively. Let U denote the quantization random variable (the output of the encoder) and $\hat{X}$ the reconstruction random variable. We are interested in recovering X to a target distortion D. Using the techniques of rate-distortion coding with side information [9], we have

$$R_{dsc} = \min_{p(u|x)} I(X; U \mid Y + W) = \frac{1}{2}\log\frac{\sigma_z^2 + \frac{\sigma_y^2\sigma_w^2}{\sigma_y^2+\sigma_w^2}}{D}$$

where the minimization is carried out over all conditional densities p(u|x) such that the overall expected distortion is at most D. The predictive coding system quantizes Z to $\hat{Z}$ with distortion D. Note that $D < \sigma_z^2$, since otherwise the predictive coding system would not need to encode the innovations at all. If the encoder and the decoder use identical predictor information, $\hat{X}$ can be recovered from $\hat{Z}$ as $\hat{X} = \hat{Z} + Y$. In general, however, there is drift between the encoder and the decoder. As in Section 2.1, the encoder needs to spend additional rate (using distributed coding techniques) to correct for the drift, resulting in a total rate

$$R_{pc} = I(Z; \hat{Z}) + \min_{p(u|x)} I(X; U \mid Y + W, \hat{Z}),$$

where the second term, $\min_{p(u|x)} I(X; U \mid Y + W, \hat{Z})$, is the rate required to correct the drift. It can be shown [7] that

$$R_{pc} = \frac{1}{2}\log\frac{\sigma_z^2\left(D + \frac{\sigma_y^2\sigma_w^2}{\sigma_y^2+\sigma_w^2}\right)}{D^2}.$$
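The two Gaussian rate expressions can be evaluated numerically to see the size of the drift penalty. The sketch below is illustrative only: the function name `gaussian_rates` and the variance values are our own assumptions, logarithms are taken base 2 (rates in bits per sample), and the inputs must satisfy $0 < D < \sigma_z^2$ as in the text.

```python
import math


def gaussian_rates(var_y: float, var_z: float, var_w: float, D: float) -> tuple[float, float]:
    """Return (R_dsc, R_pc) in bits per sample for the jointly Gaussian model.

    var_y, var_z, var_w are the variances of Y, Z, W; D is the target distortion.
    (Hypothetical helper for illustration, not code from the paper.)
    """
    if not 0.0 < D < var_z:
        raise ValueError("the comparison assumes 0 < D < var_z")
    # A = Var(Y | Y + W): uncertainty about the encoder predictor left at the decoder.
    A = var_y * var_w / (var_y + var_w)
    r_dsc = 0.5 * math.log2((var_z + A) / D)          # Wyner-Ziv rate with side info Y + W
    r_pc = 0.5 * math.log2(var_z * (D + A) / D ** 2)  # innovations rate + drift-correction rate
    return r_dsc, r_pc


if __name__ == "__main__":
    var_y, var_z, var_w = 1.0, 1.0, 1.0  # illustrative values, not taken from the paper
    for D in (1e-1, 1e-3, 1e-6, 1e-12):
        r_dsc, r_pc = gaussian_rates(var_y, var_z, var_w, D)
        print(f"D={D:.0e}  R_dsc={r_dsc:6.3f}  R_pc={r_pc:6.3f}  "
              f"(R_pc-R_dsc)/R_dsc={(r_pc - r_dsc) / r_dsc:.3f}")
```

With $\sigma_w^2 = 0$ the two rates coincide, and as D shrinks the relative penalty creeps toward 1, though only logarithmically slowly.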

The difference $R_{pc} - R_{dsc}$ is given by

$$R_{pc} - R_{dsc} = \frac{1}{2}\log\frac{1 + A/D}{1 + A/\sigma_z^2},$$

where $A = \sigma_y^2\sigma_w^2/(\sigma_y^2 + \sigma_w^2)$. For the range of interest, $D < \sigma_z^2$, the difference is positive. Further, in the high-quality regime (i.e., $D \to 0$, with $\sigma_w^2 > 0$), we have $(R_{pc} - R_{dsc})/R_{dsc} \to 1$, i.e., the predictive coding system needs nearly double the rate of the distributed coding system. For details of the proofs, please refer to [7].

3. COMPLEXITY-PERFORMANCE TRADE-OFFS

To date, practical low-encoding-complexity distributed video coding algorithms [1, 2] have been unable to match the performance of a full-motion-search MPEG coder in the absence of drift. This is a consequence of the limited ability to estimate the DFD statistics accurately without investing in motion search at the encoder. In Section 3.1 below, we outline a high-encoding-complexity distributed video coding algorithm that closely approaches the compression performance of full-motion MPEG. In Section 3.2, we show that the importance of modeling the DFD statistics accurately diminishes with increasing channel noise.

3.1. Lossless Channel

As in MPEG, we run motion compensation at the encoder. Let Y be the (quantized) motion-compensated predictor in the DCT domain and X be the input block in the DCT domain. Let $X_i$ and $Y_i$ be the corresponding DCT coefficients and $Q(\cdot)$ be the quantizer map. At a high level, while MPEG sends the quantized difference $Q(X_i - Y_i)$, we send those bits of $Q(X_i)$ that cannot be inferred from $Y_i$ at the decoder. Specifically, we use the multilevel coset coding framework (as in [10]) to send $X_i$. Using the side information $Y_i$, the decoder recovers the remaining bit-planes of $Q(X_i)$ (see Figure 2) that are not sent by the encoder. The symbols output by this multilevel coding process are then passed through an entropy coder to form the final output (similar to MPEG entropy-coding the quantized difference symbols). The motion vectors are also entropy coded and sent (as in regular MPEG). For more details, please refer to [7].
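The idea of sending only the bits of $Q(X_i)$ that the decoder cannot infer from $Y_i$ can be illustrated with a toy single-coefficient coset code. This is a minimal sketch under our own assumptions, not the multilevel coset coder of [10]: a uniform scalar quantizer, the encoder transmitting only the k least-significant bit-planes of the quantization index, the decoder choosing the coset member nearest its side information, and no entropy coding. All function names and the error bound `max_err` are hypothetical.

```python
import math


def quantize(x: float, delta: float) -> int:
    """Uniform scalar quantizer Q(x): index of the nearest multiple of delta."""
    return math.floor(x / delta + 0.5)


def bitplanes_needed(max_err: float) -> int:
    """Smallest k with 2**(k-1) > max_err (hypothetical rule driven by DFD statistics)."""
    k = 0
    while 2.0 ** (k - 1) <= max_err:
        k += 1
    return k


def encode_coset(q_x: int, k: int) -> int:
    """Transmit only the k least-significant bit-planes of Q(x), i.e. its coset index."""
    return q_x % (1 << k)


def decode_coset(coset: int, y: float, delta: float, k: int) -> int:
    """Recover Q(x) as the member of {coset + m * 2**k} closest to the side info y/delta."""
    step = 1 << k
    m = math.floor((y / delta - coset) / step + 0.5)
    return coset + m * step


if __name__ == "__main__":
    delta = 1.0        # hypothetical quantizer step size
    max_err = 3.0      # hypothetical bound on |Y_i/delta - Q(X_i)| from DFD statistics
    k = bitplanes_needed(max_err)  # 3 bit-planes sent
    x, y = 13.2, 11.9  # source coefficient X_i and decoder side information Y_i
    q_x = quantize(x, delta)
    sent = encode_coset(q_x, k)
    print(q_x, sent, decode_coset(sent, y, delta, k))  # prints: 13 5 13
```

Decoding succeeds whenever the side information lies within $2^{k-1}$ quantization steps of $Q(X_i)$, so the number of transmitted bit-planes is set by how well the encoder knows the DFD statistics: an overly pessimistic model wastes bits, while an optimistic one causes decoding errors.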

Fig. 2. Partitioning the integer lattice into 3 levels. X is the source, U is the (quantized) codeword and Y is the side information. The number of levels in the partition tree depends on the effective noise between U and Y given X.

A high-encoding-complexity distributed video coding solution that matches the performance of full-motion MPEG was recently proposed in [11]. While [11] uses long-block-length state-of-the-art channel codes (LDPC codes in particular) to achieve this performance, we show in Section 4 that it is possible to achieve nearly full-motion MPEG-like performance using relatively simple arithmetic codes rather than a bank of long-block-length, high-complexity channel codes. This indicates the importance of accurate DFD statistical models rather than the need for the best available channel codes.

3.2. Lossy Channel: Diminishing Role of Motion Search

Let X be the block to be encoded. Let $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_M)$ be the set of predictors available to the encoder for the block X, and let $\mathbf{Y}' = (Y_1', Y_2', \ldots, Y_M')$ be the set of predictors available to the decoder for the same block. Here $\mathbf{Y}'$ is a noisy version of $\mathbf{Y}$ (due to previous transmission errors). We assume that $X \leftrightarrow \mathbf{Y} \leftrightarrow \mathbf{Y}'$ form a Markov chain, i.e., the set of predictors $\mathbf{Y}'$ is a degraded version of the set of predictors $\mathbf{Y}$. As in Section 2.1, we are interested in communicating X losslessly to the decoder, such that $\hat{X} = X$. The joint entropy $H(X, \mathbf{Y}, \mathbf{Y}')$ can be expanded as

$$H(X,\mathbf{Y},\mathbf{Y}') = H(\mathbf{Y}) + H(X \mid \mathbf{Y}) + H(\mathbf{Y}' \mid X,\mathbf{Y}) = H(\mathbf{Y}) + H(X \mid \mathbf{Y}) + H(\mathbf{Y}' \mid \mathbf{Y}) \tag{1}$$

where the second equality follows from $X \leftrightarrow \mathbf{Y} \leftrightarrow \mathbf{Y}'$. $H(X, \mathbf{Y}, \mathbf{Y}')$ can also be expanded as

$$H(X,\mathbf{Y},\mathbf{Y}') = H(\mathbf{Y}') + H(X \mid \mathbf{Y}') + H(\mathbf{Y} \mid X,\mathbf{Y}') \tag{2}$$

Note that the minimum rate required to losslessly communicate X to the decoder is $R_{dsc} = H(X \mid \mathbf{Y}')$, since $\mathbf{Y}'$ is the side information available to the decoder. Equating (1) and (2) and using $H(\mathbf{Y}) + H(\mathbf{Y}' \mid \mathbf{Y}) - H(\mathbf{Y}') = H(\mathbf{Y} \mid \mathbf{Y}')$, we find that this rate is

$$H(X \mid \mathbf{Y}') = H(\mathbf{Y} \mid \mathbf{Y}') + H(X \mid \mathbf{Y}) - H(\mathbf{Y} \mid X,\mathbf{Y}')$$

Since we are interested in observing the effect of channel noise, we upper bound $R_{dsc}$ by neglecting the last term, which is nonnegative (and can at most increase to $H(\mathbf{Y} \mid X)$ as the noise increases). So,

$$R_{dsc} \le H(\mathbf{Y} \mid \mathbf{Y}') + H(X \mid \mathbf{Y}) \tag{3}$$

Note that the term $H(X \mid \mathbf{Y})$ is a measure of the source correlation, while $H(\mathbf{Y} \mid \mathbf{Y}')$ is a measure of the effect of channel noise. Since

$$H(X \mid Y_1) \ge H(X \mid Y_1, Y_2) \ge \cdots \ge H(X \mid Y_1, \ldots, Y_M) = H(X \mid \mathbf{Y}),$$

we obtain better estimates of $H(X \mid \mathbf{Y})$ as we search the list of predictors (motion search). However, as (3) shows, $R_{dsc}$ is bounded by the sum of a correlation-noise term and a channel-noise term. For a given source, the reduction in $R_{dsc}$ obtained by estimating the correlation noise more accurately (through more motion search) diminishes as the channel noise increases. The above discussion can also be extended to a jointly Gaussian set-up similar to Section 2.2 and Figure 1, where we want to recover X to a target distortion D at the decoder. The predictors available to the encoder are $Y_1, Y_2, \ldots, Y_M$, and the predictors available to the decoder are $Y_1 + W, Y_2 + W, \ldots, Y_M + W$. The $Y_i$ and W are Gaussian random variables, and W is independent of $Y_i$ for all i and has variance $\sigma_w^2$. It can be shown [7] that if $\Delta R$ is the rate reduction obtained from full motion search as compared to a limited amount of search, then

$$\frac{d(\Delta R)}{d\sigma_w^2} < 0.$$

So, the rate rebate diminishes as the channel noise increases.
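The diminishing-returns argument can be made concrete with two toy models. Both are our own assumptions for illustration, not the models analyzed in [7]. (i) A single-predictor binary model: Y uniform, X = Y xor Z with Z ~ Bernoulli(q) (correlation noise, smaller q for a finer motion search), and Y' = Y xor N with N ~ Bernoulli(p) (channel noise), so that $X \leftrightarrow Y \leftrightarrow Y'$ holds and $R_{dsc} = H(X \mid Y') = h(q \star p)$, which obeys the bound (3), $h(q \star p) \le h(p) + h(q)$. (ii) A Gaussian model that reuses the Section 2.2 expression for $R_{dsc}$, with motion-search quality entering only through the correlation-noise variance $\sigma_z^2$ (coarse versus fine); D cancels in the difference. The function names are hypothetical.

```python
import math


def h2(p: float) -> float:
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bconv(a: float, b: float) -> float:
    """Crossover probability of two cascaded binary symmetric channels (a * b)."""
    return a * (1 - b) + b * (1 - a)


def binary_rebate(q_coarse: float, q_fine: float, p: float) -> float:
    """Rate saved (bits) by a finer search in the toy binary model.

    R_dsc = H(X|Y') = h2(q * p) because X xor Y' = Z xor N is independent of Y'.
    """
    return h2(bconv(q_coarse, p)) - h2(bconv(q_fine, p))


def gaussian_rebate(var_z_coarse: float, var_z_fine: float, var_y: float, var_w: float) -> float:
    """Delta R (bits) assuming R_dsc = 0.5*log2((var_z + A)/D), A = var_y*var_w/(var_y + var_w)."""
    A = var_y * var_w / (var_y + var_w)
    return 0.5 * math.log2((var_z_coarse + A) / (var_z_fine + A))


if __name__ == "__main__":
    for p in (0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5):  # channel noise in the binary model
        print(f"p={p:.2f}  binary rebate={binary_rebate(0.2, 0.05, p):.4f} bits")
    for var_w in (0.0, 0.5, 1.0, 4.0):              # drift variance in the Gaussian model
        print(f"var_w={var_w:.1f}  Gaussian rebate={gaussian_rebate(2.0, 1.0, 1.0, var_w):.4f} bits")
```

In the binary model the rebate vanishes entirely as p approaches 1/2; in the Gaussian toy model it shrinks monotonically but stays positive, because $A \le \sigma_y^2$ however large $\sigma_w^2$ becomes.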

[Figure 3: PSNR (dB) versus encoding rate (kbps), with curves for the proposed DVC algorithm and H.263+; panels (a) and (b).]

Fig. 3. Comparison of the proposed distributed video coding (DVC) algorithm and H.263+ for (a) the Football sequence (352x240, 15 fps) and (b) the Stefan sequence (176x144, 15 fps).

[Figure 4: PSNR (dB) versus packet drop rate (%), with curves for PRISM, H.263+ + FEC and H.263+; panels (a) and (b).]

Fig. 4. Comparison of the PRISM distributed video coding algorithm, H.263+, and H.263+ protected with forward error correcting (FEC) codes (Reed-Solomon codes, with 20% of the total rate used for parity bits) over a simulated CDMA2000 1X channel for (a) the Football sequence (352x240, 15 fps, 1700 kbps) and (b) the Stefan sequence (176x144, 15 fps, 720 kbps).

4. EXPERIMENTAL RESULTS

Figure 3 shows a performance comparison of the algorithm presented in Section 3.1 and H.263+³ for the Football and Stefan sequences. We use the same motion search as in the H.263+ coder, and arithmetic codes for entropy coding. As can be seen from the figure, the proposed scheme performs only marginally worse than H.263+. Figure 4 shows a performance comparison of the PRISM distributed video coder [1] and H.263+ over various error rates⁴. PRISM does not perform any motion search at the encoder, and so it loses to H.263+ at a 0% loss rate due to inaccurate modeling of the DFD statistics. However, as the channel noise increases, the importance of such accurate modeling diminishes (as described in Section 3.2), and the robustness advantages of distributed video coding start to dominate, leading to significant performance gains (even over H.263+ protected with FEC).

³ Free version obtained from UBC.
⁴ The channel simulator was for a wireless network conforming to the CDMA2000 1X standard.

5. CONCLUSIONS & FUTURE WORK

We have presented complexity-performance trade-offs for robust distributed video coding. Our ongoing work consists of studying more accurate video models that will aid the further development of practical distributed video codecs.

6. REFERENCES

[1] R. Puri and K. Ramchandran, "PRISM: A New Robust Video Coding Architecture Based on Distributed Compression Principles," in Proc. 40th Allerton Conference on Communication, Control and Computing, Allerton, IL, October 2002.
[2] B. Girod, A. Aaron, S. Rane, and D. Rebollo-Monedero, "Distributed Video Coding," Proceedings of the IEEE, January 2005.
[3] A. Sehgal and N. Ahuja, "Robust Predictive Coding and the Wyner-Ziv Problem," in Proc. DCC, 2003.
[4] P. Ishwar, V. M. Prabhakaran, and K. Ramchandran, "Towards a Theory for Video Coding Using Distributed Compression Principles," in Proc. International Conference on Image Processing (ICIP), Barcelona, Spain, September 2003.
[5] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley and Sons, New York, 1991.
[6] D. Slepian and J. K. Wolf, "Noiseless Coding of Correlated Information Sources," IEEE Transactions on Information Theory, vol. 19, pp. 471–480, July 1973.
[7] A. Majumdar, R. Puri, P. Ishwar, and K. Ramchandran, "Complexity performance trade-offs for robust distributed video coding," UCB/ERL Technical Report, 2005.
[8] A. Majumdar, J. Wang, K. Ramchandran, and H. Garudadri, "Drift reduction in predictive video transmission using a distributed source coded side-channel," in Proc. ACM Multimedia, 2004.
[9] A. D. Wyner and J. Ziv, "The Rate-Distortion Function for Source Coding with Side Information at the Decoder," IEEE Transactions on Information Theory, vol. 22, no. 1, pp. 1–10, January 1976.
[10] A. Majumdar, J. Chou, and K. Ramchandran, "Robust Distributed Video Compression based on Multilevel Coset Codes," in Proc. Asilomar, November 2003.
[11] A. Sehgal, A. Jagmohan, and N. Ahuja, "Scalable Video Coding Using Wyner-Ziv Codes," in Proc. Picture Coding Symposium (PCS), 2004.