Design and Implementation of Content-Adaptive Background Skipping for Wireless Video

Yi Liang, Haohong Wang, and Khaled El-Maleh
Qualcomm CDMA Technologies, San Diego, CA 92122, USA
{yliang, haohongw, kelmaleh}@qualcomm.com

Abstract: This work presents a low-complexity system implementation of a novel content-adaptive background skipping scheme for region-of-interest (ROI) video coding in mobile video phone applications. To improve the overall perceptual quality, the proposed approach reallocates bits to ROI macroblocks by adaptively skipping the non-ROI and using a weighted bit allocation scheme. The skip decision for the non-ROI is adaptively determined by the content information of the video, such as foreground shape deformation, foreground motion, background motion and background texture complexity. Experimental results demonstrate that the proposed scheme outperforms traditional schemes by up to 2 dB.

I. INTRODUCTION

Video telephony has been gaining popularity on mobile phones as high-speed cellular networks are being deployed worldwide. Region-of-interest (ROI) video coding has become a notable technique in video phone applications, providing enhanced coding efficiency and superior perceptual quality [1-5]. For mobile video telephony with a tight bitrate constraint (typically lower than 64 Kbps), skipping the unimportant portion of a frame, i.e. not coding it but replacing it with the co-located part of a prior frame, allows more bits to be spent on the more important portion. By skipping the non-ROI, e.g. the background, and reallocating bits to the ROI, e.g. the foreground, the overall perceptual quality can be significantly improved, since human vision tends to pay more attention to the ROI.

The concept of adaptive frame/object/macroblock skipping has attracted considerable attention in the literature. To determine the skip modes, the tradeoff between spatial and temporal quality is studied in [6], [7], using the perceptual rationale that the human visual system (HVS) is more sensitive to temporal changes when a frame contains high motion activity, and more sensitive to spatial details otherwise. In our previous work [8], background skipping was proposed to reallocate bits to further improve the quality of the ROI: every two consecutive frames are grouped into a unit in which the non-ROI of the second frame may be skipped for bit reallocation.

The major contribution of this work is the content-adaptive background skipping scheme applied to ROI video coding, as well as the underlying low-complexity and low-power system implementation. In this framework we jointly consider background skipping and optimal bit allocation at the frame level and the macroblock (MB) level, respectively. The skip decision is mainly determined by the foreground shape deformation, foreground motion, background motion and the distortion due to the skip. An optimized weighted bit allocation algorithm is used to allocate bits to the foreground and background macroblocks. The scheme proposed in this work differs from [8] in that background skipping adapts to video content variation instead of following a fixed structure.

The rest of the paper is structured as follows. Section II presents the content-adaptive background-skipping scheme and its system implementation. Section III describes coding of the ROIs and bit allocation at the macroblock and frame levels. Experimental results are presented in Section IV.

II. CONTENT-ADAPTIVE BACKGROUND-SKIPPING AND UNDERLYING ARCHITECTURE

In this section we present the scheme of content-adaptive background skipping as well as the underlying system architecture.

A. System Architecture and Implementation

Fig. 1 depicts the system architecture of the proposed ROI video coding, which follows frame-by-frame and macroblock-by-macroblock processing. The system employs a ρ-domain algorithm for rate control [9] at the frame level, where ρ represents the number of non-zero quantized DCT AC coefficients in a macroblock. A bit budget is allocated to each frame before it is coded, using the ρ-domain model, which considers the remaining available bits and the number of frames in the rate control window [9]. The ROI of the frame is detected and tracked so that the macroblocks in the frame are classified into ROI macroblocks and non-ROI macroblocks. Since head-and-shoulder scenes dominate in video phone applications, ROI detection is based on skin-tone detection and analysis [3]. Motion estimation is hardware-accelerated, and the motion information obtained for each macroblock is used in the background skipping logic described in Subsection B. The decision logic chooses to skip a non-ROI macroblock when certain conditions are met. The ρ budget for the current frame is then adjusted, and MB-level bit allocation is performed accordingly. Transform-based coding, including DCT and quantization, is then performed in the accelerated unit.

[Figure 1 flowchart: Initialization; Frame-level bit budgeting; Fetch frame; ROI detection; Motion estimation; Background skipping decision. If the decision is Y: MB-level bit allocation, then coding of ROI macroblocks. If the decision is N: MB-level bit allocation, then coding of all macroblocks.]

Figure 1. System architecture of ROI video coding.

In implementing such an ROI video coding system, hardware and software have been co-designed to work in synergy, with the tradeoff between processing speed and design flexibility optimized. The most computationally intensive functional blocks, including motion compensation and transform coding, can be performed in a special-purpose hardware unit to speed up the process and reduce overall power consumption. ROI detection and the background skipping decision logic are implemented in software to allow design flexibility. Taking background skipping as an example, the decision is made in software, which sends the corresponding parameters to the hardware unit to program the subsequent coding blocks. The software implementation of the decision logic allows convenient adjustment and tuning.

B. Content-Adaptive Background Skipping Decision

In this work, video content information, including foreground motion activity, background motion activity, and the accumulated distortion due to background skipping, is taken into consideration when making the skip decision, which goes one step further than our earlier work [8]. The decision logic chooses to skip the background in the current frame when either of the following two sets of conditions is met:

i) Foreground motion activity in high percentile (defined by threshold) among foreground activities of the most recent frames in a window; and background motion activity not sufficiently higher (defined by threshold) than that of the previous frame; and the accumulated distortion due to background skipping is not higher than defined threshold.

ii) Set i) not satisfied; and background motion activity not in high percentile (defined by threshold) among background activities of the most recent frames in a window; and foreground motion activity sufficiently higher (defined by threshold) than that of the previous frame.
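The decision logic of this subsection can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper says only that the thresholds and window size are predefined and programmable, so the class name `SkipDecider`, the default values, and the specific tests used for "high percentile" (a rank within the window) and "sufficiently higher" (a ratio to the previous frame) are all hypothetical. Motion activity is computed as the sum of motion vector lengths over a region, as described in the text.

```python
from collections import deque
import math


def motion_activity(motion_vectors):
    """Sum of motion-vector lengths over the macroblocks of one region."""
    return sum(math.hypot(dx, dy) for dx, dy in motion_vectors)


def in_high_percentile(value, history, fraction):
    """True if `value` is >= at least `fraction` of the values in `history`.

    Assumed interpretation of "in high percentile among recent frames".
    """
    if not history:
        return False
    rank = sum(1 for h in history if h <= value)
    return rank / len(history) >= fraction


class SkipDecider:
    """Hypothetical software decision logic; all defaults are illustrative."""

    def __init__(self, window=10, percentile=0.8, jump_ratio=1.5,
                 max_skip_distortion=50.0):
        self.percentile = percentile
        self.jump_ratio = jump_ratio
        self.max_skip_distortion = max_skip_distortion
        self.fg_history = deque(maxlen=window)  # recent foreground activities
        self.bg_history = deque(maxlen=window)  # recent background activities

    def decide(self, fg_activity, bg_activity, accumulated_skip_distortion):
        """Return True if the background of the current frame should be skipped."""
        fg_prev = self.fg_history[-1] if self.fg_history else None
        bg_prev = self.bg_history[-1] if self.bg_history else None

        fg_high = in_high_percentile(fg_activity, self.fg_history, self.percentile)
        bg_high = in_high_percentile(bg_activity, self.bg_history, self.percentile)
        fg_jump = fg_prev is not None and fg_activity > self.jump_ratio * fg_prev
        bg_jump = bg_prev is not None and bg_activity > self.jump_ratio * bg_prev

        # Condition set i): busy foreground, calm background, bounded distortion.
        set_i = (fg_high and not bg_jump
                 and accumulated_skip_distortion <= self.max_skip_distortion)
        # Condition set ii): set i) fails, background not busy, foreground jumps.
        set_ii = (not set_i) and (not bg_high) and fg_jump

        self.fg_history.append(fg_activity)
        self.bg_history.append(bg_activity)
        return set_i or set_ii
```

In this sketch the caller is responsible for tracking the accumulated distortion due to skipping and passing it in for each frame; how that quantity is accumulated and reset is not specified here.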

In the conditions above, motion activity is measured by the sum of motion vector lengths for all macroblocks either in an ROI or a non-ROI, with the motion vectors obtained from the motion estimation hardware block. The thresholds and the size of the window to include past frames are predefined and programmable through the software interface. Intuitively, when a large amount of activity or a quick increase of activity is found in the foreground, the decision is to skip the background, which allows reallocating more bits to code the ROI for superior quality. On the other hand, if the background contains large motion or the accumulated distortion due to background skipping is high enough, the decision is not to skip.

III. OPTIMAL BIT ALLOCATION AND RATE CONTROL

In ROI video coding, the traditional peak signal-to-noise ratio (PSNR), which evaluates the quality of the image uniformly, might not be an appropriate or effective distortion metric. In video phone applications, human vision tends to pay more attention to the speaker's head and face, which typically form the ROI. Since ROI video coding treats ROI and non-ROI regions differently, we use an ROI-weighted PSNR, denoted $PSNR_{RW}$, to measure the quality of ROI video:

$$PSNR_{RW} = 10 \log_{10}\left(255^2 / D_{frame}\right), \qquad D_{frame} = \alpha D_R(f, \tilde{f}) + (1 - \alpha) D_{NR}(f, \tilde{f}), \tag{1}$$

where $D_R$ and $D_{NR}$ are the MSE distortions of the ROI and non-ROI regions respectively, and $f$ and $\tilde{f}$ represent the original and the reconstructed frames. The parameter $\alpha$, between 0 and 1, balances the quality between the ROI and the non-ROI, and can be adjusted through the application's user interface.

$PSNR_{RW}$ reflects the perceptual quality of ROI video better than the traditional PSNR. Fig. 2 shows the same video frame coded by a traditional scheme that codes all MBs and by an ROI-based scheme as proposed in this work. At the same bitrate, the PSNR of a) is higher than that of b), since the background of b) is skipped and duplicated from a prior frame. On the other hand, $PSNR_{RW}$ favors b) over a), which is more faithful to human perception.
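The metric in (1) can be illustrated with a short Python sketch. The function names, the representation of frames as lists of rows of 8-bit luma values, and the boolean ROI mask are assumptions made for this example; the paper does not specify an implementation.

```python
import math


def region_mse(original, reconstructed, roi_mask, in_roi):
    """Mean squared error over pixels whose mask value matches `in_roi`."""
    errors = [
        (o - r) ** 2
        for row_o, row_r, row_m in zip(original, reconstructed, roi_mask)
        for o, r, m in zip(row_o, row_r, row_m)
        if bool(m) == in_roi
    ]
    return sum(errors) / len(errors) if errors else 0.0


def roi_weighted_psnr(original, reconstructed, roi_mask, alpha=0.9):
    """ROI-weighted PSNR of (1): D_frame = alpha*D_R + (1 - alpha)*D_NR."""
    d_roi = region_mse(original, reconstructed, roi_mask, True)
    d_non_roi = region_mse(original, reconstructed, roi_mask, False)
    d_frame = alpha * d_roi + (1.0 - alpha) * d_non_roi
    if d_frame == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(255.0 ** 2 / d_frame)


# Toy 2x2 frame: top row is the ROI, bottom row is the background.
original = [[100, 100], [100, 100]]
mask = [[True, True], [False, False]]
uniform_error = [[102, 102], [102, 102]]   # same error everywhere
roi_favored = [[101, 101], [104, 104]]     # better ROI, worse background

print(roi_weighted_psnr(original, uniform_error, mask))
print(roi_weighted_psnr(original, roi_favored, mask))
```

With α = 0.9, the second reconstruction scores higher even though its overall MSE (8.5) is worse than that of the first (4), mirroring the behavior the text describes for Fig. 2.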

Figure 2. Comparison of reconstructed frames using a) traditional coding and b) ROI coding. Carphone at 40 Kbps. a) PSNR = 29.94 dB, $PSNR_{RW}$ = 29.18 dB; b) PSNR = 28.72 dB, $PSNR_{RW}$ = 30.04 dB.

A. Optimal Bit Allocation

Macroblock-level bit allocation is performed after the background skipping decision, and before a macroblock is actually coded. Bit allocation is performed in software, using information returned by the motion estimation unit, such as the standard deviation of the macroblock. We solve the problem of optimally allocating bits among macroblocks in a given frame within a rate-distortion framework. The problem can be stated as

$$\text{minimize } D_{frame}, \quad \text{such that } R \le R_{budget}, \tag{2}$$

where $R_{budget}$ is the bit budget for frame $f$ and $R$ is the total bits used to code the frame.

Considering a frame with a total of $N$ macroblocks, we use $\rho_i$, $\sigma_i$, $R_i$ and $D_i$ to denote the number of non-zero DCT coefficients, the standard deviation, the number of bits, and the mean-square-error distortion for the $i$-th MB respectively. We also define a set of weights $w_i$ for each frame as

$$w_i = \begin{cases} \dfrac{\alpha}{K} & \text{if the } i\text{-th MB belongs to the ROI} \\[2ex] \dfrac{1 - \alpha}{N - K} & \text{if the } i\text{-th MB belongs to the non-ROI} \end{cases}, \tag{3}$$

where $K$ is the number of macroblocks within the ROI, so that $D = \sum_{i=1}^{N} w_i D_i$.

We convert the problem in (2) to an unconstrained problem using a Lagrange cost function $J_\lambda$, and the problem is formulated as

$$\underset{\rho_i}{\text{minimize}} \; J_\lambda = \lambda R + D = \sum_{i=1}^{N} \left( \lambda R_i + w_i D_i \right), \tag{4}$$

where the optimal $\lambda$ is the solution that enables $\sum_{i=1}^{N} R_i = R_{budget}$.

Using the coefficient distribution model in [10] and the rate-distortion model in [11], and by solving $\partial J_\lambda / \partial \rho_i = 0$, we obtain the optimal bit allocation for MB $i$ as

$$\rho_i = \frac{w_i \sigma_i}{\sum_{j=1}^{N} w_j \sigma_j} \, \rho_{budget}, \tag{5}$$

the details of which can be found in our earlier work [8]. From (5) it is observed that bit allocation mainly depends on the standard deviation of the macroblock as well as whether the macroblock belongs to the ROI.

B. Frame-Level Bit-Budgeting and Rate-Control

Before encoding the next new frame, a frame-level bit budget is allocated for rate control purpose. Due to possible background skipping, dynamic adjustment of the ρ budget is necessary on top of the traditional frame-level rate control framework. In this work, we propose a strategy that reduces the ρ budget when skipping is on, and saves the unused portion of the budget for future frames. For a frame that has to code its background, it confiscates all the ρ's saved from the previous frames that have skipped backgrounds.

The adjusted ρ budget for Frame $n$, denoted by $\rho_n^{adjusted}$, is obtained by

$$\rho_n^{adjusted} = \begin{cases} p \, \rho_{n-p+1}^{budget} - \sum_{i=1}^{p-1} \rho_{n-i}^{adjusted} & \text{no skipping} \\[2ex] \dfrac{\sum_{i \in ROI} w_i \sigma_i}{\sum_{i \in ROI} w_i \sigma_i + \sum_{i \in non\text{-}ROI} w_i \sigma_i} \, \rho_n^{budget} & \text{otherwise} \end{cases},$$

where $\rho_n^{budget}$ is the budget obtained from the traditional frame-level rate control, and $p-1$ is the number of consecutive frames preceding the current frame that have skipped backgrounds, i.e. the $(n-p)$-th frame has its background coded. This strategy is a conservative approach in the sense that the current frame is only able to use a maximum number of bits saved by its predecessors.
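The frame-level budget adjustment of Section III-B and the macroblock-level allocation in (5) can be sketched together in Python. This is an illustrative sketch only: the function names and argument layout are hypothetical, and restricting the sum in (5) to the macroblocks that are actually coded when the background is skipped is an assumption of this example rather than something stated in the paper.

```python
def mb_weights(is_roi, alpha):
    """Weights of (3): alpha/K for ROI MBs, (1 - alpha)/(N - K) for non-ROI MBs."""
    n = len(is_roi)
    k = sum(1 for r in is_roi if r)
    w_roi = alpha / k if k else 0.0
    w_bg = (1.0 - alpha) / (n - k) if n > k else 0.0
    return [w_roi if r else w_bg for r in is_roi]


def adjusted_rho_budget(rho_budget, skip_background, weights, sigmas, is_roi,
                        first_skipped_budget=None, skipped_adjusted=()):
    """Adjusted rho budget for the current frame n.

    skipped_adjusted holds the adjusted budgets of the p-1 preceding frames
    that skipped their backgrounds; first_skipped_budget is the unadjusted
    budget of frame n-p+1 (only needed when p > 1).
    """
    if skip_background:
        roi = sum(w * s for w, s, r in zip(weights, sigmas, is_roi) if r)
        total = sum(w * s for w, s in zip(weights, sigmas))
        return rho_budget * roi / total if total else 0.0
    p = len(skipped_adjusted) + 1
    base = rho_budget if p == 1 else first_skipped_budget
    return p * base - sum(skipped_adjusted)


def allocate_rho(rho_budget, weights, sigmas, coded):
    """Allocation of (5), summed over the macroblocks flagged in `coded`."""
    terms = [w * s if c else 0.0 for w, s, c in zip(weights, sigmas, coded)]
    total = sum(terms)
    if total == 0:
        return [0.0] * len(terms)
    return [rho_budget * t / total for t in terms]


# Toy frame: 4 macroblocks, the first two in the ROI; alpha = 0.9.
is_roi = [True, True, False, False]
sigmas = [10.0, 20.0, 10.0, 30.0]
weights = mb_weights(is_roi, alpha=0.9)

skip_budget = adjusted_rho_budget(310.0, True, weights, sigmas, is_roi)
roi_only = allocate_rho(skip_budget, weights, sigmas, coded=is_roi)

# Two skipped frames later, a frame that codes its background reclaims the savings.
catch_up = adjusted_rho_budget(310.0, False, weights, sigmas, is_roi,
                               first_skipped_budget=310.0,
                               skipped_adjusted=[skip_budget, skip_budget])
print(skip_budget, roi_only, catch_up)
```

Under these assumptions, each ROI macroblock receives the same ρ whether the background is skipped (reduced budget shared among ROI macroblocks only) or coded (full budget shared among all macroblocks), so skipping leaves the per-frame ROI allocation unchanged and banks the background share for a later frame.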

IV. EXPERIMENTAL RESULTS

We perform simulations using the H.263 Profile 3 codec implemented on the proposed system architecture, and experiment on a number of QCIF sequences coded at 32 Kbps to 64 Kbps for low-bitrate video phone applications. In the experiments, we compare three different rate-controlled coding schemes: (1) A macroblock-level “greedy” algorithm where the bits are allocated to macroblocks in a uniformly distributed manner (not ROI-based) [9]; (2) Unit-based background skipping scheme that groups every two frames into a unit and skips the background of the second frame within each unit (our earlier work [8]); (3) The scheme proposed in this work that adaptively skips background, and dynamically adjusts bit allocation for rate control.

The coding gain for the Carphone sequence, a typical head-and-shoulder scene, is shown in Fig. 3. As shown in Fig. 3, the proposed approach outperforms the other schemes in the entire bitrate range and the gain is up to 2 dB in $PSNR_{RW}$. Similarly, for the Foreman sequence, we have obtained a gain of up to 1.5 dB in $PSNR_{RW}$. In Fig. 4, the $PSNR_{RW}$ is shown from frame to frame for Carphone. Fig. 5 shows the 15th reconstructed frame using different approaches. The advantage of the proposed approach is nearly 5 dB compared to the greedy algorithm and 3 dB compared to the unit-based background skipping scheme.

[Plot: $PSNR_{RW}$ (dB) versus rate (Kbps), 30 to 65 Kbps; curves: Greedy, Unit-based, Proposed.]

Figure 3. Rate-distortion comparison of different coding schemes. Carphone sequence. α = 0.9 in $PSNR_{RW}$.

[Plot: $PSNR_{RW}$ (dB) versus frame number, 0 to 300; curves: Greedy, Unit-based, Proposed.]

Figure 4. $PSNR_{RW}$ comparison from frame to frame. Carphone at 48 Kbps. α = 0.9 in $PSNR_{RW}$.

Figure 5. Comparison of the reconstructed frames. Carphone at 48 Kbps. a) Greedy algorithm, $PSNR_{RW}$ = 32.05 dB; b) Unit-based background skipping, $PSNR_{RW}$ = 33.99 dB; c) Proposed, $PSNR_{RW}$ = 36.98 dB. α = 0.9 in $PSNR_{RW}$.

V. CONCLUSIONS

The presented content-adaptive background skipping scheme for ROI video coding implemented on a low-complexity system architecture provides superior perceptual quality in video phone applications. The bit adjustment algorithm at the frame level is efficient in rate control. Experimental results show that the proposed scheme outperforms traditional schemes by up to 2 dB in $PSNR_{RW}$.

REFERENCES

[1] A. Eleftheriadis and A. Jacquin, “Automatic face location detection and tracking for model-assisted coding of video teleconferencing sequences at low bit-rates”, Signal Processing: Image Communication, Vol. 7, No. 4-6, pp. 231-248, Nov. 1995.
[2] S. Daly, K. Matthews, and J. Ribas-Corbera, “As plain as the noise on your face: adaptive video compression using face detection and visual eccentricity models”, Journal of Electronic Imaging, Vol. 10, No. 1, pp. 30-46, Jan. 2001.
[3] D. Chai and K. N. Ngan, “Face segmentation using skin-color map in videophone applications”, IEEE Trans. Circuits and Systems for Video Technology, Vol. 9, No. 4, pp. 551-564, June 1999.
[4] M. Chen, M. Chi, C. Hsu and J. Chen, “ROI video coding based on H.263+ with robust skin-color detection technique”, IEEE Trans. Consumer Electronics, Vol. 49, No. 3, pp. 724-730, Aug. 2003.
[5] C. Lin, Y. Chang and Y. Chen, “A low-complexity face-assisted coding scheme for low bit-rate video telephony”, IEICE Trans. Inf. & Syst., Vol. E86-D, No. 1, pp. 101-108, Jan. 2003.
[6] F. C. M. Martins, W. Ding, and E. Feig, “Joint control of spatial quantization and temporal sampling for very low bit rate video”, in Proc. ICASSP, May 1996, pp. 2072-2075.
[7] J. Lee, A. Vetro, Y. Wang, and Y. Ho, “Bit allocation for MPEG-4 video coding with spatio-temporal tradeoffs”, IEEE Trans. Circuits and Systems for Video Technology, Vol. 13, No. 6, pp. 488-502, June 2003.
[8] H. Wang and K. El-Maleh, “Joint adaptive background skipping and weighted bit allocation for wireless video telephony”, in Proc. International Conference on Wireless Networks, Communications, and Mobile Computing, Maui, Hawaii, USA, June 2005.
[9] Z. He and S. K. Mitra, “A linear source model and a unified rate control algorithm for DCT video coding”, IEEE Trans. Circuits and Systems for Video Technology, Vol. 12, No. 11, pp. 970-982, Nov. 2002.
[10] E. Y. Lam and J. W. Goodman, “A mathematical analysis of the DCT coefficient distributions for images”, IEEE Trans. Image Processing, Vol. 9, No. 10, pp. 1661-1666, Oct. 2000.
[11] J. Ribas-Corbera and S. Lei, “Rate control in DCT video coding for low-delay communications”, IEEE Trans. Circuits and Systems for Video Technology, Vol. 9, No. 1, pp. 172-185, Feb. 1999.