Fast and Robust Sprite Generation for MPEG-4 Video Coding

Yan Lu1, Wen Gao1,2, and Feng Wu3

1 Department of Computer Science, Harbin Institute of Technology, Harbin, 150001, China
[email protected]
2 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100080, China
[email protected]
3 Microsoft Research China, Beijing, 100080, China
[email protected]

Abstract. This paper presents a fast and robust sprite generation algorithm for MPEG-4 video coding. Our contributions are twofold. Firstly, a fast and robust Global Motion Estimation (GME) algorithm is proposed. Spatial and temporal feature point selection schemes are incorporated into the hierarchical GME in order to speed it up. Experimental results demonstrate that our method is up to seven times faster than the traditional one in the MPEG-4 video verification model, while its accuracy is slightly improved. Secondly, a sprite generation scheme with several novel techniques is developed. Rough image segmentation is also introduced for the purpose of image blending in sprite generation. The proposed algorithm can significantly improve the visual quality of the generated sprite image and of the reconstructed background object in video coding. Furthermore, the proposed GME and sprite generation algorithms can be used for both frame-based and object-based video coding.

Keywords. video coding, global motion estimation, sprite generation, MPEG-4

1 Introduction

Sprite coding is one of the important components of MPEG-4 video coding, and a method for effectively compressing sprite images has been included in the MPEG-4 video Verification Model (VM) [1]. However, how to generate sprite images remains an open issue. A sprite, also referred to as a mosaic [2], is an image composed of the pixels belonging to a video object that is visible throughout a video sequence. Global Motion Estimation (GME) is the key part of sprite generation and has attracted the attention of many researchers in the field of video processing and coding. Consequently, a number of GME methods have been developed, which can generally be grouped into two categories: feature-matching methods [3][4] and gradient-descent methods [5][6]. Although each of these methods has its unique advantages and applications, they share some common features; for example, most of them adopt a hierarchical implementation in order to speed up the GME process.

In this paper, a novel sprite generation technique is presented for video coding. Our GME algorithm for sprite generation is inspired by the earlier work in [6]; the difference is that spatial and temporal feature point selection schemes are incorporated into the hierarchical GME in order to accelerate it. For video coding, good visual quality and high coding efficiency are the fundamental goals, particularly for static sprite coding, because the background object is normally reconstructed by directly warping the sprite according to the MPEG-4 standard. However, little work has been done beyond global motion estimation in the earlier sprite generation methods. In this paper, a novel sprite generation scheme is proposed that generates sprite images with much better subjective visual quality. Furthermore, when no auxiliary mask information is available, the segmentation technique proposed in [7] is simplified and adopted in sprite generation, which not only accelerates the motion estimation but also improves the visual quality of the generated sprite.

2 Global Motion Estimation

Global motion normally relates to camera motion such as panning, tilting, and zooming, and can be modeled on the basis of a parametric geometric model. In this paper, the camera motion over the whole scene is parameterized by a perspective transformation as follows:

x' = (m0·x + m1·y + m2) / (m6·x + m7·y + 1),
y' = (m3·x + m4·y + m5) / (m6·x + m7·y + 1)    (1)

Here {m0, m1, …, m7} are the motion parameters, and (x, y) and (x', y') are a pair of corresponding coordinates in the two images being registered, expressed in their respective coordinate systems. A traditional hierarchical algorithm based on the gradient-descent method is adopted to estimate the global motion parameters in this paper. Moreover, spatial and temporal feature point selection schemes are developed and incorporated into the hierarchical algorithm in order to speed up the motion estimation. Fig. 1 shows the block diagram of the hierarchical implementation. The spatial and temporal feature point selection methods are described in the following paragraphs; for the remaining parts of the implementation, refer to [6].

Spatial Feature Point (SFP) selection is performed on the image to be estimated prior to other operations in order to reduce the number of pixels involved in motion estimation. It is based on the fact that pixels with large gradient values contribute far more to the prediction error than those located in smooth areas. The key step of SFP selection is to choose the pixels with the largest magnitudes in the Hessian image. The Hessian image H(x, y) is calculated from the input image I(x, y) using equation (2); the selected pixels normally correspond to peaks and pits in the image to be estimated.

H(x, y) = (∂²I(x, y)/∂x²) · (∂²I(x, y)/∂y²) − (∂²I(x, y)/∂x∂y)²    (2)
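As a concrete sketch of SFP selection, the snippet below computes the Hessian image of equation (2) with finite differences and keeps the pixels of largest magnitude. The function name and the fixed point budget num_points are our illustrative choices; the paper does not specify how many points are retained.

```python
import numpy as np

def select_spatial_feature_points(image, num_points=1000):
    """Select spatial feature points (SFPs): the pixels with the largest
    magnitudes in the Hessian image H = Ixx*Iyy - Ixy**2 of equation (2)."""
    I = image.astype(np.float64)
    # First-order derivatives (np.gradient returns d/d(axis0), d/d(axis1)).
    Iy, Ix = np.gradient(I)
    # Second-order derivatives by differentiating again.
    Ixy, Ixx = np.gradient(Ix)
    Iyy, _ = np.gradient(Iy)
    H = Ixx * Iyy - Ixy ** 2
    # Keep the num_points pixels with the largest |H|; these normally
    # correspond to peaks and pits in the image.
    flat = np.abs(H).ravel()
    idx = np.argpartition(flat, -num_points)[-num_points:]
    ys, xs = np.unravel_index(idx, H.shape)
    return np.stack([ys, xs], axis=1)   # (num_points, 2) array of (y, x)
```

Only these selected pixels would then enter the gradient-descent error accumulation, which is where the speed-up comes from.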

Temporal Feature Point (TFP) selection is performed during the level transitions of the hierarchical GME, and aims at further reducing the number of pixels participating in the calculation of prediction errors. This operation is based on the fact that pixels with larger temporal differences contribute more to the total prediction error. The selection is implemented as follows. Firstly, the temporal difference image between the current and the predicted image is calculated. The predicted image under the current motion estimate has already been computed in the last step of the previous GME level, so the extra computation time is negligible. Secondly, the pixels with the largest absolute magnitudes in the temporal difference image are selected, in the same way as in SFP selection. Only those pixels involved in the previous GME level participate in the current TFP selection.


Fig. 1. Block diagram of the hierarchical GME implementation with spatial and temporal feature point selection
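The TFP step at a level transition can be sketched as follows. The function name, the active-point representation, and the keep_ratio parameter are our assumptions; the paper only states that the pixels with the largest absolute temporal differences are kept.

```python
import numpy as np

def select_temporal_feature_points(current, predicted, active_points,
                                   keep_ratio=0.5):
    """At a level transition of the hierarchical GME, keep only those
    previously active pixels whose temporal difference |current - predicted|
    is largest, since they dominate the total prediction error."""
    diff = np.abs(current.astype(np.float64) - predicted.astype(np.float64))
    # Evaluate the difference only at pixels already active in the
    # previous GME level (rows of active_points are (y, x) coordinates).
    errors = diff[active_points[:, 0], active_points[:, 1]]
    k = max(1, int(len(active_points) * keep_ratio))
    keep = np.argpartition(errors, -k)[-k:]
    return active_points[keep]
```

Because the predicted image is already available from the previous level, the only extra work is the subtraction and the partial sort.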

3 Sprite Generation

The described sprite generation structure is based on that in Appendix D of the MPEG-4 Video VM [1]. However, in order to achieve fast and robust sprite generation, some novel techniques are introduced, as shown in Fig. 2. Instead of estimating the global motion of the current image directly from the previous sprite, the described algorithm first warps the previous sprite and then estimates the global motion with reference to the warped sprite image. This long-term motion estimation method greatly reduces the error accumulation caused by individual frames. The extra memory cost is reasonable because the warped sprite has the same size as the current frame. Static sprite coding is normally used for object-based video coding; however, auxiliary segmentation information is sometimes either unavailable or not accurate enough to mask out all moving objects from the scene. The rough segmentation technique developed in [7] is therefore incorporated into the proposed sprite generation and is used when no auxiliary segmentation masks are available. The main goal of the described algorithm is to rapidly generate a background sprite with good visual quality.

Assume that the video sequence comprises n frames Ik, k = 0, 1, …, n-1. The sprite Sk is generated using Ii, i = 0, 1, …, k-1, and Pk denotes the motion parameters estimated at the kth frame. The complete sprite generation algorithm at the kth frame is as follows:
1) Divide Ik into reliable, unreliable, and undefined image regions.
2) Estimate the global motion parameters Pk between Ik and Sk-1.
3) If no auxiliary segmentation is available, segment Ik.
4) Warp image Ik towards the sprite using Pk.
5) Blend the warped image with Sk-1 to obtain Sk.
Five modules process each frame in the sprite generation: Image Region Division, Fast and Robust Global Motion Estimation, Image Segmentation, Image Warping, and Image Blending. Bilinear interpolation is used for image warping, as in the MPEG-4 Video VM. Each module of the described algorithm is discussed in detail below.
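The five steps above can be sketched as a loop. In the skeleton below, apply_perspective implements equation (1), while the five module callables (divide, estimate_gmp, segment, warp, blend) are hypothetical placeholders standing in for the paper's modules; their signatures are our assumptions for illustration, not the paper's actual interfaces.

```python
import numpy as np

def apply_perspective(m, x, y):
    """Map coordinates (x, y) through the 8-parameter perspective model of
    equation (1); x' and y' share the denominator m6*x + m7*y + 1."""
    den = m[6] * x + m[7] * y + 1.0
    return ((m[0] * x + m[1] * y + m[2]) / den,
            (m[3] * x + m[4] * y + m[5]) / den)

def generate_sprite(frames, estimate_gmp, divide, segment, warp, blend,
                    masks=None):
    """Per-frame sprite-generation loop following steps 1-5 in the text."""
    sprite = frames[0].astype(np.float64)   # initialise the sprite with frame 0
    for k in range(1, len(frames)):
        # 1) Divide I_k into reliable / unreliable / undefined regions.
        regions = divide(frames[k], masks[k] if masks is not None else None)
        # 2) Estimate global motion P_k between I_k and S_{k-1}.
        p = estimate_gmp(frames[k], sprite, regions)
        # 3) Roughly segment I_k when no auxiliary masks are available.
        if masks is None:
            regions = segment(frames[k], sprite, p)
        # 4) Warp I_k towards the sprite coordinate system using P_k.
        warped = warp(frames[k], p, sprite.shape)
        # 5) Blend the warped image into the sprite.
        sprite = blend(sprite, warped, regions)
    return sprite
```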


Fig. 2. Block diagram of fast and robust sprite generation

3.1. Image Region Division

According to the visual part of the MPEG-4 standard, static sprite coding is normally used for object-based video coding. The described algorithm first derives the reliability masks from the segmentation masks by excluding some pixels along the borders of the background object, as well as the frame borders. The excluded areas are defined as the unreliable image region; the rest of the background object is defined as the reliable image region. The areas marked as foreground objects are defined as the undefined image region. The core of image division is extracting the unreliable image region, which can be implemented by scanning the background object from four directions, i.e., left to right, top to bottom, right to left, and bottom to top. The sprite image is correspondingly divided into reliable, unreliable, and undefined regions: the reliable sprite region has been constructed from reliable image regions; the unreliable sprite region is the visible part of unreliable image regions; and the undefined sprite region has not yet been visible in any previous image. An example of image division is shown in Fig. 3. Image division contributes to sprite generation in two ways. First, only the reliable region participates in motion estimation, which not only speeds up motion estimation but also eliminates the effect of foreground objects and frame borders. Second, reliable, unreliable, and undefined regions are treated differently in image blending, which improves the visual quality of the generated sprite.
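One plausible reading of the four-direction scan is sketched below: for each row and column, the first and last background pixels found are the object borders, and a strip of pixels inside them is marked unreliable. The strip width margin is an assumed parameter; the paper does not state how many border pixels are excluded.

```python
import numpy as np

def divide_regions(bg_mask, margin=4):
    """Split a boolean background-object mask into reliable and unreliable
    regions by stripping `margin` pixels from the object borders, located by
    scanning each row and column from left, right, top, and bottom."""
    reliable = bg_mask.copy()
    h, w = bg_mask.shape
    for y in range(h):                       # left-to-right / right-to-left
        xs = np.flatnonzero(bg_mask[y])
        if xs.size:
            reliable[y, xs[0]:xs[0] + margin] = False
            reliable[y, max(0, xs[-1] - margin + 1):xs[-1] + 1] = False
    for x in range(w):                       # top-to-bottom / bottom-to-top
        ys = np.flatnonzero(bg_mask[:, x])
        if ys.size:
            reliable[ys[0]:ys[0] + margin, x] = False
            reliable[max(0, ys[-1] - margin + 1):ys[-1] + 1, x] = False
    unreliable = bg_mask & ~reliable
    return reliable, unreliable              # pixels outside bg_mask: undefined
```

Pixels outside bg_mask (the foreground) form the undefined region.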

Fig. 3. Illustration of image division: (a) original segmentation, (b) reliability masks, (c) warped sprite masks. Light gray: unreliable region; dark gray: reliable region; white: undefined region

3.2. Image Segmentation

Segmentation information is necessary for sprite generation and compression. However, the auxiliary segmentation masks are sometimes unavailable. This paper presents a rough segmentation algorithm for the purpose of sprite generation, based on our earlier work in [7]. Here, the segmentation is simplified by using an open operator with a window size of 9x9 instead of time-consuming iterative morphological operations. Fig. 4 illustrates the block diagram of the rough image segmentation. The algorithm proceeds in the following steps:
1) Warp the previous sprite Sk-1 using the motion parameters Pk to get the predicted image S'.
2) Subtract the current image Ik from S' to get the absolute difference image D.
3) Filter D using the open operator to detect the motion zones.
4) Extract the final segmentation masks by thresholding the filtered image.
The detailed morphological operations can be found in [8]. Although the segmentation is not accurate, it suffices for eliminating the effect of foreground objects in background sprite generation. After the rough image segmentation, the background areas are marked as the reliable image region, and the foreground areas are marked as the unreliable image region. The image division results contribute to the subsequent image blending module.


Fig. 4. Block diagram of image segmentation
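Steps 2-4 of the rough segmentation can be sketched with a grey-level opening, assuming the warped sprite S' is already available. The threshold value is our assumption; the paper specifies the 9x9 window but not the threshold.

```python
import numpy as np
from scipy import ndimage

def rough_segmentation(current, warped_sprite, threshold=20):
    """Rough foreground segmentation (Fig. 4): absolute difference against
    the warped sprite, a 9x9 morphological opening to suppress isolated
    noise, then a threshold to extract the motion zones."""
    diff = np.abs(current.astype(np.float64) -
                  warped_sprite.astype(np.float64))
    opened = ndimage.grey_opening(diff, size=(9, 9))  # open operator, 9x9 window
    return opened > threshold   # True = foreground (unreliable region)
```

The opening (erosion followed by dilation) removes difference blobs smaller than the 9x9 window, so only coherent moving areas survive the threshold.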

3.3. Image Blending

In the described algorithm, reliable and unreliable image regions contribute differently to the sprite update. Pixels located in the reliable region are weighted into the corresponding sprite pixels. Pixels located in the unreliable region, however, update the corresponding sprite pixels only if those sprite pixels have never been updated by reliable pixels. The undefined image region has no effect in this process. The proposed algorithm produces sprite images with better visual quality for two reasons. Firstly, the reliable region ensures that reliable image information, which normally corresponds to the background object, contributes more to the sprite update. Secondly, the unreliable region handles the aperture problem when no reliable image information is available. This trade-off between the reliable and unreliable image divisions yields better sprite visual quality than simply averaging the images.
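The blending rule can be sketched as follows. The per-pixel count array, which records how many reliable contributions each sprite pixel has received (so that reliable pixels form a running average and "never updated by reliable pixels" can be tested), is our bookkeeping device; the paper does not specify the weighting scheme in detail.

```python
import numpy as np

def blend_frame(sprite, count, warped, reliable, unreliable):
    """Blend one warped frame into the sprite (Sec. 3.3 rule).
    reliable/unreliable are boolean masks in sprite coordinates."""
    r = reliable
    # Reliable pixels: weighted (running-average) update of the sprite.
    sprite[r] = (sprite[r] * count[r] + warped[r]) / (count[r] + 1.0)
    count[r] += 1
    # Unreliable pixels: fill only sprite pixels never touched by a
    # reliable pixel. Undefined regions are simply not in either mask.
    u = unreliable & (count == 0)
    sprite[u] = warped[u]
    return sprite, count
```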

4 Experimental Results

According to the MPEG-4 video coding standard, sprite coding can be classified into two categories: Global Motion Compensation (GMC) based video coding and static sprite coding. To enable comparison with the MPEG-4 video VM, the proposed technique has been verified in both situations. The comparison experiments were performed on the platform of the MPEG reference software for both frame-based and object-based video coding. Auxiliary segmentation masks came from [9]. No rate control mechanism was applied. Table 1 shows the experimental results on the Stefan, Coastguard, and Foreman sequences in CIF format (352 x 288), each a 10-second clip at a frame rate of 30 Hz. The results demonstrate that our proposed GME algorithm is up to 7 times faster than that in the MPEG-4 VM, while the PSNR is also slightly improved.

The same experimental conditions were used for static sprite generation and coding. The static sprite was first generated using our proposed method and the MPEG-4 VM, respectively; the generated sprite was then encoded using the MPEG-4 static sprite coding algorithm. The results shown in Table 2 further verify that our GME algorithm is faster and more robust. Moreover, the proposed algorithm can significantly improve the visual quality of the sprite image and the reconstructed background object. As an example, Fig. 5 shows the final sprite image generated from the Stefan sequence using our proposed method. It provides much better visual quality than the sprite generated by averaging the contributing images, shown in Fig. 6. Figs. 7(a) and 7(b) show the images reconstructed from the sprites generated by the two methods, respectively.

Table 1. GMC coding results

Sequence     BR (kbit/s)  Algorithm   YPSNR  UPSNR  VPSNR  Bits     GME time
Stefan VO0   500          MPEG-4 VM   28.62  33.79  33.51  5133465  1604
                          Proposed    28.62  33.81  33.51  5142881  233
Coast G VO3  75           MPEG-4 VM   29.46  37.05  40.91  762133   676
                          Proposed    29.47  37.11  40.96  762701   136
Foreman VO0  160          MPEG-4 VM   30.61  37.24  38.80  1635109  1660
                          Proposed    30.62  37.25  38.82  1644680  294
Stefan Rect  540          MPEG-4 VM   28.54  33.71  33.42  5527824  1482
                          Proposed    28.54  33.74  33.42  5536516  224

Table 2. Static sprite coding results

Sequence     BR (kbit/s)  Algorithm   YPSNR  UPSNR  VPSNR  Bits    GME time
Stefan VO0   68           MPEG-4 VM   22.50  34.78  34.43  693729  1883
                          Proposed    22.69  34.93  34.60  709061  267
Coast G VO3  10           MPEG-4 VM   26.29  36.98  41.88  93273   1035
                          Proposed    27.22  39.04  42.69  97708   161
Foreman VO0  21           MPEG-4 VM   26.88  37.13  39.05  219833  1947
                          Proposed    26.93  37.46  40.27  218745  267
Stefan Rect  70           MPEG-4 VM   21.97  34.04  33.62  680050  2038
                          Proposed    22.66  34.45  34.27  731022  297

Fig. 5. Background sprite of the Stefan sequence generated with our proposed method.

Fig. 6. Background sprite generated by averaging the contributing images, as in traditional methods.

Fig. 7. Sprite coding results for the background object sequence Stefan, frame 190, 70 kbit/s: (a) proposed method, (b) MPEG-4 Video VM

5 Conclusions

This paper presents a novel technique for background sprite generation. Our main contributions lie in two aspects. Firstly, the proposed GME algorithm, incorporating spatial and temporal feature point selection schemes, significantly accelerates the sprite generation process. Secondly, the proposed sprite generation algorithm with its novel techniques produces sprite images with much better subjective visual quality. Together, the proposed GME and sprite generation techniques substantially improve on the corresponding techniques in the MPEG-4 Video VM.

References
1. MPEG-4 Video Group: MPEG-4 Video Verification Model Version 16.0. ISO/IEC JTC1/SC29/WG11, MPEG2000/N3312, Noordwijkerhout, Netherlands (2000)
2. Sikora, T.: The MPEG-4 Video Standard Verification Model. IEEE Trans. Circuits Syst. Video Technol., Vol. 5 (Feb. 1997), 19-31
3. Smolic, A., Sikora, T., Ohm, J.-R.: Long-term Global Motion Estimation and Its Application for Sprite Coding, Content Description, and Segmentation. IEEE Trans. Circuits Syst. Video Technol., Vol. 9 (Dec. 1999), 1227-1242
4. Grammalidis, N., Beletsiotis, D., Strintzis, M.: Sprite Generation and Coding in Multiview Image Sequences. IEEE Trans. Circuits Syst. Video Technol., Vol. 10 (Mar. 2000), 302-311
5. Szeliski, R.: Image Mosaicing for Tele-reality. Digital Equipment Corp., Cambridge Research Lab., TR94/2, Cambridge, MA (May 1994)
6. Dufaux, F., Konrad, J.: Efficient, Robust, and Fast Global Motion Estimation for Video Coding. IEEE Trans. Image Processing, Vol. 9 (Mar. 2000), 497-501
7. Lu, Y., Gao, W., Wu, F., Lu, H., Chen, X.: A Robust Offline Sprite Generation Approach. ISO/IEC JTC1/SC29/WG11 MPEG01/M6778, Pisa (Jan. 2001)
8. Heijmans, H.: Morphological Image Operators. Academic Press, Boston (1994)
9. ftp://ftp.tnt.uni-hannover.de