
M7.11

SEGMENTED VIDEO CODING

M.J. Biggar† and A.G. Constantinides
Department of Electrical Engineering, Imperial College of Science and Technology, London, SW7 2BT, ENGLAND

† This work was performed while M.J. Biggar was working at Imperial College under a Study Award from Telecom Australia. He has since returned to Telecom Australia Research Laboratories, Melbourne, Australia.

ABSTRACT

A means of preserving the perceptually significant features of an image is to extract them by segmentation prior to coding. Distortion can be restricted to less important areas, such as fine detail and texture. This is the basis of several two-component and segmented image coding schemes. In this paper, the technique is extended to video coding by applying segmentation to the frame difference signal. However, independent processing of each frame by this approach leads to inadequate portrayal of motion. An extension of the method, that incorporates guiding of the segmentation according to the history of the sequence, is developed. Results show a considerable improvement, suggesting that there may be further potential for the application of this approach in low rate video coding.

1. INTRODUCTION

There is now widespread recognition in image coding of the inadequacies of simple image models (e.g. signal stationarity) and simple distortion criteria, such as Sum Square Error (SSE). In response to this, coding techniques based on the image structure have emerged. In these systems, the image is segmented and perceptually important boundaries between homogeneous regions are preserved. Original examples are the two-component approaches of [1,2]. At a later stage, Kunt et al. [3] also discussed some systems of this type. In [4], a coding technique based on piecewise constant image representation is shown to compare favourably with those based on the Discrete Cosine Transform (DCT) at high compression. In the light of the success of these methods for single image coding, and increasing interest in low rate digital video applications (videoconferencing and videotelephony), assessment of the coding of segmented video signals may be valuable. At very low rates, large movements cause blurring and blocking effects in codecs based on transform techniques. Coding methods based on segmentation offer the possibility of preserving important edges of objects in a scene, and restricting distortion to the less perceptually important details.

Some investigations in this area are described in the sections to follow. A straightforward approach, in which each frame is processed independently, is described in Section 2, but it does not portray moving objects well. Limitations of the method are highlighted in Section 3. A modification that uses the history of moving objects to guide each new segmentation is described in Section 4. Considerable improvements are possible using the new technique, and further extensions of it are proposed.

2. SEGMENTATION OF THE FRAME DIFFERENCE SIGNAL

To exploit temporal redundancy in the video signal, the current investigations concern segmentation and coding of the frame difference signal. Image regions are represented by uniform intensity plateaux. The suitability of a piecewise constant representation of the frame difference signal, based on a simple model, is shown in [5]. Coding of the frame difference permits temporal redundancy to be exploited without excessive inherent signal delay. This is important in interactive video applications such as the examples quoted earlier. The basic block diagram of the system is shown in Figure 1.

Segmentation is performed by a region merging algorithm, using the method of [6], but based on stepwise minimisation of SSE at each merge. It is the method used for the single frame coding investigation of [4]. It is recognised that SSE is not an optimal distortion measure for human visual interpretation, but its use with the region merging segmentation scheme has led to successful segmentations without excessive complexity. Its use can also be defended on the grounds that edges are preserved in a segmented image, whereas these are often blurred by the use of an SSE criterion in, for instance, transform or DPCM coding. Segmentation is achieved by recursively calculating a similarity measure between neighbouring regions, and merging those most similar until the desired number of regions remains. 4-way connectivity of pixels is considered; i.e. diagonally neighbouring pixels are considered not to be adjacent. The similarity measure is taken to be the change in overall SSE as a result of the merge. Regions are considered to be most similar if, when merged and replaced by their mean intensity, the change in SSE is smallest.


[Figure 1 diagram: segmentation of the frame difference; rate control (number of regions, total rate required); boundary coding; transmission (error free); 1 frame delay giving the last frame N (reconstruction).]

Figure 1: Block diagram of basic segmented frame difference coding system.

The change in SSE as a result of merging regions i and j is [6]

\Delta SSE = \frac{N_i N_j}{N_i + N_j} (\mu_i - \mu_j)^2 \qquad (1)

where \mu_i is the average intensity of region i and N_i is the number of pixels in region i.

In the absence of an alternative distortion measure, and given the success of an SSE criterion for single frame coding [4,5], the aim here is also to segment the video signal in such a way that each frame has minimum SSE. It is shown below that minimisation of the SSE of the frame difference approximation is equivalent to minimisation of the SSE of the interframe coder output.
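For illustration only, the following is a minimal sketch (not the authors' implementation) of the greedy merging described above: regions start as single pixels, the 4-connected pair whose merge gives the smallest increase in SSE (Equation 1) is merged repeatedly, and merging stops when the requested number of regions remains. The adjacency bookkeeping and the exhaustive search for the best pair are implementation choices of this sketch only.

```python
# Illustrative sketch of greedy region merging by stepwise SSE minimisation.
import numpy as np


def delta_sse(sum_i, n_i, sum_j, n_j):
    """Increase in SSE if two regions are merged and replaced by their mean (Eq. 1)."""
    mu_i, mu_j = sum_i / n_i, sum_j / n_j
    return n_i * n_j / (n_i + n_j) * (mu_i - mu_j) ** 2


def segment(image, target_regions):
    """Greedy merging until `target_regions` remain; returns label map and approximation."""
    h, w = image.shape
    labels = np.arange(h * w).reshape(h, w)            # one region per pixel initially
    sums = {r: float(v) for r, v in enumerate(image.ravel())}
    sizes = {r: 1 for r in range(h * w)}
    # 4-connected adjacency between regions (diagonal neighbours are not adjacent).
    adj = {r: set() for r in range(h * w)}
    for y in range(h):
        for x in range(w):
            r = y * w + x
            if x + 1 < w:
                adj[r].add(r + 1); adj[r + 1].add(r)
            if y + 1 < h:
                adj[r].add(r + w); adj[r + w].add(r)

    while len(sums) > target_regions:
        # Most similar adjacent pair = smallest change in overall SSE.
        _, a, b = min(((delta_sse(sums[a], sizes[a], sums[b], sizes[b]), a, b)
                       for a in adj for b in adj[a] if a < b), key=lambda t: t[0])
        # Merge region b into region a and update statistics and adjacency.
        sums[a] += sums.pop(b)
        sizes[a] += sizes.pop(b)
        for c in adj.pop(b):
            adj[c].discard(b)
            if c != a:
                adj[c].add(a)
                adj[a].add(c)
        labels[labels == b] = a

    # Piecewise constant approximation: each region replaced by its mean intensity.
    approx = np.zeros_like(image, dtype=float)
    for r in sums:
        approx[labels == r] = sums[r] / sizes[r]
    return labels, approx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = np.where(np.arange(16)[:, None] < 8, 0.0, 5.0) + rng.normal(0, 0.5, (16, 16))
    label_map, approx = segment(img, target_regions=4)
    print("regions remaining:", len(np.unique(label_map)))
```

In the coder described here, a merge of this kind is applied to the frame difference signal, and the surviving regions are replaced by their mean intensities to give the piecewise constant approximation.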

The new frame I_{n+1} is found at the encoder to differ from the previous frame reconstruction \hat{I}_n by the frame difference (or error) signal Z_n:

I_{n+1} = \hat{I}_n + Z_n \qquad (2)

An approximation \hat{Z}_n to this error signal is encoded, and an approximation \hat{I}_{n+1} to the new frame is generated at the decoder:

\hat{I}_{n+1} = \hat{I}_n + \hat{Z}_n \qquad (3)

Figure 2: Sample frame resulting from use of the coder of Figure 1.

Subtraction of Equation 3 from Equation 2 gives

I_{n+1} - \hat{I}_{n+1} = Z_n - \hat{Z}_n

The SSE of the left side is equal to the SSE of the right, from which it follows that minimisation of the SSE of the frame difference approximation also minimises the SSE of the reconstructed frame.
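As a sketch of how Equations 2 and 3 fit together in a closed loop, the fragment below runs the encoder-side recursion; `code_frame_difference` is a hypothetical placeholder standing in for the segmentation-based approximation described in this section.

```python
# Minimal sketch of the closed-loop frame difference coder of Equations 2 and 3.
import numpy as np


def code_frame_difference(z):
    """Stand-in for the segmented, piecewise constant approximation of Z_n."""
    return np.round(z)  # crude quantisation so that the loop below is runnable


def encode_sequence(frames):
    recon = np.zeros_like(frames[0], dtype=float)   # \hat{I}_0: initial reconstruction
    coded_differences = []
    for frame in frames:                            # frame plays the role of I_{n+1}
        z = frame - recon                           # Equation 2: Z_n = I_{n+1} - \hat{I}_n
        z_hat = code_frame_difference(z)            # coded approximation \hat{Z}_n
        recon = recon + z_hat                       # Equation 3: \hat{I}_{n+1} = \hat{I}_n + \hat{Z}_n
        coded_differences.append(z_hat)
    return coded_differences, recon
```

Because encoder and decoder both hold \hat{I}_n and add the same \hat{Z}_n, the frame reconstruction error is exactly Z_n - \hat{Z}_n, which is the equivalence used above.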

The segmented images are coded by a method very similar to that described in [4]. In that paper, an edge tracing algorithm is used to define the boundaries between image regions. Intensities are transmitted as a list, one number per region. The only difference between the coding scheme described in [4] and that used here is in the handling of intersections in the edge diagram, where 3 or 4 regions meet. In [4] these intersections were coded as part of the line segments; here they are resolved later, as necessary, by readdressing the intersection. For the resolutions and rates of interest in this study, the second method is preferred [5].


Adaptation to a fixed rate channel is usually necessary in a video coding application. Here, this is achieved by control of the number of regions in the segmentation; a feature of the segmentation method is that this can be selected a priori. The rate control algorithm is based on incrementing or decrementing the number of regions according to whether the present rate is below or above the target rate respectively. The complete algorithm is described in [5]. It is not claimed to be optimal, but results demonstrate that effective control of the output rate is achieved.
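The details of the control law are left to [5]; the following is only a plausible sketch of the increment/decrement rule described above, with the step size and the bounds on the region count chosen arbitrarily for illustration.

```python
# Sketch of region-count rate control. The exact update rule is given in [5]; the
# step size and region-count bounds here are assumptions of this illustration.
def update_region_count(n_regions, bits_used_last_frame, frame_rate_hz,
                        target_bit_rate, step=5, n_min=10, n_max=2000):
    """Choose the number of regions for the next frame's segmentation."""
    achieved_rate = bits_used_last_frame * frame_rate_hz   # bit/s implied by last frame
    if achieved_rate < target_bit_rate:
        n_regions += step    # below target: allow more regions (finer segmentation)
    elif achieved_rate > target_bit_rate:
        n_regions -= step    # above target: use fewer regions next frame
    return max(n_min, min(n_max, n_regions))
```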

The coder of Figure 1 was implemented to obtain coding results for a sequence of 256 x 256 pixel frames. The result after processing 15 frames from an active part of the sequence is shown in Figure 2. The frame rate is 15 frames/s and the average coding rate 384 kbit/s.
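For orientation, and assuming the quoted 384 kbit figure is indeed a per-second rate, this operating point corresponds to roughly

\frac{384\,000\ \text{bit/s}}{15\ \text{frames/s}} \approx 25.6\ \text{kbit per frame}, \qquad \frac{25\,600\ \text{bit}}{256 \times 256\ \text{pixels}} \approx 0.39\ \text{bit per pixel}.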

3. LIMITATIONS OF THE BASIC METHOD

While giving a recognisable result, Figure 2 suffers from severe distortion. Not only does the representation of moving objects become severely degraded, but a "trail" is left behind them. This can be seen as the region of noise to the left of the forehead in the still in Figure 2. The trailing effect is the result of "false contouring" in the frame difference signal. Corrections to the large jumps in intensity that result from a piecewise constant approximation to a smoothly changing signal cause intensity spikes in successive frames.

One of the factors leading to this distortion is the inadequacy of the SSE criterion in the segmentation. It does not adequately reflect the sensitivity of the eye to motion. This is illustrated by the segmentation edge maps shown in Figure 3. These show the borders of regions in the segmentations of two successive frame differences. They result from coding of the same test sequence, but under slightly different conditions (128 x 128 pixel frames, 10 frames/s and 96 kbit/s). The left boundary of the moving head is defined in one frame, but not the other. To reliably portray moving objects, it is preferable to devote the available resources (i.e. rate) to describing the moving objects. One solution would be to use a segmentation criterion that concentrates region borders near the edges of moving objects. This would also improve the definition of smooth slopes in the intensity profile near the edges, reducing the trailing effect.

Figure 3: Edge maps of segmentations of successive frames, showing the failure to consistently define edges of moving objects.

4. MOTION GUIDED SEGMENTATION

The observations of the last Section have been the motivation for a change in the region merging criterion used in forming the segmentation. The measure of similarity between regions is made to vary spatially according to the last frame segmentation. A weighting function is generated from the segmentation of frame difference N, and it is used to influence the segmentation of frame difference N+1. The modification is included in the block diagram shown in Figure 4. Segmentation is performed according to a new criterion, which is a weighted version of Equation 1. The merging function must be monotonically increasing with the weighting function, so that merging of regions near previous edges is discouraged. A linear relationship has been used in the current series of experiments. The new criterion is

similarity measure = \Delta SSE \cdot F(W)

where \Delta SSE is defined by Equation 1, W(x, y) is the spatially dependent weight, and F is the merging (weighting) function. F depends on a parameter m to be selected; m determines the influence of the last segmentation on the present one.

There are many options available for the generator of the weighting function W(x, y). It is difficult to define tighter constraints than that the function should have a greater value "near" the boundaries in the previous segmentation. Three possibilities have been investigated in [5]: convolution of a Gaussian function with the previous segmentation edge map, a Gaussian function dependent on the distance from the nearest edge in the edge map, and a Canny edge operator [7] applied to the previous segmentation. All of these offered improvements on the unweighted case, but no clear preference between the three was established.
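As an illustration, the sketch below builds W(x, y) with the second of these options (a Gaussian of the distance to the nearest edge in the previous segmentation, computed with SciPy's distance transform) and scales the merge cost linearly. The specific form F(W) = 1 + mW, the values of sigma and m, and the use of the mean weight along the candidate regions' common boundary are all assumptions of this sketch; the text above states only that a linear, monotonically increasing relationship was used.

```python
# Sketch of motion-guided weighting. W(x, y) is large near the edges of the previous
# frame-difference segmentation, so merges across those locations cost more and the
# corresponding boundaries tend to survive. F(W) = 1 + m*W is an assumed linear form.
import numpy as np
from scipy.ndimage import distance_transform_edt


def weight_from_previous_edges(edge_map, sigma=3.0):
    """W(x, y): Gaussian of the distance to the nearest edge pixel of the previous segmentation."""
    distance = distance_transform_edt(~edge_map.astype(bool))
    return np.exp(-(distance ** 2) / (2.0 * sigma ** 2))


def weighted_merge_cost(delta_sse, W, boundary_pixels, m=4.0):
    """Weighted similarity measure for one candidate merge.

    `boundary_pixels` is an array of (y, x) coordinates on the common boundary of the
    two candidate regions; the mean of W over that boundary acts as the local weight
    (this choice is an assumption of the sketch, not a detail from the paper).
    """
    ys, xs = boundary_pixels[:, 0], boundary_pixels[:, 1]
    local_weight = W[ys, xs].mean()
    return delta_sse * (1.0 + m * local_weight)
```

A candidate merge whose boundary lies close to a previously detected moving edge then receives an inflated cost and is deferred, which is the behaviour the weighted criterion is intended to produce.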


Figure 4: Block diagram of the modified coder, incorporating weighting of the segmentation according to previously detected motion.


Figure 5 shows a sample frame resulting from use of the modified coder, using a segmentation weighted according to the second of the three possibilities listed above (a Gaussian dependence on the distance from the nearest edge in the previous segmentation).

Figure 5: Sample frame resulting from use of the modified coder.

5. CONCLUSION

Further improvements may be expected by extension of the technique, and by incorporation of other features into the coder. In the present investigation, the segmented frame difference coder has been used without explicit tracking of moving regions; the method presented here assumes only that object edges are "near" those in a previous frame. Estimates of the direction of motion could be used to further improve motion portrayal. Several enhancements to the method introduced in this paper are possible, and initial results suggest that effective low rate video coding by this approach is achievable and that further investigation is warranted.

ACKNOWLEDGEMENTS

The authors thank Telecom Australia for permission to publish M.J. Biggar's contribution to this research. The test sequence "Miss America" was made available by the European COST organisation.

REFERENCES

[1] J.K. Yan and D.J. Sakrison, "Encoding of Images Based on a Two-Component Source Model", IEEE Trans. Commun., Vol. COM-25, pp. 1315-1322 (Nov. 1977).

[2] [...] and A.G. Constantinides, "Nonlinear Image Composite Source Coding", Proc. IEE, Part F, Vol. 130, No. 5, pp. 441-451 (Aug. 1983).

[3] M. Kunt, A. Ikonomopoulos and M. Kocher, "Second-Generation Image-Coding Techniques", Proc. IEEE, Vol. 73, pp. 549-574 (Apr. 1985).

[4] M.J. Biggar, O.J. Morris and A.G. Constantinides, "Segmented-Image Coding: a Performance Comparison with the Discrete Cosine Transform", to be published in Proc. IEE, Part F.

[5] M.J. Biggar, "Source Coding of Segmented Digital Image and Video Signals", PhD Thesis, Imperial College, University of London.

[6] [...] and A.G. Constantinides, "Graph [...]".

[7] J. Canny, "A Computational Approach to Edge Detection", IEEE Trans. Pattern Anal. and Mach. Intell., Vol. PAMI-8, pp. 679-698 (Nov. 1986).