J.-S. Kim et al.: Content-Aware Image and Video Resizing Based on Frequency Domain Analysis


Content-Aware Image and Video Resizing Based on Frequency Domain Analysis

Jun-Seong Kim, Seong-Gyun Jeong, Younghun Joo, and Chang-Su Kim, Senior Member, IEEE

Abstract — An adaptive image and video resizing algorithm based on frequency domain analysis is proposed in this work. Given an image, we first construct an importance map by combining the gradient and the saliency information. We partition the image into several strips so that each strip contains pixels of similar importance levels. We model the distortion, which is caused by scaling a strip, in the frequency domain. Then, we scale each strip adaptively to minimize the overall distortion of the whole image. Moreover, we extend the proposed algorithm to video resizing. We add a motion term to the importance map, and suppress excessive parameter variations to achieve jitter-free video resizing. Simulation results show that the proposed algorithm provides higher quality resizing results than conventional algorithms, while requiring lower computational complexity.


Index Terms — Image and video retargeting, saliency map, Fourier analysis, and Lagrangian multiplier technique

I. INTRODUCTION

Recently, portability has become one of the essential requirements for multimedia services, because people want to consume multimedia contents on different types of devices. For instance, image or video contents are often produced at a fixed resolution, but they are consumed on various display devices with different resolutions, e.g., from a high definition television (HDTV) to a small mobile phone with quarter video graphics array (QVGA) resolution. It is hence important to develop efficient techniques to resize image and video contents to fit the resolutions of various devices.

Scaling and cropping are two standard methods for resizing images and videos. Scaling changes the sampling rate uniformly over the entire image. It preserves all visual elements in the image, but it causes geometric distortions when the aspect ratio is changed. Moreover, it can shrink the image so much that viewers cannot perceive the details. On the other hand, cropping keeps only a main part of an image, while carving out the border regions. The information in the carved regions is lost completely, even though the remaining region retains its visual characteristics. Fig. 1 illustrates how these standard methods resize an image. The human visual system (HVS) tends to pay

Preliminary results of this work were partly presented in [1]. This work was supported partly by Samsung Electronics, partly by the Seoul R&BD Program (No. ST090818), and partly by a Korea University Grant. J.-S. Kim, S.-G. Jeong, and C.-S. Kim are with the School of Electrical Engineering, Korea University, Seoul, Republic of Korea, 136-713 (e-mails: {junssi153, sg_jeong, changsukim}@korea.ac.kr). Y. Joo is with Samsung Electronics, Suwon, Republic of Korea. Contributed Paper. Manuscript received 11/30/10. Current version published 06/27/11. Electronic version published 06/27/11.

Fig. 1. An example of image retargeting. The input image in (a) is resized by (b) the scaling method, (c) the cropping method, and (d) the proposed algorithm.

attention to the red tower. Scaling shrinks the tower too much, while cropping eliminates the neighboring buildings, which are also meaningful parts of the image.

Recently, content-aware image and video resizing techniques have been proposed [1]-[12]. These techniques, called retargeting, attempt to overcome the weaknesses of both the scaling and the cropping methods. Specifically, they try to preserve important regions while shrinking less important regions, as illustrated in Fig. 1(d).

In this work, we propose an adaptive image and video retargeting algorithm using a divide-and-conquer approach. The proposed algorithm divides an image into several strips so that each strip consists of pixels of similar importance levels, and then scales down each strip according to its complexity. We analyze the scaling distortion in the frequency domain. Then, we formulate the retargeting task as a constrained optimization problem, and solve the problem using the Lagrangian multiplier technique. Furthermore, we extend the proposed algorithm to retarget video sequences. To enforce temporal coherence, we redefine the importance map to include the motion information, and constrain the retargeting parameters not to vary excessively between adjacent frames. Experimental results show that the proposed algorithm achieves more reliable content-aware retargeting at lower computational complexity than the conventional techniques [4], [8].

0098 3063/11/$20.00 © 2011 IEEE


IEEE Transactions on Consumer Electronics, Vol. 57, No. 2, May 2011

Fig. 2. An overview of the proposed algorithm (Input Image → Importance Map → Partitioning → Optimization → Result). An importance map is computed for each frame by combining the gradient and the saliency. Then, the frame is divided into several strips, and each strip is scaled adaptively by solving a constrained optimization problem.
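As a toy illustration of this pipeline, the sketch below retargets the width of a grayscale image. It is not the authors' implementation: it substitutes deliberately crude stand-ins (gradient-only importance, fixed equal-width strips, and removals allocated inversely to strip complexity instead of the Lagrangian optimization of Section III.C).

```python
import numpy as np

def retarget_width(image, target_width, num_strips=4):
    """Toy sketch of the Fig. 2 pipeline: importance map -> strips -> scaling.
    Crude simplifications throughout; illustrative only."""
    h, w = image.shape
    # Importance map: gradient magnitude only (the paper also adds saliency).
    gy, gx = np.gradient(image.astype(float))
    importance = gx**2 + gy**2
    col = importance.sum(axis=0)                       # column complexity
    bounds = np.linspace(0, w, num_strips + 1).astype(int)  # equal-width strips
    mk = np.array([col[a:b].mean() for a, b in zip(bounds[:-1], bounds[1:])])
    # Allocate removals r_k inversely to complexity (stand-in for the paper's
    # constrained optimization): low-complexity strips lose more columns.
    weights = 1.0 / (mk + 1e-9)
    R = w - target_width
    r = np.floor(R * weights / weights.sum()).astype(int)
    r[0] += R - r.sum()                                # absorb rounding remainder
    out = []
    for (a, b), rk in zip(zip(bounds[:-1], bounds[1:]), r):
        lk = b - a
        keep = np.linspace(a, b - 1, lk - rk).astype(int)  # crude resampling
        out.append(image[:, keep])
    return np.concatenate(out, axis=1)

img = np.add.outer(np.arange(24), np.arange(32)).astype(float)
print(retarget_width(img, 20).shape)  # (24, 20)
```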

The rest of this paper is organized as follows. Section II briefly reviews prior work. Section III proposes the image retargeting algorithm, and Section IV extends it to the video retargeting algorithm. Section V presents experimental results. Finally, Section VI concludes this work.

II. PRIOR WORK

Early retargeting research focused on thumbnail generation. Suh et al. [2] proposed a thumbnail cropping algorithm, which crops an important region based on the saliency map [13] and face detection results [14]. Chen et al. [3] also presented a method for adapting images to mobile devices using saliency, face detection, and text detection. Recently, Avidan and Shamir [4] proposed the seam carving algorithm, which carves out a path of pixels, called a seam, repeatedly to achieve a target size. Their algorithm employs dynamic programming to find the least noticeable seam at each repetition. Rubinstein et al. [5] added a forward energy term to find a seam, and extended the seam carving algorithm to video retargeting. However, these seam carving techniques share a limitation: they distort important objects as well when an image is shrunk too much. Also, it is hard to find unnoticeable seams when salient objects are distributed over the entire image. To overcome these drawbacks, Rubinstein et al. [6] proposed a retargeting algorithm that combines the seam carving operator with the cropping and the scaling operators. Their algorithm provides impressive results, but requires very high computational complexity to determine the optimal combination of operators. Choi et al. [11] proposed a scaling-based image retargeting algorithm, which exploits the compressed domain information, such as discrete cosine transform (DCT) coefficients and motion vectors, to reduce the computational complexity. Similarly, Nam et al.
[12] proposed a video retargeting algorithm, which also uses the compressed domain information, but their algorithm is based on the seam carving instead of the scaling approach. An alternative approach to image retargeting is to use warping. Gal et al. [7] proposed a 2D texture mapping scheme, which warps an image into various shapes without distorting important regions. Wolf et al. [8] proposed a video retargeting algorithm, which constructs a warping function between input and output images based on an importance map. It forms a system of linear equations and acquires the solution using a

least squares solver. Wang et al. [9] presented a warping algorithm that represents an image as a grid mesh and deforms the image without distorting important regions. Also, Wang et al. [10] extended the algorithm in [9] to resize video sequences.

In [1], we proposed an image retargeting algorithm based on Fourier analysis, and generalized it straightforwardly for video retargeting. It divides an image into strips, and scales each strip with a different sampling rate. Similarly to the image case, a video sequence is divided into sub-volumes, and each sub-volume is scaled down with an adaptive sampling rate. However, the partitioning of an entire video sequence demands a large storage space as well as a heavy computational burden. Moreover, it is difficult or even impossible to divide a video sequence into appropriate sub-volumes when multiple objects have diverse motions. In this work, we improve the image retargeting algorithm in [1] and extend it to a video retargeting algorithm in a more elegant and efficient way.

III. IMAGE RETARGETING

Fig. 2 shows an overview of the proposed algorithm. We construct an importance map to partition an input image into multiple strips. We use both gradient and saliency information to define the importance map, whereas [1] uses only the gradient information. After the partitioning, we model the scaling distortion of each strip, and resize each strip adaptively by solving a constrained optimization problem.

A. Importance Map

We define the importance map S(x, y) of an input image by combining the gradient map S_G(x, y) and the saliency map S_I(x, y),

    S(x, y) = ω_G S_G(x, y) + ω_I S_I(x, y)    (1)

where ω_G and ω_I are weighting coefficients. In this work, we set ω_G = ω_I = 0.5.

1) Gradient Map

The gradient map is composed of the squared gradient magnitudes of the pixels in the input image I(x, y), given by

    S_G(x, y) = (∂I(x, y)/∂x)² + (∂I(x, y)/∂y)².    (2)
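A minimal NumPy sketch of (1) and (2) follows; it is illustrative only, and the saliency map S_I is assumed to be supplied by a separate routine.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter2d_same(img, kernel):
    """Tiny 'same'-size 2D cross-correlation with zero padding.
    (Sign differences vs. true convolution vanish after squaring.)"""
    kh, kw = kernel.shape
    pad = np.pad(img, ((kh // 2,), (kw // 2,)), mode="constant")
    out = np.zeros_like(img, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * pad[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def gradient_map(img):
    # Eq. (2): squared gradient magnitude, partials approximated by Sobel.
    gx = filter2d_same(img, SOBEL_X)
    gy = filter2d_same(img, SOBEL_Y)
    return gx**2 + gy**2

def importance_map(img, saliency, w_g=0.5, w_i=0.5):
    # Eq. (1): weighted combination of the gradient and saliency maps.
    return w_g * gradient_map(img) + w_i * saliency
```

On a horizontal intensity ramp, the interior Sobel response along x is 8 per unit slope, so `gradient_map` yields 64 there while the y term vanishes.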


The partial derivatives are approximated using the Sobel operators.

Fig. 3. Partitioning of an image into K strips. The boundary b_k marks the leftmost column of the k-th strip, with b_0 = 0 and b_K equal to the image width.
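The iterative boundary update that produces the strip structure of Fig. 3 (detailed in Section III.B) can be sketched as follows. This is an illustrative re-implementation, not the authors' code; the margin and weighting parameters follow the values given in the text.

```python
import numpy as np

def update_boundary(col, b_prev, b_next, margin=5, gamma=0.05):
    """Pick the boundary b minimizing within-strip complexity variation plus
    gamma / |difference of strip means| (cf. Eq. (4) in Section III.B).
    `col` is the column-complexity array c(x)."""
    best_b, best_cost = None, np.inf
    for b in range(b_prev + margin, b_next - margin + 1):
        left, right = col[b_prev:b], col[b:b_next]
        m_l, m_r = left.mean(), right.mean()
        cost = (np.abs(left - m_l).sum() + np.abs(right - m_r).sum()
                + gamma / (abs(m_l - m_r) + 1e-12))
        if cost < best_cost:
            best_cost, best_b = cost, b
    return best_b if best_b is not None else (b_prev + b_next) // 2

def partition_strips(col, K, margin=5, max_iters=20):
    """Start from equal-width strips, then sweep the boundary update
    until the boundaries no longer move."""
    bounds = np.linspace(0, len(col), K + 1).astype(int)
    for _ in range(max_iters):
        prev = bounds.copy()
        for k in range(1, K):
            bounds[k] = update_boundary(col, int(bounds[k - 1]),
                                        int(bounds[k + 1]), margin)
        if np.array_equal(prev, bounds):
            break
    return bounds
```

On a step-shaped complexity profile, the update snaps the boundary to the step, since both strips then have zero internal variation and a large mean difference.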


In [1], only the top 10% of gradient magnitudes are summed up to preserve the shapes of objects, which are surrounded by strong edges. However, this selective summation is not necessary in this work, since the saliency map assigns high importance levels to pixels inside objects as well as to boundary pixels, as illustrated in Fig. 4(c).

2) Saliency Map

We also use the saliency map proposed by Itti et al. [13], which aggregates intensity, color, and orientation features at different scales. In this work, we use only the intensity and color features to construct the saliency map, because our experiments confirmed that excluding the orientation features reduces the computational complexity significantly without degrading the retargeting performance. To reduce the complexity further, we use simple box filters instead of Gaussian filters to generate the images at different scales.

Fig. 4 shows an example of the importance map. The gradient map in Fig. 4(b) assigns large values to object boundaries, while the saliency map in Fig. 4(c) detects the pagoda, the tree, and the background mountain as salient regions. By combining these data, we obtain the importance map in Fig. 4(d).

Fig. 4. (a) An input image, (b) the gradient map, (c) the saliency map, and (d) the importance map.

B. Partitioning

Based on the importance map, we partition the image into several strips. For simplicity, we consider the case that the image is resized in the horizontal direction only; vertical resizing can be performed similarly. Suppose that the input image of size W_I × H_I is resized to the output image of size W_O × H_O, where W_O < W_I and H_O = H_I. We first obtain the column complexity, which is the sum of the importance levels of all pixels in a column:

    c(x) = Σ_y S(x, y).    (3)

The proposed algorithm then divides the image into K strips. As shown in Fig. 3, let b_k denote the horizontal coordinate of the leftmost column in the k-th strip, where 0 ≤ k ≤ K − 1. The image is initially divided into strips of the same size, and the strip boundaries are then updated iteratively based on two criteria: first, each strip should consist of columns of similar complexities; second, across a strip boundary, the left strip and the right strip should have distinct complexities. Suppose that boundaries b_{k−1} and b_{k+1} are fixed. Then, we update the middle boundary b_k by

    b_k = argmin_{b_{k−1}+δ ≤ b ≤ b_{k+1}−δ} [ Σ_{x=b_{k−1}}^{b−1} |c(x) − m_{k−1}| + Σ_{x=b}^{b_{k+1}−1} |c(x) − m_k| + γ / |m_{k−1} − m_k| ]    (4)

where m_{k−1} and m_k are the averages of the column complexities in the (k−1)-th and the k-th strips, i.e.,

    m_{k−1} = ( Σ_{x=b_{k−1}}^{b−1} c(x) ) / ( b − b_{k−1} )   and   m_k = ( Σ_{x=b}^{b_{k+1}−1} c(x) ) / ( b_{k+1} − b ).    (5)

The first two terms in (4) measure the variations of the column complexities within the (k−1)-th and the k-th strips, respectively, and the third term is the inverse of the complexity difference between the two strips. As a result, the boundary is updated so that the complexity variation within each strip is low, whereas the complexity difference across the strips is high. In (4), δ is a margin parameter that imposes a minimum separation between two strip boundaries, and γ is a weighting parameter. In this work, δ is set to 5, and γ is set to 0.05. After updating b_k, we update the next boundary b_{k+1} using b_k and b_{k+2}. We repeat this procedure until all boundaries converge.

The third image in Fig. 2 shows a partitioning result. Each strip is shown in a different color, and the height of each column depicts the column complexity. We see that each strip is composed of columns of similar complexities.

C. Adaptive Scaling

1) Scaling Distortion

After the partitioning, the k-th strip has width l_k = b_{k+1} − b_k, as shown in Fig. 3. In other words, the k-th strip consists of l_k columns. Suppose that we downsample the strip horizontally to remove r_k columns. Then, the sampling rate for the k-th strip is 1 − r_k/l_k. In the downsampling, the signal is pre-filtered to avoid aliasing artifacts; more specifically, a low-pass filter with cutoff frequency ω_C = (1 − r_k/l_k)π is applied to the signal. Thus, the downsampling process incurs the loss of the high frequency components above the cutoff frequency. We refer to this loss as the scaling distortion. The scaling distortion of


the k-th strip can be computed in the frequency domain as

    d_k = 2 ∫_{(1 − r_k/l_k)π}^{π} |Z(e^(jω))|² dω    (6)

where Z(e^(jω)) denotes the frequency spectrum of the signal. Since a typical image signal is a low-pass signal, |Z(e^(jω))| decreases quickly as ω increases. By approximating the magnitude of the frequency spectrum with an exponential function, we derive the scaling distortion of the k-th strip [1] as

    d_k(r_k) = 2 ∫_{(1 − r_k/l_k)π}^{π} e^(−2ω/m_k) dω = m_k e^(−2π/m_k) ( e^(2π r_k/(m_k l_k)) − 1 ).    (7)

2) Optimization

Let R = W_I − W_O denote the number of columns to be removed from the input image, where W_I and W_O are the input width and the output width, respectively. Since r_k denotes the number of reduced columns in the k-th strip, we have the constraint

    Σ_k r_k = R.    (8)

Then, the objective is to minimize the sum of the scaling distortions d_k subject to this constraint. We can solve the constrained optimization problem by minimizing the Lagrangian cost function

    J = Σ_k d_k(r_k) + λ Σ_k r_k = Σ_k [ m_k e^(−2π/m_k) ( e^(2π r_k/(m_k l_k)) − 1 ) + λ r_k ]    (9)

where λ is a Lagrangian multiplier. By setting the partial derivative ∂J/∂r_k to 0 and clipping r_k to the range [0, l_k], we have

    r_k = max{ 0, min{ l_k, l_k ( 1 + (m_k/(2π)) log( −λ l_k/(2π) ) ) } }.    (10)

Since each r_k in (10) is a monotonically decreasing function of λ, we can use the bisection method [16] to determine the λ that satisfies the constraint in (8).

TABLE I lists the width l_k, the complexity m_k, the number of reduced columns r_k, and the sampling rate for each strip of the "Summerhouse" image in Fig. 2. Note that the middle strips (k = 5, 6) have lower complexities than the others. Therefore, they are downsampled with lower sampling rates.

TABLE I
THE WIDTH l_k, THE COMPLEXITY m_k, THE NUMBER OF REDUCED COLUMNS r_k, AND THE SAMPLING RATE (S.R.) FOR EACH STRIP OF THE "SUMMERHOUSE" IMAGE IN FIG. 2

k    |   0     1     2     3     4     5     6     7     8     9
l_k  |  34    80   104    46    35    36    93    20    60    92
m_k  | 0.957 1.282 1.401 0.775 0.497 0.377 0.407 0.927 1.179 1.362
r_k  |  14    30    39    26    24    28    76     7    22    34
S.R. | 0.588 0.625 0.625 0.435 0.314 0.222 0.183 0.650 0.633 0.630

Fig. 5. Updating of boundaries.

IV. VIDEO RETARGETING

We extend the image retargeting algorithm to the video case. The image retargeting algorithm can be applied to each frame of a video sequence independently, but this simple approach may cause severe artifacts in the output sequence by breaking the temporal coherence. The proposed algorithm uses two variables for each strip: the strip width l_k and the number of reduced columns r_k. If these two variables change abruptly between frames, the output sequence tends to have jitter artifacts. In this section, we propose an algorithm to control those variables systematically to achieve reliable video retargeting. Note that the variables for the first frame are determined by the image retargeting algorithm in the previous section.

A. Importance Map with Motion Term

Since the HVS is sensitive to object motions as well as object shapes, we redefine the importance map to include a motion term:

    S(x, y) = ω_G S_G(x, y) + ω_I S_I(x, y) + ω_M S_M(x, y)    (11)

where S_M(x, y) is the motion term and ω_M is its weighting parameter. We set ω_G = ω_I = ω_M = 1/3. To consider only the actual object motions, the global motion, which is caused by camera motion, should be removed. Thus, the motion term is defined as

    S_M(x, y) = ‖ v^L_(x,y) − v^G_(x,y) ‖    (12)

where v^L_(x,y) and v^G_(x,y) denote the local and the global motion vectors of pixel (x, y). After dividing a frame into blocks of size 8 × 8, we determine the local motion vectors using the block matching algorithm. To reduce the amount of computation, we simply set the global motion vector to be the most frequent one among the local motion vectors.
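As an illustration of the rate allocation in Section III.C.2, the closed-form r_k(λ) of (10) can be combined with a bisection search on λ to satisfy (8). The sketch below is not the authors' implementation; the bracket endpoints are arbitrary illustrative choices.

```python
import numpy as np

def r_of_lam(lam, l, m):
    # Eq. (10): closed-form r_k for a multiplier lam (< 0), clipped to [0, l_k].
    r = l * (1.0 + (m / (2 * np.pi)) * np.log(-lam * l / (2 * np.pi)))
    return np.clip(r, 0.0, l)

def allocate(l, m, R, iters=100):
    """Bisection on lam so that sum_k r_k(lam) = R, cf. Eqs. (8)-(10).
    l, m: per-strip widths and complexities (arrays); R: columns to remove."""
    lo, hi = -1e9, -1e-12      # sum r_k decreases as lam increases toward 0-
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if r_of_lam(mid, l, m).sum() > R:
            lo = mid           # removing too many columns: raise lam
        else:
            hi = mid
    return r_of_lam(0.5 * (lo + hi), l, m)
```

For two identical strips the allocation splits R evenly; with unequal complexities, the higher-complexity strip loses fewer columns, matching the behavior in TABLE I.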

B. Updating Strip Boundaries

The partitioning algorithm in Section III.B decides the strip boundaries b_k, or equivalently the strip widths l_k, sensitively to the column complexities. This is acceptable in image retargeting. However, if it is applied to each frame of a video sequence, neighboring frames can be retargeted with dissimilar partitioning structures, and the resultant video can be degraded by jitter.
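The motion vectors used below and in Section IV.A can be obtained by standard block matching. The following is an illustrative sketch (full search over a small window, 8 × 8 blocks, global motion taken as the most frequent local vector), not the authors' implementation.

```python
import numpy as np
from collections import Counter

def block_motion(prev, curr, bs=8, search=4):
    """Full-search block matching: for each bs x bs block of `curr`, find the
    displacement into `prev` (within +/- search) minimizing the SAD."""
    h, w = curr.shape
    vecs = {}
    for by in range(0, h - bs + 1, bs):
        for bx in range(0, w - bs + 1, bs):
            blk = curr[by:by + bs, bx:bx + bs]
            best, best_sad = (0, 0), np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - bs and 0 <= x <= w - bs:
                        sad = np.abs(prev[y:y + bs, x:x + bs] - blk).sum()
                        if sad < best_sad:
                            best_sad, best = sad, (dy, dx)
            vecs[(by, bx)] = best
    return vecs

def motion_term(vecs):
    # Global motion = most frequent local vector; S_M = ||v_local - v_global||.
    vg = Counter(vecs.values()).most_common(1)[0][0]
    return {b: np.hypot(v[0] - vg[0], v[1] - vg[1]) for b, v in vecs.items()}
```

Blocks moving with the dominant (camera) motion receive a motion term of zero, so only objects moving relative to the background gain extra importance.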

Therefore, we propose a motion-based method to update b_k. Let b_k^t denote the boundary of the k-th strip at time instance t. Then, given the partitioning result at the previous instance t−1, we update the boundary b_k^t using the motion vectors of the (k−1)-th and k-th strips, as shown in Fig. 5. Specifically, let v_{k−1} and v_k denote the motion vectors of these strips, which are depicted by the blue and the red arrows in Fig. 5. The boundary b_k^t can then move to a position within the green range in Fig. 5. Thus, we update the boundary by adding the weighted sum of the two motion vectors:

    b_k^t = b_k^{t−1} + (1 − β) v_{k−1} + β v_k    (13)

where β is a weighting factor given by β = m_k / (m_{k−1} + m_k). Note that m_k is the complexity of the k-th strip, and thus the update rule assigns a bigger weight to the motion vector of the more important strip.

We also implement a division step and a merge step to handle the cases when new objects come into video frames or existing objects go out of frames. Fig. 7(a) illustrates these cases. A bird and a person move to the right over three consecutive frames, and blue lines depict strip boundaries. The first and the second frames are partitioned well, so that each strip includes an object properly. However, additional unimportant columns come into the leftmost strip of the third frame, reducing the strip complexity. As a result, that strip may be shrunken too much. Moreover, in the third frame, the rightmost strip becomes too narrow and its separate processing becomes unnecessary. We overcome these issues by dividing and merging strips, as shown in Fig. 7(b). We divide an overly grown strip using the partitioning algorithm in Section III.B; the red line in Fig. 7(b) illustrates a new boundary generated by the division step. Also, the rightmost strip is merged with its left neighbor. The division or the merge step is executed when a strip becomes too large or too small, respectively. We set the division threshold to a quarter of the input image width, and the merge threshold to five pixels.

Fig. 7. (a) As objects move, partitions can become too large or too small. (b) These cases are handled by the division and the merge steps.

C. Updating Sampling Rates

After updating the strip boundaries, we also update the sampling rate, or equivalently the number of reduced columns, for each strip. Let r_k^t denote the number of columns reduced from the k-th strip at time instance t. It is updated by

    r_k^t = α r̃_k^t + (1 − α) r_k^{t−1} + (r_k^{t−1} / l_k^{t−1}) ( l_k^t − l_k^{t−1} )    (14)

where r̃_k^t is the intra result obtained from the column complexities via (10), and r_k^{t−1} is the result at the previous instance t−1. Thus, the first two terms in (14) form a weighted sum of the current result and the previous result. If a strip contains important objects, it is better to preserve the previous result to maintain the temporal coherence. Thus, we set the weight parameter α = e^(−2 m_k): as the strip complexity m_k gets higher, the previous result r_k^{t−1} receives a bigger weight.

The last term in (14) compensates for the change in the strip width to maintain the sampling rate. As shown in Fig. 6, suppose that the strip width increases from l_k^{t−1} to l_k^t. In such a case, even if the number of reduced columns is kept the same, i.e., r_k^t = r_k^{t−1}, the sampling rate changes and jitter may occur. This can be prevented by adjusting the number of reduced columns by the last term in (14), which is illustrated as the black area in Fig. 6. Thus, by jointly controlling l_k and r_k, the proposed algorithm provides jitter-free and temporally coherent retargeting results.

Fig. 6. Compensation of the number of reduced columns. The top is the result at the previous time instance, the middle is the result at the current time instance without compensation, and the bottom is the result with compensation.

V. EXPERIMENTAL RESULTS

First, we compare the performance of the proposed image retargeting algorithm with those of the standard scaling and the seam carving [4] in Fig. 8. The proposed algorithm preserves important regions more faithfully than the scaling. Moreover, the proposed algorithm does not distort object shapes unlike the seam carving, since the proposed algorithm downsamples important strips as well as less important ones when the target width becomes too narrow. For example, in Fig. 8, the seam carving distorts the pillars of the

620

IEEE Transactions on Consumer Electronics, Vol. 57, No. 2, May 2011

Fig. 8. Comparison of the proposed image retargeting algorithm with the standard scaling and the seam carving. The first column shows the input images. The second, the third, and the last columns are obtained by the scaling, the seam carving, and the proposed algorithm, respectively.

summerhouse in the first row, the silhouette of the boy in the second row, and the circular patterns and the flowerpot in the third row, whereas the proposed algorithm renders those objects more accurately.

Next, we resize the Football, Soccer, and Coastguard video sequences. Fig. 9 shows the resized results for the 52nd frame of Football, the 70th frame of Soccer, and the 17th frame of Coastguard. For comparison, we also provide the results of the standard scaling, the Wolf et al.'s algorithm [8], and the Nam et al.'s algorithm [12]. It is observed that the proposed algorithm preserves the shapes and the aspect ratios of the players and the ships more accurately than the standard scaling. The Wolf et al.'s algorithm causes artifacts in the player's leg in Soccer and in the boundary of Coastguard, since its optimization process may fail to satisfy the spatial constraints. The Nam et al.'s algorithm also distorts objects, such as the legs in Football and the woman in Soccer, since it inherits the disadvantages of the seam carving algorithm. On the contrary, no visible artifacts occur in the results of the proposed algorithm.

Fig. 10 illustrates how the proposed video retargeting algorithm preserves temporal coherence. Fig. 10(a) shows four successive frames of the original Coastguard sequence, in which three lines are overlaid to depict that the ship is in

the same relative position in all the frames. Fig. 10(b) shows the retargeting results obtained by applying the image retargeting algorithm to each frame independently. We see that, although each frame is retargeted effectively, the ship moves back and forth, causing annoying jitter artifacts. On the other hand, as shown in Fig. 10(c), the proposed video retargeting algorithm provides jitter-free results.

We also compare the computational complexity of the proposed algorithm with those of the conventional algorithms. A personal computer with a 2.53 GHz dual core CPU is employed in this test. TABLE II lists the times for resizing the images in Fig. 8. The original resolution is 600 × 400, and the retargeted resolution is 300 × 400. The proposed algorithm resizes the images about eight times faster than the seam carving. Whereas the seam carving algorithm demands heavy computations to determine the seams based on dynamic programming, the proposed algorithm efficiently resizes the images using the standard downsampling operation. TABLE III compares the average times for retargeting the video sequences in Fig. 9. The original resolution of 352 × 288 is reduced to the output resolution of 288 × 288. Both the Wolf et al.'s algorithm and the proposed algorithm require the


Fig. 9. Video clips from the Football, Soccer, and Coastguard sequences are resized. The first column shows the original frames. The second, the third, the fourth, and the last columns are obtained by the scaling, the Wolf et al.'s algorithm [8], the Nam et al.'s algorithm [12], and the proposed algorithm, respectively.


Fig. 10. Retargeting of four successive frames of the Coastguard sequence: (a) the original frames, (b) the resized frames obtained by applying the image retargeting algorithm to each frame independently, and (c) the resized frames by the proposed video retargeting algorithm.

computation of the importance map based on the saliency and the motion. However, whereas the Wolf et al.’s algorithm needs to solve a large system of linear equations, the proposed

algorithm determines the strip boundaries and the sampling rates efficiently with only a modest amount of computations. Therefore, the complexity of the proposed algorithm is about

half of that of the Wolf et al.'s algorithm. On the other hand, the Nam et al.'s algorithm requires shorter runtimes, since it uses the compressed domain information to determine the importance map quickly. However, their algorithm is developed for MPEG-2 compressed video signals only and cannot be used to retarget general video sequences. Moreover, since their algorithm focuses on fast retargeting, it provides less faithful results than the proposed algorithm, as shown in Fig. 9.

TABLE II
COMPUTATIONAL COMPLEXITY OF THE IMAGE RETARGETING ALGORITHMS

Image        |        Runtime (s)
             | Seam Carving [4] | Proposed
Summerhouse  |      6.652       |  0.841
Sunset       |      6.536       |  0.882
Flowerpot    |      6.621       |  0.862

TABLE III
COMPUTATIONAL COMPLEXITY OF THE VIDEO RETARGETING ALGORITHMS

Video       |              Runtime (s/frame)
            | Wolf et al.'s [8] | Nam et al.'s [12] | Proposed
Football    |       1.556       |       0.081       |  0.791
Soccer      |       1.534       |       0.079       |  0.790
Coastguard  |       1.509       |       0.082       |  0.768

VI. CONCLUSION

We proposed a content-aware image and video retargeting algorithm based on frequency domain analysis. The proposed image retargeting algorithm first divides an input image into several strips. Then, it solves a constrained optimization problem to determine the sampling rate of each strip, so that the overall distortion in the whole image is minimized. Moreover, we extended the image retargeting algorithm to the video case. The video retargeting algorithm controls the strip widths and the sampling rates systematically to maintain the temporal coherence. Simulation results confirmed that the proposed algorithm provides fine retargeting performance without jitter artifacts.

REFERENCES

[1] J.-S. Kim, J.-H. Kim, and C.-S. Kim, "Adaptive image and video retargeting technique based on Fourier analysis," in Proc. IEEE CVPR, June 2009, pp. 1730–1737.
[2] B. Suh, H. Ling, B. B. Bederson, and D. W. Jacobs, "Automatic thumbnail cropping and its effectiveness," in Proc. ACM Symp. User Interface Software and Technology, 2003, pp. 95–104.
[3] L. Q. Chen, X. Xie, X. Fan, W. Y. Ma, and H. Q. Zhou, "A visual attention model for adapting images on small displays," ACM Multimedia Systems J., vol. 9, no. 4, pp. 353–364, 2003.
[4] S. Avidan and A. Shamir, "Seam carving for content-aware image resizing," ACM Trans. Graph., vol. 26, no. 3, article 10, 2007.
[5] M. Rubinstein, A. Shamir, and S. Avidan, "Improved seam carving for video retargeting," ACM Trans. Graph., vol. 27, no. 3, article 16, 2008.
[6] M. Rubinstein, A. Shamir, and S. Avidan, "Multi-operator media retargeting," ACM Trans. Graph., vol. 28, no. 3, article 23, 2009.
[7] R. Gal, O. Sorkine, and D. Cohen-Or, "Feature-aware texturing," in Proc. Eurographics Symp. Rendering, 2006, pp. 297–303.
[8] L. Wolf, M. Guttman, and D. Cohen-Or, "Non-homogeneous content-driven video-retargeting," in Proc. IEEE ICCV, Oct. 2007, pp. 1–6.
[9] Y.-S. Wang, C.-L. Tai, O. Sorkine, and T.-Y. Lee, "Optimized scale-and-stretch for image resizing," ACM Trans. Graph., vol. 27, no. 5, article 118, 2008.
[10] Y.-S. Wang, H. Fu, O. Sorkine, T.-Y. Lee, and H.-P. Seidel, "Motion-aware temporal coherence for video resizing," ACM Trans. Graph., vol. 28, no. 5, article 127, 2009.
[11] K.-S. Choi and S.-J. Ko, "Fast content-aware image resizing scheme in the compressed domain," IEEE Trans. Consum. Electron., vol. 55, no. 3, pp. 1514–1521, Aug. 2009.
[12] H.-M. Nam, K.-Y. Byun, J.-Y. Jeong, K.-S. Choi, and S.-J. Ko, "Low complexity content-aware video retargeting for mobile devices," IEEE Trans. Consum. Electron., vol. 56, no. 1, pp. 182–189, Feb. 2010.
[13] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 11, pp. 1254–1259, Nov. 1998.
[14] M. H. Yang, D. J. Kriegman, and N. Ahuja, "Detecting faces in images: A survey," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 1, pp. 34–58, Jan. 2002.
[15] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. IEEE CVPR, Dec. 2001, pp. 511–518.
[16] Y. Shoham and A. Gersho, "Efficient bit allocation for an arbitrary set of quantizers," IEEE Trans. Acoust., Speech, Signal Process., vol. 36, no. 9, Sept. 1988.

BIOGRAPHIES

Jun-Seong Kim received the B.S. degree in electrical engineering from Korea University in 2006. He has been with the Media Communications Laboratory at Korea University since 2006. Currently, he is in the joint M.S. and Ph.D. program at Korea University, Seoul, Korea. His research interests include computer vision and multimedia communications.

Seong-Gyun Jeong received the B.S. degree in electrical engineering from Korea University in 2010.
He entered the Media Communications Laboratory in the School of Electrical Engineering of Korea University in 2010. Currently, he is in the M.S. program at Korea University, Seoul, Korea. His research interests include computer vision and computer graphics.

Younghun Joo received the B.S. and M.S. degrees in physics from Sogang University in 1994 and 1997, respectively. Since 1998, he has been with Samsung Electronics, Suwon, Korea, where he is working on the development of multimedia solutions for mobile devices. His research interests include video data compression and multimedia communications.

Chang-Su Kim (S'95-M'01-SM'05) received the B.S. and M.S. degrees in control and instrumentation engineering from Seoul National University (SNU) in 1994 and 1996, respectively. In 2000, he received the Ph.D. degree in electrical engineering from SNU with a Distinguished Dissertation Award. From 2000 to 2001, he was a Visiting Scholar with the Signal and Image Processing Institute, University of Southern California, Los Angeles. From 2001 to 2003, he coordinated the 3D Data Compression Group in the National Research Laboratory for 3D Visual Information Processing at SNU. From 2003 to 2005, he was an Assistant Professor in the Department of Information Engineering, Chinese University of Hong Kong. Since 2005, he has been with the School of Electrical Engineering, Korea University, where he is now an Associate Professor. His research topics include video and 3-D graphics processing and multimedia communications. He has published more than 150 technical papers in international journals and conferences.