
LOW-COMPLEXITY SCALABLE DCT IMAGE COMPRESSION

René J. van der Vleuten, Richard P. Kleihorst, Christian Hentschel

Philips Research Laboratories, Eindhoven, The Netherlands

ABSTRACT

We have developed a scalable image compression scheme with a good performance-complexity trade-off. Like JPEG, it is based on the 8 × 8 block discrete cosine transform (DCT), but it uses no additional quantization or entropy coding (such as Huffman or arithmetic coding). Bit-rate or quality scalability is enabled by encoding the DCT coefficients bit plane by bit plane, starting at the most significant plane. The individual bit planes are efficiently encoded using simple rectangular zones. Our method offers about the same compression performance as JPEG, but at a lower complexity and with the additional feature of scaling the bit rate by simply truncating the generated bit string.

1. INTRODUCTION

The goal of scalable compression methods is to generate a bit string that can be truncated at any desired point, while maintaining the best possible quality (e.g., peak signal-to-noise ratio, PSNR) for the selected bit rate. The availability of such a scalable bit string considerably simplifies the system design by practically eliminating the need for a buffer control method when fitting the data to a given bit rate or memory size. In particular, the same single bit string simultaneously serves different channels with different bit rates, without the need to re-encode the original data. Thus, real-time adaptation to varying channel capacities (with application to the Internet or wireless communication channels) is greatly simplified. The disadvantage of the well-known scalable methods of [1, 2] is their complexity. It turns out, however, that complexity reductions are possible without major losses in performance. For example, the methods of [3, 4, 5] are based on the DCT instead of the wavelet transform, which reduces the complexity of the transform at the cost of a PSNR reduction of 0.6–1 dB [6]. A further complexity reduction for DCT-based scalable compression was achieved in [5] by not making use of trees (similarly, scalable wavelet transform coding without the use of trees was proposed in [7]).

After the elimination of trees, the complexity of the compression schemes is determined to a large extent by the entropy coding that is used: [5] applies arithmetic coding, whereas [7] uses adaptive Rice coding. We propose to take the complexity reduction one step further by using a DCT-based scheme without any entropy coding. This results in a scalable compression scheme with a very competitive complexity-performance trade-off. In particular, we will show in this paper that we can achieve the performance of the baseline JPEG scheme [8], at a lower complexity, while adding the scalability feature. The new scalable compression scheme is explained in Section 2 and its performance is evaluated in Section 3. Section 4 concludes the paper.
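To make this system-level benefit concrete, here is a minimal sketch (our own illustration, not part of the scheme itself; the helper name, image size, and rates are hypothetical):

```python
def fit_to_rate(embedded: bytes, n_pixels: int, target_bpp: float) -> bytes:
    """Truncate an embedded (scalable) code to a target bit rate.

    No buffer control loop and no re-encoding: a prefix of the string is,
    by construction of the embedded code, the best available representation
    for the smaller bit budget.
    """
    return embedded[: int(n_pixels * target_bpp) // 8]

# The same string, encoded once, serves several channels
# (hypothetical 512x512 image):
# low  = fit_to_rate(code, 512 * 512, 0.25)
# high = fit_to_rate(code, 512 * 512, 2.0)
```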

2. THE COMPRESSION SCHEME

As usual for DCT-based image compression, the image is divided into blocks of 8 × 8 pixels, which are independently transformed. The DCT coefficients are then represented in signed-magnitude binary form. To encode a whole image, we first encode the most significant magnitude bit plane of all DCT blocks; then the next lower bit plane is encoded for each block, and so on, until the least significant bit plane has been encoded. In this way, a truncatable bit string is generated, since the most significant information occurs first in the string, followed by subsequent refinement information.

In order to achieve compression, we must efficiently encode the DCT coefficient bit planes. To do so, we distinguish between two classes of coefficients: significant and insignificant. During the encoding of a certain bit plane, a significant coefficient is one whose magnitude had a one in any of the higher (already encoded) bit planes; an insignificant coefficient has all zeroes in the higher bit planes. We distinguish between these two classes because they have very different probability distributions: a bit in the current bit plane of a significant coefficient has about equal probability of being a zero or a one, so there is not much to be gained by trying to encode it more efficiently, whereas a bit of an insignificant coefficient is very likely to be a zero.
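The following minimal sketch (our own illustration, not the paper's implementation) shows this bit-plane view for one block, assuming its integer DCT coefficients are available as a numpy array:

```python
import numpy as np

def magnitude_bit_planes(coeffs: np.ndarray):
    """Walk the magnitude bit planes of one block, MSB plane first.

    `coeffs` holds the integer DCT coefficients of one 8x8 block; sign and
    magnitude are coded separately, so only magnitudes are scanned here.
    Yields (plane, bits, newly_significant) per plane; a coefficient becomes
    significant at the first plane where its magnitude has a 1.
    """
    mag = np.abs(coeffs)
    significant = np.zeros(coeffs.shape, dtype=bool)
    for p in range(int(mag.max()).bit_length() - 1, -1, -1):
        bits = (mag >> p) & 1
        newly = (bits == 1) & ~significant  # first 1 seen at this plane
        yield p, bits, newly
        significant |= newly
```

For a significant coefficient the current-plane bit is a refinement bit with roughly equal probability of being 0 or 1; for an insignificant coefficient it is almost always 0, which the zonal coding described next exploits.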


Furthermore, because of the properties of the DCT (and of typical images), the significant and insignificant coefficients tend to be clustered. This enables us to efficiently encode many "insignificant zeroes" by a zonal coding technique. Specifically, our encoding is based on the observation that the coefficients with larger magnitudes tend to be those with the lower horizontal or vertical frequencies. For this reason, JPEG [8], for example, uses a zig-zag scan order. The zig-zag scan order is signal independent and assumes the data is concentrated in the upper left triangular zone of the transformed block. Although this assumption is true on average, we have found that the data of individual DCT blocks often has a bias towards either the horizontal or the vertical direction. Therefore, a signal-dependent rectangular scan zone, also emanating from the upper left corner, produces a more efficient encoding of the coefficients [9].

We now give a detailed description of our scalable coding method. The DC coefficients (the coefficients in the upper left corner of each DCT block) are processed separately from the AC coefficients (all other coefficients). Specifically, the DC coefficients from all blocks are collected and put into the bit string before any AC coefficient data. There are several reasons for doing so. First, our experiments have shown that completely outputting the DC coefficients at the beginning results in a higher performance at low bit rates than outputting them bit plane by bit plane (together with the AC coefficients). Furthermore, the thumbnail "DC-coefficient image" can easily be extracted from the string for quickly scanning the image contents. Finally, the "DC-coefficient image" can be efficiently compressed using differential encoding (as in JPEG [8]) or even DCT coding [6]. In the compression scheme used in this paper, however, the DC coefficients are output directly to the string without further compression.

Figure 1 illustrates the procedure used to encode the AC coefficients. Initially, all AC coefficients are marked as insignificant. We then encode the subsequent bit planes, starting at the most significant plane. First, for each significant coefficient, the bit in the current plane is put in the string. (When there are no significant coefficients, which is initially the case, no bits are put in the string.) The bits for the significant coefficients are followed by a bit indicating whether any insignificant coefficients become significant at the current bit plane. If there are such coefficients, their positions and signs are transmitted. We repeat this procedure until all bit planes have been output to the string.

To encode the positions of the newly significant coefficients, we first output the dimensions of the rectangular scanning zone enclosing all of them, using three bits each for the highest row (RMAX) and column (CMAX) of the rectangle, as shown in Figure 2.

[Fig. 2. Example of the rectangular scanning zone where RMAX = 3 and CMAX = 4; the 8 × 8 bit-plane matrix of the original figure is not reproduced here.]

Then we output a single bit for each insignificant coefficient inside the rectangle, indicating whether it becomes significant; if it does, we also output its sign bit. Since we do not have to output position or sign bits for the coefficients that were already significant, the position encoding is very efficient.
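A sketch of this zone coding for one bit plane of one block, written from the description above (the function name and the list-of-bits output format are our own conventions; since a block is 8 × 8, RMAX and CMAX always fit in three bits):

```python
import numpy as np

def encode_plane_zone(bits, significant, signs, out):
    """Encode one magnitude bit plane of one 8x8 block into `out` (list of 0/1).

    bits        : 0/1 array, current magnitude bit plane
    significant : bool array, already-significant coefficients (updated in place)
    signs       : 0/1 array, sign bits of the coefficients
    """
    # 1. Refinement: one raw bit for every already-significant coefficient.
    out.extend(int(b) for b in bits[significant])

    # 2. Flag bit: does any insignificant coefficient become significant here?
    newly = (bits == 1) & ~significant
    rows, cols = np.nonzero(newly)
    if rows.size == 0:
        out.append(0)
        return
    out.append(1)

    # 3. Zone dimensions: highest row (RMAX) and column (CMAX) of a newly
    #    significant coefficient, three bits each; the rectangle is anchored
    #    at the upper left corner of the block.
    rmax, cmax = int(rows.max()), int(cols.max())
    out.extend((rmax >> k) & 1 for k in (2, 1, 0))
    out.extend((cmax >> k) & 1 for k in (2, 1, 0))

    # 4. One bit per *insignificant* coefficient inside the zone; a 1 is
    #    followed by the coefficient's sign bit. Already-significant
    #    coefficients are skipped (their bits went out in step 1).
    for r in range(rmax + 1):
        for c in range(cmax + 1):
            if significant[r, c]:
                continue
            if newly[r, c]:
                out.extend((1, int(signs[r, c])))
                significant[r, c] = True
            else:
                out.append(0)  # insignificant and stays so at this plane
```

Because already-significant coefficients are skipped inside the zone, every position bit carries genuinely new information.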

[Fig. 1. Visualization of the coding technique: for each bit plane, the bits for the significant coefficients are output first, followed by a flag bit; 0 means no newly significant coefficients were found for this bit plane, while 1 is followed by RMAX, CMAX, and the positions and sign bits of the newly significant coefficients. The procedure repeats for each bit plane.]
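Tying the pieces together, a driver loop matching the flow of Figure 1 might look as follows (again our own sketch, reusing the encode_plane_zone helper above; the fixed raster block order shown here is refined in Section 3):

```python
import numpy as np

def encode_ac_planes(blocks, out):
    """Encode the AC data of all blocks, most significant plane first.

    `blocks` is a list of integer coefficient arrays (one per 8x8 DCT block,
    with the DC coefficient handled separately, as in the text). Every block
    receives bit plane p before any block receives plane p-1, which is what
    makes the resulting string truncatable at any point.
    """
    mags = [np.abs(b) for b in blocks]
    signs = [(b < 0).astype(int) for b in blocks]
    sig = [np.zeros(b.shape, dtype=bool) for b in blocks]
    top = max(int(m.max()) for m in mags).bit_length()
    for p in range(top - 1, -1, -1):          # MSB plane of the image first
        for m, s, g in zip(mags, signs, sig):
            encode_plane_zone((m >> p) & 1, g, s, out)
```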

3. PERFORMANCE

We compare the performance of our new compression scheme to that of baseline JPEG [8], as implemented by the Independent JPEG Group (release 6b). Figure 3 shows the operational rate-distortion curves of the two schemes for the gray-scale Lena image. No perceptual weighting was used for either scheme, since flat quantization produces higher PSNRs than non-flat, perceptually weighted quantization. For rates around 0.5 bit per pixel, our method is slightly less efficient than JPEG, probably because we do not compress the DC coefficients at all, whereas JPEG applies quantization, differential coding, and entropy coding to them. In the range of 1 to 2.5 bit per pixel, the performances are virtually the same. For higher rates there is again a small performance difference; however, at these rates the decoded image is indistinguishable from the original (the edge of visibility of compression artefacts is typically at about 40–45 dB PSNR), so there is no perceivable difference between the two schemes. Interestingly, for (near-)lossless compression at approximately 5 bit per pixel, our scheme catches up with JPEG again.

[Fig. 3. Performances of the new method and baseline JPEG for the gray-scale Lena image: PSNR [dB] versus rate [bit per pixel]; the plot itself is not reproduced here.]

After examining the objective PSNR measure, we also evaluated the subjective perceptual image quality and found that it depends on the order in which the DCT blocks are processed. Initially, we processed the DCT blocks in the usual scan order (left to right and top to bottom). At lower rates, however, this gives rise to annoying artefacts, because of a clearly visible quality difference between the blocks "above" and "below" the truncation point. This is explained by the fact that we add a complete bit plane at a time to each block: when the bit string is truncated, the blocks scanned before the truncation point will have received one bit plane more than the remaining blocks. We therefore investigated different, non-linear block scan orders and found that first scanning the DCT blocks in the middle of the image and "spiraling" out towards the edges results in a higher perceived quality for many images, in particular those with a single object located in the center.

We also found that the perceived quality can be further improved by adapting the scan order to each individual image. The adaptation we use is based on the fact that, as the bit rate is lowered, blocking artefacts first become visible in low-contrast, low-texture areas. As a contrast measure, we use the total number of significant coefficients of a block. This information can easily be computed from the bit stream by the decoder, so no additional bits have to be sent (i.e., the method is backward adaptive). In our implementation, both encoder and decoder re-adjust their scanning orders, according to the block contrast, at the start of each new coefficient bit plane. With the adaptive scanning order, the perceptual image quality of our scheme is approximately equal to that of JPEG.
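A sketch of this backward-adaptive ordering (our own formulation; the paper does not spell out the exact sort rule, so scanning the blocks with the fewest significant coefficients first is an assumption consistent with protecting low-contrast areas):

```python
def adaptive_block_order(significance_maps, previous_order):
    """Re-derive the block scan order before each new bit plane.

    `significance_maps` is the per-block significance state after the
    previous plane; encoder and decoder both have it, so they compute the
    identical order without any side information (backward adaptive).
    Assumption: blocks with fewer significant coefficients (low contrast)
    are scanned first. Python's stable sort keeps the initial center-out
    'spiral' order among blocks with equal counts.
    """
    counts = {i: int(significance_maps[i].sum()) for i in previous_order}
    return sorted(previous_order, key=counts.__getitem__)
```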

Since the performance of our scalable coding method is about the same as that of baseline JPEG, we also compared its complexity with that of baseline JPEG [10]. Hardware area figures for the most important blocks, as well as the total area of the implementation, are presented in Table 1. The total silicon area of the scalable DCT block coder is 0.69 mm² in a 0.35 µm CMOS process, half the area of a corresponding JPEG encoder with the same throughput.

Scalable module          Area [mm²]
1-D DCT                  0.40
DCT mem.                 0.12
64×3 shift reg.          0.06
Miscellaneous            0.11
Total                    0.69

JPEG module                    Area [mm²]
1-D DCT                        0.40
DCT mem.                       0.12
ZZ scan mem.                   0.10
RLE, VLE, and Huffman table    0.26
Quantizer                      0.35
Control                        0.20
Total                          1.43

Table 1. Silicon areas of the scalable and baseline JPEG encoders for a 0.35 µm CMOS process.

4. CONCLUSION

We have developed a new scalable DCT-based image compression method. It efficiently encodes the DCT coefficient magnitude bit planes by an adaptive rectangular zoning technique, and the decoded image quality is further improved by adapting the DCT block scanning order to the image content. The method has about the same compression performance as non-scalable baseline JPEG at a lower complexity, since it uses neither complex entropy coding nor quantization.

5. REFERENCES

[1] J. M. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients,” IEEE Trans. Signal Processing, vol. 41, pp. 3445–3462, Dec. 1993.

[2] A. Said and W. A. Pearlman, “A new, fast, and efficient image codec based on set partitioning in hierarchical trees,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp. 243–250, June 1996.
[3] Z. Xiong, O. G. Guleryuz, and M. T. Orchard, “A DCT-based embedded image coder,” IEEE Signal Processing Letters, vol. 3, pp. 289–290, Nov. 1996.
[4] Y.-A. Jeong and C.-K. Cheong, “A DCT-based embedded image coder using wavelet structure of DCT for very low bit rate video codec,” IEEE Trans. Consumer Electron., vol. 44, pp. 500–508, Aug. 1998.
[5] D. Nister and C. Christopoulos, “An embedded DCT-based still image coding algorithm,” IEEE Signal Processing Letters, vol. 5, pp. 135–137, June 1998.
[6] Z. Xiong, K. Ramchandran, M. T. Orchard, and Y.-Q. Zhang, “A comparative study of DCT- and wavelet-based image coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, pp. 692–695, Aug. 1999.
[7] H. S. Malvar, “Fast progressive wavelet coding,” in Proc. Data Compression Conference (DCC ’99), Snowbird, UT, Mar. 29–31, 1999, pp. 336–343.
[8] W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, New York, NY, 1993.
[9] R. P. Kleihorst and R. J. van der Vleuten, “DCT-domain embedded memory compression for hybrid video coders,” J. VLSI Signal Processing, vol. 24, pp. 31–41, Feb. 2000.
[10] R. J. van der Vleuten and R. P. Kleihorst, “Low-complexity scalable image compression,” in Proc. Data Compression Conference (DCC 2000), Snowbird, UT, Mar. 28–30, 2000, pp. 23–32.