UNIVERSITY OF JOENSUU DEPARTMENT OF COMPUTER SCIENCE Report Series A

Lossless and near-lossless compression of line-drawing images using Hough transform Pasi Fränti, Eugene I. Ageenko, Heikki Kälviäinen and Saku Kukkonen Report A-1998-6

ACM I.4.2 UDK 681.3.06 ISSN 0789-7316 ISBN 951-708-652-0

Lossless and near-lossless compression of line-drawing images using Hough transform

Pasi Fränti, Eugene I. Ageenko
Department of Computer Science, University of Joensuu
P.O. Box 111, FIN-80101 Joensuu, FINLAND
Email: franti,[email protected]

Heikki Kälviäinen, Saku Kukkonen
Department of Information Technology, Lappeenranta University of Technology
P.O. Box 20, FIN-53851 Lappeenranta, FINLAND
Email: Heikki.Kalviainen,[email protected]

Abstract: Two novel methods for compressing bi-level line-drawing images are proposed. The main idea is the use of the Hough transform (HT). In the first phase, the HT is applied for extracting line segments from the image. In the second phase, the original image is compressed by a JBIG-based method. The first variant (lossless method) uses the reconstructed feature image for improving the prediction accuracy of the local context model. The compressed file consists of the extracted line features and the compressed raster image. The second variant (near-lossless method) applies feature-dependent filtering for removing noise from the original image. This improves the image quality and therefore results in better compression performance.

Key words: image compression, document images, engineering drawings, Hough transform.

1. Introduction

Lossless compression of bi-level images has been well studied in the literature and several standards already exist [1]. In the baseline JBIG the image is coded pixel by pixel in raster-scan order using a context-based probability model and arithmetic coding [2]. The combination of already coded neighboring pixels defines the context. In each context the probability distribution of the black and white pixels is adaptively determined. The current pixel is then coded by the QM-coder [3], the binary arithmetic coder adopted in JBIG.



The baseline JBIG achieves compression ratios from 10 to 50 for typical A4-size images. The pixelwise dependencies are well utilized and there is not much room for improvement. Substantial improvement has been achieved only by specializing in known image types and exploiting global dependencies. For example, the methods in [4, 5] include a pattern-matching technique to extract symbols from text images. The compressed file consists of bitmaps of the library symbols coded by a JBIG-style compressor, the locations of the extracted marks as offsets, and a pixelwise coding of the matched symbols using a two-layer context template.

Here we study a similar approach by utilizing global dependencies in line-drawing images such as engineering drawings, cartographic maps, architectural and urban plans, schemes, and circuits (radio electrical and topological). Images of this kind consist mainly of straight-line elements. Global information can be gathered by extracting line features from the image. We propose two novel compression methods based on the Hough transform (HT) [6, 7]. The methods consist of two separate phases. In the first phase (feature extraction), the Hough transform is applied for extracting line segments from the image. In the second phase (compression), the original image is compressed by a pixelwise context model based on the baseline JBIG. The methods differ from each other in the way the extracted line features are utilized in the compression.

The first variant (lossless method) uses the reconstructed feature image for improving the prediction accuracy of the local context model. The compressed file consists of the extracted line features and the compressed raster image. Better compression performance is achieved if the improvement outweighs the overhead required by the line segments. The reconstructed feature image may also be used as a lossy approximation of the original image. The entire compression method, however, is lossless since the corresponding input and output images are identical.

The second variant (near-lossless method) applies feature-dependent filtering for removing noise from the original image. The filtering smoothes edges along the detected line elements. This improves the image quality and gives better compression performance. The feature file is used only in the compression phase and is therefore not stored in the compressed file. The method is near-lossless because the amount of changes is controlled: only isolated noise pixels are reversed. Moreover, undetected objects (such as text characters) are left untouched, allowing their lossless reconstruction.

The rest of the paper is organized as follows. The Hough transform and its application for feature extraction are studied in Section 2. The lossless variant of the new compression method is introduced in Section 3, and the near-lossless variant in Section 4. Simulation results for a set of test images appear in Section 5. Conclusions are drawn in Section 6.



2. Feature extraction using Hough transform

The feature extraction procedure is summarized in Fig. 1. The motivation is to find straight lines of fixed length in the image. The extracted line segments are represented by their end-points. A feature image is then reconstructed from the line segments and utilized in the compression phase. In the lossless variant, the extracted line segments are also stored in the compressed file.

[Fig. 1 block diagram: Input Image → Hough Transform → line parameters → End-point detection → line segments → Reconstruction → Feature Image; optionally, the line segments are encoded into the Feature File.]

Fig. 1. Block diagram of the feature extraction.

2.1. Hough transform

The lines are first detected by the Hough transform (HT) as follows:

1. Create a set of coordinates from the black pixels in the image.
2. Transform each coordinate (x, y) into a parameterized curve in the parameter space.
3. Increment the cells in the parameter space determined by the parametric curve.
4. Detect local maxima in the accumulator array. Each local maximum may correspond to a parametric curve in the image space.
5. Extract the curve segments using the knowledge of the maximum positions.

The parameter space is a k × k accumulator array, where k can be tuned according to the image size, e.g. k = the size of the image. Usually the (ρ, θ) parameterization is used, but the (a, b) one, corresponding to y = a·x + b, is also possible. The accumulator array is quantized with equal intervals.
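As an illustration, accumulation steps 1-3 above can be sketched in a few lines of Python. This is our own sketch, assuming the (ρ, θ) parameterization and a binary image stored as a NumPy array; the function name and the mapping of ρ values to accumulator rows are our assumptions, not from the paper.

```python
import numpy as np

def hough_accumulate(image, k):
    """Vote accumulation for the (rho, theta) parameterization.
    image: 2-D boolean array, True = black pixel; k: accumulator size."""
    h, w = image.shape
    rho_max = np.hypot(h, w)                  # largest possible |rho|
    thetas = np.linspace(0.0, np.pi, k, endpoint=False)
    acc = np.zeros((k, k), dtype=np.int32)    # k x k accumulator array
    ys, xs = np.nonzero(image)                # coordinates of black pixels
    for x, y in zip(xs, ys):
        # each black pixel votes along the sinusoid rho = x cos(t) + y sin(t)
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        rows = ((rhos + rho_max) / (2 * rho_max) * (k - 1)).round().astype(int)
        acc[rows, np.arange(k)] += 1          # one vote per theta column
    return acc
```

A long straight line then shows up as a strong local maximum in `acc` (step 4), since all of its pixels vote for the same (ρ, θ) cell.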



2.2. End-point detection

The Hough transform can determine the location of a line (as a linear function) but it cannot resolve the end-points of the line. In fact, the HT does not even guarantee that any finite-length line exists in the image; it only indicates that the pixels (x, y) along y = a·x + b may represent a line. The existence of a line segment must therefore be verified. The verification is performed by scanning the pixels along the line and checking whether they meet certain criteria. We use the scanning width, the minimum number of pixels, and the maximum gap between pixels in a line as the selection criteria. If the predefined threshold values are met, a line segment is detected and its end-points are stored for later use.

2.3. Reconstruction of the feature image

A feature image of equal size is created from the extracted line segments to approximate the input image. The image is constructed by drawing one-pixel-wide straight lines between the end-points of the line features. The Hough transform does not determine the width of the lines; wider lines are instead represented by a bunch of collinear line segments, see Fig. 2. The line segments may also deviate from their original direction and/or have a one-pixel positional error because of the quantization of the accumulator array. Therefore we do not use the feature image directly but first process it by successive morphological dilation and closing operations [8]. These operations make the lines one pixel thicker in all directions (dilation) and fill gaps between the line segments (closing). We apply a symmetric 3×3 structuring element (Block) for the dilation, and a 3×3 cross structuring element (Cross) for the closing, see Fig. 3. The cross element is chosen to minimize the distortion in line intersections caused by the closing.
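The gap and minimum-length checks used in the end-point verification can be sketched as follows. This is a simplified, width-1 illustration with hypothetical threshold values; the paper's actual verification also accounts for the scanning width.

```python
def extract_segments(pixels_on_line, min_pixels=30, max_gap=3):
    """Scan black-pixel positions along a detected line (sorted 1-D
    offsets along the line), cut it into segments wherever the gap
    between consecutive pixels exceeds max_gap, and keep only segments
    containing at least min_pixels pixels. Returns (start, end) pairs."""
    segments = []
    start = prev = None
    count = 0
    for p in pixels_on_line:
        if start is None:
            start = prev = p
            count = 1
        elif p - prev <= max_gap:
            prev = p
            count += 1
        else:                                  # gap too long: close segment
            if count >= min_pixels:
                segments.append((start, prev))
            start = prev = p
            count = 1
    if start is not None and count >= min_pixels:
        segments.append((start, prev))
    return segments
```

The (start, end) offsets, mapped back to image coordinates, become the stored end-points of the segment.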

Fig. 2. Illustration of the feature image for an image sample of size 50×50 pixels (panels: original, HT-image, feature image).

Fig. 3. Structure elements Block (symmetric 3×3 square) and Cross (3×3 cross); the origin is the center pixel of each element.



2.4. Storing the line segments

The extracted line segments are stored as {(x1,y1), (x2,y2)}, representing the end-points of the line. A single coordinate value takes ⌈log₂ n⌉ bits, where n is the dimension of the image. For example, a line in an image of 4096×4096 pixels takes 4×12 = 48 bits in total. A somewhat more compact representation could be achieved by sorting the line segments according to their first coordinate x1. Instead of storing the absolute value, we could then store the difference between two subsequent x1's. Most of the differences are very small (about 40 % of them are in the range 0..2). An improvement of about 7 bits (from 12 to 5 bits) was estimated when entropy coding was applied to these difference values. In the present implementation this idea was not applied.
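The storage cost per segment follows directly from the coordinate range; a small sketch of the calculation (the function name is ours):

```python
import math

def segment_bits(image_dim):
    """Bits needed to store one line segment as two absolute end-points
    (x1, y1), (x2, y2) in an image of image_dim x image_dim pixels."""
    coord_bits = math.ceil(math.log2(image_dim))   # bits per coordinate
    return 4 * coord_bits                          # four coordinates per segment
```

For a 4096×4096 image this gives 4×12 = 48 bits, matching the example above.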

3. Lossless compression method

There are two basic approaches for utilizing the feature image: (1) lossless compression of the residual between the original and the feature image, or (2) compression of the original image using the feature image as side information. The first approach does not work in practice because taking the residual destroys the spatial dependencies near the borders of the extracted line features. The residual image is therefore no easier to compress than the original one. The effectiveness of the second approach, on the other hand, has been proven in practice in the case of text images [4, 5]. We thus adopt the same idea here for line-drawing images.

The lossless compression method is outlined in Fig. 4. The original image is compressed using the baseline JBIG, which uses previously coded neighboring pixels as the context. The context is determined by combining the neighboring pixel values into an index and accessing the model via a look-up table. Additional context pixels are taken from the feature image. An important point is that any pixel in the feature image can be utilized, even the one co-located with the pixel to be compressed. Here we use ten pixels from the original image, as in the three-line JBIG model, and five pixels from the feature image, see Fig. 5. The actual coding is performed by the QM-coder, the binary arithmetic coder of JBIG [3]. The line features are also stored in the compressed file.
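The two-level context formation can be sketched as a bit-packing operation. This is an illustration only: the actual pixel layout of the template is given in Fig. 5, and the function name is our assumption.

```python
def context_index(original_bits, feature_bits):
    """Combine ten already-coded pixels from the original image with
    five pixels from the feature image into one 15-bit context index
    (2^15 = 32768 distinct contexts)."""
    assert len(original_bits) == 10 and len(feature_bits) == 5
    idx = 0
    for b in original_bits + feature_bits:   # pack the bits MSB-first
        idx = (idx << 1) | (b & 1)
    return idx
```

The index selects one adaptive probability estimate in the model table; the QM-coder then codes the current pixel against that estimate.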

[Fig. 4 block diagram. Compression: Input Image → Feature extraction → Feature File; Feature File → Reconstruction → Feature Image → Context modelling → JBIG compression. Decompression: Feature File → Reconstruction → Feature Image → Context modelling → JBIG decompression → Output Image.]

Fig. 4. Block diagram of the lossless compression system.



[Fig. 5 template: context pixels taken from the original image and from the feature image, with the pixel to be coded marked.]

Fig. 5. Two-level context template. The location of the current pixel is marked by gray color.

4. Near-lossless compression method

The near-lossless compression method is outlined in Fig. 6. The image is preprocessed by feature-dependent filtering for improving the image quality. The filtering removes noise by restoring the line contours and therefore results in better compression performance. The line features are used only in the compression phase and therefore need not be stored in the compressed file. The filtered image is compressed by the baseline JBIG without any modifications. Decompression is exactly the same as in the baseline JBIG.

The filtering is based on a simple noise removal procedure, as shown in Fig. 7. A difference (mismatch) image between the original and the feature image is constructed. Isolated mismatch pixels (and groups of two mismatch pixels, defined by 8-connectivity) are detected and the corresponding pixels in the original image are reversed. This removes random noise and smoothes edges along the detected line segments. The method is near-lossless because the amount of changes is controlled: only isolated noise pixels are reversed. Undetected objects (such as text characters) are left untouched, allowing their lossless reconstruction.

The noise removal procedure is successful if the feature image is accurate. However, the HT feature extraction does not always provide the exact width of the lines. The noise removal procedure is therefore iterated three times, as shown in Fig. 8. The first stage applies the feature image as such, whereas the feature image is dilated in the second stage and eroded in the third stage before input into the noise removal procedure. This compensates for inaccuracies in the width detection. The stepwise process is illustrated in Fig. 9. Most of the noise is detected and removed in the first stage. However, the rightmost diagonal line in the feature image is too wide, and its upper contour is therefore filtered only in the third stage, where the feature image is eroded. The result of the entire filtering process is demonstrated in Fig. 10.
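The mismatch-based noise removal of Fig. 7 can be sketched as follows. This is our own minimal sketch, using a plain flood fill for the 8-connected grouping; the limit of two pixels per group follows the description above.

```python
import numpy as np

def noise_removal(original, feature, max_cluster=2):
    """Reverse the pixels of the original image that disagree with the
    feature image, but only where the mismatches form isolated groups
    of at most max_cluster pixels (8-connectivity)."""
    mismatch = original ^ feature                 # XOR mismatch image
    visited = np.zeros_like(mismatch)
    filtered = original.copy()
    h, w = mismatch.shape
    for sy, sx in zip(*np.nonzero(mismatch)):
        if visited[sy, sx]:
            continue
        # flood-fill one 8-connected group of mismatch pixels
        stack, group = [(sy, sx)], []
        visited[sy, sx] = True
        while stack:
            y, x = stack.pop()
            group.append((y, x))
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mismatch[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        stack.append((ny, nx))
        if len(group) <= max_cluster:             # isolated noise only
            for y, x in group:
                filtered[y, x] = not filtered[y, x]
    return filtered
```

Larger mismatch regions, such as text characters absent from the feature image, exceed `max_cluster` and are left untouched, which is what makes the scheme near-lossless.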



[Fig. 6 block diagram. Compression: Input Image → Feature extraction → Feature Image → Filtering → JBIG compression. Decompression: JBIG decompression → Output Image.]

Fig. 6. Block diagram of the near-lossless compression system.

[Fig. 7 block diagram (noise removal): Input Image and Feature Image → XOR → mismatch pixels → isolated pixel extraction → isolated mismatch pixels → XOR with Input Image → Output Image.]

Fig. 7. Block diagram of the noise removal procedure.

[Fig. 8 block diagram (three-stage filtering): noise removal with the Feature Image as such, then with the dilated Feature Image, then with the eroded Feature Image → Output Image.]

Fig. 8. Block diagram of the three-stage filtering procedure.



[Fig. 9 panels (columns): feature image, mismatch pixels, filtered pixels, filtered image; rows 1-3 are the three stages.]

Fig. 9. Illustration of the three-stage filtering procedure. The first row corresponds to the first stage, the second row to the stage where the feature image is dilated, and the last row to the stage where the feature image is eroded.

[Fig. 10 panels: input image, feature image, output image, filtered pixels.]

Fig. 10. Overall illustration of the feature-dependent filtering process.



5. Test results

The performance of the proposed compression methods was tested by compressing the test images shown in Fig. 11. Three different feature sets were constructed from each image with different amounts of line segments (sets 1, 2, and 3). The amount of lines was controlled by varying the parameters as shown in Table 1. The corresponding feature images of the image Bolt are shown in Fig. 12.

The results of the lossless compression appear in Table 2, which shows the sizes of the feature files and the compressed raster images. An improvement of about 1 to 10 % is obtained when compressing the raster image, and the improvement is greater when more line segments are extracted. The best overall results of the lossless method are nevertheless obtained using feature set 1 (fewest line segments). The amount of saving, however, is rather small and in all cases too small to compensate for the overhead required by the feature file.

In the case of near-lossless compression, the number of extracted line features does not affect the output file size because the features are not stored. It is therefore better to use as many features as can be reliably detected. In our case, set 3 (most line segments) gives the best results among the three tested sets.

The best results of the lossless (HT-JBIG) and near-lossless compression (HT-JBIGNL) are summarized in Table 3. The results of the baseline JBIG (JBIG) and a near-lossless variant of JBIG (JBIGNL) are also included. The near-lossless variant of JBIG is obtained by applying the same noise removal process of Fig. 7 but without any feature image. This straightforward filtering removes random noise but cannot correct the scanning noise near the borders of the line segments. The corresponding compression ratios of the four tested methods (JBIG, HT-JBIG, JBIGNL, HT-JBIGNL) are 31.6, 31.1, 32.0, and 36.1.

To sum up, the near-lossless variant (HT-JBIGNL) improves the compression performance by about 12 % compared to JBIG. At the same time, the quality of the decompressed images is visually the same as the original, since only isolated mismatch pixels were reversed. The quality is sometimes even better, because the reversed pixels are mainly random noise or scanning noise near the line segments. The lossless variant (HT-JBIG), on the other hand, gives about 1 % worse compression performance than JBIG and is therefore not recommended. The effect of the near-lossless variant of JBIG (JBIGNL) is also marginal: about 1 % improvement in total.
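The 12 % figure follows directly from the byte totals in Table 3:

```python
# Totals from Table 3 (bytes)
jbig_total = 38_246         # baseline JBIG, TOTAL row
ht_jbignl_total = 33_517    # near-lossless HT-JBIGNL, TOTAL row
saving = 1 - ht_jbignl_total / jbig_total
print(f"{100 * saving:.1f} % smaller")
```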

Table 1. Parameter setup for the line extraction: minimum length of line segment (l), maximum width of line segment (w), maximum length of gaps inside the line (g), accumulator threshold for accepting a segment as a line (t).

         l    w    g    t
Set 1  150    1    2   20
Set 2   70    1    2   20
Set 3   30    1    3   17

Fig. 11. Test images: Bolt (1765×1437), Module (1480×2053), Plus (2293×1787).

Fig. 12. Feature images of test image Bolt with 117, 289, and 752 line segments (sets 1, 2, and 3).

Table 2. Lossless compression results (in bytes) with variable amount of extracted line segments.

             BOLT                     MODULE                  PLUS
       Feat.  Raster           Feat.  Raster          Feat.  Raster
       file   image   Total    file   image   Total   file   image   Total
Set 1    702  12,598  13,300     120   7,647   7,767    114  17,629  17,743
Set 2  1,734  12,177  13,911     384   7,615   7,999    426  17,507  17,933
Set 3  4,512  11,549  16,061   1,788   7,354   9,142  2,532  17,283  19,815

Table 3. Summary of the lossless and near-lossless compression results (in bytes).

               LOSSLESS             NEAR-LOSSLESS
Image       JBIG    HT-JBIG      JBIGNL   HT-JBIGNL
BOLT      12,966     13,300      12,609      10,577
MODULE     7,671      7,767       7,652       6,525
PLUS      17,609     17,743      17,566      16,415
TOTAL     38,246     38,810      37,827      33,517



6. Conclusions

Two novel compression methods based on the Hough transform (HT) and the baseline JBIG were studied. The methods exploit global dependencies in line-drawing images by extracting line segments from the image. The extracted lines are represented by their end-points. A feature image is then reconstructed from the line segments and utilized in the compression phase.

The lossless variant uses the reconstructed feature image for improving the prediction accuracy of the local context model. The compressed file consists of the extracted line features and the compressed raster image. The raster image is compressed 1-10 % more efficiently when the feature image is used as side information. The improvement, however, is too small to compensate for the overhead required by the feature file. The main reason is that the information of the extracted line features lies mainly in all-black neighborhoods, inside the line segments. These pixels are already compressed well by JBIG, so only a small improvement can be achieved. On the contrary, most of the information (output bits) originates from the boundaries of the objects. These areas are not well predicted by the local modelling of JBIG, but global information could be useful there, especially if the input image is noisy. This emphasizes the importance of the exactness of the feature extraction.

The near-lossless variant applies feature-dependent filtering for removing noise from the original image. It improves the image quality and therefore results in better compression performance. No feature file is stored, and therefore the accuracy of the line features is more important than the amount of the extracted features. Overall, the method gives an improvement of about 12 % compared to the baseline JBIG. At the same time, the quality of the decompressed images is visually the same as (or even better than) the original, since only isolated mismatch pixels were reversed.

A drawback of the method is its high complexity: a straightforward implementation of the HT requires O(kn) time, where n is the image size and k×k is the size of the accumulator matrix. The decompression phase is identical to the baseline JBIG and therefore equally fast. The method is thus suitable for off-line applications (such as image archival) where the images are compressed only once but decompressed often. Moreover, the HT can be made faster by using the randomized Hough transform (RHT) [7]. Instead of processing individual pixels, the image is randomly sampled by selecting pairs of pixels. Each pair determines only one point in the parameter space. The sampling is repeated until an evident maximum emerges in the parameter space. The RHT reduces the size of the parameter array and decreases the computation time, since only a part of the pixels, possibly a small one, needs to be accumulated into the array. The compression performance when using the RHT is only about 1 % worse than when using the exact HT.
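The RHT sampling loop described above can be sketched as follows. This is an illustrative sketch only, not the implementation used in the experiments; the quantization step, vote threshold, and function names are our assumptions.

```python
import math
import random
from collections import Counter

def rht_detect(points, iterations=2000, quant=2.0, threshold=25, seed=0):
    """Randomized Hough transform sketch: sample pairs of distinct
    pixels, compute the (rho, theta) of the line through each pair,
    accumulate votes in a sparse counter, and report cells whose vote
    count reaches the threshold."""
    rng = random.Random(seed)
    acc = Counter()
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        # normal angle of the line through the two points
        theta = (math.atan2(y2 - y1, x2 - x1) + math.pi / 2) % math.pi
        rho = x1 * math.cos(theta) + y1 * math.sin(theta)
        # quantize (rho, theta) into a sparse accumulator cell
        cell = (round(rho / quant), round(math.degrees(theta) / quant))
        acc[cell] += 1
    return [cell for cell, votes in acc.items() if votes >= threshold]
```

Because each pair casts a single vote, the accumulator stays sparse and only a fraction of the pixels ever needs to be touched, which is where the speed-up over the exhaustive HT comes from.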

Acknowledgements The work of Pasi Fränti was supported by a grant from the Academy of Finland and the work of Eugene I. Ageenko by a grant from Center for International Mobility (CIMO).



References

1. R.B. Arps, T.K. Truong, "Comparison of international standards for lossless still image compression", Proceedings of the IEEE, 82, 889-899, June 1994.
2. JBIG, Progressive Bi-level Image Compression, ISO/IEC International Standard 11544, ITU-T Recommendation T.82, 1993.
3. W.B. Pennebaker, J.L. Mitchell, JPEG Still Image Data Compression Standard. Van Nostrand Reinhold, New York, 1993.
4. P.G. Howard, "Text image compression using soft pattern matching", The Computer Journal, 40, 146-156, 1997.
5. I.H. Witten, A. Moffat, T.C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images. Van Nostrand Reinhold, New York, 1994.
6. V.F. Leavers, "Survey: Which Hough Transform?", CVGIP: Image Understanding, 58 (2), 250-264, September 1993.
7. H. Kälviäinen, P. Hirvonen, L. Xu, E. Oja, "Probabilistic and non-probabilistic Hough transforms: overview and comparisons", Image and Vision Computing, 13 (4), 239-251, May 1995.
8. J. Serra, Image Analysis and Mathematical Morphology. Academic Press, London, 1982.