Arbitrary Warped Document Image Restoration Based on Segmentation and Thin-Plate Splines

Yu Zhang (a), Changsong Liu (a), Xiaoqing Ding (a), Yanming Zou (b)
(a) Dept. of Electronic Engineering, Tsinghua University; State Key Laboratory of Intelligent Technology and Systems, Beijing, 100084, P. R. China
(b) System Research Center, Beijing, Nokia Research Center

Abstract
Warping is common in camera-captured document images and is the primary factor that makes such images hard to recognize. It is therefore necessary to restore a warped document image before recognition. In this paper, a novel restoration method is presented. The method first performs a rough line segmentation and character segmentation in order to estimate the warping direction. Several pairs of key points that map between the original image and the restored image are then determined, and Thin-Plate Splines (TPS), an interpolation algorithm, is used to restore the image. This process effectively describes the warping direction of the document and successfully restores the image. Experimental results show the effect of the restoration and compare the recognition rate before and after restoration using the same OCR application.

978-1-4244-2175-6/08/$25.00 ©2008 IEEE

1. Introduction
Image restoration is a necessary step before a camera-captured document image is recognized, since the document might not be flat and the image might contain distortions that greatly reduce the performance of an OCR application. Several approaches have been proposed to restore warped document images. Some of them use 3D information to help 2D restoration: they define a certain 3D model or acquire the real 3D image of the warping, and then restore the image. A cylindrical warping model is presented in [1], which assumes that the generatrix of a warped document is parallel to the image plane and that the bounding is cylindrical. However, this approach needs several input parameters and confines the restoration to a certain type of warping, making it hard to generalize. The approach in [2] uses the luminance of a document image to estimate the 3D surface shape of the document and then restores it. It requires a specific, regulated illumination condition and is therefore not practical for common camera-based OCR. If a device can acquire the actual 3D image of a warped document, arbitrary warping can certainly be restored, as [3] illustrates. However, this is only useful in certain fields such as ancient book recovery, not in camera-based OCR.

There are also many pure 2D approaches, which are more general: they do not confine the distortion type and restore images directly from 2D information. The approach in [4] divides a warped document image into many grids based on the estimated line and character directions, transforms each grid into a square, and puts the squares together to obtain a full restoration of the image. This method seems to over-divide the image and may face discontinuity problems when piecing the grids together. A recently introduced approach in [5] is segmentation based: it divides the grid according to word segmentation, estimates the slant angle of each grid from the segmentation result, and finally restores each word and puts the words together.

The approach proposed in this paper is a 2D method that can be applied in different distortion situations. The method first performs a rough line and character segmentation in order to estimate the baseline and word direction, which provide several pairs of key points between the original image and the destination image. It then uses the Thin-Plate Splines (TPS) algorithm [6] to calculate a global point-to-point mapping between the two images, so that the restored image can be obtained from those key points alone. TPS restoration dewarps the image while preserving the neighborhood relationships of the original image. The restored image is therefore more satisfactory and can tolerate some estimation error.
In Section 2, the approach is described in detail. Section 3 presents experimental results that demonstrate the validity of the approach. Finally, a conclusion is drawn in Section 4.
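
As a rough illustration of the key-point mapping idea outlined above, the following Python sketch fits a thin-plate spline to a few point pairs and evaluates it at other points. It uses the standard TPS kernel U(r) = r^2 log r and NumPy. The function names `fit_tps` and `apply_tps` and the sample coordinates are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def tps_kernel(r):
    """Standard TPS radial basis U(r) = r^2 * log(r), with U(0) = 0."""
    out = np.zeros_like(r, dtype=float)
    mask = r > 0
    out[mask] = r[mask] ** 2 * np.log(r[mask])
    return out


def fit_tps(src, dst):
    """Solve for TPS parameters mapping control points src -> dst.

    src, dst: arrays of shape (n, 2); src must not be collinear.
    Returns an (n + 3, 2) array: n kernel weights, then 3 affine terms.
    """
    n = src.shape[0]
    dist = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=2)
    K = tps_kernel(dist)
    P = np.hstack([np.ones((n, 1)), src])  # rows are [1, x, y]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    return np.linalg.solve(A, b)


def apply_tps(params, src, pts):
    """Evaluate the fitted spline at pts (shape (m, 2))."""
    n = src.shape[0]
    dist = np.linalg.norm(pts[:, None, :] - src[None, :, :], axis=2)
    U = tps_kernel(dist)
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return U @ params[:n] + P @ params[n:]


# Hypothetical key points: two curved text lines and their straightened targets.
src = np.array([[0, 0], [50, 5], [100, 0], [0, 40], [50, 45], [100, 40]], float)
dst = np.array([[0, 0], [50, 0], [100, 0], [0, 40], [50, 40], [100, 40]], float)
params = fit_tps(src, dst)
mapped = apply_tps(params, src, np.array([[25.0, 20.0], [75.0, 20.0]]))
```

The spline passes exactly through the given key points and varies smoothly in between. When resampling an image, one common choice (an assumption here, not something stated in the text) is to fit the map from restored coordinates back to original coordinates, so that every output pixel has a source location to sample.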

2. Document Image Restoration Approach
The proposed approach works on binary document images. A camera-captured document image is first binarized for the subsequent analysis. Since binarization is not the primary problem this paper aims to solve, it is not discussed here; all the following discussion is based on the condition that the images are already binarized. In addition, this paper concerns horizontal text, which is how both Arabic and Chinese characters mostly appear. Warped document images are therefore expected to have text lines whose main direction is horizontal.

2.1. Rough Line and Character Segmentation
In order to estimate the warping direction, it is important to obtain the baselines in a document image, and we use the segmentation result to fit the baseline of each line. Line segmentation is performed with a combination of projection and connected component (CC) analysis. Similarly, after line segmentation, the character segmentation step uses projection and CC analysis together with envelope analysis, line by line. This method is simple, and segmentation errors (especially character segmentation errors) may occur when the warping is relatively severe. However, the segmentation step is not meant to cut out each line and character accurately for recognition; it is meant to provide evidence for baseline fitting. We therefore do not require the segmentation to be accurate, only that it yields information reflecting the warping direction. As a result, a rough segmentation result is acquired for the further steps. A line and character segmentation result for part of a warped document image is shown in Fig. 1. The red curves in the figure are the estimated baselines, which reflect the line segmentation result, and the blue squares are the character segmentation result. There are cases in which two lines are judged to be one line or several characters are segmented as one, but there is still enough information to reflect the warping direction, and a satisfactory baseline fitting result can be obtained from the segmentation result.
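
The projection part of such a rough line segmentation can be sketched as follows. This is a minimal illustration under stated assumptions: the binary image is a list of rows with 1 for ink and 0 for background, and the function name `segment_lines` and the `min_ink` threshold are hypothetical. It covers only the horizontal projection profile; the CC analysis that the method combines with projection is not shown.

```python
def segment_lines(image, min_ink=1):
    """Split a binary image into text-line bands using a horizontal projection.

    image: list of rows, each a list of 0/1 values (1 = ink).
    Returns a list of (top, bottom) row indices, inclusive, for every run of
    consecutive rows whose ink count is at least min_ink.
    """
    profile = [sum(row) for row in image]  # ink pixels per row
    bands = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y  # a new band begins
        elif ink < min_ink and start is not None:
            bands.append((start, y - 1))  # the band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))
    return bands


# Tiny hypothetical page: two "text lines" separated by an empty row.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
bands = segment_lines(page)
```

A plain projection like this can merge neighboring lines when they curve into each other, which matches the kind of rough result that the text accepts.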

2.2. Baseline Fitting
The baseline is the most obvious evidence of document warping, and restoration can be viewed simply as straightening all baselines. Baseline detection is therefore the most essential step in the restoration.

Figure 1. Example of line and character segmentation and baseline fitting result.

In the previous step we obtained a rough segmentation of lines and characters. For each detected line, a baseline is fitted, shown as the red curves in Fig. 1. In this step, we first extract the bottom envelope of each character and find several neighborhood lowest points of the envelope as the fitting points. Commonly, these points are the lowest pixels of different characters. Baselines are then fitted using these points. Since the fitted baselines must be valid in arbitrary warping situations, polynomial fitting is not very suitable because the degree of the polynomial cannot be decided in advance. A neighborhood linear fitting is therefore more reasonable. In addition, in order to eliminate the influence of outlier points such as noise and character descenders, the fitting method should be robust. The RANSAC algorithm is commonly used to deal with outliers, yet the Least Median of Squares (LMS) algorithm [7] is reported in [8] to have slightly better performance. We therefore adopt LMS for neighborhood linear baseline fitting. The overall process of baseline fitting is as follows:

Step1 For each segmented line, extract the bottom envelope of every segmented character and then detect the neighborhood lowest points. For a character, its bottom envelope is denoted as:

y = g(x)    (1)
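
The envelope in Eq. (1) can be illustrated with a short Python sketch. The assumptions are labeled here and in the code: a character is a list of rows with 1 for ink, and y is measured upward from the bottom row of the character box, so a lower pixel has a smaller y. The passage does not state its axis convention, and the name `bottom_envelope` is hypothetical.

```python
def bottom_envelope(char_img):
    """Return {x: g(x)} for every column x of a binary character image that has ink.

    char_img: list of rows (top row first), each a list of 0/1 values (1 = ink).
    Assumption: y is measured upward from the bottom row of the box, so g(x)
    is the height of the lowest ink pixel in column x. Columns without ink
    are left out of the result.
    """
    height = len(char_img)
    width = len(char_img[0]) if height else 0
    envelope = {}
    for x in range(width):
        ink_rows = [row for row in range(height) if char_img[row][x]]
        if ink_rows:
            lowest_row = max(ink_rows)  # largest row index = visually lowest
            envelope[x] = height - 1 - lowest_row
    return envelope


# Hypothetical glyph whose right stroke reaches lower than the rest.
glyph = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
g = bottom_envelope(glyph)
```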

Correspondingly, neighborhood lowest points are denoted as: {(x, y)|y =

min

x−Δ