International Journal of Computer & Mathematical Sciences IJCMS ISSN 2347 – 8527 Volume 3, Issue 9 November 2014

Text Extraction from Document Images Using Fourier Transform Based Method

Vikas K. Yeotikar (1), Manish T. Wanjari (2), Mahendra P. Dhore (3)

(1), (2) Department of Computer Science, SSESA, Science College, Congress Nagar, Nagpur, (M. H.), India
(3) Department of Electronics & Computer Science, R.T.M. Nagpur University Campus, Nagpur, (M. H.), India

ABSTRACT
Text extraction in document images has been an important research area. Extraction of this information in the form of text involves detection, localization, tracking, extraction, enhancement, and recognition of the text from a given document image. A large number of techniques have been proposed to address this problem. The purpose of this paper is to review Fourier Transform based methods, namely Spatial Domain Filter, 2D Fourier Transform, 2D-Fast Fourier Transform, Image Enhancement and Edge Reinforcement, and to implement them on a set of document images.

Keywords
Fourier Transform, Document Image Analysis (DIA), DFT, FFT, Text Extraction, Text Detection, Text Localization, Text Enhancement.

1. INTRODUCTION
Achieving human functions like reading, recognizing and thinking through machines is an ancient dream. Much of our interaction with the environment requires recognition of "things" such as sounds, smells, and shapes in a scene (text characters, faces, flowers, plants), etc. Analysis of document images for information extraction has gained immense importance in the recent past. A wide variety of information that has conventionally been stored on paper is now being converted into electronic form for better storage and intelligent processing. This requires processing of documents using image analysis algorithms. Locating text image blocks and tables, and defining an appropriate algorithm for doing so, is the major challenge in document image analysis [1, 2].

In order to understand how different document image processing filters work, it is a good idea to begin by understanding what frequency has to do with document images. A document image is in essence a two-dimensional collection of discrete signals, and these signals have frequencies associated with them. For instance, if there is relatively little change in grayscale values as you scan across an image, the document image contains mostly low-frequency content. If there is wide variation in grayscale values across a document image, it contains more high-frequency content.

This may seem somewhat confusing, so let us think about it in more familiar terms. From signal processing, we know that any signal can be represented by a collection of sine waves of differing frequencies, magnitudes and phases. This transformation of a signal into its constituent sinusoids is known as the Fourier Transform [4]. The collection of sine waves can potentially be infinite if the signal is difficult to represent, but it is generally truncated at the point where adding more terms does not significantly improve the reconstruction of the original signal. In digital systems, we use a Fourier Transform designed in such a way that we can enter discrete input values, specify our sampling rate, and have the computer generate discrete outputs. This is known as the Discrete Fourier Transform, or DFT. MATLAB uses a fast algorithm for performing a DFT, called the Fast Fourier Transform, or FFT, whose MATLAB command is fft. The FFT can also be performed in two dimensions, using fft2 in MATLAB. This is very useful in image processing because we can then determine the frequency content of an image.

Picture a document image as a two-dimensional matrix of signals. If you plotted just one row, so that it showed the grayscale value stored within each pixel, you might end up with something that looks like a bar graph, with varying values in each pixel location. Each pixel value in this signal may appear to have no correlation to the next one. However, the Fourier Transform can determine which frequencies are present in the signal. Since the output of a Fourier Transform is complex, the frequency content is usually viewed by plotting its magnitude (absolute value).
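As a rough illustration of reading the frequency content of one image row, the sketch below uses Python with NumPy's `np.fft.fft` in place of MATLAB's fft. The row itself is a made-up signal (a constant level, a slow cosine, and a pixel-to-pixel alternation), not data from the paper.

```python
import numpy as np

# Hypothetical 64-pixel "row" of grayscale values (an assumption for
# illustration): a constant level, a slow variation that repeats 4 times
# across the row, and a fast alternation between neighbouring pixels.
n = 64
x = np.arange(n)
row = 0.5 + 0.25 * np.cos(2 * np.pi * 4 * x / n) + 0.1 * np.cos(np.pi * x)

spectrum = np.fft.fft(row)      # complex-valued DFT of the row
magnitude = np.abs(spectrum)    # the magnitude is what is usually plotted

# For real input the spectrum is symmetric, so bins 0..n/2 are enough.
half = magnitude[: n // 2 + 1]
dominant_bins = sorted(np.argsort(half)[-3:].tolist())
print(dominant_bins)
```

The three strongest bins correspond to the constant level (bin 0), the slow cosine (bin 4), and the pixel-to-pixel alternation (bin 32, the highest frequency a 64-sample row can represent).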

1.1 Document Image Analysis (DIA)
Over the last few decades, machine reading has been changing from a dream to reality. Optical Character Recognition (OCR) systems deal with the recognition of printed or handwritten characters. Methodically, character recognition is a subset of the pattern recognition area. However, it was character recognition that gave pattern recognition and image analysis the impetus to become mature fields of science. At present, reasonably efficient and inexpensive OCR packages are commercially available to recognize printed texts in languages such as English, Chinese, and Japanese. In contrast, only limited research effort has been made in the recognition of Indian and other oriental languages. While research in OCR for printed Roman script has reached a point of diminishing returns, OCR for handwriting and for printed non-Roman scripts continues to be a very active field. Apart from character recognition, recognizing the font of a printed document has not even been attempted on Indian language documents, while some successful studies have been made for English. Typographically, a font is a particular instantiation of a typeface design, often in a particular size, weight and style. On many occasions, printed documents contain words in various font faces and sizes. For Indian and many other oriental languages, OCR systems are not yet able to successfully recognize printed document images of varying scripts, quality, size, style and font [9].

1.2 Purpose of texture analysis:
• To identify different textured and non-textured regions in a document image.
• To classify/segment different texture regions in an image.
• To extract boundaries between major texture regions.
• To describe the texel unit.
• To infer 2-D and 3-D shape from texture [3].

A large number of approaches are available. In this paper we discuss some Fourier Transform methods.

1.3 Fourier Transform
The Fourier transform is a representation of an image as a sum of complex exponentials of varying magnitudes, frequencies, and phases. The Fourier transform plays a critical role in a broad range of image processing applications, including enhancement, analysis, restoration, and compression.

Some properties of the Fourier transform:
• Frequency response of linear filters
• Fast convolution
• Locating document image features [4]

Some methods based on the Fourier transform:
• Spatial Domain Filter
• 2D Fourier Transform
• 2D-Fast Fourier Transform
• Image Enhancement
• Edge Reinforcement

In this paper, we use some of these Fourier Transform methods to remove noise from a document image that contains text. We then apply one of the methods to the document image to extract the text.

2. APPLICATIONS OF THE FOURIER TRANSFORM
This paper presents a few of the many document image processing-related applications of the Fourier transform.

2.1. Filters
Image processing is based on filtering the content of images. Filtering is used to modify an image in some way. This could entail blurring, deblurring, locating certain features within an image, etc. Linear filtering is accomplished using convolution, which is discussed below. A filter, or convolution kernel as it is also known, is basically an algorithm for modifying a pixel value, given the original value of the pixel and the values of the pixels surrounding it. There are literally hundreds of types of filters used in image processing; we concentrate on several common ones.

The first filters we consider are low pass filters. These filters blur high frequency areas of images, which can be useful when attempting to remove unwanted noise from an image. However, these filters do not discriminate between noise and edges, so they tend to smooth out content that should not be smoothed out.

Median filters can be very useful for removing noise from images. A median filter is like an averaging filter in some ways. The averaging filter examines the pixel in question and its neighbors' pixel values and returns the mean of these values. The median filter looks at the same neighborhood of pixels but returns the median value. In this way noise can be removed while edges are not blurred as much, since the median filter is better at ignoring large discrepancies in pixel values.
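The difference between the averaging filter and the median filter can be seen on a single row of pixels. The following is a minimal sketch in Python with NumPy. The helper names (`sliding_windows`, `mean_filter`, `median_filter`), the window size of 3, and the test row are our own assumptions for illustration, not the filters used in the paper.

```python
import numpy as np

def sliding_windows(signal, size):
    """Return every length-`size` neighbourhood, replicating border pixels."""
    pad = size // 2
    padded = np.pad(signal, pad, mode="edge")
    return np.lib.stride_tricks.sliding_window_view(padded, size)

def mean_filter(signal, size=3):
    """Averaging filter: mean of each pixel's neighbourhood."""
    return sliding_windows(signal, size).mean(axis=1)

def median_filter(signal, size=3):
    """Median filter: median of each pixel's neighbourhood."""
    return np.median(sliding_windows(signal, size), axis=1)

# Hypothetical row: dark background (10), one noisy pixel (255),
# then a genuine edge to a bright region (200).
row = np.array([10, 10, 10, 255, 10, 10, 200, 200, 200, 200], dtype=float)

mean_out = mean_filter(row)
median_out = median_filter(row)
print(np.round(mean_out, 1))
print(median_out)
```

On this row the median filter removes the isolated noisy pixel and leaves the edge where it was, while the averaging filter spreads the noisy value over its neighbours and softens the edge.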

2.2. Frequency Response of Linear Filters
The Fourier transform of the impulse response of a linear filter gives the frequency response of the filter. The MATLAB function freqz2 computes and displays a filter's frequency response. The frequency response of the Gaussian convolution kernel shows that this filter passes low frequencies and attenuates high frequencies.

Fig. 1. Frequency Response of a Gaussian Filter

2.3. Fast Convolution
A key property of the Fourier transform is that the multiplication of two Fourier transforms corresponds to the convolution of the associated spatial functions. This property, together with the fast Fourier transform, forms the basis for a fast convolution algorithm.

Convolution is a linear filtering method commonly used in image processing. Algebraically, it is the same operation as multiplying two polynomials: if the pixel values of an image are treated as polynomial coefficients, then convolving two images produces the coefficients of the product polynomial. If the convolution kernel, or filter, is large, this can be a very tedious process involving many multiplication steps. However, the convolution theorem states that the convolution of two signals equals the inverse Fourier Transform of the product of their Fourier Transforms. The procedure is as follows, where A is an M-by-N matrix and B is a P-by-Q matrix:

• Create two matrices, A and B.
• Zero-pad A and B so that they are at least (M+P-1)-by-(N+Q-1). (Often A and B are zero-padded to a size that is a power of 2 because fft2 is fastest for these sizes.) The example pads the matrices to be 8-by-8.
• Compute the two-dimensional DFT of A and B using fft2, multiply the two DFTs together, and compute the inverse two-dimensional DFT of the result using ifft2.
• Extract the nonzero portion of the result and remove the imaginary part caused by roundoff error.

Table 1. Convolution result for the two test matrices

 8.0000   9.0000  15.0000   7.0000   6.0000
11.0000  17.0000  30.0000  19.0000  13.0000
15.0000  30.0000  45.0000  30.0000  15.0000
 7.0000  21.0000  30.0000  23.0000   9.0000
 4.0000  13.0000  15.0000  11.0000   2.0000
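The zero-pad, multiply, and invert procedure can be sketched in Python, with NumPy's `np.fft.fft2` and `np.fft.ifft2` standing in for MATLAB's fft2 and ifft2. The text does not list the two input matrices, so the values of `A` (a 3-by-3 magic square) and `B` (a 3-by-3 matrix of ones) below are our assumption. With these inputs the result agrees with the values printed in Table 1.

```python
import numpy as np

# Assumed test matrices (not given in the text): a 3x3 magic square
# and a 3x3 matrix of ones.
A = np.array([[8, 1, 6],
              [3, 5, 7],
              [4, 9, 2]], dtype=float)
B = np.ones((3, 3))

M, N = A.shape
P, Q = B.shape
out_shape = (M + P - 1, N + Q - 1)   # size of the full convolution: 5x5

# Zero-pad both matrices to 8x8 (a power of 2, at least 5x5) and transform.
# The `s` argument of fft2 performs the zero padding.
pad_shape = (8, 8)
FA = np.fft.fft2(A, s=pad_shape)
FB = np.fft.fft2(B, s=pad_shape)

# Multiply the two DFTs and invert.
C = np.fft.ifft2(FA * FB)

# Keep the nonzero 5x5 portion and drop the imaginary part left by roundoff.
C = np.real(C[: out_shape[0], : out_shape[1]])
print(np.round(C, 4))
```

Padding to at least (M+P-1)-by-(N+Q-1) matters because the DFT product computes a circular convolution. With enough padding, the wrapped-around terms fall on zeros and the result equals the ordinary (linear) convolution.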

2.4. Locating Document Image Features
The Fourier transform can also be used to perform correlation, which is closely related to convolution. Correlation can be used to locate features within a document image; in this context correlation is often called template matching [7].
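A minimal sketch of template matching by correlation in the Fourier domain is given below, in Python with NumPy. The tiny binary "page" and the X-shaped template are invented for illustration. The sketch uses plain (unnormalized) circular cross-correlation, which is a simplification: practical systems often normalize the correlation so that bright regions do not dominate.

```python
import numpy as np

# Hypothetical 3x3 binary template (an "X" shape) standing in for a character.
template = np.array([[1, 0, 1],
                     [0, 1, 0],
                     [1, 0, 1]], dtype=float)

# Hypothetical 16x16 binary page containing one copy of the template,
# with its top-left corner at row 5, column 9.
image = np.zeros((16, 16))
image[5:8, 9:12] = template

# Cross-correlation via the DFT: multiply the image spectrum by the complex
# conjugate of the zero-padded template spectrum, then invert.
F_image = np.fft.fft2(image)
F_template = np.fft.fft2(template, s=image.shape)
correlation = np.real(np.fft.ifft2(F_image * np.conj(F_template)))

# The strongest response marks the top-left corner of the best match.
peak = np.unravel_index(np.argmax(correlation), correlation.shape)
match_row, match_col = int(peak[0]), int(peak[1])
print(match_row, match_col)
```

The only difference from the fast convolution procedure is the complex conjugate on the template spectrum, which is why correlation is described as closely related to convolution.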

Fig. 2. a. Original document image from NIST database.
Fig. 2. b. Text extracted from the above document image.

3. DISCRETE FOURIER TRANSFORM
Working with the Fourier transform on a computer usually involves a form of the transform known as the discrete Fourier transform (DFT). A discrete transform is a transform whose input and output values are discrete samples. There are two principal reasons for using this form of the transform:
• The input and output of the DFT are both discrete, which makes it convenient for computer manipulations.
• There is a fast algorithm for computing the DFT known as the fast Fourier transform (FFT).

Visualizing the Discrete Fourier Transform

3.1 Construct a matrix f that represents a rectangular function f(m,n): f(m,n) is equal to 1 within the rectangular region and 0 elsewhere. Use a binary image to represent f(m,n).

Fig. a. Rectangular Region by DFT

3.2 Compute and visualize the 30-by-30 DFT.

Fig. b. DFT computed without padding

3.3 This plot differs from a plot of the continuous-frequency Fourier transform in two ways. First, the sampling of the Fourier transform is much coarser. Second, the zero-frequency coefficient is displayed in the upper left corner instead of the traditional location in the center.

Fig. c. Color DFT computed without padding

3.4 To obtain a finer sampling of the Fourier transform, add zero padding to f when computing its DFT. The zero padding and DFT computation can be performed in a single step.

Fig. d. DFT computed with Padding

3.5 The zero-frequency coefficient, however, is still displayed in the upper left corner rather than the center. You can fix this problem by using the function fftshift, which swaps the quadrants of F so that the zero-frequency coefficient is in the center [4, 5].
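The visualization steps can be sketched in Python, with NumPy's `np.fft.fft2` and `np.fft.fftshift` standing in for MATLAB's fft2 and fftshift. The text fixes only the 30-by-30 size. The position of the rectangle and the padded size of 256-by-256 below are our assumptions.

```python
import numpy as np

# Binary image: 1 inside a rectangle, 0 elsewhere.
# The rectangle's position and size are assumed for illustration.
f = np.zeros((30, 30))
f[4:24, 12:17] = 1.0

# DFT without padding: only 30x30 samples of the transform.
F = np.fft.fft2(f)

# Zero padding and the DFT in a single step: the `s` argument pads f
# with zeros to 256x256 (an assumed size), giving a finer sampling.
F_padded = np.fft.fft2(f, s=(256, 256))

# Move the zero-frequency coefficient from the upper-left corner to the centre.
F_centered = np.fft.fftshift(F_padded)

# Log-magnitude for display; log1p avoids taking the log of zero.
log_magnitude = np.log1p(np.abs(F_centered))

print(F.shape, F_padded.shape)
print(np.unravel_index(np.argmax(np.abs(F_padded)), F_padded.shape))
print(np.unravel_index(np.argmax(np.abs(F_centered)), F_centered.shape))
```

Before the shift, the largest coefficient (the zero-frequency term, equal to the sum of all pixels) sits at index (0, 0). After the shift it sits at the centre of the array.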


Fig. e. Log of the Fourier Transform of a Rectangular Function
Fig. 3. Discrete Fourier transform

4. IMPLEMENTED METHODS OF THE FOURIER TRANSFORM

4.1 2D Fourier Transform
The Fourier transform is a representation of an image as a sum of complex exponentials of varying magnitudes, frequencies, and phases. The Fourier transform plays a critical role in a broad range of document image processing tasks. If f(m,n) is a function of two discrete spatial variables m and n, then the two-dimensional Fourier transform of f(m,n) is defined by the relationship:

$$F(\omega_1, \omega_2) = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} f(m,n)\, e^{-j\omega_1 m}\, e^{-j\omega_2 n}$$

The variables ω1 and ω2 are frequency variables; their units are radians per sample. F(ω1,ω2) is often called the frequency-domain representation of f(m,n). F(ω1,ω2) is a complex-valued function that is periodic in both ω1 and ω2, with period 2π.

4.2 2D-Fast Fourier Transform
The 2-D FFT block computes the fast Fourier transform (FFT) of a two-dimensional M-by-N input matrix in two steps. First it computes the one-dimensional FFT along one dimension (row or column). Then it computes the FFT of the output of the first step along the other dimension (column or row). The dimensions of the input matrix, M and N, must be powers of two [8].

4.3 Spatial Domain Filter
Spatial domain filters are generally defined as small windows that are convolved with the original document image. In the spatial domain, the action of a filter can be seen by looking at the structure of the filter. The concept of filtering has its roots in the use of the Fourier transform for signal processing in the so-called frequency domain. The term spatial filtering refers to filtering operations that are performed directly on the pixels of an image [9].

4.4 Image Enhancement
Image enhancement refers to the accentuation, or sharpening, of image features such as boundaries or contrast in order to make a graphic display more useful for viewing and analysis. It includes gray level and contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudo-coloring, and so on. Images obtained from satellites and from conventional and digital cameras sometimes lack contrast and brightness because of the limitations of the imaging subsystems and the illumination conditions during capture. Images may also contain different types of noise. In image enhancement, the goal is to accentuate certain image features for subsequent analysis or for image display, which makes it useful in feature extraction, image analysis and image display. The enhancement process itself does not increase the inherent information content of the data; it simply emphasizes certain specified image characteristics [5].

4.5 Edge Reinforcement
The edge reinforcement method of [6] is a probabilistic approach based on Bayesian networks of two-dimensional (2-D) fields of variables. The network is composed of three nodes, each devoted to estimating a field of variables. The first node contains the available observations. The second node is associated with a coupled random field representing the estimates of the actual values of the observed data and of their discontinuities. At the third node, a field of variables is used to represent parameters describing the membership of a discontinuity in a group. The edge reinforcement problem is stated in terms of the minimization of local functionals, each associated with a different node and made up of terms that can be computed locally. It is shown that a distributed minimization is equivalent to the minimization of a global reinforcement criterion. Results concerning the reinforcement of straight lines in synthetic and real document images are reported, and applications to synthetic aperture radar (SAR) images are described [6].
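The two-step computation described for the 2D-Fast Fourier Transform, and its relation to the definition of F(ω1,ω2), can be checked numerically. The sketch below uses Python with NumPy. The random 8-by-16 input and the sampled frequency indices `k` and `l` are arbitrary choices made for illustration.

```python
import numpy as np

# Arbitrary M-by-N input with power-of-two dimensions (M = 8, N = 16).
rng = np.random.default_rng(0)
image = rng.random((8, 16))
M, N = image.shape

# Step 1: one-dimensional FFT along each row.
rows_transformed = np.fft.fft(image, axis=1)
# Step 2: one-dimensional FFT of that result along each column.
two_step = np.fft.fft(rows_transformed, axis=0)

# Direct two-dimensional FFT for comparison.
direct = np.fft.fft2(image)
print(np.allclose(two_step, direct))

# One coefficient evaluated straight from the definition, sampled at
# w1 = 2*pi*k/M and w2 = 2*pi*l/N (the image is zero outside its support,
# so the infinite sums reduce to sums over the M-by-N array).
k, l = 1, 2
m = np.arange(M).reshape(-1, 1)
n = np.arange(N).reshape(1, -1)
w1, w2 = 2 * np.pi * k / M, 2 * np.pi * l / N
by_definition = np.sum(image * np.exp(-1j * w1 * m) * np.exp(-1j * w2 * n))
print(np.isclose(by_definition, direct[k, l]))
```

The row-then-column order is not essential. Because the two exponentials in the definition separate, transforming columns first and rows second gives the same result.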

5. EXPERIMENTAL RESULT

Fig. 4. a. / Fig. 5. a. Original document image
Fig. 4 & 5 b. Inverse 2D FFTs performed on switched images
Fig. 4 & 5 c. Magnitude and phase of 2D FFTs
Fig. 4. d. / Fig. 5. d. Switched image
Fig. 4. 2D Fourier Transform

Fig. a. Original Image
Fig. b. Filter Image
Fig. 6. 2D-Fast Fourier Transform

5.1. Spatial Domain Filter (using Sobel)

Fig. a. Original Document Image
Fig. b. Filter Image
Fig. 7. Spatial Domain Filtering

Fig. b. Histogram image
Fig. c. Remove all values below 0.4
Fig. 8. Image Enhancement

Fig. a. Original Image
Fig. b. Edge reinforcement
Fig. 9. Edge Reinforcement

Figures 4 and 5 show how to perform a two-dimensional Fast Fourier Transform in MATLAB. Panel (a) displays the original document images. The document image files are imported as uint8, so they should be converted to double arrays before the FFTs are computed. The FFT of real, non-even data is complex, so the magnitude and phase of the 2D FFTs are displayed (panel (c)). The phase of the FFTs is somewhat hard to interpret visually and generally looks like noise. However, the phase holds a great deal of the information needed to reconstruct the document image. To demonstrate the role of the phase, we switched the magnitude and phase of the two images in the Fourier domain and then performed an inverse 2D FFT to view the results (panel (b)). The two mixed images are shown in Fig. 4 & 5 d.

In Figure 6, a spatial domain filter is created to obtain the horizontal and vertical edges of the document images. Spatial domain filters are generally defined as small windows that are convolved with the document image; in the spatial domain, the action of a filter can be seen by looking at the structure of the filter. In Figure 7, (a) is the original document image and (b) is the filtered image; the edges can be seen clearly after thresholding the filtered image into a binary image. In Figure 8, (a) shows the histogram of the document image, which is of great importance in image enhancement, and (b) shows the enhanced image, in which features such as boundaries or contrast are accentuated or sharpened to make the display more useful for viewing and analysis. Figure 9 shows edge reinforcement; edge detection is very useful for locating objects within images.
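The magnitude and phase switching experiment described for Figures 4 and 5 can be sketched as follows, in Python with NumPy rather than MATLAB. The function name `swap_magnitude_and_phase` and the two random uint8 arrays are hypothetical stand-ins. The paper's own images and code are not reproduced here.

```python
import numpy as np

def swap_magnitude_and_phase(img_a, img_b):
    """Exchange the Fourier magnitude and phase of two equally sized images."""
    # Convert from uint8 to floating point before the FFTs, as the text advises.
    Fa = np.fft.fft2(img_a.astype(float))
    Fb = np.fft.fft2(img_b.astype(float))

    mag_a, phase_a = np.abs(Fa), np.angle(Fa)
    mag_b, phase_b = np.abs(Fb), np.angle(Fb)

    # Recombine: magnitude of one image with the phase of the other,
    # then return to the spatial domain.
    mixed_1 = np.real(np.fft.ifft2(mag_a * np.exp(1j * phase_b)))
    mixed_2 = np.real(np.fft.ifft2(mag_b * np.exp(1j * phase_a)))
    return mixed_1, mixed_2

# Hypothetical stand-ins for two scanned document images.
rng = np.random.default_rng(1)
page_a = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
page_b = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)

mixed_1, mixed_2 = swap_magnitude_and_phase(page_a, page_b)
print(mixed_1.shape, mixed_2.shape)
```

As a sanity check on the decomposition, passing the same image twice recombines its own magnitude and phase and therefore returns the original image up to roundoff.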

6. CONCLUSION
Text image analysis is needed to enable a text information extraction system to be used for any type of document image. In this work, we have presented five methods for selecting and separating text elements from a document image and for removing noise from the document. The document image can be obtained with a scanner or a hand-held device such as a camera. The concept of filtering has its roots in the use of the Fourier transform for signal processing in the frequency domain. In edge reinforcement, edge detection is very useful for locating objects within text images and leads to a better displayed result. Although many researchers have already investigated text localization, text detection and tracking for images are still required for use in real applications, and more research is needed in this area.

REFERENCES
[1] Shuichi Tsujimoto and Haruo Asada. Major Components of a Complete Text Reading System (Invited Paper). Proceedings of the IEEE, Vol. 80, No. 7, pp. 1133-1149, July 1992.
[2] Gaurav Harit, Santanu Chaudhari, Gupta P., Vohra N., Joshi S. D. A Model Guided Document Image Analysis Scheme. Proceedings of IEEE, pp. 1137-1141, 2001.
[3] Haralick R. M. Statistical and Structural Approaches to Texture. Proceedings of the IEEE, 67:786-804, 1979 (also 1973, IEEE-T-SMC).
[4] Rafael C. Gonzalez, Richard E. Woods, and Steven L. Eddins. Digital Image Processing Using MATLAB.
[5] Gonzalez R. C., Woods R. E. Digital Image Processing. University of Tennessee & MedData Interactive.
[6] Ragazoni C. S. Group-membership reinforcement for straight edges based on Bayesian networks. IEEE Transactions on Image Processing, Vol. 7, Issue 9, Sep. 1998.
[7] Allier B., Emptoz H. Font Type Extraction and Character Prototyping Using Gabor Filters. Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR'03), pp. 799-803, 2003.
[8] Negishi H., Kato J., Hase H., Watanabe T. Character Extraction from Noisy Background for an Automatic Reference System. Dept. of Intellectual Information Systems Engineering, Toyama University, Japan, 2010.
[9] Ghadiyaram A., Agarwal A. & Rao C. R. An Investigation into Telugu Font and Character Recognition. Ph.D. Thesis, University of Hyderabad, April 2009.
