Pattern Recognition Letters 30 (2009) 1241–1252

Contents lists available at ScienceDirect

Pattern Recognition Letters journal homepage: www.elsevier.com/locate/patrec

An improved Hough transform voting scheme utilizing surround suppression

Siyu Guo a,b,*, Tony Pridmore b, Yaguang Kong c, Xufang Zhang d

a College of Electrical and Information Engineering, Hunan University, Changsha 410082, PR China
b School of Computer Science, University of Nottingham, Jubilee Campus, Wollaton Road, Nottingham NG8 1BB, United Kingdom
c School of Automation, Hangzhou Dianzi University, Xiasha Higher Education Park, Hangzhou 310018, PR China
d Women's Hospital School of Medicine, Zhejiang University, Xueshi Road, Hangzhou 310006, PR China

ARTICLE INFO

Article history:
Received 27 February 2008
Received in revised form 14 April 2009
Available online 18 May 2009
Communicated by P. Bhattacharya

Keywords:
Line detection
Hough transform
Surround suppression

ABSTRACT

The Hough transform has been a frequently used method for detecting lines in images. However, when the Hough transform and derived algorithms using the standard Hough voting scheme are applied to real-world images, they often suffer considerable degradation in performance, especially in detection rate, because of the large number of edges produced by complex background or texture. These edges are very likely to form false peaks in Hough space and thus produce false positives in the final results, or even to suppress true peaks and cause missing lines. To reduce the impact of such texture-region edges, a novel method utilizing surround suppression is proposed in this paper. By introducing a measure of isotropic surround suppression, the new algorithm treats edge pixels differently, giving small weights to edges in texture regions and large weights to edges on strong and clear boundaries, and uses these weights to accumulate votes in Hough space. In this way, false peaks formed by texture-region edges are suppressed, and the quality of the detection results is improved. An efficient method for computing the isotropic surround suppression is also given, accelerating the proposed algorithm. Experimental results on a real-world image base show that the new method improves the line detection rate significantly compared with the standard Hough transform and with the Hough transform using gradient direction information to guide the voting process. Though slower than the other two methods, the new algorithm can be preferable in applications where detection rate is of the most concern and where there is no very strict requirement for high-speed performance.

© 2009 Elsevier B.V. All rights reserved.

Abbreviations: HT, Hough transform/standard Hough transform; SSHT, surround suppression Hough transform.
* Corresponding author. Address: College of Electrical and Information Engineering, Hunan University, Changsha 410082, PR China. Tel.: +86(0) 731 863 3775; fax: +86(0) 731 882 2224.
E-mail addresses: [email protected] (S. Guo), [email protected] (T. Pridmore), [email protected] (Y. Kong), [email protected] (X. Zhang).
0167-8655/$ - see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2009.05.003

1. Introduction

The Hough transform (HT) (Hough, 1960) is widely regarded as one of the classic techniques in image analysis and computer vision, and has long been used to detect and locate geometric features such as lines and curves (Duda and Hart, 1972) in images. The Hough transform finds constructs that can be represented by a small number of parameters, accumulating evidence for possible parameter values in a discrete representation of a continuous space in which each axis represents a single numerical parameter. Features extracted from the input image are mapped into this parameter space, generating votes for parameter sets representing constructs upon which they might lie. Constructs with a large number of votes, identified by searching for significant local maxima in the array of accumulators making up the discretised parameter space, are considered to be present in the image.

Many variations on Hough's original transform have been proposed, and excellent reviews are provided by Illingworth and Kittler (1988) and Leavers (1993). The constrained Hough transform (Olsen, 1999) and the optimizing line finder of Palmer et al. (1997) focus on reducing peak localization error. Li et al.'s Fast Hough transform (Li et al., 1986) addresses issues of efficiency, significantly reducing the amount of computation and storage required to implement the transform and making its application to more complex objects feasible. The Generalised Hough Transform of Ballard (1981) replaces the formal parameterisation of the target object with a look-up table, allowing the Hough approach to be used to detect arbitrary shapes, while Stephens (1990) presents a probabilistic interpretation of the Hough approach. By using pixel neighbourhood information, Kälviäinen et al. (1995) proposed several variants of the RHT, such as the Dynamic RHT and the Windowed RHT, aiming to achieve high detection speed with low memory cost. Bandera et al. (2006) combined Kälviäinen et al.'s RWRHT and the variable band-width mean shift algorithm for line


segment detection, using the former to construct the Hough space and the latter to extract base lines from the Hough space for further detection. Furukawa and Shinagawa also made a detailed analysis of the vote distribution around peaks in Hough space, and proposed a line segment extraction method (Furukawa and Shinagawa, 2003). Chung et al. (2004) employed an affine transformation to improve the memory usage efficiency of the Hough space for line detection using slope–intercept parameters; computational benefits are also gained through the transformation. Cha et al. (2006) extended the Hough space for line detection to a 3D array rather than the conventional 2D accumulator matrix, using the extra dimension to hold the image coordinate information, either the x or y coordinates, of the voting pixels. With this extension, a feature extraction method was proposed to make finding line segments easier. A statistical analysis of the accumulation process of Hough voting for line detection was carried out in (Matas et al., 2000), and the Progressive Probabilistic Hough transform was proposed based on the analysis. A study of more formal statistical properties of the Hough transform, such as the consistency of the estimator and the rate of convergence, was reported by Dattner (2009). The Hough transform has been successfully used in many applications; a Hough transform based line cluster detection method for localizing culture rows by Leemans and Destain (2006) and rock mass discontinuity detection and analysis by Deb et al. (2008) are some recent examples. Different implementation approaches to the Hough transform have also been reported, for instance a real-time architecture for the Hough transform that can easily be fitted into FPGA devices (Karabernou and Terranti, 2005) and neural networks inspired by the idea of the Hough transform, or Hough transform neural networks (Basak and Das, 2003).
The mechanism by which image features cast votes in the parameter space is a key component of the Hough transform, and a variety of schemes have been proposed. In the standard Hough voting scheme, every data point votes for every parameter set to which it might contribute. This raises the computational cost of the transform, often to unacceptable levels. A number of probabilistic Hough transforms (Walsh and Raftery, 2002) have been proposed to address this problem. Rather than considering every data point, probabilistic Hough methods apply a selection step, with only the selected data items contributing to the Hough accumulator array. The simplest probabilistic Hough transform is the randomised Hough transform (RHT) of Xu and Oja (1993). Here, n items are selected randomly from the input data and used to solve for the n parameters defining the Hough space. Only the accumulator containing the hypothesized parameter set is incremented. A number of other n-to-1 voting schemes have also been proposed, e.g. by Kimura and Watanabe (2002). More recently, Fernandes and Oliveira (2008) proposed an efficient scheme in which data points are first clustered into near-collinear segments; each cluster then casts a set of votes using an elliptical Gaussian kernel.

Though promising and theoretically plausible, the standard Hough transform, along with derived methods adopting the same voting scheme, often gives performance, especially detection rates, considerably below expectations when applied to real-world images. Analysis shows that one major reason for this deterioration is the presence of large numbers of pixels on non-linear edges. In this paper we call such pixels noise pixels, or simply noise, for the disturbance they cause to the line detection task at hand, though they should be distinguished from stochastic imaging noise in the usual sense.
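The standard voting scheme described above (every edge point votes in every θ bin, at the ρ it implies) can be sketched in a few lines. This is a minimal, hypothetical NumPy illustration with bin choices of our own (1° angle bins, 1-pixel distance bins), not the implementation used in any of the cited papers:

```python
import numpy as np

def standard_hough(edge_points, img_shape, n_theta=180):
    """Standard rho-theta Hough accumulation: each edge point casts one
    vote per theta bin, at the rho bin it implies."""
    h, w = img_shape
    rho_max = int(np.ceil(np.hypot(h, w)))        # largest possible |rho|
    thetas = np.deg2rad(np.arange(n_theta))       # 1-degree bins, 0..179
    acc = np.zeros((n_theta, 2 * rho_max + 1), dtype=np.int64)
    for x, y in edge_points:
        rhos = np.rint(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[np.arange(n_theta), rhos + rho_max] += 1  # offset so negative rho fits
    return acc
```

Collinear points pile their votes into a single (θ, ρ) bin, which is exactly why many near-collinear texture edges can build the false peaks discussed below.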
The first of these disturbances is that there may be sufficiently many collinear points among the noise pixels to form false peaks in the Hough space, giving perceptually incorrect detection results. Second, under certain peak detection settings, a false peak in the Hough space may be strong enough to

suppress a nearby true peak, leading to a missing line in the results or, if the two peaks are within an acceptable distance, to a higher error in the detected parameters of a real line. Furthermore, if we consider the ranks of true peaks in the list of all reported peaks sorted by descending number of votes, the presence of many strong false peaks will very likely push some true peaks downwards in the list. This implies that if, for example, a post-processing step is employed to pick the true lines out of the coarse result given by the HT, it will have to examine more peaks to find the same number of true lines. We can thus regard the ranking of true peaks among the obtained peaks as a quality measure of results given by the HT, and it is safe to say that the presence of noise edge pixels in large numbers, which is often unavoidable in real-world images, decreases the quality of results obtained by the HT. It is thus critical to eliminate, or at least reduce, the impact of noise edges.

From observation, it can be concluded that in typical real-world images a significant portion of noise edges come from complex backgrounds or texture regions, such as lawns, foliage, and the bricks forming the walls of buildings. On the one hand, these regions produce a large number of noise edges which degrade the quality of line detection results. On the other hand, the detailed edges within complex background or texture regions usually do not have a high level of perceptual importance when separating key objects from the background or from each other, e.g., when picking out the buildings of interest from the surrounding plantations or other artificial objects. It is possible that some texture regions do contain real lines, for instance the lines produced by the bricks forming a wall, but even so, these lines should be regarded as components of the texture rather than distinct features defining the object.
It is therefore reasonable and promising to suppress the impact of edge pixels produced by complex background and texture regions, in the hope of raising the quality of the peaks detected in Hough space. This point is also backed up by results from neurophysiology, which show that the existence of a complex surround decreases the perceptual importance of the point under consideration in the human visual system (Knierim and van Essen, 1992; Jones et al., 2001). We therefore propose a novel method utilizing surround suppression (Grigorescu et al., 2003, 2004) at edge pixels, which gives each edge a weight according to the strength of surround suppression at its position during its voting into Hough space. By assigning lower weights to edges with strong surround suppression, the votes of edges in complex backgrounds and texture regions become less significant, so the peaks formed by these edges are lowered or even demolished, while the peaks accumulated from edges on clear boundaries between different objects, for instance buildings and sky, or roads and lawn, are maintained or, if decreased, less affected by surround suppression. The results thus obtained place more emphasis on the latter peaks, and a rise in result quality can be expected.

It should be noted that our aim is to use a measure describing the complexity of the edge distribution in the neighbourhood of a pixel, rather than to discriminate different types of textures for classification or segmentation. The surround suppression used in this paper is chosen mainly for its simplicity and its intuitive way of describing the chaos of edge pixels in the regions whose impact is to be eliminated. A number of texture descriptors can be found in (Sonka et al., 1993), but owing to their expensive computational requirements they are not adopted in our work.

The rest of the paper is arranged as follows: In Section 2, we introduce the novel method.
Specifically, in Section 2.1 we introduce the isotropic surround suppression to be used in our algorithm; Section 2.2 gives the proposed vote weighting index; an efficient computation method of surround suppression is provided


in Section 2.3; and in Section 2.4 the main steps of the novel method are given. Section 3 presents the results of experiments on an image base, namely descriptions of the experimental setting and preparation (Section 3.1), the definition of matching used to quantitatively measure the quality of detected lines (Section 3.2), comparisons of detection rate and accuracy (Section 3.3), running speed performance (Section 3.4), and the roles of the parameters of the proposed algorithm (Section 3.5). Finally, in Section 4, conclusions are given.

2. The proposed method

2.1. Isotropic surround suppression

In our novel voting scheme, we use the idea of surround suppression proposed by Grigorescu et al. (2004), where two kinds of surround suppression were given, namely anisotropic surround suppression and isotropic surround suppression. In our study, general complex background and texture regions are of concern. Since general a priori models of the anisotropy of these regions are, so far as we know, not available, we adopt the isotropic surround suppression, as given below. Consider the following difference of two 2D Gaussians:

\mathrm{DoG}_\sigma(x, y) = g_{4\sigma}(x, y) - g_\sigma(x, y) = \frac{1}{2\pi(4\sigma)^2}\exp\left(-\frac{x^2 + y^2}{2(4\sigma)^2}\right) - \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right). \quad (1)

A weighting function ω_σ(x, y) is derived from the DoG_σ as

\omega_\sigma(x, y) = \frac{S[\mathrm{DoG}_\sigma(x, y)]}{\|S(\mathrm{DoG}_\sigma)\|_1}, \quad (2)

where S(x) = x for x ≥ 0 and 0 for x < 0, and \|\cdot\|_1 is the L1 norm. The weighting function ω_σ with σ = 1 is shown in Fig. 1. In practice, the contribution of points whose distance from the origin is larger than 8σ is negligible, and in this case the weighting function in fact gives a weighted ring-shaped neighbourhood centred at the pixel under consideration. When this 8σ-radius region is adopted, the whole region size is (2 × 8σ + 1) × (2 × 8σ + 1) = (16σ + 1) × (16σ + 1).

In (Grigorescu et al., 2004), an edge image given by the gradient of the convolution of the original intensity image I and the 2D Gaussian function g_σ was used. The edge magnitude E is given as

E(x, y) = \sqrt{[\nabla_x I_\sigma(x, y)]^2 + [\nabla_y I_\sigma(x, y)]^2}, \quad (3)

where

\nabla_x I_\sigma(x, y) = \frac{\partial (I * g_\sigma)(x, y)}{\partial x} = \left(I * \frac{\partial g_\sigma}{\partial x}\right)(x, y) \quad (4)

and

\nabla_y I_\sigma(x, y) = \frac{\partial (I * g_\sigma)(x, y)}{\partial y} = \left(I * \frac{\partial g_\sigma}{\partial y}\right)(x, y) \quad (5)

are the gradient components along the x- and y-axes. It is noteworthy that, in order to measure the isotropic surround suppression effect, any approximation to the image gradient can be used, e.g., edge images obtained by applying the Sobel or Prewitt operators (Sonka et al., 1993). From the perspective of computational efficiency, it is natural and desirable to use the edge magnitude obtained as an intermediate result during edge detection with a given operator, so that no extra computation is needed.

Once the magnitude E of an edge image is obtained, the isotropic surround suppression r_σ at any given pixel (x, y) is calculated through the convolution of E and the weighting function ω_σ at (x, y), i.e.,

r_\sigma(x, y) = \iint_\Omega E(x - u, y - v)\, \omega_\sigma(u, v)\, \mathrm{d}u\, \mathrm{d}v, \quad (6)

where Ω is the image coordinate domain. The relative strength of the suppression of surround edges on the edge at (x, y) is measured by

w(x, y) = \arctan \frac{r_\sigma(x, y)}{E(x, y)}. \quad (7)

w is called the suppression slope, which depends on the complexity of the patterns surrounding the point concerned. The larger w(x, y) is, the stronger the suppression of the surrounding edges on pixel (x, y).

2.2. Vote weighting based on surround suppression

The above suppression slope can be used as an index measuring the complexity of the surround centred at any specific pixel. Following research results in neurophysiology as well as our own observations of real-world images, it was pointed out in the Introduction that edge pixels located in texture regions make a less significant contribution to the human perceptual recognition of object contours in images. Deriving from this idea, we modify the standard Hough voting scheme in such a way that for a pixel with strong surround suppression, its vote for the corresponding parameter bins in Hough space has a small weight, while for a pixel located in a clear neighbourhood, and thus more likely to be part of salient and well-defined boundaries between heterogeneous objects in the images, a large weight should be assigned to its vote. In other words, we would like the vote cast by each pixel to be a monotonically decreasing function of its surround suppression. Note that the suppression slope can actually be regarded as an angle between 0° and 90°. Since we are expecting a monotonically decreasing weighting function for the votes, it is natural to use the cosine function as the weighting function, i.e.,

\nu(x, y) = \cos w(x, y). \quad (8)

However, in order to make the effect of surround suppression on the vote weight controllable, we introduce a surround suppression strength factor α, and the actual weighting function becomes

\nu(x, y) = \cos^\alpha w(x, y). \quad (9)
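Eqs. (7) and (9) amount to a couple of lines of code. The following is a hypothetical NumPy sketch (function names are ours); α = 10 is the strength factor used in the experiments reported later, and arctan2 conveniently yields w = 90° (weight 0) where E = 0:

```python
import numpy as np

def suppression_slope(E, r):
    # Eq. (7): w = arctan(r_sigma / E); arctan2 returns pi/2 where E == 0
    return np.arctan2(r, E)

def vote_weight(E, r, alpha=10.0):
    # Eq. (9): nu = cos^alpha(w) -- 1 for an isolated clear edge, -> 0 in clutter
    return np.cos(suppression_slope(E, r)) ** alpha
```

A pixel with no surrounding edge energy (r = 0) keeps full weight 1, while a pixel whose surround response equals its own magnitude (r = E) is already cut to cos¹⁰(45°) ≈ 0.031.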

2.3. Efficient computation of the isotropic surround suppression

Fig. 1. The weighting function ω_σ for the suppressive surround.

Unfortunately, the convolution in Eq. (6) calculating the isotropic surround suppression term is usually time consuming. For a


normal setting of σ = 1.5, the weighting function ω_σ is a 25 × 25 kernel, a relatively large one for convolution computation. Since the available schemes for speeding up the voting and peak detection processes do not affect the computation of the surround suppression, this convolution becomes a major bottleneck in our algorithm. It is therefore important to compute this term efficiently for the sake of the overall speed of the algorithm. Consider again the weighting function in Eq. (2):

\omega_\sigma(x, y) = S[\mathrm{DoG}_\sigma(x, y)] / \|S(\mathrm{DoG}_\sigma)\|_1 = c \cdot S[\mathrm{DoG}_\sigma(x, y)], \quad (10)

where

c = 1 / \|S(\mathrm{DoG}_\sigma)\|_1 \quad (11)

is a constant. It is well known that the convolution of a function f(x, y) with a 2D Gaussian

g_{2D,\sigma}(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right) \quad (12)

can be decomposed into two successive convolutions of the function with a 1D Gaussian

g_{1D,\sigma}(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2}{2\sigma^2}\right), \quad (13)

first along the x-axis, then along the y-axis, or mathematically,

(f * g_{2D,\sigma})(x, y) = \iint f(x - u, y - v)\, g_{2D,\sigma}(u, v)\, \mathrm{d}u\, \mathrm{d}v = \int F(x, y - v)\, g_{1D,\sigma}(v)\, \mathrm{d}v, \quad (14)

where

F(x, y) = \int f(x - u, y)\, g_{1D,\sigma}(u)\, \mathrm{d}u. \quad (15)

Since DoG_σ = g_{4σ} − g_σ is a linear combination of two Gaussians, its convolution with a function f(x, y) is

\mathrm{DoG}_\sigma * f = g_{4\sigma} * f - g_\sigma * f, \quad (16)

where the two convolutions on the right-hand side can each be calculated through two successive decomposed convolutions with the corresponding 1D Gaussian kernel, first along the x-axis and then along the y-axis. Normally the convolution of f with a 1D kernel is much faster than that with the original large 2D kernel, and execution speed can thus benefit from the decomposition if the total execution time of the four fast convolutions is less than that of the one slow convolution.

Now, for the weighting function ω_σ, we can make use of kernel decomposition in a similar fashion. Let us rewrite ω_σ as

\omega_\sigma = c \cdot S(\mathrm{DoG}_\sigma) = c\,(\mathrm{DoG}_\sigma - \overline{\mathrm{DoG}}_\sigma), \quad (17)

where

\overline{\mathrm{DoG}}_\sigma = -S(-\mathrm{DoG}_\sigma) \quad (18)

is a function valued the same as DoG_σ where DoG_σ is negative, and 0 otherwise. What we gain from this rewriting is that when convolving ω_σ with a function f, the convolution can be divided into two convolutions, one of f with DoG_σ and the other with \overline{\mathrm{DoG}}_\sigma. The first convolution can be done in the way described earlier, through four fast convolutions with 1D kernels; for the second, though the kernel is still 2D, it is much smaller than the original kernel ω_σ, and the convolution is thus also very fast. For example, for a setting of σ = 1.5, which gives an ω_σ of size 25 × 25, the 2D kernel \overline{\mathrm{DoG}}_\sigma is of size just 3 × 3. We thus summarize the computation of the isotropic surround suppression as follows:

1. Calculate the 1D kernels g_{1D,σ} and g_{1D,4σ}, the 2D kernel \overline{\mathrm{DoG}}_\sigma, and the constant c using Eqs. (11), (13), and (18).
2. Convolve the edge magnitude E with g_{1D,4σ}, first along the x-axis and then along the y-axis, giving intermediate result T1.
3. Convolve the edge magnitude E with g_{1D,σ}, first along the x-axis and then along the y-axis, giving intermediate result T2.
4. Convolve E with \overline{\mathrm{DoG}}_\sigma, giving intermediate result T3.
5. The isotropic surround suppression is r_σ = c(T1 − T2 − T3).

In our experiments, the overall execution time of the five fast convolutions was about half that of the one slow convolution using the original weighting function ω_σ, leading in turn to a 25% gain in the running speed of our algorithm as a whole.

2.4. A Hough transform for line detection using the new voting scheme

The Hough transform for line detection using the new voting scheme proceeds in a straightforward way. The main steps are listed below. Given an input intensity image I, the parameter σ that determines the size and weighting function of the suppressing surround, the surround suppression strength factor α, and a neighbourhood radius ε for the local non-maximum suppression operation used to extract peaks in Hough space:

1. Perform edge detection, including thresholding and thinning, giving a binary contour image C. The gradient components ∇_x I_σ and ∇_y I_σ obtained as intermediate results during edge detection are also retained. To ensure that as many important object contours as possible are detected and present in C, the Canny detector (Canny, 1986) was employed in this study for its robust performance and accurate edge localization. The new voting scheme can, however, be applied to any binary contour image as long as the surround suppression at each contour pixel can be calculated. In fact, we will give an algorithm using surround suppression both in contour detection, as given in (Grigorescu et al., 2004), and in the Hough voting procedure.
2. Calculate the isotropic surround suppression term r_σ for the given σ, using the method given in the last subsection.
3. Traverse the binary contour image C. For each edge pixel (x, y), calculate its vote weight ν(x, y) from r_σ(x, y), E(x, y) and α using Eqs. (7) and (9), and increase the vote count of the corresponding parameter bins in Hough space by ν(x, y).
4. Extract peaks in the Hough space as local maxima in a (2ε + 1) × (2ε + 1) neighbourhood. The line parameters corresponding to the peaks are output as the final result of the algorithm.
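The five convolution steps can be sketched as follows. This is a hypothetical NumPy sketch under our own discretization choices (kernel radius 8σ, zero-padded 'same' convolutions, no cropping of the negative-part kernel), intended to illustrate the decomposition rather than reproduce the authors' C implementation:

```python
import numpy as np

def gauss1d(sigma, radius):
    # discrete, normalized 1D Gaussian kernel (counterpart of Eq. (13))
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x * x / (2.0 * sigma * sigma))
    return g / g.sum()

def sep_conv(img, k):
    # two successive 1D convolutions (Eqs. (14)-(15)): along x (axis 1), then y (axis 0)
    out = np.apply_along_axis(np.convolve, 1, img, k, mode='same')
    return np.apply_along_axis(np.convolve, 0, out, k, mode='same')

def conv2_same(img, k):
    # zero-padded 2D correlation; equals convolution here since k is symmetric
    kh, kw = k.shape
    p = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    H, W = img.shape
    out = np.zeros((H, W))
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * p[i:i + H, j:j + W]
    return out

def surround_suppression_fast(E, sigma=1.5):
    radius = int(np.ceil(8 * sigma))
    g_s = gauss1d(sigma, radius)                     # step 1: 1D kernels,
    g_4s = gauss1d(4 * sigma, radius)                # negative part, constant c
    dog = np.outer(g_4s, g_4s) - np.outer(g_s, g_s)  # discrete DoG (Eq. (1))
    c = 1.0 / np.maximum(dog, 0).sum()               # Eq. (11)
    dog_neg = np.minimum(dog, 0)                     # the overline-DoG part (Eq. (18))
    T1 = sep_conv(E, g_4s)                           # step 2
    T2 = sep_conv(E, g_s)                            # step 3
    T3 = conv2_same(E, dog_neg)                      # step 4
    return c * (T1 - T2 - T3)                        # step 5: r_sigma
```

For clarity the sketch keeps dog_neg at full kernel size; cropping it to its small support, as described above, is what makes step 4 cheap in practice. The result agrees with direct convolution of E with ω_σ.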

3. Results

3.1. Implementation and data preparation

Three voting schemes, namely the conventional Hough voting scheme, the voting scheme utilizing gradient information, and the proposed new scheme, were implemented using MATLAB® (Version 7.0 R14) and C-MEX hybrid programming. The C language coding was done in the Borland C++ Builder® (Version 5.0) integrated development environment, and the code was compiled and built into DLL executables that could be called as subroutines in MATLAB. Two contour detection approaches were also implemented using MATLAB, one being the Canny detector provided by the MATLAB function edge, with minor modifications to retrieve the necessary edge gradient information, and the other being the method proposed by


Grigorescu et al. (2004), which can practically be regarded as a contour simplification stage using surround suppression following standard Canny contour detection. The hybrid programming was adopted in order to take advantage of the high efficiency of C programs and of MATLAB's built-in functions, along with its simplicity and rapid development capability.

It should be noted that in (Grigorescu et al., 2004) the same σ is used both for smoothing the image and for the surround suppression, while these two stages can actually be treated separately, i.e., one can use a small σ, σ_c, for smoothing to maintain edge details, and a relatively large σ, σ_s, for surround suppression. In our experiments, three different (σ_c, σ_s) configurations were used, Configuration 1 being (1, 1), Configuration 2, (1.5, 1.5), and Configuration 3, (1, 1.5). Algorithms combining contour detection and voting approaches were built to carry out the comparison. These algorithms, along with their parameter settings, are listed in Table 1. The traditional normal angle (θ)–normal distance (ρ) parameter space was used, and the Hough space was constructed with an angle resolution of 1°, ranging from 0° to 179°, and a distance resolution of 1 pixel.

The computer used had an Intel® Pentium® 4 CPU of 2.6 GHz frequency with 480 MB RAM. The operating system was Windows® XP Professional (Version 2002 SP2). Experiments were done on a real-world image set consisting of 105 pictures with resolutions of either 644 × 483 or 483 × 644 pixels, in which the main objects are buildings in natural environments or objects in indoor environments; some of these images are given later as examples. Line segments making up the contours of objects, buildings or the main structural components of buildings were selected by human experts and stored as ground truth.
Three experts were involved in the selection of ground truth line segments, each independently selecting segments based on the RGB colour images and the binary contour images. The selected segments were collected, and those whose ends were close enough to each other (in the sense of a predefined Euclidean distance tolerance) were merged into a single segment, its two ends being the centroids of the original end points. Each selection by an expert counted as one vote for a segment. All segments holding two or three votes were directly saved as ground truth. Segments holding one vote were then presented to all three experts for manual judgement.

3.2. Searching for matches to ground truth

One straightforward way to find the peak in the Hough space best matching a given line of parameters (θ, ρ) is to find the peak nearest to it. This measure, however, can be misleading, especially when matching peaks to line segments. Consider the situation illustrated in Fig. 2. The line segments s0, s1, and s2 are collinear and thus correspond to the same parameter point, say (θ0, ρ0), in the Hough space. Suppose the nearest peak to (θ0, ρ0) is (θ, ρ). It can clearly be seen in Fig. 2 that, depending on the actual position of a segment (the solid line segments in Fig. 2) on the line where it is located (the dotted line), the error between the segment and the matching counterpart line (the dashed line) can vary considerably. This implies that a direct nearest-peak search in the Hough space with respect to the line parameters (θ0, ρ0) of a given ground truth segment is not a proper measure.

We therefore use the error defined below as the measure of matching quality between a line segment and a line defined by parameters (θ, ρ). The error is the average distance of points along the segment to the line (θ, ρ). From Fig. 3 it is clear that this error is equivalent to the area of the region AA′B′B (shaded in Fig. 3) between the segment and its projection onto the line l defined by (θ, ρ), divided by the length of the projection A′B′. Suppose the coordinates of the ends of a segment are (xA, yA) and (xB, yB) in the image coordinate system, the normal angle–distance parameter pair of a line l is (θ, ρ), and the coordinates of the image centre are (x0, y0). From Fig. 3a and b the error measure err is easily derived by simple geometry as

\mathrm{err}(A, B, \theta, \rho) =
\begin{cases}
0.5\,|d_A + d_B|, & d_A d_B \ge 0, \\
0.5\,\dfrac{d_A^2 + d_B^2}{|d_A - d_B|}, & d_A d_B < 0,
\end{cases} \quad (19)

where

d_i = \rho_i - \rho = \cos\theta\,(x_i - x_0) + \sin\theta\,(y_0 - y_i) - \rho, \quad i = A, B \quad (20)

are the distances of the two segment ends A and B to their projections A′ and B′ on the line l, respectively.

With this matching quality measure of average error, we search for the best matching peak in the Hough space to a given ground truth segment in the following way. The peak list given by peak detection is first sorted by the number of votes received by each peak, in descending order. For each peak in the list, we define its rank as its position index in the sorted list, so that the peak with the most votes has rank 1. Let T_e be an error tolerance. For the given ground truth segment, all peaks that give average errors with respect to the segment smaller than or equal to the tolerance are extracted, and the peak with the highest rank, in other words the most significant peak, among these acceptable peaks is reported as the best matching line to the ground truth segment. If no peak gives an error within the tolerance, the peak that gives the smallest error is reported as the best match.

Table 1
Algorithms and parameter settings.

Algorithm          Contour detection approach                            Voting scheme
H (or HT)          Canny, with default σ_c = 1                           Conventional
G (or GradHT)      Canny, with default σ_c = 1                           Use gradient. Voting slope range is ±25°
E_i, i = 1, 2, 3   Grigorescu's, with the ith (σ_c, σ_s) configuration   Conventional
V (or SSHT)        Canny, with default σ_c = 1                           Use surround suppression. σ = 1.5, and α = 10
C_i, i = 1, 2, 3   Grigorescu's, with the ith (σ_c, σ_s) configuration   Use surround suppression. σ = σ_s, and α = 10

Fig. 2. Different matching errors between a straight line (θ, ρ) and different collinear line segments s0, s1, and s2.

3.3. Detection rate and accuracy

Detection rates were measured as the percentage of correctly detected ground truth segments, or hits, with respect to some user-defined T_e, out of the total number of ground truth segments in our
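Eqs. (19) and (20) translate directly into code. A hypothetical sketch (function name and argument conventions are ours); θ is in radians and the (y0 − y) term reflects image rows growing downwards:

```python
import math

def segment_line_error(A, B, theta, rho, center=(0.0, 0.0)):
    """Average distance of the points of segment AB to the line (theta, rho),
    per Eqs. (19)-(20); center is the image centre (x0, y0)."""
    x0, y0 = center
    dA, dB = (math.cos(theta) * (x - x0) + math.sin(theta) * (y0 - y) - rho
              for x, y in (A, B))
    if dA * dB >= 0:                 # ends on the same side: trapezoid (Fig. 3a)
        return 0.5 * abs(dA + dB)
    return 0.5 * (dA * dA + dB * dB) / abs(dA - dB)  # two triangles (Fig. 3b)
```

For example, against the line θ = 0, ρ = 0 (normal along the x-axis, through the centre), the segment from (1, 0) to (3, 0) gives an average error of 2, while the crossing segment from (−1, 0) to (3, 0) gives 1.25.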


Fig. 3. Error between a segment AB and a line l when (a) they have no intersection point and (b) they have an intersection point O.

image base. Accuracies were defined as the mean error over these hits. To see how the detected segments are distributed among the reported peaks, the Np highest ranked peaks were inspected, and the hits among them were used as the correctly detected segments under the specific Np. Detection rates are shown in Fig. 4, with the peak detection parameter ε set to 1, 3, and 5, the number of peaks inspected Np set to 5, 10, 15, 25, 50, 100, and ∞, and T_e = 1.5. The Np value of ∞ actually corresponds to the highest detection rate that each algorithm can achieve under the given ε and T_e.

It can be seen from Fig. 4 that in terms of detection rate, the methods using the new voting scheme (algorithms V and C_i) outperform the other algorithms in almost all cases. When the Np values are relatively small, say 15 or 50, the detection rate differences are significant. The differences clearly demonstrate the rank-lifting effect of our new voting method. Applying surround suppression in the contour

detection phase to completely remove some pixels in texture regions can be helpful, as implied by the higher detection rates achieved by the Ei algorithms than by HT for Np values no greater than 100. Nevertheless, using surround suppression in the voting phase still leads to better detection performance, as shown by the better results of the Ci algorithms. This is probably because of the presence of a considerable number of noisy pixels with relatively strong surround suppression effects that survive the contour detection phase owing to the hysteresis thresholding used. Once through contour detection, these noisy pixels affect the Hough transform more strongly when the conventional voting scheme is used than when the new scheme is adopted. As a whole, the results imply that the ranks of hits obtained with the new voting scheme are promoted significantly compared with the conventional one, or in our terms, the results are of better quality. It is also noteworthy that the ultimate detection rates achievable, i.e., the detection rates for Np = ∞, by algorithms E2 and C2 are lower than those of algorithms E1, E3, V, C1 and C3, which use the same contour detection and voting combination but different parameter settings. This is due to the larger σc used for smoothing, which gives coarser contours and thus leads to the loss of some real lines. It is also shown that, compared with the other algorithms, the ultimate detection rate of algorithm G is relatively low, indicating that a considerable portion of the edges on real lines fail to vote around their true parameter bins in Hough space. This is caused by incorrect estimation of the gradient direction at these edge pixels due to imaging noise. This observation also reduces the plausibility of combining our voting scheme with the use of gradient direction information to achieve better time performance while maintaining the improvement in result quality.
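The best-matching-peak search described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' code: the names are hypothetical, and `avg_error` stands in for the average-error measure defined earlier, supplied by the caller.

```python
# Hypothetical sketch of the evaluation protocol: match a ground-truth
# segment to the best Hough peak within an error tolerance Te.

def best_matching_peak(peaks, segment, avg_error, Te=1.5):
    """peaks: list of (theta, rho, votes) tuples.

    Peaks are first sorted by votes in descending order, so rank 1 is
    the most significant peak. Among peaks whose average error with
    respect to the ground-truth segment is <= Te, the highest-ranked
    one is reported; if none qualifies, the peak with the smallest
    error is reported instead. Returns (rank, peak, error).
    """
    ranked = sorted(peaks, key=lambda p: p[2], reverse=True)
    fallback = None  # smallest-error peak seen so far
    for rank, peak in enumerate(ranked, start=1):
        err = avg_error(peak, segment)
        if err <= Te:
            return rank, peak, err  # highest rank within tolerance
        if fallback is None or err < fallback[2]:
            fallback = (rank, peak, err)
    return fallback  # no peak within tolerance
```

A detection rate can then be computed as the fraction of ground-truth segments whose best match falls within Te among the Np highest-ranked peaks.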
For detection accuracy, all the algorithms exhibit similar performance. The mean error of the algorithms under all settings varies between 0.81 and 0.95, while the standard deviation of the errors ranges from 0.31 to 0.38, so the differences in detection accuracy are not drastic. For any specific algorithm,

Fig. 4. Detection rates of the algorithms.


the distribution of the errors resembles a uniform distribution with some significant spikes. Since we are not certain of the underlying error distribution, statistical tests were not carried out; we believe the mean values and standard deviations of the errors are sufficient to support our assertion. Below are results on some sample images from the image set. For all these results, the 10 most significant peaks reported by each algorithm are shown, with Te = 2.5 and e = 3. Hits are shown as solid line segments, and false positives as dashed lines. Since our purpose in showing these result images is to illustrate the rank lifting effect of the new voting scheme, we provide results only for algorithms H (HT), G (GradHT), and V (SSHT), to save space. Fig. 5 shows results on the image St. Andrews Cathedral. Noise pixels mainly come from the texture of walls, grass and flowers, whereas the most interesting lines lie on the boundaries between the cathedral walls and windows and the sky background. The impact of these noise pixels can be clearly seen in the result given by HT, where 9 out of the 10 most significant peaks reported


are false positives produced by noise pixels. This impact is not noticeably suppressed by GradHT. SSHT, on the other hand, shows a drastic improvement, as shown in the figure. To observe the suppressing effects in Hough space given by GradHT and our method, a part of the final Hough space obtained by HT, GradHT and SSHT is shown in Fig. 6a–c, respectively. HT clearly gives a very noisy distribution in Hough space. GradHT suppresses some of the noisy distribution, but SSHT puts much more emphasis on the significant peaks, and the distribution in Hough space is much smoother. Figs. 7 and 8 show results on similar images, where important lines are suppressed by complex texture regions such as walls, lawns and wallpaper in the results obtained by HT and GradHT, but are emphasized in those given by our method. Figs. 9 and 10 illustrate another effect of surround suppression. In the results given by both HT and GradHT there are some perceptually reasonable lines. These lines are, however, more or less texture details of objects, rather than important lines that define the main objects in the images. It can thus be seen from these figures that,

Fig. 5. Comparison of results on the image St. Andrews Cathedral. The original image, binary edge image, and ground truth line segments are shown in (a), (b), and (c), respectively. Results given by HT, GradHT, and SSHT are illustrated in (d), (e), and (f), respectively. Hits are shown as solid line segments, and false positives as dashed lines. The 10 highest ranked peaks are displayed.

Fig. 6. The part of Hough space defined by the slope range [101, 160] and the intersection range [70, 11] after voting. Results are given by (a) HT, (b) GradHT, and (c) SSHT, respectively.


Fig. 7. Results on the image Calton Hill. (a) Original image. (b) Edge image. Results given by HT, GradHT, and our algorithm are shown in (c), (d), and (e), respectively.

Fig. 8. Results on the image Newstead Abbey 1. (a) Original image. (b) Edge image. Results given by HT, GradHT, and our algorithm are shown in (c), (d), and (e), respectively.

with the same number of most significant peaks under investigation, the peaks resulting from SSHT give a better and more complete abstraction of the major objects in the scenes, which can be very useful when the post-processing procedure following line detection is expensive.
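As a rough illustration of the weighted voting idea compared throughout these experiments, the sketch below accumulates a per-pixel weight, assumed to be already derived from the isotropic surround suppression measure, instead of a unit vote in a (θ, ρ) accumulator. The names and discretization are illustrative assumptions, not the paper's exact implementation.

```python
# Hypothetical sketch: Hough voting where each edge pixel contributes
# its surround-suppression-derived weight w(x, y) rather than 1.
import numpy as np

def weighted_hough(edges, weights, n_theta=180, rho_res=1.0):
    """edges: boolean HxW map; weights: HxW float map of voting weights."""
    h, w = edges.shape
    rho_max = np.hypot(h, w)
    n_rho = int(2 * rho_max / rho_res) + 1
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((n_theta, n_rho))
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        # Each edge pixel votes in every theta bin, but with its
        # weight; texture pixels (small weights) thus contribute
        # little, while clear-boundary pixels dominate the peaks.
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        bins = ((rhos + rho_max) / rho_res).astype(int)
        acc[np.arange(n_theta), bins] += weights[y, x]
    return acc
```

With all weights set to 1 this reduces to the standard Hough transform; suppressing the weights of texture pixels directly lowers the false peaks they would otherwise form.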

3.4. Execution time

Execution times of the algorithms on each image in the image base were obtained using the profile function in MATLAB. The execution time includes the time spent on contour detection and the Hough voting procedure, and on surround suppression computation and contour simplification where used. Peak detection in Hough space was regarded as post-processing after the different Hough transforms, and its speed varies with different e values; moreover, the time consumed by peak detection is only a small fraction of the total. We therefore omitted this stage from the speed tests for clarity. The results are given in Fig. 11. In terms of running speed, GradHT and the algorithms using simplified contours are the best among the methods. These fastest algorithms all take much less voting time, with GradHT benefiting from accessing far fewer Hough accumulator units, and the Ei and Ci algorithms from a much smaller number of voting pixels. When detection rate and execution time are both of concern, the Ci algorithms, i.e., the algorithms using surround suppression in both the contour detection and Hough voting stages, provide a good choice.

3.5. Selection of parameters

There are two parameters to determine in our algorithm, the suppressive surround radius r and the surround suppression strength a. We are mainly concerned with the effects of the parameters on detection quality, i.e., detection rate and accuracy. We tested a range of both parameters; results on detection rate are shown in Fig. 12. In Fig. 12a it can be observed that for small a values, the detection rate obtained on the test image base increases as a increases. After a reaches a certain level, here when a is equal to or greater than 10, the detection rate becomes relatively stable, suggesting that the relation between peaks in Hough space and their suppressed surround has been established. As for r, the detection rate is also stable when r is relatively small, no greater than 2.5 in our case. With a further increase of r, however, the detection rate shows an obvious decline; we discuss this phenomenon later. Detection accuracy is, again, robust under different parameter settings: for varying a, the mean detection error is between 0.84 and 0.91 and the standard deviation of the error between 0.34 and 0.38; for different r settings, the ranges are 0.83–0.90 and 0.34–0.38, respectively.

Now let us inspect the roles of the parameters a and r in the new voting scheme. Note that surround suppression influences not only the edge pixels in texture regions but also those on clear contours; the key point is that it influences the two types of pixels with different strengths. Though normally all edge pixels are suppressed to some extent, those in texture regions are suppressed more than those on clear contours. Suppose there is a clear contour line adjacent to a texture region; pixels on this line then have larger voting weights than those in the texture region. These larger weights may or may not form a peak in Hough space corresponding to the line, because there may be enough texture pixels to compensate for their inferiority in voting weight. As a increases, however, the advantage in voting weight of the line pixels grows until it overcomes the texture pixels' superiority in number, and the line peak eventually emerges from the noisy surround. Once the dominance of the line peak is established, further increasing a brings no extra benefit, and the detection rate stabilizes. The suppressive surround radius r, on the other hand, is clearly related to the nature of the texture to be suppressed: the sparser the texture, the larger r must be to include enough texture pixels for the suppression to work.
If r is large enough for a given texture configuration, the suppressive surround will generally include more texture pixels in texture regions than on clear boundaries, thus achieving different degrees of suppression and allowing the new voting scheme to do its job. But this does not mean that any large

Fig. 9. Results on the image Newstead Abbey 2. (a) Original image. (b) Edge image. Results given by HT, GradHT, and SSHT are shown in (c), (d), and (e), respectively.


Fig. 10. Results on the image Facade of British Museum. (a) Original image. (b) Edge image. Results given by HT, GradHT, and SSHT are shown in (c), (d), and (e), respectively.

Fig. 11. Execution time of the algorithms shown by stages and as a whole.

r are equivalent. From Fig. 12b it can already be seen that a larger r value has a negative effect on detection rate; we now explain how this can happen. As shown in Fig. 13, suppose there is a texture region (the shaded rectangle) in an image with uniform edge magnitude over the region, and a clear boundary (the solid line) lies between the region and the rest of the image, which has uniform gray levels and thus generates no edges at all. Let us place two suppressive surround disks with the same radius r (the two dashed circles) in the image, one centred on the boundary, the other at a pixel in the texture region at a distance d from the boundary. The actual suppressive surround is a ring-shaped area, but we use disks here for simplicity without changing the conclusion that can be drawn. The surround suppression can be expressed through the areas of the texture region covered by the disks, i.e., the shaded part of each disk. These areas are

\[
A_b = \frac{1}{2}\pi r^2, \qquad
A_r = \pi r^2 - r^2 \arccos\frac{d}{r} + d\sqrt{r^2 - d^2}.
\]
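The two area formulas can be checked numerically. The sketch below is illustrative only, with function names of our own choosing; it evaluates the covered areas and their ratio for sample values of r and d.

```python
# Illustrative check of the covered-area formulas: A_b for a disk
# centred on the boundary, A_r for a disk centred d inside the texture.
import math

def area_boundary(r):
    # Half the disk lies inside the texture region.
    return 0.5 * math.pi * r * r

def area_texture(r, d):
    # Disk area minus the circular segment lying outside the texture
    # region (valid for 0 < d < r).
    return (math.pi * r * r
            - r * r * math.acos(d / r)
            + d * math.sqrt(r * r - d * d))

def rss(r, d):
    # Relative surround suppression strength between the two pixels.
    return area_boundary(r) / area_texture(r, d)
```

Evaluating `rss` for increasing r at fixed d confirms the behaviour discussed below: the ratio grows monotonically and approaches 1, so a very large radius equalizes the suppression at the two pixels.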


Fig. 12. Detection rate as a function of (a) the surround suppression strength a, and (b) the suppressive surround radius r.

Fig. 13. Surround suppression for analyzing the effects of the suppressive surround radius r.

It can be shown that Ar > Ab for any r > d, meaning that the surround suppression at the boundary pixel is weaker than that at the texture pixel. Now consider the ratio RSS(r) = Ab/Ar, denoting the relative surround suppression strength between the two pixels. The derivative of this relative strength is

\[
\frac{dR_{SS}(r)}{dr} = \frac{\pi r d \sqrt{r^2 - d^2}}{A_r^2} > 0, \qquad r > d.
\]

The positive derivative implies that the relative surround suppression strength RSS(r) is a monotonically increasing function, meaning that although Ab is always less than Ar, the relative difference between them, and thus the difference in suppression strength and subsequent voting weight, grows smaller and smaller. In fact, the following limit can easily be proved:

\[
\lim_{r \to \infty} R_{SS}(r) = 1.
\]

This means that with a very large r the two pixels are suppressed to practically the same degree, so an overly large r eventually cancels the effect of the new voting scheme. Though the impact of a too large r can be compensated to some extent by a correspondingly large a value, the running speed of the algorithm then suffers an unnecessary loss. From the above considerations, we empirically chose r = 1 or 1.5 and a = 10 throughout our experiments, and observed relatively stable performance over the different images in the image base.

4. Conclusions

In this paper, we proposed a new Hough transform for line detection. The improvement mainly comes from a modification of the standard Hough voting scheme. To suppress the impact on vote accumulation in Hough space of noise edges produced by complex background or texture regions, surround suppression is utilized to assign different weights to the votes of different edges according to the regions in which they are located. Peaks formed by noise edges are thus lowered relative to those formed by clear edges, which often delineate the boundaries between objects of interest. Experimental results on an image base show that our method leads to significant suppression of false peaks in Hough space and is thus effective. Though two parameters are required by our method, for a given category of images at a given resolution the parameters can be determined empirically in advance, which is advantageous for the use of our method in fully automated line detection applications.

Acknowledgements

This work is supported by Hunan University Science Foundation Grant No. 521101872. The authors would also like to thank the University of Nottingham and the Hunan University Faculty Staff Training Programme for granting Siyu Guo a scholarship to visit the UK and undertake the research reported here.


Kimura, A., Watanabe, T., 2002. An extension of the generalized Hough transform to realize affine-invariant two-dimensional (2D) shape detection. In: Proc. ICPR’02, pp. 65–69. Knierim, J.J., van Essen, D.C., 1992. Neuronal responses to static texture patterns in area V1 of the alert macaque monkey. J. Neurophysiol. 67 (4), 961–980. Leavers, V.F., 1993. Which Hough transform? Computer Vision Graphics and Image Understanding: Image Processing 58 (2), 50–64. Leemans, V., Destain, M.-F., 2006. Line cluster detection using a variant of the Hough transform for culture row localisation. Image Vision Comput. 24 (5), 541–550. Li, H., Lavin, M.A., LeMaster, R.J., 1986. Fast Hough transform: A hierarchical approach. Computer Vision Graphics Image Process. 36, 139–161. Matas, J., Galambos, C., Kittler, J., 2000. Robust detection of lines using the progressive probabilistic Hough transform. Computer Vision and Image Understanding 78 (1), 119–137. Olsen, C.F., 1999. Constrained Hough transforms for curve detection. Computer Vision and Image Understanding 73 (3), 329–345. Palmer, P.L., Kittler, J., Petrou, M., 1997. An optimising line finder using a Hough transform. Computer Vision and Image Understanding 67 (1), 1–23. Sonka, M., Hlavac, V., Boyle, R., 1993. Image Processing, Analysis and Machine Vision, first ed. Chapman & Hall, London. pp. 80–81. Stephens, R.S., 1990. A probabilistic approach to the Hough transform. In: Proc. 1990 British Mach. Vis. Conf., pp. 55–60. Walsh, D., Raftery, A., 2002. Accurate and efficient curve detection in images: The importance sampling Hough transform. Pattern Recognition 35 (7), 1421–1431. Xu, L., Oja, E., 1993. Randomized Hough transform (RHT): Basic mechanisms, algorithms, and computational complexities. CVGIP: Image Understanding 57 (2), 131–154.