IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 25, NO. 6, JUNE 2003
Contour Grouping with Prior Models

James H. Elder, Member, IEEE, Amnon Krupnik, and Leigh A. Johnston, Member, IEEE

Abstract—Conventional approaches to perceptual grouping assume little specific knowledge about the object(s) of interest. However, there are many applications in which such knowledge is available and useful. Here, we address the problem of finding the bounding contour of an object in an image when some prior knowledge about the object is available. We introduce a framework for combining prior probabilistic knowledge of the appearance of the object with probabilistic models for contour grouping. A constructive search technique is used to compute candidate closed object boundaries, which are then evaluated by combining figure, ground, and prior probabilities to compute the maximum a posteriori estimate. A significant advantage of our formulation is that it rigorously combines probabilistic local cues with important global constraints such as simplicity (no self-intersections), closure, completeness, and nontrivial scale priors. We apply this approach to the problem of computing exact lake boundaries from satellite imagery, given approximate prior knowledge from an existing digital database. We quantitatively evaluate the performance of our algorithm and find that it exceeds the performance of human mapping experts and a competing active contour approach, even with relatively weak prior knowledge. While the priors may be task-specific, the approach is general, as we demonstrate by applying it to a completely different problem: the computation of human skin boundaries in natural imagery.

Index Terms—Perceptual organization, grouping, contours, edges, graph search, Bayesian probabilistic inference, segmentation, remote sensing, skin detection.

1 INTRODUCTION

Perceptual organization is the problem of grouping local image features that project from a common structure (e.g., an object) in the scene. The main tradition in both biological and computational vision research is to treat perceptual organization as a problem that can be solved in complete generality, in bottom-up fashion [1], [2], [3], [4]. In this view, it is assumed that visual context and history and the higher-level knowledge and goals of the perceiver have no role to play. To support this view, Kanizsa [3] has demonstrated that humans are able to perceptually organize artificial images composed of unfamiliar, nonsense shapes. On the other hand, to say that perceptual organization can proceed in some fashion without higher-level knowledge or top-down constraints is not to say that it always does and that such constraints, when available, are not exploited. Convincing arguments have been made over the years that the human visual system does make use of higher-level knowledge to speed and simplify the task of perceptual organization and to resolve ambiguities [5], [6], [7]. Our focus in this paper is to explore how such prior knowledge might be combined with more general grouping expertise, within a modern probabilistic framework. We argue that there are many practical scenarios in both human and machine vision where such prior knowledge may be available:

. Visual search. In searching for an object, an observer has some knowledge of what she/he/it is looking for: sometimes only approximate information about shape and appearance, and, typically, poor knowledge of the object's position and attitude.
. Object recognition. In many recognition applications, the rough location of an object can be computed before recognition. This prior knowledge may then be useful in perceptually organizing the object to facilitate recognition.
. Recurrent organization. One approach to locating objects of potential interest is to search for locations that maximize the correlation with a simple parametric template that can then serve as a very simple prior model to guide detailed grouping. The existence of high-bandwidth feedback pathways at multiple stages of visual cortex [8] suggests that this style of computation is biologically feasible.
. Database revision. There are numerous applications where existing models must be updated and/or refined. For example, an out-of-date Geographic Information System (GIS) model may be updated using recent high-resolution satellite imagery.

In many machine vision applications, not all objects are of equal interest. Some applications may target faces, others hands, or cars, or shadows, etc. A vision system for a robotic manipulator may be interested only in things of the scale of the end-effector. But, in most or all of these applications, perceptual grouping is an important part of the problem. We argue that it is important to have a general framework for grouping which accommodates prior domain knowledge as a kind of "plug-in" module.

In this paper, we will focus on the particular problem of contour grouping, that is, of integrating the local edges or curve tangents that lie on a common contour in the image. Although the goal of contour grouping is sometimes defined as the inference of subsets of curve elements belonging to distinct curves [9], we define contours as sequences of local curve elements. This stronger inference directly supports the computation of higher-order curve properties (e.g., curvature) and leads naturally to standard representations (e.g., polygons, splines).

There are numerous existing algorithms for partially grouping contour elements into disjoint contour fragments [10], [11], [12], [13], [14], [15], [16], [17] and such partially-grouped representations have been shown to be useful for object recognition [15]. However, the goal of computing complete bounding contours has proven to be more elusive. While recent approaches exploiting the global property of contour closure have yielded limited success [18], [19], the general problem of computing the complete bounding contour of an object of arbitrary shape in a complex natural image remains essentially unsolved.

We pose the problem of contour grouping as a problem of probabilistic inference: The goal of the computation is to compute the most probable closed contour that completely bounds the object of interest. We formulate a Bayesian inference problem based on a combination of probabilistic object cues providing information about whether individual tangents lie on the object boundary, with probabilistic grouping cues providing evidence about whether sequences of tangents should be grouped together. In order to use this information to estimate the most probable object boundary, we introduce an approximate, constructive search algorithm that allows critical global constraints to be applied throughout the search.

Perceptual organization algorithms are often evaluated informally on somewhat arbitrary natural images. One goal of the present work is to objectively and quantitatively evaluate algorithm performance on real data. To this end, we apply our approach to the computation of exact lake boundaries from satellite imagery. We compile a ground truth database derived from hand segmentations of lakes from remote-sensed imagery by eight human mapping experts and use this database to compare the performance of our algorithm both to expert human performance and to the performance of a more conventional active contour approach. We also demonstrate our algorithm on the problem of detecting the boundaries of major skin regions in natural imagery.

2 RELATED WORK

2.1 Contour Grouping

Early contour grouping systems [20], [21] employed search trees through which optimal paths were estimated using dynamic programming or the A* algorithm. More recently, multiscale smoothness criteria have been used to impose organizations over image curves [10], [22], [23], [11]. Methods for computing local "saliency" measures based on contour smoothness and total arclength have been studied [24], [25]. Contour grouping constraints have also been employed to complement region-based segmentation techniques (e.g., [26]).

One of the first attempts to place the problem of contour grouping in a probabilistic framework is due to Cox et al. [14], who employed a multiple-hypothesis tree representation, using a Bayesian approach to determine where to extend and where to prune the tree. Their system computes a segmentation in which each edge in the image is assigned to one of a number of contours or is labeled as a noise edge. The result is a set of incomplete, open contour fragments. While these fragments can potentially form the input to a number of higher-level vision modules, each is only a partial representation of an object boundary. Murino et al. [27] posed the problem of edge grouping in a Markov Random Field framework, using cues of parallelism, colinearity and convergence, and employing simulated
annealing to compute good configurations. Castano and Hutchinson [28] proposed a hierarchical probabilistic model in which grouping was determined by colinearity and bilateral symmetry relationships. Amir and Lindenbaum [9] posed the problem as a maximum-likelihood graph partitioning problem in which contours are treated as sets of edges, rather than sequences. Williams and Jacobs [29], [30] advanced a theory for illusory contour shape and salience in which perceived paths are modeled as maximum likelihood random walks. This work was later extended [19] to the computation of closed curves. Lindenbaum and Berengolts [31] have recently shown that the original saliency algorithm of Sha'ashua and Ullman [24] can be reformulated in a statistical framework as an iterative algorithm for maximizing the expected contour length. Elder and Zucker [18] developed a probabilistic framework for contour grouping in which maximum-likelihood contours are computed using a shortest path (Dijkstra's) algorithm. They have applied the approach to the unsupervised grouping of maximum-likelihood closed contours [18] and to computing the maximum likelihood contour connecting two interactively-selected edges within an image editing application [32]. This probabilistic approach has been adapted by Crevier [16], [17] to a multiscale framework that allows grouping of circular arcs as well as straight contour segments. Liu and Geiger [33] have also used a shortest-path approach to compute maximum-likelihood paths connecting automatically selected keypoints.

In all of these approaches, the underlying probability distributions have been chosen with a combination of common sense and trial-and-error and so do not necessarily fit the natural image data on which the algorithms are expected to work. Clearly the probabilistic approach would benefit if these distributions were based on real empirical statistical data from natural images. Almost 50 years ago, Brunswik and Kamiya [34] proposed that the classical Gestalt principles of perceptual organization should be quantitatively related to the statistics of the natural world. However, their suggestion remained largely untested until 1998, when Kruger [35] first reported data on the second-order spatial statistics of Gabor filter responses to natural images and Elder and Goldberg first described the ecological statistics of Gestalt laws for contour grouping [36], [37], [38]. More recently, there has been an interesting study of natural image statistics relevant to the problem of image segmentation [39] and three studies of natural image statistics relevant to the perceptual organization of contours [40], [41], [42].

In spite of this progress, very few probabilistic computer vision algorithms for contour grouping use real natural image statistics. A notable exception is Geman and Jedynak's work on road tracking in satellite images [43]. In their system, simple nonlinear filters are used to locally test for roads and response distributions are learned from real data. Assuming independence and a uniform prior, the problem of road tracking is expressed as the maximization of a product of local likelihood ratios. Exact solution is intractable: instead, they build outward from their starting point a tree of possible tracks and use an entropy testing heuristic to extend the tree where uncertainty is maximal.
This problem and probabilistic framework have been studied further by Yuille and Coughlan [44], who focus on the computational complexity of the probabilistic search task, determining bounds on algorithm performance. Like Geman and Jedynak, they seek to maximize a product of likelihood


ratios, but rather than an entropy testing heuristic they define a reward function involving the likelihood ratio for each partial contour hypothesis, partially corrected with a heuristic estimate of the reward for completing the contour. This approach has recently been applied to the grouping of contours between detected junctions [45], although this work was not based on real natural image statistics.

2.2 Other Approaches to Grouping: Active Contours and Region Segmentation

All of the approaches described above attempt to compute contours by grouping together local edges into progressively larger sequences or sets. But there are at least two other well-developed approaches to computing closed contours. Active contour approaches [46] begin with an initial estimate of the closed contour of interest, typically represented as a spline. This spline is then adapted to minimize an energy functional that (hopefully) drives the contour to lie near the intensity gradient assumed to signal the boundary of the target object. In the segmentation approach [47], two-dimensional regions of the image are grown, split, and/or merged to maximize measures of homogeneity within regions and simplicity of region boundaries.

Zhu and Yuille [48] have argued that these two approaches are closely related and can be unified within the frameworks of Bayesian theory and the Minimum Description Length principle. However, this relationship assumes that the target object is homogeneous (or, at least, slowly-varying) in some property, e.g., intensity or texture. In fact, if an active contour is initialized to be relatively near the actual boundary of the target object, its convergence will depend only upon the details of the image near that boundary and so no homogeneity condition on the interior of the object is required. Chesnaud et al. [49] have recently developed a statistical algorithm in which they assume this condition does apply. Specifically, they address the problem of segmenting a foreground object of constant intensity in a background of constant intensity, where both foreground and background are corrupted by noise from a known distribution with unknown parameters. The algorithm is initialized by hand: An operator draws a simple polygon roughly on top of the object of interest and the system is capable of computing a coarse polygonal approximation of the object boundary.

3 OUR APPROACH

Our approach to contour grouping differs from prior approaches in a number of important ways.

3.1 No Free Parameters: Basing Distributions on Natural Image Statistics

As with other recent approaches, we will assume a modern probabilistic framework for perceptual grouping. In contrast to almost all previous probabilistic approaches [14], [29], [27], [28], [48], [18], [30], [33], [9], [49], [19], [16], [17], [45], our approach is grounded in the actual statistics of grouping cues and object properties in natural images [38]. Ours is also seemingly the first technique to exploit photometric as well as geometric grouping cues and the first to actually measure and compare the statistical power of the cues employed. Geman and Jedynak [43] do learn general response distributions for local "road detectors" from real data; however, their tracking model essentially has no grouping component. Rather, they assume that roads either continue straight or bend to either side by 5° per segment, with uniform probability. Note that, while the segmentation approaches of Zhu and Yuille [48] and Chesnaud et al. [49] do allow the parameters of distributions to be estimated online, the functional distributions are fixed and are not verified against natural image data.

3.2 Computing Complete Contours

Most contour grouping algorithms (e.g., [14], [27], [28], [9], [49], [19], [16], [17], [45]) group only partial contours: They are not able to compute complete boundaries. One key to computing complete contours is the effective use of the constraint of closure. Contour closure has been shown to be a powerful cue to perceptual grouping in human vision [2], [50], [51]. Jacobs [13], [15] has studied the problem of inferring highly-closed convex cycles of line segments from an image, to be used as input for a part-based object recognition strategy. More recently, closure has been demonstrated as a very potent cue in computer vision algorithms for grouping fairly simple contours [18], [19]. However, in this paper, we seek to compute highly complex contours with very rough, fractal-like boundaries, up to 1,000 segments in length, in images of up to 40,000 segments.

3.3 Application of Global Constraints

Prior methods for computing complete contours [18], [19] use a constraint of closure, but their formulations prohibit the application of other important global constraints. For example, although the shortest path approach used by Elder and Zucker [18] guarantees an exact solution in polynomial time, the algorithm cannot accommodate constraints of completeness, simplicity (no self-intersections), or nontrivial scale priors. In this paper, we propose as an alternative an approximate, constructive search technique which finds a good (not necessarily optimal) solution, but which can accommodate these important constraints.

3.4 Heterogeneous Objects

Although segmentation approaches (e.g., [48], [49]) guarantee a closed boundary, they require an assumption of stationary (or, at the very least, smoothly varying) statistics over the interior of an object. This is simply not realistic for many objects in the world, e.g., the human body (clothed or unclothed). In such cases, it is critical to accurately capture and exploit the statistical regularities of the object boundary. In our case, we are interested in computing the boundaries of lakes in satellite imagery and skin patches in natural imagery. In the first case, the statistics are not stationary because of islands and swampy areas. In the second case, hair and facial features break the homogeneity of skin patches. While segmentation approaches are unlikely to successfully group entire structures under these conditions, we will demonstrate that our contour-based approach can.

3.5 Weak versus Strong Priors

Active contour approaches typically require good initial conditions to converge correctly. Even in simplified conditions [49] an explicit, initial polygon that substantially overlaps the target object must be provided. In our approach, the nature of the prior knowledge is much more flexible. While we will demonstrate the superiority of our approach for relatively strong priors providing both position and shape information, we will also demonstrate that our algorithm performs well with only very coarse size and/or photometric priors, making it much more useful for "visual search" scenarios where the location of the object of interest is completely unknown. We will show how these priors can be rigorously combined with more generic grouping cues in a consistent probabilistic framework.

Fig. 1. (a) Example test lake. Initial prior model shown in blue. Prior model after registration shown in red. (b) Edge map. (c) Map of posterior probability of object membership $p(t_i \in T^o \mid d^o_i)$ for the four object cues combined. Intensity is inversely proportional to the posterior probability. Prior model shown in red. (d) Results: Black: nonlake edges, Blue: Example human expert trace. Red: Strong prior result. Where the human expert trace is not visible, it is coincident with the strong prior result.

3.6 Quantitative Evaluation

Contour grouping systems are typically evaluated subjectively and qualitatively. We are aware of only two exceptions. Borra and Sarkar [52] evaluated three existing edge grouping systems in terms of their utility for object recognition in aerial imagery. Williams and Thornber [53] evaluated six saliency measures for the detection of natural shapes in synthetic noisy line element images. However, to our knowledge, ours is the first contour grouping study in which the algorithm is directly evaluated (without reference to a specific high-level task), quantitatively and objectively, on natural image data, and compared against other systems, both human and machine.

4 FRAMEWORK

4.1 Definitions

We assume an input representation in which contours are locally represented by variable length tangents. These tangents are formed in a two-stage process. First, edges are detected in the image using an adaptive multiscale edge detection algorithm [54]: An example edge map is shown in Fig. 1b. Second, a greedy algorithm is employed to group local edges into relatively short linear segments of variable length that closely approximate the underlying curve and whose position and orientation are represented to floating-point accuracy. The primary purpose of this second stage is to reduce artifacts induced by the discrete pixel representation of the edge map. In addition to carrying position and orientation information, these tangents are augmented with estimates of the luminance intensity on both sides of the contour. Details of the representation can be found in [18].

The set of such tangents T in an image may be enumerated: $T = \{t_1, \dots, t_N\}$. The set S of possible contours may then be represented as tangent sequences:^1

$$s = \{t_{\alpha_1}, \dots, t_{\alpha_m}\} \in S \quad \text{iff:}$$
$$\alpha_i \in \{1, \dots, N\}, \quad \forall i \in \{1, \dots, m\};$$
$$\alpha_i \neq \alpha_j, \quad \forall i \in \{1, \dots, m\},\ j \in \{2, \dots, m-1\},\ i \neq j.$$

This definition restricts the mapping to be injective (tangents cannot be repeated), with the exception that the first and last tangent may be the same. In this case, the contour is closed.

1. In our formulation, the local detection and grouping processes are decoupled: Contours are restricted to pass through precomputed tangents.

We assume that there exists a correct organization of the image $C \subseteq S$. C includes all tangent sequences that correspond to actual contours in the image and all subsequences of these. In other words, contours in C need not be complete. C includes the contour bounding the object(s) of interest, but also many other "distractor" contours that may exist in the image. We do not restrict these contours to be disjoint since there are many instances in real images where contours bounding distinct structures may share one or more contour elements. For example, in our GIS application, the visible boundary of a lake is in fact composed largely of trees and bushes that surround the lake. Thus there are many contour elements that may be considered either part of the boundary of a single tree or part of the boundary of an entire lake.

We define the set of object fragments $S^o$ such that

$$s = \{t_{\alpha_1}, \dots, t_{\alpha_m}\} \in S^o \subseteq S \quad \text{iff} \quad t_{\alpha_i} \in T^o,\ \forall i \in \{1, \dots, m\},$$

where $T^o$ is the set of tangents on the boundary of the object of interest. We define the set of object contours $C^o$ as the set of correctly organized object fragments, i.e., $C^o = S^o \cap C$. We assume that the object boundary $c^o$ and therefore all object contours are simple (not self-intersecting).
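For concreteness, the tangent-and-sequence representation defined above can be mirrored by a small data structure. The sketch below is our own illustrative Python rendering, not part of the original system; all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Tangent:
    """A local contour element: a short line segment fit to grouped edge pixels."""
    p0: Tuple[float, float]   # one endpoint (x, y), to floating-point accuracy
    p1: Tuple[float, float]   # other endpoint (x, y)
    theta: float              # orientation (radians)
    lum_left: float           # estimated luminance on one side of the contour
    lum_right: float          # estimated luminance on the other side

# A contour hypothesis is a sequence of indices into the image's tangent list
# T = [t_1, ..., t_N]; it is closed when the first and last index coincide.
Contour = List[int]

def is_closed(s: Contour) -> bool:
    return len(s) > 2 and s[0] == s[-1]
```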

4.2 Constructive Algorithm

While the shortest path algorithm used in previous grouping approaches [18], [16], [17] is guaranteed to find an optimal solution, it cannot accommodate the constraints of simplicity (no self-intersections), completeness, or nontrivial global priors, all of which are important parts of the problem we address. We therefore propose as an alternative a heuristic algorithm that uses a greedy, monotonic strategy. While not guaranteed to find an optimal solution, it can handle these important global constraints.

The proposed constructive method estimates the complete object boundary $c^o$ from contours that are likely to be object contours. We begin by constructing the set $S_2 = \{s^2_1, \dots, s^2_{n_2}\}$ of contours $s^2_i$ of length m = 2 tangents and estimating the posterior probability $p(s^2_i \in C^o \mid D)$ that each is an object contour, given observed data D. The means for estimating these posterior probabilities are discussed in Section 4.4. The data cues we use in our particular application are detailed in Section 5.2. To limit memory and time complexity, we discard all but the $N_2$ most probable contours $\hat{C}^o_2$ and from these we construct a new set $S_3 = \{s^3_1, \dots, s^3_{n_3}\}$ of length m = 3 contours. This process is iterated to some maximum contour length m = M tangents. Self-intersecting and closed contours may form at all stages m > 2; these are subtracted from $\hat{C}^o_m$ and the closed contours are stored. Note that self-intersection is a global property that could not be handled easily using a shortest-path algorithm.

In order to make the memory requirements for our algorithm invariant with respect to contour length, we fix the total number N of tangents represented at each stage (after pruning) to N = 4,000. This means that the number of contours $N_m$ that are represented declines as the contour length m increases: $N_m = N/m$. Approximate prior knowledge about the object of interest allows the maximum contour length M to be chosen in a principled manner (Section 5.3).

In order to limit the time requirements of our algorithm, we have applied two additional constraints in our implementation. First, we only consider tangent pairs whose endpoints are within 20 pixels of each other. This constraint limits the time required to compute the pairwise grouping probabilities. Second, we select and store only the top 10 local grouping hypotheses for each tangent endpoint. This constraint limits the computation time required for the constructive search.
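The constructive search just described can be summarized in a short Python sketch. This is our simplified reading of the procedure, not the authors' implementation: `local_posterior` stands for the scoring of (3) in Section 4.3, `neighbors` stands for the pairwise candidate generation (endpoints within 20 pixels, top 10 hypotheses per endpoint), and `self_intersects` is left as a placeholder.

```python
def self_intersects(s, tangents):
    # Placeholder: test whether the polyline through the tangents of s crosses itself.
    return False

def constructive_search(tangents, local_posterior, neighbors, M, budget=4000):
    """Greedy, monotonic search for likely closed object contours.

    tangents:        list of tangent elements
    local_posterior: scores a hypothesis (product of local posteriors, eq. (3))
    neighbors:       candidate continuations of a tangent index
    M:               maximum contour length, chosen from the prior model
    budget:          total number of tangents represented at each stage
    """
    closed = []
    # Stage m = 2: admissible tangent pairs, pruned to the most probable.
    hyps = [[i, j] for i in range(len(tangents)) for j in neighbors(i)]
    hyps.sort(key=local_posterior, reverse=True)
    hyps = hyps[: budget // 2]

    for m in range(3, M + 1):
        extended = []
        for s in hyps:
            for j in neighbors(s[-1]):
                cand = s + [j]
                if j == cand[0]:
                    closed.append(cand)          # closure: first and last tangent coincide
                elif j not in s and not self_intersects(cand, tangents):
                    extended.append(cand)        # keep only simple, open extensions
        extended.sort(key=local_posterior, reverse=True)
        hyps = extended[: budget // m]           # fixed memory budget: N_m = N / m
    return closed
```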

4.3 Computing Posterior Probabilities of Contour Hypotheses

We seek to estimate

$$p(s \in C^o \mid D) = \frac{p(D \mid s \in C^o)\, p(s \in C^o)}{p(D)}. \tag{1}$$

Here, $D = D^o \cup D^c$ represents the set of observable cues available, where $D^o$ represents the set of object cues determining the likelihood that tangents lie on the boundary of the target object and $D^c$ represents the grouping cues determining the likelihood that a sequence of tangents should be grouped together. These can, in turn, be broken down into local components: $D^o = \{d_1, \dots, d_N\}$, where each $d_i$ is a set of $l_o$ observations about tangent $t_i$: $d_i = \{d^1_i, \dots, d^{l_o}_i\}$. Similarly, $D^c = \{d_{ij}\},\ i, j \in \{1, \dots, N\},\ i \neq j$, where each $d_{ij}$ is a set of $l_c$ observations about the relationship between tangents $t_i$ and $t_j$: $d_{ij} = \{d^1_{ij}, \dots, d^{l_c}_{ij}\}$.

We assume that local cues $d_i, d_{ij}$ pertaining to a particular tangent or tangent pair are independent of cues pertaining to other tangents or tangent pairs when conditioned upon a contour hypothesis $s \in C^o$. We also assume that local cues are independent of all components of a contour hypothesis s except the tangent or tangent pair to which they directly pertain.

Let $I^o$ represent the set of indices of all tangents in the image: $I^o = \{1, \dots, N\}$. Let $I^o_1 = \{\alpha_1, \dots, \alpha_m\}$ represent the indices of tangents on a particular contour hypothesized to be on the boundary of the object. Let $I^o_2$ represent the indices of tangents not on the contour: $I^o_2 = I^o \setminus I^o_1$. Similarly, let $I^c$ represent the set of index pairs of all tangent pairs in the image: $I^c = \{1, \dots, N\} \times \{1, \dots, N\}$. Let $I^c_1 = \{\{\alpha_1, \alpha_2\}, \dots, \{\alpha_{m-1}, \alpha_m\}\}$ represent the indices of directly successive tangent pairs on the hypothesized contour. Let $I^c_2$ represent all other tangent pairs: $I^c_2 = I^c \setminus I^c_1$.

We observe that the normalizing term $p(D)$ in (1) can be replaced by $\prod_{i \in I^o} p(d_i) \prod_{\{i,j\} \in I^c} p(d_{ij})$ without affecting the computation since both are constant over all hypotheses. Thus, we can expand (1) as:

$$p(s \in C^o \mid D) \propto \frac{\prod_{i \in I^o} p(d_i \mid s \in C^o) \prod_{(i,j) \in I^c} p(d_{ij} \mid s \in C^o)\, p(s \in C^o)}{\prod_{i \in I^o} p(d_i) \prod_{(i,j) \in I^c} p(d_{ij})}.$$

Observing that $d_i,\ i \in I^o_2$ and $d_{ij},\ \{i,j\} \in I^c_2$ are independent of the current hypothesis $s \in C^o$, the likelihood terms for the tangents not on the contour hypothesis cancel and we have:

$$p(s \in C^o \mid D) \propto \frac{\prod_{i \in I^o_1} p(d_i \mid t_i \in T^o) \prod_{(i,j) \in I^c_1} p(d_{ij} \mid \{t_i, t_j\} \in C)\, p(s \in C^o)}{\prod_{i \in I^o_1} p(d_i) \prod_{(i,j) \in I^c_1} p(d_{ij})}. \tag{2}$$

Now, recall that the purpose of computing the posterior probabilities of contour hypotheses in the constructive phase is to decide which contours to prune and which to continue. In making this decision, the contours we compare all consist of the same number of tangents. Since the prior term $p(s \in C^o)$ depends only upon the number of tangents in s,^2 it can be replaced by the term $\prod_{i \in I^o_1} p(t_i \in T^o) \prod_{(i,j) \in I^c_1} p(\{t_i, t_j\} \in C)$, as both are constant over all hypotheses at each stage of the constructive computation. Substituting into (2):

$$p(s \in C^o \mid D) \propto F(s, D) = \prod_{i \in I^o_1} p^o_i \prod_{\{i,j\} \in I^c_1} p^c_{ij}, \tag{3}$$

where

$$p^o_i = p(t_i \in T^o \mid d_i) = \frac{p(d_i \mid t_i \in T^o)\, p(t_i \in T^o)}{p(d_i)} \tag{4}$$

and

$$p^c_{ij} = p(\{t_i, t_j\} \in C \mid d_{ij}) = \frac{p(d_{ij} \mid \{t_i, t_j\} \in C)\, p(\{t_i, t_j\} \in C)}{p(d_{ij})}. \tag{5}$$

In summary, the monotonic nature of the constructive algorithm allows the posterior probability of each contour hypothesis to be represented by the product of local posteriors. We emphasize that, in this derivation, we have not assumed that the prior terms are independent. Indeed, in the evaluation stage of the computation, we will explicitly account for global priors that are not locally independent. It is simply the nature of the constructive algorithm that allows the local posteriors to be treated as independent.

2. Generally in this paper we use the terms "prior models" and "prior knowledge" to refer to task, domain, or application-specific information that may be applied to assist the grouping process. This must be distinguished from the technical use of the word "prior" in this section to refer to the unconditioned probabilities of the tangents or tangent sequences that we are trying to infer. In this technical usage, the "priors" do not involve any properties such as tangent position or orientation that must be measured from the image. Thus, most of the "prior knowledge" in the colloquial sense of the term is actually incorporated in the conditioned and unconditioned probability distributions for the observable cues $d_i$ and $d_{ij}$.
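Since F(s, D) is a product of many probabilities, an implementation would normally accumulate it in log space to avoid underflow. The sketch below is illustrative only: it assumes the local posteriors of (4) and (5) have already been computed (Section 4.4 describes how they follow from likelihood and prior ratios), with p_obj an array indexed by tangent and p_grp a dictionary keyed by ordered tangent pairs.

```python
import numpy as np

def log_F(s, p_obj, p_grp):
    """log F(s, D) for a contour hypothesis s (eq. (3)), accumulated in log space.

    s:      sequence of tangent indices (closed if s[0] == s[-1])
    p_obj:  p_obj[i]      ~ p(t_i in T^o | d_i)        (eq. (4))
    p_grp:  p_grp[(i, j)] ~ p({t_i, t_j} in C | d_ij)  (eq. (5))
    """
    log_f = sum(np.log(p_obj[i]) for i in set(s))                        # object terms
    log_f += sum(np.log(p_grp[(i, j)]) for i, j in zip(s[:-1], s[1:]))   # grouping terms
    return log_f
```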

4.4 Local Prior and Likelihood Ratios

The local posterior probabilities can be rewritten as:

$$p^o_i = \frac{1}{1 + (L^o_i P^o_i)^{-1}}, \qquad p^c_{ij} = \frac{1}{1 + (L^c_{ij} P^c_{ij})^{-1}},$$

where

$$L^o_i = \prod_{k=1}^{l_o} \frac{p(d^k_i \mid t_i \in T^o)}{p(d^k_i \mid t_i \notin T^o)}, \qquad P^o_i = \frac{p(t_i \in T^o)}{p(t_i \notin T^o)}$$

and

$$L^c_{ij} = \prod_{k=1}^{l_c} \frac{p(d^k_{ij} \mid \{t_i, t_j\} \in C)}{p(d^k_{ij} \mid \{t_i, t_j\} \notin C)}, \qquad P^c_{ij} = \frac{p(\{t_i, t_j\} \in C)}{p(\{t_i, t_j\} \notin C)}.$$

Here, the observable data for each tangent $d_i$ and pair of tangents $d_{ij}$ have been broken down into distinct cues, assumed to be mutually independent.^3 For example, object cues may be the expected intensities on either side of the tangent, while the grouping cues are the classical Gestalt cues [1], [2], [38], e.g., proximity, good continuation, similarity. These distributions are estimated from a set of training images in which instances of a specific object or class of objects appear in various contexts. The distributions can then be used to estimate novel instances in novel contexts. In Section 5.2, we detail the particular observable cues we employ in our GIS application and show how we estimate their distributions.

3. See [38] for evidence of mutual independence for our encoding of the Gestalt grouping cues.

4.5 Evaluation Stage: Selecting the Most Probable Complete Boundary

The constructive stage of the computation yields a set $\hat{C}^o = \hat{C}^o_3 \cup \hat{C}^o_4 \cup \dots \cup \hat{C}^o_M$ of closed, nonintersecting contours that are relatively likely to lie on the boundary of the object of interest. From this set, we wish to select the contour most likely to form the complete boundary. Three factors will determine the contour selected: 1) a foreground factor, reflecting the probability of the contour computed in the constructive phase, 2) a background factor, reflecting the probability that all other tangents lie in the background, and 3) a prior factor, that accounts for the number of tangents on the hypothesis. It is the background factor that embodies the completeness constraint: if the hypothesized contour does indeed represent the entire object boundary, it must be the case that all tangents that are not part of the contour do not lie on the object boundary. This constraint turns out to be the crucial factor in preventing a bias to short contours inherent in the foreground term. Specifically, the maximum a posteriori estimate $\hat{c}^o$ of the object boundary $c^o$ is given by $\hat{c}^o = \arg\max_{s \in \hat{C}^o} p(s = c^o \mid D)$, where

$$p(s = c^o \mid D) = \frac{p(D \mid s = c^o)\, p(s = c^o)}{p(D)}. \tag{6}$$

Again, we observe that we can replace the normalizing term $p(D)$ by $\prod_{i \in I^o} p(d_i) \prod_{\{i,j\} \in I^c} p(d_{ij})$, without affecting the computation since both are constant over all hypotheses. Then (6) can be written as a product of three factors:

$$p(s = c^o \mid D) = F^*(s, D)\, B^*(s, D)\, P^*(s),$$

where

$$F^*(s, D) = \prod_{i \in I^o_1} \frac{p(d_i \mid s = c^o)}{p(d_i)} \prod_{(i,j) \in I^c_1} \frac{p(d_{ij} \mid s = c^o)}{p(d_{ij})},$$

$$B^*(s, D) = \prod_{i \in I^o_2} \frac{p(d_i \mid s = c^o)}{p(d_i)} \prod_{(i,j) \in I^c_2} \frac{p(d_{ij} \mid s = c^o)}{p(d_{ij})},$$

and

$$P^*(s) = p(s = c^o).$$

The foreground term $F^*(s, D)$ is derived from the posterior $F(s, D)$ computed in the constructive phase. Since $I^o_1$ consists only of tangents on the contour hypothesis, and the current hypothesis is assumed to be the true object boundary ($s = c^o$), we have that

$$\prod_{i \in I^o_1} \frac{p(d_i \mid s = c^o)}{p(d_i)} = \prod_{i \in I^o_1} \frac{p(d_i \mid t_i \in T^o)}{p(d_i)}.$$

Similarly, since $I^c_1$ consists only of successive tangent pairs on the contour hypothesis and the current hypothesis is assumed to be the true object boundary ($s = c^o$), we have that

$$\prod_{(i,j) \in I^c_1} \frac{p(d_{ij} \mid s = c^o)}{p(d_{ij})} = \prod_{(i,j) \in I^c_1} \frac{p(d_{ij} \mid \{t_i, t_j\} \in C)}{p(d_{ij})}.$$

Thus, for a contour hypothesis of length m, we have

$$F^*(s, D) = (p^o p^c)^{-m} F(s, D),$$

where $p^o = p(t_i \in T^o)$ and $p^c = p(\{t_i, t_j\} \in C)$.

To compute the background term $B^*(s, D)$, we continue to assume that the contour hypothesis only influences grouping cues relating successive tangents on the hypothesized contour. Thus, the second product over local grouping likelihoods vanishes and $B^*(s, D)$ simplifies to

$$B^*(s, D) = \prod_{i \in I^o_2} \frac{p(d_i \mid t_i \notin T^o)}{p(d_i)}.$$

However, we can no longer make the same assumption for the object cues: given a complete contour hypothesis, all tangents not on the hypothesis must be explained by the background model. Thus, we have:

$$B^*(s, D) = (1 - p^o)^{m - N} \prod_{i \in I^o_2} (1 - p^o_i) = (1 - p^o)^{m - N} \frac{\prod_{i \in I^o} (1 - p^o_i)}{\prod_{i \in I^o_1} (1 - p^o_i)}.$$

Finally, we must consider the prior term $P^*(s)$ which embodies both our prior knowledge about the number of tangents m likely to be on the object boundary and the number of possible contours of length m that could be formed from the N tangents in the image. Our prior $p(m^o)$ on the number of tangents on the object boundary will be derived from training data, modeled as a normal distribution, and the number of possible closed contours of length m in the image is given by $N!\,(2m(N - m)!)^{-1}$.^4 Thus,

$$P^*(s) = p(m^o = m)\, \frac{2m\,(N - m)!}{N!}.$$

In practice, we find that the prior term has very little effect on the computation: It is the foreground and background terms that together determine the contour selected to represent the object boundary.

4. The factor of 2m arises because each closed contour can be represented 2m ways, starting at any one of the m tangents and traversing in either clockwise or counterclockwise directions.
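Reading the three factors into log space gives a compact scoring rule for each candidate closed contour. The sketch below is our own illustrative rendering, not the authors' code; it reuses log_F from the earlier sketch, assumes p_obj is an array of local object posteriors over all N tangents, and uses a log-gamma identity for the factorial terms.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import norm

def log_posterior_closed(s, p_obj, p_grp, p_o, p_c, m_hat, sigma_m, N):
    """log p(s = c^o | D) up to a constant: log F* + log B* + log P*."""
    m = len(s) - 1 if s[0] == s[-1] else len(s)   # number of distinct tangents on s
    on = set(s)

    # Foreground factor: F* = (p_o * p_c)^(-m) * F(s, D).
    log_F_star = -m * (np.log(p_o) + np.log(p_c)) + log_F(s, p_obj, p_grp)

    # Background factor: B* = (1 - p_o)^(m - N) * prod over off-contour tangents of (1 - p_obj[i]).
    off = [i for i in range(N) if i not in on]
    log_B_star = (m - N) * np.log(1.0 - p_o) + np.sum(np.log1p(-np.asarray(p_obj)[off]))

    # Prior factor: P* = p(m^o = m) * 2m (N - m)! / N!, with log k! = gammaln(k + 1).
    log_P_star = (norm.logpdf(m, loc=m_hat, scale=sigma_m)
                  + np.log(2.0 * m) + gammaln(N - m + 1) - gammaln(N + 1))

    return log_F_star + log_B_star + log_P_star
```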

5 GIS APPLICATION

We first test our approach on the problem of computing exact lake boundaries from panchromatic satellite (IKONOS) imagery. We have approximate polygonal models of the boundaries of these lakes from an existing digital database. A prior registration stage is required to register these polygonal models to the IKONOS data. This registration was based on the observation that lakes encompass relatively few contours (save for those generated by islands, marshlands, etc.). Registration thus involves a search over the set of 2D translations and rotations to minimize the number of edge pixels within the polygon. Fig. 1a shows an example model before (blue) and after (red) registration.
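A minimal sketch of this registration criterion follows, assuming a binary edge map and using matplotlib's point-in-polygon test as one possible rasterization choice. The search grid and angular range are illustrative values of our own, not those used in the paper.

```python
import numpy as np
from matplotlib.path import Path

def edges_inside(polygon_xy, edge_map):
    """Count edge pixels falling inside the polygon."""
    ys, xs = np.nonzero(edge_map)
    pts = np.column_stack([xs, ys]).astype(float)
    return int(Path(polygon_xy).contains_points(pts).sum())

def register(polygon_xy, edge_map, shifts=range(-20, 21, 2),
             angles_deg=np.arange(-5.0, 5.5, 0.5)):
    """Exhaustive search over 2D translations and rotations of the prior polygon,
    minimizing the number of edge pixels enclosed (lake interiors contain few edges)."""
    poly = np.asarray(polygon_xy, dtype=float)
    centroid = poly.mean(axis=0)
    best_score, best_poly = np.inf, poly
    for a in np.deg2rad(angles_deg):
        R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
        rotated = (poly - centroid) @ R.T + centroid
        for dx in shifts:
            for dy in shifts:
                cand = rotated + np.array([dx, dy], dtype=float)
                score = edges_inside(cand, edge_map)
                if score < best_score:
                    best_score, best_poly = score, cand
    return best_poly
```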

5.1 Estimating the Priors

We derived statistical models for the prior terms and likelihood distributions using data from seven lakes for which the bounding contours were hand-traced by a mapping expert on the edge map. The prior ratios $P^o_i$ and $P^c_{ij}$ for object membership and grouping are determined solely from the number of tangents in the image and the length of the prior model.

The prior ratio for object membership $P^o_i$ is simply the ratio of the number of tangents on the object boundary to the number off the object boundary. Of course we do not know the exact number of tangents on the boundary, but we can use the prior model to estimate this. From our training data, we find that the relationship between the length of the polygonal model and the number of tangents on the human-traced object boundary for our training data is nearly linear, with a Pearson correlation of 0.97. Thus, we use the training data to estimate the mean $\mu$ and standard deviation $\sigma$ of the ratio of the number of tangents on the lake boundary to the length of the polygonal model: $\mu = 0.28$ tangents/pixel, $\sigma = 0.06$ tangents/pixel. Given a novel image, we estimate the expected number of tangents $\hat{m}^o$ on the lake boundary to be $\hat{m}^o = \mu L$, and the standard deviation to be $\sigma_{m^o} = \sigma L$, where L is the length of the prior model (in pixels). We then have that $P^o_i = \hat{m}^o / (N - \hat{m}^o)$, where N is the total number of tangents in the image. $\hat{m}^o$ and $\sigma_{m^o}$ define the prior on contour length used in the evaluation stage (Section 4.5).

We estimate the prior ratio $P^c_{ij}$ for grouping by assuming that, on average, each tangent groups with one other tangent in each direction. Note that this is simply an estimate of the expected case and does not imply that every tangent in the image will group with exactly one tangent in both directions. This assumption implies that $P^c_{ij} = 1/(N - 2)$, where N is the total number of tangents in the image.
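These prior ratios and the length prior follow directly from the registered polygon length and the training statistics quoted above. A brief sketch, with variable names of our own choosing:

```python
def length_priors(L_model_pixels, N_tangents, mu=0.28, sigma=0.06):
    """Priors derived from the length L of the registered prior polygon (pixels).

    Returns the object-membership prior ratio P^o_i, the grouping prior ratio
    P^c_ij, and the (mean, std) of the Gaussian prior on boundary length m^o.
    """
    m_hat = mu * L_model_pixels            # expected number of boundary tangents
    sigma_m = sigma * L_model_pixels       # its standard deviation
    P_o = m_hat / (N_tangents - m_hat)     # P^o_i: on-boundary vs. off-boundary tangents
    P_c = 1.0 / (N_tangents - 2)           # P^c_ij: one grouping partner per direction, on average
    return P_o, P_c, m_hat, sigma_m
```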

5.2 Estimating Likelihoods and Posteriors

It is common to use Gaussian, log Gaussian, or exponential models for probability density functions (PDFs) in the computational modeling of perceptual processes. However, we have found that statistical distributions of grouping cues are typically kurtotic [38]. To model these distributions, we employed a generalized Laplacian distribution that has been used in the past to model kurtotic wavelet response histograms [55], [56], [57]:

$$p(x) = A e^{-(c|x - \mu| / \sigma)^{\gamma}},$$

where

$$c = \sqrt{\frac{\Gamma(3/\gamma)}{\Gamma(1/\gamma)}}$$

and

$$A = \frac{c\gamma}{2\sigma\Gamma(1/\gamma)} \quad \text{if } x \text{ is defined on } (-\infty, \infty), \qquad A = \frac{c\gamma}{\sigma\Gamma(1/\gamma)} \quad \text{if } x \text{ is defined on } [0, \infty).$$

This distribution is symmetric and unimodal. $\mu$ is the mean of the distribution, $\sigma$ is the standard deviation, and $\gamma \in (0, \infty)$ controls the kurtosis. If $\gamma = 2$, the distribution is Gaussian. If $\gamma < 2$, the distribution has positive kurtosis. If $\gamma > 2$, the distribution has negative kurtosis, approaching a uniform distribution as $\gamma \to \infty$. Given a target kurtosis calculated from the data, the required $\gamma$ is found using standard nonlinear optimization techniques.

Each of the object and grouping cues is assumed to be independent^5 so that the joint likelihood for the cue vector for a given tangent or pair of tangents is simply the product of the individual likelihoods. In this application, as object cues $d^k_i$ for tangent $t_i$, we use:

1. The intensity on the dark side of $t_i$: Lakes tend to be dark in panchromatic IKONOS imagery.
2. The distance between $t_i$ and the nearest tangent in the direction opposite to the local intensity gradient (i.e., toward the interior of the lake if $t_i$ is on the lake boundary): Lakes tend to have few interior contours.
3. The distance between $t_i$ and the nearest point on the model.
4. The angle of $t_i$ with respect to the nearest model segment.

Likelihood distributions for tangents on and off the lake boundaries and the resulting posterior distributions for all four object cues are shown in Fig. 2. The difference between the distributions for tangents on and off the boundary attests to the useful information in each of these cues. Fig. 1c shows a map of the posterior probability of object membership $p(t_i \in T^o \mid d^o_i)$ for the four object cues combined. The intensity of the tangents is inversely proportional to the probability that they lie on the lake boundary. It is clear that prior knowledge of the object of interest focuses the problem considerably.

As grouping cues $d_{ij}$ between tangents $t_i$ and $t_j$, we use:

1. The distance (tip to tail) between the tangents (proximity cue).
2. A measure of good continuation [1], [2], [18] between the two tangents, approximated as the absolute value of the sum of the angles formed by a linear interpolant.
3. The change in the mean intensity (brightness) between the two tangents.
4. The change in edge contrast between the two tangents.

Likelihood distributions for grouped and nongrouped tangent pairs and the resulting posterior distributions for all four grouping cues are shown in Fig. 2. Again, the degree to which the distributions for grouped and nongrouped tangents differ indicates the strength of the cue.

5. For evidence of the independence of the classical Gestalt grouping cues, see [38].

Fig. 2. Likelihood and posterior distributions for the four object cues and four grouping cues. Empirical data shown in blue, models shown in red.
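The generalized Laplacian above is easy to evaluate once the shape parameter γ has been matched to the sample kurtosis; the root finding for γ is omitted here. A sketch using SciPy's gamma function, written by us as an illustration of the stated normalization:

```python
import numpy as np
from scipy.special import gamma as Gamma

def gen_laplacian_pdf(x, mu, sigma, g, half_line=False):
    """Generalized Laplacian density with mean mu, standard deviation sigma, shape g (gamma).

    g = 2 recovers a Gaussian; g < 2 gives positive kurtosis (heavier tails).
    half_line=True applies the normalization for x defined on [0, inf).
    """
    c = np.sqrt(Gamma(3.0 / g) / Gamma(1.0 / g))
    A = c * g / (2.0 * sigma * Gamma(1.0 / g))
    if half_line:
        A *= 2.0
    return A * np.exp(-(c * np.abs(np.asarray(x, dtype=float) - mu) / sigma) ** g)
```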


Fig. 3. (a) Inferential power of the object cues. (b) Inferential power of the grouping cues. (c) Number of closed contours computed as a function of contour length (in tangents) for a typical lake in the test database. (d) Quantitative results of grouping algorithm and active contour approach. Strong prior uses all object cues, including the GIS polygon. Weak prior uses only the priors that do not depend on the GIS polygon, as well as an approximate estimate of the length of the boundary.

5.3 Selecting the Maximum Contour Length

The approximate polygonal model can be used to select the maximum contour length M considered in the search for the object boundary. In Section 5.1, we noted the linear relationship between the number $\hat{m}^o$ of tangents on the lake and the length L of the prior model: $\hat{m}^o = \mu L$, $\mu = 0.28$, $\sigma = 0.06$ tangents/pixel. Given a new image and new prior model of length L pixels, we estimate the 95th percentile upper bound M on the number of tangents on the boundary as $M = L(\mu + 2\sigma)$.

An interesting pattern is observed in the number of closed contours generated at each contour length in our constructive algorithm: Fig. 3c shows an example. A large number of closed contours less than 10 tangents in length are generated; this is presumably due to the large number of small contours represented and the numerous complete boundaries of trees located near the lakeshore. A smaller number of closed contours (including the maximum probability contour) between 280 and 300 tangents in length are also found. No closed contours between 10 and 280 tangents in length are computed. In a typical case, we find between 1,000 and 2,000 very short closed contours and on the order of 100 closed contours of the scale of the object. The preponderance of relatively short closed contours underlines the importance of the completeness criterion (Section 4.5) in avoiding a bias toward short contours.

5.4 Inferential Power of Object and Grouping Cues

We selected cues based upon our intuitions about which observable properties would be most powerful in discriminating object tangents from nonobject tangents and tangents that should be grouped from those that should not. One advantage of our probabilistic framework and the ground-truth database is that they allow us to actually quantify the inferential power of each of these cues. We describe our method for measuring statistical power for the grouping cues: the object cues are treated in similar fashion.

To measure the statistical power of each cue for the grouping of two tangents $\{t_i, t_j\}$, we define a random variable $G_{ij} \in \{G_1, G_2\}$ to represent the actual grouping relationship between tangents $t_i$ and $t_j$, where $G_1$ represents the event $\{t_i, t_j\} \in C$ and $G_2$ represents the event $\{t_i, t_j\} \notin C$. Prior to making any observations about the relationship between the two tangents, there is an inherent uncertainty in the decision variable $G_{ij}$ that can be formally characterized by its informational entropy $H(G_{ij})$ [58]:

$$H(G_{ij}) = -p(G_1) \log_2 p(G_1) - p(G_2) \log_2 p(G_2).$$

If it were the case that two arbitrary tangents were as likely to be grouped as not, the prior entropy of the grouping decision would be exactly one bit. However, since it is far more likely that two arbitrary tangents are not grouped, the prior entropy is much lower. If we assume that, on average, a tangent groups with one other tangent in each direction,^6 then $p(G_1) \approx 1/n$ and $p(G_2) \approx 1 - 1/n$, where n is the number of tangents in the image. For our sample of images, in which the mean number of tangents per image is roughly 5,000, the prior entropy $H(G_{ij})$ of the grouping decision is roughly 0.0027 bits.

Grouping cues provide knowledge about the relationship between tangents that can reduce the uncertainty in the grouping decision. The size of this reduction in uncertainty can be characterized by the mutual information $I(G_{ij}; d_{ij})$ between the grouping decision $G_{ij}$ and the cue $d_{ij}$:

$$I(G_{ij}; d_{ij}) = \int p(G_1, d_{ij}) \log_2 \frac{p(G_1, d_{ij})}{p(G_1)\, p(d_{ij})}\, dd_{ij} + \int p(G_2, d_{ij}) \log_2 \frac{p(G_2, d_{ij})}{p(G_2)\, p(d_{ij})}\, dd_{ij}.$$

Given observation of the grouping cue $d_{ij}$, the remaining uncertainty in the grouping decision $G_{ij}$ is given by

$$H(G_{ij} \mid d_{ij}) = H(G_{ij}) - I(G_{ij}; d_{ij}).$$

Thus, the mutual information measures the information, in bits, that the cue $d_{ij}$ provides about the decision $G_{ij}$ of whether to group tangents $t_i$ and $t_j$. Normalizing the mutual information $I(G_{ij}; d_{ij})$ by the prior uncertainty $H(G_{ij})$ yields the percentage reduction in informational uncertainty provided by the cue.

Fig. 3a shows the information provided by each object cue in deciding whether tangents are on or off the lake boundary. Interestingly, the two most important cues are appearance cues (1 and 2): the intensity and distance to the nearest neighboring tangent on the dark side of a tangent. Of less, but still significant importance, are the position cues (3 and 4): the distance to the prior model and the relative orientation of the tangent with respect to the nearest model segment. Fig. 3b shows the information provided by each grouping cue in deciding whether two tangents should be grouped. Proximity is by far the most powerful cue. Good continuation and brightness similarity cues are of less value and the contrast cue is of almost no value.

6. This is a statement of the expected case and does not preclude contour terminations, where a tangent does not group at all in one direction, nor bifurcations, where a tangent groups with more than one other tangent in one direction.
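The entropy and mutual information measurements can be reproduced from binned cue histograms, with the integrals above becoming sums over bins. An illustrative sketch (not the authors' analysis code):

```python
import numpy as np

def cue_information(p_G1, hist_given_G1, hist_given_G2):
    """Prior entropy H(G), mutual information I(G; d), and percent reduction.

    p_G1:           prior probability that a tangent pair is grouped (about 1/n)
    hist_given_G*:  binned, normalized histograms p(d | G1) and p(d | G2)
    """
    p_G2 = 1.0 - p_G1
    H = -(p_G1 * np.log2(p_G1) + p_G2 * np.log2(p_G2))

    p_d = p_G1 * hist_given_G1 + p_G2 * hist_given_G2        # marginal p(d)
    joint1, joint2 = p_G1 * hist_given_G1, p_G2 * hist_given_G2
    with np.errstate(divide="ignore", invalid="ignore"):
        I = np.nansum(joint1 * np.log2(joint1 / (p_G1 * p_d)))   # empty bins contribute 0
        I += np.nansum(joint2 * np.log2(joint2 / (p_G2 * p_d)))
    return H, I, 100.0 * I / H
```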

6 PERFORMANCE EVALUATION

We tested the algorithm on five novel lakes. These images are relatively complex: The number of tangents in an image averaged 19,000, ranging from roughly 3,000 to 42,000. The number of tangents on the lake boundary averaged 490, ranging from roughly 100 to 1,000. Computation time is roughly linear with the number of tangents in the image: The algorithm takes an average of 15 minutes on a 1 GHz Pentium to process each of these relatively complex test images.

Fig. 1d shows an example result of our algorithm (red), together with a contour traced by a human observer (blue). Note that, for most of the contour, the human and machine observers agree almost exactly, but there are key ambiguous regions where the contours diverge significantly. Results for three of the test lakes are shown in Fig. 4 (labeled "Strong Prior").


6.1 Ground-Truth Database

An ideal ground-truth database would be obtained by a field survey of the actual lake perimeters. To obtain such data at the resolution of our satellite imagery (1 m) is impractical. Typically, such high-resolution contours are mapped by hand from remote-sensed imagery. Thus, for this study, we constructed a ground-truth database from hand segmentations of lake boundaries on the image data by eight human mapping experts.^7 Each expert was instructed to trace the perimeter of each of the test lakes using the software tool of their choice.

We found that human experts varied considerably in their segmentations, and that these variations were not uniformly distributed, but concentrated at ambiguous areas, such as marshland. Given this nonuniform variability in human consistency, we used the following method for evaluating machine performance: For each lake, we computed the deviation between each point placed by the observer and the nearest point on the contour computed by the algorithm. We then normalized this deviation by the mean deviation between the point and the contours traced by all other human observers. The median of these normalized deviations over all points and all observers was then computed for each lake. The resulting median relative error is less than 1 if the algorithm is more consistent with the human experts than they are with each other and greater than 1 if the algorithm is less consistent.

Fig. 3d shows the quantitative results of the algorithm (labeled Grouping (Strong Prior)) for all five test lakes. We found a median relative error of less than 1 for all lakes, indicating that the algorithm is more consistent with the human experts than they are with each other.

7. Note that these test lakes were traced in the image domain. In contrast, the lake contours used to estimate the object and grouping statistics were traced by a single expert (one of the authors) in the contour domain, using an in-house software tracing tool. This gave us direct statistics on the estimated tangents of the contours.

6.2 Contour Grouping with Weak Priors

It is of interest to know how the algorithm would fare when given far weaker prior knowledge. In particular, we are interested in the case when the two cues based on the distance and angle between each tangent and the GIS model (cues 3 and 4) are not available, so that the system has no prior knowledge of the shape, position, or pose of the object. To test this scenario, we turn off these cues, but continue to use the first two cues (the intensity on the dark side of the tangents and the distance to the nearest tangent on that side). Note that these two cues are general to all lakes and do not depend upon any prior knowledge of the specific lake to be grouped.

We also augment these two general cues with a new third cue that embodies weak prior knowledge about the size of the object, obtained by estimating the length of the bounding contour from the length of the GIS polygon. This gives the length of the contour to within roughly 20 percent accuracy but provides no information about location or shape. This knowledge of the approximate size of the object allows the distance between each pair of tangents on the bounding contour to be very roughly predicted and when incorporated into the contour probability (3), has the effect of pruning paths that wander off with no sign of returning.
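The evaluation metric can be stated compactly in code. The sketch below, written by us as an illustration, represents each trace as an array of (x, y) points and uses nearest-point distances; it follows the description above but is not the authors' evaluation script.

```python
import numpy as np

def nearest_dists(points, contour):
    """For each point, distance to the nearest point on `contour` (both (K, 2) arrays)."""
    d = np.linalg.norm(points[:, None, :] - contour[None, :, :], axis=2)
    return d.min(axis=1)

def median_relative_error(algo_contour, expert_traces):
    """Median, over all expert points, of (distance to the algorithm contour) divided by
    (mean distance to the other experts' traces). Values below 1 mean the algorithm agrees
    with the experts better than they agree with each other."""
    ratios = []
    for k, trace in enumerate(expert_traces):
        d_algo = nearest_dists(trace, algo_contour)
        others = [nearest_dists(trace, t) for j, t in enumerate(expert_traces) if j != k]
        d_humans = np.mean(others, axis=0)
        ratios.append(d_algo / d_humans)
    return float(np.median(np.concatenate(ratios)))
```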


Fig. 4. Results on three test images.

Fig. 4 shows results of the Weak Prior formulation of the algorithm for three of the test lakes. The quantitative results are shown in Fig. 3d. While the median error is greater than when a strong prior is used, the algorithm still performs about as well as the average human expert.

6.3 A Competing Approach: Active Contours

Active contour models are widely used for approximating contours in computer vision [46], [59] and remote sensing applications [60], [61]. One advantage of the active contour approach is that one can initialize the model as a closed contour and thus maintain the constraint of closure throughout the computation. In our constructive approach, on the other hand, we begin only with a large number of open contour fragments and then search for promising closed contours.

We tested a public-domain implementation of active contours,^8 using the prior GIS model as an initial contour. The algorithm is governed by 14 parameters, such as resistance to tension and bending, attraction to image features, feature threshold, image smoothing, etc.; these parameters were tuned by trial-and-error to optimize performance. Fig. 4 shows results of this active contour algorithm for three of the test lakes. Quantitative performance is shown in Fig. 3d. We find that, on average, the active contour performance is about 68 percent worse than the strong-prior formulation of our grouping algorithm and about 29 percent worse than the weak-prior formulation. This is in spite of the fact that the GIS polygon is given to the active contour implementation as an initial condition, but is not given to the weak-prior version of our grouping algorithm.

It is possible that a region-based approach [49] might perform better on this application, although, due to the presence of islands and marshes within the lakes, the homogeneity condition required by such an approach is not exactly satisfied. However, such an approach would still require good initialization (strong position and shape priors). Also, our approach has the potential to generalize to the demarcation of much less homogeneous objects.

8. Available from http://www.s2.chalmers.se/ghassan/phd/illus/snakes/index.htm. We selected this implementation on the basis of its availability and completeness. Other active contour algorithms may produce different results.

Fig. 5. Application to skin boundary detection. The two most probable complete closed contours bounding skin regions are shown for each image.

7 GENERALIZING TO OTHER APPLICATIONS

One of the main goals of this work is to determine how prior knowledge about the grouping target can be integrated with more general grouping cues within a rigorous probabilistic framework. While the priors may change over time, depending upon the goals of the perceiver, the probabilistic machinery and algorithm remain the same. To illustrate this, we demonstrate the approach on a completely different application: that of finding the boundaries of major skin regions in natural images. Due to space limitations, we describe this application in less detail: we wish only to illustrate the generality of the approach.

The test and training images are a selection from the Corel Photo Library. As object cues, we use only the RGB color on either side of the tangents [62]. The likelihood functions for skin and nonskin color channels are modeled by fourth order Gaussian mixture models, trained on a set of five images from the database. We derive the statistics for proximity and good continuation by manually tracing the skin region boundaries on the set of five training images. Note that this is weak prior knowledge: The algorithm is given no information on the location, shape, pose, or size of the objects to be grouped.
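A sketch of the color object cue described above, using scikit-learn's Gaussian mixture model as a stand-in for whichever mixture implementation was actually used: four-component mixtures are fit to RGB samples from skin and nonskin regions of the training images, and the log-likelihood ratio then serves as the object cue for the colors flanking a tangent.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_skin_color_models(skin_rgb, nonskin_rgb, n_components=4, seed=0):
    """Fit fourth-order (4-component) Gaussian mixtures to skin and nonskin RGB samples."""
    skin = GaussianMixture(n_components=n_components, random_state=seed).fit(skin_rgb)
    nonskin = GaussianMixture(n_components=n_components, random_state=seed).fit(nonskin_rgb)
    return skin, nonskin

def color_log_likelihood_ratio(rgb, skin, nonskin):
    """log p(color | skin) - log p(color | nonskin) for an (M, 3) array of RGB values,
    e.g., the colors sampled on either side of a tangent."""
    rgb = np.asarray(rgb, dtype=float)
    return skin.score_samples(rgb) - nonskin.score_samples(rgb)
```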


To compute multiple contours, after each maximum a posteriori contour is computed, the probabilities of the associated tangents are set to zero and the algorithm is iterated. Fig. 5 shows the top two bounding contours computed for the example images. Complete contours bounding the two skin regions in each image are computed, despite the heterogeneities within these regions. Skin-colored background regions are ignored (e.g., the wood surfaces in the background of Fig. 5b). This was made possible by combining the color object cue with the grouping cues that exploit the geometric regularity of the boundaries. An alternative to using separate statistical models for separate types of imagery (e.g., satellite data, natural imagery) is to develop adaptive “bootstrapping” models, capable of estimating grouping statistics online. We hope to explore this idea in future work.
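A schematic sketch of this "extract and suppress" loop is given below. Here, compute_map_contour is a hypothetical stand-in for the constructive search and maximum a posteriori evaluation described earlier, and tangents is assumed to map tangent identifiers to probabilities.

```python
def extract_top_contours(tangents, compute_map_contour, k=2):
    """Greedy extraction of the k most probable closed contours.
    After each maximum a posteriori contour is found, the probabilities of its
    tangents are set to zero so that the next iteration must account for a
    different region."""
    contours = []
    probs = dict(tangents)               # local copy; the caller's data is untouched
    for _ in range(k):
        contour, posterior = compute_map_contour(probs)
        if contour is None:              # no further closed contour found
            break
        contours.append((contour, posterior))
        for t in contour:                # suppress tangents already explained
            probs[t] = 0.0
    return contours
```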



8 SUMMARY

We have introduced the problem of grouping complete bounding contours using strong or weak prior knowledge of the objects of interest and have shown how it can be formulated as a problem of Bayesian inference. We developed an approximate constructive search technique that computes candidate object boundaries. This approach allows important global constraints, such as simplicity, completeness, and nontrivial priors, to be applied. Candidate boundaries are evaluated rigorously using foreground and background data as well as prior probabilistic knowledge of object size.

We demonstrated the effectiveness of the method on the problem of computing exact lake boundaries from high-resolution satellite imagery and found that the algorithm outperformed both human mapping experts and a competing active contour model. We also demonstrated the generality of the approach by applying it to the problem of skin segmentation: the algorithm combines object and grouping cues to segment complete regions of human skin in natural images, despite heterogeneities within these regions. The approach is essentially parameter-free: all distributions are based upon the measured statistics of natural images.


ACKNOWLEDGMENTS
This work was supported by the Natural Sciences and Engineering Research Council of Canada, the Canadian GEOIDE Network of Centres of Excellence, and the Ontario Centre for Research in Earth and Space Technology.


REFERENCES


[1] M. Wertheimer, “Laws of Organization in Perceptual Forms,” A Sourcebook of Gestalt Psychology, W.D. Ellis, ed., pp. 71-88, London: Routledge and Kegan Paul, 1938.
[2] K. Koffka, Principles of Gestalt Psychology. New York: Harcourt, Brace and World, 1935.
[3] G. Kanizsa, Organization in Vision. New York: Praeger, 1979.
[4] S.W. Zucker, K.A. Stevens, and P. Sander, “The Relation between Proximity and Brightness Similarity in Dot Patterns,” Perception and Psychophysics, vol. 34, no. 6, pp. 513-522, 1983.
[5] I. Rock, The Logic of Perception. Cambridge, Mass.: MIT Press, 1983.
[6] P. Cavanagh, “What’s Up in Top-Down Processing?” Representations of Vision: Trends and Tacit Assumptions in Vision Research, A. Gorea, ed., pp. 295-304, Cambridge, U.K.: Cambridge Univ. Press, 1991.


[7] P. Cavanagh, “Top-Down Processing in Vision,” MIT Encyclopedia of Cognitive Science, R.A. Wilson and F.C. Keil, eds., pp. 844-845, Cambridge, Mass.: MIT Press, 1999.
[8] K.S. Rockland, K.S. Salem, and K. Tanaka, “Divergent Feedback Connections from Areas V4 and TEO in the Macaque,” Visual Neuroscience, 1994.
[9] A. Amir and M. Lindenbaum, “Quantitative Analysis of Grouping Processes,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 2, pp. 168-185, Feb. 1998.
[10] D.G. Lowe, “Organization of Smooth Image Curves at Multiple Scales,” Int’l J. Computer Vision, vol. 3, pp. 119-130, 1989.
[11] E. Saund, “Labelling of Curvilinear Structure Across Scales by Token Grouping,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 817-830, 1992.
[12] D.P. Huttenlocher and P.C. Wayner, “Finding Convex Edge Groupings in an Image,” Int’l J. Computer Vision, vol. 8, no. 1, pp. 7-27, 1992.
[13] D.W. Jacobs, “Robust and Efficient Detection of Convex Groups,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 770-771, 1993.
[14] I.J. Cox, J.M. Rehg, and S. Hingorani, “A Bayesian Multiple-Hypothesis Approach to Edge Grouping and Contour Segmentation,” Int’l J. Computer Vision, vol. 11, no. 1, pp. 5-24, 1993.
[15] D.W. Jacobs, “Robust and Efficient Detection of Salient Convex Groups,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 1, pp. 23-37, Jan. 1996.
[16] D. Crevier, “A Probabilistic Method for Extracting Chains of Collinear Segments,” Computer Vision and Image Understanding, vol. 76, no. 1, pp. 36-53, Oct. 1999.
[17] D. Crevier, “Bayesian Extraction of Collinear Segment Chains from Digital Images,” Recent Advances in Perceptual Organization for Artificial Vision Systems, K.L. Boyer and S. Sarkar, eds., pp. 163-171, Kluwer, 2001.
[18] J.H. Elder and S.W. Zucker, “Computing Contour Closure,” Proc. Fourth European Conf. Computer Vision, pp. 399-412, 1996.
[19] S. Mahamud, K.K. Thornber, and L.R. Williams, “Segmentation of Salient Closed Contours from Real Images,” Proc. IEEE Int’l Conf. Computer Vision, pp. 891-897, 1999.
[20] U. Montanari, “On the Optimal Detection of Curves in Noisy Pictures,” Comm. ACM, vol. 15, no. 5, pp. 335-345, May 1972.
[21] A. Martelli, “An Application of Heuristic Search Methods to Edge and Contour Detection,” Comm. ACM, vol. 19, pp. 73-83, 1976.
[22] P. Parent and S.W. Zucker, “Trace Inference, Curvature Consistency, and Curve Detection,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, pp. 823-839, 1989.
[23] G. Dudek and J.K. Tsotsos, “Recognizing Planar Curves Using Curvature-Tuned Smoothing,” Proc. 10th Int’l Conf. Pattern Recognition, 1990.
[24] A. Sha’ashua and S. Ullman, “Structural Saliency: The Detection of Globally Salient Structures Using a Locally Connected Network,” Proc. Second Int’l Conf. Computer Vision, pp. 321-327, 1988.
[25] T.D. Alter, “The Role of Saliency and Error Propagation in Visual Object Recognition,” PhD thesis, Massachusetts Inst. of Technology, May 1995.
[26] T. Leung and J. Malik, “Contour Continuity in Region Based Image Segmentation,” Proc. Fifth European Conf. Computer Vision, June 1998.
[27] V. Murino, C.S. Regazzoni, and G.L. Foresti, “Grouping as a Searching Process for Minimum-Energy Configurations of Labelled Random Fields,” Computer Vision and Image Understanding, vol. 64, no. 1, pp. 157-174, 1996.
[28] R.L. Castano and S. Hutchinson, “A Probabilistic Approach to Perceptual Grouping,” Computer Vision and Image Understanding, vol. 64, no. 3, pp. 399-419, Nov. 1996.
[29] L.R. Williams and D.W. Jacobs, “Stochastic Completion Fields: A Neural Model of Illusory Contour Shape and Salience,” Proc. Fifth Int’l Conf. Computer Vision, pp. 408-415, 1995.
[30] L.R. Williams and D.W. Jacobs, “Stochastic Completion Fields: A Neural Model of Illusory Contour Shape and Salience,” Neural Computation, vol. 9, no. 4, pp. 837-858, 1997.
[31] M. Lindenbaum and A. Berengolts, “A Probabilistic Interpretation of the Saliency Network,” Proc. European Conf. Computer Vision, vol. II, pp. 257-272, 2000.
[32] J.H. Elder and R.M. Goldberg, “Image Editing in the Contour Domain,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 3, pp. 291-296, Mar. 2001.


[33] T.L. Liu and D. Geiger, “Visual Deconstruction: Recognizing Articulated Objects,” Proc. Energy Minimization Methods in Computer Vision and Pattern Recognition, vol. 1223, pp. 295-309, 1997.
[34] E. Brunswik and J. Kamiya, “Ecological Cue-Validity of ‘Proximity’ and of Other Gestalt Factors,” Am. J. Psychology, vol. LXVI, pp. 20-32, 1953.
[35] N. Kruger, “Collinearity and Parallelism Are Statistically Significant Second Order Relations of Complex Cell Responses,” Neural Processing Letters, vol. 8, pp. 117-129, 1998.
[36] J.H. Elder and R.M. Goldberg, “The Statistics of Natural Image Contours,” Proc. IEEE CS Workshop Perceptual Organization in Computer Vision, June 1998. Available from http://marathon.csee.usf.edu/sarkar/pocv_program.html.
[37] J.H. Elder and R.M. Goldberg, “Inferential Reliability of Contour Grouping Cues in Natural Images,” Perception, vol. 27 (supplement), p. 11, Aug. 1998.
[38] J.H. Elder and R.M. Goldberg, “Ecological Statistics of Gestalt Laws for the Perceptual Organization of Contours,” J. Vision, vol. 2, no. 4, pp. 324-353, 2002. http://journalofvision.org/2/4/5/, DOI 10.1167/2.4.5.
[39] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A Database of Human Segmented Natural Images and Its Application to Evaluating Segmentation Algorithms and Measuring Ecological Statistics,” Proc. Eighth Int’l Conf. Computer Vision, pp. 416-424, 2001.
[40] W.S. Geisler, J.S. Perry, B.J. Super, and D.P. Gallogly, “Edge Co-Occurrence in Natural Images Predicts Contour Grouping Performance,” Vision Research, vol. 41, no. 6, pp. 711-724, Mar. 2001.
[41] M. Sigman, G.A. Cecchi, C.D. Gilbert, and M.O. Magnasco, “On a Common Circle: Natural Scenes and Gestalt Rules,” Proc. Nat’l Academy of Sciences, 2001.
[42] M. Kaschube, F. Wolf, T. Geisel, and S. Lowel, “The Prevalence of Colinear Contours in the Real World,” Neurocomputing, vols. 38-40, pp. 1335-1339, 2001.
[43] D. Geman and B. Jedynak, “An Active Testing Model for Tracking Roads in Satellite Images,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 1, pp. 1-14, Jan. 1996.
[44] A.L. Yuille and J.M. Coughlan, “Fundamental Limits of Bayesian Inference: Order Parameters and Phase Transitions for Road Tracking,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 2, pp. 160-173, Feb. 2000.
[45] M. Cazorla, F. Escolano, D. Gallardo, and R. Rizo, “Junction Detection and Grouping with Probabilistic Edge Models and Bayesian A*,” Pattern Recognition, vol. 35, no. 9, pp. 1869-1881, 2002.
[46] M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: Active Contour Models,” Int’l J. Computer Vision, vol. 1, no. 4, pp. 321-331, 1987.
[47] E.M. Riseman and M.A. Arbib, “Computational Techniques in the Visual Segmentation of Static Scenes,” Computer Graphics and Image Processing, vol. 6, pp. 221-276, 1977.
[48] S.C. Zhu and A. Yuille, “Region Competition: Unifying Snakes, Region Growing and Bayes/MDL for Multiband Image Segmentation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 9, pp. 884-900, Sept. 1996.
[49] C. Chesnaud, P. Refregier, and V. Boulet, “Statistical Region Snake-Based Segmentation Adapted to Different Physical Noise Models,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 21, no. 11, pp. 1145-1157, Nov. 1999.
[50] J.H. Elder and S.W. Zucker, “The Effect of Contour Closure on the Rapid Discrimination of Two-Dimensional Shapes,” Vision Research, vol. 33, no. 7, pp. 981-991, 1993.
[51] I. Kovacs and B. Julesz, “A Closed Curve Is Much More than an Incomplete One: Effect of Closure in Figure-Ground Discrimination,” Proc. Nat’l Academy of Science USA, vol. 90, pp. 7495-7497, 1993.
[52] S. Borra and S. Sarkar, “A Framework for Performance Characterization of Intermediate Level Grouping Modules,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 11, pp. 1306-1312, Nov. 1997.
[53] L.R. Williams and K.K. Thornber, “A Comparison of Measures for Detecting Natural Shapes in Cluttered Backgrounds,” Int’l J. Computer Vision, vol. 34, nos. 2/3, pp. 81-96, Sept. 1999.
[54] J.H. Elder and S.W. Zucker, “Local Scale Control for Edge Detection and Blur Estimation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 7, pp. 699-716, July 1998.
[55] S.G. Mallat, “A Theory for Multiresolution Signal Decomposition: The Wavelet Representation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674-693, July 1989.


[56] E.P. Simoncelli and E.H. Adelson, “Noise Removal via Bayesian Wavelet Coring,” Proc. Third IEEE Int’l Conf. Image Processing, pp. 379-382, Sept. 1996.
[57] E.P. Simoncelli, “Modeling the Joint Statistics of Images in the Wavelet Domain,” Proc. SPIE 44th Ann. Meeting, vol. 3813, pp. 188-195, July 1999.
[58] C.E. Shannon and W. Weaver, The Mathematical Theory of Communication. Urbana, Ill.: Univ. of Illinois Press, 1949.
[59] L.D. Cohen, “On Active Contour Models and Balloons,” Computer Vision, Graphics, and Image Processing: Image Understanding, vol. 53, no. 2, pp. 211-218, 1991.
[60] P. Fua, “Fast, Accurate and Consistent Modeling of Drainage and Surrounding Terrain,” Int’l J. Computer Vision, vol. 26, no. 3, pp. 215-234, 1998.
[61] I. Laptev, H. Mayer, T. Lindeberg, W. Eckstein, C. Steger, and A. Baumgartner, “Automatic Extraction of Roads from Aerial Images Based on Scale Space and Snakes,” Machine Vision and Applications, vol. 12, pp. 23-31, 2000.
[62] M.J. Jones and J.M. Rehg, “Statistical Color Models with Application to Skin Detection,” Proc. Conf. Computer Vision and Pattern Recognition, pp. 274-280, 1999.

James H. Elder received the BASc degree in electrical engineering from the University of British Columbia in 1987 and the PhD degree in electrical engineering from McGill University in 1995. From 1987 to 1989, he was with Bell-Northern Research in Ottawa, Canada. From 1995 to 1996, he was a senior research associate in the Computer Vision Group at the NEC Research Institute in Princeton, New Jersey. He joined the faculty of York University, Canada, in 1996, where he is presently an associate professor. His current research addresses problems in computer vision, biological vision, and virtual reality. He is a member of the IEEE and the IEEE Computer Society.

Amnon Krupnik received the BSc degree in geodetic engineering and the BA degree in computer science in 1988 and the MSc degree in geodetic engineering in 1990, all from the Technion-Israel Institute of Technology. He received the PhD degree from Ohio State University in 1994. He has been with the department of Civil Engineering at the Technion since 1994. His major research activities are in the areas of computer vision, digital photogrammetry, and remote sensing.

Leigh A. Johnston received the BSc and BE degrees in computer science and electrical engineering, respectively, in 1997 and the PhD degree in electrical engineering in 2000 from the University of Melbourne, Australia. From 2000 to 2001, she was a research associate with the Centre for Systems Engineering and Applied Mechanics at the Université Catholique de Louvain, Belgium, following which she returned to the University of Melbourne as a research associate in the Centre for Ultra-Broadband Information Networks. Since 2002, she has been a research associate in the Centre for Vision Research at York University, Canada, where she is interested in statistical signal processing algorithms for computer vision applications. She is a member of the IEEE.
