An Approach Towards Fast Gradient-based Image

1 downloads 0 Views 8MB Size Report
represent fundamental concepts of image segmentation, such as incorporation of ... graph of a function that corresponds to the optimal solution of the problem. ...... time for the gPb-owt-ucm algorithm, using code supplied by the authors, .... 2004. [22] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, “Contour detection.
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015

1

An Approach Towards Fast Gradient-based Image Segmentation Benjamin Hell*, Student Member, IEEE, Marc Kassubeck, Pablo Bauszat, Martin Eisemann, Marcus Magnor, Senior Member, IEEE

Abstract—In this paper we present and investigate an approach to fast multi-label color image segmentation using convex optimization techniques. The presented model is in some ways related to the well-known Mumford-Shah model, but deviates in certain important aspects. The optimization problem has been designed with two goals in mind: The objective function should represent fundamental concepts of image segmentation, such as incorporation of weighted curve length and variation of intensity in the segmented regions, while allowing transformation into a convex concave saddle point problem that is computationally inexpensive to solve. This paper introduces such a model, the nontrivial transformation of this model into a convex-concave saddle point problem, and the numerical treatment of the problem. We evaluate our approach by applying our algorithm to various images and show that our results are competitive in terms of quality at unprecedentedly low computation times. Our algorithm allows high-quality segmentation of megapixel images in a few seconds and achieves interactive performance for low resolution images. Index Terms—unsupervised image segmentation, convex optimization

I. I NTRODUCTION

T

HE importance of grouping performed by the human visual system has been evident at least since the Gestalt movement in psychology [1]. Not surprisingly, grouping is also considered essential for a wide range of computational vision problems. For instance, intermediate-level vision problems such as stereo or motion estimation benefit from appropriate region support for correspondence estimation. Higher-level problems, such as image indexing [2], foreground-background separation [3], or recognition by parts [4] can also benefit from a useful segmentation of the image. The problem of image segmentation and grouping is, therefore, a long-standing challenge in computer vision. In 1989, Mumford and Shah introduced their famous image model [5] and showed how it can be applied to many segmentation problems. Unfortunately, it is highly non-linear and cannot be easily optimized. Existing convex approaches typically suffer from high computational costs, especially when separating into many distinct regions. In this paper, we propose a model incorporating all the basic aspects of the gradient-based Mumford-Shah approach while maintaining an easy and efficiently-to-handle convex saddle point structure to enable fast computation. Our work is inspired Copyright (c) 2015 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]. Authors are with the Computer Graphics Lab at TU Braunschweig, Germany. Please see http://graphics.tu-bs.de/people/ for contact.

Fig. 1. High Resolution Image Segmentation. Using the CUDA implementation of the presented algorithm, a megapixel image (top left) is segmented in 8 regions (top right) in about 3 seconds, while a quarter megapixel image (bottom left) can be segmented in 3 regions (bottom right) in a fraction of a second.

by the seminal previous work on the Mumford-Shah functional by Alberti et. al. [6] and the works of Pock et. al [7] and Strekalovskiy et. al. [8], which build upon Alberti’s investigations. In Section II we introduce our model by considering the binary segmentation case. We derive its unique saddle point representation and present non-trivial extensions that allow for multi-region segmentation and proper color information treatment. In Section III, we show how a primal-dual convex optimization algorithm can be applied to solve the saddle point problem. To efficiently treat multi-region segmentation, we introduce a very fast, approximative projection scheme in Section III-C. We conclude with an evaluation of our approach and compare our results to other state-of-the-art techniques. A. Related Work Major contributions to image segmentation and similar problems have been made by Mumfurd-Shah in [5] and by research on active contours [9]–[11]. These approaches have in common, that they are non-convex, hence it is in general difficult to obtain a global optimum. Their numerical method of choice

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015

to obtain a local optimal solution is the level set method. Later on researchers turned to convex problems which lead to expectation maximization approaches and considered convex relaxations for the piecewise-constant Mumford-Shah model [12], [13] as well as the scalar piecewise smooth MumfordShah model [7], which incorporates ideas closely related to the transformation process used in this paper and is based on [6]. Our goal is to make use of aspects, that are relevant for image segmentation, without the need of involving the entire graph of a function that corresponds to the optimal solution of the problem. This way, at the cost of not being able to use our approach for image smoothing techniques, we do not have to introduce an additional dimension to the problem, leading to faster computation times. We also use directional information obtained from a segment’s boundary normal, a topic which has also been touched in [14]. Another challenge is the extension of binary segmentation techniques to multi-region segmentation. In this regard, prior work most closely related to ours is [15], [16] and, most of all, [12]. The latter paper also contains summaries on crucial facts about projection algorithms that play an essential role in convex multi-region segmentation. In contrast to [12] this paper introduces a vastly different and novel image segmentation model and a very fast projection scheme, which is essential for achieving short computation times while maintaining a high quality segmentation output. Fast first order primal-dual algorithms for convex optimization problems relevant to this work have also been investigated in [17], [18] and [19]. B. Contributions Overall this paper makes the following contributions: ●

● ●







We introduce a unique model that shows resemblance to the famous Mumford-Shah functional and can be transformed into an easy-to-handle, convex saddle point problem without introducing additional dimensionality. We show how this model can be transformed into its unique saddle point representation. We show how to exploit a well established primaldual convex optimization algorithm to obtain the unique solution of the saddle point problem. To calculate projections onto the specific convex sets that evolve naturally from the introduced model and the specific application of the first order primal-dual algorithm, we introduce a very fast, approximative projection scheme. The introduced model is flexible and can be adjusted to quite some degree. The presented ideas can be applied to a wide class of optimization problems and are not restricted to the form of the model presented in this paper. The presented problem can be solved without introducing any additional dimensionality. Together with a very fast projection algorithm this leads to very fast computation times, which is one of the major benefits of our approach.

2

I∶ ⟨.,.⟩X ∶ ∥.∥p ∶ ∂∶ D∶ ∇∶ ˜∶ ∇ Ω ⊂ Rn ∶ BV ∶ V∶ Φ∶ l∶ k∶

image intensity (scalar respectively color vector) inner product on Hilbert space X p-norm boundary (on sets) resp. differential operator weak derivative gradient vector discrete gradient vector rectangular domain of the given picture space of functions of bounded variation convex restriction set of the primal variable convex restriction set of the dual variable number of image channels k + 1 is the number of segments

II. M ODELING P ROCESS This section introduces the optimization model that will be investigated throughout this paper. For convenience, we start with presenting the binary segmentation case first (Section II-A) and consider the extension to the multi-region case afterwards (Section II-C). Also, we will focus on basic color image treatment first and will offer an extension of the model to treat intensity gradients for each color channel individually later on. From a modeling perspective, both extensions are straightforward but involve some technical difficulties that need to be overcome. All our observations hold true for arbitrary domain dimension n; for standard 2D image segmentation we have n = 2. A. The Novel Binary Segmentation Model We first consider a rather general form of a binary image segmentation model. To label the occurring segments we use an indicator function v ∶ Ω → {0,1} which operates on the image domain Ω. The space of all these binary functions will be called V˜ ∶= {v ∣ v ∶ Ω → {0,1}}. Jv is the jump set of the binary function v which represents the boundary of the individual image segments. Hn−1 represents the n−1-dimensional Hausdorff-measure. For dimension n = 2 (2D image segmentation), this n−1-dimensional measure corresponds to curve length measurement. The general form of our binary model is then given by inf

v∈V˜

∫ ∥DI(x)∥2 dx + Ω/Jv

´¹¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹¸ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹¶

intensity variation on segments 2

λ1 ∫ ∥d(x)−νJv(x)∥2 dHn−1 +λ2∫ g(x)dHn−1 + Jv

´¹¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¸ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¶ directional deviation

Jv

´¹¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¸ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¶

(1)

weighted boundary length

λ3 ∫ (1−v(x))ρ1 (x)+v(x)ρ2 (x)dx Ω

´¹¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹¸ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¶ image data

C. Notation and Abbreviations The following notation will be used throughout the whole paper.

λ1 ,λ2 ,λ3 ∈ R are scalar weights to balance the individual terms in the objective function, in essence controlling their respective influence. In the following we will explain and

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015

Fig. 2. Approximation in the discrete setting The figure shows the exact boundary (blue line) and its discrete approximation (red blocks). As the calculation eventually takes place on an equidistant grid (pixel grid in the 2D case), we can approximate the integral ∫Ω/Jv ∥DI(x)∥2 dx (integration ˜ without blue curve) via subtracting ∫Jv ∥∇I(x)∥ 2 dx (integration red boxes) ˜ ˜ is the numerical approximation of the image from ∫Ω ∥∇I(x)∥ 2 dx. ∇I gradient vector and Jv the jump set of the binary function v, which corresponds to the boundary (cf. (3) and (4)).

3

incides with the ability to transform a relaxed version of the problem into a convex saddle point problem. Alberti et. al. [6] show that this is possible for the Mumfurd-Shah functional [5]. Unfortunately, the saddle point problem can only be obtained by extending the solution space by one additional dimension. This leads to high computational costs for obtaining an optimal solution numerically because a higher dimensional space has to be discretized. In contrast, our approach works directly with image gradients (cf. the intensity variation term) and without any similarity to the initial image term like the beforementioned Mumford-Shah functional. Hence, we are able to preserve the dimensionality of the solution space by transforming problem (1) first and obtaining a convex saddle point representation for this problem afterwards in Section II-B. The only term that needs to be modified is the intensity variation term ∫Ω/Jv ∥DI(x)∥2 dx. We assume I ∈ BV, so according to Appendix A we have the following decomposition of the distributional derivative DI:

motivate the individual terms of the objective function above. intensity variation on segments: This term essentially penalizes any large deviation of intensity values on each segment, ignoring intensity changes that occur at a segment’s boundary, which is described by the jump set Jv . A similar term can, for example, be found in the Mumford-Shah functional [5]. Note that Jv is a null set with respect to the n-dimensional Hausdorff measure on Ω. DI has to be understood as the distributional derivative of the intensity I, which we assume to be a function of bounded variation and to exhibit discontinuities on the jump set Jv . This way the exclusion of the null set Jv in the set Ω/Jv makes sense and cannot be omitted.

DI = ∇I +CI +(I + −I − )νI dHn−1 ⌞JI From this representation it immediatelly follows: ∫ ∥DI(x)∥2 dx = ∫ ∥DI(x)∥2 dx− ∫ ∥DI(x)∥2 dx = Ω

Ω/Jv

weighted boundary length: This term penalizes large boundary measure and additionally weighs the penalization at each point x by g(x) > 0, an idea also occurring in edge-based segmentation approaches [11]. There are lots of meaningful choices for a weighting function g(.), such as ideas from geodesic active contours [11]. For one possible choice see Section II-E. image data: This term serves as a regularizer and involves the data terms ρi (x). Essentially, ρi (x) yields an image data related cost of assigning label i at point x. For a specific choice of ρi see Section II-E. Without this image data term, v ≡ const. on Ω, i.e. the jump set Jv is empty and no segmentation takes place, would tend to be the optimal solution of problem (1), which has to be avoided. Problem (1) has been designed with the purpose of obtaining the solution while minimizing computational effort. This co-

Jv +



n−1 ∫ ∥DI(x)∥2 dx− ∫ (I (x)−I (x))dH Ω

(3)

Jv

˜ as the numerical derivative In a discrete setting, i.e. with ∇I and calculation taking place on a equidistant grid, we get + − n−1 ≈ ∫ ∥DI(x)∥2 dx− ∫ (I (x)−I (x))dH Ω

directional deviation: νJv is the inner unit normal to the boundary of the set {x ∶ v(x) = 1}. The boundary itself coincides with the jump set Jv . In (9) we will see that this normal vector is closely related to the distributional derivative of the indicator function v. d(x) represents a suggestion for the unit normal vector of the boundary passing through x. We will use the normalized image gradient for d(x) later on (cf. Section II-E). Direction-based terms, though with a different intention, have also been used in [14].

(2)

Jv

n−1 ˜ ˜ 2 dx− ∫ ∥∇I(x)∥ 2 dH ∫ ∥∇I(x)∥ Ω

(4)

Jv

which is visualized in Figure 2. Substituting (3) into (1) and making use of the fact that the term ∫Ω ∥DI(x)∥2 dx has no influence on the optimal argument of the whole optimization problem we arrive at the final version of our binary segmentation model: 2

inf λ1 ∫ ∥d(x)−νJv(x))∥2 dHn−1 +

v∈V˜

Jv + − n−1 ∫ λ2 g(x)−(I (x)−I (x)) dH + Jv ´¹¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¸¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¶ f (x,λ2 )=f (x)∶=

(5)

λ3 ∫ (1−v(x))ρ1 (x)+v(x)ρ2 (x)dx Ω

B. Transformation into Saddle Point Problem One essential property of the optimization problem (5) is that it can be rewritten as a convex-concave saddle point (min/max) problem by introducing additional dual variables and relaxing the set V˜ . This property will be essential for obtaining the optimal solution from an analytical and numerical standpoint. We make use of a similar technique as the one presented in [6] by Alberti et. al. for the Mumford-Shah functional. Note

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015

4

that the approximation process, and hence the particular optimization problem (5), can be transformed into the saddle point problem without introducing an additional dimensionality as in the Mumford-Shah case (cf. [6]). Another deviation from Alberti’s work, and at the same time the key to obtaining the saddle point formulation, is the derivation of an appropriate restriction set Φ for the dual variable.

−2λ1 d(x) with radius f (x)+λ1 (1+∥d(x)∥22 ). To do so, we take a look at the possible radius:

Theorem 1 (Curve Integral Representation).

and C(x) being the supposed to be center of the circle. Substituting into (13) we obtain

2

n−1 = ∫ λ1 ∥d(x)−νJv(x)∥2 +f (x,λ2 )H Jv

φ∈Φ

with φ(x) being on the line described by 2

⟨φ(x),η(x)⟩ = f (x)+λ1 ∥d(x)−νJv(x)∥2

2

with Φ being the convex restriction set for the dual variable φ given by: Φ = {φ ∈ C0∞ (Ω)n ∶ ∥φ(x)+2λ1 d(x)∥2 ≤ f (x)+λ1 (1+∥d(x)∥22 ), x ∈ Ω}

(7)

Proof. Let η(x) ∈ R2 be an arbitrary unit vector, i.e. ∥η(x)∥2 = 1. Let us further define a half space dependent on η(x) by Hη(x) ∶= {φ(x) ∈ Rn ∶ ⟨φ(x),η(x)⟩ ≤ 2

f (x)+λ1 ∥d(x)−η(x)∥2 }

(8)

We now turn to one of the most important facts for this proof, a general form of the coarea formula, also used in [6], which is according to Appendix A - a special case of the decomposition of a function of bounded variation: Dv = νv ⋅Hn−1 ⌞Jv

(9)

with Hn−1 being the n−1-dimensional Hausdorff measure, i.e. curve length measure in the 2D case, and νJv being the inner unit normal vector to the boundary of the set {x ∶ v(x) = 1}. With these results from measure theory we obtain for φ ∈ C0∞ (Ω)n : n−1 ∫ ⟨φ(x),Dv(x)⟩ = ∫ ⟨φ(x),νJv(x)⟩dH

(10)

(14)

(15)

If we choose C(x) = −2λ1 d(x), we obtain R(x) = f (x)+λ1 (1+∥d(x)∥22 )





(13)

R(x) = f (x)+λ1 ∥d(x)−η(x)∥2 −⟨C(x),η(x)⟩ (6)

sup ∫ ⟨φ(x),Dv(x)⟩

R(x) ∶= ⟨φ(x)−C(x),η(x)⟩

(16)

which is independent of η(x). Remarks ● At each point x the set Φ describes a circle around center −2λ1 d(x) with radius f (x)+λ1 (1+∥d(x)∥22 ). ● The radius of the circle has to be positive for equality (6) to hold. In practice, this can be easily achieved at any point x by choosing suitable parameters λ1 , λ2 and an appropriate weighting function g(x). As the value of λ1 has, in general, only little impact on the final segmentation result, this does not pose any serious restriction. ● It is possible to choose negative values for f (x). This is an interesting property of the model. Essentially, f (x) < 0 forces the boundary to pass through point x. Using Theorem 1, the optimization problem (5) can be rewritten as: inf sup ∫ ⟨φ(x),Dv(x)⟩+

v∈V˜ φ∈Φ



λ3 ∫ (1−v(x))ρ1 (x)+v(x)ρ2 (x)dx

(17)



with V˜ ∶= {v ∶ Ω → {0,1}}. Convex relaxation of V˜ into V ∶= {v ∶ Ω → [0,1]} makes (17) a convex-concave saddle point problem, which will be considered throughout the rest of this paper:

Jv

Recall that ∥νJv(x)∥2 = 1. So the following equality holds true: ∫

2 λ1 ∥d(x)−νJv(x)∥2 +f (x)dHn−1 =

Jv

sup ∫ ⟨φ(x),Dv(x)⟩

φ∈Φ



λ3 ∫ (1−v(x))ρ1 (x)+v(x)ρ2 (x)dx (11)

(18)



with Φ defined in (7).

˜ φ∈Φ Ω

˜ is the space of all C ∞ (Ω)n -functions, which, evaluwhere Φ 0 ated at a certain point x, lie in the intersection of all possible half spaces defined by (8): ˜ ∶= {φ ∈ C0∞ (Ω)n ∶ φ(x) ∈ Hη(x) ,∥η(x)∥2 = 1,x ∈ Ω} Φ

inf sup ∫ ⟨φ(x),Dv(x)⟩+

v∈V

(12)

˜ = Φ, as The only thing left to do is to show that indeed Φ defined in (7). This coincides with the intersection of all possible half spaces Hη(x) forming a circle around the center

C. Extension to Multi-Region Case In this section, we extend the saddle point representation (18) to the multi-region case. This extension is in a certain way straightforward, but the representation for multiple segments introduces some technical difficulties that need to be addressed accurately. To represent multiple regions mathematically we introduce a concatenation of binary functions, each one splitting the image

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015

5

area into one more segment. This representation follows the ideas presented in [16], which in return is based on [15]. The new set of binary variables is defined by: V˜ = {vi ∶ Ω → {0,1} ∶ 1 ≥ v1 (x) ≥ ⋅⋅⋅ ≥ vk (x) ≥ 0, x ∈ Ω}

(19)

with region Ri (i = 1,...,k +1) being represented by Ri = {x ∈ Ω ∶ vi−1 (x)−vi (x) = 1} (i = 1,...,k +1)

k

(20)

with v0 (.) ≡ 1 and vk+1 (.) ≡ 0. Straightforward extension of the objective function in (18) by using the scalar product ∑ki=1 supφi ∈Φ ∫Ω ⟨φi ,Dvi ⟩ leads to counting certain parts of the boundary multiple times, which unfortunately produces undesirable results (especially for f (x) < 0). Hence, to obtain an appropriate curve integral representation, analogous to the one in Theorem 1, we need to restrict the corresponding dual variables to an appropriate convex set Φ, which is given by the following theorem. Theorem 2 (Curve Integral Representation for Multiple Segments). ∫

2 λ1 ∥d(x)−νJv(x)∥2 +f (x,λ2 )dHn−1 =

Jv

(21)

k

sup ∑ ∫ ⟨φi (x),Dvi (x)⟩ φ∈Φ i=1 Ω

with Dvi (.) (i = 1,...,k) being the weak derivatives of the binary functions vi (.) ∶ Ω → {0,1} (i = 1,...,k) representing the segments according to scheme (20). Jv being the combined jump set of all these binary functions vi (i = 1,...,k), i.e. the boundary of all k +1 segments, νJv being the inner normal to at least one of the sets {x ∶ vi (x) = 1} (i = 1,...,k) (boundaries of segments specified by primal variable v may overlap, but if they do they still share the same normal) and Φ being the convex restriction set for the dual variable φ(.) = (φ1 (.),...,φk (.)) given by: Φ ={(φ1 ,...,φk ) ∶ φi ∈ C0∞ (Ω)n ,∥ ∑ φi (x)+2λ1 d(x)∥2 ≤ i1 ≤i≤i2 f (x)+λ1 (1+∥d(x)∥22 ), x ∈ Ω, 1 ≤ i1 < i2 ≤ k} (22) Proof. Let Ai = {x ∶ vi (x) = 1}, with v = (v1 ,...,vk ) ∈ V˜ (cf. definition (19)). Due to the structure of V˜ we immediately obtain the property: x ∈ ∂Ai2 ∧x ∈ ∂Ai1 ⇒ x ∈ ∂Ai (i1 ≤ i ≤ i2 )

(23)

So for each x ∈ Ω there exist i1 ,i2 ∈ {1,...,k} such that k

i2

∑⟨φi (x),Dvi (x)⟩ = ⟨ ∑ φi (x),Dvi1 (x)⟩

i=1

From Theorem 2 in conjunction with a straightforward extension of the image data term in the objective function and convex relaxation of the primal restriction set V˜ , we obtain the following convex-concave saddle point problem for multi-region image segmentation:

(24)

inf sup ∑ ∫ ⟨φi (x),Dvi (x)⟩+

v∈V

φ∈Φ i=1 Ω k+1

(25)

λ3 ∑ ∫ (vi−1 (x)−vi (x))ρi (x)dx i=1 Ω

with convex restriction set V = {vi ∶ Ω → [0,1] ∶ 1 ≥ v1 (x) ≥ ⋅⋅⋅ ≥ vk (x) ≥ 0, x ∈ Ω}

(26)

and Φ defined in (22), v0 (.) ≡ 1, vk+1 (.) ≡ 0 and ρi (i = 1,...,k +1) the image data term. For a specific choice of ρi see Section II-E. D. Handling Color Information So far, color information only appeared in the image data term (volume integral) of the objective function. The curve integral term does not make any use of multiple color channels yet. The image gradient ∇I(.) has been pre-calculated on a grayscale picture. While this leads to faster calculation times to obtain the segmentation, obviously information that is easily available is not used in the segmentation process. To do so, we introduce additional dual variables (one for each color channel) in the saddle point problem (25), which leads us to a simple, full-color multi-region model: 3

k

inf ∑ sup ∑ ∫ ⟨φij (x),Dvj (x)⟩+

v∈V

i=1 φi ∈Φi j=1 Ω k+1

(27)

λ3 ∑ ∫ (vi−1 (x)−vi (x))ρi (x)dx i=1 Ω

with φi (.) = (φi1 (.),...,φik (.)) (i = 1,...,3) and Φi ={(φi1 ,...,φik ) ∶ φij ∈ C0∞ (Ω)n , ∥ ∑ φij (x)+2λ1 di (x)∥2 ≤ fi (x)+λ1 (1+∥di (x)∥22 ), j1 ≤j≤j2

x ∈ Ω, 1 ≤ j1 < j2 ≤ k}

(i = 1,...,3) (28)

with Ii (.) representing the image intensity according to channel i (i = 1,...,3). The definition of fi (.) follows directly from applying the definition in (5) channelwise, i.e.: fi (x) = fi (x,λ2 ) ∶= λ2 gi (x)−(Ii+ (x)−Ii− (x))

(29)

i=i1

where equality has to be understood according to the definition of the weak derivative D and i1 > i2 is admissible, in which case the sum yields 0. The rest of the proof follows by considering the right hand side of equation (24) and applying the proof of Theorem 1.

E. Specific Choices for the Implementation Let N (.) represent the normalization operator, i.e. ⎧ y ⎪ ⎪ N (y) ∶= ⎨ ∥y∥2 ⎪ ⎪ ⎩0

y≠0 y=0

(30)

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015

For all our specific implementations of the algorithm based on the models presented in Section II-A, Section II-C and Section II-D, we use the following choices for the respective functions introduced so far: d(x) ∶= N (DI(x)): With this choice, the directional deviation term describes the deviation of the normalized image gradient (gradient of intensity I) from the unit normal vector of the separating curve (described by Jv ). This favors a region boundary which has a normal vector pointing in the direction of intensity increase. Normalization of the image gradient vector is necessary to be able to compare it to a unit normal vector. 1 g(x) = g(x,α) ∶= 1+α∥DI(x)∥ (α > 0): This specific choice fa2 vors the segment’s boundary to run through points with large image gradient. Similar terms occur in edge-based image segmentation approaches, such as geodesic active contours [11]. For the color model in Section II-D we simply define this weight channelwise, i.e. gi (x) ∶= 1/(1+α∥DIi (x)∥2 ).

ρi (x) ∶= ∥I(x)− I¯i ∥2 (i = 1,...,k): Describes the local (pointwise) deviation of the image intensity I(x) - for color images this is a 3D-vector - from some appropriate mean color value I¯i . Those values need to be obtained once in a precomputation step. One option to calculate those mean color values would be to use k-means clustering, although a lot of other appropriate methods like histogram analysis or manual selection are available and may be more appropriate depending on the scenario at hand. Once again, it is intentional to keep ρi simple as the according term in the objective function acts as a regularizer (cf. Section II-A). Note that all these choices could be replaced by different meaningful terms that do have certain intentional properties and depend on the same arguments. III. O PTIMIZATION This section deals with the actual optimization process. The main purpose of transforming the initial optimization problem into a convex-concave saddle point problem is to be able to use fast primal-dual convex optimization algorithms to solve the problem for its unique solution. It is well known that those algorithms perform very well on optimization problems exhibiting a saddle point structure. This has been shown for many computer vision problems like image denoising, image inpainting, image deconvolution and zooming, motion estimation, and image smoothing, just to name a few. An application to those examples, a detailed comparison to other algorithms and proof of the stability and convergence properties of first order primal-dual algorithms for convex saddle point problems can be found in [17] and [19]. In this section we will give a short overview of the primaldual algorithm we use for calculating our results, and we show how to apply this algorithm to the specific problem at hand. Applying the algorithm involves nontrivial projection operations in the multi-region case. In Section III-C, we will show fast projection algorithms for projecting onto the primal restriction set V and introduce our own projection algorithm to compute projections onto the dual restriction set Φ.

6

A. The Primal-Dual Projection Algorithm The primal-dual algorithm we use is presented in detail in [17]. This section gives a short overview and determines the notation we will use later on. The algorithm works on convex-concave saddle point problems. The general form of the problem considered by Chambolle et. al. [17] suited to our problem is inf

sup

v∈BV (Ω)k φ∈C ∞ (Ω)nkl

⟨φ,K(v)⟩L2 (Ω)nkl +G(v)−F ∗ (φ) (31)

0

The function spaces have been chosen to suit problems (18), (25) and (27). Recall that k +1 is the number of segments and l shall be the number of image channels (l = 1:grayscale, l = 3: color). K(.) is a linear operator, G(.) is a convex lower semicontinuous function, and F ∗ (.) is the convex conjugate of a lower semicontinuous function F (.). The conjugacy notation has to do with the usual derivation of saddle point problems via the Fenchel-Legendre transform [20]. With the notation introduced above, the optimization algorithm we use is the following: φn+1 = (Id+σ∂F ∗ )−1 (φn +σK(˜ v n ))

v n+1 = (Id+τ ∂G)−1 (v n −τ K ∗ (φn+1 )) n+1



= 2v

n+1

−v

(32)

n

with Id being the identity operator and step widths σ,τ > 0 chosen such that στ ∥K∥22 < 1, with ∥K∥2 being the operator norm of K. The inequality is essential for convergence of the algorithm, hence choosing an appropriate upper bound of ∥K∥22 is mandatory. There has been a paper published on preconditioning for the primal-dual algorithm [18] and one on a more general theory regarding first order forward-backward algorithms [19], both introducing techniques to potentially speed up the algorithm at hand. We found that in our particular case a heuristic, fixed choice of the step sizes τ and σ introduced in (32) yields the best results. To apply the algorithm one essentially needs to implement discrete versions of the specific resolvent operators (Id+ σ∂F ∗ )−1 and (Id+τ ∂G)−1 . In addition, proper discrete forms of the linear operator K and its adjoint operator K ∗ have to be determined. The process of convexification explained in Section II leads to the relaxed convex restriction set V (26). So as a final step the solution, i.e. the primal variable v = (v1 ,...,vk ) needs to be thresholded in each component to obtain a proper segmentation.

B. Applying the Algorithm This section gives a short summary of the problem-specific operators appearing in the algorithm (32) and other implementation details. We consider the multi-region color model (27) introduced in Section II-D. The treatment for the binary segmentation model (18) and the semi-color multi-region model (25) should become evident as special cases.

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015

7

Comparison of the saddle point form of the multi-region color model (27) to the general form (31) yields l

k

∑ ∑ ∫ ⟨φij (x),Dvj (x)⟩ −δΦ (φ) v∈BV (Ω)k φ∈C ∞ (Ω)nkd i=1 j=1 ´¹¹ ¹ ¸¹¹ ¹ ¶ 0 Ω ∗ ´¹¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¸¹¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¶ F (φ)= inf

sup

⟨φ,K(v)⟩=

(33)

k+1

+λ3 ∑ ∫ (vi (x)−vi−1 (x))ρi (x)dx+δV (v) i=1 Ω

´¹¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹¸ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¹ ¶ G(v)=

with δΦ (.) and δV (.) being the indicator functions for the dual restriction set Φ = {Φ1 ,...,Φl } defined in (28), respectively the primal restriction set V defined in (26): ⎧ ⎧ ⎪0 v ∈ V ⎪0 φ ∈ Φ ⎪ ⎪ δV (v) = ⎨ , δΦ (φ) = ⎨ (34) ⎪ ⎪ ∞ otherwise ∞ otherwise ⎪ ⎪ ⎩ ⎩ Note that according to (4) it makes sense to choose fi (x) = ˜ i (x) (∇ ˜ being the discrete gradient operator), λ2 gi (x)− ∇I which occurs in the definition of the set Φ and has been defined in (5), respectively (29). For a practical choice of gi (.) see Section II-E. The resolvent operators can be deduced from the following identity (cf. [17]) which implies usage of first order optimality conditions and the fundamental theorem of variational calculus: (Id+σ∂F ∗ )−1 (φ) = argmin { θ∈C0∞ (Ω)nkd

∥θ −φ∥22 +F ∗ (θ)} = ΠΦ (φ) 2σ

∥u−v∥22 +G(u)} = 2τ u∈L2 (Ω)k ⎛ ⎛ −∥I(.)− I¯1 ∥2 +∥I(.)− I¯2 ∥2 ⎞⎞ ⋮ ⎟⎟ ΠV ⎜v(.)−τ λ3 ⎜ ⎝ ⎝−∥I(.)− I¯k ∥2 +∥I(.)− I¯k+1 ∥2 ⎠⎠ (Id+τ ∂G)−1 (v) = argmin {

(35) with Π being the pointwise projection operator with respect to Euclidean distance. It should be evident by now that these projection operators will have a huge influence on calculation time, so fast algorithms to perform these projections will be essential. We introduce algorithms for ΠΦ and ΠV in Section III-C. The operator K(.) for l = 3 (color) is given by ⎛⎛Dv1 ⎞ ⎛Dvk ⎞⎞ K(v) = Dv ∶= ⎜⎜Dv1 ⎟ , ... , ⎜Dvk ⎟⎟ ⎝⎝Dv1 ⎠ ⎝Dvk ⎠⎠

(36)

Comparison to (33) directly leads to the adjoint operator K ∗ (.): 3

3

i=1

i=1

K ∗ (φ) = −divφ ∶= −(∑ divφi1 , ... , ∑ divφik )

(37)

Note that for the grayscale case (l = 1) those operators reduce to K(v) = Dv = (Dv1 ,...,Dvk ) and K ∗ (φ) = −divφ = −(divφ1 ,...,divφk ). Using an estimation process like in the proof of Theorem 3.1 in [21] we obtain ∥K∥22 ≤ 4nkl. We use this estimate to choose appropriate step widths σ and τ such

that στ ∥K∥22 < 1, which leads to convergence of the primaldual algorithm (32) (cf. [17]). For initializing v and φ practically any random value could be used, as the final result is unique due to the convex structure of the problem and hence does not depend on the starting point of the computation. Of course, to reduce computation time it is best to choose an initial solution that is as close as possible to the final - yet to be determined - optimum. As noted in Section II-E there are many options to compute mean color values I¯i for the data term ρi . In general it makes sense to base a first guess for v on the mean color values I¯i chosen for the image data term and choose Dv as a starting point for φ.

C. Projection Algorithms Eventually all the operations presented in previous sections take place locally at each point of the input image, which is especially beneficial for parallel implementations (e.g. on the GPU). This also holds true for the projection operators ΠV and ΠΦ . As already mentioned, algorithms representing those local projections have huge influence on computation time. 1) Projection on Primal Set V: Taking a look at the resolvent operator (Id+τ ∂G)−1 presented in (35) one sees that ΠV projects at each point x a vector v(x) ∈ Rk onto the set V (x) ∶= {y = (y1 ,...,yk ) ∈ Rk ∶ 1 ≥ y1 ≥ ⋅⋅⋅ ≥ yk ≥ 0}, which follows directly from the definition of V in (26). To execute the projection we slightly modify an algorithm presented in [12] by Chambolle et al.: Algorithm To project y = (y1 ,...,yk ) ∈ Rk on V (x), we obtain the projection y˜ = (˜ y1 ,..., y˜k ) = ΠV (x) (y) by 1) set y˜i ∶= yi and define k clusters Zi ∶= {i} (i = 1,...,k) 2) identify index i with y˜i < y˜i+1 , if there is no such index abort the algorithm and jump to step 4 3) set clusters Zj ∶= Zi ∪Zi+1 ∀j ∈ Zi ∪Zi+1 , set y˜i+1 ∶= y˜i ∶= ∑j∈Zi (˜ yj )/∣Zi ∣ and go back to step 2 4) set y˜i ∶= max(min(˜ yi ,1),0) (clipping), then y˜ = ΠV (x) (y) Remarks ● The presented algorithm deviates from the algorithm presented in [12] in the last step, i.e. the clipping process. ● It can be shown that doing the clipping at the end results in the algorithm yielding the exact projection on the set V (x) defined in (26). 2) Projection on Dual Set Φ: For implementing the pointwise projection ΠΦ onto the set Φ = {Φ1 ,...,Φl } we need a fast algorithm to project φi (x) ∈ Rnk (i = 1,...,l) at point x onto Φi (x) ∶= {y = (y1 ,...,yk ) ∈ Rnk ∶ ∥ ∑ yj +2λ1 di (x)∥2 ≤ j1 ≤j≤j2

fi (x)+λ1 (1+∥di (x)∥22 ), 1 ≤ j1 < j2 ≤ k}

(38)

which follows directly from the definition (28) of the dual restriction set Φi (i = 1,...,l).

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015

8

Given a vector y = (y1 ,...,yk ) ∈ Rnk we start with introducing z = (z1 ,...,zk ) ∈ Rnk by defining

input

i

zi ∶= ∑ yj , z0 ∶= 0Rn ⇒ zj2 −zj1 −1 = ∑ yj j1 ≤j≤j2

j=1

(39) SMMS

In addition we define radius and center by R ∶= fi (x)+λ1 (1+∥di (x)∥22 ), C ∶= −2λ1 di (x)

(40) ours

This way, we can rewrite Φi (x) = {y = (y1 ,...,yk ) ∈ Rnk ∶ ∥zj2 −zj1 −1 −C∥ ≤ R, j

zj = ∑ ym (j = 1,...,k), z0 = 0Rn , 1 ≤ j1 ≤ j2 ≤ k}

(41)

m=1

which leads to the idea of using successive projections to approximate the projection process. Algorithm Let y = (y1 ,...,yk ) ∈ Rnk be given, we then project on Φi (x) specified in (38) by 1) set z0 ∶= 0Rn , z˜0 ∶= z0 and j ∶= 1 2) set zj ∶= z˜j−1 +yj , then iteratively project zj onto ⋂ Km with circle Km ∶= {z ∈ Rn ∶ m=0,...,j−1

∥z − z˜m −C∥2 ≤ R}, the projected point shall be z˜j a) from the definition of C and R one can see that ⋂ Kj ≠ ∅ (i = 1,...,k) if fi (x) ≥ 0, otherwise j=0,...,i−1

(b) yields at least a good approximation b) a fast and again approximative routine for projecting zj onto ⋂ Km is to set m=0,...,j−1

z˜j ∶= argmax {∥z −zj ∥2 ∶ z = ΠKm(zj ) (m = 0,...,j −1)} z

with ΠKm being the Euclidean projection on the circle Km 3) set j ∶= j +1, if j ≤ k go to step 2, otherwise go to step 4 4) obtain final projected vector y˜ = (˜ y1 ,..., y˜k ) via reversing the accumulation process: y˜j ∶= z˜j − z˜j−1 (j = 1,...,k) Remarks ● The presented algorithm does not yield the exact projection, but the objective function is very well conditioned with regard to the dual variable φ. So small errors in obtaining this dual variable practically only impact the rate of convergence, not the final result. ● The algorithm is a very fast way to obtain a sufficiently accurate projection on the set Φi (x) and hence on the discretization of the set Φ. IV. E XPERIMENTAL VALIDATION In this section we apply the algorithm developed in Section II and III to a variety of different test scenarios. We show that the results are competitive to state-of-the-art segmentation algorithms in terms of quality and can be obtained very fast. The presented algorithm is fully parallelizable. We implemented it in CUDA.

Fig. 3. Binary segmentation: Results for a series of natural scenery images (top) by the new statistical model for Mumford-Shah (SMMS) [23] (middle) and our proposed approach (bottom). Each picture has a resolution of about 0.2 megapixels. Computation time for our approach is, depending on the image, between 40 and 95 ms for a high accuracy of  = 10−4 . According to [23], the SMMS algorithm takes roughly 10 seconds to compute each image.

In the preceding sections we showed how to obtain a highly efficiently computable structure of the initial optimization problem, which led to the primal-dual projection scheme using tailored projection algorithms. We show that the whole process leads to very short computation times, which to the best of our knowledge have not been achieved with other approaches based on similar models yet. All statistics were measured on an Intel Core i7-4930K, 3.40 GHz and 32 GB RAM PC with an NVIDIA GeForce GTX 780Ti running Windows 8 64-bit. In the following, we show results for binary segmentation and then advance to multi-region segmentation evaluation. We conclude this section with a comparison to clustering algorithms and the application of the Berkeley image segmentation benchmark BSDS500 (cf. [22]). All test scenarios are evaluated using algorithm (32). Recall that the primal variable is denoted by v and the dual variable by φ. The output we show is the accumulated, thresholded primal variable, which we rescaled to values between 0 and 1. Each intensity in the resulting grayscale image corresponds to one segment. If the root mean square deviation √ in the primal variable drops below a specified value , i.e. ∥v n+1 −v n ∥22 /(resolution⋅k) < , the algorithm terminates. If not explicitly mentioned otherwise, we use  = 10−4 , σ = 10 and τ = 0.99/(σ ⋅4nkl) (cf. Section III-B) and the threshold value 0.5. We refer to results computed with 50000 iterations as reference solutions.

A. Binary Segmentation We first consider our binary segmentation model (18). While from a mathematical standpoint the theory behind the algorithm in the binary case is pretty much the same as for the multi-region case, the projection scheme presented in Section III boils down to an easily computable clipping (for the primal variable) and projecting onto a single circle (dual variable). Figure 3 shows results for binary segmentation of graylevel images. We compare our results with those of Ref. [23]. The results are comparable in terms of quality and can be obtained in a fraction of a second even for high accuracy . Our experiments show that realtime (≤ 50 ms) computation speeds are achievable with almost the same segmentation result.

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015

Fig. 4. 3-label segmentation: The input image with resolution 512 × 512 (left) and the initial solution based on least Euclidean distance towards picked mean color values {I¯1 , I¯2 , I¯3 } (middle). The output of the algorithm (right) based on the semi-color model (25) is obtained in approx. 250 ms with  = 0.5 ⋅ 10−4 in 700 iterations. Parameter choices are λ1 = 10, λ2 = 5, λ3 = 1 and α = 12.

Fig. 5. Challenging 4-label segmentation: The input image initially created by Farbman et. al [24] at resolution 500 × 375 (left) and the initial solution based on least Euclidean distance towards picked mean color values {I¯1 , I¯2 , I¯3 , I¯4 } (middle). The output of the algorithm (right) based on the semi-color model (25) is obtained in approx. 0.5 seconds in 1400 iterations. Parameter choices are λ1 = 2, λ2 = 21, λ3 = 2 and α = 1.

B. Multi-Region Segmentation For multi-region segmentation using our semi-color model (25) and full-color model (27) presented in Section II, the computation process involves far more complicated projections, as introduced in Section III-C. In most cases it is sufficient to use our much faster multi-region semi-color model (25). Results for 3-label, 4-label and 8-label segmentation using this approach are presented in Figure 4, Figure 5 and Figure 6. In Figure 7, we present a case where the full-color model yields superior results to the multi-region model, i.e. when the grayscale-converted input color image has very low contrast. C. Convergence Analysis The rate of convergence is essential for any iterative algorithm. As we are dealing with a convex-concave saddle point problem in (18), (25) and (27), it is safe to assume convergence when ∥v n+1 −v n ∥2 gets very ”low”. What ”low” means depends on the scenario at hand and cannot be answered in general. For example, ∥v n+1 −v n ∥2 scales linearly with the number of segments and also depends on the step width τ chosen for the al-

Fig. 6. 8-label segmentation: The input image also used in [15] at resolution 640 × 491 (left) and the initial solution based on k-means clustering (middle). The output of the algorithm (right) based on the semi-color model (25) is obtained in 1.4 seconds with  = 10−4 in 900 iterations. Parameter choices are λ1 = 2, λ2 = 3.5, λ3 = 1 and α = 1.

9

Fig. 7. Model comparison: In this artificially created case the input image (left) is segmented using the semi-color model (middle) and the full-color model (right). Due to the lack of image gradient information, the image data term takes the predominant role in the semi-color model (cf. Section II-A).

Fig. 8. Relative convergence: Root√ mean square deviation (RMSD) of successive primal variable values, i.e. ∥v n+1 − v n ∥22 /(resolution ⋅ k), plotted against number of iterations for four different test scenarios, presented in this paper: Binary segmentation (magenta) from Figure 3, 3-label segmentation (red) from Figure 4, 4-label segmentation (blue) from Figure 5 and 8-label segmentation from Figure 6.

gorithm (the smaller τ gets, the smaller the difference will be). The determining factor for when the algorithm can be stopped should be the actual visual change of the final segmentation output. Of course, this can only be measured in artificial test environments where a reference solution is known. Because of these observations, in Figure 8 we present graphs for the root mean square deviation (RMDS) of the primal variable, √ which is defined by ∥v n+1 −v n ∥22 /(resolution⋅k), depending on the number of iterations. The actual deviation of the output after each iteration from the reference solution can be seen in Figure 9.

Fig. 9. Absolute convergence: Same scenario as in Figure 8, but considering root mean square deviation of the primal variable to a reference solution. The difference of a few pixels after about 500 iterations in all four scenarios corresponds to  ≈ 10−4 as a stopping criteria (cf. Figure 8).

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015

res. / seg. 128x128 256x256 512x512 1024x1024

2 0.08 0.08 0.19 0.65

4 ms ms ms ms

0.08 0.16 0.45 1.65

10

8 ms ms ms ms

0.14 0.38 1.28 4.99

16 ms ms ms ms

0.46 1.62 6.21 24.9

input

k-means(global) k-means(local) mean-shift

quick-shift

ms ms ms ms

TABLE I C OMPUTATION TIME : P ERFORMANCE IN TIME PER ITERATION OF OUR ALGORITHM FOR SEGMENTING THE FIREWORKS IMAGE ( PRESENTED IN F IGURE 1) FOR DIFFERENT RESOLUTIONS AND NUMBERS OF SEGMENTS USING THE SEMI - COLOR MULTI - REGION MODEL PRESENTED IN S ECTION II-D.

D. Computation Time Table I shows representative computation times for one image at different resolutions and segmentation into different numbers of segments with our algorithm implemented in CUDA. This shows that our algorithm scales very well with the number of pixels and the number of segments and makes high resolution image segmentation possible in a matter of seconds. In general, it is difficult to compare computation times to other approaches. First, different methods achieve at least slightly different segmentation results on one and the same input image. This is due to different mathematical models describing the segmenation process. For example, one major feature of our approach is the incorporation of the ”intensity variation on segments” term (cf. (1)), which to the best of our knowledge only occurs in a similar fashion in MumfordShah-related approaches like [7]. Second, the computation time depends to a great extent on the implementation of the algorithm. Highly parallelizable algorithms like ours benefit greatly by the efficient computing capabilities of a GPU. In our experiments we experienced a speedup of approx. factor 100 compared to an optimized Mathematica implementation of our algorithm. This is why we first want to focus on computation times of other approaches implemented in CUDA. In [15] the authors claim that typical runtimes for image segmentation problems for 512 × 512 images and 10 labels are in the range of 50 seconds on a Nvidia Tesla C1060 GPU. For the approach presented in [14] usual runtimes for 640 × 480 images and 5 labels are around 90 seconds, and for 320 × 240 images 7 seconds on a NVIDIA GTX 480 GPU. In [7] typical execution times for 128 × 128 images with the solution discretized on 32 levels are in the order of 600 seconds on on a Tesla C1060 GPU. All these computation times have been obtained using slightly older hardware than the NVIDIA GTX 780Ti we used for our experiments. Taking this into account, our algorithm still outperforms those approaches in terms of computation time. An implementation of the quickshift algorithm (cf. [25]), a method similar to mean-shift and hence completely different from our approach, only needs a few hundred milliseconds running on our setup (for further details see Section IV-E). Additionally, we want to compare our approach to popular CPU-based segmentation algorithms. According to [23] the statistical model for Mumford-Shah (SMMS) and multiphase Mumford-Shah (MPMS) CPU implementations take at least 50 seconds to compute, depending on the number of labels

Fig. 10. Comparison to Mean-Shift and K-Means Clustering: From left to right: The input image and the results obtained by k-means clustering (global optimum, local optimum), mean-shift (Wolfram Mathematica Implementation) and the quick-shift implemention of [25] (with additional grouping of regions and parameters experimentally chosen for best visual results).

and the image size. The segmentation algorithm gPb-owt-ucm presented in [22], which the Berkeley image segmentation benchmark (see Section IV-F) is built upon, has rather long computation times of at least 100 seconds on images with a moderate resolution. According to [26], optimized k-means clustering algorithms usually take only a few hundred milliseconds to compute, but are not guaranteed to converge to a global optimum. Efficient mean-shift algorithm implementations need at least a few seconds to compute (cf. [27]). E. Comparison to Mean-Shift and K-Means Clustering Mean-shift and k-means clustering are well known clustering algorithms, that are widely used for many applications, one of them being image segmentation. It is important to note though that those algorithms do not yield coherent image segments as their output. They rather produce clusters of data, which in the case of an image corresponds to clusters of pixels. Nevertheless, with appropriate grouping mechanisms and filters, their output can be used for image segmentation. For example, if the output contains some noise, denoising methods can be used create suitable image regions. For a short comparison we included Figure 10, which shows the same pictures as Figure 5 and Figure 4, but this time with applying k-means clustering and the mean-shift algorithm. There are many implementations and different variants of k-means clustering and mean-shift algorithms available. Computation times vary from a few seconds to a few hundred seconds depending on the algorithm used and do also depend heavily on the image size. The k-means clustering algorithm can be implemented in such a way that the computation time is very low. Unfortunately there is no guarantee that the algorithm converges to its global optimum, which is in general the desired result. [26] proposes a fast global k-means algorithm, which generates new initial solutions consecutively by running basic k-means clustering in each iteration. While there is no guarantee that the algorithm reaches the global optimum, it is fast and delivers in general better results than basic versions of the k-means algorithm, because it has a higher probability of reaching the global optimum of the clustering problem. Mean-shift algorithms usually have longer computation times and the number of clusters is in general determined

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015

BSDS500 Covering PRI VI ODS OIS Best ODS OIS ODS OIS Human 0.72 0.72 0.88 0.88 1.17 1.17 gPb-owt-ucm 0.59 0.65 0.74 0.83 0.86 1.69 1.48 ours 0.47 0.59 0.74 0.77 0.83 2.32 1.78 Mean Shift 0.54 0.58 0.66 0.79 0.81 1.85 1.64 Felz-Hutt 0.52 0.57 0.69 0.80 0.82 2.21 1.87 Canny-owt-ucm 0.49 0.55 0.66 0.79 0.83 2.19 1.89 NCuts 0.45 0.53 0.67 0.78 0.80 2.23 1.89 Quad-Tree 0.32 0.37 0.46 0.73 0.74 2.46 2.32 TABLE II R ESULTS OF THE B ERKELEY IMAGE SEGMENTATION BENCHMARK . F OR EACH SEGMENTATION METHOD , THE LEFTMOST THREE COLUMNS REPORT THE SCORE OF THE COVERING OF GROUND - TRUTH SEGMENTS ACCORDING TO OPTIMAL DATASET SCALE (ODS), OPTIMAL IMAGE SCALE (OIS), OR B EST COVERING CRITERIA . T HE

11

input

gPb-owt-ucm

ours

RIGHTMOST FOUR COLUMNS COMPARE THE SEGMENTATION METHODS AGAINST GROUND - TRUTH USING THE P ROBABILISTIC R AND I NDEX (PRI) AND VARIATION OF I NFORMATION (VI) BENCHMARKS , RESPECTIVELY. F OR DETAILS SEE [22].

automatically, which makes it difficult to obtain a desired number of clusters. Amongst others, [27] shows a reasonably fast and reliable solution for mean-shift algorithms, which still needs a few seconds running on larger images. Quick-shift, a method similar in concept to the mean-shift algorithm yields faster solutions, though. [25] proposes an implementation of the quick-shift algorithm which runs on the GPU. On our setup it achieved average computation times of 300 milliseconds for images with a resolution of approximately 0.25 megapixels. Figure 10 shows that under certain circumstances, and with additional accumulation of regions (grouping mechanism), the quick-shift algorithm can obtain good image segmentation results.

Fig. 11. Results of our algorithm on images of the Berkeley segmentation dataset: The input image (left), the output of the gPb-owt-ucm (Berkeley) algorithm of paper [22] (middle) and the output of our algorithm (right) with automated parameter choices. The shown output serves as the input to the Berkeley image segmentation benchmark. The average computation time for the gPb-owt-ucm algorithm, using code supplied by the authors, is approximately 150 seconds, while our approach took an average of approximately 1 second per image on the dataset.

F. Berkeley Benchmark Results In the following we show how to adapt our algorithm to hierarchical segmentation and apply the most recent Berkeley Image Segmentation Benchmark BSDS500 [22]. This benchmark provides 500 images partitioned into a 200 image training set with manually created ground truth images, a validation dataset consisting of 100 images and a 200 image test set. Paper [22] also introduces a state-of-the-art image segmentation algorithm (gPb-owt-ucm) and the segmentation benchmark is heavily based on specific details and implementations of this algorithm. Ground truth data is obtained from segmentations done manually by multiple humans. It is important to note that the benchmark does neither take the computation time nor the coherency of the detected regions into account. It also does not require the contours to be closed. All of these are important characteristics of image segmentation algorithms which are used beyond pure object recognition and are major benefits of our approach. Nevertheless, we were able to rank among the top scoring algorithms, mentioned in [22] and Table II, and our algorithm is the fastest among those. For example, the best scoring gPbowt-ucm algorithm from the Berkeley group takes an average

of 150 seconds for each picture in the test dataset and, while our algorithm needs about 1 second to compute the final result. As the output of our algorithm is parameter-dependent we decided to compute optimal parameters on the supplied training data set and use those parameters on the final benchmark test dataset. We, however, consider the optimal choice of parameters as future work which will improve the results even further. All computations for this section were carried out using our algorithm with the semi-color model (25). To transform the output of our algorithm into a compatible format for the benchmark, we separated individual coherent regions from our segmentation results and built a tree-like structure, where separated regions are merged successively. This structure served as the input for the benchmark. The detailed results of the benchmark can be seen in Table II. Some results of our algorithm and Berkeleys own gPb-owt-ucmalgorithm on images of the Berkeley test dataset are shown in Figure 11. In general applying benchmarks in image segmentation is a nontrivial task and results are always only objective to

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015

a certain degree. The authors of the Berkeley benchmark state in [28] that ”it is considerably easier to quantify the performance of computer vision algorithms at recognition than at segmentation” and that some researchers argue that ”segmentation or grouping performance can be evaluated only in the context of a task such as object recognition”. In addition to that, there are certain important facts to keep in mind: ● The benchmark does not take the computation time into account, so we can only compare our results in terms of quality. ● As our focus is on computation time, we did not construct any more sophisticated segmentation processes that might involve running the algorithm multiple times on one image. Those approaches might improve the quality and accuracy of the segmentation considerably, but contradict the focus of this paper. ● The benchmark incorporates different kind of measures to compare a certain segmentation of an image to the ground truth segmentation. The most important one being covering-measure, i.e. measuring of region overlap. ● Measurement takes place on a local per pixel level, so the coherency of regions is not taken into accout. This also explains why algorithms like mean-shift are able to obtain a rather high score. ● A segmentations contour must not be closed to be applicable for the benchmark. ● The benchmark favors a structure, where segments with the same label belong to one coherent region in the image. This is in contrast to our labeling scheme, which we consider a feature of our approach. ● The output of the segmentation algorithm presented in this paper depends considerably on the choices of the parameters available, especially λ2 and λ3 . ● The segmentation algorithm presented in [22], which the benchmark is built upon, has long computation times, taking 100 seconds or longer for average sized images of 0.15 megapixels. ● In general, we consider the choice of optimal parameters for our algorithm as future work. The same holds true for additional user input and the choice of pixel dependent parameters, like λi (x) (i = 1,...,3). This will improve the quality further. The rather strong dependency on the parameters also shows in lower ODS (optimal dataset scale) values for our approach. V. C ONCLUSION In this paper we have introduced a very fast method for unsupervised, gradient based image segmentation. We have evaluated our approach on different test cases, and achieve high-quality results at a fraction of the usually required computation time. This work is intended as a proof of concept of the ideas behind the modeling and transformation process presented in Section II. We feel that these ideas can be applied to a more general class of optimization problems that tries to minimize gradient norms specifically excluding boundaries. While the presented approach works autonomously, additional user input can be incorporated to increase the visual quality of

12

the segmentation. For example, it would be an interesting and useful extension to let the user influence the weights for each energy term in the objective function locally at each pixel. Because of the short computation times it would also be interesting to see real time interaction of the user with the algorithm via some sort of GUI. The presented approach is independent of the domain dimension, so for future work we plan to consider 3D image segmentation and video segmentation. From a mathematical standpoint we would like to investigate the justification of the convex relaxation process used to obtain a convex-concave saddle point formulation in Section II and the detailed influence of the approximative projection scheme presented in Section III-C on convergence of the algorithm. In addition, the full-color model presented in Section II-D could be extended to have connected restriction sets (similar to the work of Strekalovskiy et. al. [8] on the Mumford-Shah functional), which would further improve quality. A PPENDIX A D ISTRIBUTIONAL D ERIVATIVE OF F UNCTIONS OF B OUNDED VARIATION Functions of bounded variation (BV) have many interesting analytical properties. The one most relevant for this paper is the decomposition of their distributional derivative. Let u ∈ BV, then according to [29] the distributional derivative Du has the following Lebesgue-Nikodym decomposition: Du = ∇u+Cu+(u+ −u− )νu dHn−1 ⌞Ju

(42)

In the above decomposition ∇u denotes the absolutely continous part with respect to Lebesgue measure on Ω of the measure Du. Cu indicates the so-called Cantor part of the measure Du. Both parts, ∇u and Cu, do not charge any dHn−1 finite sets, with dHn−1 denoting n−1-dimensional Hausdorff measure. Only the last part, i.e. (u+ −u− )νu dHn−1 ⌞Ju , charges dHn−1 finite sets, with Ju being the jump set of the function u, νu being the normal vector to this jump set pointing in the ascending direction, and u+ > u− being left and right limit values of the function u at the jumping point. For binary functions - and in general piecewise constant functions - the decomposition (42) consists for obvious reasons only of the jump part, i.e. for the binary function v ∶ Ω → {0,1} we have: Dv = (v + −v − )νv dHn−1 ⌞Jv

(43)

A rather elaborate overview of BV functions and the above decomposition can be found in [12]. ACKNOWLEDGMENT The research leading to these results has received funding from the European Unions Seventh Framework Programme FP7/2007-2013 under grant agreement no. 256941, Reality CG. R EFERENCES [1] M. Wertheimer, Laws of organization in perceptual forms. Harcourt, Brace & Jovanovitch, 1938. [2] K. W. T. Jr, B. L. Bhaduri, E. A. Bright, A. Cheriyadat, T. P. Karnowski, P. J. Palathingal, T. E. Potok, and J. R. Price, “Automated Feature Generation in Large-Scale Geospatial Libraries for Content-Based Indexing,” 2006.

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015

[3] A. Levin, D. Lischinski, and Y. Weiss, “A closed-form solution to natural image matting,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 228–242, 2008. [4] I. Kokkinos and P. Maragos, “Synergy between object recognition and image segmentation using the expectation-maximization algorithm,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 8, pp. 1486–1501, 2009. [5] D. Mumford and J. Shah, “Optimal approximations by piecewise smooth functions and associated variational problems,” Comm. Pure Appl. Math., vol. 42, no. 5, pp. 577–685, 1989. [6] G. Alberti, G. Bouchitt, and G. Dal Maso, “The calibration method for the mumford-shah functional and free-discontinuity problems,” Calculus of Variations and Partial Differential Equations, vol. 16, no. 3, pp. 299–333, 2003. [7] T. Pock, D. Cremers, H. Bischof, and A. Chambolle, “An algorithm for minimizing the mumford-shah functional,” in Computer Vision, 2009 IEEE 12th International Conference on, pp. 1133–1140, Sept 2009. [8] E. Strekalovskiy, A. Chambolle, and D. Cremers, “A convex representation for the vectorial mumford-shah functional,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (Providence, Rhode Island), June 2012. [9] L. A. Vese and T. F. Chan, “A multiphase level set framework for image segmentation using the mumford and shah model,” Int. J. Comput. Vision, vol. 50, no. 3, pp. 271–293, 2002. [10] T. F. Chan and L. A. Vese, “Active contours without edges.,” IEEE Transactions on Image Processing, vol. 10, no. 2, pp. 266–277, 2001. [11] V. Caselles, R. Kimmel, and G. Sapiro, “Geodesic active contours.,” in ICCV, pp. 694–699, 1995. [12] A. Chambolle, D. Cremers, and T. Pock, “A convex approach to minimal partitions,” SIAM Journal on Imaging Sciences, vol. 5, no. 4, pp. 1113–1158, 2012. [13] J. Lellmann, J. Kappes, J. Yuan, F. Becker, and C. Schn¨orr, “Convex multi-class image labeling by simplex-constrained total variation,” in Proceedings of the Second International Conference on Scale Space and Variational Methods in Computer Vision, SSVM ’09, (Berlin, Heidelberg), pp. 150–162, Springer-Verlag, 2009. [14] E. Strekalovskiy and D. Cremers, “Generalized ordering constraints for multilabel optimization,” in Proceedings of the 2011 International Conference on Computer Vision, ICCV ’11, (Washington, DC, USA), pp. 2619–2626, IEEE Computer Society, 2011. [15] T. Pock, A. Chambolle, H. Bischof, and D. Cremers, “A convex relaxation approach for computing minimal partitions,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (Miami, Florida), 2009. [16] D. Cremers, T. Pock, K. Kolev, and A. Chambolle, “Convex relaxation techniques for segmentation, stereo and multiview reconstruction,” in Markov Random Fields for Vision and Image Processing, MIT Press, 2011. [17] A. Chambolle and T. Pock, “A first-order primal-dual algorithm for convex problems with applications to imaging,” J. Math. Imaging Vis., vol. 40, no. 1, pp. 120–145, 2011. [18] T. Pock and A. Chambolle, “Diagonal preconditioning for first order primal-dual algorithms in convex optimization,” in Computer Vision (ICCV), 2011 IEEE International Conference on, pp. 1762–1769, Nov 2011. [19] D. Lorenz and T. Pock, “An accelerated forward-backward algorithm for monotone inclusions,” 2014, 1403.3522. [20] R. Rockafellar, Convex Analysis. Convex Analysis, Princeton University Press, 1997. [21] A. Chambolle, “An algorithm for total variation minimization and applications,” J. Math. Imaging Vis., vol. 20, pp. 89–97, Jan. 2004. [22] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, “Contour detection and hierarchical image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, pp. 898–916, May 2011. [23] B.-W. Hong, Z. Lu, and G. Sundaramoorthi, “A new model and simple algorithms for multi-label mumford-shah problems,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8, 2013. [24] Z. Farbman, R. Fattal, D. Lischinski, and R. Szeliski, “Edge-preserving decompositions for multi-scale tone and detail manipulation,” in ACM SIGGRAPH 2008 Papers, SIGGRAPH ’08, (New York, NY, USA), pp. 67:1–67:10, ACM, 2008. [25] B. Fulkerson and S. Soatto, “Really quick shift: Image segmentation on a GPU,” in Trends and Topics in Computer Vision - ECCV 2010 Workshops, Heraklion, Crete, Greece, September 10-11, 2010, Revised Selected Papers, Part II, pp. 350–358, 2010.

13

[26] J. Xie and S. Jiang, “A simple and fast algorithm for global k-means clustering,” Education Technology and Computer Science, International Workshop on, vol. 2, pp. 36–40, 2010. [27] P. Wang, D. Lee, A. G. Gray, and J. M. Rehg, “Fast mean shift with accurate and stable convergence.,” in AISTATS (M. Meila and X. Shen, eds.), vol. 2 of JMLR Proceedings, pp. 604–611, JMLR.org, 2007. [28] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in in Proc. 8th Intl Conf. Computer Vision, pp. 416–423, 2001. [29] H. Attouch, G. Buttazzo, and G. Michaille, Variational Analysis in Sobolev and BV Spaces: Applications to PDEs and Optimization. MPS-SIAM Series on Optimization, Society for Industrial and Applied Mathematics, 2006. Benjamin Hell received his Masters degree in Industrial Mathematics in 2011 from the University of Bayreuth, Germany. He is currently pursuing his PhD in Computer Science at the Computer Graphics Lab at the TU Braunschweig. His research interests include optimization and geometry.

Marc Kassubeck received his Bachelor degree in Computer Science in 2013 from TU Braunschweig, Germany. He is currently pursuing his Masters degree in Computer Science at TU Braunschweig. His research interests include image processing and applied mathematics.

Pablo Bauszat received his Diploma degree in Computer Science from TU Braunschweig, Germany, in 2011. He is currently pursuing his PhD in Computer Science at TU Braunschweig, Germany. His research interests include real-time rendering, ray tracing, light transport simulation and perceptual rendering.

Martin Eisemann received a Diploma degree in Computational Visualistics from the University of Koblenz-Landau, Germany, in 2006 and his PhD degree in Computer Graphics from the TU Braunschweig, Germany, and received the best student paper award at the annual conference of the European Association for Computer Graphics (Eurographics) in 2008. 2011 he became Postdoctoral Researcher at the Computer Graphics Lab at the TU Braunschweig. In 2014 he spent one semester as Postdoctoral Researcher at the TU Delft. Since 2015 he is professor at the TH K¨oln. His main research interests include image- and video-based rendering and editing, visual analytics, and realistic and interactive rendering. Marcus Magnor is full professor and head of the Computer Graphics Lab at TU Braunschweig. He holds a Diploma degree in Physics (1997) and a PhD in Electrical Engineering (2000). After his postgraduate time as Research Associate in the Graphics Lab at Stanford University, he established his own research group at the Max-Planck-Institut Informatik in Saarbr¨ucken. He completed his habilitation and received the venia legendi in Computer Science from Saarland University in 2005. In 2009, he spent one semester as Fulbright scholar and Visiting Associate Professor at the University of New Mexico. His research interests meander along the visual information processing pipeline, from image formation, acquisition, and analysis to image synthesis, display, perception, and cognition.