A Weighted Mean Shift, Normalized Cuts Initialized Color Gradient ...

A Weighted Mean Shift, Normalized Cuts Initialized Color Gradient Based Geodesic Active Contour Model: Applications to Histopathology Image Segmentation

Jun Xu (a), Andrew Janowczyk (a, b), Sharat Chandran (b), Anant Madabhushi (a)

(a) Department of Biomedical Engineering, Rutgers University, USA; (b) Department of Computer Science and Engineering, Indian Institute of Technology (IIT) Bombay, Mumbai, India

ABSTRACT

While geodesic active contours (GAC) have become very popular tools for image segmentation, they are sensitive to model initialization. To obtain an accurate segmentation, the model typically needs to be initialized very close to the true object boundary. Apart from accuracy, automated initialization of the objects of interest is an important prerequisite for running the active contour model on very large images (such as those found in digitized histopathology). A second limitation of the GAC model is that the edge-detector function is based on gray scale gradients; color images are typically converted to gray scale before the gradient is computed. For color images, however, the gray scale gradient results in broken edges and weak boundaries, since the other channels are not exploited in determining the gradient. In this paper we present a new geodesic active contour model that is driven by an accurate and rapid object initialization scheme: weighted mean shift normalized cuts (WNCut). WNCut draws its strength from the integration of two powerful segmentation strategies, mean shift clustering and normalized cuts. WNCut involves first defining a color swatch (typically a few pixels) from the object of interest. A multi-scale mean shift coupled normalized cuts algorithm then rapidly yields an initial accurate detection of all objects in the scene corresponding to the colors in the swatch. This detection result provides the initial boundary for the GAC model. The edge-detector function of the GAC model employs a local structure tensor based color gradient, obtained by calculating the local min/max variations contributed from each color channel (e.g. R, G, B or H, S, V). Our color gradient based edge-detector function results in more prominent boundaries compared to the classical gray scale gradient based function.
We evaluate segmentation results of our new WNCut initialized color gradient based GAC (WNCut-CGAC) model against a popular region-based model (Chan & Vese) on a total of 60 digitized histopathology images. Across these 60 images, the WNCut-CGAC model yielded an average overlap, sensitivity, specificity, and positive predictive value of 73%, 83%, 97%, and 84%, compared to the Chan & Vese model, which had corresponding values of 64%, 75%, 95%, and 72%. The rapid and accurate object initialization scheme (WNCut) and the color gradient make the WNCut-CGAC scheme an ideal segmentation tool for very large color imagery.

Further author information: (Send correspondence to Jun Xu or Anant Madabhushi)
Jun Xu: E-mail: [email protected] (or [email protected]), Telephone: 732-445-4500 (ext.6261)
Anant Madabhushi: E-mail: [email protected], Telephone: 732-445-4500 (ext.6213)
Medical Imaging 2010: Image Processing, edited by Benoit M. Dawant, David R. Haynor, Proc. of SPIE Vol. 7623, 76230Y · © 2010 SPIE · CCC code: 1605-7422/10/$18 · doi: 10.1117/12.845602
Proc. of SPIE Vol. 7623 76230Y-1 Downloaded from SPIE Digital Library on 12 Apr 2010 to 204.52.215.1. Terms of Use: http://spiedl.org/terms

1. INTRODUCTION

Active contour (AC) models have emerged as popular segmentation tools for separating the objects/structures of interest from the background via continuously deformable curves [1]. Most active contour methods deform in order to delineate the boundaries of the desired objects in the image by minimizing an energy functional. These active contour models are able to accurately capture the shape of the object and hence enable extraction of higher-level shape and morphological features. In several diseases, such as prostate [2–4] and breast cancer [2], shape and morphological attributes constitute important cues reflecting the aggressiveness of the disease. With the recent advent and cost-effectiveness of whole-slide digital scanners, tissue histopathology slides can now be digitized and stored in digital image form [2, 3]. Digital pathology makes computerized quantitative analysis of histopathology imagery possible. In the context of prostate cancer, gland morphology is known to be highly correlated with Gleason grade, which in turn reflects the degree of malignancy [4]; lower Gleason grades correspond to less aggressive disease and higher Gleason grades to a higher degree of invasiveness. The size and shape of glands found in low Gleason grade patterns tend to be less variable than those in the higher Gleason grade patterns, which tend to manifest glands that are often angular, fused, or characterized by a complete absence of glandular lumen [4]. The Bloom Richardson (BR) scheme [5] is similarly a popular grading scheme for breast cancer. One of the criteria for the BR scheme is the morphology of histological structures (e.g. nuclei and glands). Grading of prostate and breast cancer histopathology is subject to both interobserver and intraobserver variability [2–5]. Hence there is a clear need for accurate and reproducible quantitative techniques for computerized grading of cancer histopathology. An important prerequisite to such a computerized grading scheme, however, is the ability to accurately and efficiently segment the histological structures (e.g. glands and nuclei) of interest. While active contours are a good candidate for this task, most deformable model based approaches are limited in their ability to segment very large color images. The reasons for this are twofold.

Firstly, in most boundary based active contour schemes, the energy function depends on the gray scale intensity gradient [6]. In other words, most schemes convert color images into an equivalent gray scale representation and hence do not exploit the color tensor information present in these images [6]. A second limitation of most active contour schemes is their inability to handle very large images without careful model initialization [7]. A prostate cancer needle core biopsy digitized at 40x magnification can result in an image that is > 2 GB in size [2, 3]. Hence there is a need to rapidly identify the target areas of interest within these very large images in order to initialize and run most active contour models. In this paper, we present a geodesic active contour scheme that employs (a) an accurate, efficient, and minimally interactive model initialization scheme called weighted mean shift based normalized cuts (WNCut) [8], and (b) a local structure tensor based color gradient, obtained by calculating the local min/max variations contributed from each color channel (e.g. R, G, B or H, S, V), resulting in stronger object boundaries compared to the gray scale gradient. Figure 1 shows a flowchart of the WNCut initialized color gradient based GAC (WNCut-CGAC) model.

Figure 1. Flowchart illustrating the modules comprising the WNCut-CGAC model. Weighted mean shift normalized cut (WNCut) initialization: select color swatch -> frequency weighted mean shift clustering -> normalized cut. The resulting initial boundary is passed to the color gradient based GAC model, which yields the final boundary.

The rest of this paper is organized as follows: In Section 2, we discuss previous related work and our novel contributions. In Section 3, we provide the details of our new WNCut-CGAC model. In Section 4, we describe the datasets and experimental design followed by results of qualitative and quantitative evaluation. Concluding remarks and future research directions are presented in Section 5.

2. PREVIOUS RELATED WORK

Based on the type of image information used to drive the model, AC models may be categorized as either (a) boundary-based [6] or (b) region-based [9]. The geodesic active contour (GAC) model proposed by Caselles et al. [6] is an important boundary-based AC model. Beginning with a user specified initial boundary, boundary-based GAC models utilize a positive, decreasing function of the gradient as the stopping criterion. This attracts the contour towards the edges of the desired objects. The edge-detector function is defined as $g(f(c)) = \frac{1}{1 + s(f(c))}$, where $s(f(c))$ is the magnitude of the gradient at every pixel in the image. The minimum of the function $g(f(c))$ is achieved as the gradient magnitude $s(f(c))$ approaches its maximal value at the image edges. When this happens, the curve stops evolving right on the edge of the desired object. One limitation of boundary-based GAC models is that they are highly dependent on the edge-detector function. Most boundary based active contour models [6, 10, 11] define the function $g(f(c))$ in terms of the gradient of the gray scale image. For color images, the common approach is to convert the image into its corresponding gray scale representation by eliminating 2 of the 3 color channels (e.g. removing the hue and saturation channels while retaining the luminance channel). The directional gradient is then calculated from the gray scale image obtained from the single remaining channel. However, this conversion results in broken edges and weak boundaries due to the loss of information from the other color channels. This limitation of GAC models in employing gray scale gradients can be appreciated in Figure 3(c). Broken edges and weak boundaries adversely affect the curve's evolution, for example by causing it to miss the boundaries of objects whose gradients are not large enough.
The other limitation is that the model, in general, lacks robustness against the position of the initial boundary [9]. Recently, region-based active contour (RAC) models have been proposed to address some of the limitations of GAC models. The region-based model essentially employs statistical information derived from different regions (foreground and background) to drive the active contour model. An early region-based model called the region competition model was proposed in [12], where the generalized Bayes and minimum description length (MDL) criteria were utilized to drive the AC model. Inspired by the Mumford-Shah functional [13], the AC model without edges (CV model) and RAC models were proposed in [9, 14] and [15], respectively. More recently, Rousson et al. proposed a generalized CV model [16]. The parametric distribution of each region is approximated by a Gaussian, using the mean and variance of the intensities in each region. Chan & Vese's (CV) region-based active contour model [9] is a very popular region-based scheme that utilizes the average image intensity of the foreground and the background to drive the model. Compared to the classical boundary-based AC model, the CV model is independent of the edge-detector function and does not require precise initialization. However, the CV model has its limitations. For instance, the model may lead to inaccurate boundaries because the boundary information is ignored. Moreover, the model assumes that the image has two regions, which is sometimes infeasible. If the background of the image is too complicated, as in digitized histopathology (see Figure 4(a)), the CV model may not segment the region of interest. In fact, even in scenarios where the background is not very complicated (see Figure 2), the CV model may latch onto the incorrect boundary. Figure 2(a) shows a CT image of the human chest. In this application the objective was to segment the lungs. In this experiment, the CV model was manually initialized within the two pleural cavities. The CV model ends up segmenting the bony structures that lie outside the lung cavity. The resulting segmentation is shown via red contours in Figure 2(b). On the other hand, a GAC model with the same initialization as the Chan & Vese model is able to accurately latch onto the lung boundary [6] (Figure 2(c)).

Figure 2. (a) CT image of the chest, and corresponding lung segmentation results for the (b) Chan & Vese and (c) traditional GAC models.

While hybrid AC models have been proposed to combine boundary-based and region-based models and so overcome the initialization problems associated with both, they also sometimes fail without accurate initialization. Without accurate initialization, they are also constrained, like most boundary and region based models, in that they are unable to segment multiple objects in very large images. This may explain why, until now, there have been relatively few attempts to apply shape based segmentation tools to digitized histopathology imagery [2, 3]. In [17], an expectation-maximization (EM) algorithm based method was utilized for automatically detecting the centers of lymphocytes in breast cancer histopathology images. The initial contours for the evolution of the partial differential equation (PDE, or curve evolution function) were defined from these detected centers. In [18], Naik et al. presented a level set based method for gland segmentation from digitized histopathology. The level set was initialized by the likelihood scenes generated by a Bayesian classifier. In [19], we successfully employed a GAC model, initialized via an EM model, for breast lesion segmentation on dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). None of the initialization schemes described above, however, is able to address the demands of on-the-fly, rapid, and efficient segmentation of a specific target of interest on very large images.
In this paper we present a boundary based GAC model that employs a weighted mean shift based normalized cuts (WNCut) initialization scheme [8] for rapid, minimally supervised specification of the target of interest. By simply specifying a few pixels from the object of interest, the WNCut scheme can be used to rapidly identify all related and similar objects within the image. By specifying representative pixels from a different object, WNCut could similarly be used to initialize the GAC model to segment a different target. The WNCut scheme uses a hierarchically represented data structure that bridges the mean shift clustering [20] and normalized cuts [21] algorithms. This allows WNCut to efficiently traverse a pyramid of the input image at various color resolutions, efficiently and accurately pre-segmenting the object class of interest. A second important attribute of our new GAC scheme is that it employs a color gradient function to drive the boundary based AC model. For simplicity, we use the abbreviation WNCut-CGAC for our weighted mean shift, normalized cuts initialized color gradient based geodesic active contour model in the remainder of this paper.

3. METHODOLOGY

Table 1. Description of notation.

Symbol              Description
$\mathcal{C}$       2D image scene
$C$                 2D Cartesian grid of pixels $c = (x, y)$
$f(c)$              function that assigns intensity values to pixel $c$
$w_{k,j}$           the $j$-th element of weight vector $w_k$ at level $k$
$\Omega$            bounded open set in $\mathbb{R}^2$
$C$ (curve)         the zero level set $C = \{c \in \Omega : \phi(c) = 0\}$
$\delta(\phi)$      Delta function: $\delta(\phi) = +\infty$ if $\phi(c) = 0$; $\delta(\phi) = 0$ if $\phi(c) \neq 0$
$H(\phi)$           Heaviside function: $H(\phi) = 1$ if $\phi(c) \geq 0$; $H(\phi) = 0$ if $\phi(c) < 0$
$\Omega_f$          foreground region $\Omega_f = \{c \in \Omega : \phi(c) > 0\}$
$\Omega_b$          background region $\Omega_b = \{c \in \Omega : \phi(c) < 0\}$
$\phi(t; c)$        the level set function
$A(\cdot)$          the set of pixels within boundary

3.1 Weighted Mean Shift based Normalized Cuts initialization scheme

The Weighted Mean Shift based Normalized Cuts (WNCut) scheme was originally presented in [8] for rapidly and accurately segmenting the object class of interest. The scheme was found to perform comparably to supervised classification algorithms such as the Probabilistic Boosting Tree (PBT) [22]. The strength of WNCut derives from combining a novel weighted mean shift clustering [20] with the Normalized Cuts algorithm [21]. The mean shift algorithm detects modes in data using a density gradient estimator. By solving for when the density gradient is zero and the Hessian is negative semi-definite, the local maxima can be identified. Normalized cuts (NCuts) is a graph partitioning method [23]. The hierarchical pyramid created by mean shift, corresponding to various levels of color resolution, serves as the initial input to the NCuts algorithm. NCuts takes a connected graph with vertices and edges and partitions the vertices into disjoint groups. By setting the vertices to the set of color values, and having the edges represent the similarity (or affinity) between the color values, the vertices can be separated into distinct groups, each comprising similar colors. By operating in the color domain, as opposed to the spatial domain (on pixels), the scheme is very fast. The scheme is outlined in the following three steps.

1. User selects the domain swatch

A user manually defines a color swatch from the color function $f$ such that $S_1 = \{f_{1,\alpha} \mid \alpha \in \{1, \ldots, N\}\}$ is a selection of color values that are representative of the object of interest, taken from a representative image. Hence the swatch is a set of values that is selective of the object of interest.

2. Weighted mean-shift clustering on a multi-resolution color pyramid

In this step, the weighted mean-shift algorithm is employed to generate multiple levels of a pyramidal scene representation $\mathcal{C}_k = (C, f_k)$, where $k \in \{1, \ldots, K\}$ indexes the $K$ levels of the color pyramid, one produced at each iteration, and $f_k$ is the function that assigns a color value to a pixel at level $k$. At each level $k$, the values in $F_k$ are considered unique under the constraint that any two values are equivalent if $\|f_{k,i} - f_{k,j}\| \leq \varepsilon$, where $\varepsilon$ is a pre-defined similarity constraint. As a result, the vector $\hat{F}_k$ can be constructed from $f_k$, where $\hat{F}_k \subset F_k$ and $\hat{F}_k$ is the set of only the unique values present in $F_k$. The weight vector $w_k = \{w_{k,1}, w_{k,2}, \ldots, w_{k,M_k}\}$ associated with $\hat{F}_k$ is computed as

$$ w_{k,j} = \sum_{i=1,\; f_{k,i} = \hat{f}_{k,j}}^{|\hat{F}_k|} w_{k-1,i} \tag{1} $$

where $j \in \{1, \ldots, M_k\}$. Intuitively, Equation (1) sums the weights from the previous level into the new unique values that resulted from the next iteration of convergence. As a result, $w_{k,j}$ contains a count of the number of original pixels that have migrated to $\hat{f}_{k,j}$ through mean shifting. Here

$$ |w_k| = |\hat{F}_k| = M_k \tag{2} $$

and

$$ \sum_{i=1}^{M_k} w_{k,i} = N. \tag{3} $$

Then, based on the weight vector (1), the fixed point iteration update becomes

$$ f_{k+1,j} \leftarrow \frac{\sum_{i=1}^{M_k} w_{k,i}\, \hat{f}_{k,j}\, G(\hat{f}_{k,j} - \hat{f}_{k,i})}{\sum_{i=1}^{M_k} G(\hat{f}_{k,j} - \hat{f}_{k,i})} \tag{4} $$

where the Gaussian function $G$ with bandwidth parameter $\sigma$ is defined as

$$ G(\hat{f}_{k,j} - \hat{f}_{k,i}) = \exp\left(-\frac{\|\hat{f}_{k,j} - \hat{f}_{k,i}\|^2}{\sigma^2}\right). \tag{5} $$

We use the Gaussian function to compute the kernel density estimate at color data point $\hat{f}_{k,j}$. On convergence, the shifted image is a naturally clustered version of the original image that uses many fewer colors.

3. Normalized cuts on weighted mean shift reduced color space

Normalized Cuts [21] is applied to the small number of unique values in the bottom level $\hat{F}_K$. These values are easily analyzed by Normalized Cuts to remove all miscellaneous information that is not considered similar to the user selected domain swatch obtained in Step 1. The resulting segmentation makes for an excellent initialization state for the active contour models.
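The weighted mean shift of Step 2 can be illustrated with a short NumPy sketch. This is not the authors' implementation: the function names, the greedy merging of values closer than `eps`, and the default parameters are all assumptions made for illustration. Equation (4) as printed has `f̂_{k,j}` in the numerator and no weights in the denominator; the sketch instead uses the conventional weighted mean (weights in both numerator and denominator, averaging over `f̂_{k,i}`), which is also an assumption.

```python
import numpy as np


def unique_with_weights(values, weights, eps):
    """Merge values closer than eps and sum their weights (cf. Eq. 1)."""
    uniq, w = [], []
    for v, wt in zip(values, weights):
        for j, u in enumerate(uniq):
            if np.linalg.norm(v - u) <= eps:
                w[j] += wt  # the merged value's weight migrates to its representative
                break
        else:
            uniq.append(v)
            w.append(wt)
    return np.array(uniq, dtype=float), np.array(w, dtype=float)


def weighted_mean_shift(colors, sigma=0.2, eps=1e-2, levels=10):
    """Run `levels` rounds of weighted mean shift on an (N, d) array of colors."""
    colors = np.asarray(colors, dtype=float)
    f_hat, w = unique_with_weights(colors, np.ones(len(colors)), eps)
    for _ in range(levels):
        # Pairwise Gaussian kernel between the unique values (Eq. 5).
        diff = f_hat[:, None, :] - f_hat[None, :, :]
        G = np.exp(-np.sum(diff ** 2, axis=2) / sigma ** 2)
        # Assumed conventional weighted update, not the printed form of Eq. (4).
        wG = G * w[None, :]
        shifted = (wG @ f_hat) / wG.sum(axis=1, keepdims=True)
        f_hat, w = unique_with_weights(shifted, w, eps)
    return f_hat, w
```

Because each level operates only on the unique color values and their counts, the cost per level depends on the number of distinct colors rather than on the number of pixels, which is the property the text credits for the speed of the scheme.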


3.2 Geodesic active contour model

3.2.1 Energy functional

Instead of defining the energy functional on the space of contours as in [6, 10, 24], we define the energy functional on the space of a level set function. Assume the image plane $\Omega \subset \mathbb{R}^2$ is partitioned into 2 non-overlapping regions by a zero level set function $\phi$. The foreground region $\Omega_f$, background region $\Omega_b$, and the curve $C$ are defined in Table 1. They are related by

$$ \Omega = \Omega_f \cup \Omega_b \cup C, \tag{6} $$

and

$$ \Omega_f \cap \Omega_b = \emptyset, \tag{7} $$

where $\Omega_f$ and $\Omega_b$ represent the sets of image locations corresponding to the foreground object and background regions, respectively. The optimal partition of the image plane $\Omega$ by a zero level set function $\phi$ can be obtained by minimizing the energy functional

$$ E(\phi) = E_1(\phi) + E_2(\phi) = \alpha \int_{C} g(f(c))\, dc + \beta \int_{\Omega_f} g(f(c))\, dc. \tag{8} $$

In Equation (8), the first term $E_1(\phi)$ is the energy functional of a traditional GAC model, obtained as the integral of an edge detector function $g(f(c))$ for each pixel $c$ over the curve $C$. This external image force pushes or attracts the curve $C$ to the high gradient regions. Minimization of this energy term is equivalent to minimizing the weighted Euclidean length of the curve $C$. The second term $E_2(\phi)$ is inspired by the balloon force proposed in [10] and is an area minimization term. The inflation force, like a balloon, stops the curve $C$ when the edge of the object is strong, or allows it to pass through if the edge is too weak with respect to the inflation force [10]. Minimization of this term is equivalent to minimizing the weighted foreground area enclosed by the curve $C$. Note that the edge detector function in the traditional GAC model and the balloon force are based on the gray level gradient of the image, such as the Canny-Deriche edge extractor in [10]. In this paper, the edge-detector function is based on the color gradient. $g(f(c))$ is the color gradient based edge-detector function, defined as

$$ g(f(c)) = \frac{1}{1 + s(f(c))} \tag{9} $$

where $s(f(c))$ is the local structure tensor based color gradient, which will be defined in Section 3.3. By employing the Heaviside function $H(\phi)$, we can unify the two integrals in Equation (8) as [9, 25]

$$ E(\phi) = \alpha \int_{\Omega} g(f(c))\, \|\nabla H(\phi)\|\, dc + \beta \int_{\Omega} g(f(c))\, H(\phi)\, dc, \tag{10} $$

where $c \in \Omega$. Using the fact that $\|\nabla H(\phi)\| = \delta(\phi(f(c)))\, \|\nabla \phi\|$ [9, 26], we finally obtain the energy functional

$$ E(\phi) = \alpha \int_{\Omega} g(f(c))\, \delta(\phi(f(c)))\, \|\nabla \phi\|\, dc + \beta \int_{\Omega} g(f(c))\, H(\phi)\, dc. \tag{11} $$

3.2.2 Curve evolution function of GAC model

Based on the theory of the calculus of variations [27], the curve evolution function can be derived from the level set framework by minimizing the energy functional (11). The curve evolution function is defined by the following partial differential equation (PDE):

$$ \begin{cases} \dfrac{\partial \phi}{\partial t} = \delta(\phi) \left\{ \alpha\, \mathrm{div}\!\left[ g(f(c)) \dfrac{\nabla \phi}{\|\nabla \phi\|} \right] - \beta\, g(f(c)) \right\}, \\ \phi(0, c) = \phi_0(c), \end{cases} \tag{12} $$

where $\alpha$ and $\beta$ are positive constant parameters, $\delta(\phi)$ is the Delta function (see Table 1), and $\phi_0(c)$ is the initial curve of the evolution function. Note that this initialization ($\phi_0$) is obtained via the WNCut segmentation result (see Section 3.1).
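To make Equation (12) concrete, the following NumPy sketch performs one explicit Euler step of the evolution on a pixel grid. It is an illustration under stated assumptions rather than the scheme used in the paper: the smoothed delta function, the step size `dt`, the regularizer `tiny`, and the function names are hypothetical, and no reinitialization or stability control is included.

```python
import numpy as np


def smoothed_delta(phi, eps=1.5):
    """Smooth stand-in for the Dirac delta (an assumption; Table 1 defines the ideal delta)."""
    return (eps / np.pi) / (eps ** 2 + phi ** 2)


def evolve_step(phi, g, alpha=1.0, beta=0.5, dt=0.1, tiny=1e-8):
    """One explicit Euler step of Eq. (12) for level set phi and edge map g."""
    phi_y, phi_x = np.gradient(phi)  # derivatives along rows (y) and columns (x)
    norm = np.sqrt(phi_x ** 2 + phi_y ** 2) + tiny
    # Vector field g * grad(phi) / |grad(phi)| and its divergence.
    vx, vy = g * phi_x / norm, g * phi_y / norm
    div = np.gradient(vx, axis=1) + np.gradient(vy, axis=0)
    return phi + dt * smoothed_delta(phi) * (alpha * div - beta * g)
```

With the sign convention of Table 1 (foreground where φ > 0), both terms lower φ where g is large, so the contour shrinks through flat regions. Where g is zero the update vanishes; on real images g = 1/(1 + s) only approaches zero at strong edges, so the contour slows sharply there rather than halting exactly.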


3.3 Color gradient based edge-detector function

Color gradient based active contour models have been proposed in [28, 29]. A major difference between the WNCut-CGAC model and the color gradient vector flow snake in [29] (where the color gradient serves as an external force to drive the snake) is that in WNCut-CGAC, the color gradient serves as the edge detector function. The color gradient function employed in WNCut-CGAC is inspired by the Cumani operator [30], a second-order differential operator for vectorial images. The Cumani operator is based on the Di Zenzo multi-valued geometry [31]. For a color image $\mathcal{C} = (C, f)$ in color space, the squared L2 norm of the differential $df$ can be written in matrix form as

$$ df^2 = \begin{bmatrix} dx \\ dy \end{bmatrix}^T \begin{bmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{bmatrix} \begin{bmatrix} dx \\ dy \end{bmatrix}, \tag{13} $$

where

$$ \begin{aligned} g_{11} &= \left(\frac{\partial f}{\partial x}\right)^T \left(\frac{\partial f}{\partial x}\right) = \left(\frac{\partial f_1}{\partial x}\right)^2 + \left(\frac{\partial f_2}{\partial x}\right)^2 + \left(\frac{\partial f_3}{\partial x}\right)^2, \\ g_{12} = g_{21} &= \left(\frac{\partial f}{\partial x}\right)^T \left(\frac{\partial f}{\partial y}\right) = \frac{\partial f_1}{\partial x} \cdot \frac{\partial f_1}{\partial y} + \frac{\partial f_2}{\partial x} \cdot \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial x} \cdot \frac{\partial f_3}{\partial y}, \\ g_{22} &= \left(\frac{\partial f}{\partial y}\right)^T \left(\frac{\partial f}{\partial y}\right) = \left(\frac{\partial f_1}{\partial y}\right)^2 + \left(\frac{\partial f_2}{\partial y}\right)^2 + \left(\frac{\partial f_3}{\partial y}\right)^2. \end{aligned} \tag{14} $$

The matrix $[g_{ij}] = \begin{bmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{bmatrix}$ is the first fundamental form in color space and is also referred to as the local structure tensor. It locally sums the gradient contributions from each image channel. Here $f_1$, $f_2$, and $f_3$ are the intensities of each channel of $\mathcal{C}$. The maximum and minimum eigenvalues of the matrix $[g_{ij}]$ ($\lambda_+$ and $\lambda_-$) represent the extreme rates of change in the direction of their corresponding eigenvectors. $\lambda_+$ and $\lambda_-$ may be formally expressed as

$$ \lambda_{\pm} = \left( g_{11} + g_{22} \pm \sqrt{\Delta} \right) / 2, \tag{15} $$

where

$$ \Delta = (g_{11} - g_{22})^2 + 4 g_{12}^2. \tag{16} $$

The color gradient is defined as [28]

$$ s(f(c)) = \sqrt{\lambda_+ - \lambda_-}. \tag{17} $$

Figure 3. (a) Original color image of a prostate, and corresponding (b) color gradient, and (c) gray scale gradient obtained after converting the color image in (a) to its gray scale representation.

From Equations (13)-(17), it is easy to show that the gray scale gradient $\sqrt{\left(\frac{\partial f_i}{\partial x}\right)^2 + \left(\frac{\partial f_i}{\partial y}\right)^2}$, where $i \in \{1, 2, 3\}$ (widely employed for edge detection), is a special case of the color gradient $s(\cdot)$. Note that the methodology for computing the color gradient described above can easily be applied to different vectorial color representations such as RGB, HSV, Luv, and so on. Figure 3 illustrates the role and importance of the color gradient function (17) in driving the curve evolution function for an active contour model. The color gradient representation (Figure 3(b)) of the color image of digitized prostate histopathology (Figure 3(a)) results in more prominent boundaries than the corresponding gray scale gradient (Figure 3(c)).
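Equations (13)-(17) and the edge-detector function (9) translate almost directly into array operations. The sketch below is a minimal illustration, assuming an image stored as a floating-point array of shape (height, width, channels) and finite differences from `np.gradient`; the function names are hypothetical and any smoothing of the image before differentiation is omitted.

```python
import numpy as np


def color_gradient(image):
    """Structure-tensor color gradient s(f(c)) of an (H, W, channels) array, Eqs. (13)-(17)."""
    image = np.asarray(image, dtype=float)
    fy, fx = np.gradient(image, axis=(0, 1))  # per-channel derivatives along y and x
    g11 = np.sum(fx * fx, axis=2)
    g12 = np.sum(fx * fy, axis=2)
    g22 = np.sum(fy * fy, axis=2)
    delta = (g11 - g22) ** 2 + 4.0 * g12 ** 2  # Eq. (16)
    lam_plus = 0.5 * (g11 + g22 + np.sqrt(delta))   # Eq. (15)
    lam_minus = 0.5 * (g11 + g22 - np.sqrt(delta))
    return np.sqrt(np.maximum(lam_plus - lam_minus, 0.0))  # Eq. (17)


def edge_detector(image):
    """Edge-detector function g = 1 / (1 + s), Eq. (9)."""
    return 1.0 / (1.0 + color_gradient(image))
```

For a single channel, g11 = fx², g22 = fy², and g12 = fx·fy, so Δ = (fx² + fy²)², λ₊ = fx² + fy², λ₋ = 0, and s reduces to the ordinary gradient magnitude, which is the special case noted in the text. Conversely, a boundary between pure red and pure green has no gradient after channel averaging but a clearly nonzero color gradient.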

4. EXPERIMENTAL RESULTS AND DISCUSSION

4.1 Data

We evaluated the WNCut-CGAC model against the Chan & Vese model [9] on a total of 60 images obtained from three datasets from two different institutions. The first dataset comprised 19 different Hematoxylin & Eosin (H&E) stained prostate biopsy samples obtained from the Hospital at the University of Pennsylvania (UPENN). Each sample was digitized at 20x optical magnification using an Aperio whole-slide digital scanner. The second dataset comprised 24 images of whole-mount histological sections of radical prostatectomies, also obtained from the Hospital at UPENN. The third dataset comprised 17 H&E stained breast histopathology biopsy samples collected from The Cancer Institute of New Jersey. For all 3 datasets, the objective was to segment the glandular regions. Since it was impossible to have an expert pathologist manually segment each and every gland in each of the 60 images (to provide ground truth for quantitative evaluation), the expert was asked to randomly pick 2 glands on each image and segment the boundaries of only those structures. Hence WNCut-CGAC and the Chan & Vese model were quantitatively evaluated in terms of their ability to accurately segment the boundaries of approximately 100 glands from 60 different images.

4.2 Performance measures

The gland segmentation results of the WNCut-CGAC model and the Chan & Vese model were evaluated in terms of overlap (OL), sensitivity (SN), specificity (SP), and positive predictive value (PPV). For each image, the set of pixels corresponding to the manual delineation of the gland is denoted $A(G)$. The set of pixels within the boundaries obtained from the WNCut-CGAC and Chan & Vese models is denoted $A(S)$. OL, SN, SP, and PPV are then defined as

1. Overlap: $\mathrm{OL} = \dfrac{|A(S) \cap A(G)|}{|A(S) \cup A(G)|}$,

2. Sensitivity: $\mathrm{SN} = \dfrac{|A(S) \cap A(G)|}{|A(G)|}$,

3. Specificity: $\mathrm{SP} = \dfrac{|C - A(S) \cup A(G)|}{|C - A(G)|}$,

4. Positive Predictive Value: $\mathrm{PPV} = \dfrac{|A(S) \cap A(G)|}{|A(S)|}$.

Note that the CV model is randomly initialized.
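The four measures can be computed directly from two binary masks. The sketch below assumes `seg` and `truth` are boolean arrays over the pixel grid C holding A(S) and A(G), and that neither mask is empty nor covers the whole grid (otherwise a denominator is zero); the function name is hypothetical.

```python
import numpy as np


def segmentation_scores(seg, truth):
    """Return (OL, SN, SP, PPV) for two boolean masks of equal shape."""
    seg = np.asarray(seg, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    inter = np.count_nonzero(seg & truth)   # |A(S) ∩ A(G)|
    union = np.count_nonzero(seg | truth)   # |A(S) ∪ A(G)|
    total = seg.size                        # |C|
    overlap = inter / union
    sensitivity = inter / np.count_nonzero(truth)
    specificity = (total - union) / (total - np.count_nonzero(truth))
    ppv = inter / np.count_nonzero(seg)
    return overlap, sensitivity, specificity, ppv
```

As a small worked case, on a 4 x 4 grid with the truth covering the top two rows and the segmentation covering the middle two rows, the intersection has 4 pixels and the union 12, giving OL = 1/3 and SN = SP = PPV = 0.5.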

4.3 Qualitative results

The qualitative segmentation results for the WNCut-CGAC and Chan & Vese models for images from the three datasets are shown in Figures 4, 5, and 6, respectively. Figures 4(c), 5(c), and 6(c) reveal that WNCut is able to successfully identify all image pixels whose colors match those in the user defined color swatch. Following WNCut initialization, our WNCut-CGAC model segments the gland boundaries more accurately (see Figures 4(f), 5(f), and 6(f)) than the CV model (see Figures 4(e), 5(e), and 6(e)). Figure 4(e) reveals the inability of the Chan & Vese model to accurately segment the glands; it segments neighboring nuclei instead. One possible reason for the relatively poor performance of the CV model is that it assumes that the image comprises only two regions, the dark area being assumed to be the foreground and everything else the background. The WNCut initialization scheme allows the WNCut-CGAC model to focus only on the objects of interest and prevents the contour from enclosing undesired objects (see Figures 4(d), 5(d), and 6(d)). Further, Figures 4(b), 5(b), and 6(b) reveal the additional strength conferred on the WNCut-CGAC model by using the color gradient to drive the active contour's energy functional.

Figure 4. (a) Cropped ROI from a prostate biopsy image which includes multiple glands; (b) the edges detected by the color gradient based edge-detector function; (c) initial segmentation results of the WNCut scheme; (d) initial contour $\phi_0(c)$ for the evolution of the PDE, obtained from WNCut; final segmentation results from the (e) CV and (f) WNCut-CGAC models, respectively.

4.4 Quantitative results

Quantitative segmentation results for the three datasets for the CV and WNCut-CGAC models are summarized in Table 2. In terms of all 4 measures, the WNCut-CGAC model easily outperformed the CV model across the 60 images, with all differences being statistically significant.

Table 2. Quantitative evaluation of segmentation results between the CV and our WNCut-CGAC models. OL: overlap; SN: sensitivity; SP: specificity; PPV: positive predictive value.

Dataset (images)              Model        OL     SN     SP     PPV
Prostate biopsy (19)          CV           0.47   0.52   0.95   0.64
                              WNCut-CGAC   0.60   0.72   0.98   0.80
Prostate whole mounts (17)    CV           0.79   0.92   0.95   0.82
                              WNCut-CGAC   0.85   0.92   0.99   0.92
Breast biopsy (24)            CV           0.65   0.82   0.92   0.69
                              WNCut-CGAC   0.74   0.86   0.95   0.80

Figure 5. (a) Cropped ROI from a prostate whole mount image which includes multiple glands; (b) the edges detected by the color gradient based edge-detector function; (c) initial segmentation results of the WNCut scheme; (d) initial contour $\phi_0(c)$ for the evolution of the PDE, obtained from WNCut; final segmentation results from the (e) CV and (f) WNCut-CGAC models, respectively.

5. CONCLUDING REMARKS

In this paper we presented a novel scheme to initialize a geodesic active contour model, allowing the GAC model to rapidly and accurately segment specific objects of interest on very large color images. The use of the color gradient based edge-detector function for the GAC model allows for more prominent boundaries compared to the traditional gray scale gradient. A quantitative and qualitative comparison between the WNCut-CGAC and the popular Chan & Vese model for the task of gland segmentation across 60 images of prostate and breast cancer histology revealed that the WNCut-CGAC model easily outperformed the Chan & Vese model. While our preliminary results appear to be extremely promising, there remains considerable room for improvement. One limitation of this study stems from the choice of a boundary based active contour model. The model still depends on the edge-detector function, which may make it difficult to segment structures with weak boundaries. The WNCut initialization scheme is powerful and involves minimal human intervention. However, variations in color, stain, and illumination mean that a user defined swatch on one image may not work accurately for the same object of interest on another image. Future work will center on (a) replacing the GAC model employed in this work with a hybrid boundary- and region-based model, (b) evaluation of the scheme on large datasets from different labs and clinics, and (c) continued evaluation of the model on very large images.

Acknowledgements

Funding for this work was made possible via grants from the Wallace H. Coulter Foundation, New Jersey Commission on Cancer Research, National Cancer Institute (R01CA136535-01, R21CA127186-01, R03CA128081-01), the Cancer Institute of New Jersey, and Bioimagene Inc.


Figure 6. (a) Cropped ROI from a breast histology image that includes multiple glands; (b) the edges detected by the color gradient based edge-detector function; (c) initial segmentation results from the WNCut scheme; (d) initial contour φ0(c) for the evolution of the PDE, obtained from WNCut; segmentation results from the (e) CV and (f) WNCut-CGAC models, respectively.
