Multi-Shape - Hierarchical Active Shape Models

Juan J. Cerrolaza, Arantxa Villanueva, and Rafael Cabeza
Department of Electrical and Electronic Engineering, Public University of Navarra, Pamplona, Navarra, Spain

Abstract: Active Shape Models (ASMs) have become one of the most widespread segmentation paradigms since their inception in the early nineties. However, their capability to capture and model shape variability is highly conditioned by the training set used. To overcome this limitation, this paper presents a new hierarchical formulation of classical ASMs. Using the wavelet transform, a new complete tree wavelet packet is used to decompose the shape into small pieces of information, which are easier to model even with a small number of training shapes. Unlike previous hierarchical approaches, this new decomposition scheme and the matrix notation introduced allow the new hierarchical segmentation algorithm to deal with complex multi-shape structures in both 2D and 3D spaces, maintaining the versatility of classical ASMs. The advantages of the new segmentation algorithm, in terms of both accuracy and robustness with respect to the number of training shapes, have been successfully tested with two completely different databases containing multi-shape structures.

Keywords: Active shape models, image segmentation, wavelet transform, hierarchical decomposition

1. Introduction

Deformable shape models appeared in the late eighties as an alternative to the traditional bottom-up segmentation approaches. The use of low-level operations, such as edge detection or region growing, has proven inadequate in the presence of noise or occlusion, or when faced with the anatomical complexity and variability inherent to biological structures. Trying to overcome some of these limitations and those exhibited by the first freely-deformable models [1], the Active Shape Models (ASMs) proposed by Cootes et al. [2] have become one of the most popular segmentation paradigms from their inception in the early nineties to the present day. Roughly speaking, ASMs estimate the population statistics from a set of examples via Point Distribution Models (PDMs), learning the particular patterns of variability of the structure of interest. If the statistical model is properly constructed, these patterns allow the definition of a subspace of “allowed shapes”, which prevents the appearance of incoherent results while allowing the modeling of the finer details of the shapes. The great versatility and relative simplicity of the algorithm have undoubtedly contributed to the wide acceptance of the method, the usefulness of which has been successfully demonstrated in a great variety of applications, including object tracking [3] and medical imaging [4]. The emergence of a large number of variants based on the original ASM framework, such as 3D-ASM [5] or spatio-temporal ASM [6], is also illustrative of the great popularity of the method in the research community.

Another valuable property of ASMs is their innate capability to deal with complex multi-shape structures, that is, shapes formed by a set of simple objects or sub-shapes. This type of structure is especially relevant in the field of medical imaging [7]. Despite the advances in the different medical imaging modalities (X-ray, MR, etc.), the absence of obvious boundary information makes it difficult to accurately segment the shape of interest. In such cases, the inter-object relationship between adjacent shapes can contribute significantly to the improvement of the accuracy.

Despite the advantages of learning the specific patterns of variability of the shape of interest, the need for a certain number of previously marked examples is an important limitation, especially when dealing with high-dimensional probability problems. That is, the capability of the model to adequately capture the variations of the shape directly depends on the quality of the training set provided. If this training set does not provide a sufficient number of samples, the model built will not be flexible enough to capture the full variability of the input shapes. However, an adequate training set is not always available, either because of the difficulty of obtaining images or because of the high temporal and human cost of marking the images, typically manually. Aware of this problem, Davatzikos et al. [8] proposed a hierarchical decomposition of the shape into small pieces of information via the wavelet transform. By modeling each of these pieces separately, the authors reduce the dimensionality of the problem and thus the number of examples required.
Although the hierarchical model proposed seems to provide satisfactory results, its application is restricted to 2D single-shape structures, that is, planar structures formed by a single contour. This important limitation directly undermines the usefulness of the method in multi-shape and 3D environments, where the large number of training shapes required becomes particularly critical due to the high variability presented by these complex structures. A few years ago, Nain et al. [9] presented a specific three-dimensional version using spherical wavelets, which is also limited to single-shape cases. The goal of this paper is to present a new general framework able to deal with any type of structure, regardless of the dimensionality of the problem (2D or 3D) or the number of shapes that compose the object to segment. First, an improved general matrix notation is introduced, extending the wavelet filtering to the general multi-shape d-dimensional case. Using this notation and a new decomposition scheme as an alternative to that proposed by Davatzikos et al. [8], a novel multi-shape hierarchical formulation of ASM (MS-HASM) is presented. The experimental tests performed demonstrate that this new segmentation approach systematically improves the results provided by classical ASM, even when the number of training samples decreases.

The remainder of this paper is organized as follows. Section 2 provides a general overview of classical ASM and the construction of the statistical models of shape and appearance. Section 3 introduces the new general matrix framework that extends the wavelet decomposition to the general case of multi-shape structures. The new MS-HASM is described in Section 4. The experimental set-up used to evaluate the new segmentation approach is presented in Section 5. Finally, Section 6 summarizes and concludes the paper.

2. Active Shape Models

Since their presentation in the early nineties, a large number of variants of the ASMs originally conceived by Cootes et al. [2] have been proposed. Far from intending to present a complete description of this algorithm and all of its variants, this section describes the basic concepts of the algorithm required to fully understand the remainder of the work. For a more detailed description of the algorithm, the reader is advised to consult the extensive research literature regarding ASM. Basically, the ASM algorithm can be described as an iterative process in which two statistical models, built from the training set, are sequentially applied to drive the segmentation process. A statistical appearance model guides the matching process of the shape to a new image, whereas a statistical shape model applies global shape constraints to guarantee that only plausible instances occur.

2.1 Statistical Shape Model Construction

In the context of ASMs, one of the simplest and most generic shape-representation methods is used: a set of points distributed across the surface or contour. In this way, each shape of the training set is represented by a fixed number, K, of points, referred to as landmarks. The careful placement of these points is of crucial importance because they must describe specific parts of the shape. That is, each of the landmarks establishes a point of correspondence between the shapes, allowing the modeling of the shape variations via PDM as follows. Suppose each shape of the training set is annotated by K d-dimensional landmarks (d = 2 or 3). Concatenating the coordinates of each point, the i-th training shape can be expressed in vectorial form as

$$\mathbf{x}_i = (x_{1,1,i}, \ldots, x_{d,1,i}, \ldots, x_{1,K,i}, \ldots, x_{d,K,i})^T \tag{1}$$

where $i = 1, \ldots, N$, with N being the number of examples in the training set. Using the generalized Procrustes alignment, the training samples are aligned in a common coordinate frame to eliminate variations of the shape due to similarity transforms, i.e., translation, rotation and scaling. The statistical shape model is built by applying Principal Component Analysis (PCA) to the aligned shapes, obtaining the mean shape $\bar{\mathbf{x}}$, the covariance matrix $\mathbf{S}$, and the eigensystem of $\mathbf{S}$. The dimensionality of the problem is reduced from dK to t, retaining the t eigenvectors corresponding to the t largest eigenvalues. Typically, the number of modes to retain is chosen as a proportion $f_v$ (e.g., 95-98%) of the total variance exhibited in the training set that we wish the model to explain. For the common case where $N - 1 \leq dK$, the number of non-zero eigenvalues is $N - 1$, the sum of which defines the total variance of the training set. Thus, assuming the eigenvalues, $\lambda_j$, are sorted in decreasing order, t can be easily obtained as the minimum integer that satisfies

$$\sum_{j=1}^{t} \lambda_j \geq f_v \sum_{j=1}^{N-1} \lambda_j \tag{2}$$

Concatenating the corresponding t eigenvectors in a (dK × t) matrix, $\mathbf{P} = (\mathbf{p}_1 | \mathbf{p}_2 | \ldots | \mathbf{p}_t)$, it is possible to approximate any instance of the shape space by the linear equation

$$\mathbf{x} = \bar{\mathbf{x}} + \mathbf{P}\mathbf{b} \tag{3}$$

where $\mathbf{b} = (b_1, b_2, \ldots, b_t)^T$ is a t-dimensional vector holding the shape parameters of the model, the value of which can be easily computed for a given new shape of the class we are modeling, $\mathbf{x}$, as

$$\mathbf{b} = \mathbf{P}^T(\mathbf{x} - \bar{\mathbf{x}}) \tag{4}$$

Constraining the values for each component of $\mathbf{b}$, a subspace of allowed shapes is defined, guaranteeing that only plausible shapes are generated by equation (3). One of the simplest methods is to assume independence among the components of $\mathbf{b}$, constraining it to lie inside a hyperrectangle. In this way, a hard limit is applied to each element

$$|b_j| \leq \beta\sqrt{\lambda_j} \tag{5}$$

where β is a constant that determines the flexibility of the model, typically defined between 1 and 3. It is interesting to note that no assumption about the connectivity among landmarks has been made when constructing the statistical shape model, which is therefore independent of the number of elements that compose the shape. This generality is one of the main advantages of the model, and makes it one of the most popular approaches to dealing with multi-shape cases.
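The construction in equations (2) to (5) can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the shapes are assumed to be already aligned (the Procrustes step is omitted), and the function names `build_pdm` and `constrain_shape` are hypothetical, not taken from the paper.

```python
import numpy as np

def build_pdm(shapes, fv=0.95):
    """Build a point distribution model from already aligned shapes.

    shapes: (N, d*K) array, one training shape per row.
    Returns the mean shape, the (d*K, t) matrix P and the t retained
    eigenvalues, with t chosen as in equation (2).
    """
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # Covariance matrix S of the aligned training set.
    S = centered.T @ centered / (shapes.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(S)           # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]              # sort in decreasing order
    eigvals = np.clip(eigvals[order], 0.0, None)   # drop round-off negatives
    eigvecs = eigvecs[:, order]
    # Smallest t whose cumulative variance reaches the fraction fv.
    cumulative = np.cumsum(eigvals)
    t = int(np.searchsorted(cumulative, fv * cumulative[-1])) + 1
    t = min(t, eigvals.size)
    return mean, eigvecs[:, :t], eigvals[:t]

def constrain_shape(x, mean, P, eigvals, beta=2.0):
    """Project x onto the model and clamp b to the hyperrectangle."""
    b = P.T @ (x - mean)                  # equation (4)
    limit = beta * np.sqrt(eigvals)
    b = np.clip(b, -limit, limit)         # equation (5)
    return mean + P @ b                   # equation (3)
```

Nothing in the sketch depends on how the landmarks are connected or grouped, which reflects the remark above that the same model applies unchanged to multi-shape structures.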

2.2 Statistical Appearance Model

During the matching process of a new image, each landmark looks for its optimal location according to a particular statistical appearance model built from the training set. Typically, this appearance model is based on the normalized first derivative of fixed-size gray profiles, normal to the boundary of the object and centered at each landmark. Assuming that these gray profiles come from a multivariate Gaussian distribution, the optimal location for each landmark is that where the appearance information minimizes the Mahalanobis distance to the mean profile learned from the training set. This image-driven update of landmarks is alternated with a shape adjustment step, using (4) and (5) to calculate and restrict b, respectively, creating an iterative segmentation process.
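The profile search described above can be illustrated as follows. This is a minimal sketch, assuming the gray levels along the normal have already been sampled into a 1D array and that the inverse covariance of the training profiles is available; the names `normalized_derivative` and `best_offset` are hypothetical.

```python
import numpy as np

def normalized_derivative(profile):
    """First derivative of a gray profile, normalized by its absolute sum."""
    g = np.diff(np.asarray(profile, dtype=float))
    norm = np.abs(g).sum()
    return g / norm if norm > 0 else g

def best_offset(search_profile, mean_g, cov_inv, profile_len):
    """Slide a window of profile_len samples along a longer search profile.

    Returns the displacement (relative to the centered window) that
    minimizes the Mahalanobis distance to the mean profile, and that
    minimum distance.
    """
    n_positions = len(search_profile) - profile_len + 1
    costs = []
    for s in range(n_positions):
        g = normalized_derivative(search_profile[s:s + profile_len])
        diff = g - mean_g
        costs.append(float(diff @ cov_inv @ diff))
    best = int(np.argmin(costs))
    return best - (n_positions - 1) // 2, costs[best]
```

With a profile of 11 samples and a search profile of 19 samples, the window can move 4 positions to each side of the current landmark location.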

3. Wavelet Decomposition of Multi-Shape Structures

The origin of wavelet theory can be traced back to the early 20th century, although the main contributions to its development are relatively recent [10]. The potential utility of this mathematical tool has been successfully proved in a variety of disciplines, such as image processing [11] and biomedical engineering [12]. Basically, the wavelet transform is a mathematical technique that allows the multi-resolution analysis of data or functions. Unlike the Fourier transform, which provides frequency information only, the coefficients obtained during the wavelet analysis contain information from both domains, frequency and spatial location. Of special importance for our purposes, and in the field of computer graphics in general, is the matrix approach of Finkelstein and Salesin [13]. This notation is particularly convenient in the context of ASM, unlike that originally proposed by Davatzikos et al. [8], where the wavelet analysis is restricted to single-shape 2D curves. As is shown below, a simple extension of the matrix notation allows us to easily address the wavelet filtering of any multi-shape structure in both 2D and 3D.

Suppose $c^0$ represents a general discrete unidimensional signal with $K_0$ samples, the vectorial expression of which is $(c^0_1, c^0_2, \ldots, c^0_{K_0})^T$. The wavelet filtering of the signal can be expressed by the following analysis equations,

$$c^1 = A^1 c^0 = (c^1_1, c^1_2, \ldots, c^1_{K_1})^T \tag{6}$$

$$d^1 = B^1 c^0 = (d^1_1, d^1_2, \ldots, d^1_{K_0 - K_1})^T \tag{7}$$

where the matrices $A^1$ and $B^1$ are called the analysis filters. The resultant vector in equation (6), $c^1$, can be considered a lower resolution version of the original signal, the filtering and downsampling ($K_0 > K_1$) of which is implemented by $A^1$. On the other hand, $d^1$, obtained with $B^1$, captures the information lost between $c^0$ and $c^1$. From a frequency point of view, $c^1$ contains the low-frequency information of the signal, while the finer (high-frequency) details are stored in $d^1$. If this pair of filters, $A^1$ and $B^1$, is properly selected, no information of the original signal is lost during the process, and $c^0$ can be recovered by the following synthesis equation,

$$c^0 = F^1 c^1 + G^1 d^1 \tag{8}$$

where $F^1$ and $G^1$ are the synthesis filters. Because no assumption has been made about the original signal, the above filtering process can be iteratively applied over any of the new signals, decomposing $c^0$ into smaller pieces of information. One of the most typical decomposition schemes is the logarithmic tree 2-band wavelet packet, in which only the low-pass branch is filtered again,

$$c^r = A^r c^{r-1} \tag{9}$$

$$d^r = B^r c^{r-1} \tag{10}$$

$$c^{r-1} = F^r c^r + G^r d^r \tag{11}$$

Both the synthesis and analysis matrices are defined by the chosen wavelet basis, and are related by the equation

$$\begin{pmatrix} A^r \\ B^r \end{pmatrix}^{-1} = (F^r \,|\, G^r) \tag{12}$$

The choice of the wavelet basis is conditioned by the particular context of use. A set of synthesis filters for Haar and B-spline wavelets, of special interest in the field of computer graphics, is provided by Stollnitz et al. [14].

Suppose that $\mathbf{x}^0$ corresponds to the vectorial expression of a single-shape structure in a general d-dimensional space, like that presented in (1). That is, the concatenation of the landmark coordinates of a figure composed of a single contour is expressed as

$$\mathbf{x}^0 = (x^0_{1,1}, \ldots, x^0_{d,1}, \ldots, x^0_{1,K_0}, \ldots, x^0_{d,K_0})^T \tag{13}$$

where $K_0$ is the number of landmarks used to describe it. Considering each coordinate as an independent unidimensional signal, the wavelet analysis framework presented above can be extended to adapt it to the PDM notation. Thus, the $(dK_1 \times dK_0)$ analysis matrix, $\hat{A}^1$, that provides a lower resolution version of $\mathbf{x}^0$, $\mathbf{x}^1$, can be obtained as (written here in $d \times d$ blocks, with $I_d$ denoting the $d \times d$ identity matrix)

$$\hat{A}^1 = \begin{pmatrix} a^1_{1,1} I_d & \cdots & a^1_{1,K_0} I_d \\ \vdots & \ddots & \vdots \\ a^1_{K_1,1} I_d & \cdots & a^1_{K_1,K_0} I_d \end{pmatrix} \tag{14}$$

where the $a^1_{i,k}$ are the elements of $A^1$. Following this scheme, the general set of matrices, $\hat{A}^r$, $\hat{B}^r$, $\hat{F}^r$ and $\hat{G}^r$, can be obtained by rewriting the original set for the unidimensional signal case, $A^r$, $B^r$, $F^r$ and $G^r$. Thanks to the convenience of addressing the wavelet analysis from a matrix point of view when dealing with geometrical forms, this simple extension of the formulation allows us to address the wavelet filtering of a d-dimensional single-shape structure. However, one last generalization is still necessary in order to work with the general case of multi-shape forms.

Fig. 1: Wavelet decomposition of a shape using a 3-level logarithmic wavelet packet, proposed by Davatzikos et al. [8]. (a) Decomposition of a single-shape structure into 8 equal-sized bands of information. (b) This logarithmic scheme is invalid when dealing with multi-shape structures.

Suppose now this multi-shape structure is formed by M different single-shape parts, whose vectorial form, $\mathbf{x}^0$, can be expressed as the concatenation of M vectors, $\mathbf{x}^0_1, \mathbf{x}^0_2, \ldots, \mathbf{x}^0_M$,

$$\mathbf{x}^0 = (\mathbf{x}^0_1; \ldots; \mathbf{x}^0_M) = (x^0_{1,1,1}, \ldots, x^0_{d,1,1}, \ldots, x^0_{1,K_0^1,1}, \ldots, x^0_{d,K_0^1,1}, \ldots, x^0_{1,1,M}, \ldots, x^0_{d,1,M}, \ldots, x^0_{1,K_0^M,M}, \ldots, x^0_{d,K_0^M,M})^T \tag{15}$$

where $K_0^1, \ldots, K_0^M$ are the numbers of landmarks of each sub-shape. The corresponding analysis and synthesis matrices can be redefined as

$$\mathcal{A}^r = \begin{pmatrix} \hat{A}^r_1 & 0 & \cdots & 0 \\ 0 & \ddots & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \hat{A}^r_M \end{pmatrix} \tag{16}$$

where $\hat{A}^r_1, \ldots, \hat{A}^r_M$ are the analysis matrices of each single-shape structure, and 0 are matrices of zeros. From this expression, it is possible to define $\mathcal{B}^r$, $\mathcal{F}^r$ and $\mathcal{G}^r$ as the multi-shape extension of $\hat{B}^r$, $\hat{F}^r$ and $\hat{G}^r$, allowing us to apply any wavelet packet to a multi-shape structure.
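The matrix construction of this section, from the unidimensional filters to the d-dimensional single-shape matrices of (14) and the block-diagonal multi-shape matrices of (16), can be sketched as follows. The sketch is illustrative only and rests on one loudly stated assumption: it uses the orthonormal Haar basis as a simple stand-in, whereas the experiments in this paper use the linear B-spline basis. All function names are hypothetical.

```python
import numpy as np

def haar_filters(K):
    """Orthonormal Haar analysis (A, B) and synthesis (F, G) matrices.

    Assumes an even number of samples K. Haar is a stand-in basis here.
    """
    A = np.zeros((K // 2, K))
    B = np.zeros((K // 2, K))
    s = 1.0 / np.sqrt(2.0)
    for i in range(K // 2):
        A[i, 2 * i], A[i, 2 * i + 1] = s, s     # local average (low-pass)
        B[i, 2 * i], B[i, 2 * i + 1] = s, -s    # local difference (high-pass)
    # For an orthonormal basis the synthesis filters are the transposes.
    return A, B, A.T, B.T

def lift_to_landmarks(M, d):
    """Apply a 1D filter to each of the d coordinates independently (eq. 14)."""
    return np.kron(M, np.eye(d))

def block_diag(blocks):
    """Block-diagonal matrix with zero matrices off the diagonal (eq. 16)."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

def multishape_filters(landmark_counts, d):
    """Multi-shape A, B, F, G for sub-shapes with the given landmark counts."""
    per_shape = [[lift_to_landmarks(M, d) for M in haar_filters(K)]
                 for K in landmark_counts]
    return tuple(block_diag([f[i] for f in per_shape]) for i in range(4))
```

For three planar contours with 16, 32 and 8 landmarks, `multishape_filters([16, 32, 8], d=2)` returns a 56 × 112 analysis matrix, and stacking the analysis matrices gives the inverse of the concatenated synthesis matrices, as stated in (12).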

4. Hierarchical ASM

The idea behind the Hierarchical ASM approach (HASM), originally proposed by Davatzikos et al. [8], is to divide the shape into a set of equal-sized bands of information obtained by wavelet transform. Using an L-level logarithmic tree 2-band wavelet packet, this filtering process is applied over the whole normalized training set, modeling each of the $2^L$ bands separately via PCA to characterize its variability. Fig. 1(a) illustrates a 3-level band-division process for a single-shape structure, $\mathbf{x}^0$, described with $K_0$ d-dimensional landmarks. As equations (6) and (7) indicate, the size of the filtered signals depends on the wavelet basis selected, although in most cases the ratio $K_{j-1}/K_j$ is close to 2. Because only the low-pass branches are filtered again, the outputs of the high-pass branches must be split into bands with $dK_L$ elements ($dK_0/2^3$ for the particular case depicted).

Fig. 2: New complete tree wavelet packet that allows the hierarchical decomposition of any complex multi-shape structure. In this example, a 3-level scheme is used to decompose a generic shape, x0, into 8 equal-sized bands of information.

The target of this process, in which the shapes are divided into smaller pieces of information or bands, is to reduce the dimensionality of the data we have to model, and thus the size of the training set required. The smaller number of elements in these bands allows us to capture the variability exhibited by the shape of interest, even when only a reduced number of training examples is available. However, it is worth noting that the division process described above is restricted to single-shape structures, and its application in a multi-shape environment raises serious difficulties. One of the main drawbacks is the band-splitting process of the high-pass branches, as Fig. 1(b) illustrates with a particular example. Suppose now that $\mathbf{x}^0$ represents the vectorial form of a certain multi-shape structure (see (15)) in 2D, composed of three different planar contours with $K_0^1 = 16$, $K_0^2 = 32$ and $K_0^3 = 8$ 2D landmarks. That is, $\mathbf{x}^0$ is composed of a total of 56 landmarks, or 112 coordinate elements. The problem arises at the end of the 3-level logarithmic wavelet filtering process, when the information contained in the high-pass branches must be split into bands of 14 elements ($112/2^3$), which is the size of the bands at the end of the low-pass branch, $q_1$ and $q_2$. As can be deduced from Fig. 1(b), the bands resulting from this process will contain mixed information from different sub-shapes. However, important issues, such as the criteria for mixing this information or its effect on future statistical models and thus on the final segmentation accuracy, remain unsolved.

Because wavelet analysis allows any dyadic tree structure, depending on which branches are filtered, we propose the use of a complete packet tree to work with multi-shape structures, as an alternative to the original logarithmic filtering scheme. In this approach, both branches, low-pass and high-pass, resulting from the wavelet analysis are filtered again and decomposed into two further branches, as shown in Fig. 2.
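The size bookkeeping of the example above can be checked with a short script. It is a sketch under the assumption of exactly dyadic filters (each filtering step halves every sub-shape), and the helper names are hypothetical. Sizes are counted in vector elements, so the three contours contribute 32, 64 and 16 elements.

```python
def logarithmic_tree_sizes(sizes, levels):
    """Per-sub-shape sizes of every output of a logarithmic tree."""
    outputs = []
    current = list(sizes)
    for _ in range(levels):
        current = [s // 2 for s in current]
        # The high-pass output of this level is not filtered again.
        outputs.append(("high-pass", tuple(current)))
    outputs.append(("low-pass", tuple(current)))
    return outputs

def complete_tree_sizes(sizes, levels):
    """Per-sub-shape sizes of the 2**levels bands of a complete tree."""
    leaf = tuple(s // 2 ** levels for s in sizes)
    return [leaf] * 2 ** levels

def straddling_chunks(sub_sizes, chunk):
    """Indices of fixed-size chunks that mix elements of several sub-shapes."""
    owners = [k for k, s in enumerate(sub_sizes) for _ in range(s)]
    n_chunks = len(owners) // chunk
    return [c for c in range(n_chunks)
            if len(set(owners[c * chunk:(c + 1) * chunk])) > 1]

sizes = (32, 64, 16)                        # elements of the three contours
log_tree = logarithmic_tree_sizes(sizes, 3)
full_tree = complete_tree_sizes(sizes, 3)
mixed = straddling_chunks(log_tree[0][1], 14)
```

Here the first high-pass output of the logarithmic tree holds (16, 32, 8) elements, and cutting it into 14-element bands yields chunks that cross sub-shape boundaries in an irregular way, whereas every band of the complete tree has the same (4, 8, 2) composition.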
On this occasion, the previous vectorial expression, $\mathbf{x}^0$, is filtered through a 3-level complete tree scheme, the result of which is the decomposition of the signal into eight equal-sized bands, $\{q_1, \ldots, q_8\}$. Each of these bands contains not only the same number of elements but also information from the three sub-shapes that compose the original $\mathbf{x}^0$. With this new decomposition scheme, which provides an elegant alternative for the wavelet analysis of multi-shape structures, the statistical model of the shape is built as follows. Suppose now that the structure to model is composed of M single-shape contours and that $\mathbf{x}^0_i$ corresponds to the vectorial expression of the i-th training example

$$\mathbf{x}^0_i = (\mathbf{x}^0_{i,1}; \ldots; \mathbf{x}^0_{i,M}) = (x^0_{i,1,1,1}, \ldots, x^0_{i,d,1,1}, \ldots, x^0_{i,1,K_0^1,1}, \ldots, x^0_{i,d,K_0^1,1}, \ldots, x^0_{i,1,1,M}, \ldots, x^0_{i,d,1,M}, \ldots, x^0_{i,1,K_0^M,M}, \ldots, x^0_{i,d,K_0^M,M})^T \tag{17}$$

where, as in (15), $K_0^1, \ldots, K_0^M$ are the numbers of d-dimensional landmarks of each sub-shape, and $i = 1, \ldots, N$, with a total number of N training examples. The superscript 0 is written to be consistent with the previous wavelet notation, indicating that the information corresponds to the original full-resolution signal. Filtering each training example, previously normalized, through an L-level complete tree wavelet packet, the original vectorial expression is transformed into a set of $2^L$ bands, $\mathbf{x}^0_i \rightarrow \{q_{i,1}, \ldots, q_{i,2^L}\}$. The most commonly used wavelet bases provide a decomposition of the signal into two equal-sized parts, and thus, the size of each band can be approximated as $K_0/2^L = (K_0^1 + \ldots + K_0^M)/2^L$. As in Section 2, it is possible to use PCA to compute the eigenvectors and eigenvalues of each band, creating a specific PDM, $q_j = \bar{q}_j + P_j b_j$, where $j = 1, \ldots, 2^L$. As in Section 2.1, $\bar{q}_j$ corresponds to the average j-th band obtained from the training set, $P_j$ is a $(dK_0/2^L \times t_j)$ matrix composed of the $t_j$ main eigenvectors, and $b_j$ is the $t_j$-dimensional vector holding the parameters of the band model, the components of which are constrained by the corresponding eigenvalues, as in (5).

Algorithm 1: MS-HASM (Statistical Shape Model Adjustment)
  Input: y, shape resulting from landmark updating
  1  Normalize y, aligning it with the mean shape of the normalized training set;
  2  Wavelet filtering of the normalized y, using the L-level complete tree wavelet packet → $\{q_1, \ldots, q_{2^L}\}$;
  3  for j = 1 to $2^L$ do
  4      Adjust and truncate $b_j = P_j^T (q_j - \bar{q}_j)$ → $\tilde{b}_j$;
  5      Build the corrected band $\tilde{q}_j = \bar{q}_j + P_j \tilde{b}_j$;
  6  end
  7  Rebuild the new shape $\tilde{y}$ using the corresponding synthesis equation;
It is worth emphasizing that the new band-division scheme only affects the statistical shape model of the segmentation algorithm and not the appearance model (see Section 2.2). In this way, the same iterative segmentation scheme, characteristic of the classical ASM, is used. During this process, the optimal location of each landmark in the new image to segment is searched for according to the previously built appearance model. The shape resulting from this landmark-updating process is adjusted and corrected by the statistical shape model, as Algorithm 1 details. The sequence of these two processes, the landmark updating followed by the shape adjustment method described in Algorithm 1, creates an iterative scheme that continues until convergence (no variation of the shape is observed after updating landmarks) or until a maximum number of iterations is reached. Notice that the term shape is used here in a general sense, referring to both multi-shape and single-shape structures.
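Steps 2 to 7 of Algorithm 1 can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes the orthonormal Haar basis instead of the linear B-spline basis used in the experiments, landmark counts divisible by $2^L$, an input shape that is already normalized (step 1 is omitted), and band models given as hypothetical `(mean, P, eigenvalues)` triples.

```python
import numpy as np

def haar_step(v):
    """One Haar analysis step along axis 0: (low-pass, high-pass) halves."""
    s = np.sqrt(2.0)
    return (v[0::2] + v[1::2]) / s, (v[0::2] - v[1::2]) / s

def haar_inverse(low, high):
    """Inverse of haar_step."""
    s = np.sqrt(2.0)
    out = np.empty((2 * low.shape[0],) + low.shape[1:])
    out[0::2] = (low + high) / s
    out[1::2] = (low - high) / s
    return out

def packet_analysis(v, levels):
    """Complete tree: both branches are filtered again at every level."""
    bands = [np.asarray(v, dtype=float)]
    for _ in range(levels):
        bands = [half for band in bands for half in haar_step(band)]
    return bands

def packet_synthesis(bands):
    """Inverse of packet_analysis."""
    bands = list(bands)
    while len(bands) > 1:
        bands = [haar_inverse(bands[i], bands[i + 1])
                 for i in range(0, len(bands), 2)]
    return bands[0]

def analyze(parts, levels):
    """parts: list of (K_m, d) arrays. Band j joins band j of every part."""
    per_part = [packet_analysis(p, levels) for p in parts]
    return [np.concatenate([bands[j].ravel() for bands in per_part])
            for j in range(2 ** levels)]

def synthesize(band_vectors, part_shapes, levels):
    """Rebuild the sub-shapes from the 2**levels multi-shape bands."""
    parts, start = [], 0
    for K, d in part_shapes:
        n = (K >> levels) * d
        bands = [q[start:start + n].reshape(-1, d) for q in band_vectors]
        parts.append(packet_synthesis(bands))
        start += n
    return parts

def adjust_shape(parts, band_models, levels, beta=2.0):
    """Steps 2-7 of Algorithm 1 for an already normalized shape."""
    corrected = []
    for q, (q_mean, P, lam) in zip(analyze(parts, levels), band_models):
        b = P.T @ (q - q_mean)                       # project the band
        limit = beta * np.sqrt(lam)
        corrected.append(q_mean + P @ np.clip(b, -limit, limit))
    return synthesize(corrected, [p.shape for p in parts], levels)
```

In this sketch, each band vector concatenates the corresponding band of every sub-shape, so a single PCA model per band still captures the relationships between sub-shapes.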

5. Results and Discussion

To characterize and quantify the potential advantages provided by the new multi-shape segmentation scheme presented here, MS-HASM, a set of experimental tests has been designed. In these tests, the experimental results obtained with the new algorithm are compared to those provided by classical ASM, one of the most popular multi-shape segmentation algorithms. Both algorithms are evaluated in terms of accuracy and robustness vs. size of training set. Although the framework developed is completely general, the tests presented in this article are restricted to the 2D case, due to the availability of images to work with. In particular, two different databases are used during the study: the JSRT database [4], [15] and a proprietary database of hand images. The images extracted from the JSRT database are 247 thoracic radiographs, 154 corresponding to cases with pulmonary nodules and 93 to healthy cases, with a resolution of 512 × 512 pixels. Each of the two lungs has been marked using 128 landmarks. The hands database contains pairs of 5 different hand poses from a total of 20 subjects, with the same ratio of male and female cases. The contour of each hand has been described using 513 landmarks in images of 960 × 640 pixels. Both databases have been divided into two sets, the training and the test set, containing the same number of images and maintaining the original proportion of healthy and pathological cases for the JSRT database, and of male and female cases for the hands database. The training set is used to build the statistical models of shape and appearance, while the other set is used to evaluate the accuracy of the segmentation algorithms. In turn, the sets of training shapes have been arranged in sets of different sizes to study the robustness of the algorithms with respect to the number of examples.
To minimize the potential bias caused by a particular selection of training shapes, different subsets are considered when building the statistical models for a given size of the training set. The average behavior of these subsets provides more reliable information about the real effect that the number of training shapes has on the accuracy of the algorithm. For the JSRT database, 8 different sizes for the training set have been tested, namely, 123, 100, 80, 60, 40, 30, 20 and 15, with 123 being the total number of training shapes available. The number of subsets built for each case is 1, 1, 2, 2, 3, 4, 4 and 5, respectively. In the same way, 8 different sizes have been considered for the hands database, 50, 40, 35, 30, 25, 20, 15 and 10, with 1, 1, 2, 2, 2, 3, 4 and 5 different subsets built for each one, respectively. The statistical shape models of both algorithms, ASM and MS-HASM, have been built to explain 95% of the variability observed in the training sets (see eq. (2)). The flexibility of the models, defined by the margin of variation of each element of b in equations (3) and (??), is controlled by the parameter β (see eq. (5)), with a value of β = 2 used in our study. The wavelet basis used in the MS-HASM algorithm is the linear B-spline wavelet basis, due to its close linkage to the typical join-the-dots approach used to describe shapes in the context of PDM. Although there are many other possible bases, pilot experiments have demonstrated the good behavior of the linear B-spline basis, without the need for other bases with higher numbers of vanishing moments, such as Daubechies-7. The number of levels of the complete tree wavelet packet has been set to 6 for the JSRT database and 8 for the hands database.

The appearance models have been built according to the original model described in Section 2.2, defined by the mean and covariance matrix of gray-level profiles of fixed length, normal to the boundary and centered at each landmark. The length of these profiles has been set to 11 pixels, 5 to each side of the landmark, defining a search space of 19 pixels (4 to each side) during the landmark-updating process.

Finally, an initialization process must be incorporated to provide a valid first estimation that allows a good evolution of the segmentation algorithms. Although more sophisticated alternatives are possible, a very simple initialization approach is used in this study. For each shape in the corresponding training set, the Euclidean transformation that minimizes its distance to the mean shape of the statistical shape model is estimated. The initialization applied to the mean shape at the beginning of the algorithm is obtained by averaging the inverses of the resultant set of transformations. To make the initialization process more robust, a multi-resolution approach with a 5-level Gaussian image pyramid has been incorporated.

Table 1: Avg. Point-to-Curve Segmentation Error (pixels)

JSRT Database
# Trn. Shapes   MS-HASM   ASM
123             3.94      3.80
100             3.79      3.84
80              3.87      3.92
60              4.10      3.93
40              3.99      4.36
30              3.86      4.14
20              4.14      4.70
15              4.16      4.78

Hands Database
# Trn. Shapes   MS-HASM   ASM
50              3.59      4.40
40              3.85      4.34
35              3.40      3.85
30              4.12      4.05
25              3.95      4.38
20              4.37      5.14
15              4.08      5.49
10              5.64      7.46

Fig. 3: Graphical representation of the average accuracy (average error in pixels vs. number of training shapes) for the two multi-shape segmentation algorithms tested, ASM and MS-HASM, when considering different sizes of training sets, for the JSRT database and the Hands database.

Fig. 4: Segmentation example of the JSRT database when using 15 training examples. (a) Comparison between ASM and MS-HASM. (b) Detail of the left lung showing how MS-HASM better captures the variability of the shapes, successfully delineating the target contour.

Fig. 5: Segmentation example of the Hands database when using 10 training examples. (a) Comparison between ASM and MS-HASM. (b) Detail of the left hand where it is possible to observe the difficulties of ASM in accurately segmenting the details of the image.

Table 1 shows the accuracy of the two algorithms tested, the novel MS-HASM and the traditional ASM, for the different sizes of the training set tested. The segmentation error has been evaluated in terms of the symmetric point-to-curve error. Given two contours, x1 and x2, corresponding to the target shape and the segmentation provided by the algorithm respectively, the mean point-to-contour distance from x1 to x2 is computed. This result is averaged with the mean point-to-contour distance from x2 to x1, obtaining a symmetric measurement of the error. Fig. 3 presents a graphical comparison of both algorithms, MS-HASM and ASM. The two algorithms behave similarly when the training set contains a considerable number of training examples. However, as the size of the training set decreases, the error obtained with the classical ASM grows significantly, while MS-HASM remains more stable. That is, even with a limited number of training shapes, the new hierarchical approach is able to capture the variability of the structure of interest, as Figs. 4 and 5 illustrate. These figures show how MS-HASM successfully delineates the target contours when only a few training examples are used, unlike ASM, which is unable to accurately capture important details of the images. Although this improvement in accuracy can be appreciated in both databases, the JSRT database of chest radiographs and the Hands database, it is particularly significant in the latter, where the difference between MS-HASM and ASM is up to 2 pixels.
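The symmetric point-to-curve error used for Table 1 can be sketched as follows. This is an illustrative sketch, assuming that each contour is treated as a polyline through its landmarks (closed by default); the paper does not specify the curve interpolation, and the function names are hypothetical.

```python
import numpy as np

def point_to_segments(p, a, b):
    """Distances from point p to every segment a[i]-b[i]."""
    ab = b - a
    denom = np.maximum((ab * ab).sum(axis=1), 1e-12)
    # Parameter of the closest point on each segment, clamped to [0, 1].
    t = np.clip(((p - a) * ab).sum(axis=1) / denom, 0.0, 1.0)
    closest = a + t[:, None] * ab
    return np.linalg.norm(closest - p, axis=1)

def mean_point_to_curve(points, curve, closed=True):
    """Mean distance from each point to the polyline through curve."""
    start = curve if closed else curve[:-1]
    end = np.roll(curve, -1, axis=0) if closed else curve[1:]
    return float(np.mean([point_to_segments(p, start, end).min()
                          for p in points]))

def symmetric_error(x1, x2, closed=True):
    """Average of the two directed mean point-to-curve distances."""
    return 0.5 * (mean_point_to_curve(x1, x2, closed)
                  + mean_point_to_curve(x2, x1, closed))
```

Averaging both directions avoids favoring a segmentation that lies on the target contour but covers only part of it.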
The improvement that MS-HASM provides over ASM has been statistically verified by means of a paired t-test at the usual significance level of 5%. The different training set sizes have been tested separately, with the Bonferroni correction applied to account for the multiple comparisons.
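As an illustration of this kind of test, the following sketch computes a paired t-test on per-image errors and applies a Bonferroni-corrected threshold across several training set sizes. It is not the authors' code: the two-sided alternative, the function names, and the numerical integration of the Student t density (used here only to avoid external dependencies) are all assumptions.

```python
import math
from statistics import mean, stdev

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value of Student's t, by Simpson integration of the pdf on [0, |t|]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def paired_t_test(errors_a, errors_b):
    """t statistic and two-sided p-value for paired samples (same test images)."""
    if len(errors_a) != len(errors_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    # Raises ZeroDivisionError if all differences are identical (zero variance)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, two_sided_p(t, n - 1)

def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test as significant at the level alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]
```

In practice, `scipy.stats.ttest_rel` returns the same statistic and p-value directly. With one test per training set size, the Bonferroni correction simply divides the 5% level by the number of sizes compared.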

6. Summary and Conclusions In this work, we present a new segmentation algorithm called Multi-Shape - Hierarchical ASM (MS-HASM). Inspired by the original proposal of Davatzikos et al. [8], MS-HASM also tries to overcome one of the main limitations of the widespread ASMs: their strong dependency on the number of training shapes. Using a novel L-level complete tree wavelet packet, the original shape is divided into a set of equal-sized bands of information, which allows the variability of the structure of interest to be captured better, even with few training examples. The new matrix notation, together with the new decomposition structure, allows the creation of a completely general algorithm. Unlike previous approaches, which are restricted to single-shape planar cases, MS-HASM is able to deal with complex multi-shape structures in a general d-dimensional space. That is, the hierarchical decomposition scheme proposed by MS-HASM allows the division of any multi-shape structure into smaller bands of information, while the composition information is preserved. The advantages of the new algorithm have been tested using two different databases: the JSRT database of chest radiographs and a proprietary database containing pairs of hands in different poses. The accuracy of both multi-shape segmentation algorithms, the traditional ASM and the new MS-HASM, has been studied using a careful experimental set-up that shows their robustness for different sizes of the training set. The results obtained show that, unlike ASM, the new algorithm successfully delineates the target contour even when only a few training examples are used.
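The idea of splitting a shape into equal-sized bands with a complete tree can be sketched as follows. This is a generic wavelet packet decomposition with the Haar filter, chosen here purely for brevity. The filters, the matrix notation, and the multi-shape handling of MS-HASM are not reproduced, and the function names are hypothetical. The input is assumed to be one coordinate sequence of a contour (for example, the x values of its landmarks) whose length is divisible by 2^L.

```python
import math

def haar_step(signal):
    """One Haar analysis step: (approximation, detail), each half the input length."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def complete_tree_packet(signal, levels):
    """Split every band at every level, giving 2**levels leaf bands of equal size."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    bands = [list(signal)]
    for _ in range(levels):
        # Unlike the standard wavelet transform, detail bands are split as well
        bands = [half for band in bands for half in haar_step(band)]
    return bands
```

Because every node of the tree is split, all leaf bands have the same length, so each one can be modeled with a low-dimensional statistical model of its own. Libraries such as PyWavelets provide wavelet packet decompositions with other filters.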

References
[1] M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: Active contour models,” International Journal of Computer Vision, vol. 1, no. 4, pp. 321–331, 1988.
[2] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, “Training models of shape from sets of examples,” in Proc. British Machine Vision Conf., 1992, pp. 9–18.
[3] S.-W. Lee, J. Kang, J. Shin, and J. Paik, “Hierarchical active shape model with motion prediction for real-time tracking of non-rigid objects,” IET Computer Vision, vol. 1, no. 1, pp. 17–24, Mar. 2007.
[4] B. van Ginneken, M. B. Stegmann, and M. Loog, “Segmentation of anatomical structures in chest radiographs using supervised methods: a comparative study on a public database,” Med. Image Anal., vol. 10, no. 1, pp. 19–40, Feb. 2006.
[5] R. H. Davies, C. J. Twining, T. F. Cootes, and C. J. Taylor, “Building 3-D statistical shape models by direct optimization,” IEEE Trans. Med. Imag., vol. 29, no. 4, pp. 961–981, Apr. 2010.
[6] G. Hamarneh and T. Gustavsson, “Deformable spatio-temporal shape models: extending active shape models to 2D+time,” Image Vis. Computing, vol. 22, no. 6, pp. 461–470, 2004.
[7] M. de Bruijne and M. Nielsen, “Multi-object segmentation using shape particles,” in IPMI, 2005, pp. 762–773.
[8] C. Davatzikos, X. Tao, and D. Shen, “Hierarchical active shape models, using the wavelet transform,” IEEE Trans. Med. Imag., vol. 22, no. 3, pp. 414–423, Mar. 2003.
[9] D. Nain, S. Haker, A. Bobick, and A. Tannenbaum, “Multiscale 3D shape representation and segmentation using spherical wavelets,” IEEE Trans. Med. Imag., vol. 26, no. 4, pp. 598–618, 2007.
[10] S. Mallat, “Multiresolution representation and wavelets,” Ph.D. dissertation, University of Pennsylvania, Philadelphia, 1988.
[11] P.-L. Shui, Z.-F. Zhou, and J.-X. Li, “Image denoising algorithm via best wavelet packet base using Wiener cost function,” IET Image Processing, vol. 1, no. 3, pp. 311–318, 2007.
[12] M. Akay, “Wavelet applications in medicine,” IEEE Spectrum, vol. 34, no. 5, pp. 50–56, May 1997.
[13] A. Finkelstein and D. H. Salesin, “Multiresolution curves,” in SIGGRAPH. New York, NY, USA: ACM, 1994, pp. 261–268.
[14] E. J. Stollnitz, T. D. DeRose, and D. H. Salesin, Wavelets for computer graphics: theory and applications. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1996.
[15] J. Shiraishi, S. Katsuragawa, J. Ikezoe, T. Matsumoto, T. Kobayashi, K. Komatsu, M. Matsui, H. Fujita, Y. Kodera, and K. Doi, “Development of a digital image database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists’ detection of pulmonary nodules,” American Journal of Roentgenology, vol. 174, pp. 71–74, 2000.