Content-based Image Retrieval Incorporating Models of Human Perception

M. Emre Celebi ([email protected])
Y. Alp Aslandogan ([email protected])

Department of Computer Science and Engineering
The University of Texas at Arlington
Arlington, TX 76019

Technical Report CSE-2003-21


ABSTRACT

In this work, we develop a system for retrieving medical images with focus objects that incorporates models of human perception. The approach is to guide the search for an optimum similarity function using human perception. First, the images are segmented using an automated segmentation tool. Then, 20 shape features are computed from each image to obtain a feature matrix. Principal component analysis is performed on this matrix to reduce its dimension. The principal components obtained from the analysis are used to select a subset of variables that best represents the data. A human perception of similarity experiment is designed to obtain an aggregated human response matrix. Finally, an optimum weighted Manhattan distance function is designed using a genetic algorithm with the Mantel test as a fitness function. The system is tested for content-based retrieval of skin lesion images. The results show significant agreement between the computer assessment and human perception of similarity. Since the features extracted are not specific to skin lesion images, the system can be used to retrieve other types of images having focus objects.

1. INTRODUCTION

Medical information systems with advanced browsing capabilities play an important role in medical training, research, and diagnostics. Since images represent an essential component of the diagnosis, it is natural to use medical images to support browsing and querying of medical databases [1]. Current content-based retrieval systems use low-level image features based on color, texture, and shape to represent images. However, except in some constrained applications such as human face and fingerprint recognition, these low-level features do not capture the high-level semantic meaning of images [2]. Although the ultimate goal of all image similarity metrics is to be consistent with human perception, little work has been done to systematically examine this consistency. Commonly, the performance of similarity metrics is evaluated based on anecdotal accounts of good and poor matching results [3].

In this paper, we develop a system for retrieving medical images with focus objects that incorporates models of human perception. The approach is to guide the search for an optimum similarity function using human perception. First, the images are segmented using an automated segmentation tool. Then, 20 shape features are computed from each image to obtain a feature matrix. Principal component analysis is performed on this matrix to reduce its dimension. The principal components obtained from the analysis are used to select a subset of variables that best represents the data. A human perception of similarity experiment is designed to obtain an aggregated human response matrix. Finally, an optimum weighted Manhattan distance function is designed using a genetic algorithm with the Mantel test as a fitness function. The system is tested for content-based retrieval of skin lesion images. The results show significant agreement between the computer assessment and human perception of similarity. Since the features extracted are not specific to skin lesion images, the system can be used to retrieve other types of images having focus objects.

The rest of the paper is organized as follows: Section 2 describes segmentation and feature computation. Section 3 introduces principal component analysis (PCA), a dimensionality reduction and feature selection technique, and describes the application of PCA to our data set. Section 4 describes the design of the human perception of similarity experiment. Section 5 discusses optimization of the similarity function using a genetic algorithm. Section 6 concludes the paper and provides future research directions.

2. SEGMENTATION AND FEATURE COMPUTATION

This Section discusses the segmentation and feature computation steps. Section 2.1 gives a description of our data set. Section 2.2 discusses the importance of segmentation for object recognition. Finally, Section 2.3 describes feature computation and presents descriptions of the shape features that are computed for each image.

2.1 Data Set Description

In this work we use 184 skin lesion images obtained from various online image databases such as … Because of their nature, many shape features are not invariant to scaling. This means that shape features obtained from images acquired at different resolutions can be misleading. As an example, the same lesion imaged at two different resolutions will have two different area measurements. Therefore, for reliable shape recognition, using a standard-resolution image set is crucial. In our database, all of the images have a standard resolution of 500 pixels per centimeter.

2.2 Segmentation

Segmentation is the process of partitioning an image into regions that have similar characteristics such as intensity, color, texture, etc. [4]. It is an extremely important step in image retrieval, since accurate computation of shape features depends on good segmentation [2]. There is no general theory of image segmentation; as a consequence, no single standard method has emerged. Rather, there are several ad hoc methods that have received some degree of popularity [4]. For segmentation of lesion images we use an automated tool that was developed by Goshtasby et al. [5] specifically for this purpose. Some of the segmented lesion images are shown in Figure 1.

Figure 1. Segmented skin lesion images

2.3 Feature Computation

An image feature is a distinguishing primitive characteristic or attribute of an image [4]. Feature computation involves the extraction of a specific number of features (say n) from each database image. These features are then grouped into n-dimensional feature vectors, which form an n-dimensional feature space. Similarity search in the n-dimensional feature space thus consists of computing the distances between the feature vector of the query image and those of the database images and ranking them.

The ABCD rule [6] summarizes the clinical features of a melanoma (a serious type of skin cancer) as: asymmetry (A), border irregularity (B), color variegation (C), and diameter greater than 6 mm (D). Figure 2 illustrates the ABCD rule. Interestingly, three of these features are shape-based. For each image in the database we compute 20 shape features, including asymmetry, border irregularity, and diameter. Descriptions of these shape features are presented in the rest of this Section.
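As an illustration of this retrieval step, the following is a minimal sketch in Python/NumPy; the function and variable names are our own, and the weighted Manhattan distance anticipates the form of the similarity function optimized in Section 5.

```python
import numpy as np

def rank_by_similarity(query_vec, feature_matrix, weights=None):
    """Rank database images by weighted Manhattan (L1) distance to the query.

    query_vec      : (n,) feature vector of the query image
    feature_matrix : (m, n) feature vectors of the m database images
    weights        : (n,) non-negative feature weights (uniform if None)
    Returns the database indices sorted from most to least similar.
    """
    if weights is None:
        weights = np.ones(query_vec.shape[0])
    # Weighted L1 distance between the query and every database image
    distances = np.abs(feature_matrix - query_vec) @ weights
    return np.argsort(distances)  # smallest distance = most similar
```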

Figure 2. ABCD’s of Melanoma [6]

1) Diameter

To compute the diameter of a lesion, the contour of the lesion is treated as a point set. The diameter of a point set P is defined as the maximum distance between any two points of P. A brute-force approach would be to compare the distances between all pairs of points in P, which requires O(n^2) distance computations. Since a typical lesion contour contains a large number of points, this would be very slow. For this reason, we use a software package released under the GNU General Public License [7] for this purpose.
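For illustration only, a brute-force diameter computation might look like the sketch below (Python/NumPy, with illustrative names); the system itself relies on the faster library implementation [7].

```python
import numpy as np

def diameter_brute_force(points):
    """Diameter of a point set: the maximum pairwise Euclidean distance.

    points : (n, 2) array of contour coordinates.
    This is the O(n^2) brute-force version discussed above; convex-hull based
    methods (e.g., rotating calipers) are much faster for long contours.
    """
    best = 0.0
    for i in range(len(points) - 1):
        # Distances from point i to all later points
        d = np.linalg.norm(points[i + 1:] - points[i], axis=1)
        best = max(best, d.max())
    return best
```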

2) Bending Energy

The bending energy of a contour has been proposed as a robust global shape descriptor [8] and is one of the most important global measures related to shape complexity. This measure has been adapted from elasticity theory and expresses the amount of energy required to transform a given closed contour into a circle with the same perimeter as the original contour. Consequently, this contour descriptor is invariant to translation, rotation, and reflection, is easily normalized with respect to changes in scale, and is frequently used as a complexity measure [9].

To compute the total bending energy, the curvature of the boundary at each point must be computed. Let x(i) and y(i) be the coordinates of the i-th point on an object's boundary. Then the curvature of the boundary at the i-th position is defined as

C(i)^2 = (2*x(i) - x(i-1) - x(i+1))^2 + (2*y(i) - y(i-1) - y(i+1))^2

where i ∈ [1, M-2] and M is the number of points on the boundary. The sum of the squared curvatures, normalized by the boundary length P, is called the bending energy [10]:

E = (1/P) * Σ_{i=1}^{M-2} C(i)^2

3) Contour Moments

Statistical moments are among the most classical and popular shape descriptors [9]. A set of moments computed from an image generally represents global characteristics of the image shape and provides a lot of information about different types of geometrical features of the image [11]. Moment-based shape recognition applications date back to the 1960s [12]. There are many types of moment functions, each having its own advantages in specific application areas [11]. Examples include geometric moments, Zernike moments, and Legendre moments. In this work we use contour sequence moments because of their noise insensitivity and efficiency [20]. Let z(i) be an ordered sequence that represents the Euclidean distance between the centroid and each of the N boundary pixels of the object. The r-th contour sequence moment m_r and the r-th central moment µ_r can be defined as [13]:

m_r = (1/N) * Σ_{i=1}^{N} [z(i)]^r

µ_r = (1/N) * Σ_{i=1}^{N} [z(i) - m_1]^r
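A minimal sketch of the bending energy and the contour sequence moments defined above (Python/NumPy; the boundary is assumed to be an ordered array of coordinates, and the names are illustrative):

```python
import numpy as np

def bending_energy(boundary):
    """Bending energy E of a closed boundary, normalized by its length P.

    boundary : (M, 2) array of ordered boundary coordinates.
    """
    perimeter = np.linalg.norm(np.diff(boundary, axis=0), axis=1).sum()
    prev_pt, curr_pt, next_pt = boundary[:-2], boundary[1:-1], boundary[2:]
    # C(i)^2 = (2x(i) - x(i-1) - x(i+1))^2 + (2y(i) - y(i-1) - y(i+1))^2
    curvature_sq = ((2 * curr_pt - prev_pt - next_pt) ** 2).sum(axis=1)
    return curvature_sq.sum() / perimeter

def contour_sequence_moments(boundary, r_max=5):
    """Contour sequence moments m_r and central moments mu_r, r = 1..r_max."""
    centroid = boundary.mean(axis=0)
    z = np.linalg.norm(boundary - centroid, axis=1)  # centroid-to-boundary distances z(i)
    m = {r: np.mean(z ** r) for r in range(1, r_max + 1)}
    mu = {r: np.mean((z - m[1]) ** r) for r in range(1, r_max + 1)}
    return m, mu
```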

Gupta and Srinath introduced translation-, rotation-, and scale-invariant one-dimensional normalized contour sequence moments [14].

While normalized contour sequence moments can be used directly for shape representation, less noise-sensitive results can be obtained from the following shape descriptors:

F_1 = (µ_2)^(1/2) / m_1
F_2 = µ_3 / (µ_2)^(3/2)
F_3 = µ_4 / (µ_2)^2
F_4 = µ_5 / (µ_2)^(5/2)

Note that contour sequence moments are less computationally expensive than region moments, since only boundary points are considered in the calculations.

4) Convex Hull Perimeter & Area

An important class of shape representation methods is based on defining a bounding region that encloses the shape of interest [9]. The convex hull is defined as the smallest convex set that contains the shape (see Figure 3). We compute the convex hull using the Graham scan algorithm [15]. The area of the convex hull is computed by selecting an arbitrary point inside the hull (in our case the centroid), forming triangles by drawing lines from this point to the points constituting the hull, and summing the areas of these triangles. The hull perimeter is calculated by summing the Euclidean distances between adjacent points of the hull (a sketch of both computations follows Figure 3).

Figure 3. Convex Hull [16]
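The hull computations described in 4) might be sketched as follows; SciPy's convex hull routine stands in here for the Graham scan implementation used in the actual system [15], while the area and perimeter follow the triangle-fan and edge-summing procedure described above.

```python
import numpy as np
from scipy.spatial import ConvexHull

def hull_perimeter_and_area(points):
    """Convex hull perimeter and area of a 2-D point set.

    points : (n, 2) array of contour coordinates.
    """
    hull_vertices = points[ConvexHull(points).vertices]   # counter-clockwise order
    next_vertices = np.roll(hull_vertices, -1, axis=0)    # next vertex of each hull edge
    # Perimeter: sum of Euclidean edge lengths
    perimeter = np.linalg.norm(next_vertices - hull_vertices, axis=1).sum()
    # Area: fan of triangles with the centroid as apex, summed via cross products
    apex = hull_vertices.mean(axis=0)
    a, b = hull_vertices - apex, next_vertices - apex
    cross = a[:, 0] * b[:, 1] - a[:, 1] * b[:, 0]
    area = 0.5 * np.abs(cross).sum()
    return perimeter, area
```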

5) Lesion Area & Perimeter

The area of a lesion is calculated simply by counting the number of pixels contained within its boundary. The perimeter is calculated by counting the number of pixels on the boundary.

6) Convexity

Convexity is the relative amount by which an object differs from a convex object [16]. It is formulated as the ratio of the perimeter of the convex hull of an object to the perimeter of the object. It is equal to 1 for convex objects; objects with irregular boundaries have convexity less than 1.

Figure 4. Convexity [16]

7) Solidity

Solidity measures the density of an object and is defined as the ratio of the area of an object to the area of the convex hull of the object [16]. Objects with irregular boundaries have solidity less than 1.

Figure 5. Solidity [16]

8) Compactness

While area and perimeter are both invariant with respect to translation and rotation, they are not invariant to scaling. A combination of the two descriptors, however, can be used to obtain a dimensionless measurement that is invariant to translation, rotation, and scaling: compactness, defined as Perimeter^2 / (4*π*Area), is a measure of the roundness of a shape and is minimal for a circular shape [17]. Objects with irregular boundaries have larger compactness. (A combined sketch of descriptors 5-8 follows Figure 6.)

Figure 6. Compactness [16]
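A combined sketch of descriptors 5)-8), assuming a binary segmentation mask and the hull measurements from the sketch following Figure 3; the pixel-counting conventions are our own approximation of the description above.

```python
import math
import numpy as np

def lesion_area_and_perimeter(mask):
    """Lesion area and perimeter by pixel counting (descriptor 5).

    mask : 2-D boolean array, True inside the lesion.
    The perimeter is taken as the number of lesion pixels with at least one
    4-connected background neighbour.
    """
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    return int(mask.sum()), int((mask & ~interior).sum())

def ratio_descriptors(area, perimeter, hull_area, hull_perimeter):
    """Convexity, solidity, and compactness (descriptors 6-8)."""
    convexity = hull_perimeter / perimeter                  # 1 for convex shapes, < 1 otherwise
    solidity = area / hull_area                             # 1 for convex shapes, < 1 otherwise
    compactness = perimeter ** 2 / (4.0 * math.pi * area)   # minimal (= 1) for a circle
    return convexity, solidity, compactness
```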

9) Ratio of Principal Axes (Eccentricity)

The principal axes (see Figure 7) of an object can be defined as lines that intersect orthogonally at the centroid of the object and represent the directions with zero cross-correlation. The lengths of the principal axes are given by the eigenvalues of the covariance matrix C of the contour points.

The ratio of the principal axes is a reasonable measure of elongation and can be calculated from the eigenvalues of C [18] (see the sketch following Figure 8).

Figure 7. Principal Axes [16]

Figure 8. Eccentricity [16]
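A minimal sketch of this descriptor, under the assumption that it is computed as the ratio of the larger to the smaller eigenvalue of the contour covariance matrix; the closed-form expression from [18] is not reproduced here.

```python
import numpy as np

def principal_axis_ratio(contour):
    """Ratio of principal axes (eccentricity) of a contour.

    contour : (n, 2) array of boundary coordinates.
    """
    C = np.cov(contour, rowvar=False)          # 2 x 2 covariance matrix of (x, y)
    eigvals = np.linalg.eigvalsh(C)            # eigenvalues in ascending order
    # Ratio of the larger to the smaller eigenvalue; guard against degenerate contours
    return eigvals[1] / max(eigvals[0], 1e-12)
```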

10) Extent

Extent is defined as the proportion of the pixels in the convex hull of the object that also belong to the object [19].

11) Orientation

Orientation is defined as the angle between the x-axis and the major axis of the ellipse that has the same second-order moments as the object [19].

As mentioned earlier, some of these features, such as area and perimeter, are not scale-invariant. At first, this may seem inappropriate, because all shapes with the same area have the same value for this attribute even though they may not be similar in shape. However, an effective combination of simple features can lead to more discriminative features, as in the case of compactness. Moreover, even simple shape measures can be applied successfully in specific domains [18], although it is not easy to decide which features are most appropriate for each problem [9]. Possible approaches in this case are to apply either feature transformation (extraction) or feature subset selection to a large set of candidate features in order to obtain an optimum set of features for the training set. These issues are discussed in detail in the next Section.

3. DIMENSIONALITY REDUCTION AND FEATURE SUBSET SELECTION

This Section discusses the dimensionality reduction and feature subset selection procedures. Section 3.1 introduces the concept of the "curse of dimensionality" [48]. Section 3.2 discusses dimensionality reduction methods. Section 3.3 describes the principal component analysis technique. Section 3.4 describes the principal component analysis of the skin lesion data. Finally, Section 3.5 discusses the use of principal component analysis in selecting a subset of variables.

3.1 Curse of Dimensionality

After the feature computation step, we have high-dimensional (20-dimensional) data to be analyzed. In classification problems, obtaining and analyzing high-dimensional data is difficult for several reasons. First of all, extracting a large number of features implies a high computational cost in terms of both time and storage; by refraining from extracting redundant features we can reduce the overall cost of classification. Also, keeping the number of features small may be beneficial for economic reasons (acquisition of some features might be expensive) or medical reasons (measurement of some features might be risky). The number of features also affects the time needed for accurate classification: an abundance of irrelevant or redundant features can increase the size of the search space and hence the time needed for sufficiently accurate classification [30]. The "curse of dimensionality" also affects classifier accuracy. Initially, classifier accuracy increases as we add more features, but beyond some point the inclusion of new features results in performance degradation [21]. This is also known as the "Hughes phenomenon" [46] or the "peaking phenomenon". As can be seen from Figure 1, as the dimensionality of the data increases, the number of required samples increases exponentially.

Figure 1. Hughes Phenomenon [47]

Data visualization is an invaluable tool for data mining. It is important for better understanding of the data, discovery of clusters, and detection of outliers. Unfortunately, as dimensionality increases, effective visualization becomes increasingly difficult.

3.2 Dimensionality Reduction

Dimensionality reduction can be achieved in two different ways [21]. One approach, called feature subset selection (FSS), is the selection of a subset of features that minimizes intra-class variance and maximizes inter-class variance. In other words, less discriminatory features are eliminated, leaving a subset of the original features that retains sufficient information to discriminate well among classes [31]. The other approach, called feature extraction (FE), is a linear or non-linear transformation of the original feature vector that optimizes a certain class separability criterion, possibly leading to fewer features at the same time [22].

There are numerous FSS procedures proposed in the literature. These can be roughly categorized as complete or heuristic. In complete search strategies, all possible feature subsets are systematically examined [23]. Depth-first, breadth-first, and branch-and-bound search [24] are among the well-known complete search procedures. Heuristic algorithms can be divided into deterministic and non-deterministic ones. Classic deterministic heuristic procedures are sequential forward selection, sequential backward selection [25], floating selection [26], and best-first search [27]. These are deterministic in the sense that all runs obtain the same solution [23] (a sketch of sequential forward selection is given at the end of this subsection). In non-deterministic heuristic search procedures, randomness is used to escape from local minima [23]. Two classic non-deterministic heuristic procedures are genetic algorithms [28] and simulated annealing [29].

Feature extraction methods can be broadly categorized as linear or non-linear. Unsupervised linear feature extraction methods are more or less all based on principal component analysis (PCA), whereas supervised linear feature extraction methods usually rely on discriminant analysis techniques [32]. Multidimensional scaling [33, 34] and Sammon's mapping [35] are two classical non-linear feature extraction methods. With the development of neural networks, new non-linear feature extraction techniques have emerged [32]; examples include self-organizing maps [36], autoassociative feedforward networks [37], and a neural-network version of Sammon's mapping [38].
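As an illustration of the deterministic heuristics listed above, the following is a sketch of greedy sequential forward selection; the scoring function (e.g., cross-validated classifier accuracy) is assumed to be supplied by the caller.

```python
def sequential_forward_selection(candidate_features, score, k):
    """Greedy sequential forward selection of k features.

    candidate_features : iterable of feature indices (or names)
    score              : function mapping a tuple of selected features to a
                         quality value, e.g., cross-validated accuracy
    k                  : number of features to select
    """
    selected, remaining = [], list(candidate_features)
    while remaining and len(selected) < k:
        # Add the single feature that yields the best-scoring subset
        best = max(remaining, key=lambda f: score(tuple(selected + [f])))
        selected.append(best)
        remaining.remove(best)
    return selected
```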

3.3 Principal Component Analysis (PCA)

PCA is an unsupervised linear feature extraction technique that transforms a set of variables into a substantially smaller set of uncorrelated variables that represents most of the information in the original set [39]. The idea was originally devised by Pearson [40] and independently developed by Hotelling [41].
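As a concrete illustration of the transformation (a sketch, not the exact analysis carried out on the lesion data in Section 3.4), PCA can be computed from the eigendecomposition of the correlation matrix of a standardized feature matrix:

```python
import numpy as np

def pca(X, k):
    """Project an (m, n) feature matrix X onto its first k principal components.

    Features are standardized first (i.e., PCA on the correlation matrix), which
    is common when the features are measured on very different scales.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize each feature
    C = np.cov(Z, rowvar=False)                    # n x n correlation matrix
    eigvals, eigvecs = np.linalg.eigh(C)           # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]              # descending by explained variance
    components = eigvecs[:, order[:k]]             # top-k principal directions
    explained = eigvals[order] / eigvals.sum()     # proportion of variance explained
    return Z @ components, explained
```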

Suppose we want to reduce our n dimensional data to n