IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 13, NO. 9, SEPTEMBER 2003


Image Retrieval Using BDIP and BVLC Moments

Young Deok Chun, Sang Yong Seo, and Nam Chul Kim

Abstract—In this paper, we propose new texture features, block difference of inverse probabilities (BDIP) and block variation of local correlation coefficients (BVLC), for content-based image retrieval and then present an image retrieval method based on the combination of BDIP and BVLC moments. BDIP uses local probabilities in image blocks to measure local brightness variations of an image. BVLC uses variations of local correlation coefficients in image blocks to measure local texture smoothness of an image. Experimental results show that the presented retrieval method yields about 12% better performance in precision versus recall and about 0.13 better average normalized modified retrieval rank (ANMRR) than the method using wavelet moments.

Index Terms—Block difference of inverse probabilities (BDIP), block variation of local correlation coefficients (BVLC), content-based image retrieval (CBIR), texture feature.

I. INTRODUCTION

THE rapid growth of computer-based communication has led to a tremendous increase in multimedia information. Images, compared with other media like text and audio, tend to contain an enormous volume of information. Therefore, it is very difficult for users to access visual information unless the image database (DB) is organized so as to allow efficient storage, management, and retrieval. Active research into efficient image retrieval has steadily progressed since the 1970s.

Generally, there are two types of retrieval methods: text-based and content-based [1]. Text-based retrieval is a popular method that annotates images with text and uses text-based DB management systems to perform image retrieval. However, there are two major difficulties with this method, especially when the image DB is huge. One is the vast amount of labor required in manual image annotation. The other difficulty results from the rich content of images and the subjectivity of human perception. Subjective perception can lead to imprecise annotations that may produce incorrect search results in subsequent retrieval processes. The other method, content-based image retrieval (CBIR), does not suffer from these difficulties, for instead of relying on manual annotation, it indexes images by their visual features such as color, shape, and texture.

Color is one of the most widely indexed features in CBIR. It is invariant to image size and orientation. It can be indexed by feature descriptors such as color moment [1], color histogram [2], or color structure descriptor (CSD), which is one of the MPEG-7 color descriptors [3].

Manuscript received February 12, 2002; revised September 19, 2002. This paper was recommended by Associate Editor R. Lancini. Y. D. Chun and N. C. Kim are with the School of Electronic and Electrical Engineering, Kyungpook National University, Taegu 702-701, Korea (e-mail: [email protected]; [email protected]). S. Y. Seo is with the Korea Telecom Corporation, Seoul 137-792, Korea (e-mail: [email protected]). Digital Object Identifier 10.1109/TCSVT.2003.816507

Shape is a feature that represents the contour of an object in an image. It is invariant to the size and location of the object. However, it is difficult to extract the contour of an object correctly. Shape can be represented by feature descriptors such as Fourier descriptors, chain codes, moment invariants, and Zernike moments [4].

Texture refers to innate surface properties of an object and their relationship to the surrounding environment [5]. In the early 1970s, Haralick et al. proposed the co-occurrence matrix representation of texture features [5]. Tamura texture [6] was subsequently proposed as an enhanced version. In the early 1990s, after the wavelet transform was introduced, researchers began to study its use in texture representation [7], [8]. Among retrieval methods using the wavelet transform, the method [8] using means and variances extracted from wavelet subbands is known to perform excellently for texture images [1]. Wavelet transform-based methods have also been combined with other techniques to achieve better image retrieval performance [9], [10]. Additional texture indexing techniques that have been proposed include: the Gabor transform [11], which reflects characteristics of the human visual system; PIM (picture information measure) [12], which has the property of an entropy operator; and the edge histogram descriptor (EHD) [3], which is an MPEG-7 texture descriptor that captures the spatial distribution of edges.

Several CBIR systems provide automatic indexing and querying based on visual features such as color, shape, and texture. For example, QBIC [13] offers options for retrieval by color, spatial location, and texture. Virage [14] offers retrieval by a weighted combination of color, spatial location, texture, and edge-direction histogram. Photobook [15] provides interactive tools for browsing and retrieval by shape, texture, or appearance in terms of facial recognition. Blobworld [16] allows for a transformation from raw pixel data to a small set of localized coherent regions of color and texture.

In this paper, we propose two new texture features for CBIR: block difference of inverse probabilities (BDIP) and block variation of local correlation coefficients (BVLC). We then present an image retrieval method based on the combination of BDIP and BVLC moments. BDIP, which is a kind of entropy operator, is defined as the difference between the number of pixels in a block and the ratio of the sum of pixel intensities in the block to the maximum in the block. This feature uses local probabilities in image blocks to measure variations in intensity. BVLC is defined as the difference between the maximum and minimum of local correlation coefficients according to four orientations in a block. This feature uses variations of local correlation coefficients in image blocks to measure texture smoothness. The first and second moments of BDIP and BVLC are used as a feature vector for CBIR, which is not affected by translation and rotation of an object. The Corel DB, chosen from the Corel Draw

1051-8215/03$17.00 © 2003 IEEE


Fig. 1. Example of wavelet decomposition. (a) Block diagram for a 4-band wavelet decomposition. (b) Configuration of 3-level wavelet decomposed images. (c) An original image and its 3-level wavelet decomposed images.

Photo DB [17], and the VisTex DB, chosen from the VisTex collection of the MIT Media Lab [18], were used to evaluate the performance of the proposed retrieval method.

II. CONVENTIONAL FEATURES

In this section, we describe some features that are commonly used and whose performance we will compare with that of the proposed features. For simplicity, the descriptions are focused on feature extraction from one component of a color image.

A. Histogram

A histogram represents the frequencies of gray levels in an image. The histogram of an image $I$ with $L$ gray levels is expressed as

$$h(g) = \sum_{(x,y)} \delta\bigl(g, I(x,y)\bigr), \qquad g = 0, \ldots, L-1 \tag{1}$$

where $\delta(\cdot,\cdot)$ denotes a Kronecker delta function and $I(x,y)$ the intensity of a pixel $(x,y)$ in the image $I$. This feature is relatively robust, even when images are rotated or translated. However, it is sensitive to changes of contrast and lacks information about how color is spatially distributed in an image.

B. Wavelet Moments

The set of wavelet moments used in [8] is a feature related to the energy distribution of each wavelet subband decomposed into multiscales. Fig. 1 shows an example of wavelet decomposition. As shown in Fig. 1(a), the $(s-1)$-level image is decomposed into a low band image and three high band images of horizontal, vertical, and diagonal orientations, respectively, where level 0 corresponds to an original image. Fig. 1(b) shows the configuration of 3-level wavelet decomposed images and Fig. 1(c) shows an original image and its 3-level wavelet decomposed images.

In [8], wavelet moments are computed as the first and second central moments of the absolute values of the coefficients of each subband in the wavelet transform domain. That is, they can be expressed as

$$\mu_{s,o} = \frac{1}{N_{s,o}} \sum_{(x,y)} \bigl| I_{s,o}(x,y) \bigr| \tag{2}$$

$$\sigma_{s,o} = \left[ \frac{1}{N_{s,o}} \sum_{(x,y)} \Bigl( \bigl| I_{s,o}(x,y) \bigr| - \mu_{s,o} \Bigr)^{2} \right]^{1/2} \tag{3}$$

where $s$ and $o$ denote decomposition level and subband orientation, respectively, $N_{s,o}$ the number of coefficients in the $(s,o)$th subband, $I_{s,o}(x,y)$ the intensity of a pixel $(x,y)$ in the subband image, and $\mu_{s,o}$ and $\sigma_{s,o}$ the mean and standard deviation of the absolute values of the coefficients in the $(s,o)$th subband, respectively.

C. Picture Information Measure (PIM)

PIM [12], which is applied to calculating the entropy for an image block, is defined as the difference between the number of pixels in a block and the maximum value of the histogram in the block. That is

$$\mathrm{PIM} = \sum_{g=0}^{L-1} h(g) - \max_{g} h(g) \tag{4}$$

where $g$ denotes a gray level, $L$ the number of gray levels, and $h(g)$ the histogram value in a block.
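As a minimal sketch (our own code, not from the paper), the histogram of (1) and the PIM of (4) for one grayscale block, given as a list of rows, might look as follows; the function and variable names are our own.

```python
def histogram(block, levels=256):
    """h[g] = number of pixels in `block` with gray level g, as in (1)."""
    h = [0] * levels
    for row in block:
        for g in row:
            h[g] += 1
    return h

def pim(block, levels=256):
    """PIM = (number of pixels in the block) - max_g h(g), as in (4)."""
    h = histogram(block, levels)
    return sum(h) - max(h)

# A perfectly flat block carries no information, so its PIM is 0;
# a block whose pixels all differ has the maximal PIM for its size.
print(pim([[7, 7], [7, 7]]))  # 0
print(pim([[0, 1], [2, 3]]))  # 3
```

Note that PIM depends only on the gray-level distribution inside the block, not on the spatial arrangement of the pixels, which is precisely the entropy-like behavior described above.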


Fig. 2. Original images and their BDIP images. (a) Original images. (b) BDIP images.

III. PROPOSED TEXTURE FEATURES

A. BDIP

The difference of inverse probabilities (DIP) [19] is an operator for extracting sketch features that contain valleys and edges subject to local intensities. In the DIP, the ratio of a pixel intensity in an image window to the sum of all pixel intensities in the window is considered as a probability. So, the name DIP means the difference between the inverse of the probability for the center pixel in a window and that for the pixel of maximum intensity in the window. BDIP, the first of the proposed texture features, is a block-based version of the DIP. It is defined as the difference between the number of pixels in a block and the ratio of the sum of pixel intensities in the block to the maximum in the block. That is

$$\mathrm{BDIP} = M^{2} - \frac{\sum_{(x,y) \in B} I(x,y)}{\max_{(x,y) \in B} I(x,y)} \tag{5}$$

where $I(x,y)$ denotes the intensity of a pixel $(x,y)$ and $B$ a block of size $M \times M$. The larger the variation of intensities there is in a block, the higher the value of BDIP. Fig. 2 shows some original images and their BDIP images, whose pixel intensities represent the negatives of their BDIP values. The block size is chosen as 2 × 2, and higher BDIP values are shown darker. In Fig. 2(b), the insides of objects and the backgrounds are shown bright, while edges and valleys are shown dark. We therefore see the effectiveness of the BDIP feature for extracting edges and valleys.

B. BVLC

Variation of local correlation coefficients (VLCC) [20] is known to measure texture smoothness well. It is defined as the variation, or the difference between the maximum and minimum, of local correlation coefficients according to four orientations. The second texture feature we propose, BVLC, is a block-based version of the VLCC. Each local correlation coefficient is defined as local covariance normalized by local variance. That is

$$\rho(k,l) = \frac{\dfrac{1}{M^{2}} \sum_{(x,y) \in B} I(x,y)\, I(x+k, y+l) \;-\; \mu_{B}\, \mu_{B(k,l)}}{\sigma_{B}\, \sigma_{B(k,l)}} \tag{6}$$

where $B$ denotes a block of size $M \times M$, and $\mu_{B}$ and $\sigma_{B}$ denote the local mean and standard deviation of the block $B$, respectively. The notation $(k,l)$ denotes a pair of horizontal shift and vertical shift associated with the four orientations $O_{4} = \{(0,1), (1,0), (1,1), (1,-1)\}$. As a result, $\mu_{B(k,l)}$ and $\sigma_{B(k,l)}$ represent the mean and standard deviation of the block shifted by $(k,l)$, respectively.

Fig. 3. Pixel configurations in 2 × 2 windows and their corresponding windows shifted in each of the four orientations, which are required to compute: (a) $\rho(0,1)$; (b) $\rho(1,0)$; (c) $\rho(1,1)$; and (d) $\rho(1,-1)$.

As a result, the value of BVLC is expressed as

$$\mathrm{BVLC} = \max_{(k,l) \in O_{4}} \rho(k,l) \;-\; \min_{(k,l) \in O_{4}} \rho(k,l) \tag{7}$$

From the above equation, we see that the larger the degree of roughness there is in a block, the higher the value of BVLC. Fig. 4 shows the BVLC images for the original images in Fig. 2(a). The BVLC images are displayed in the same manner as the BDIP images in Fig. 2(b). In Fig. 4, we can see that the intensity of the BVLC images is determined by texture smoothness. For example, smooth textures such as sky are shown bright, while rough textures such as road and petals are shown dark.

IV. IMAGE RETRIEVAL USING BDIP AND BVLC MOMENTS

Our image retrieval method is based on the combination of BDIP and BVLC moments. A set of the first and second moments of BDIP and BVLC are used as a feature vector for CBIR.
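The two proposed features can be sketched as follows for one M × M block of a grayscale image (a list of rows) with its top-left corner at (r, c). This is our own illustrative code, not the authors' implementation; border handling and the zero-variance fallback are our assumptions.

```python
import math

def _stats(img, r, c, m):
    """Mean and standard deviation of the m x m block at (r, c)."""
    vals = [img[r + i][c + j] for i in range(m) for j in range(m)]
    mu = sum(vals) / len(vals)
    return mu, math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))

def bdip(img, r, c, m=2):
    """BDIP of (5): M^2 minus (sum of intensities) / (max intensity)."""
    vals = [img[r + i][c + j] for i in range(m) for j in range(m)]
    return m * m - sum(vals) / max(vals)   # max(vals) assumed > 0

def bvlc(img, r, c, m=2):
    """BVLC of (6)-(7): spread of the four oriented correlation coefficients."""
    mu0, sd0 = _stats(img, r, c, m)
    rhos = []
    for dk, dl in [(0, 1), (1, 0), (1, 1), (1, -1)]:   # the set O4
        mu1, sd1 = _stats(img, r + dk, c + dl, m)
        cross = sum(img[r + i][c + j] * img[r + dk + i][c + dl + j]
                    for i in range(m) for j in range(m)) / (m * m)
        # Local correlation coefficient (6): normalized local covariance.
        # Zero-variance blocks are mapped to rho = 0 (our choice).
        rhos.append((cross - mu0 * mu1) / (sd0 * sd1) if sd0 * sd1 else 0.0)
    return max(rhos) - min(rhos)                       # variation (7)
```

On a block taken from a perfect linear gradient, all four coefficients equal 1, so BVLC is 0, matching the observation that smooth textures yield low BVLC values. The caller must keep (r, c) at least one pixel away from the left image border so that the (1, −1) shift stays inside the image.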


Fig. 4. BVLC images for the original images in Fig. 2(a).

Fig. 5. Block diagram of an image retrieval system using the combination of BDIP and BVLC moments.

Fig. 5 shows the block diagram of an image retrieval system using the combination of BDIP and BVLC moments. When a query color image enters the system, each color component image is divided into nonoverlapping blocks of size $M \times M$. The system then computes BDIP and BVLC in each block and classifies all the blocks into eight classes. The purpose of the classification is to reflect in the proposed features the characteristics of the several homogeneous regions or objects that an image generally contains.

The block classification proceeds as follows. In the first step, all the blocks are classified into two groups, and the average of BDIPs over all blocks in the image is used as a threshold. In the second step, all blocks in each of the two groups are classified again into two groups, but this time the average of BDIPs over all blocks in each group is used as a threshold. In the last step, which repeats the same procedure as in the second step, all the blocks are classified into eight classes. After the block classification, the system computes the first and second moments of BDIP and BVLC for each class and combines these moments as a feature vector, which can be written as

$$\mathbf{f} = [\,\mathbf{f}_{\mathrm{BDIP}},\ \mathbf{f}_{\mathrm{BVLC}}\,] \tag{8}$$

$$\mathbf{f}_{\mathrm{BDIP}} = \bigl( \mu_{\mathrm{BDIP}}^{1}, \sigma_{\mathrm{BDIP}}^{1}, \ldots, \mu_{\mathrm{BDIP}}^{8}, \sigma_{\mathrm{BDIP}}^{8} \bigr) \tag{9}$$

$$\mathbf{f}_{\mathrm{BVLC}} = \bigl( \mu_{\mathrm{BVLC}}^{1}, \sigma_{\mathrm{BVLC}}^{1}, \ldots, \mu_{\mathrm{BVLC}}^{8}, \sigma_{\mathrm{BVLC}}^{8} \bigr) \tag{10}$$

where $\mathbf{f}_{\mathrm{BDIP}}$ and $\mathbf{f}_{\mathrm{BVLC}}$ denote the BDIP and BVLC moment vectors for each color component image, respectively, and $\mu^{c}$ and $\sigma^{c}$ denote the mean and standard deviation for the $c$th class of each color component image, respectively. The system finally calculates the distance between the feature vector of the query image and that of each target image in an image DB and retrieves a given number of the most similar target images.

V. EXPERIMENTAL RESULTS AND DISCUSSIONS

The Corel DB and the VisTex DB were used to evaluate the performance of the proposed retrieval method. The Corel DB is composed of 990 RGB color images with 192 × 128 pixels, chosen from Corel Draw Photo [17] and divided into 11 classes, each with 90 images. The VisTex DB is composed of 1200 RGB color images with 128 × 128 pixels, divided into 75 classes. Each class contains 16 images coming from their mother image, which is one of 75 images with 512 × 512 pixels chosen from the VisTex collection of the MIT Media Lab [18]. Table I lists the classes of the Corel and the VisTex DB. Fig. 6 shows some RGB images sampled from the Corel and the VisTex DB.

The similarity measure is for computing the similarity between a feature vector of a query image and a feature vector of a target image. We use a Mahalanobis distance [21] as a similarity measure, which is defined as

$$D(\mathbf{x}, \mathbf{y}) = \left[ \sum_{k=1}^{K} \left| \frac{x_{k} - y_{k}}{\sigma(k)} \right|^{r} \right]^{1/r} \tag{11}$$

where $x_{k}$ and $y_{k}$ denote the $k$th components of a query feature vector $\mathbf{x}$ and a target feature vector $\mathbf{y}$, respectively, and $r$ and $K$ denote a metric order and the dimension of a feature vector.
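The retrieval pipeline of Section IV can be sketched end to end: the three-step block classification, the moment feature vector of (8)–(10), and the normalized distance of (11), all for one color component. This is our own illustrative code; the zero padding for empty classes and the default metric order r = 1 are our assumptions, not statements from the paper.

```python
import math

def classify(bdips):
    """Split blocks into 8 classes by three successive mean-threshold steps;
    each group is split by the average BDIP over its own members."""
    idx = [0] * len(bdips)                      # start: a single group
    for _ in range(3):                          # three splits -> 8 classes
        means = {}
        for g in set(idx):
            members = [bdips[i] for i, gi in enumerate(idx) if gi == g]
            means[g] = sum(members) / len(members)
        idx = [2 * gi + (1 if bdips[i] >= means[gi] else 0)
               for i, gi in enumerate(idx)]
    return idx

def moments(vals):
    """First moment (mean) and second moment (standard deviation)."""
    mu = sum(vals) / len(vals)
    return mu, math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))

def feature_vector(bdips, bvlcs):
    """Concatenate per-class BDIP moments, then BVLC moments, as in (8)-(10).
    Empty classes are padded with zeros (our choice)."""
    idx = classify(bdips)
    feat = []
    for vals in (bdips, bvlcs):
        for g in range(8):
            sel = [vals[i] for i, gi in enumerate(idx) if gi == g]
            feat.extend(moments(sel) if sel else (0.0, 0.0))
    return feat                                 # 2 features x 8 classes x 2 moments

def distance(x, y, sigma, r=1):
    """Minkowski distance of order r with per-component normalization
    by the standard deviation over the feature DB, as in (11)."""
    s = sum((abs(xk - yk) / sk) ** r for xk, yk, sk in zip(x, y, sigma))
    return s ** (1.0 / r)
```

For one color component the vector has 8 classes × 2 features × 2 moments = 32 entries; retrieval then ranks target images by `distance` against the query's vector.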


Fig. 6. RGB images sampled from (a) Corel DB and (b) VisTex DB.

TABLE I CLASSES OF COREL DB AND VISTEX DB

TABLE II ESTIMATED COMPUTATIONAL COMPLEXITY OF RETRIEVAL METHODS

where $\sigma(k)$ denotes the standard deviation of the $k$th component over the feature vectors in a feature DB.

As measures of retrieval performance, we use precision and recall [22]. Suppose a query $q$, a set $R(q)$ of images in the DB relevant to the query $q$, and a set $A(q)$ of retrieved images are given. Recall and precision are then given as

$$\mathrm{Recall}(q) = \frac{|A(q) \cap R(q)|}{|R(q)|} \tag{12}$$

$$\mathrm{Precision}(q) = \frac{|A(q) \cap R(q)|}{|A(q)|} \tag{13}$$

where the operator $|\cdot|$ returns the size of a set. Therefore, the recall represents the ratio of the number of images in retrieved images relevant to the query to the number of images in the DB relevant to the query. The precision represents the ratio of the

*A: the number of additions; M: the number of multiplications; H × V: the size of query and target images; D: the number of images in a DB.

number of images in retrieved images relevant to the query to the number of retrieved images.

In our experiments, each image in a test DB was chosen as a query and the others in the DB became target images for that query. As a result, the performance of a retrieval method was ascertained by averaging retrieval performances over all the queries. We also adopted the ANMRR (average normalized modified retrieval rank) [23] used in all of the MPEG-7 color core experiments. Note that a lower ANMRR value means more accurate retrieval performance.

The block size for the BDIP and BVLC was chosen as 2 × 2, which represents local properties in great detail and which actually yielded the best performance in the test DBs. In the wavelet moments, Daubechies biorthogonal 9/7 tap filters


Fig. 7. Precision performance of the retrieval methods using the proposed features and the method using PIM according to feature dimension. (a) Corel DB. (b) VisTex DB.

[24] were chosen for wavelet decomposition, and the means and standard deviations for 12 subbands, including three LL subbands, were calculated as wavelet moments. The block sizes for the EHD and for the PIM were determined as 4 × 4 and 8 × 8, respectively, because other sizes yielded inferior performances in the test DBs. For a simple implementation of similarity measurement, the same metric order was chosen for all the methods.

To evaluate the computational complexity of the proposed method, we compare the number of additions and the number of multiplications for the proposed method with those for the histogram method. Let the size of query and target images be $H \times V$. The extraction of the BDIP in (5) requires additions and a multiplication per block. The extraction of the BVLC in (6) and (7) needs additions and multiplications per block. The block classification by three steps requires additions per color component image approximately, and the calculation of the feature vector in (8) needs additions and multiplications per color component image. Since the block size of the proposed features chosen is 2 × 2, the proposed method needs about additions and multiplications per color image. On the other hand, the calculation of the histograms requires only additions per color image. Let $D$ denote the number of images in a DB.

Fig. 8. Precision versus recall performance of retrieval methods. (a) Corel DB. (b) VisTex DB.

For the chosen metric order, the similarity computation in (11) requires additions and multiplications per query image. Therefore, the similarity computation for the proposed feature vector needs additions and multiplications, and that for the histogram feature needs additions and multiplications. For the VisTex DB, in which the ratio is about 13.7, we finally see that the proposed method requires about 3.4 times the number of additions and about 8.3 times the number of multiplications of the histogram-based method. The complexity of the EHD, CSD, and wavelet moments can be estimated in similar ways and is summarized in Table II. In the estimation of the complexity of the wavelet moments, we used the fact that no filtering is needed for pixels excluded by subsampling, which reduces the number of additions by half, and the symmetry property of the filters, which further reduces the number of multiplications by about half.

Fig. 7 shows the precision performance of the retrieval method using the proposed features and the method using the PIM moments according to feature dimension. The method using the combination of BDIP and BVLC moments yielded 5% average performance gain in the Corel DB and about 10% in the VisTex DB over the method using only BDIP moments. The method also yielded 10% in the Corel DB and 12% in the VisTex DB over the method using only BVLC moments. These results underscore the effectiveness of combining BDIP and BVLC moments. The proposed method also yielded 20%


TABLE III DIMENSION OF RETRIEVAL METHODS

TABLE IV ANMRR PERFORMANCE OF RETRIEVAL METHODS
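The precision and recall of (12)–(13) and the ANMRR reported in Table IV can be sketched as follows. This is our own code, not the authors'; the rank penalty 1.25·K and the window K = min(4·NG, 2·GTM), where GTM is the largest ground-truth size over the queries, follow the standard MPEG-7 definition of the measure.

```python
def precision_recall(retrieved, relevant):
    """Precision (13) and recall (12) for one query."""
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved), hits / len(relevant)

def anmrr(results):
    """MPEG-7 ANMRR. `results` is a list of (ranked retrieved list,
    ground-truth set), one pair per query; 0 is best, 1 is worst."""
    gtm = max(len(rel) for _, rel in results)
    nmrrs = []
    for ranked, rel in results:
        ng = len(rel)
        k = min(4 * ng, 2 * gtm)
        ranks = []
        for item in rel:
            # Ranks count from 1; items missed or found beyond K are
            # penalized with the constant rank 1.25 * K.
            r = ranked.index(item) + 1 if item in ranked else k + 1
            ranks.append(r if r <= k else 1.25 * k)
        avr = sum(ranks) / ng                     # average rank
        mrr = avr - 0.5 - 0.5 * ng                # modified retrieval rank
        nmrrs.append(mrr / (1.25 * k - 0.5 * (1 + ng)))
    return sum(nmrrs) / len(nmrrs)
```

A perfect retrieval (all relevant images ranked first) gives ANMRR 0, and retrieving none of them gives 1, which is why the 0.13 and 0.04 gains in Table IV directly indicate more accurate retrieval.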

average performance gain in the Corel DB and 40% in the VisTex DB over the PIM moments.

Fig. 8 shows the precision versus recall performance of the combination of BDIP and BVLC moments, histogram, wavelet moments, EHD, and CSD. In these figures, the numbers attached to the curves represent the numbers of retrieved images. The dimensions of all the methods compared here are given in Table III. The proposed method yielded 8% average performance gain in the Corel DB and 7% in the VisTex DB over the wavelet moments and produced 15% average performance gain in the Corel DB and 4% in the VisTex DB over the CSD. Also, the proposed method showed 15% average performance gain in both of the two DBs over the histogram and the EHD.

Table IV shows the ANMRR performance of the retrieval methods. The proposed method yielded at least 0.13 performance gain in the Corel DB and at least 0.04 in the VisTex DB over the other methods.

Finally, the proposed method does not consider scale invariance of the texture features with respect to real-world data. This problem requires further work.

REFERENCES

[1] Y. Rui and T. S. Huang, “Image retrieval: Current techniques, promising directions, and open issues,” J. Vis. Commun. Image Repres., vol. 10, pp. 39–62, Oct. 1999.
[2] M. J. Swain and D. H. Ballard, “Color indexing,” Int. J. Comput. Vis., vol. 7, pp. 11–32, 1991.
[3] “ISO/IEC 15938-3/FDIS Information Technology—Multimedia Content Description Interface—Part 3: Visual,” ISO/IEC JTC1/SC29/WG11, Doc. N4358, 2001.
[4] B. M. Mehtre, M. Kankanhalli, and W. F. Lee, “Shape measures for content based image retrieval: A comparison,” Inf. Process. Manag., vol. 33, pp. 319–337, 1997.


[5] R. M. Haralick, K. Shanmugam, and I. Dinstein, “Texture features for image classification,” IEEE Trans. Syst. Man Cybern., vol. SMC-8, pp. 610–621, Nov. 1973. [6] H. Tamura, S. Mori, and T. Yamawaki, “Texture features corresponding to visual perception,” IEEE Trans. Syst. Man Cybern., vol. 8, pp. 460–473, June 1978. [7] A. Laine and J. Fan, “Texture classification by wavelet packet signatures,” IEEE Trans. Pattern Anal. Machine Intell., vol. 15, pp. 1186–1191, Nov. 1993. [8] J. R. Smith and S.-F. Chang, “Transform features for texture classification and discrimination in large image databases,” Proc. IEEE Int. Conf. Image Processing, vol. 3, pp. 407–411, Nov. 1994. [9] A. Kundu and J.-L. Chen, “Texture classification using qmf bank-based subband decomposition,” CVGIP: Graph. Models and Image Processing, vol. 54, pp. 369–384, 1992. [10] K. S. Thyagarajan, T. Nguyen, and C. Persons, “A maximum likelihood approach to texture classification using wavelet transform,” Proc. IEEE Int. Conf. Image Processing, vol. 2, pp. 640–644, Nov. 1994. [11] B. S. Manjunath and W. Y. Ma, “Texture features for browsing and retrieval of image data,” IEEE Trans. Pattern Anal. Machine Intell., vol. 18, pp. 837–841, Aug. 1996. [12] S.-K. Chang, Principles of Pictorial Information Systems Design. Upper Saddle River, NJ: Prentice-Hall, 1989. [13] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker, “Query by image and video content: The QBIC system,” IEEE Computer, vol. 28, pp. 23–32, Sept. 1995. [14] A. Gupta and R. Jain, “Visual information retrieval,” Comm. ACM, vol. 40, pp. 71–79, May 1997. [15] A. Pentland, R. W. Picard, and S. Sclaroff, “Photobook: Content-based manipulation of image databases,” Int. J. Comput. Vis., vol. 18, pp. 233–254, June 1996. [16] C. Carson, M. Thomas, S. Belongie, J. M. Hellerstein, and J. Malik, “Blobworld: A system for region-based image indexing and retrieval,” in Proc. 
Int. Conf. Vision Information System, June 1999, pp. 509–516. [17] Corel Draw Photo DB [Online]. Available: http://dlp.cs. berkeley.edu/ photos/corel/ [18] VisTex Texture Database. [Online]. Available: http://www-white. media. mit.edu./vismod/imagery/VisionTexture/VisTex.html [19] Y. J. Ryoo and N. C. Kim, “Valley operator extracting sketch features: DIP,” Electron. Lett., vol. 248, pp. 461–463, Apr. 1988. [20] S. Y. Seo, C. W. Lim, Y. D. Chun, and N. C. Kim, “Extraction of texture regions using region-based correlation,” in Proc. SPIE VCIP2001, vol. 4315, San Jose, CA, Jan. 2001, pp. 694–701. [21] W. Y. Ma and B. S. Manjunath, “A comparison of wavelet transform feature for texture image annotation,” Proc. IEEE Int. Conf. Image Processing, vol. 2, pp. 256–259, Oct. 1995. [22] S. F. Chang, W. C. Horace, J. Meng, H. Sundaram, and D. Zhong, “A fully automated content-based video search engine supporting spatiotemporal queries,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, pp. 602–615, Sept. 1998. [23] P. Ndjiki-Nya, J. Restat, T. Meiers, J.-R. Ohm, A. Seyferth, and R. Sniehotta, “Subjective evaluation of the MPEG-7 retrieval accuracy measure (ANMRR),” in ISO/WG11 MPEG Meeting, Geneva, Switzerland, May 2000, Doc. M6029. [24] I. Daubechies, “The wavelet transform, time-frequency localization and signal analysis,” IEEE Trans. Inform. Theory, vol. 36, pp. 961–1005, Sept. 1990.