Sensibility-Aware Image Retrieval Using Computationally Learned Bases: RIM, JPG, J2K, and Their Mixtures

Takatoshi Kato¹, Shun'ichi Honma¹,², Yasuo Matsuyama¹, Tetsuma Yoshino¹, and Yuuki Hoshino¹

¹ Department of Computer Science and Engineering, Waseda University, Tokyo 169-8555, Japan
² SONY, Tokyo 108-0075, Japan
{tkato,shunichi1029,yasuo,tetsuma-y,yu-ki hs}@wiz.cs.waseda.ac.jp
http://www.wiz.cs.waseda.ac.jp/index-e.html

Abstract. Sensibility-aware image retrieval methods are presented and their performances are compared. Three systems are discussed in this paper: a PCA/ICA-based method called RIM (Retrieval-aware IMage format), JPEG, and JPEG2000. In each case, the query is itself an image, and images similar to it are retrieved. According to a carefully designed set of opinion tests, the RIM method is judged the best in terms of both retrieval performance and response speed. Finally, an integrated retrieval system for networked image collections and databases containing RIM, JPEG and JPEG2000 images is built and evaluated. The source code of the RIM method is publicly available.

1 Introduction

Sensibility, or kansei, is something human users strongly demand as a lubricant for today's impersonal networked society. Many studies have aimed at realizing human-friendly tools, but straightforward strategies usually lead to long turnaround times because of their ad hoc nature. With this in mind, we address human-aware image retrieval based on computational intelligence methods, especially "learning from data." All images are handled in their compressed domains. The problems addressed are as follows:
(1) To present image compression methods based on PCA (Principal Component Analysis) and ICA (Independent Component Analysis).
(2) To show a coding system that uses such learned bases. This format is called RIM (Retrieval-aware IMage format).
(3) To compare the RIM method with JPEG and JPEG2000. The results obtained can be previewed as follows.
(a) For data compression, the performance order is JPEG2000 > RIM > JPEG.

M. Köppen et al. (Eds.): ICONIP 2008, Part I, LNCS 5506, pp. 620–627, 2009. © Springer-Verlag Berlin Heidelberg 2009


(b) For similar-image retrieval, the performance order is RIM > JPEG > JPEG2000.
(c) Retrieval speed, under the performance constraint, is RIM ≫ JPEG ≫ JPEG2000.
(d) Considering items (a) ∼ (c), RIM, which is based on learning from data, is a viable method for the joint data compression and retrieval of images in a way that reflects human sensibility.
(4) A similar-image retrieval system for mixtures of {RIM, JPEG, JPEG2000} is realized and tested.
To help readers grasp the problem addressed, Figure 1 conveys the gist of this paper in advance. As the figure shows, a query to an image collection or a database is itself an image. There are three types of image representation: RIM, JPEG, and JPEG2000. The installed systems find similar images as shown in the illustration. Images marked by a circle are the correct ones according to the test subjects' opinions. The third image may or may not be judged similar, depending on the subject; perceived similarity thus differs from person to person. The RIM method will be shown to be the best from the viewpoint of joint retrieval performance and speed.

2 Image Compression Using Learned Bases

In this paper, joint image compression and retrieval by RIM uses bases learned from the source data. The steps are as follows: (1) sampling, (2) average separation, (3) average quantization, (4) entropy coding on the average, (5) PCA bases learning, (5') ICA bases learning, (6-1) bases normalization, (6-2) entropy coding on bases, (7-1) coefficients calculation, (7-2) coefficients quantization, (7-3) entropy coding on coefficients.

Fig. 1. Sensibility-aware image retrieval in three types of coded domains

Fig. 2. Image encoding using learned bases

Fig. 3. RIM (file header, info header, block header and compression info header, together with the average, bases and coefficients blocks)

Figure 2 illustrates the encoding flow based on the learned bases; the step numbers above match the block numbers in this figure. The encoded quantities must also be packed efficiently, which requires a well-defined format. Figure 3 illustrates such a format, called RIM [1]. Although this format contains headers, their overhead is absorbed within the margin of RIM's compression advantage over JPEG. Result (3)(a) of Section 1 was obtained in a preliminary experiment for this paper.
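The PCA branch of the encoding flow can be illustrated with a short NumPy sketch. It is a minimal sketch under stated assumptions, not the authors' RIM encoder: it works on a single grayscale channel, samples non-overlapping 8×8 patches, treats the "average" as each patch's mean, and uses a uniform scalar quantizer. The function names `learn_pca_bases` and `quantize` and their default sizes are hypothetical. The ICA branch (5'), bases quantization, and all entropy coding are omitted.

```python
import numpy as np


def learn_pca_bases(image, patch=8, n_bases=16):
    """Hypothetical sketch of steps (1), (2), (5), (6-1) and (7-1) for one channel."""
    h, w = image.shape
    # (1) sampling: cut the image into non-overlapping patch x patch blocks
    #     and flatten each block into a vector (edge remainders are dropped)
    blocks = (image[: h - h % patch, : w - w % patch]
              .reshape(h // patch, patch, w // patch, patch)
              .swapaxes(1, 2)
              .reshape(-1, patch * patch)
              .astype(np.float64))
    # (2) average separation: remove each block's mean (assumed definition of "average")
    averages = blocks.mean(axis=1, keepdims=True)
    centered = blocks - averages
    # (5) PCA bases: eigenvectors of the sample covariance, largest eigenvalues first
    cov = centered.T @ centered / len(centered)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_bases]
    # (6-1) eigh already returns unit-norm eigenvectors, so the bases are normalized
    bases = eigvecs[:, order]
    # (7-1) coefficients: projection of every centered block onto the bases
    coeffs = centered @ bases
    return averages, bases, coeffs


def quantize(values, step):
    """(7-2) uniform scalar quantization (an assumption; RIM's quantizer may differ)."""
    return np.round(values / step).astype(np.int32)
```

Because the bases are orthonormal, each block is approximately `averages + coeffs @ bases.T`; keeping fewer bases yields a smaller file at some loss of fidelity.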

3 Similarity Measures for RIM, JPEG and JPEG2000

The goals of this paper can be restated as follows.
(a) Improving the retrieval performance of RIM.
(b) Comparing the three methods, RIM, JPEG and JPEG2000, on the joint performance of similar-image retrieval and its speed.
(c) Checking whether sensibility-aware image retrieval is possible in format-mixed environments.
All of this is done in the compressed domains. It is important to understand that the choice of similarity measures depends on each image format, because image format conversions are avoided during image retrieval. The defining characteristics of the three compression methods can be reviewed as follows.
RIM: There are two types, PCA-based and ICA-based. The format contains average colors, bases and coefficients, which can be decoded separately.
JPEG: Fixed DCT bases are used. There are DC and AC components.
JPEG2000: A fixed wavelet transform is used. Multiple subband images are encoded.

Table 1. Features extracted from each format

features    RIM              JPEG             JPEG2000
colors      average colors   DC components    lower frequencies
edges       average colors   DC components    lower frequencies
textures    bases            AC components    higher frequencies

Table 1 lists the information that can be extracted from the coded components. The quantities in this table are used in the various similarity definitions. Color and edge features can be used in common for RIM, JPEG and JPEG2000. The texture information, on the other hand, contributes to the similarity measure in a way that strongly depends on the individual method. The sub-similarity measures $S_{\text{color}}$, $S_{\text{edge}}$ and $S_{\text{texture}}$ are combined into the total similarity measure between two images:

$$S_{\text{total}} = b\,\{a\,S_{\text{color}} + (1-a)\,S_{\text{edge}}\} + (1-b)\,S_{\text{texture}}, \qquad (a, b) \in [0, 1]^2. \tag{1}$$
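Eq. (1) is a two-level weighted combination, and a minimal Python sketch of it follows. It assumes that each sub-similarity has already been normalized to a common scale, with larger values meaning more similar (a distance-type measure such as a K-L divergence would have to be converted first). The names `total_similarity` and `rank_images` and the dictionary input layout are hypothetical.

```python
def total_similarity(s_color, s_edge, s_texture, a, b):
    """Eq. (1): S_total = b * {a * S_color + (1 - a) * S_edge} + (1 - b) * S_texture."""
    if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
        raise ValueError("weights a and b must lie in [0, 1]")
    return b * (a * s_color + (1.0 - a) * s_edge) + (1.0 - b) * s_texture


def rank_images(sub_scores, a, b):
    """Rank database images by S_total, most similar first.

    sub_scores maps an image id to its (s_color, s_edge, s_texture) triple
    measured against the query (hypothetical input layout).
    """
    return sorted(sub_scores,
                  key=lambda img: total_similarity(*sub_scores[img], a, b),
                  reverse=True)
```

Setting b = 0 keeps only the texture term, and setting b = 1 with a = 1 keeps only color, so a single pair (a, b) moves between the individual sub-similarities and their mixtures.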

Details of the sub-similarity measures are as follows.
(1) Color similarity $S_{\text{color}}$: Two types of color similarity measures are tested: (a) the color structure descriptor (CSD) [2] and (b) the auto-color-correlogram (ACG) [3].
(2) Edge similarity $S_{\text{edge}}$: The edge histogram descriptor (EHD) of MPEG-7 [2] is used. This feature is newly added in this paper.
(3) Texture similarity $S_{\text{texture}}$: This measure depends on the image compression method, RIM, JPEG or JPEG2000.
(3-1) For RIM: Either the PCA or the ICA basis set is used [1]. The similarity computation uses a weighted sum of inner products of bases [4].
(3-2) For JPEG: The variance of the AC coefficients (AC), obtained by patch-wise zigzag scanning of the image [5], is used.
(3-3) For JPEG2000: (a) Variance of subband coefficients (VSC): the variance of the subband information stored in bins of color component × direction × resolution level [6]. (b) Wavelet correlogram (WCG): a correlogram on the higher-frequency subbands [7]. (c) Generalized Gaussian density distance (GGD): the higher-frequency coefficient distributions are estimated by generalized Gaussian densities, and distances are measured by the K-L divergence [8].
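Two of the measures listed above have compact definitions that can be sketched directly: the auto-color-correlogram of [3] and the closed-form K-L divergence between generalized Gaussian densities used in [8]. The sketch is illustrative only and is not the authors' implementation. It assumes the image has already been color-quantized into non-negative integer labels, uses the L∞ (chessboard) pixel distance, and takes the per-subband GGD parameters (α, β) as given instead of estimating them. The function names and the default distance set are hypothetical.

```python
import math

import numpy as np


def auto_color_correlogram(labels, n_colors, distances=(1, 3, 5, 7)):
    """alpha_c(d) = Pr[color(p2) = c | color(p1) = c, ||p1 - p2||_inf = d]."""
    h, w = labels.shape
    feat = np.zeros((n_colors, len(distances)))
    for k, d in enumerate(distances):
        same = np.zeros(n_colors)
        total = np.zeros(n_colors)
        # every offset lying on the square ring of L-infinity radius d
        offsets = [(dy, dx) for dy in range(-d, d + 1) for dx in range(-d, d + 1)
                   if max(abs(dy), abs(dx)) == d]
        for dy, dx in offsets:
            # p1[y, x] and p2[y, x] are the pixel pair (y, x) and (y + dy, x + dx)
            p1 = labels[max(0, -dy): h - max(0, dy), max(0, -dx): w - max(0, dx)]
            p2 = labels[max(0, dy): h - max(0, -dy), max(0, dx): w - max(0, -dx)]
            total += np.bincount(p1.ravel(), minlength=n_colors)
            same += np.bincount(p1[p1 == p2], minlength=n_colors)
        # colors that never occur get probability 0 instead of a division by zero
        feat[:, k] = np.divide(same, total, out=np.zeros(n_colors), where=total > 0)
    return feat


def ggd_kl(alpha1, beta1, alpha2, beta2):
    """Closed-form KL divergence D(p1 || p2) between zero-mean generalized Gaussians
    p(x; alpha, beta) = beta / (2 alpha Gamma(1/beta)) * exp(-(|x| / alpha) ** beta)."""
    return (math.log(beta1 / beta2) + math.log(alpha2 / alpha1)
            + math.lgamma(1.0 / beta2) - math.lgamma(1.0 / beta1)
            + (alpha1 / alpha2) ** beta2
            * math.exp(math.lgamma((beta2 + 1.0) / beta1) - math.lgamma(1.0 / beta1))
            - 1.0 / beta1)


def ggd_distance(query_params, db_params):
    """Sum of per-subband KL divergences; each subband is an (alpha, beta) pair."""
    return sum(ggd_kl(a1, b1, a2, b2)
               for (a1, b1), (a2, b2) in zip(query_params, db_params))
```

The GGD distance is zero for identical parameters and grows as the subband statistics diverge, so it has to be mapped to a similarity (for example by a decreasing function) before it can enter Eq. (1).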

Fig. 4. Precision-recall curves for RIM via ICA (curves: RIM CSD + EHD + ICA and RIM ACG + EHD + ICA; horizontal axis: Recall; vertical axis: Precision)

4 Similar-Image Retrieval Performance

4.1 Opinion Test Design

Sensibility differs from person to person, and so does human judgment of image similarity, as Figure 1 illustrates. A set of well-designed opinion tests is therefore essential for choosing a viable similar-image retrieval system. The database for the opinion test was generated as follows. A ground truth of 5,200 images was built from a 20,000-image set of 52 categories in 5 themes [9]. Then, 100 query images (20 images from each theme) were chosen at random. Each of the 20 subjects (opinion test participants) examined all 5,200 images for a given query. For each query, 5 ∼ 20 similar images were manually selected in advance as the correct ones. The similarity judgment therefore depends strongly on human sensibility. On average, the correct images make up only 0.2% of the whole ground truth.

4.2 Compatibility with Human Sensibility

Retrieval performance was measured by precision-recall curves. We tested all combinations of the sub-similarity measures for {RIM, JPEG, JPEG2000} given in Section 3. This was a very demanding task for both the human subjects and the machines. Due to space limitations, we give precision-recall curves only for RIM via ICA+EHD+{CSD, ACG}, in Figure 4. Here, Recall = 1 would mean that all similar images are hit regardless of the subjects' sensibility. This cannot occur, since the sensibility, or the similarity judgment, depends strongly on the individual. Therefore, 0 ≤ Recall ≤ 0.2 is the range of interest. It should also be emphasized that Precision ≥ 0.5 is a high-performance region, because of the variety of the subjects' judgments.
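Precision and recall at each cut-off of a ranked result list can be computed as in the minimal sketch below, assuming the ground truth for a query is a set of image identifiers. The helper `mean_precision`, which averages precision over the points whose recall lies in a given range such as the range of interest 0 ≤ Recall ≤ 0.2, is hypothetical; it is not necessarily how the averages reported for Figure 5 were computed.

```python
def precision_recall(ranked_ids, relevant_ids):
    """Return (precision, recall) after each position of a ranked result list."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("the ground truth for a query must be non-empty")
    hits = 0
    curve = []
    for k, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant:
            hits += 1
        # precision = hits among the top k; recall = fraction of the ground truth found
        curve.append((hits / k, hits / len(relevant)))
    return curve


def mean_precision(curve, recall_lo=0.0, recall_hi=1.0):
    """Average precision over curve points whose recall lies in [recall_lo, recall_hi]."""
    selected = [p for p, r in curve if recall_lo <= r <= recall_hi]
    return sum(selected) / len(selected) if selected else 0.0
```

For a query whose ground truth holds about 10 images, the first relevant hit already gives a recall of 0.1, so the range 0 ≤ Recall ≤ 0.2 covers only the first one or two hits.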


Fig. 5. Image retrieval performances for RIM, JPEG and JPEG2000

Instead of giving all combinations of sub-similarity measures for RIM, JPEG and JPEG2000, we show representative and average performances for all formats in Figure 5. The average values may seem low, but they are computed over the whole recall range (see Figure 4). Figure 5 shows that RIM by ICA is the best and that JPEG by AC follows closely. JPEG2000 by WGG at level 3 and RIM by PCA follow in third place. Therefore, we can claim that the RIM method is the most creditable in terms of compatibility with human sensibility. RIM has a further merit: speed, which is a crucial factor for retrieval. This is shown in the next subsection.

4.3 Joint Performance of the Retrieval and Speed

Speed is as important a factor as retrieval correctness. With this in mind, we computed the joint performance of average precision and speed. Figure 6 illustrates the result. In this figure, "XXX (YYY)" means that the method is "XXX" (J2K stands for JPEG2000) and the applied sub-similarity is "YYY." The marks {diamond, square, triangle} stand for the similarity measures {color, edge, texture}. Each point is placed at the parameter setting that gives the best retrieval result. From this set of experiments, one finds the following.
(a) RIM is fast. Its color similarity is especially good from the joint viewpoint of retrieval and speed.
(b) JPEG shows creditable retrieval performance for each sub-similarity measure, but its retrieval takes 3∼4 times longer than that of RIM.
(c) JPEG2000 shows creditable retrieval performance for the color and texture sub-similarity measures, but its color similarity computation takes 3∼4 times longer than that of RIM, and its texture similarity computation takes 10∼20 times longer.


Fig. 6. Joint retrieval and speed performance

Fig. 7. Conventional all-in-one system

(d) Considering the joint performance of human sensibility awareness in similar-image retrieval and its speed, RIM is judged to outperform JPEG and JPEG2000.

5 Concluding Remarks

In this paper, similar-image retrieval systems were built and tested for compatibility with human sensibility. Three formats were tested: RIM using learned PCA/ICA bases, JPEG using DCT bases, and JPEG2000 using wavelets. RIM was shown to be the best from the viewpoints of retrieval performance and speed. Since we built similar-image retrieval systems for each of {RIM, JPEG, JPEG2000}, it has become possible to realize an all-in-one system that works on mixtures of {RIM, JPEG, JPEG2000} images. As Figure 6 shows, however, full integration of {RIM, JPEG, JPEG2000} suffers in retrieval speed, even when the opinion-test results are considered together. Therefore, we built a conventional all-in-one system for {RIM(ACG), JPEG(DC), JPEG2000(L5)}. Figure 7 illustrates the result for {RIM, JPEG, JPEG2000} mixed in equal proportions. This is another viable system besides the RIM format of Figure 3, since no format conversion among {RIM, JPEG, JPEG2000} is needed. The updated version of the full RIM method, together with its viewer Wisvi (Waseda image search viewer), is publicly available and can be downloaded from the authors' web site given on the first page of this paper.

References

1. Katsumata, N., Matsuyama, Y., Chikagawa, T., Ohashi, F., Horiike, F., Honma, S., Nakamura, T.: Retrieval-aware image compression, its format and viewer based upon learned bases. In: King, I., Wang, J., Chan, L.-W., Wang, D. (eds.) ICONIP 2006. LNCS, vol. 4233, pp. 420–429. Springer, Heidelberg (2006)
2. Martínez, J.M. (ed.): ISO/IEC JTC1/SC29/WG11 Coding of moving pictures and audio: MPEG-7 overview, N6828 (2004)
3. Huang, J., Kumar, S.R., Mitra, M., Zhu, W.-J., Zabih, R.: Image indexing using color correlograms. In: Proc. IEEE Comp. Soc. Conference on Computer Vision and Pattern Recognition, pp. 762–768 (1997)
4. Katsumata, N., Matsuyama, Y.: Database retrieval for similar images using ICA and PCA bases. Engineering Applications of Artificial Intelligence 18, 705–717 (2005)
5. Mandal, M.K., Idris, F., Panchanathan, S.: A critical evaluation of image and video indexing techniques in the compressed domain. Image and Vision Computing 17, 513–529 (1999)
6. Mallat, S.G.: A theory of multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Anal. and Machine Intelligence 11, 674–693 (1989)
7. Abrishami Moghaddam, H., Taghizadeh Khajoie, T., Rouhi, A.H.: Wavelet correlogram: A new approach for image indexing and retrieval. Pattern Recognition 38, 2506–2518 (2005)
8. Do, M.N., Vetterli, M.: Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler divergence. IEEE Trans. Image Processing 11, 146–158 (2002)
9. Corel Stock Photography 1 (1993)