
6th International Symposium on Telecommunications (IST'2012)

A Gradient Descent based Similarity Refinement Method for CBIR Systems

Esmat Rashedi, Hossein Nezamabadi-pour, Saeid Saryazdi
Department of Electrical Engineering, Shahid Bahonar University of Kerman, Kerman, Iran
[email protected]

Abstract—This paper presents a short term learning method for CBIR systems based on similarity refinement. The weights of the similarity function are optimized using the gradient descent method to improve the results of a retrieval session. In the proposed approach, the weights of the feature components as well as the weights of each type of feature are adjusted. A proper error function is introduced and minimized using the gradient descent method. The method is examined on a public dataset of 20000 color images. The experimental results on 60 topic images and a comparison with a state-of-the-art method confirm the effectiveness of the proposed method.

Keywords—Content based image retrieval; Relevance feedback; Short term learning; Similarity function; Gradient descent optimization.

I. INTRODUCTION

Content based image retrieval (CBIR) is a major challenge in the field of pattern recognition and is employed in diverse areas such as entertainment, art, fashion design, advertising, history, medicine, and industry. CBIR systems use the visual contents of images, such as color, texture, and shape features, to represent and index images [1-3]. Using these features, images are represented as vectors in the feature space. To retrieve relevant images, a nearest-neighbour method is used to find the images most similar to a query image in terms of the visual features. The features extracted from images are low level features and cannot completely model the user's semantic intent. Therefore, there is a semantic gap between the low level features and the high level concepts that represent the semantics of the image [3,4]. CBIR systems attempt to bridge the semantic gap with learning-enhanced relevance feedback (RF) methods, of which short term learning (STL) is one. STL is an intra-query learning method that tries to learn the relevant images of each query through interaction between the user and the system during several relevance feedback rounds. Through STL, the CBIR system learns which images are relevant to the user's query. The main strategies for STL are classification based methods, similarity refinement methods, query refinement methods, and multi-query methods.


In classification based methods, a classifier is designed to separate relevant and irrelevant images [5], while in similarity refinement methods the similarity function is improved, often by feature reweighting [6,7]. In query refinement methods, the query vector is modified so that it moves closer to the relevant images and away from the irrelevant images [8,9], and in multi-query methods additional query vectors are introduced, often by clustering [10,11].

In this paper, a similarity refinement STL method based on gradient descent is proposed. In the proposed method, a proper error function is defined and the weights of the similarity function are learned by the gradient descent method so as to minimize the error function.

This paper is organized as follows. Related works are reviewed in Section II and the proposed method is explained in Section III. Results are given in Section IV and the paper is concluded in Section V.

II. RELATED WORKS

In each retrieval session, the user interacts with the system and the system learns his/her desired semantics by STL. Comprehensive surveys on recently developed methods and tools in CBIR were reported in [12,13], and some RF approaches for CBIR were reviewed in [14]. In similarity refinement, the weights of the similarity function are adjusted during the feedback rounds. Some methods in this category compute the weight of each feature component as the inverse of the standard deviation of the feature's values on the relevant examples [4]. Several methods in this category define a suitable error function and learn the weights of the similarity function by a gradient descent algorithm that minimizes the error function [15]. The use of the gradient descent method to improve classification results was reported in [16]. In [16], the classification margin was defined as the difference between the weight assigned to the correct label and the maximal weight assigned to any single incorrect label. Thus, the Leaving-One-Out Nearest Neighbour (LOO-NN) error estimate is defined by Eq. (1). To learn the weights and minimize this function, gradient descent optimization is used. In [16], Eq. (2) is used to calculate the distance between two feature vectors.


J = \sum_{Q \in T} \mathrm{step}\left( \frac{d(Q, F^{+} \mid W)}{d(Q, F^{-} \mid W)} \right)    (1)

d^{2}(Q, F \mid W) = \sum_{j=1}^{m} w_{j}^{2} \left( q_{j} - f_{j} \right)^{2}    (2)

where w_j is a weight associated with the j-th component of the vectors Q and F, f_j is the j-th component of the feature vector F with size m, and T is the training set. F^+ and F^- are the same-class and different-class nearest neighbours of Q, and step is the step function.

Based on the idea of [16], the authors in [15] applied the method of minimizing the Leaving-One-Out classification error to image retrieval in order to learn weighted distances. In that work, a weighted version of the L1 distance was used (Eq. (3)), and to learn the weights w of the distance function, the feedback images were considered as training images for the nearest neighbour method. To improve the performance of the CBIR system, the weights were learned such that the distances among the positively marked images were minimized, whereas the distances between relevant and irrelevant images were maximized. In total, the term of Eq. (4) was minimized using the gradient descent method with respect to w in the distance function d, where Q^+ is the set of relevant images and Q^- is the set of irrelevant images.

d(Q, F \mid W) = \sum_{j=1}^{m} w_{j} \, \left| q_{j} - f_{j} \right|    (3)

J = \sum_{Q \in Q^{+}} \; \sum_{\substack{F^{+} \in Q^{+} \\ F^{-} \in Q^{-}}} \frac{d(Q, F^{+} \mid W)}{d(Q, F^{-} \mid W)}    (4)

III. THE PROPOSED STL METHOD

Figure 1. CBIR system with similarity refinement STL method.

A typical CBIR system with a similarity refinement STL method is depicted in Fig. 1. As can be seen from the figure, in each query session the query image is processed and similar images are found by the similarity function. Using a similarity criterion, the feature vector of the query image is compared with the vectors in the feature database, and the images most similar to the query are returned to the user. The weights of the similarity function are initialized to predefined values and are adjusted by the STL method. In this article, the similarity function between images is defined as follows:

d^{2}(Q, F \mid W) = w_{1,0}^{2} \sum_{j=1}^{L_{1}} w_{1,j}^{2} \left( \frac{q_{1,j} - f_{1,j}}{q_{1,j} + f_{1,j}} \right)^{2} + \ldots + w_{K,0}^{2} \sum_{j=1}^{L_{K}} w_{K,j}^{2} \left( \frac{q_{K,j} - f_{K,j}}{q_{K,j} + f_{K,j}} \right)^{2}    (5)

where F and Q are the feature vectors of two images; d is the similarity measure between the two images; f_{k,j} is the j-th component of the k-th type of feature (color, edge, or texture) extracted from image F; L_k represents the length of the vector of the k-th type of feature; w_{k,0} is the weight of the k-th type of feature and w_{k,j}, j \neq 0, is the weight of the j-th component of the k-th type of feature; and K is the total number of feature types.
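To make the weighted similarity of Eq. (5) concrete, the following is a minimal NumPy sketch of d^2(Q,F|W); the function and variable names (squared_distance, w_type, w_comp) are our own illustration rather than the authors' code, and a small eps term is added to the denominators to avoid division by zero, which the paper does not discuss.

import numpy as np

def squared_distance(query_parts, image_parts, w_type, w_comp, eps=1e-12):
    # Eq. (5): query_parts and image_parts are lists of K feature vectors
    # (one per feature type); w_type holds the K type weights w_{k,0};
    # w_comp is a list of K arrays with the component weights w_{k,j}, j != 0.
    d2 = 0.0
    for k in range(len(query_parts)):
        q = np.asarray(query_parts[k], dtype=float)
        f = np.asarray(image_parts[k], dtype=float)
        ratio = (q - f) / (q + f + eps)  # (q_{k,j} - f_{k,j}) / (q_{k,j} + f_{k,j})
        d2 += w_type[k] ** 2 * np.sum(w_comp[k] ** 2 * ratio ** 2)
    return d2

With the MPEG-7 features of Section IV, query_parts would contain the 256-bin SCD, 80-bin EHD, and 62-component HTD vectors of an image.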

In the proposed method, the weights of the feature components (w_{k,j}, j ≠ 0) and of the feature types (w_{k,0}) are optimized using the gradient descent method. The error function is defined as Eq. (6). In this equation, F^+ is the farthest relevant retrieved image from the query and F^- is the closest irrelevant retrieved image to the query. In each round, the query is updated by averaging the feature vectors of all relevant images using Eq. (7). The error function of Eq. (6) is related to the theory of maximal margin [16]. By minimizing this error function, the weights of the similarity function are optimized to increase the distance between the query and the most similar irrelevant retrieved image and to decrease the distance between the query and the least similar relevant retrieved image. Compared with the error function given by Eq. (4), the proposed error function decreases the computational complexity by considering only a pair of images labelled by the user through RF.

J_{R-N} = \frac{d(Q, F^{+} \mid W)}{d(Q, F^{-} \mid W)}    (6)
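As an illustration of how Eq. (6) could be evaluated in a feedback round, the sketch below picks F+ as the farthest relevant retrieved image and F- as the closest irrelevant one; it reuses the hypothetical squared_distance function from the sketch after Eq. (5) and reflects our reading of the text, not the authors' implementation.

import numpy as np  # squared_distance from the sketch after Eq. (5) is assumed in scope

def j_rn(query, relevant, irrelevant, w_type, w_comp):
    # Squared distances from the query to all user-labelled retrieved images.
    d_rel = [squared_distance(query, f, w_type, w_comp) for f in relevant]
    d_irr = [squared_distance(query, f, w_type, w_comp) for f in irrelevant]
    f_plus = relevant[int(np.argmax(d_rel))]     # farthest relevant image F+
    f_minus = irrelevant[int(np.argmin(d_irr))]  # closest irrelevant image F-
    # Eq. (6): ratio of the (non-squared) distances d(Q,F+|W) / d(Q,F-|W).
    return np.sqrt(max(d_rel) / min(d_irr)), f_plus, f_minus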

The minimization of J_{R-N} by gradient descent is an iterative procedure in which, at each step t, the weights w_{k,j} are moved in the negative direction of the gradient of J_{R-N}. At the beginning of each session, the weight values are initialized as in Eqs. (8) and (9). After each feedback round, the weights are updated using Eq. (10) to obtain better results in the next rounds. The details of the calculation of \partial J_{R-N} / \partial w_{k,j} are given in Eqs. (11) to (14). After each step, the weights are normalized using Eqs. (15) and (16). The values of \mu_j are learning rates. The results of the proposed method are given in the next section.

Q = \frac{1}{|Q^{+}|} \sum_{Q' \in Q^{+}} Q'    (7)

w_{k,0}(0) = \frac{1}{K}, \quad k = 1, \ldots, K    (8)

w_{k,j}(0) = \frac{1}{L_{k}}, \quad k = 1, \ldots, K, \; j = 1, \ldots, L_{k}    (9)

w_{k,j} = w_{k,j} - \mu_{j} \frac{\partial J}{\partial w_{k,j}}, \quad k = 1, \ldots, K, \; j = 0, \ldots, L_{k}    (10)

w_{k,j} = w_{k,j} - \mu_{j} \times w_{k,j} \times r \times \left[ R_{k,j}^{+} - R_{k,j}^{-} \right]    (11)

r = \frac{d(Q, F^{+} \mid W)}{d(Q, F^{-} \mid W)}    (12)

R_{k,j}^{\sim} = \frac{ w_{k,0}^{2} \left( \frac{q_{k,j} - f_{k,j}^{\sim}}{q_{k,j} + f_{k,j}^{\sim}} \right)^{2} }{ d^{2}(Q, F^{\sim} \mid W) }, \quad \sim \in \{+,-\}, \; k = 1, \ldots, K, \; j = 1, \ldots, L_{k}    (13)

R_{k,0}^{\sim} = \frac{ \sum_{j=1}^{L_{k}} w_{k,j}^{2} \left( \frac{q_{k,j} - f_{k,j}^{\sim}}{q_{k,j} + f_{k,j}^{\sim}} \right)^{2} }{ d^{2}(Q, F^{\sim} \mid W) }, \quad \sim \in \{+,-\}, \; k = 1, \ldots, K    (14)

w_{k,j} = \frac{w_{k,j}}{\sum_{i=1}^{L_{k}} w_{k,i}}, \quad k = 1, \ldots, K, \; j = 1, \ldots, L_{k}    (15)

w_{k,0} = \frac{w_{k,0}}{\sum_{i=1}^{K} w_{i,0}}, \quad k = 1, \ldots, K    (16)
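The update of Eqs. (7)-(16) can be summarized in a single feedback-round step. The sketch below is our reading of those equations, again reusing the hypothetical squared_distance function; the learning-rate defaults follow the values reported in Section IV, and the eps terms are our own guard against division by zero.

import numpy as np  # squared_distance from the sketch after Eq. (5) is assumed in scope

def init_weights(lengths):
    # Eqs. (8)-(9): uniform initialization of type and component weights.
    K = len(lengths)
    return np.full(K, 1.0 / K), [np.full(L, 1.0 / L) for L in lengths]

def update_query(relevant):
    # Eq. (7): the new query is the mean of the relevant images' feature vectors.
    return [np.mean([img[k] for img in relevant], axis=0)
            for k in range(len(relevant[0]))]

def gradient_step(query, f_plus, f_minus, w_type, w_comp,
                  mu0=0.0001, mu=0.001, eps=1e-12):
    d2_plus = squared_distance(query, f_plus, w_type, w_comp)
    d2_minus = squared_distance(query, f_minus, w_type, w_comp)
    r = np.sqrt(d2_plus / (d2_minus + eps))          # Eq. (12)
    new_type, new_comp = w_type.copy(), [w.copy() for w in w_comp]
    for k in range(len(w_comp)):
        q = np.asarray(query[k], dtype=float)
        fp = np.asarray(f_plus[k], dtype=float)
        fm = np.asarray(f_minus[k], dtype=float)
        chi_p = ((q - fp) / (q + fp + eps)) ** 2
        chi_m = ((q - fm) / (q + fm + eps)) ** 2
        # Eq. (13): component-level terms R~_{k,j}, j != 0.
        R_p = w_type[k] ** 2 * chi_p / (d2_plus + eps)
        R_m = w_type[k] ** 2 * chi_m / (d2_minus + eps)
        new_comp[k] -= mu * w_comp[k] * r * (R_p - R_m)      # Eq. (11), j != 0
        # Eq. (14): type-level terms R~_{k,0}.
        R0_p = np.sum(w_comp[k] ** 2 * chi_p) / (d2_plus + eps)
        R0_m = np.sum(w_comp[k] ** 2 * chi_m) / (d2_minus + eps)
        new_type[k] -= mu0 * w_type[k] * r * (R0_p - R0_m)   # Eq. (11), j = 0
    # Eqs. (15)-(16): renormalize component and type weights.
    new_comp = [w / (w.sum() + eps) for w in new_comp]
    return new_type / (new_type.sum() + eps), new_comp

In every feedback round, gradient_step could be repeated for the 10 iterations mentioned in Section IV before the next retrieval.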

IV. RESULTS

The proposed method is implemented on a general database containing 20000 images (the CLEF 2007 photo image database [17]). All images are colored and in JPEG format. Three types of MPEG-7 visual features [18] are used for image indexing (K = 3). The color features are taken from the 256-bin (16×4×4) HSV-based Scalable Color Descriptor (SCD). The edge features are taken from the Edge Histogram Descriptor (EHD), which describes the local edge distribution in the image. After subdividing the image into 4×4 sub-images, four directional edges (vertical, horizontal, 45 degree, and 135 degree) and a non-directional edge are computed in each sub-image. By histogram computation, a total of 16×5 = 80 bins is obtained. The texture features are calculated by the Homogeneous Texture Descriptor (HTD) with 62 components, which are the means and variances of the images filtered by 30 Gabor filters with six different orientations and five different scales. With 256 color features (L_1 = 256), 80 edge features (L_2 = 80), and 62 texture features (L_3 = 62), the total number of features becomes 398 (m = 398). To evaluate the performance of the proposed method, the work reported by Deselaers et al. in [15] is implemented. To implement the Deselaers STL method [15], Eq. (17) is used, since our experiments show that this Chi2-like distance works better than L1.

d(Q, F \mid W) = \sum_{j=1}^{m} w_{j} \left| \frac{q_{j} - f_{j}}{q_{j} + f_{j}} \right|    (17)

In the proposed method, \mu_0 = 0.0001 and \mu_j = 0.001 for j \neq 0, and in every round of relevance feedback the gradient descent is run for 10 steps. To test the proposed STL method, 60 topic images from 60 different semantic groups [17] were examined. Some of the CLEF topic images are shown in Fig. 2. Each session consists of four rounds of relevance feedback, and 25 images are retrieved per round.

The most common measure used in CBIR is precision, which is defined as the number of relevant retrieved images divided by the total number of retrieved images [19,20]. For a fixed number of retrieved images, the graph of precision versus the number of iterations (relevance feedback rounds) is the common way to compare STL approaches in CBIR systems. This graph is provided for the proposed method and the Deselaers method in Fig. 3. In both methods, the gradient descent based similarity refinement approach was used and the top 25 most similar images to the query were retrieved. The graph of Fig. 3 shows that the proposed method improves the retrieval accuracy compared with the same CBIR system using the Deselaers similarity function reweighting method. Furthermore, the J_{R-N} method accelerates the retrieval process: in our experiments, the execution time for a whole automatic session was 13.76 seconds for the Deselaers method and 6.50 seconds for the proposed J_{R-N} method.
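For the evaluation protocol described above, a small sketch of the precision-versus-rounds measurement is given below; retrieve_top_k and the relevance labels are hypothetical placeholders standing in for the CBIR system and the CLEF ground truth, not parts of the paper.

def precision(retrieved_ids, relevant_ids):
    # Precision: number of relevant retrieved images / number of retrieved images.
    return sum(1 for i in retrieved_ids if i in relevant_ids) / len(retrieved_ids)

def session_precision(retrieve_top_k, query, relevant_ids, rounds=4, top_k=25):
    # Precision after each feedback round of one query session; retrieve_top_k is
    # a hypothetical callable returning the ids of the top_k most similar images.
    scores = []
    for _ in range(rounds):
        retrieved = retrieve_top_k(query, top_k)
        scores.append(precision(retrieved, relevant_ids))
        # In the real system, the query and the similarity weights would be
        # refined here using the user's relevance feedback before the next round.
    return scores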

Figure 2. Some topic images in the CLEF database.

Figure 3. Precision graph of the proposed STL method compared with the Deselaers reweighting method.

V. CONCLUSION

In this paper, an STL method based on similarity refinement is proposed. A proper error function is defined and minimized with respect to the weights using the gradient descent method. Comparative results on a database of 20K images confirm the effectiveness of the proposed method.

ACKNOWLEDGMENT

The project is funded by the Iran research institute for ICT under contract number t-500-19245.

REFERENCES

[1] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-based image retrieval at the end of the early years," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 1349–1379, 2000.
[2] S. Antani, R. Kasturi, and R. Jain, "A survey on the use of pattern recognition methods for abstraction, indexing and retrieval of images and video," Pattern Recognition, vol. 35, pp. 945–965, 2002.
[3] Y. Liu, D. Zhang, G. Lu, and W.Y. Ma, "A survey of content-based image retrieval with high-level semantics," Pattern Recognition, vol. 40, pp. 262–282, 2007.
[4] Y. Rui, T. Huang, M. Ortega, and S. Mehrotra, "Relevance feedback: A power tool for interactive content-based image retrieval," IEEE Transactions on Circuits and Systems for Video Technology, vol. 8(5), pp. 644–655, 1998.
[5] S. Barrett, "Content-based image retrieval: a short term and long-term learning approach," 2007, http://digital.cs.usu.edu/~xqi/Teaching/REU07/Website/Samuel/SamFinalPaper.pdf.
[6] P.C. Cheng, B.C. Chien, H.R. Ke, and W.P. Yang, "A two-level relevance feedback mechanism for image retrieval," Expert Systems with Applications, vol. 34, pp. 2193–2200, 2008.
[7] A. Shamsi, H. Nezamabadi-pour, and S. Saryazdi, "A new method in relevance feedback in content based image retrieval system," Proceedings of CSICC2010, Tehran, Iran (in Persian).
[8] K. Porkaew, K. Chakrabarti, and S. Mehrotra, "Query refinement for multimedia similarity retrieval in MARS," Proceedings of the ACM International Multimedia Conference, pp. 235–238, 1999.
[9] J.J. Rocchio, "Relevance feedback in information retrieval," in G. Salton (Ed.), The SMART Retrieval System: Experiments in Automatic Document Processing, Prentice Hall, pp. 313–323, 1971.
[10] S. Salvador and P. Chan, "Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms," Technical Report CS-2003-18, Florida Institute of Technology, 2003.
[11] D.H. Kim, C.W. Chung, and K. Barnard, "Relevance feedback using adaptive clustering for image similarity retrieval," The Journal of Systems and Software, vol. 78, pp. 9–23, 2005.
[12] M.S. Lew, N. Sebe, C. Djeraba, and R. Jain, "Content-based multimedia information retrieval: state of the art and challenges," ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 2(1), pp. 1–19, 2006.
[13] X.S. Zhou and T.S. Huang, "Relevance feedback in image retrieval: a comprehensive review," Multimedia Systems, vol. 8(6), pp. 536–544, 2003.
[14] R. Datta, D. Joshi, J. Li, and J. Wang, "Image retrieval: Ideas, influences, and trends of the new age," ACM Computing Surveys (CSUR), vol. 40(2), pp. 1–60, 2008.
[15] T. Deselaers, R. Paredes, E. Vidal, and H. Ney, "Learning weighted distances for relevance feedback in image retrieval," 19th International Conference on Pattern Recognition (ICPR), 2008.
[16] R. Paredes and E. Vidal, "Learning weighted metrics to minimize nearest neighbor classification error," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28(7), pp. 1100–1110, 2006.
[17] P. Clough, M. Grubinger, A. Hanbury, and H. Müller, "Overview of the ImageCLEF 2007 photographic retrieval task," in CLEF 2007 Workshop, LNCS, in press, Budapest, Hungary, 2008.
[18] S.F. Chang, T. Sikora, and A. Puri, "Overview of the MPEG-7 standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11(6), pp. 688–695, 2001.
[19] H. Nezamabadi-pour and E. Kabir, "Concept learning by fuzzy k-NN classification and relevance feedback for efficient image retrieval," Expert Systems with Applications, vol. 36, issue 3, part 2, pp. 5948–5954, 2009.
[20] H. Nezamabadi-pour and E. Kabir, "Image retrieval using histograms of uni-color and bi-color blocks and directional changes in intensity gradient," Pattern Recognition Letters, vol. 25, pp. 1547–1557, 2004.
