Recent Researches in Artificial Intelligence and Database Management

A Method for Enhancing Image Retrieval based on Annotation using Modified WUP Similarity in WordNet

DONGJIN CHOI 1, JUNGIN KIM 1, HAYOUNG KIM 1, MYUNGGWON HWANG 2, PANKOO KIM 1 (corresponding author; Tel.: +82 62 230 7636)
1 Department of Computer Engineering, Chosun University, Gwangju, REPUBLIC OF KOREA
2 Korea Institute of Science and Technology Information (KISTI), Daejeon, REPUBLIC OF KOREA
[email protected], [email protected], [email protected], [email protected], [email protected]

Abstract: - Images have long been the most common content on the Internet. Many researchers have studied how to satisfy user demands for semantic visual recognition using low-level features (such as color or texture) or keywords from textual annotations, but the task is still challenging. Keywords give strong evidence for identifying what an image shows, yet they are not always related to the image itself. It is therefore necessary to remove irrelevant keywords and give higher weights to relevant ones using statistical models and a knowledge base such as WordNet. For this reason, we propose a modified WUP similarity measure over WordNet to decide which keywords are close to an image. To identify irrelevant keywords, we use semantic similarity measures between keywords and image titles, focusing on the word sense disambiguation (WSD) of image titles (such as bat, mouse, and jaguar). The results show that by augmenting the knowledge base with the proposed method we can remove irrelevant images and take a further step toward solving the WSD problem.

Key-Words: - Image Retrieval, Image Annotation, WordNet, WUP Similarity, Semantic Similarity, Word Sense Disambiguation

1 Introduction
Images have been the most popular content on the web for a long time. With the development of digital cameras and other digital devices, the amount of image data has increased rapidly. The traditional method of retrieving image data from the web was text matching between a given query and the titles of images. This method does not consider the semantic meaning of the image itself. For example, a traditional method can answer a query for 'jaguar' with an image of a Jaguar car even if the user wants an image of the animal. To find semantics in images, many researchers have used low-level features such as color, texture, and shape for content-based image retrieval [1]. However, it is still challenging to find relations between low-level features and high-level features (the given queries). Semantic interpretation of an image is not possible without some mechanism for understanding semantic contents that are not directly visible [2]. To overcome this limitation, textual descriptions annotated by humans (e.g., a keyword or a simple sentence) are the most popular data for providing semantic relations between queries and annotated images [3], [4]. Such descriptions state what kinds of objects are in the images; hence, the keywords are not always related to the queries. For example, if an image ('bat.jpeg') shows a baseball player trying to hit a ball, its keywords might be athletes, baseballs, players, equipment, bats, swinging, hitting, and so on. Can an existing image retrieval system distinguish this baseball image from an image of a chiropteran when both images are titled 'bat'? The answer is no, because existing systems return a result whenever the query and the title match [5], [6].


In other words, the system cannot find semantic relations between queries and titles. The most popular way to extract semantic similarity is to use a Knowledge Base (KB) such as WordNet (http://wordnet.princeton.edu), developed by the Cognitive Science Laboratory of Princeton University, which defines approximately 81,000 noun concepts [7]. A KB is an essential element of semantic information processing for finding semantic relations among words, and WordNet has accordingly been applied in many other fields for finding similarity between words. Hwang studied the extraction of semantic similarities from the titles of Wikipedia documents and from the context information in their abstracts [8]; his research shows that WordNet has great potential as fundamental data for a semantic document retrieval system. Moreover, Cho applied a different similarity measurement to annotated keywords over the WordNet network to enhance the accuracy of a semantic image retrieval system [2]. Although this research improved accuracy, it has a limitation: the method cannot handle words with more than one sense (e.g., bat, mouse, leopard, jaguar). To overcome this limitation, we apply a modified WUP similarity measure in WordNet to enhance an image retrieval system that uses annotated keywords. The remainder of this paper is organized as follows: Section 2 describes related work. Section 3 explains the proposed modification for enhancing image retrieval using WUP similarity, with examples. Section 4 presents the experiment and evaluation, and Section 5 concludes with a comment on future work.

2 Related Works
This paper deals with a modified WUP method to solve the word sense disambiguation of queries that may have several meanings. In this section, we describe a problem of traditional image retrieval systems and introduce WordNet, the most widely used fundamental knowledge base. We also describe what WUP similarity is and why we use it.

2.1 Traditional Image Retrieval Systems
Traditional image retrieval systems were based on text matching and on statistical models analyzing the statistical relations between images and keywords. Text matching was the most common approach before semantic similarity measures were introduced. Semantic similarity models are a strong method for revealing hidden semantics among words or senses [9], [10]. However, the accuracy of such retrieval systems is quite low owing to the many noisy words, and they do not cover WSD (Word Sense Disambiguation). Hence, it is not easy to compute a meaningful understanding of images.

Fig. 1. Differences between previous and current works (previous research: the image DB answers only the general query 'Animal'; current research: the image DB also answers the specific queries 'Bat' and 'Mouse').

As figure 1 shows, previous research could return a result only when the input query was general, such as 'Animal'. If the user input is more specific, for example 'Bat' or 'Mouse', the system cannot find the precise image files corresponding to the query. To overcome this limitation, we apply a modified WUP measurement to avoid the WSD problem; the WUP measurement is explained in section 2.2.
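To make the ambiguity concrete, the short Python sketch below (our illustration, assuming NLTK and its WordNet corpus are installed; it is not part of the original system) lists the noun senses WordNet defines for the query string 'bat':

# Listing WordNet's noun senses for 'bat': one query string,
# several unrelated concepts -- the root of the WSD problem.
from nltk.corpus import wordnet as wn

for s in wn.synsets('bat', pos=wn.NOUN):
    print(s.name(), '-', s.definition())
# bat.n.01 - nocturnal mouselike mammal ...
# bat.n.02 - (baseball) a turn trying to get a hit ...

A plain text-matching system collapses all of these senses into the single string 'bat', which is why it cannot tell the chiropteran images from the baseball images.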

2.2 The WUP Similarity Measurement
The WUP similarity method (Wu and Palmer, 1994) computes the similarity of two nodes as a function of the path length from the least common subsumer (LCS) of the two given concepts C1 and C2, i.e., the most specific concept that they share as an ancestor. This value is scaled by the sum of the path lengths from the individual concepts to the root. For example, if C1 is 'bat.n.01' (bat: nocturnal mouselike mammal with forelimbs modified to form membranous wings and anatomical adaptations for echolocation by which it navigates) and C2 is 'cat.n.01' (cat, true cat: feline mammal usually having thick soft fur and no ability to roar), then the LCS is 'placental.n.01' (placental: mammals having a placenta; all mammals except monotremes and marsupials).


The WUP similarity between concepts C1 and C2 is calculated by formula (1):

$sim_{wup}(C_1, C_2) = \frac{2 \cdot depth(LCS(C_1, C_2))}{depth(C_1) + depth(C_2)}$    (1)

where depth(C) is the depth of concept C in the WordNet hierarchy. The value of this measure becomes high when the two concepts share an ancestor at a great depth.
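As a minimal sketch of formula (1), the Python fragment below computes WUP over NLTK's WordNet interface (an assumption of ours; the paper does not name an implementation). Depth-counting conventions vary between implementations, so we use a 1-based depth (root = 1), and the values may differ slightly from those reported in this paper.

# Formula (1) over NLTK's WordNet (requires nltk and its 'wordnet' corpus).
from nltk.corpus import wordnet as wn

def depth(synset):
    # 1-based depth of a concept (NLTK's max_depth() is 0-based at the root)
    return synset.max_depth() + 1

def wup(c1, c2):
    lcs = c1.lowest_common_hypernyms(c2)[0]   # least common subsumer
    return 2.0 * depth(lcs) / (depth(c1) + depth(c2))

bat, cat = wn.synset('bat.n.01'), wn.synset('cat.n.01')
print(wup(bat, cat))               # formula (1)
print(bat.wup_similarity(cat))     # NLTK's built-in WUP, for comparison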

Figure 2 shows the semantic relations between the senses 'bat.n.01' and 'cat.n.01' defined in WordNet.

Fig. 2. The concept hierarchy in WordNet (an arrow X → Y means 'Y' is a 'X'): animal.n.01 → chordate.n.01 → vertebrate.n.01 → mammal.n.01 → placental.n.01, where placental.n.01, at depth 11, is the LCS; it leads directly to bat.n.01 and, via carnivore.n.01 → feline.n.01, to cat.n.01 (with big_cat.n.01 and canine.n.02 as sibling branches).

The similarity between 'bat.n.01' and 'cat.n.01' is 0.8461, which means the two concepts are close enough to be considered relevant to each other. To find lexical or semantic similarities between concepts, many researchers have used the WordNet hierarchy to extract relations. Hwang [11] introduced a new similarity measure for analyzing web documents based on the WordNet sense network, so that machines can understand human-written language. Fern [12] tested semantic similarity using different kinds of measures and compared their accuracy, precision, and recall; the comparison shows that no measure allows a machine to find similarities the way a human does. For example, a human can easily distinguish 'bat.n.01' from 'bat.n.02' (bat: a turn trying to get a hit in baseball), but a computer cannot. Likewise, a human can understand the word 'jaguar' as a vehicle from context, whereas a computer may map it to 'jaguar.n.01' (jaguar, panther, Panthera onca, Felis onca: a large spotted feline of tropical America similar to the leopard) and take it for the big cat. Human natural language is still difficult for machines; this is the major problem when a machine tries to understand human language, and it is why current image retrieval systems make mistakes frequently. This paper introduces a modified WUP method to overcome this problem.

3 A Modified Method to Enhance Image Retrieval Systems
Images with annotations give a machine valuable information for understanding the precise meaning of the images. However, annotations are written by humans, so they are not always closely related to the title of an image. For example, assume there are two images titled 'bat'. One is annotated with 'animals', 'halloween', 'special_occasions', 'silhouette', 'flying', 'wings', and 'mammals'; the other with 'equipment', 'gloves', 'baseballs', 'grasses', 'leisure', 'recreation', and 'sports'. The annotations describe the objects in an image or the place where the image was taken. For this reason, it is possible to distinguish between the images even though their titles are the same. If a user wants to find the animal bat, the computer computes the similarities between the given query and the annotations. Figure 3 shows the proposed process for retrieving the images relevant to a given query.

Fig. 3. The proposed system structure.

It is therefore important to choose the right method for measuring the similarities. Current image retrieval systems return results using only keyword text matching, so a baseball bat will be returned for the query 'bat' even if the user only wants images of the animal bat. To avoid this problem and satisfy user demands, we compare the similarities between the annotations and the given query using the following formula.


$sim_{m\_wup}(C_1, C_2) = \frac{2 \cdot depth(LCS(C_1, C_2))^2}{depth(C_1) + depth(C_2)}$    (2)

When the sim_wup value is higher than 0.5, we multiply depth(LCS) into the numerator once more, as described in formula (2). A WUP value above 0.5 means that the two given concepts share half of the whole concept hierarchy; in other words, we give a higher value when an annotation is related to the given query. In this way the computer can distinguish which annotations are more important.
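A minimal sketch of formula (2), reusing the depth() and wup() helpers from the previous sketch (again an illustrative implementation of ours, not the authors' code): when the plain WUP score exceeds 0.5, the LCS depth is multiplied in once more, boosting annotations that are already related to the query.

# Formula (2): modified WUP, boosting scores above the 0.5 threshold.
def m_wup(c1, c2):
    lcs = c1.lowest_common_hypernyms(c2)[0]
    s = 2.0 * depth(lcs) / (depth(c1) + depth(c2))
    return s * depth(lcs) if s > 0.5 else s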

Table 1. The similarities between bat.n.01 and the annotations of the animal image.

index  Annotation          depth of LCS   WUP      m_WUP
1      animals             6              0.7368   4.4208
2      halloween           0              0        0
3      special_occasions   0              0        0
4      silhouette          4              0.3809   0.3809
5      wings               10             0.5217   5.2170
6      mammals             9              0.9090   8.1810
7      flying              0              0        0
       Sum                                2.5484   18.1997
       Ave                                0.6371   4.5499

Table 1 gives an example of the similarities between the sense bat.n.01 and the annotations using the WUP and m_WUP methods. As the table shows, 'animals', 'wings', and 'mammals' got higher values than the others. Since we multiply in the depth(LCS) value again when the WUP value is bigger than 0.5, it is possible to emphasize the concepts that are related to the given query.

Table 2. The similarities between bat.n.01 and the annotations of the baseball image.

index  Annotation          depth of LCS   WUP      m_WUP
1      equipment           6              0.4210   0.4210
2      gloves              9              0.3809   0.3809
3      baseballs           9              0.3636   0.3636
4      grasses             9              0.5454   4.9086
5      leisure             6              0.1052   0.1052
6      recreation          6              0.1052   0.1052
7      sports              6              0.6315   3.7890
       Sum                                2.5528   10.0735
       Ave                                0.3646   1.4390

Table 2 is another example of annotations, for the baseball-bat image, whose title is still 'bat'. The average m_WUP value in table 1 is 4.5499, whereas it is 1.4390 in table 2. When we applied only the WUP measurement, the standard deviation of the example in table 1 was only 0.444; when we applied the m_WUP method, it increased to 3.6475. This result shows that an image whose keywords are irrelevant to the given query will be ignored in the proposed image retrieval system.
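To show how these per-image statistics could drive retrieval, here is a hedged end-to-end sketch. The keyword-to-synset mapping via the first noun sense is our illustrative assumption (the paper does not specify one), so the resulting numbers will not match the tables exactly; it reuses wn, depth(), and m_wup() from the sketches above.

# Ranking the two 'bat' images for the query sense bat.n.01 by
# aggregating m_wup scores over their annotation keywords.
from statistics import mean, pstdev

def first_noun_synset(word):
    synsets = wn.synsets(word, pos=wn.NOUN)
    return synsets[0] if synsets else None    # naive sense choice (assumption)

def image_score(query_synset, annotations):
    scores = []
    for word in annotations:
        s = first_noun_synset(word)
        scores.append(m_wup(query_synset, s) if s else 0.0)
    return mean(scores), pstdev(scores)       # average and spread per image

query = wn.synset('bat.n.01')
animal_img = ['animal', 'silhouette', 'wing', 'mammal']       # singular lemmas
baseball_img = ['equipment', 'glove', 'baseball', 'grass', 'leisure', 'sport']
print(image_score(query, animal_img))    # higher mean: the chiropteran image
print(image_score(query, baseball_img))  # lower mean: the baseball image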

4 Experiment and Evaluation
This paper has described a method to enhance an image retrieval system using annotations, based on the WUP measurement. To test the proposed method, we wrote a simple program to compare the results of the WUP and modified WUP methods. Figure 4 shows the results for the query 'mouse': the x axis lists the image files (each titled 'mouse') and the y axis the similarity values; images 1 to 5 are computer-mouse images and images 6 to 10 are animal-mouse images. When we used only the WUP measurement, the values were similar to each other, so it was difficult to distinguish the two groups; the difference becomes easy to see once the modified WUP values are applied.

Fig. 4. The results graph using mouse as a query.

Even though the images have well-distributed values, problems remain. First, there is no single threshold value that separates the images into two groups: in the example in figure 4, approximately 2.5 is adequate for the query 'mouse', but if the query changes, the threshold also needs to change. Second, the m_WUP value is very sensitive to the annotations of an image, so it would be better to apply low-level features after retrieving the relevant images with the proposed method.


5 Conclusion
This paper described a method to enhance an image retrieval system using the modified WUP measurement. We pointed out that traditional image retrieval systems focus only on low-level features, not on the annotations. To overcome this drawback, we emphasized the concepts in the annotations whose WUP value was greater than the threshold, so that irrelevant concepts contributed only a small part to the final similarity values. As the results in this paper show, it is better to use annotations with the modified WUP method to find the semantic relations between images and annotations. This research covered only simple examples, but we believe the proposed method is a further step toward removing irrelevant images and solving the WSD problem.

6 Acknowledgement
This research was financially supported by the Ministry of Education, Science and Technology (MEST) and the National Research Foundation of Korea (NRF) through the Human Resource Training Project for Regional Innovation.

References:
[1] R. C. Veltkamp and M. Tanase, "Content-Based Image Retrieval Systems: A Survey", Technical Report UU-CS-2000-34, 2000.
[2] M. Cho, C. Choi, H. Kim, J. Shin and P. Kim, "Efficient Image Retrieval Using Conceptualization of Annotated Images", Lecture Notes in Computer Science, Vol. 4577, pp. 426-433, 2007.
[3] W. Niblack, R. Barber, W. Equitz, M. Flickner, E. Glasman, D. Pektovic, P. Yanker, C. Faloutsos, and G. Taubin, "The QBIC Project: Querying Images by Content Using Color, Texture, and Shape", Proc. SPIE Storage and Retrieval for Image and Video Databases, pp. 173-181, 1993.
[4] G. Carneiro, A. B. Chan, P. J. Moreno, and N. Vasconcelos, "Supervised Learning of Semantic Classes for Image Annotation and Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 3, pp. 394-410, 2007.
[5] J. De Bonet and P. Viola, "Structure Driven Image Database Retrieval", Proc. Conf. Advances in Neural Information Processing Systems, Vol. 10, 1997.
[6] A. Jain and A. Vailaya, "Image Retrieval Using Color and Shape", Pattern Recognition J., Vol. 29, pp. 1233-1244, Aug. 1996.
[7] T. Deselaers and V. Ferrari, "Visual and Semantic Similarity in ImageNet", 2011 IEEE Conference on CVPR, pp. 1777-1784, 2011.
[8] M. Hwang, C. Choi, and P. Koo, "Automatic Enrichment of Semantic Relation Network and Its Application to Word Sense Disambiguation", IEEE Transactions on Knowledge and Data Engineering, Vol. 23, No. 6, pp. 845-858, 2011.
[9] K. Barnard and D. Forsyth, "Learning the Semantics of Words and Pictures", Proc. Int'l Conf. Computer Vision, Vol. 2, pp. 408-415, 2001.
[10] D. Blei and M. Jordan, "Modeling Annotated Data", Proc. ACM SIGIR Conf. Research and Development in Information Retrieval, 2003.
[11] M. Hwang, D. Choi, J. Choi, H. Kim, and P. Koo, "Similarity Measure for Semantic Document Interconnections", An International Interdisciplinary Journal, Vol. 13, No. 2, pp. 253-267, 2010.
[12] S. Fern and M. Stevenson, "A Semantic Similarity Approach to Paraphrase Detection", Computer and Information Science, 2008.
