An Interactive Paper and Digital Pen Interface for Query-by-Sketch Image Retrieval

Roman Kreuzer, Michael Springmann, Ihab Al Kabary, and Heiko Schuldt

Databases and Information Systems Group, Department of Mathematics and Computer Science, University of Basel, Switzerland

Abstract. A major challenge when dealing with large collections of digital images is to find relevant objects, especially when no metadata on the objects is available. Content-based image retrieval (CBIR) addresses this problem but usually suffers from the lack of query images that are good enough to express the user’s information need. Therefore, in Query-by-Sketch, CBIR has been considered with user-provided sketches as query objects – but so far, this has suffered from the limitations of existing user interfaces. In this paper, we present a novel user interface for Query-by-Sketch that exploits emergent interactive paper and digital pen technology. Users can draw sketches on paper in a user-friendly way. Search can be started interactively from the paper front-end, thanks to a streaming interface from the digital pen to the underlying CBIR system. We present the implementation of the interactive paper/digital pen interface on top of QbS, our system for CBIR using sketches, and we present in detail the evaluation of the system on the basis of the MIRFLICKR-25000 image collection.

1 Introduction

The rapidly increasing production of digital images in the last few years has had a very strong impact on the role images play in science, business, and in our everyday private lives. At the same time, managing large digital image collections and in particular providing sophisticated search and retrieval facilities that take the images’ content into account have become more and more challenging. Due to the sheer size of collections, manually adding tags and other descriptive data is no longer a feasible approach. In addition, for many retrieval tasks, the metadata automatically provided by digital cameras is not sufficient either. Content-based image retrieval (CBIR) has become a very mature technology for exploiting low-level image features for search. Yet, CBIR usually suffers from the lack of query images that are good enough to express the users’ information needs. To cope with this, several approaches to CBIR with user-drawn sketches, known as Query-by-Sketch, have been proposed [4,6,7]. However, existing approaches to Query-by-Sketch have mainly suffered from two challenges. First, users usually do not sketch images entirely, but focus on the most interesting parts with regard to their search task – this is the case even in Known Item Search, i.e., when the user has previously seen the image she is looking for and knows that it is included in a collection. Furthermore, the sketch might not exactly match the image a user is looking for but may be subject to translations, different scaling, rotations, etc. Second, previous approaches have exploited mice, graphic tablets, or tablet PCs as user interfaces, which have turned out to provide limited user-friendliness and expressiveness as they are difficult to use, especially for non-experts. One of the most significant handicaps of image retrieval based on Query-by-Sketch has been identified as using the mouse as input device instead of pen and paper [16]. The first challenge – making Query-by-Sketch invariant to changes in translation, scale, rotation, and degree of detail in the background of the image – has been addressed in our previous work [15]. The QbS system provides these invariances and has been subject to detailed user studies based on the MIRFLICKR-25000 image collection. In this paper, we address the second challenge and provide a user-friendly interface for Query-by-Sketch using interactive paper and digital pen technology that also seamlessly integrates functionality for the above-mentioned invariances. Drawing pictures with the mouse is a highly inexact method, and even using graphic tablets involves constantly looking away from the hand. Another method to alleviate such problems combines the functionalities of graphic tablets with those of the monitor (e.g., iPad). However, these interfaces have been tailored for rather simple gestures, either provided by a fingertip or with a coarse stylus, and thus do not meet the precision necessary for providing exact sketches. Therefore, the intuitively most promising approach is to just use pen and paper [11]. We propose an approach that relies completely on well-adopted devices and will thus assist users much better in expressing their information need.

R. Baeza-Yates et al. (Eds.): ECIR 2012, LNCS 7224, pp. 317–328, 2012. © Springer-Verlag Berlin Heidelberg 2012
The main challenges are the design of the user interface and the human-computer interaction via the paper interface, and the streaming transfer of queries (vectors representing the pen strokes on paper) via Bluetooth into an image retrieval application which performs the CBIR task. The results of the search can then be displayed inside a traditional GUI on a monitor, and modifications of the query to refine and reissue searches can be performed on either the computer or the paper interface, whichever is more convenient for the user. The paper is organized as follows: Section 2 discusses related work. Section 3 provides a general introduction to the QbS approach to Query-by-Sketch. Section 4 introduces interactive paper and digital pen technology and shows how this has been linked to QbS. In Section 5, we present the results of user studies that have been performed with the paper interfaces on the basis of the MIRFLICKR-25000 image collection. Section 6 concludes.

2 Related Work

Searching for information can be categorized into two fundamentally different classes. First, searching for known items, in which a user knows that the item(s) exist and where the search task ends successfully only if the user has found this/these item(s). Second, searching for novel items, in which a user is looking for unknown item(s) that satisfy her particular information need. The search task ends successfully as soon as the user is provided with some items that reflect the information she is looking for. Content-based image retrieval (CBIR) supports both known image search and novel image search. However, in both cases CBIR requires a query image to start with that is similar to the final result. Without such a query image, it is difficult to achieve good retrieval quality. Query-by-Sketch addresses this problem and uses user-drawn sketches as query images. The QVE system [7] uses a reduced resolution of the image and edge detection to generate so-called abstract images at a size of 64×64 pixels. These are compared to a rough sketch provided by the user by aligning blocks of 8×8 pixels within a range to provide limited invariance to translation. The score for ranking the results is determined by computing the overall correlation between all blocks. The QBIC system [6] extracts several features from the images in the database, including a global 256-bin color histogram, 20 shape features of manually or semi-automatically identified objects such as area, circularity, eccentricity, major axis orientation and algebraic moment invariants, as well as global texture features such as coarseness, contrast, and directionality. The user can either specify a (possibly partial) color distribution or draw a binary silhouette image of the shape using a polygon drawing routine. Currently, the MPEG-7 Visual Standard [13] defines a large set of descriptors that can be used in image retrieval, out of which the Edge Histogram Descriptor (EHD) is frequently used for sketch-based retrieval. It represents the distribution of 5 types of edges over 4×4 non-overlapping blocks. Angular Radial Partitioning (ARP) [2] also uses the spatial distribution of edges, in which both sketch and image are partitioned into subregions according to an angular-radial segmentation. Therefore, ARP already provides some degree of scale and rotation invariance.
In addition, by applying the Fast Fourier Transform, a higher degree of rotation invariance is achievable. ARP has been shown to outperform EHD [2]. The Image Distortion Model (IDM) [9], which has been used for handwritten character recognition and in medical automatic annotation tasks, is another descriptor that has proven to work well as a distance function. It evaluates displacements of individual pixels between images within a so-called warp range and also takes the surrounding pixels (local context) into account. It is able to measure much finer deviations than ARP. However, it is computationally more expensive than ARP and thus leads to higher retrieval times. So far, the main challenge Query-by-Sketch still faces is the unavailability of appropriate input devices for mass usage, as mouse and keyboard are still the most common input devices of choice. [15] showed the successful use of Tablet PCs for finding known items. The PhotoSketch system [5] and Sketch2Photo [3] use a similar setup with different features for retrieving images for the purpose of creating image montages. iPaper/iServer [12] is a content publishing framework that focuses on facilitating the authoring of links between printed documents and digital documents and services. To the best of our knowledge, however, the approach presented in this paper is the first to seamlessly integrate pen and paper into the Query-by-Sketch digital image retrieval process.
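The warp-range/local-context idea behind IDM can be sketched as follows; this is a minimal illustration, not the implementation evaluated in [9] or in QbS, and the function and parameter names are our own:

```python
import numpy as np

def idm_distance(query, ref, warp=2, ctx=1):
    """Simplified Image Distortion Model distance: every pixel of
    `query` may be displaced by up to `warp` pixels in `ref`, and each
    candidate displacement is scored on a (2*ctx+1)^2 local-context
    patch; the best (smallest) displacement cost per pixel is summed."""
    h, w = query.shape
    pad = warp + ctx
    refp = np.pad(ref.astype(float), pad, mode="edge")
    qp = np.pad(query.astype(float), ctx, mode="edge")
    total = 0.0
    for y in range(h):
        for x in range(w):
            # local context patch around (y, x) in the query
            qpatch = qp[y:y + 2 * ctx + 1, x:x + 2 * ctx + 1]
            best = np.inf
            for dy in range(-warp, warp + 1):
                for dx in range(-warp, warp + 1):
                    ry, rx = y + pad + dy, x + pad + dx
                    rpatch = refp[ry - ctx:ry + ctx + 1, rx - ctx:rx + ctx + 1]
                    best = min(best, float(((qpatch - rpatch) ** 2).sum()))
            total += best
    return total
```

Because each pixel may move independently, small deformations of the sketch within the warp range do not contribute to the distance, which is exactly why IDM tolerates finer deviations than block-based descriptors.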


(a) Tablet PC

(b) Paper

(c) Digital Pen DP-201

Fig. 1. QbS system with Tablet PC and interactive paper & digital pen user interfaces

3 The QbS System

QbS is an interactive application used to retrieve known images based on both low-level visual features and, optionally, textual features [15]. A first prototype runs on a Tablet PC, thus allowing the user to draw edges with a stylus directly on the screen as shown in Figure 1(a). The QbS system provides a fast query mechanism that retrieves some of the top-ranked results to give the user quick feedback on the retrieval result that can be expected from the final or full result. The user is then able to add further details to refine the search and to remove misleading parts of the sketch without having to redo the complete sketch.

3.1 Visual Features

QbS focuses on edge information and does not rely on semi-automatic segmentation or annotations of objects in the image database, as the latter usually require significant effort at the time of inserting images into the collection, which users are frequently not willing to invest. A variant of the Canny edge detector is applied to generate edge maps [2] from the images. The generation of edge maps is controlled by the threshold value β as illustrated in Figure 2, where low values preserve many edges while high values retain only very few and prominent edges. To better compensate for effects due to level of detail and edge detection, QbS does not define a single value of β for the entire collection, but rather extracts the edge maps from the images at several values between 2 and 50.
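The multi-threshold extraction can be illustrated with a simplified numpy sketch; the real system uses a Canny variant [2], whereas this stand-in merely thresholds the gradient magnitude at several β values (function and parameter names are hypothetical):

```python
import numpy as np

def edge_maps(gray, betas=(2, 10, 20, 30, 50)):
    """Simplified stand-in for the Canny-based edge-map extraction:
    the gradient magnitude is thresholded at several beta values, so a
    low beta keeps many edges and a high beta keeps only prominent ones."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)            # edge strength per pixel
    return {b: (mag > b).astype(np.uint8) for b in betas}
```

All maps would be extracted once at indexing time and stored alongside the image, so that at query time the best-matching β can be chosen per sketch.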

(a) original image

(b) β = 2

(c) β = 20

(d) β = 30

Fig. 2. Example image (a), edge maps for various values of β (b)–(d)


QbS incorporates three different sets of features and corresponding distance measures. First, Angular Radial Partitioning (ARP) is used as a compact and fast way to retrieve images when rough sketches and spatial layout are sufficient to separate the desired known item from all other images in the collection. Moreover, robustness against many deviations of the sketch w.r.t. the known item is provided. Second, an adapted version of the Image Distortion Model (IDM) on the same edge maps is used as a more complex, computationally more expensive solution whenever the user needs a more thorough comparison between the sketch and the images in the collection. For IDM, the user has to provide a sketch that is detailed and located close enough to expect meaningful results. In order to reduce the required time, ARP can be used as a filter to select candidates, and IDM can then be used only to re-rank these candidates to obtain the final results. Finally, QbS supports the Edge Histogram Descriptor (EHD), which is a compact descriptor focusing on the local and global distribution of edges throughout an edge map or sketch. Similar to ARP, it can be used in QbS when a quick search is required and the user has generated a rough sketch. Search is then performed by applying a k-nearest-neighbor sequential scan over the file(s) containing the features, and results are displayed in order of the similarity scores obtained. It is worth mentioning that the search can be restricted to images that have been tagged with particular keywords. This also radically reduces the time needed for similarity computations and therefore improves the search time for ARP and EHD as well as for IDM. QbS uses Lucene1 to build a full-text index. A more detailed explanation of the system and an evaluation using a Tablet PC as input device can be found in [14].
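The filter-and-refine combination of ARP and IDM can be sketched as follows (a simplified outline; `arp_dist` and `idm_dist` stand in for the actual distance functions, and the 160-candidate setting matches the one reported in the evaluation):

```python
def search(query, collection, arp_dist, idm_dist, n_candidates=160, k=10):
    """Filter-and-refine: a cheap ARP scan selects candidates,
    the expensive IDM distance re-ranks only those candidates."""
    # Stage 1: sequential k-NN scan with the cheap descriptor.
    ranked = sorted(collection, key=lambda img: arp_dist(query, img))
    candidates = ranked[:n_candidates]
    # Stage 2: re-rank the candidate set with the expensive descriptor.
    refined = sorted(candidates, key=lambda img: idm_dist(query, img))
    return refined[:k]
```

The trade-off described in the text is visible here: if the target image falls outside the top `n_candidates` of the ARP stage, IDM never sees it and cannot recover it.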
For the remainder of this paper, we will focus on the visual features, as they are the most intuitive to use with a digital pen; textual features can be added easily by Intelligent Character Recognition (ICR), as done in [1], or simply via the text box inside the GUI that is used to display the search results anyway.

3.2 Support for Invariances

Usually, users differ in their drawing capabilities when providing an input sketch whose sole purpose is to find some known item(s). Thus, QbS does not expect a perfect sketch as input. Consequently, QbS has to properly deal with the following invariances. Translation Invariance. Both ARP and EHD natively support a slight degree of invariance to translation, as long as edge pixels are not moved across partitions. However, the finer the granularity (number of angular and radial partitions used), the less tolerant the approach is to translations. Therefore, we have extended ARP to consider subregions of all database images and to compare a sketch with all subregions of an image in their original positions, and with small displacements. Finally, for IDM, translation invariance is naturally supported by the choice of warp range and local context.

1 http://lucene.apache.org
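The subregion comparison for translation tolerance can be sketched as follows; this simplified version works on raw edge maps, whereas the actual extension operates on ARP features of the subregions (names and the displacement bound are our own):

```python
import numpy as np

def min_shift_distance(sketch, image, max_shift=2, dist=None):
    """Compare the sketch against shifted subregions of the image and
    keep the best match, so that small translations of the drawing do
    not hurt the score."""
    dist = dist or (lambda a, b: float(((a - b) ** 2).sum()))
    h, w = sketch.shape
    padded = np.pad(image.astype(float), max_shift, mode="constant")
    best = np.inf
    for dy in range(2 * max_shift + 1):
        for dx in range(2 * max_shift + 1):
            sub = padded[dy:dy + h, dx:dx + w]   # shifted subwindow
            best = min(best, dist(sketch.astype(float), sub))
    return best
```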


Rotation Invariance. If a user expects even more rotation invariance than ARP already provides, invariance to rotation is achieved by applying the 1D FFT on the features. Within IDM, image deformations are allowed within the warp range for individual pixels and the area around them (local context). Scale Invariance. Even though users might not always be able to estimate exact proportions when providing a sketch, major differences in scale usually do not occur in known image search. To support slight deviations in scale, we exploit the same heuristics also applied for translation invariance: a sketch is not only compared to the original images but also to subimages; moreover, the comparison is restricted to the regions that include edge pixels. Background Invariance. A user will usually draw only the parts of the image she considers essential and will most often not remember details in the background. If, however, the user knows exactly what the image should look like, then the background information should be used as given in the sketch. Thus, the interpretation of all white space on the sketch is left as a runtime decision to the user (via a toggle in the user interface). With ARP, EHD, and IDM, background invariance can be achieved by ignoring non-edge pixels in the sketch.
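The runtime background toggle amounts to masking the distance computation, which can be sketched as follows (a simplified pixel-wise version; names are our own):

```python
import numpy as np

def masked_distance(sketch, image, ignore_background=True):
    """With background invariance enabled, only pixels that are edges
    in the *sketch* contribute to the distance, so unsketched
    background edges in the image are ignored."""
    s = sketch.astype(float)
    i = image.astype(float)
    if ignore_background:
        mask = s > 0                  # compare only where the user drew
        return float(((s[mask] - i[mask]) ** 2).sum())
    return float(((s - i) ** 2).sum())
```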

4 Digital Pen Interface to QbS

In order to make QbS more user-friendly, we have added an interactive paper and digital pen interface. This is based on commercial pen and paper technology developed by Anoto2. In short, strokes written on paper are sent via the Bluetooth interface of the pen to a computer and integrated as a sketch into QbS.

4.1 Digital Pen and Interactive Paper

Digital pens are designed for drawing on normal paper on which a proprietary and irregular dot pattern is printed. The pattern consists of very small dots arranged on a grid with a spacing of approximately 0.3 mm. Each dot can be placed on the pattern in four different ways: above, below, left or right of the center defined by the grid lines (as visible behind the letter A in Figure 1(c)). As soon as a user draws on paper, the pen, which is equipped with an infrared LED and a camera, can localize the position on paper by reading a 6×6 dot area, corresponding to an area of 1.8×1.8 mm in size. By reading 6×6 dots, in total 4^(6×6) = 2^72 unique combinations can be encoded. Therefore, the uniqueness of the pattern is ensured on 60 million km² (this exceeds the total area of both Europe and Asia). As the pen moves along the pattern, the camera and the infrared LED take digital snapshots of the local patterns at a rate of 100 fps. In addition, the pen has a pressure-sensitive tip, and a pen-down message starts the transmission of the pen data. The pens store the pattern information in the form of pen stroke data, which are continuous curves made up of coordinates. The image processor calculates the exact position in the entire Anoto

2 http://www.anoto.com


Fig. 3. Dataflow – from interactive paper to QbS

proprietary pattern. During image processing, snapshots are compared, and information about how the pen is held is also gathered and stored. All the data from the image processor is packaged and loaded into the pen’s memory, which can store several fully written pages. The pen strokes are transmitted to a computer via Bluetooth or via a USB connection [10].
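A quick back-of-the-envelope check of the pattern figures above (the 60 million km² value is the area Anoto actually allocates; the arithmetic below only shows that the theoretical bound is far larger):

```python
# A 6x6 dot window with 4 positions per dot yields 4**36 = 2**72
# distinct windows; at a 0.3 mm grid spacing, each window start
# corresponds to one grid cell.
combinations = 4 ** 36                 # 2**72 distinct 6x6 windows
cell_area_m2 = (0.3e-3) ** 2           # one 0.3 mm x 0.3 mm grid cell
upper_bound_km2 = combinations * cell_area_m2 / 1e6

print(combinations)                    # 4722366482869645213696
print(upper_bound_km2 > 60e6)          # True: the bound far exceeds 60 million km^2
```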

4.2 Linking the Paper Interface to QbS

The QbS interactive paper and digital pen interface consists of an executable (streaming client) that receives streaming data from the pen via a Bluetooth socket and forwards it to a server on which the QbS application runs. This can be either a local or a remote server, so it is possible to use a notebook supporting Bluetooth for receiving the pen data and run the QbS application on a more powerful desktop computer. The components involved in transferring pen strokes to an application are shown in Figure 3. QbS allows pattern pages to be printed with any desired interface, as long as the Anoto pattern is not rendered invisible to the pen. All functionality is achieved by using XML configuration files. Only the global Anoto coordinates of the start page and the layout of the used pattern pages must be known in advance. By setting these values in the configuration file, areas with specific functionality can be defined by their actual local coordinates for all the pages used. Through the use of the XML configuration files, users are able to employ the same pattern pages for multiple paper interfaces. Hence, the QbS pen and paper interface can easily be used for other applications as well.
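The XML-driven mapping from pen coordinates to interface areas could look like the following sketch; the element and attribute names here are hypothetical and do not reproduce the actual QbS configuration schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: each <area> maps a rectangle in local
# page coordinates to a named function such as "canvas" or "search".
CONFIG = """
<page origin_x="0" origin_y="0">
  <area name="canvas" x="0" y="0" w="500" h="400"/>
  <area name="search" x="0" y="420" w="100" h="40"/>
</page>
"""

def area_for(x, y, config_xml=CONFIG):
    """Return the name of the interface area containing pen position
    (x, y), or None if the stroke lies outside all defined areas."""
    root = ET.fromstring(config_xml)
    for a in root.iter("area"):
        ax, ay = float(a.get("x")), float(a.get("y"))
        aw, ah = float(a.get("w")), float(a.get("h"))
        if ax <= x < ax + aw and ay <= y < ay + ah:
            return a.get("name")
    return None
```

A streaming client would resolve each incoming stroke this way: strokes inside the drawing canvas extend the sketch, while a stroke on a button area such as "search" triggers the corresponding command.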

5 Evaluation

For our experiments we used the MIRFLICKR-25000 dataset [8]. Initially, 45 images have been randomly selected from the MIRFLICKR-25000 library. For these images, sketches have been created with the digital pen and their retrieval rank has first been calculated for a simple paper interface. Based on these evaluation results, an improved paper interface has been developed and finally, the retrieval ranks for this new interface have been calculated.

Table 1. QbS improvement contributions of various features

Feature          Contribution %   Feature                    Contribution %
ARP Trans.            16 %        ARP Scale / β -1                6 %
ARP β 15              14 %        EHD Background                  6 %
ARP β 20              12 %        IDM Background                  4 %
ARP β 5 / 30          10 %        ARP Rotation/Background         2 %
ARP β 35               8 %        EHD Semi/Global                 2 %

In this first stage, we have started with a version of the QbS interface that is deliberately limited in terms of the functionality available, as it only supports the default settings for each feature descriptor. Figure 5(a) shows the retrieval ranks of the queries executed with this first paper interface for the MIRFLICKR-25000 collection. Individual retrieval ranks are shown for all the descriptors. IDM, which is by far the slowest method, gives the best result but is only used on a filtered subset containing the 160 best results from an ARP search done in advance. If the target image is part of this sub-selection, IDM will mostly deliver excellent retrieval ranks. On the other hand, if ARP does not rank the image among its top 160 results, IDM will not find it at all. With its proposed default settings, EHD performs poorly in this use case. Enabling invariances can improve results. The possible improvements to achieve better ranks are shown in Table 1. A feature of an image descriptor is considered an improvement whenever its usage puts the retrieval rank into a higher category, e.g., if an image was found in Figure 5(a) among the Top 10 and is now within the Top 5, this is considered an improvement. As expected, there are numerous cases of improvements in the retrieval ranks. Invariance support for ARP greatly improves the results, as does the usage of different β values for the strength of the edge detection on the target images. Translation invariance in particular will often improve the retrieval, as the position of the sketch often does not exactly match the detected edges of the target image. The β value, on the other hand, relates to the complexity of the drawn sketch. When many edges are drawn, a lower β will consider more edges from the target image when computing the distance function. Consequently, a higher β is more adequate when the sketch consists of a few prominent edges.
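The improvement criterion above can be made explicit with a small sketch; the category boundaries used here (Top 5, 10, 50, 100, 1000) are assumptions for illustration, not the exact buckets of Figure 5:

```python
# A feature counts as an improvement when it moves the retrieval rank
# of the target image into a better (smaller) category.
CATEGORIES = [5, 10, 50, 100, 1000]   # assumed category boundaries

def category(rank):
    for bound in CATEGORIES:
        if rank <= bound:
            return bound
    return float("inf")                # effectively "not found"

def is_improvement(rank_before, rank_after):
    return category(rank_after) < category(rank_before)
```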
The overall goal is to use the QbS digital pen and interactive paper interface in an interactive way, much like the Tablet PC interface can be used. This includes functionality to revise a query (i.e., to extend the sketch), and to adjust the search parameters. In order not to overload the interface, we have revised the first interface to only support the most promising and important search options. With these improvements, the interface for the MIRFLICKR-25000 use case has been created and printed on pattern pages (see Figure 4). By interaction on this new interface, it is now possible to find all the selected images at an acceptable retrieval rank as shown in Figure 5(b).


(a) Simple interface


(b) Improved interface

Fig. 4. Scans of simple interface (a) and improved interface with advanced search options (b) using interactive paper (Gray background is caused by the Anoto pattern)

If the target image is not shown among the result set from the first search, translation invariance can be used for a quick second search. Since translation is not easy to avoid when drawing, this may often lead to better results, as shown in Figure 6(a), where translation invariance makes a previously not found image (i.e., not among the top 1000) a perfect match for the search. Other invariance options and β values, representing the level of detail of the edge detection, can be tried when appropriate. Low β values should be used when most edges are drawn in the sketch, and higher β values when only the strongest edges of the image are drawn. As an example, see Figure 6(b), where background invariance turns a previously not found image into a perfect match. Although EHD has overall poor retrieval quality compared to ARP and IDM, it will lead to the best results when drawing sketches with much directional information in the edges, as shown in Figure 6(c), where an image is found exclusively by using EHD. Like ARP, EHD is very fast and delivers results in real time. In the second stage, six images have been chosen from the library, printed, and presented to multiple test persons, who then drew their sketches on the final version of the paper interface (see Figure 4). In total, 8 individuals participated

(a) Retrieval ranks (simple interface)  (b) Simple vs. improved paper interface

Fig. 5. Simple and improved paper interface evaluations


(a) Translation

(b) Background

(c) EHD

Fig. 6. Sketches influenced significantly by enabling specific invariances or features

to this evaluation. The users were given some time to familiarize themselves with the system before starting to search for the target images. A total of 48 sketches have been collected. Figure 7 shows the target images and the sketches created for this evaluation. Note that all ranks are small fractions of the overall size of the collection of 25,000 images, therefore significantly reducing the number of images that a user would need to browse to find a desired image. The features and options used and the ranks achieved also show that a limited number of options can be sufficient to retrieve the known item even if the sketch is quite rough.

6 Conclusions and Future Work

Query-by-Sketch has been subject to intensive research in the past years. However, with the advent of novel user-friendly interfaces such as interactive paper and digital pens, one of the main problems Query-by-Sketch has suffered from, namely the limited quality of user-provided sketches, can be overcome. These interfaces, together with powerful search support from the backend systems that provides robustness against small deviations in scale, rotation, translation, and/or missing objects in the background, finally lead to systems that can unobtrusively be used in various application domains, both by experts and lay users. In this paper, we have presented a novel and innovative interactive paper and digital pen interface to the QbS system. The various invariances QbS provides when comparing sketches and database images, together with these new interfaces, have been subject to detailed multi-user studies on the basis of the MIRFLICKR-25000 collection. The interactive paper and digital pen interface is operated in streaming mode, which supports query by sketching in a very natural and straightforward way, as the pen strokes of the user’s sketch are automatically transferred to the QbS system in the backend. The paper interfaces have been designed with the objective of making search interactive, i.e., of being able to change the parameters used for searching when reformulating a query. For this, dedicated paper interfaces have been designed for different collections, as we have found that not all options are equally relevant in different settings. The evaluation has shown that the user interface, despite the fact that it is


Fig. 7. Original images and user-drawn sketches used for the evaluation

printed on paper, is indeed interactive and, used as an input device, allows the user to eventually find the object either in the first attempt with standard parameter settings, or after only a small number of iterations with changes to the search parameters specified directly on paper. In our future work, we will attempt to


apply QbS to sketch-based video retrieval, by adding specific gestures that will allow specifying the movement of sketched objects across subsequent frames. Acknowledgements. This work has been partly supported by the Swiss National Science Foundation, project PAD-IR (Paper-Digital System for Information Capture and Retrieval).

References

1. Agosti, M., Berretti, S., Brettlecker, G., del Bimbo, A., Ferro, N., Fuhr, N., Keim, D.A., Klas, C.-P., Lidy, T., Milano, D., Norrie, M.C., Ranaldi, P., Rauber, A., Schek, H.-J., Schreck, T., Schuldt, H., Signer, B., Springmann, M.: DelosDLMS: The Integrated DELOS Digital Library Management System. In: Digital Libraries: Research and Development – 1st Int. DELOS Conf., pp. 36–45 (February 2007)
2. Chalechale, A., Mertins, A., Naghdy, G.: Edge Image Description using Angular Radial Partitioning. IEE Vision, Image & Signal Processing 151(2) (2004)
3. Chen, T., Cheng, M.-M., Tan, P., Shamir, A., Hu, S.-M.: Sketch2Photo: Internet Image Montage. ACM Trans. Graph. 28(5) (2009)
4. del Bimbo, A., Pala, P.: Visual Image Retrieval by Elastic Matching of User Sketches. IEEE Trans. on Pattern Analysis & Machine Intelligence 19 (1997)
5. Eitz, M., Hildebrand, K., Boubekeur, T., Alexa, M.: PhotoSketch: A Sketch-based Image Query and Compositing System. In: ACM SIGGRAPH (August 2009)
6. Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D., Steele, D., Yanker, P.: Query by Image and Video Content: The QBIC System. Computer 28(9), 23–32 (1995)
7. Hirata, K., Kato, T.: Query by Visual Example. In: Pirotte, A., Delobel, C., Gottlob, G. (eds.) EDBT 1992. LNCS, vol. 580, pp. 56–71. Springer, Heidelberg (1992)
8. Huiskes, M.J., Lew, M.S.: The MIR Flickr Retrieval Evaluation. In: ACM Int. Conf. on Multimedia Information Retrieval, MIR 2008 (2008)
9. Keysers, D., Deselaers, T., Gollan, C., Ney, H.: Deformation Models for Image Recognition. IEEE Trans. Pattern Analysis and Machine Intelligence 29(8), 1422–1435 (2007)
10. Koutamanis, A.: Sketching with Digital Pen and Paper. In: Computer Aided Architectural Design Futures 2005 (June 2005)
11. Kurtenbach, G.: Pen-based Computing. ACM Crossroads 16(4), 14–20 (2010)
12. Norrie, M.C., Signer, B., Weibel, N.: General Framework for the Rapid Development of Interactive Paper Applications. In: CoPADD 2006, Workshop on Collaborating Over Paper and Digital Documents, vol. 6, pp. 9–12 (2006)
13. Sikora, T.: The MPEG-7 Visual Standard for Content Description – an Overview. IEEE Trans. on Circuits & Systems for Video Technology 11(6), 696–702 (2001)
14. Springmann, M., Al Kabary, I., Schuldt, H.: Experiences with QbS: Challenges and Evaluation of Known Image Search based on User-Drawn Sketches. CS Technical Report CS-2010-001, University of Basel, Switzerland (2010), http://informatik.unibas.ch/research/publications_tec_report.html
15. Springmann, M., Al Kabary, I., Schuldt, H.: Image Retrieval at Memory’s Edge: Known Image Search based on User-Drawn Sketches. In: Conference on Information and Knowledge Management (CIKM 2010), pp. 1465–1468 (October 2010)
16. Veltkamp, R.C., Tanase, M.: Content-Based Image Retrieval Systems: A Survey. Technical Report UU-CS-2000-34, Utrecht University (2000)