Museum Tour Guide Robot With Augmented Reality

Byung-Ok Han, Young-Ho Kim, Kyusung Cho, and Hyun S. Yang
AIM Lab., Dept. of CS, Korea Advanced Institute of Science and Technology
Daejeon, Republic of Korea
{bohan, yhkim, qtboy, hsyang}@paradise.kaist.ac.kr

Abstract—Many mobile robots have been deployed in various museums to interact with people naturally. The key requirements for a museum tour guide robot are how well it interacts with people and how well it localizes itself. Once those are accomplished, the robot can successfully educate and entertain people. In this paper, we propose a museum tour guide robot that uses augmented reality (AR) technologies to improve human-robot interaction and a localization method to determine its precise position and orientation. The AR museum tour guide robot can augment multimedia elements such as virtual 3D objects, movie clips, or sound clips onto real artifacts in a museum. This capability depends on the robot knowing its whereabouts precisely, which is achieved by a hybrid localization method. The experimental results confirm that the robot can communicate with people effectively and localize accurately in a complex museum environment.

Keywords – museum tour guide robot, augmented reality, robot localization

I. INTRODUCTION

In a museum, many visitors have to search through a booklet to find information about exhibits. However, this searching method is uncomfortable and tedious, and the information is often insufficient to meet a visitor's interests. One way to solve this problem is to use robots as museum guides. A robot that interacts with people in the museum is a valuable application for explaining exhibits, guiding visitors, and introducing additional information. Interactive museum guide robots are able to automatically find and instantaneously retrieve information about exhibits. There are several approaches that allow visitors to interact with a guide robot. Kusunoki et al. [1] proposed a system that interacts with children by providing them with visual and auditory feedback via a sensor board. Although this system was evaluated as an easy and effective method for interacting with children, simple visual and audio feedback is not sufficient for people who have questions about the objects. Sebastian Thrun et al. [2] suggested MINERVA, which was designed to provide safe navigation in dynamic environments and short-term human-robot interaction; the safe navigation relied on a texture map of the ceiling. Recently, many approaches have concentrated on object recognition and on improving interaction by various methods. Herbert Bay et al. [3] demonstrated the object recognition performance, as well as the speed, of the SURF algorithm. However, they only dealt with objects used to retrieve additional information simply displayed on a tablet PC.


In order to communicate with individual visitors, Masahiro Shiomi et al. [4] used RFID tags. They performed exhibit-guiding by moving around several exhibits and explaining them based on sensor information. This method is attractive for interacting with each visitor; however, it does not provide specific information to visitors. Felix Faber et al. [5] suggested an advanced tour guide robot that uses speech, facial expressions, gestures, and body language to interact with people. From the point of view of improving interaction, this is excellent research. However, even if the robot supports a visitor's specific interests, there are limitations in terms of recognition rate, luminance, and pose. To further improve interaction with visitors, we present a museum tour guide robot with AR technologies that displays supplementary information on a monitor or projector screen. AR is known to enhance a user's perception of and interaction with the real world [6]. In a museum, visitors cannot touch exhibits; thus, AR technologies can add more information onto the real exhibits. In order to display additional information on a screen, our system offers fast and robust object tracking, related to markerless visual tracking [7]. Moreover, our approach needs to provide robust navigation in order to interact with visitors and accompany them during a tour of a crowded museum. There are numerous obstacles, such as other visitors walking in different directions, so compared to [2][4], we combine local sensors with a laser sensor in what we call a hybrid scheme. The proposed system is able to perform robust object tracking and robust navigation in the museum in real time. The paper is organized as follows. In the next section, we describe our system architecture, and then we present an AR method with robust object tracking. In Section 4, we present a robot navigation method for the museum. We then present results with real experimental data in Section 5. Finally, we conclude in Section 6.

II. SYSTEM OVERVIEW

The proposed system consists of three parts: the guiding planner, the localization module, and the AR module. The guiding planner interprets the information desired by the user, such as interests in particular exhibits, the average age of the visitors, and the required level of expertise, through various sensors. The guiding planner then plans how to introduce exhibits effectively with the localization module, and issues commands to the localization and AR modules. Input from sensors, such as voice, gestures, and manual input, tells the robot where to move and what to do through the guiding planner. The robot moves to a specific region and then presents AR information to visitors as a guide.
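To make the command flow concrete, the following is a minimal sketch of how the guiding planner might dispatch commands to the other two modules. All class, method, and field names here are hypothetical; the paper does not specify this interface.

```python
# Minimal sketch of the guiding-planner dispatch described above.
# Module and message names are hypothetical, not taken from the paper.
from dataclasses import dataclass

@dataclass
class GuideCommand:
    target_region: str   # where the robot should move (e.g., an exhibit room)
    action: str          # what to do on arrival (e.g., "augment_exhibit")

class GuidingPlanner:
    def __init__(self, localization, ar_module):
        self.localization = localization
        self.ar_module = ar_module

    def handle_input(self, interests, age_group, expertise):
        # Interpret the sensed user information and plan one guiding step.
        cmd = GuideCommand(target_region=interests[0], action="augment_exhibit")
        self.localization.navigate_to(cmd.target_region)  # move first ...
        self.ar_module.augment(cmd.target_region)         # ... then augment
```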


Figure 1. System Architecture

Fig. 2 shows the scenario of the museum tour guide robot.

Figure 2. Guide Scenario

The guide scenario consists of an on-line mode and an off-line mode. In on-line mode, the robot is used by remote users, so we also call this mode 'remote mode'. Users who want to visit the museum remotely log on to our system and input their interests in exhibits. Then, they can select between manual and automatic control modes. In off-line mode, the robot gets information about users from various sensors using voice recognition or gesture recognition. Once the input process is complete, the robot plans a guiding procedure. If a visitor wants information on a specific piece of porcelain, the robot searches for the location of the porcelain and then guides the visitor to the porcelain room by using the navigation method. Afterward, the robot asks the visitor if he/she wants more information. Once the robot gets this information, it moves to a suitable location, such as 15-1, 15-2, or 15-3, to show the target exhibit. Finally, the robot presents the appropriate information by using AR technology.

III. AUGMENTED REALITY

To give visitors additional information about exhibits in a museum, we propose a robot that uses an AR technique. When the robot approaches any type of artifact in the museum, it augments multimedia elements such as 3D objects, movie clips, or sound clips onto the real artifacts. The most important problem with AR technologies is the registration between the real and virtual worlds. To solve this problem, the augmented tour guide robot requires a visual tracking method, operating through the robot's camera, that recognizes the current exhibit among numerous exhibits and calculates its 6 DOF pose in real time. To meet the real-time requirement, this visual tracking method requires a training process. The training process uses several training images for a 3D object, or one image for a 2D object, provided by a user, and trains generic randomized trees using these training images. The overall procedure of recognition and tracking is shown in Fig. 3.

Figure 3. Overview of markerless visual tracking

First of all, keypoints are extracted from the input image by the FAST detector [8]. These keypoints are used throughout the recognition and tracking procedure. If there is no valid exhibit ID, the recognition process is performed; otherwise, the tracking process is performed. A valid exhibit ID exists if the current exhibit has been recognized or if the tracking process succeeded in the prior frame. To obtain the precise initial pose of the recognized exhibit, the PROSAC method [9] is used to remove outliers among the matching pairs, and Schweighofer et al.'s method [10] is used for pose estimation. Once an exhibit is recognized by the system and its initial 6 DOF pose is calculated in the recognition process, a valid exhibit ID is created. The 6 DOF pose can be represented by a rotation matrix R and a translation vector T. These results, including the matrix R, the vector T, and the exhibit ID, are conveyed to the tracking process. The tracking process is executed repeatedly until it fails due to large movements of the robot.

A. Recognition process
In the recognition process, the generic randomized forest (GRF) plays a key role in both exhibit recognition and keypoint matching. It maximizes the reusability of a randomized forest (RF) [11]. The GRF can perform exhibit recognition and wide-baseline keypoint matching simultaneously. Moreover, the node tests in the GRF are independent of the training data, so it is possible to reuse them for another exhibit; thus, the node tests of one RF can be shared efficiently. A GRF consists of $N_T$ randomized trees $T_1, T_2, \ldots, T_{N_T}$, and all node tests of the GRF are built in a random manner. The structure of the GRF is basically equivalent to that of the original RF apart from the probabilities stored in the leaf nodes, as shown in Fig. 4.
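The dispatch between recognition and tracking described above can be summarized in a short sketch. This is a minimal sketch under stated assumptions: `process_frame` and the callables it receives are placeholder names for the components in the text, not the paper's actual interface.

```python
# Sketch of the per-frame dispatch between recognition and tracking.
# The detector, recognizer, and tracker are passed in as callables.
from typing import Callable, Optional, Tuple

def process_frame(frame,
                  detect_fast: Callable,   # FAST keypoint detector [8]
                  recognize: Callable,     # GRF recognition + PROSAC [9] + pose [10]
                  track: Callable,         # frame-to-frame exhibit tracking
                  exhibit_id: Optional[int],
                  pose) -> Tuple[Optional[int], object]:
    keypoints = detect_fast(frame)
    if exhibit_id is None:
        # No valid exhibit ID: recognition yields an ID and an initial 6 DOF pose.
        result = recognize(keypoints, frame)
        return result if result else (None, pose)
    # A valid ID exists from the prior frame: refine the pose by tracking.
    ok, new_pose = track(keypoints, exhibit_id, pose)
    # On tracking failure (large robot motion), drop the ID to re-recognize.
    return (exhibit_id, new_pose) if ok else (None, pose)
```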

Figure 4. Generic randomized forest

1) Training session
Every leaf node of the GRF has one probability distribution for exhibit recognition and $N_C$ probability distributions for keypoint matching. The posterior of exhibit class $i$ for recognition is obtained using (1):

$P(E = i \mid \varepsilon_{t,l}) = \dfrac{N_{t,l,i}}{N_{t,l}}, \quad (1)$

where $\varepsilon_{t,l}$ represents the $l$-th leaf node in the $t$-th tree, $N_{t,l}$ is the total number of visits to the leaf node $\varepsilon_{t,l}$, and $N_{t,l,i}$ is the frequency belonging to exhibit class $i$ in the leaf node $\varepsilon_{t,l}$. The posterior of keypoint class $k$ for keypoint matching is obtained using (2):

$P(K = k \mid i, \varepsilon_{t,l}) = \dfrac{N_{t,l,k}}{N_{t,l}}, \quad (2)$

where $N_{t,l,k}$ is the frequency belonging to keypoint class $k$ in the leaf node $\varepsilon_{t,l}$.

2) Exhibit recognition session
Once all $N_T$ trees are constructed, the classifier is ready for exhibit recognition. When the robot is in front of an exhibit, $N$ keypoints are extracted from an image taken by the robot's camera. Thus,

$\widehat{Exhibit} = \arg\max_i P(E = i \mid T_1, \ldots, T_{N_T}, m_1, \ldots, m_N) = \arg\max_i \dfrac{1}{N} \dfrac{1}{N_T} \sum_{j=1}^{N} \sum_{t=1}^{N_T} P(E = i \mid \mathrm{leaf}(T_t, m_j)), \quad (3)$

where $m_j$ represents one keypoint and $\mathrm{leaf}(T_t, m_j)$ is the leaf node that $m_j$ reaches in $T_t$.

3) Wide-baseline keypoint matching session
If an exhibit is recognized, keypoint matching is executed. The keypoint matching session is performed as in (4):

$\widehat{Keypoint} = \arg\max_k P(K = k \mid T_1, T_2, \ldots, T_{N_T}, m_j) = \arg\max_k \dfrac{1}{N_T} \sum_{t=1}^{N_T} P(K = k \mid \mathrm{leaf}(T_t, m_j)). \quad (4)$
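The voting in (3) and (4) reduces to averaging leaf posteriors over trees and keypoints. The sketch below illustrates this with plain arrays; the count tables and the `leaf_of` lookup are assumptions standing in for a trained GRF, not the paper's data structures.

```python
import numpy as np

# Illustrative sketch of equations (1)-(4). A trained GRF is represented by
# leaf-count tables; leaf_of[t, j] is the leaf index keypoint j reaches in tree t.
def posteriors(counts):
    # Eqs. (1)/(2): per-leaf posterior = class frequency / total visits in the leaf.
    return counts / counts.sum(axis=-1, keepdims=True).clip(min=1)

def recognize_exhibit(exhibit_counts, leaf_of):
    # Eq. (3): average P(E = i | leaf(T_t, m_j)) over all trees t and keypoints j.
    post = posteriors(exhibit_counts)   # shape: (trees, leaves, exhibits)
    n_trees, n_kps = leaf_of.shape
    votes = sum(post[t, leaf_of[t, j]]
                for t in range(n_trees) for j in range(n_kps))
    return int(np.argmax(votes / (n_trees * n_kps)))

def match_keypoint(keypoint_counts, leaf_of, j):
    # Eq. (4): average P(K = k | leaf(T_t, m_j)) over trees for one keypoint m_j.
    post = posteriors(keypoint_counts)  # shape: (trees, leaves, keypoint classes)
    votes = sum(post[t, leaf_of[t, j]] for t in range(leaf_of.shape[0]))
    return int(np.argmax(votes))
```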

B. Tracking process
In the preceding recognition process, a 3D world map was already constructed. Then, the following procedure is executed at every frame:

i) A prior pose is estimated from a motion model.
ii) Map points in the world are projected into the image according to the estimated prior pose.
iii) A coarse search is performed with 60 map points, and the camera pose is refined.
iv) A fine search is performed with at most 500 map points, and the final pose is computed from the matching.
v) The motion model is updated.

1) Camera pose estimation session
The camera motion $M$ is represented by a six-vector $\mu$ that consists of a translation and a rotation. Given the previous camera pose $P$, the new camera pose $\hat{P}$ can be estimated through an exponential map [12]. Thus,

$\hat{P} = M P = \exp(\mu) P. \quad (5)$

2) Patch search session
Every map point $X_w$ in the world coordinate system is projected into the current image frame to find matching pairs, as in (6):

$x_i = K \, [R \mid t] \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}, \quad (6)$

where $x_i$ is a 2D point in image coordinates and $K$ is the intrinsic matrix of the camera. To account for viewpoint changes between the 8×8 image patch from the keyframe in the world map and the current camera position, affine warping is performed [13]. The best match between a projected map point and a keypoint in the current image frame is found within a fixed radius.

3) Coarse-to-fine matching session
To obtain the precise camera motion, a coarse search and a fine search are performed in turn. In the coarse search, the patch search is done with a large radius using only 60 map points from the highest levels of the image pyramid of the current image frame. Then, the camera pose is refined with the successful matching pairs. In the fine search, the patch search is performed with up to 500 map points based on the refined pose. At this point, the final camera pose is calculated and the motion model is updated.
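A small numeric sketch of (5) and (6) follows. The SE(3) exponential here uses Rodrigues' formula for the rotation and a first-order translation, which is a simplification adequate for illustration; the function names and values are assumptions, not the paper's calibration.

```python
import numpy as np

# Sketch of equations (5) and (6): pose update via an exponential map and
# projection of a world point with the intrinsic matrix K.
def se3_exp(mu):
    # mu = (v, w): translation part v and rotation part w (Rodrigues' formula).
    v, w = mu[:3], mu[3:]
    theta = np.linalg.norm(w)
    Wx = np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])
    R = np.eye(3)
    if theta > 1e-9:
        R += (np.sin(theta) / theta) * Wx \
             + ((1 - np.cos(theta)) / theta**2) * (Wx @ Wx)
    M = np.eye(4)
    M[:3, :3], M[:3, 3] = R, v   # first-order translation: enough for a sketch
    return M

def project(K, P, Xw):
    # Eq. (6): x_i = K [R|t] [xw yw zw 1]^T, then dehomogenize.
    x = K @ (P[:3, :] @ np.append(Xw, 1.0))
    return x[:2] / x[2]

# Eq. (5): P_hat = exp(mu) P, here with a zero motion vector as a usage example.
P_new = se3_exp(np.zeros(6)) @ np.eye(4)
```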

IV. NAVIGATION

In the museum, there are crowds of people, usually moving among the various exhibits. The tour guide robot must be able to navigate in this dynamic and complex environment. Because there are valuable exhibits in the museum, the robot needs to localize its position and orientation very precisely. Also, to combine the robot with the AR technique, the robot needs to know its exact position and orientation so that the AR method can augment multimedia elements onto exhibits accurately. To meet these needs, we used an infrared-based Local Positioning Sensor (LPS), a Laser Range Finder (LRF), and wheel encoders. Information from these sensors is effectively combined by using a hybrid scheme for robot localization and map building [7]. This method combines the topological and metric paradigms to improve the effectiveness of the LPS.
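Before detailing the components, the following is a minimal sketch of the hybrid idea, assuming simple interfaces: inside an LPS zone the precise landmark-relative pose is trusted, while elsewhere the robot falls back to metric localization from the LRF scan and wheel odometry. The callables and pose objects are assumptions, not the paper's API.

```python
# Sketch of hybrid localization: LPS zones give a topological fix with a
# precise local pose; outside a zone, Monte Carlo localization refines odometry.
def hybrid_localize(lps_read, mcl_update, odometry_pose, lrf_scan):
    zone = lps_read()                 # None when no ceiling landmark is visible
    if zone is not None:
        # Topological fix: the landmark ID names the state; the pose is metric.
        return zone.global_pose, zone.landmark_id
    # Metric fix: refine the odometry estimate with MCL against the LRF scan.
    return mcl_update(odometry_pose, lrf_scan), None
```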

A. Local positioning sensor
The indoor LPS, called the "stargazer," is based on infrared. It analyzes an infrared-ray image reflected from a passive landmark attached to the ceiling. Fig. 5 shows the appearance of the stargazer and a landmark. Every landmark has a unique ID, and the LPS can measure position and orientation using the image reflected by the passive landmarks. Details of a landmark are shown in Fig. 6.

Figure 5. The stargazer and a passive landmark

Figure 6. Details of a passive landmark

The stargazer's output on the position and orientation of the robot corresponds to a resolution of 2 cm and 1 degree, respectively, with a data acquisition speed of 10 Hz. The stargazer can localize within a circle with a radius of 2.5 m around the center of a passive landmark. In addition, it is robust under various illumination conditions, such as fluorescent light, sunlight, and darkness.

B. Hybrid map representation for LPS
The hybrid approach combines the advantages of the metric and topological approaches in such a way that the LPS can be used effectively and easily. Passive landmarks can be used as states in the topological method because the LPS offers data on the local position, orientation, and ID of the passive landmarks; the landmarks also provide precise data on the local location of the robot, with a resolution sufficiently high for the metric method. To apply this approach to a museum tour guide robot, we need to do the following to ensure the environment is suitable:

i) Install passive landmarks on the ceiling in places that are important and difficult to localize, such as around exhibits, corners, intersections, doors, or hallways. The passive landmarks, with distinctive IDs, represent states called LPS zones.

ii) Ensure each LPS zone is a region with a radius of 2.5 m, and install the landmarks so that the zones do not overlap.

iii) Use the landmark ID numbers to represent distinct locations in a state table.

Once the installation is completed in accordance with the above requirements, we can model the environment for the hybrid method with the LPS, as in Fig. 7.

Figure 7. A hybrid representation for the LPS: EX represents an exhibit place, D is a door, R is a room, E is an elevator, C is a corner, and I is an intersection.
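As a concrete illustration of this representation, the sketch below models the topological layer as a state table keyed by landmark ID, in the format of Table I further below. The entries shown are taken from Table I; the dictionary layout and `lookup` helper are illustrative assumptions.

```python
# Sketch of the hybrid map's topological layer as a state table keyed by
# landmark ID (cf. Table I). Only three of the table's entries are shown.
state_table = {
    353: {"pose": (60.0, 20.0, 1.1),  "state": "Door",      "neighbors": [301, 256]},
    301: {"pose": (60.0, 10.0, -2.1), "state": "Corner",    "neighbors": [351, 353]},
    351: {"pose": (70.0, 10.0, 1.8),  "state": "Exhibit 1", "neighbors": [352, 301]},
}

def lookup(landmark_id):
    # The LPS reports a landmark ID; the table yields the robot's state
    # (local surroundings) and the adjacent places it can move to next.
    entry = state_table[landmark_id]
    return entry["state"], entry["neighbors"]
```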

C. Hybrid map building strategy

1) Metric map building
An iterative closest point (ICP) algorithm [14] is used for metric map building with an LRF, two wheel encoders, and an LPS. A major difference from the classical ICP algorithm is that we use the relative displacement of the robot reported by the LPS within an LPS zone to obtain precise odometry information. The classical ICP algorithm is vulnerable to robot rotation because, by its nature, it extracts corresponding points from the closest points of the reference scan. Thus, we need an accurate estimate of the robot displacement for the first iteration of the ICP algorithm, so we place the landmarks on the corners. Once the precise relative coordinate of the robot is obtained, fast convergence of the ICP algorithm is achieved. The following calculation obtains the displacement from the LPS:

$\begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} = \begin{bmatrix} \cos(\Delta\theta + \pi) & -\sin(\Delta\theta + \pi) \\ \sin(\Delta\theta + \pi) & \cos(\Delta\theta + \pi) \end{bmatrix} \begin{bmatrix} x_{new} - x_{ref} \\ y_{new} - y_{ref} \end{bmatrix}, \quad (7)$

where $[\Delta x \ \Delta y]^T$ is the relative displacement of the robot and $\Delta\theta = \theta_{new} - \theta_{ref}$. Fig. 8 shows the moment at which the robot moves into the LPS zone.

Figure 8. The LPS coordinate system.

2) Topological map building
The topology of the map is obtained from the landmarks' ID representation and their connectivity. While the robot explores the museum, it builds a table of LPS location information. Table I shows this topological information.

TABLE I. TOPOLOGICAL REPRESENTATION BY TABLE

ID  | Pose of Landmark (x, y, θ) | State        | Neighbors
353 | (60, 20, 1.1)              | Door         | 301, 256
301 | (60, 10, -2.1)             | Corner       | 351, 353
351 | (70, 10, 1.8)              | Exhibit 1    | 352, 301
352 | (80, 10, 1.7)              | Door         | 302, 351
302 | (90, 10, -0.3)             | Corner       | 257, 352
257 | (90, 30, 0.7)              | Intersection | 303, 302
303 | (50, 30, -0.4)             | Corner       | 256, 257
256 | (50, 20, 0.9)              | Exhibit 2    | 353, 303, 354
354 | (40, 30, 1.6)              | Door         | 256

A landmark ID is independent of other IDs. The pose of a landmark is given as a position in meters and an orientation in radians. The state informs the robot about its local surroundings. Neighbors are places adjacent to the robot's current location.

D. Hybrid localization strategy

1) Metric localization
For metric localization, we use the Monte Carlo localization (MCL) algorithm [15]. The MCL algorithm has a problem with initializing the robot's pose because of the number of particles: if the environment is significantly large, many particles are needed to cover it entirely. However, using many particles often causes incorrect localization results, and the initialization process takes a long time. Thus, we use the position and orientation information from the LPS for the initialization.

2) Topological localization
Topological localization provides data on the state of the robot and a simple representation of the map through the landmark ID provided by the LPS. In an LPS zone, the robot can localize its pose even when many people and other dynamic obstacles surround it. The metric information is added from a state through the hybrid map. Fig. 9 represents the coordinate system of the hybrid map. The following equation transforms local coordinates to global coordinates:

$\begin{bmatrix} x_{global} \\ y_{global} \\ 1 \end{bmatrix} = \begin{bmatrix} \cos(\pi - \theta_i) & -\sin(\pi - \theta_i) & cx_i \\ -\sin(\pi - \theta_i) & \cos(\pi - \theta_i) & cy_i \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_{local} \\ y_{local} \\ 1 \end{bmatrix}. \quad (8)$

Figure 9. The map coordinate system that represents the transformation from the LPS zone’s coordinate system to the global map coordinate system.
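Equations (7) and (8) are straightforward to compute; the sketch below mirrors them directly (including the transform matrix exactly as written in (8)). The function names and the zone pose parameters (cx_i, cy_i, theta_i) follow the text; everything else is illustrative.

```python
import numpy as np

# Sketch of equations (7) and (8): the relative displacement from two LPS
# readings, and the local-to-global transform for an LPS zone i.
def lps_displacement(x_new, y_new, x_ref, y_ref, th_new, th_ref):
    # Eq. (7): rotate the position difference by (delta_theta + pi).
    a = (th_new - th_ref) + np.pi
    R = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])
    return R @ np.array([x_new - x_ref, y_new - y_ref])

def local_to_global(x_local, y_local, cx_i, cy_i, theta_i):
    # Eq. (8): homogeneous transform from the LPS zone's frame to the map frame.
    a = np.pi - theta_i
    T = np.array([[ np.cos(a), -np.sin(a), cx_i],
                  [-np.sin(a),  np.cos(a), cy_i],
                  [ 0.0,        0.0,       1.0]])
    return (T @ np.array([x_local, y_local, 1.0]))[:2]
```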

V. DEMONSTRATION

Figure 10. Museum tour guide with the robot

To test the effectiveness of our approach, we implemented an application for the robot AMIGO that offers a tour of our laboratory. The robot guides visitors with augmented reality, as shown in Fig. 10. Visitors can see information about our laboratory on a laptop monitor. Several pictures show the history of our lab, as indicated in Fig. 11-(a). The robot can navigate around the room using the localization result. Fig. 11-(b) shows the LPS zones and a trajectory of the robot for explaining the history of our lab effectively.




Figure 11. (a) A map of the laboratory. (b) The localization result.

If a picture registered for multimedia augmentation is near the robot's position on the map, the robot shows the AR result to visitors.

Firstly, the robot finds the picture's location from the localization module. Then, the robot extracts the features of the picture, as in Fig. 12-(b), and augments 3D elements onto the picture through the AR module, as in Fig. 12-(c). At this moment, the robot introduces the history of our laboratory through a text-to-speech module and gestures. Fig. 13 shows the result of the AR with video clips of our robots.

Figure 12. (a) An original picture. (b) A feature map. (c) A result of the AR (the robot view).

Figure 13. (a) An original picture. (b)(c) Augmented reality with video clips

VI. CONCLUSIONS

In this paper, we described a museum tour guide robot with AR that allows robust recognition of exhibits under difficult environmental conditions. The proposed system takes two approaches to guiding tours through the museum. First, the FAST detector and GRF methods provide fast keypoint extraction and robust tracking across scale and rotation changes in real time; this helps adapt AR technologies to the museum environment. Second, the hybrid navigation technologies improve navigation in the museum, which is a complex environment with numerous obstacles. Furthermore, our system is able to guide visitors using both on-line and off-line methods because of its interactive AR technology and accurate navigation. For these reasons, our system differs from other museum robots and can be a more interactive museum guide. As reported in the experiment section, the robot was able to give a tour of our lab and to interact with visitors in such difficult situations. In future research, we plan to explore more effective ways to display AR content for visitors. Even though our robot has precise navigation technologies, the small monitor fixed on the robot limits interaction with visitors. Therefore, we are considering projecting AR onto empty walls using a beam projector mounted on the robot. We are also considering content connectivity with smart phones to enhance interaction.

ACKNOWLEDGMENT

This research is supported by the Foundation of Healthcare Robot Project, Center for Intelligent Robot, the Ministry of Knowledge Economy (MKE), and a grant (07High Tech A01) from the High Tech Urban Development Program funded by the Ministry of Land, Transportation and Maritime Affairs of the Korean government.

REFERENCES

[1] F. Kusunoki, M. Sugimoto, H. Hashizume, "Toward an interactive museum guide with sensing and wireless network technologies," WMTE 2002, Vaxjo, Sweden, 2002.
[2] S. Thrun, M. Bennewitz, W. Burgard, A. Cremers, F. Dellaert, D. Fox, D. Hahnel, C. Rosenberg, N. Roy, J. Schulte, D. Schulz, "Probabilistic algorithms and the interactive museum tour-guide robot Minerva," International Journal of Robotics Research, vol. 19, no. 11, 2000, pp. 972-999.
[3] H. Bay, B. Fasel, L. Van Gool, "Interactive museum guide: fast and robust recognition of museum objects," International Workshop on Mobile Vision, 2006.
[4] M. Shiomi, T. Kanda, H. Ishiguro, N. Hagita, "Interactive humanoid robots for a science museum," IEEE Intelligent Systems, 2007.
[5] F. Faber, M. Bennewitz, C. Eppner, A. Gorog, "The humanoid museum tour guide Robotinho," IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), Toyama, Japan, 2009.
[6] R. Azuma, "A survey of augmented reality," Presence: Teleoperators and Virtual Environments, vol. 6, no. 4, Aug. 1997, pp. 355-385.
[7] K. Cho, J. Yoo, H. Yang, "Markerless visual tracking for augmented books," Joint Virtual Reality Conference of EGVE-ICAT-EuroVR, 2009.
[8] E. Rosten, T. Drummond, "Machine learning for high-speed corner detection," in Proc. 9th European Conference on Computer Vision, 2006, pp. 430-443.
[9] O. Chum, J. Matas, "Matching with PROSAC - progressive sample consensus," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp. 220-226.
[10] G. Schweighofer, A. Pinz, "Robust pose estimation from a planar target," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, 2006, pp. 2024-2030.
[11] V. Lepetit, P. Fua, "Keypoint recognition using randomized trees," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 9, 2006, pp. 1465-1479.
[12] V. Varadarajan, Lie Groups, Lie Algebras, and Their Representations, Springer-Verlag, 1974.
[13] G. Klein, D. Murray, "Parallel tracking and mapping for small AR workspaces," in Proc. 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, 2007, pp. 225-234.

[14] P. J. Besl, N. D. McKay, "A method for registration of 3-D shapes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, Feb. 1992.
[15] S. Thrun, D. Fox, W. Burgard, F. Dellaert, "Robust Monte Carlo localization for mobile robots," Artificial Intelligence, 2001, pp. 99-141.
