Robust SLAM System Based on Monocular Vision and LiDAR for Robotic Urban Search and Rescue

Xieyuanli Chen, Hui Zhang, Huimin Lu, Junhao Xiao, Qihang Qiu, and Yi Li
College of Mechatronics and Automation, National University of Defense Technology, Changsha, Hunan, China, 410073
[email protected], [email protected]

Abstract—In this paper, we propose a monocular SLAM system for robotic urban search and rescue (USAR), with which most USAR tasks (e.g. localization, mapping, exploration and object recognition) can be fulfilled by rescue robots equipped with only a single camera. The proposed system is a promising basis for implementing fully autonomous rescue robots. However, the feature-based map built by the monocular SLAM is difficult for the operator to understand and use. We therefore combine the monocular SLAM with a 2D LiDAR SLAM to realize a 2D mapping and 6D localization SLAM system, which not only recovers the real scale of the environment and makes the map more user-friendly, but also solves the problem that the robot pose cannot be tracked by the 2D LiDAR SLAM when the robot climbs stairs and ramps. We test our system with a real rescue robot in simulated disaster environments. The experimental results show that good performance can be achieved by the proposed system in USAR. The system has also been successfully applied and tested in the RoboCup Rescue Robot League (RRL) competitions, where our rescue robot team entered the top 5 and won the Best in Class Small Robot Mobility award at RoboCup RRL 2016 in Leipzig, Germany, as well as the championships of the 2016 and 2017 RoboCup China Open RRL competitions.

Keywords—urban search and rescue; rescue robots; monocular SLAM; LiDAR SLAM; relocalization

I. INTRODUCTION

Simultaneous localization and mapping (SLAM) is a key component of most applications in urban search and rescue (USAR). It is normally described as the problem of estimating the robot pose while incrementally reconstructing the urban disaster environment. Considering that cameras are cheap and easy to use, and that the monocular SLAM problem has been extensively studied, our work aims to introduce monocular SLAM into USAR carried out by mobile robots (e.g. tracked wheel robots). State-of-the-art monocular SLAM systems perform well in feature-rich, static or mildly dynamic environments. However, since it is difficult to estimate accurate feature depth with only one camera, tracking failure is a major problem in most monocular SLAM systems. Because USAR terrains are usually highly cluttered and unstructured, tracking failures seem inevitable, and monocular SLAM systems are therefore seldom used in USAR. The most effective solution to this problem is to add a relocalization module to recover the SLAM process. In our previous work [1], a robust relocalization system based on active loop closure was proposed, which can solve the tracking failure problem well in most situations. In this paper, we combine it with an object recognition module that uses the same local image feature, to realize a compact and complete monocular SLAM system for USAR missions. The improved monocular SLAM system performs well in localization, mapping and object recognition for USAR, and is a promising approach for rescue robots to achieve high-level autonomy.

Fig. 1. Overview of the proposed SLAM system based on monocular vision and a 2D LiDAR. It consists of two parts: the LiDAR SLAM and the monocular SLAM. We use the LiDAR SLAM to obtain the 2D map and the real scale of the environment, while the monocular SLAM is used to localize the rescue robot in six degrees of freedom.

However, the highly cluttered and unstructured characteristics of USAR terrains make it extremely difficult for robots to operate fully autonomously. As a result, semi-autonomous control is still the most popular control method [2] in USAR to date, and it requires a human operator in the loop to help guide the robot remotely. During Human-Robot Interaction (HRI), the map obtained by the feature-based monocular SLAM is difficult to understand and use, whereas the traditional 2D grid map obtained by a LiDAR SLAM is easier to understand but is normally incapable of localizing the robot on stairs or ramps. Therefore, we combine these two SLAM methods to realize a 2D mapping and 6D localization SLAM system (shown in Fig. 1) which is robust, accurate and user-friendly.

1 http://www.trustie.net/organizations/23/videos

II. RELATED WORK

A. Monocular SLAM

Monocular SLAM was initially solved by filtering [3], and has since developed into two main branches: feature-based approaches and direct approaches.

Feature-based approaches. These methods consist of two sequential steps: features are extracted from the images, and then the robot/camera pose is calculated and the environment is mapped based on the extracted features. Early versions of such approaches are based on filters (e.g. [3], [4]), using all the measurements to update the probability distributions of features and camera poses; they are less accurate and can only be used in small-scale scenes. To better support modern applications, feature-based monocular SLAM systems are now mostly based on keyframes and Bundle Adjustment (e.g. [5], [6]), which offer high accuracy at low computational cost. The limitation of these approaches is that some important information (e.g. the information stored in edges and shapes) is discarded during the SLAM process, which makes the maps constructed by these methods less visually appealing and sometimes even unreadable from the perspective of users.

Direct approaches. Direct methods can overcome this limitation by optimizing the geometry directly on the image intensities (e.g. [7], [8]). They can build impressive maps while achieving high accuracy and robustness. However, these methods are comparatively weak in object and place recognition, because only the image intensities are available for tracking. They are therefore difficult to use in USAR, which requires rescue robots not only to map the environment and localize themselves, but also to detect hazards and search for victims in the rescue scenarios. In summary, current direct approaches are not suitable for rescue robots fulfilling USAR tasks.

B. 2D LiDAR SLAM

2D LiDAR SLAM systems can be divided into three branches:

Scan matching. Scan matching computes the incremental, open-loop maximum-likelihood trajectory of the robot by matching consecutive scans. Hector-SLAM [9] is one of the well-known scan matching methods used in USAR. In small environments (e.g. the RoboCup Rescue field), a scan matching algorithm is sufficient to obtain accurate, high-resolution maps at small computational effort.

Rao-Blackwellized Particle Filter. Gmapping [10] is based on Rao-Blackwellized particle filters. The advantage of these approaches is that they use an informed proposal distribution for particle generation, optimized by laser range data, which improves the accuracy of the SLAM. However, Gmapping uses odometry to reduce the number of particles, and odometry is unreliable on highly uneven terrain, which makes this method unusable for USAR.

Graph optimization. This approach constructs a graph out of the sequence of measurements. Every node in the graph represents a pose along the trajectory taken by the robot together with the measurements obtained at that pose. Google's

Cartographer [11] is one of the graph-based 2D LiDAR SLAM methods. Based on graph optimization, Cartographer can reduce the accumulated error and realize loop closure in large-scale environments. However, this method also uses odometry and is less robust on highly uneven terrain.

Generally, LiDAR-based approaches can obtain accurate robot/camera poses and the real scale of the environment. Furthermore, they are invariant to illumination, and the 2D grid maps they create are easy to understand and use. However, in rescue environments, especially when rescue robots climb stairs and ramps, LiDAR-based approaches usually fail to localize the robots, and the maps they build become highly distorted.

C. SLAM in Rescue Environments

Various SLAM approaches have been developed for USAR applications, but most of them rely on special sensors. For instance, in [12], based on an Extended Information Filter, a CSEM Swiss Ranger SR-2 range camera and a 2D camera are used to build dense 3D maps of indoor USAR environments. In [13], a laser range finder is used to build a 3D map of rubble based on an EKF-based SLAM technique. In [14], a novel 3D sensory system is proposed using digital fringe projection and phase shifting to provide real-time 2D and 3D sensory information of the environment. In [15], monocular and stereo vision together with an IMU are used in a 6D-SLAM for highly uneven USAR terrain. [16] and [17] also realize monocular SLAM for USAR; however, they both use mosaicking methods on aerial vehicles, while our system is based on visual features and can be used on both aerial vehicles and ground mobile robots, fulfilling more challenging tasks including object recognition.

D. Contributions of this Work

This paper builds on our previous works. The robust relocalization module was proposed in [1]; based on it, we introduce the monocular SLAM method into USAR tasks, combined with an object recognition module. All the algorithms proposed in this paper have been implemented on our rescue robot, and successfully applied and tested in the RoboCup RRL competitions. Our rescue robot is described in [18] and [19]. All the test videos can be found on our website1. The major contributions of this paper are as follows:

• To our knowledge, this is the first attempt to use monocular SLAM for USAR on ground mobile robots. Based on the robust relocalization module and the object recognition module, the improved monocular SLAM system can complete most USAR missions, including localization, mapping, exploration and object recognition, using the same local visual feature.



• To make the HRI more friendly, we combine the monocular SLAM and the 2D LiDAR SLAM. The monocular SLAM is used to localize the robot when it climbs stairs and ramps, while the 2D LiDAR SLAM is used to map the rescue environment when the robot is running on the ground. Using the combined system, we can solve the problem of

scale drift and the unreadable map in monocular SLAM, as well as the problem that the robot pose cannot be tracked by the 2D LiDAR SLAM when the robot climbs stairs and ramps.

III. MONOCULAR SLAM SYSTEM

The proposed system is based on the state-of-the-art real-time monocular SLAM system ORB-SLAM [6], but it generalizes well and can be integrated into any appearance-based SLAM system.

A. System Overview

Fig. 1 shows the architecture of the proposed system. The monocular SLAM section is composed of six parts: tracking, loop closing, local mapping, relocalization, active loop closure and object recognition. Following the original ORB-SLAM, we use the same tracking thread, in charge of localizing the camera in every frame and deciding when to insert a new keyframe; the same local mapping thread, in charge of processing new keyframes and maintaining the local map; and the same loop closing thread, in charge of searching for loops and correcting the accumulated drift when a loop is closed successfully. In [1], we added a relocalization module and an active loop closure module, which make the SLAM system more robust to tracking failures caused by blur, sudden motion and occlusion. In this work, we combine this improved system with an object recognition module to fulfill USAR missions.

B. Robust Relocalization Module

When tracking fails, the proposed system employs the image-to-map relocalization module on each incoming frame. ORB [20] features are first extracted and matched to those stored in the local map. Then, the current robot/camera pose is estimated from the correspondences, and finally the estimated pose is optimized, which allows the SLAM system to continue pose tracking. The details of each step are described as follows; a code sketch of the full pipeline is given after the list:

• Feature Extraction: The first step is to extract ORB features from the current frame. Like ORB-SLAM, we use an image pyramid and detect FAST keypoints at each level to improve accuracy.



• Descriptor Matching: The second step is to find matches between the extracted features and the map points stored in the local map. Instead of using complex classifier-based methods, we employ a simple exhaustive nearest-neighbor search to establish the associations based on the local map, which is fast and accurate.



• Pose Estimation: The third step is to estimate the robot pose from the set of 2D-to-3D matches by solving a PnP problem [21] within a RANSAC scheme.



• Pose Optimization: Finally, we use the Levenberg-Marquardt method implemented in g2o [22] to refine the pose. Then, the SLAM process continues.
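To make the four steps concrete, the following is a minimal Python/OpenCV sketch of the relocalization pipeline. It is not the exact ORB-SLAM/g2o implementation used on our robot: the local-map interface (map_descriptors, map_points_3d), the camera intrinsics K, and the final refinement call (OpenCV's solvePnPRefineLM, available in recent OpenCV versions, standing in for the g2o optimization) are illustrative assumptions.

```python
import cv2
import numpy as np

def relocalize(frame_gray, map_descriptors, map_points_3d, K):
    """Image-to-map relocalization sketch: ORB extraction, exhaustive
    Hamming matching, PnP + RANSAC, and iterative pose refinement.
    map_descriptors / map_points_3d are assumed to come from the local map."""
    # 1) Feature extraction on an image pyramid (handled internally by ORB)
    orb = cv2.ORB_create(nfeatures=1000, nlevels=8)
    keypoints, descriptors = orb.detectAndCompute(frame_gray, None)
    if descriptors is None:
        return None

    # 2) Exhaustive nearest-neighbor matching against map-point descriptors
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(descriptors, map_descriptors)
    if len(matches) < 10:
        return None  # not enough 2D-3D correspondences

    img_pts = np.float32([keypoints[m.queryIdx].pt for m in matches])
    obj_pts = np.float32([map_points_3d[m.trainIdx] for m in matches])

    # 3) Pose estimation by solving PnP inside a RANSAC scheme
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj_pts, img_pts, K, distCoeffs=None, flags=cv2.SOLVEPNP_EPNP)
    if not ok or inliers is None:
        return None

    # 4) Pose refinement (stand-in for the g2o Levenberg-Marquardt step)
    rvec, tvec = cv2.solvePnPRefineLM(
        obj_pts[inliers[:, 0]], img_pts[inliers[:, 0]], K, None, rvec, tvec)
    return rvec, tvec
```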

During a tracking failure, the SLAM system cannot collect enough information to perform accurate localization and mapping, which leads to a larger drift after relocalization. To solve this problem, our previous work employed an active loop closure module to reduce the drift after relocalization and acquire more accurate maps through path planning and robot movements. However, in USAR scenarios rescue robots usually run at low speed in small-scale areas, which does not cause this drift-increasing problem. In this case we therefore disable the active loop closure module (shown in dashed lines in Fig. 1) to reduce the amount of computation.

C. Autonomous Object Recognition Module

In USAR, the health and safety of victims are always the primary concern. Although human teleoperation can be effective, the unknown, cluttered characteristics of the environments make robot navigation and victim identification highly challenging. Issues such as latency or loss of communication can arise. Additionally, operators may become stressed and fatigued very easily in such tense environments, causing crucial errors in robot control and victim identification. Hence, a good SLAM system for USAR needs not only to generate maps of the rescue environments and localize the rescue robots, but also to help operators identify objects of interest, e.g. by autonomously detecting hazards or recognizing survivors. To achieve this goal, we add an object recognition module into the monocular SLAM, utilizing the same ORB feature.

We use the bag-of-words approach [23]. Samples of the objects of interest (e.g. victims, hazard labels or QR codes) are trained in advance into visual words, which are vectors of binary descriptors. The vocabulary made up of these visual words is then loaded into the monocular SLAM during initialization. During the SLAM process, the object recognition module handles every incoming frame by comparing the ORB features extracted from the current frame with those stored in the vocabulary, using the nearest neighbor distance ratio (NNDR) method. Since ORB is a binary feature, we directly compute the Hamming distance between features, which is very fast using only an xor operation. Two features are matched if the distance to the nearest neighbor is less than T_NNDR times the distance to the second-nearest neighbor, and a target is found if the number of matches exceeds the threshold T_matches. Once a target of interest is detected, this module computes the homography between the sample and the current frame, marks the target in the map and sends a notice (e.g. circling the target in the image) to the operator, which greatly reduces the operator's workload. The object recognition procedure is summarized in Algorithm 1.

Algorithm 1: Object Recognition with image I
1: procedure ObjectRecognition(I)
2:   descriptors ← Extraction(I)
3:   (matches, N) ← Matching(descriptors, vocabulary)
4:   if N > T_matches then
5:     position ← Homography(matches)
6:     map ← Mark(position)
7:   end if
8: end procedure
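As an illustration of the NNDR matching and homography steps, below is a minimal Python/OpenCV sketch of Algorithm 1. The trained vocabulary is represented here simply as the ORB keypoints and descriptors of one object sample; the threshold values T_NNDR and T_MATCHES are illustrative assumptions, not the exact values used on our robot.

```python
import cv2
import numpy as np

T_NNDR = 0.7      # nearest / second-nearest distance ratio threshold (assumed)
T_MATCHES = 20    # minimum number of matches to declare a detection (assumed)

def recognize_object(frame_gray, sample_kps, sample_desc):
    """Algorithm 1 sketch: ORB extraction, NNDR matching with Hamming
    distance, and homography estimation to locate the object sample."""
    orb = cv2.ORB_create(nfeatures=1000)
    kps, desc = orb.detectAndCompute(frame_gray, None)
    if desc is None:
        return None

    # NNDR: keep a match only if best distance < T_NNDR * second-best distance
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    knn = matcher.knnMatch(desc, sample_desc, k=2)
    good = [p[0] for p in knn
            if len(p) == 2 and p[0].distance < T_NNDR * p[1].distance]
    if len(good) <= T_MATCHES:
        return None

    # Homography between the stored sample and the current frame
    src = np.float32([sample_kps[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kps[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H  # the caller marks the projected sample corners in the map/image
```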

Benefiting from the ORB feature, the recognition module runs very fast and is invariant to rotation and scale to a certain extent. Because the same local visual feature is used throughout, all modules of the monocular system can share the same image and map data. Namely, there is no need to convert the data into different types, and some steps (e.g. feature extraction) can be shared by different modules, which makes the whole system compact and efficient.

IV. MONOCULAR VISION AND LIDAR COMBINED SLAM

In the previous section, we introduced a monocular SLAM system for USAR tasks, which provides robust 6D SLAM as well as a real-time object recognition module. However, current rescue robots require human operators in the loop to remotely help them fulfill USAR tasks, so good HRI plays an important role in realizing USAR goals. Considering that the map of the feature-based monocular SLAM is hard to understand, and that the 2D LiDAR SLAM is not able to localize robots when they climb stairs or ramps, we combine these two SLAM methods to realize a 2D mapping and 6D localization SLAM system.

A. Hector-SLAM

Hector-SLAM [9] is a 2D LiDAR SLAM system based on robust scan matching and a 3D navigation technique using an inertial sensing system. It focuses on estimating the robot movement in real time, making use of the high update rate and low measurement noise of modern LiDARs. The scan matching is solved with a Gauss-Newton approach, which finds the rigid transformation that best fits the laser beams to the map. In addition, a multi-resolution map representation is used to avoid getting stuck in local minima. Odometry information is not used, which makes it possible to apply this approach on aerial robots or on ground robots operating on highly uneven terrain. On the other hand, it may have problems when only low-update-rate scans are available. For instance, the SLAM process may fail on a high-speed mobile robot, because the 2D pose estimation is based on optimizing the alignment of beam endpoints with the map obtained so far, and localization tracking may fail if the robot moves too fast. The 3D state estimation for the navigation filter is based on an EKF. However, this is only available when an Inertial Measurement Unit (IMU) is present, while low-cost IMUs usually cannot work well in USAR environments. Therefore, Hector-SLAM cannot work when the robot is climbing stairs or ramps.

B. Combining 2D LiDAR SLAM with Monocular SLAM

To solve the problems mentioned above, we use a cheap 2D camera performing monocular SLAM to replace the IMU. We combine the two methods by using the monocular SLAM as odometry when the robot is climbing stairs and ramps, and using Hector-SLAM to map the rescue environment when the robot is running on the ground. Based on scan matching, LiDAR SLAM aligns laser scans with each other or with an existing map. Since modern laser scanners have low distance measurement noise and high scan rates, when used on rescue robot systems the precision of the laser scanner is much higher than that of

visual data. Thus, when the robot is running on the ground, we use the localization tracking results of the LiDAR SLAM and the Least Squares Method (LSM) to calibrate the camera data and unify the scale of the monocular SLAM.

1) Using LSM to align the two SLAM methods

Following Hector-SLAM, we define the navigation coordinate system as a right-handed system with its origin at the starting point of the robot, the z axis pointing upwards and the x axis pointing in the yaw direction of the robot at startup. The full 3D state is represented by:

$$\mathbf{x} = \left( \boldsymbol{\Omega}^{\mathrm{T}}\;\; \mathbf{p}^{\mathrm{T}}\;\; \mathbf{v}^{\mathrm{T}} \right)^{\mathrm{T}} \qquad (1)$$

$$\boldsymbol{\Omega} = (\phi,\, \theta,\, \psi)^{\mathrm{T}}, \quad \mathbf{p} = (p_x,\, p_y,\, p_z)^{\mathrm{T}}, \quad \mathbf{v} = (v_x,\, v_y,\, v_z)^{\mathrm{T}} \qquad (2)$$

where $\phi, \theta, \psi$ are the roll, pitch and yaw Euler angles, and $p_x, p_y, p_z$ and $v_x, v_y, v_z$ are the position and velocity of the robot expressed in the navigation frame. Based only on a 2D LiDAR, scan matching is available only in the xy-plane. Namely, we can only get a 2D rigid transformation from the LiDAR SLAM:

$$\boldsymbol{\xi}_L = (p_{Lx},\; p_{Ly},\; \psi_L)^{\mathrm{T}} \qquad (3)$$

For monocular SLAM, the absolute transformation of the camera pose in SE(3) is:

$$\mathbf{T}_C = \begin{pmatrix} \mathbf{R}_C & \mathbf{t}_C \\ \mathbf{0} & 1 \end{pmatrix} \qquad (4)$$

$$\mathbf{R}_C = \begin{pmatrix} n_x & o_x & a_x \\ n_y & o_y & a_y \\ n_z & o_z & a_z \end{pmatrix}, \quad \mathbf{t}_C = (t_x,\, t_y,\, t_z)^{\mathrm{T}} \qquad (5)$$

The Euler angles and the robot position obtained from the monocular SLAM can be calculated as follows:

$$\begin{aligned} \phi_C &= \operatorname{atan2}(n_y,\, n_x) \\ \theta_C &= \operatorname{atan2}(-n_z,\; n_x\cos\phi + n_y\sin\phi) \\ \psi_C &= \operatorname{atan2}(a_x\sin\phi - a_y\cos\phi,\; -o_x\sin\phi + o_y\cos\phi) \\ (p_{Cx},\, p_{Cy},\, p_{Cz}) &= (t_x,\, t_y,\, t_z) \end{aligned} \qquad (6)$$
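For clarity, here is a small NumPy sketch of (6). It assumes that $\mathbf{R}_C$ follows the n-o-a column convention as in (5), which is our reading of the paper's notation; the function names are illustrative.

```python
import numpy as np

def camera_pose_to_euler(T_C):
    """Extract (phi_C, theta_C, psi_C) and (p_Cx, p_Cy, p_Cz) from the
    4x4 camera pose T_C = [[R_C, t_C], [0, 1]], following Eq. (6)."""
    R, t = T_C[:3, :3], T_C[:3, 3]
    n, o, a = R[:, 0], R[:, 1], R[:, 2]   # n-o-a columns of R_C

    phi = np.arctan2(n[1], n[0])
    theta = np.arctan2(-n[2], n[0] * np.cos(phi) + n[1] * np.sin(phi))
    psi = np.arctan2(a[0] * np.sin(phi) - a[1] * np.cos(phi),
                     -o[0] * np.sin(phi) + o[1] * np.cos(phi))
    return (phi, theta, psi), tuple(t)
```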

After unifying the coordinate systems, we use the LSM to align the pose obtained by the monocular SLAM with that obtained by the LiDAR SLAM. For simplicity, we mount the camera and the LiDAR on the same vertical line of the xy-plane (shown in Fig. 2) and assume that the two SLAM systems initialize at the same time and at the same starting point. Taking $p_x$ as an example, the objective function is:

$$S(\lambda_x) = \sum_{i=1}^{m} \left\| p_{Lx}^{i} - \lambda_x p_{Cx}^{i} \right\|^{2} \qquad (7)$$

To minimize the sum of squares, we differentiate S with respect to $\lambda_x$, and then we can estimate the real scale of the monocular SLAM in the x direction:



$$\lambda_x = \frac{m\sum_{i=1}^{m} p_{Lx}^{i}\, p_{Cx}^{i} \;-\; \sum_{i=1}^{m} p_{Lx}^{i} \sum_{i=1}^{m} p_{Cx}^{i}}{m\sum_{i=1}^{m} p_{Cx}^{i}\, p_{Cx}^{i} \;-\; \sum_{i=1}^{m} p_{Cx}^{i} \sum_{i=1}^{m} p_{Cx}^{i}} \qquad (8)$$

Finally, we can align the pose obtained from the monocular SLAM as follows:

$$\begin{aligned} (\hat{\phi}_C,\, \hat{\theta}_C,\, \hat{\psi}_C) &= R_{\phi}(\phi_C,\, \theta_C,\, \psi_C) \\ (\hat{p}_{Cx},\, \hat{p}_{Cy},\, \hat{p}_{Cz}) &= \left( \lambda_x t_x,\; \lambda_y t_y,\; \sqrt{\lambda_x^{2} + \lambda_y^{2}}\; t_z \right) \end{aligned} \qquad (9)$$

Fig. 2. Our rescue robot at the RoboCup 2016 RRL competition. The left figure shows the robot running on the elevated ramps, while the right one shows it climbing the stair debris. A 2D camera is mounted directly above the 2D LiDAR.

2) Status switching

The rough ground of the rescue scenarios can be handled by adjusting the orientation of the LiDAR with a stabilizer driven by two servos, using the pose information obtained by the monocular SLAM. However, the adjustability of the stabilizer is limited: it cannot deal with steep stairs and ramps. Therefore, we set a working range $[p_{z\min}, p_{z\max}]$ for the LiDAR SLAM, where $p_{z\min}$ and $p_{z\max}$ are the lowest and the highest positions of the terrain. Once the robot moves out of this range (into state M in (10)), we suspend the mapping thread of the LiDAR SLAM and use the pose obtained from the monocular SLAM to continue the localization. Once the robot returns to this range (into state H), we resume the mapping thread, reactivate the scan matching and continue computing the scale factor λ to align the poses tracked by the two SLAM systems. The state switching can be summarized as follows:

$$\mathrm{State} = \begin{cases} H, & \text{if } p_{z\min} \le \hat{p}_{Cz} \le p_{z\max} \\ M, & \text{otherwise} \end{cases} \qquad (10)$$
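A minimal sketch of this switching logic is given below; the suspend/resume hooks and the update_scale call are illustrative assumptions about how the two SLAM front-ends could be wired together, not the exact interfaces of our implementation.

```python
def update_state(p_cz_hat, p_z_min, p_z_max, lidar_slam, mono_slam):
    """Eq. (10): use LiDAR SLAM (state 'H') on flat ground, switch to the
    monocular SLAM pose (state 'M') on stairs and ramps."""
    if p_z_min <= p_cz_hat <= p_z_max:
        lidar_slam.resume_mapping()          # reactivate scan matching
        mono_slam.update_scale(lidar_slam)   # keep refining lambda via LSM
        return 'H'
    else:
        lidar_slam.suspend_mapping()         # freeze the 2D grid map
        return 'M'                           # localize with monocular SLAM only
```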

Fig. 3. Using the monocular SLAM in USAR. (a) shows the real-time robot/camera pose tracking and feature-based mapping, (b) shows the current frame and the extracted ORB features (colored in green), and (c) shows the victim recognition result (marked by a purple square). In (a), the blue boxes represent the keyframes, the green line represents the robot trajectory, and the points represent the features extracted from the rescue environment.


V. EXPERIMENTS

In this work, we conducted two groups of experiments. In the first group, we tested the proposed monocular SLAM system in a RoboCup rescue environment (shown in Fig. 4(c)), while in the other group, we tested the monocular vision and LiDAR combined SLAM system in the same environment, including its stairs and ramps. All the experiments were performed on our rescue robot equipped with a 2.4 GHz i7 CPU and 4 GB memory (shown in Fig. 2).

A. Monocular SLAM for USAR Missions

In the first group of experiments, we tested the proposed monocular SLAM in USAR scenarios. The experimental results show that the monocular SLAM can realize most USAR missions, including localization (shown in Fig. 3(a)), object recognition (shown in Fig. 3(c)) and mapping (shown in Fig. 4), using the same ORB feature. Although the proposed monocular SLAM can provide a 3D representation of the rescue environment, this map is difficult for operators to understand (shown in Fig. 4(a)). In addition, the scale of the map obtained by the monocular SLAM is not the real scale of the environment, so the positions of the victims are not accurate, which makes it hard for rescue teams to carry out rescue missions.

Fig. 4. The map built by the monocular SLAM. (a) is the top view of the feature-based map, while (b) is the front view. (c) is the left view of the rescue environment corresponding to these maps. The points represent the features extracted from the rescue environment. The purple ones indicate the area where the victim is located, and the red ones indicate the area the robot is looking at.

B. Combining 2D LiDAR SLAM and Monocular SLAM

In this group of experiments, we tested the combined system by remotely controlling the rescue robot to climb up a highly inclined ramp, make a U-turn on the high platform and go down the stairs. The SLAM results shown in Fig. 5 illustrate that the combined system can accurately track the robot/camera pose even when the rescue robot is climbing the stairs and ramps, and can continue mapping the environment after the robot returns to the ground. We then compared our combined SLAM system with the original Hector-SLAM. The experimental results shown in Fig. 6 illustrate that our system is more robust than the original Hector-SLAM. After using LSM to align the two SLAM methods, the combined SLAM system can accurately localize the robot and map the rescue environment. Meanwhile, the positions of the detected victims can also be accurately marked in the 2D map by the object recognition module.

Fig. 5. The SLAM results after combining the LiDAR SLAM and the monocular SLAM. (a) shows the rescue robot running on the ground and mapping the rescue environment using the LiDAR SLAM, while (b) shows the rescue robot after it has gone up the ramp and stairs using the monocular SLAM to track the robot pose; it can continue mapping the environment after the robot returns to the ground. The purple line represents the 2D robot trajectory obtained by the LiDAR SLAM, while the green one is the 3D robot trajectory obtained by the monocular SLAM when the robot is climbing the stairs and the ramp.

Fig. 6. Comparison of the map obtained by the combined SLAM system with that obtained by the original Hector-SLAM. (a) shows the top view of the map obtained by the combined SLAM system, while (b) shows the front view. (c) zooms in on a particular section of (a) to show the marked positions of victims and the robot trajectory on stairs and ramps. (d) shows the result of the original Hector-SLAM. In (a) and (c), the red points are the positions of the recognized survivors. The purple line represents the 2D robot trajectory obtained by the LiDAR SLAM, while the green one is the 3D robot trajectory obtained by the monocular SLAM when the robot is climbing the stairs and the ramp.

VI. CONCLUSION

In this paper, we propose a real-time monocular SLAM system enhanced with an object recognition module for USAR. To our knowledge, this is the first attempt to use monocular SLAM for USAR on ground mobile robots. This monocular SLAM can realize most USAR missions and is a promising basis for high-level rescue robot autonomy. However, the feature-based representation of the environment is hard for operators to understand and use, so we combine the monocular SLAM with the 2D LiDAR SLAM. The combined system not only provides a clear map of the disaster environment with real scale, which greatly helps the operator to remotely control the rescue robot, but also solves the localization tracking problem of the 2D LiDAR SLAM when the robot climbs stairs and ramps, which enhances the robustness of the system in more complex situations (e.g. running up and down stairs and ramps). However, we currently only use LSM to align the two SLAM methods and apply them separately in different situations. In the future, we will try to integrate these two

methods in a more compact way to achieve better performance. We will also try to make the map more readable and to achieve high-level autonomy for rescue robots.

ACKNOWLEDGMENT

This work is supported by the National Science Foundation of China (No. 61403409, No. 61503401) and the China Postdoctoral Science Foundation (No. 2014M562648).

REFERENCES

[1] X. Chen, H. Lu, J. Xiao, H. Zhang, and P. Wang, "Robust relocalization based on active loop closure for real-time monocular SLAM," in The 11th International Conference on Computer Vision Systems (ICVS), 2017.
[2] Y. Liu and G. Nejat, "Robotic urban search and rescue: A survey from the control perspective," Journal of Intelligent & Robotic Systems (JIRS), vol. 72, p. 147, 2013.
[3] A. J. Davison, "Real-time simultaneous localisation and mapping with a single camera," in IEEE International Conference on Computer Vision (ICCV), 2003, pp. 1403-1410.
[4] E. Eade and T. Drummond, "Monocular SLAM as a graph of coalesced observations," in IEEE International Conference on Computer Vision (ICCV), 2007, pp. 1-8.
[5] G. Klein and D. Murray, "Parallel tracking and mapping for small AR workspaces," in IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR), 2007, pp. 225-234.
[6] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, "ORB-SLAM: A versatile and accurate monocular SLAM system," IEEE Transactions on Robotics (ITRO), vol. 31, pp. 1147-1163, 2015.
[7] J. Engel, T. Schöps, and D. Cremers, "LSD-SLAM: Large-scale direct monocular SLAM," in European Conference on Computer Vision (ECCV), 2014, pp. 834-849.
[8] R. A. Newcombe, S. J. Lovegrove, and A. J. Davison, "DTAM: Dense tracking and mapping in real-time," in IEEE International Conference on Computer Vision (ICCV), 2011, pp. 2320-2327.
[9] S. Kohlbrecher, O. Von Stryk, J. Meyer, and U. Klingauf, "A flexible and scalable SLAM system with full 3D motion estimation," in IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), 2011, pp. 155-160.
[10] G. Grisetti, C. Stachniss, and W. Burgard, "Improved techniques for grid mapping with Rao-Blackwellized particle filters," IEEE Transactions on Robotics (ITRO), vol. 23, pp. 34-46, 2007.
[11] W. Hess, D. Kohler, H. Rapp, and D. Andor, "Real-time loop closure in 2D LIDAR SLAM," in IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 1271-1278.
[12] L. P. Ellekilde, S. Huang, J. Valls Miró, and G. Dissanayake, "Dense 3D map construction for indoor search and rescue," Journal of Field Robotics (JFR), vol. 24, pp. 71-89, 2007.
[13] Y. Yokokohji, M. Kurisu, S. Takao, Y. Kudo, K. Hayashi, and T. Yoshikawa, "Constructing a 3-D map of rubble by teleoperated mobile robots with a motion canceling camera system," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2003, pp. 3118-3125.
[14] Z. Zhang, G. Nejat, H. Guo, and P. Huang, "A novel 3D sensory system for robot-assisted mapping of cluttered urban search and rescue environments," Intelligent Service Robotics (ISR), vol. 4, pp. 119-134, 2011.
[15] F. Jesus and R. Ventura, "Combining monocular and stereo vision in 6D-SLAM for the localization of a tracked wheel robot," in IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), 2012, pp. 1-6.
[16] A. Birk, B. Wiggerich, H. Bülow, M. Pfingsthorn, and S.
Schwertfeger, "Reconnaissance and camp security missions with an Unmanned Aerial Vehicle (UAV) at the 2009 European Land Robots Trials (ELROB)," in IEEE International Workshop on Safety, Security and Rescue Robotics (SSRR), 2009, pp. 1-6.
[17] A. M. Pinto, H. Pinto, and A. C. Matos, "A mosaicking approach for visual mapping of large-scale environments," in International Conference on Autonomous Robot Systems and Competitions (ICARSC), 2016, pp. 87-93.
[18] X. Chen, H. Zhang, H. Lu, J. Xiao, P. Wang, Q. Qiu, et al., "RoboCup Rescue 2017 Team Description Paper NuBot," 2017. Available: https://www.trustie.net/organizations/23
[19] Y. Liu, Y. Zhong, X. Chen, P. Wang, H. Lu, J. Xiao, et al., "The design of a fully autonomous robot system for urban search and rescue," in IEEE International Conference on Information and Automation (ICIA), 2016, pp. 1206-1211.
[20] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, "ORB: An efficient alternative to SIFT or SURF," in International Conference on Computer Vision (ICCV), 2011, pp. 2564-2571.
[21] V. Lepetit, F. Moreno-Noguer, and P. Fua, "EPnP: An accurate O(n) solution to the PnP problem," International Journal of Computer Vision (IJCV), vol. 81, p. 155, 2009.
[22] R. Kümmerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard, "g2o: A general framework for graph optimization," in IEEE International Conference on Robotics and Automation (ICRA), 2011, pp. 3607-3613.
[23] J. Sivic and A. Zisserman, "Video Google: A text retrieval approach to object matching in videos," in IEEE International Conference on Computer Vision (ICCV), 2003, pp. 1470-1477.