Monocular Vision based Autonomous Navigation for a Cost-Effective Open-Source MAV in GPS-denied Environments

Inkyu Sa, Hu He, Van Huynh, Peter Corke

The authors are with the CyPhy Laboratory, School of Electrical Engineering and Computer Science, Queensland University of Technology, Australia. [email protected], [email protected], thvan [email protected], [email protected]

Abstract— In this paper, we present a monocular vision guided autonomous navigation system for Micro Aerial Vehicles (MAVs) in GPS-denied environments. The major problem of a monocular system is that the depth scale of the scene cannot be determined without prior knowledge or other sensors. To address this problem we minimise a cost function that combines a drift-free altitude measurement with the up-to-scale position estimates from the visual sensor. We also evaluate the proposed system in terms of the accuracy of the scale estimator, the controller performance, and the accuracy of the state estimation by comparison with motion capture ground truth. All resources, including source code, tutorial documentation and system models, are available online4.

I. INTRODUCTION

In recent years, aerial robotics has become an active and growing field because of its broad applications in civil and military domains, such as aerial transportation, manipulation and surveillance. Quadrotor micro aerial vehicles (MAVs) in particular are among the most flexible and adaptable platforms for aerial research due to their small size and high maneuverability, and a remarkably large amount of recent work has been presented on aerial robotics using quadrotor platforms [1] [2] [3] [4] [5]. Most of this work, however, is performed either with expensive platforms or with platforms whose operating software is closed, which makes it difficult for other researchers to replicate or extend the work. We are therefore motivated [6] to provide open-source software for low-cost, off-the-shelf quadrotor platforms (e.g., the AR.Drone) that achieves the fundamental abilities of hovering and waypoint following, on which higher-level applications (e.g., manipulation, navigation and exploration) can be built.

In this paper, we choose a Parrot AR.Drone as the experimental platform. Using the sensors on this platform, we apply a keyframe-based visual SLAM system (PTAM [7]) to provide 6-DOF pose estimates. Since PTAM only provides up-to-scale positions in Euclidean space, which cannot be used directly for autonomous control, we estimate the metric scale with the assistance of sonar measurements. A Kalman Filter (KF) is then used for state estimation of the AR.Drone, and it provides the input to PID controllers that generate the control commands for autonomous manoeuvres. Indoor and outdoor experiments are undertaken to show the performance of our system, and motion capture (Vicon) ground-truthed experiments are presented to evaluate the performance quantitatively.

Fig. 1. The AR.Drone platform. It has a front camera and a downward-looking camera, and an ultrasonic sensor estimates altitude. Red, green and blue denote the x, y and z axes of the body frame {B} used in this paper. φ, θ and ψ are rotations about the x, y and z axes respectively, following the right-hand rule. Accurate lateral velocity estimation based on aerodynamics and optical flow is a key feature of this product and allows it to fly stably.

The remainder of this paper is organized as follows. Section I-A addresses related work, and the methodology is described in Section II. In Section III, we present indoor and outdoor experiments and a Vicon ground-truth based evaluation. Finally, conclusions and further discussion are given in Section IV.

A. Related work

Research on MAVs has accelerated with the considerable growth in sensor and integrated circuit technology. UAVs are getting smaller and lighter while gaining capabilities, allowing them to fly for longer periods with larger payloads. Ascending Technologies1 has been developing a wide range of UAVs for commercial and research purposes. Although these platforms provide scientifically rich information, such as IMU measurements in SI units at high update rates, their high cost can be a barrier for both amateur hobbyists and researchers. MikroKopter2 has also been developing commercial-grade UAVs that are suitable for broadcasting or novel filming systems. Even though they have the advantages of being open source, of modest cost, and of having a GPS-based autonomous position-holding system, the lack of academically oriented documentation has made it difficult for researchers to access these resources [8] [9].

1 Ascending Technologies: http://www.asctec.de
2 MikroKopter: http://www.mikrokopter.de
3 Parrot SA.: http://ardrone.parrot.com


Recently, Parrot3 has released high-tech recreational UAVs that are capable of indoor and outdoor navigation, utilising low-cost IMUs, vision systems and aerodynamic models [10]. The company has also released software development toolkits and academically oriented documentation for both developers and researchers. A variety of sensor data is accessible from this platform: 3-axis accelerations, 3-axis gyros, 3-axis magnetometers, 3-axis angles, a barometer, sonar height measurements, and images from the front and downward-looking cameras. In addition, the platform is economical, costing approximately USD 300, and is easy to equip with tools available from local retail stores. Several researchers have used this platform for research purposes over the past couple of years [11] [12] [13].

In order to fly indoors and outdoors, micro unmanned aerial vehicles must estimate their current state for control. GPS-guided systems have been utilised for outdoor navigation and have shown stable performance [14]. For indoor, GPS-denied navigation, flying robots must constantly observe the surrounding environment to estimate their current location. S. Shen et al. [15] and I. Dryanovski et al. [16] developed platforms for surveillance and for exploring indoor environments. Their UAVs are capable of mapping multi-floor buildings and of autonomous navigation in a given environment. However, they require a laser range finder with high energy consumption for state estimation, which in turn compromises flight time. Mellinger et al. [17] [3] proposed the design of dynamically feasible trajectories to drive quadrotors. A controller-parameterised family of trajectories is generated and then incrementally refined through successive experimental trials to compensate for noise in the actuators and IMU sensors. Even though these promising experimental results showed the potential of flying aerial robots, the experiments were performed with motion capture devices that provide 6-DOF state estimates with sub-millimetre accuracy at 100 Hz within a limited space.

In this paper we focus on state estimation and control using only a monocular camera, without any prior knowledge of the indoor or outdoor environment. The major challenge is that the scale cannot be determined without either prior knowledge, such as known metric information about the scene, or the aid of other sensors. In the literature, various researchers have demonstrated RGB-D camera (Kinect) [18] or stereo camera [19] approaches on UAV platforms, not only to estimate the states but also to resolve the scale estimation problem. However, processing RGB-D data and stereo images requires intensive computational power, so the flying robot must carry a heavy onboard computer, which dramatically reduces flight time. A monocular camera based approach avoids this drawback. G. Nützi et al. [20] presented scale estimation employing spline fitting with least-squares optimization in an Extended Kalman Filter (EKF) framework using IMU and vision data. In their work, however, they only presented simulated least-squares results with artificial noise, implemented in Matlab, which has limitations for real-time operation.

In addition, acceleration sensors cannot be modelled by simply adding white Gaussian noise, since they also contain bias that is affected by highly nonlinear factors: temperature, vibration of the frame, misalignment and scale factors. In contrast, we propose scale estimation using the unbiased and drift-free metric altitude measurement together with the forward camera vision measurements and a real-time least-squares solver. J. Engel et al. [12] showed a computationally efficient closed-form scale calculation and state estimation with an EKF, and recently also released their source code to the public. Although their work is similar to our proposed approach, it is difficult to argue that the scale estimation and state estimation are accurate when they are only compared with manually measured ground truth, which is even more challenging to measure while flying. We achieve competitive results and return knowledge to the community that allows serious amateurs and other researchers to access and reproduce the same results for their own purposes. We evaluate the performance of our implementation on the AR.Drone platform with respect to Vicon ground truth, which is missing in [12]. A video demonstration is available online at www.youtube.com/watch?v=zvNhFcbvXSI

II. METHODOLOGY

In this section, we describe the key parts of the proposed system (Fig. 5). Specifically, we describe the monocular SLAM system for camera pose tracking and introduce a metric scale estimation scheme to convert the up-to-scale positions from PTAM into metric form. The metric horizontal positions x and y from the vision system are then fused with the velocities vx and vy from the AR.Drone's internal estimator using a KF. Lastly, we present the system identification and the PID controllers for x, y, z and ψ.

A. Monocular SLAM system

As described in [12], we also use the PTAM system. After take-off, PTAM is automatically initialized to generate the initial map. A successful initialization requires several salient features in the field of view of the front camera. Following initialisation, the map and camera poses are updated continually; however, the camera positions are still up-to-scale with respect to the PTAM frame. The scale estimation method is described in detail in Section II-B. Yaw estimation from PTAM is more robust than that from the IMU, which drifts significantly. As shown in Fig. 2, the IMU-based yaw estimate starts to drift noticeably after about 40 s. Note that the quadrotor lands at around 110 s in the yaw comparison. A quantitative comparison of yaw estimation between PTAM and ground truth can be found in Fig. 8.

B. Scale estimation

As mentioned above, in order to control the quadrotor using the pose estimates from PTAM, a scale is required to convert the PTAM-based poses into metric form. The vertical displacement between times t_k and t_{k+1} measured by the forward-facing camera and by the sonar should be the same, since the camera and sonar are rigidly mounted on the quadrotor.


Fig. 2. Yaw angle comparison between PTAM and the IMU. The yaw angle estimate from the IMU clearly drifts quickly, whereas PTAM's does not. The quadrotor lands at ∼110 s.

Fig. 3. Scale estimation for PTAM. The dashed line denotes the original z from PTAM, the bold solid line is z from PTAM converted to metric form with the estimated scale λ, and the thin solid line is the altitude measured by the sonar sensor. The closeness of the metric measurement and the scale-recovered estimate implies that the scale is accurately recovered.

Let Δz_k^sonar and Δz_k^camera denote the vertical distance change measured by the sonar and by the camera between times t_k and t_{k+1}, respectively. They should satisfy Δz_k^sonar ≈ λ Δz_k^camera, where λ ∈ ℝ is the unknown scale. To compensate for the occasional errors introduced by sensor noise, we minimise the residual cost function in Eq. (1) over a time window of duration T:

λ* := argmin_λ Σ_{k=1}^{T} ‖Δz_k^sonar − λ Δz_k^camera‖²     (1)

As the cost function has quadratic form, it can easily be solved with the Levenberg-Marquardt algorithm. Fig. 3 illustrates that the proposed scale estimation method accurately converts the up-to-scale pose from PTAM into metric form.
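Since Eq. (1) is quadratic in λ, the scale can be recovered either in closed form or with an off-the-shelf Levenberg-Marquardt solver. The sketch below uses SciPy's solver; the function and variable names (estimate_scale, dz_sonar, dz_camera) are illustrative assumptions, not identifiers from the released code.

```python
import numpy as np
from scipy.optimize import least_squares

def estimate_scale(dz_sonar, dz_camera):
    """Estimate the metric scale lambda from paired vertical displacements.

    dz_sonar and dz_camera hold the vertical displacement between consecutive
    times t_k and t_{k+1}, from the sonar (metric) and from PTAM (up-to-scale).
    """
    dz_sonar = np.asarray(dz_sonar, dtype=float)
    dz_camera = np.asarray(dz_camera, dtype=float)

    # Residuals of Eq. (1): r_k(lambda) = dz_sonar_k - lambda * dz_camera_k
    def residuals(lam):
        return dz_sonar - lam[0] * dz_camera

    # Levenberg-Marquardt over the window, starting from an initial guess of 1.
    result = least_squares(residuals, x0=[1.0], method="lm")
    return result.x[0]

# Because the cost is quadratic, the same value is also available in closed form:
#   lambda_star = np.sum(dz_sonar * dz_camera) / np.sum(dz_camera ** 2)
```

In the running system the estimate would simply be refreshed as new sonar and PTAM samples arrive within the window.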

C. State estimation

A Kalman Filter is employed to track four states, the lateral position and velocity, with a linear constant-velocity process model, and all states are observable. Although it is also possible to estimate the states with more sophisticated filters such as an Extended Kalman Filter [12] or a particle filter, the AR.Drone is already capable of stable flight with an onboard control system that relies on an aerodynamic model, the IMU and computer vision. To the best of our knowledge, a computationally efficient Kalman Filter is sufficient for state estimation.

1) States: The states are expressed with respect to the world coordinate frame:

X_k = [x_k  y_k  ẋ_k  ẏ_k]^T     (2)

2) Process model: The linear constant-velocity process model is

x_{k+1} = x_k + δ (cos(ψ_k) − sin(ψ_k)) ẋ_k
y_{k+1} = y_k + δ (sin(ψ_k) + cos(ψ_k)) ẏ_k
ẋ_{k+1} = ẋ_k
ẏ_{k+1} = ẏ_k

where δ is the sample period of the Kalman Filter, which runs at 160 Hz, and ψ_k is the drift-free yaw angle estimate from vSLAM.

3) Measurement model:

H = I_4     (3)
z_k = H X_k     (4)

where z_k is a 4 × 1 position and velocity measurement vector and all states are observable. Position is obtained from PTAM with scale estimation at 18 Hz and velocity comes from the AR.Drone onboard estimator at 160 Hz. We choose the process noise, Q, and the measurement noise, R, as

w_pos = 0.1 m,  w_vel = 0.3 m/s
v_pos = 0.1 m,  v_vel = 0.05 m/s
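A minimal sketch of this filter is given below, assuming the w and v values above are standard deviations used to build diagonal Q and R matrices. The class and variable names are illustrative; the released implementation runs as a ROS node rather than a standalone script, and in practice the position rows are only updated when a new 18 Hz PTAM pose arrives.

```python
import numpy as np

class LateralKF:
    """Constant-velocity Kalman Filter for [x, y, x_dot, y_dot] (Eqs. 2-4)."""

    def __init__(self, delta=1.0 / 160.0):
        self.delta = delta                                  # filter period (160 Hz)
        self.x = np.zeros(4)                                # state estimate X_k
        self.P = np.eye(4)                                  # state covariance
        self.Q = np.diag([0.1, 0.1, 0.3, 0.3]) ** 2         # from w_pos, w_vel (assumed std. dev.)
        self.R = np.diag([0.1, 0.1, 0.05, 0.05]) ** 2       # from v_pos, v_vel (assumed std. dev.)
        self.H = np.eye(4)                                  # Eq. (3): all states measured

    def predict(self, psi):
        """Propagate the state with the yaw-dependent constant-velocity model."""
        d, c, s = self.delta, np.cos(psi), np.sin(psi)
        F = np.array([[1.0, 0.0, d * (c - s), 0.0],
                      [0.0, 1.0, 0.0, d * (s + c)],
                      [0.0, 0.0, 1.0, 0.0],
                      [0.0, 0.0, 0.0, 1.0]])
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + self.Q

    def update(self, z):
        """Fuse a measurement z = [x, y, x_dot, y_dot]."""
        innovation = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)            # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
```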

The remaining steps, prediction and update, follow the standard discrete Kalman Filter procedure. More details can be found in the technical documentation4.

D. Control

We use traditional position PID controllers for x, y, z and ψ, while an onboard attitude controller stabilizes φ and θ. The inputs to the controllers are the position or angle errors between the goal and the current state, and the outputs are the desired angles for φ and θ and the desired velocities for z and ψ. System identification is also conducted for gain tuning and future research; researchers can purchase a factory-calibrated, uniform platform from local retail stores and apply these dynamic models directly. Fig. 4 shows an example of the height system identification, for which we identified the third-order transfer function

H(s) = 5.74 / (s³ + 2.444 s² + 10.06 s + 3.009)     (5)

More details and the system identification models for x, y and ψ can be found in the technical documentation4.
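A minimal sketch of one of the four position loops described above follows. The gains and the saturation limit are illustrative placeholders rather than the tuned values from the paper; the actual controllers run at 160 Hz alongside the Kalman Filter.

```python
class PID:
    """Simple PID loop: position or angle error in, bounded command out."""

    def __init__(self, kp, ki, kd, dt=1.0 / 160.0, limit=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt, self.limit = dt, limit
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, goal, state):
        error = goal - state
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(-self.limit, min(self.limit, u))      # saturate the command

# One controller per axis; the gains below are placeholders, not the paper's values.
pid_x = PID(0.5, 0.0, 0.3)    # output: desired pitch angle theta
pid_y = PID(0.5, 0.0, 0.3)    # output: desired roll angle phi
pid_z = PID(1.0, 0.1, 0.0)    # output: desired vertical velocity
pid_yaw = PID(1.0, 0.0, 0.0)  # output: desired yaw rate
```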

III. EXPERIMENTS AND RESULTS

In this section, we present the results of a controller performance evaluation while hovering, which includes position and orientation estimation, and an evaluation of the state estimation performance against ground truth.

A. Implementation

All software is implemented on the ROS platform and runs in real time. Fig. 5 shows the overall system implementation. The autonomy driver retrieves data from, and sends commands to, the MAV. vSLAM is a keyframe-based front-end SLAM solution that provides up-to-scale position and orientation. The scale estimator runs Levenberg-Marquardt optimization on the observed data to estimate the scale at 18 Hz. The pose estimator is an implementation of the Kalman Filter algorithm, and four PID controllers close the loop for x, y, z and ψ. All software is available online4.
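As an illustration of how an offboard node of this kind talks to the platform over WiFi, the sketch below assumes the standard ardrone_autonomy driver interface (the /ardrone/navdata topic with lateral velocities in mm/s, and attitude/velocity setpoints published as a Twist on /cmd_vel). The node name and callback structure are illustrative, not the released implementation.

```python
#!/usr/bin/env python
import rospy
from geometry_msgs.msg import Twist
from ardrone_autonomy.msg import Navdata   # assumed driver message (vx, vy in mm/s)

class OffboardNode(object):
    def __init__(self):
        # Setpoints for roll, pitch, vertical speed and yaw rate go out as a Twist.
        self.cmd_pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
        rospy.Subscriber("/ardrone/navdata", Navdata, self.navdata_cb)
        self.vx = 0.0
        self.vy = 0.0

    def navdata_cb(self, msg):
        # Onboard lateral velocity arrives at a high rate; convert mm/s to m/s.
        self.vx = msg.vx / 1000.0
        self.vy = msg.vy / 1000.0

    def send_command(self, roll, pitch, vz, yaw_rate):
        cmd = Twist()
        cmd.linear.y = roll        # lateral tilt setpoint
        cmd.linear.x = pitch       # forward tilt setpoint
        cmd.linear.z = vz          # vertical velocity setpoint
        cmd.angular.z = yaw_rate   # yaw rate setpoint
        self.cmd_pub.publish(cmd)

if __name__ == "__main__":
    rospy.init_node("offboard_controller")
    node = OffboardNode()
    rospy.spin()
```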


Fig. 4. Measured and simulated model output for the height system identification. The dotted line is the measured altitude and the solid line is the output of the estimated dynamic model. A MATLAB continuous-time transfer function model is used; the model fit is 87.52%.

Fig. 6. Controller performance evaluation against ground truth while hovering. Note that the top two rows and the bottom row have different vertical-axis scales. x and y start from a non-zero state because 0 s is the moment when vSLAM is initialized after take-off.


TABLE I
CONTROLLER PERFORMANCE RESULTS: TIME PERIOD = 0 s ∼ 90 s
RMSE x, y, z (m) = 0.0386     RMSE z (m) = 0.0057

Fig. 5. System overview. The AR.Drone and an offboard computer communicate over a WiFi network. Each box denotes a ROS node running on the remote computer: the autonomy driver (IMU at 160 Hz, images at 18 Hz, sonar at 20 Hz), vSLAM (Parallel Tracking and Mapping at 18 Hz), the scale estimator (LM nonlinear solver), the pose estimator (Kalman Filter at 160 Hz) and the PID controllers for x, y, z and yaw at 160 Hz.




B. Controller performance evaluation


To evaluate the performance of the controllers, the MAV is flown within an indoor motion capture system. Since this system provides position estimates with sub-millimetre accuracy at 100 Hz, we regard it as ground truth. Fig. 6 shows the given target position (x = 0, y = 0, z = 0.8) and the states of the quadrotor. The robot lost its wireless connection at around 30 s for several seconds, which introduced an unstable manoeuvre; after the network recovered, the platform was able to restore its normal state. Fig. 9 depicts the controller performance in different environments. The MAV holds its position more accurately indoors. Outdoor flying is more challenging because of wind disturbances and changing illumination, which introduce more error into PTAM's feature tracking and into the scale estimation. Table I shows the root mean squared error of the Euclidean distance in 3D space between the goal and the MAV.
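For reference, the RMSE in Table I is the root mean squared 3D Euclidean distance between the goal and the Vicon-measured position over the hover segment; a short sketch of that computation follows (the array names are illustrative).

```python
import numpy as np

def rmse_3d(positions, goal):
    """RMSE of the Euclidean distance between an N x 3 trajectory and a 3-vector goal."""
    d = np.linalg.norm(np.asarray(positions, dtype=float) - np.asarray(goal, dtype=float), axis=1)
    return np.sqrt(np.mean(d ** 2))

# e.g. rmse_3d(vicon_xyz, goal=[0.0, 0.0, 0.8]) over the 0-90 s hover segment
```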


Fig. 7. The top two rows show the translational state estimates and the third row shows the raw ultrasonic height measurement, each with ground truth; the bottom three rows show their errors while hovering. Note that the wireless connection is lost from about 30 s to 35 s, which introduces inaccurate state estimation. Although this affects the control performance and state estimation, the quadrotor recovers its state tracking after the connection is re-established.

4 https://code.google.com/p/ardrone-qut-cyphy


Fig. 8. The top two rows show orientation state estimates from the AR.Drone onboard estimator and the third row shows the drift-free yaw estimate from PTAM. The bottom three rows show their errors while hovering. Roll and pitch angle estimation is accurate, and yaw angle estimation becomes more accurate as more features are observed.

TABLE II
STATE ESTIMATION ACCURACY RESULTS: TIME PERIOD = 0 s TO 90 s
RMSE x (m) = 0.0208     RMSE y (m) = 0.0748     RMSE z (m) = 0.0308
RMSE φ (°) = 0.8875     RMSE θ (°) = 0.6054     RMSE ψ (°) = 0.7529

C. Estimator accuracy performance evaluation

We evaluate the estimator accuracy by comparison with Vicon data. Fig. 8 and Fig. 7 display the proposed state estimation and its error with respect to ground truth. The estimator relies on the process model alone around 30 s, until new observations become available. Table II summarises the performance of the state estimator. Note that the translational states along the x and y axes come from the Kalman Filter, while z, φ and θ come from the AR.Drone onboard measurements. The keyframe-based vSLAM provides drift-free yaw angle estimation. As shown in Fig. 8, vSLAM observes more features as it builds the 3D map, which gradually reduces the yaw angle estimation error over time. The AR.Drone uses a complementary filter that fuses de-biased gyro measurements for attitude estimation of φ, θ and ψ. While φ and θ can be corrected using de-biased acceleration measurements as a reference, ψ cannot be corrected without an exteroceptive sensor such as a visual sensor or a magnetometer.
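The complementary filter mentioned above runs onboard the AR.Drone; a generic sketch of that kind of filter is given below for clarity. The blend factor and sample period are illustrative assumptions, not values from the AR.Drone firmware.

```python
def complementary_filter(angle, gyro_rate, accel_angle, dt=1.0 / 160.0, alpha=0.98):
    """One step of a generic complementary filter for roll or pitch.

    angle       : previous filtered angle (rad)
    gyro_rate   : de-biased gyro rate about the same axis (rad/s)
    accel_angle : angle inferred from the accelerometer, a drift-free reference (rad)
    """
    # Integrate the gyro for short-term accuracy and pull towards the
    # accelerometer reference to remove long-term drift.
    return alpha * (angle + gyro_rate * dt) + (1.0 - alpha) * accel_angle

# Yaw has no gravity reference, so the same correction is not available for psi
# without an exteroceptive sensor such as vision or a magnetometer.
```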

IV. CONCLUSIONS

In this paper, we have presented a single-camera based autonomous navigation system for a MAV in GPS-denied environments. A major challenge for monocular vision systems is recovering the scale of the estimated scene. We propose a scale estimator that optimizes a cost function formed from the altitude measurement and the forward-facing visual measurement. By comparing the state estimation with ground truth, we confirm that the scale estimator produces accurate results. We demonstrate the system in a variety of environments to show its robustness and stability. The open-source code, self-explanatory technical documentation and system identification models distill what we have learned about the AR.Drone and can help other researchers reproduce the same results and build further research on top of the proposed system. We have two ongoing plans: exploration of large GPS-denied environments and long-distance transport with MAVs. The feature tracking of the PTAM system used in this paper is fragile with a fairly narrow field-of-view camera and can lose tracking when the yaw angle changes faster than about 90°/s. We also found that, with the forward-looking camera, insufficient translation leads to ambiguity and inaccurate estimates of the camera position and the 3D map. To improve this, an IMU-aided vSLAM system could disambiguate rotational and translational motion. Grasping an object will change the system dynamics, and adaptive control with online learning may deal with this.

ACKNOWLEDGMENT

The authors would like to thank Aaron Mcfadyen and Luis Mejias for assistance while using the Vicon system at the Australian Research Centre for Aerospace Automation (ARCAA).




Fig. 9. The top and bottom rows show indoor and outdoor experiments respectively. The first column shows images of the environments where the MAV is flown. The second column shows the 3D point reconstruction and position estimation from vSLAM. The third column shows the corresponding images from the MAV's forward camera with extracted features. The last column shows the top-view x, y position estimation while hovering. These clearly show that manoeuvring outdoors is more challenging than indoors due to wind gusts and dynamic features.



Fig. 10. Waypoint following results. Top left: side view of the trajectory, for evaluation of the z controller during waypoint following. Top right: top view, which clearly shows that the scale estimation is accurate. Bottom: perspective view, giving a qualitative evaluation of waypoint following.

REFERENCES

[1] S. M. Weiss, Vision based navigation for micro helicopters. PhD thesis, ETHZ, 2012.
[2] R. Mahony, V. Kumar, and P. Corke, "Multirotor Aerial Vehicles: Modeling, Estimation, and Control of Quadrotor," IEEE Robotics & Automation Magazine, vol. 19, pp. 20–32, Sept. 2012.
[3] D. Mellinger and V. Kumar, "Minimum Snap Trajectory Generation and Control for Quadrotors," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 2520–2525, May 2011.
[4] P. Pounds and R. Mahony, "Design principles of large quadrotors for practical applications," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 3265–3270, 2009.
[5] H. Lim, H. Lee, and H. Kim, "Onboard flight control of a micro quadrotor using single strapdown optical flow sensor," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 495–500, Oct. 2012.
[6] P. I. Corke, Robotics, Vision & Control: Fundamental Algorithms in MATLAB. Springer, 2011. ISBN 978-3-642-20143-1.
[7] G. Klein, Visual Tracking for Augmented Reality. PhD thesis, University of Cambridge, 2006.
[8] I. Sa and P. Corke, "Vertical Infrastructure Inspection using a Quadcopter and Shared Autonomy Control," in Proceedings of the International Conference on Field and Service Robotics (FSR), July 2012.
[9] I. Sa and P. Corke, "System Identification, Estimation and Control for a Cost Effective Open-Source Quadcopter," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 2202–2209, May 2012.
[10] P.-J. Bristeau, F. Callou, D. Vissière, and N. Petit, "The Navigation and Control technology inside the AR.Drone micro UAV," in 18th IFAC World Congress, pp. 1477–1484, 2011.
[11] C. Bills, J. Chen, and A. Saxena, "Autonomous MAV flight in indoor environments using single image perspective cues," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 5776–5783, May 2011.
[12] J. Engel, J. Sturm, and D. Cremers, "Camera-Based Navigation of a Low-Cost Quadrocopter," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2815–2821, Oct. 2012.
[13] N. Dijkshoorn and A. Visser, "Integrating Sensor and Motion Models to Localize an Autonomous AR.Drone," International Journal of Micro Air Vehicles, vol. 3, pp. 183–200, Dec. 2011.
[14] G. M. Hoffmann, Autonomy for Sensor-Rich Vehicles: Interaction between Sensing and Control Actions. PhD thesis, Stanford University, 2008.
[15] S. Shen, N. Michael, and V. Kumar, "Autonomous Multi-Floor Indoor Navigation with a Computationally Constrained MAV," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 20–25, May 2011.
[16] I. Dryanovski, W. Morris, and J. Xiao, "An Open-Source Pose Estimation System for Micro-Air Vehicles," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 4449–4454, May 2011.
[17] D. Mellinger, A. Kushleyev, and V. Kumar, "Mixed-Integer Quadratic Program Trajectory Generation for Heterogeneous Quadrotor Teams," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 477–483, May 2012.
[18] A. S. Huang, A. Bachrach, P. Henry, M. Krainin, D. Maturana, D. Fox, and N. Roy, "Visual Odometry and Mapping for Autonomous Flight Using an RGB-D Camera," in International Symposium on Robotics Research (ISRR), Aug. 2011.
[19] G. H. Lee, F. Fraundorfer, and M. Pollefeys, "MAV Visual SLAM with Plane Constraint," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 3139–3144, May 2011.
[20] G. Nützi, S. Weiss, D. Scaramuzza, and R. Siegwart, "Fusion of IMU and Vision for Absolute Scale Estimation in Monocular SLAM," Journal of Intelligent and Robotic Systems, vol. 61, pp. 287–299, 2011.
