Computer Technology and Application 2 (2011) 85-93

An Image-Processing-Based Gimbal System Using Fisheye Video

Osamah Rawashdeh and Belal Sababha

Department of Electrical and Computer Engineering, Oakland University, Rochester, Michigan 48309, United States

Received: November 30, 2010 / Accepted: December 28, 2010 / Published: February 25, 2011.

Abstract: This paper presents a low-cost remote vision system for use in unmanned aircraft that provides a first-person view (FPV) to vehicle operators in real-time. The system does not require a traditional electromechanical gimbal setup. Instead, the system uses a wide-angle (fisheye) lens and a video camera setup that is fixed on the vehicle and captures the full viewing area as seen from the cockpit in each video frame. Video is transmitted to a ground station wirelessly. On the ground, the pilot is outfitted with virtual reality goggles with integrated attitude and heading sensors. The received video is rectified and cropped by the ground station to provide the goggles with the appropriate view based on head orientation. Compared to traditional electromechanical setups, the presented system features reduced weight, reduced video lag, lower power consumption, and reduced drag on the airborne vehicle, in addition to requiring only a unidirectional downlink. The video processing is performed on the ground, further reducing onboard computational resource and bandwidth requirements. These advantages, in conjunction with the advancement in miniature optical sensors and lenses, make the proposed approach a viable option for miniature remotely controlled vehicles. The system was successfully implemented and tested using an R/C airplane.

Key words: First person view (FPV), remote vision, fisheye, UAV, goggles, gimbals.

Osamah Rawashdeh, Ph.D., P.E., assistant professor; research fields: embedded systems design, fault tolerance, reconfigurable computing, instrumentation, and autonomous aerial vehicles. Corresponding author: Belal Sababha, doctoral candidate; research fields: embedded systems design, fault tolerance, and reconfigurable computing. E-mail: [email protected].

1. Introduction

One of the greatest difficulties in remotely operating any vehicle is giving the operator an easy-to-understand, intuitive way to control it. An effective way of dealing with this problem is to provide a first-person vision-based interface. From an on-board perspective, the user can better understand the surroundings and gets more intuitive feedback on the position and state of the remotely operated vehicle (ROV). This paper presents an approach to providing a first-person view (FPV) to operators that allows

panning and tilting without the use of electromechanical actuators to physically pan and tilt the camera. This results in lower size, weight, and cost. These design constraints are becoming increasingly critical as ROVs become smaller and are more commonly employed for surveillance and scouting in confined and/or unpredictable environments. This paper describes a proof-of-concept prototype of the system, which was tested using a remote-controlled airplane. The paper is organized as follows: Section 2 gives background on fisheye lenses and fisheye camera modeling and projection. Section 3 presents the experimental calibration. Section 4 describes the first-person view approach. Section 5 explains the system architecture. Section 6 discusses the system


performance. Section 7 overviews future work. The paper is concluded in Section 8.

2. Fisheye-Lens Background

This section describes fisheye lens characteristics, features, parameters, and distortions. Calibration of the selected fisheye lens and sensor follows in section 3.

2.1 Fisheye Lenses

Fisheye lenses are gaining popularity in various applications due to their wide field of view (FOV). Unlike conventional lenses with narrow fields of view, a fisheye lens is able to capture half of its surroundings. Figs. 1-2 show two images captured using a camera equipped with a fisheye lens and a conventional lens, respectively. The tradeoff for the wide FOV is a loss of spatial resolution. With a fisheye lens, the spatial resolution is maximal at the center and falls off from the

Fig. 1 Image through a fisheye lens.

lens center outwards. This is tolerable in the application presented here because the region of interest is usually directly in front of the ROV, while the side views mainly serve obstacle avoidance and attitude perception, which can be achieved at lower resolutions. More details on the FOV of fisheye lenses, and on how image distortion affects the practical use of these lenses in this application, are given in section 3.

2.2 Fisheye Lens Camera Model and Calibration

To get a clear understanding of fisheye camera systems, intrinsic and extrinsic calibration of these systems is reviewed. Intrinsic calibration is the process of estimating parameters such as the principal point, focal length, and aspect ratio of the camera. Intrinsic calibration is necessary to properly map camera coordinates into image coordinates. Extrinsic calibration, on the other hand, is the process of determining the projection model of 3D objects into the 2D image domain.

2.2.1 Intrinsic Calibration

Usually the manufacturer provides the parameters required for intrinsic calibration. If these are not available or are insufficient, the following techniques can be used to estimate the lens' properties:

(1) Focal length: The focal length is defined as the distance between the lens center and the sensor plane. Fig. 3 shows the perspective projection of an object onto the image plane for a conventional lens. The focal length f can be found using the following relationship:

f / h = D / H (1)

where,

Fig. 2 Image through a conventional lens.

Fig. 3 Conventional lens perspective projection.


f is the focal length;
h is the image size;
D is the distance of the object from the lens;
H is the object's size.

The fisheye lens, however, introduces distortion, which makes it more difficult to assess its focal length. This can be overcome by selecting a small object in the FOV, where the distortion is minimal, and using it to calculate a focal-length approximation;

(2) Principal point: Before explaining what a principal point is, we need to define the principal surface. The principal surface is the imaginary surface that contains all the points where rays are refracted inside the lens. The principal point is the intersection of this imaginary surface with the optical axis. Many different techniques can be used to estimate the principal point [1]. In this work, the employed fisheye lens produces an image smaller than the plane of the CCD. The fisheye lens therefore projects a symmetrical circular image around the center, leaving the CCD corners unexposed, i.e., black. The center of the produced circle is taken as the principal point [1-5].

2.2.2 Extrinsic Calibration

Several projection models for fisheye lenses have been presented in the literature [1-5]. Usually a polynomial is used to describe these models. A general projection form looks like this [2]:

r(θ) = aθ + bθ² + cθ³ + dθ⁴ + eθ⁵ + …

where θ is the angle between the optical axis and the incoming ray, r is the distance between the image point and the principal point, and f is the focal length. For computation purposes, however, the series is truncated. The coefficients a, b, c, … are then chosen such that r(θ) is monotonically increasing over the interval [0°, 90°]. The perspective projection of a pinhole camera is described by:

r = f tan θ

while fisheye lenses, depending on their design, follow one of the following projections [2]:

r = 2f tan(θ/2) (stereographic)
r = fθ (equidistant)
r = 2f sin(θ/2) (equisolid angle)
r = f sin θ (orthogonal)

Fig. 4 plots these formulae with unity focal length f. In reality, fisheye lenses, due to manufacturing distortion, may not exactly obey these formulas. The fisheye lens used in this research follows the orthogonal projection model (r = f sin θ), as shown in section 3. The fisheye camera model is shown in Fig. 5. By knowing the Cartesian coordinates of a point in the real object plane, and by converting these coordinates into

Fig. 4 Fisheye lens projection models.


Fig. 5 Fisheye lens model.


the polar system, all the angles shown in Fig. 5 can be computed. Hence the point's (x, y) coordinates in the image plane are:

x = r cos(ω) (2)
y = r sin(ω) (3)

where r is found from the fisheye lens projection model mentioned earlier. To convert into pixel coordinates, the number of pixels per unit distance has to be found by performing a simple experiment, described in more detail in section 3. The experiment involves choosing a point on a plane and then changing the viewing angle θ. As the angle is changed, the pixel location (x, y) of that point changes as well. The collected angle and pixel values provide the required pixel-distance ratio. By knowing the pixel-distance ratio and the principal point coordinates from the intrinsic calibration mentioned earlier, the actual pixel coordinates of the point in the image plane can be found from the following equations:

x_pixel = k_x x + x_0 (4)
y_pixel = k_y y + y_0 (5)

where x_pixel and y_pixel are the x and y coordinates in pixels, respectively; k_x and k_y are the pixel-distance ratios in the x and y directions, respectively; and x_0 and y_0 are the (x, y) coordinates of the principal point. By knowing the intrinsic parameters, the extrinsic calibration can be performed by specifying a number of 3D points and their 2D projections. Then, through a non-linear least-squares optimization technique, an approximation of the lens' projection model can be found. The next section discusses and plots the exact projection model for the lens used.

Fig. 6 The fisheye lens used.

3. Experimental Calibration

The fisheye lens used in this research (Fig. 6) is a 1.2 mm aspheric lens with a 188-degree viewing angle. The wide viewing angle allows the capture of ±94° around the horizontal optical axis and the vertical axis; targets on the edges of the captured image are 94° away from the center. Ultra-wide-angle lenses, such as the fisheye lens used here, suffer from some amount of barrel (spherical) distortion, as illustrated in Fig. 7. Barrel distortion reduces magnification as pixels move farther from the optical axis, i.e., the center of the image frame. As shown in Fig. 7, the lens' barrel distortion warps the imaged regular rectangular grid into a spherical appearance. On the other hand, fisheye lenses can map very wide angles of the object plane onto the limited area of the CCD sensor. To examine the barrel distortion, we conducted an experiment in which images of a regular grid target on a sheet of graphing paper were taken through the fisheye lens. The viewing distance was 10 cm and the target grid line spacing was 6 mm. The experiment included changing the viewing angle of the camera, i.e., the angle between the line perpendicular to the plane

Fig. 7 Fisheye lens barrel distortion.


and the optical axis where the camera is pointing. The distance between the center of the lens/frame and the center of the target (in pixels) was recorded as a function of viewing angle. The results are shown in Fig. 8 and reveal a close-to-linear trend for the angles of interest, which are less than 60°. The linear degrees-to-pixels relationship used is:

p(θ) = 3.6771θ + 412.13 (6)

where θ is the viewing angle of the camera (the angle between the perpendicular to the image plane and the camera's optical axis), and p(θ) is the pixel value at angle θ.

Fig. 8 Used fisheye lens projection model with real angle and pixel scales.

A more accurate polynomial model of this curve is:

p(θ) = a₀ − a₁θ + a₂θ² − a₃θ³ + a₄θ⁴ − a₅θ⁵ (7)

where p(θ) is the exact pixel value at angle θ, θ is the viewing angle of the camera, and a₀ through a₅ are constant parameters for this specific fisheye lens: a₀ = 469, a₁ = 5.3175, a₂ = 0.4329, a₃ = 0.0085, a₄ = 7×10⁻⁵, a₅ = 2×10⁻⁷.

Fig. 9 Linear trend estimation error compared to the polynomial.

For more accuracy, Eq. (7) may be used; however, the assumption of linearity simplifies the conversion. Fig. 9 shows a stem plot of the error that results from using the linear trend, Eq. (6), instead of the polynomial of Eq. (7). Another experiment was conducted to model the behavior of the fisheye lens with respect to changes in viewing distance. Similar to the first experiment, a number of target points were set on a sheet of graphing paper, and the distance between the camera and the target plane was varied from 40 mm to 300 mm. The results show how the x and y pixel coordinates change as a function of distance. As shown in Figs. 10-11, the numbered targets move towards the center of the image, where the optical axis and the target plane intersect; the colored lines track this motion. Target point 1 is in the center and, as expected, does not move as the camera distance is increased. Similarly, the y coordinate of target point 6 stays constant. The

Fig. 10 Target point movement through a fisheye lens as the viewing distance is varied.

Fig. 11 Target location change with viewing distance.


remaining target points move in both the x and y directions. Similar to the conclusion drawn from the data in Fig. 8, the relationships in Fig. 11 show that the lens distortion is close to linear.
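The accuracy tradeoff between the linear trend of Eq. (6) and the polynomial of Eq. (7) can be checked numerically. The sketch below (plain Python, using the coefficients quoted above; the sample angles are arbitrary) evaluates both models and prints their difference, which is what the stem plot of Fig. 9 visualizes:

```python
# Compare the linear degrees-to-pixels trend, Eq. (6), with the
# fifth-order alternating-sign polynomial, Eq. (7), for the lens used.

def p_linear(theta_deg):
    """Linear model, Eq. (6): p(theta) = 3.6771*theta + 412.13."""
    return 3.6771 * theta_deg + 412.13

def p_poly(theta_deg):
    """Polynomial model, Eq. (7), with the published coefficients."""
    a0, a1, a2, a3, a4, a5 = 469, 5.3175, 0.4329, 0.0085, 7e-05, 2e-07
    t = theta_deg
    return a0 - a1 * t + a2 * t**2 - a3 * t**3 + a4 * t**4 - a5 * t**5

if __name__ == "__main__":
    for theta in (10, 20, 30, 40, 50, 60):
        diff = p_linear(theta) - p_poly(theta)
        print(f"theta = {theta:2d} deg: linear = {p_linear(theta):6.1f} px, "
              f"polynomial = {p_poly(theta):6.1f} px, diff = {diff:+6.1f} px")
```

Within the angles of interest the two models stay within a few pixels of each other, which is why the linear conversion was judged acceptable.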

4. The First-Person View System

Attaching a stationary camera to the vehicle offers the operator a view of what is directly in front of the vehicle but only a limited view of the surroundings. The solution to this problem is a camera setup that allows the operator to change views with head movements, giving the pilot of a remotely operated plane the perspective of being in the plane. This is traditionally achieved by placing the camera on an electromechanical gimbal, whose servo motors move the camera to follow the operator's head movements. Such a system has several disadvantages, including increased response time, computational resources, and power requirements. Furthermore, the gimbal is by its nature sensitive to vibration and amplifies the vibration transferred to the camera. The alternative method presented here uses a high-resolution video camera with a fisheye lens. The camera-lens combination is securely mounted on the front of the plane fuselage, as shown in Fig. 12. The captured video is streamed in real-time to a laptop on the ground. The laptop crops and rectifies the video and feeds it to the goggles to match the orientation of the pilot's head (Fig. 13). Attaching the camera directly to the airframe reduces camera vibration. Using only a camera with a wide-angle lens, the system furthermore requires only unidirectional communication to operate. Finally, the system reduces onboard computational requirements. One main disadvantage of the system is that only a fraction of the camera's full resolution is used at a time.
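The virtual pan/tilt reduces to choosing a crop-window center in the fixed fisheye frame. A minimal sketch of this idea follows (Python rather than the Matlab used in the actual system; the principal point, the use of the Eq. (6) slope as a pixels-per-degree constant, the small-angle treatment of pan/tilt, and all function names are illustrative assumptions, not the paper's exact implementation):

```python
import math

# Assumed calibration values: the slope comes from Eq. (6); the principal
# point is taken as the center of a 720x576 PAL frame; the 180x144 window
# is the viewing-window size reported in the discussion section.
PIX_PER_DEG = 3.6771
CX, CY = 360, 288
WIN_W, WIN_H = 180, 144

def crop_center(pan_deg, tilt_deg):
    """Map head pan/tilt to a crop-window center in the fisheye image.

    Pan and tilt are treated (small-angle approximation) as components of
    the off-axis angle theta in direction omega, then projected radially
    using the near-linear angle-to-pixel relationship of the lens.
    """
    theta = math.hypot(pan_deg, tilt_deg)   # angle off the optical axis
    omega = math.atan2(tilt_deg, pan_deg)   # direction of the offset
    r = PIX_PER_DEG * theta                 # radial distance in pixels
    return CX + r * math.cos(omega), CY + r * math.sin(omega)

def crop_bounds(pan_deg, tilt_deg, w=720, h=576):
    """Return the crop rectangle, clamped to stay inside the frame."""
    x, y = crop_center(pan_deg, tilt_deg)
    x0 = min(max(int(x - WIN_W / 2), 0), w - WIN_W)
    y0 = min(max(int(y - WIN_H / 2), 0), h - WIN_H)
    return x0, y0, x0 + WIN_W, y0 + WIN_H
```

Because no mechanism moves, "panning" is just recomputing these bounds each frame, which is why the approach has no actuator lag.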

5. System Architecture

A high-level system block diagram is shown in Fig. 14. The video acquisition setup consists of a camera with

Fig. 12 Forward looking camera with fisheye lens mounted on R/C plane.

Fig. 13 Pilot wearing goggles with integrated motion sensors.

Fig. 14 Software data-flow diagram.

a mounted fisheye lens with a wide field of view. It is a charge-coupled device (CCD) camera [6] with a 1/3-inch Sony sensor that weighs 22 grams and has a horizontal resolution of 420 TV lines. The wide-angle lens captures the entire field of view of a pilot theoretically located in the cockpit of the vehicle. The frames from the KX191 color CCD camera are sent through a 1.3 GHz, 300 mW composite video transmitter-receiver pair by Range Video, Inc. The video received on the ground is


digitized using a USB digital video converter [7] connected to a laptop running Matlab™. The video is represented in Matlab as a three-dimensional matrix that can be easily processed. The Vuzix video glasses [8], shown in Fig. 13, are virtual reality glasses equipped with sensors that measure the user's head movements and orientation. This information is sent through a USB serial link to the laptop. A program was implemented using the software development kit (SDK) [9] provided by the video glasses manufacturer that processes the serial data and extracts roll, pitch, and yaw angles. These angles are sent to the Matlab application through a shared ASCII file. The Matlab application performs the appropriate cropping of the full fisheye frames and sends the resulting view to the video glasses through a VGA connection.
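The glue between the SDK program and the Matlab application is that shared ASCII file of angles. A rough equivalent of the reader side, sketched in Python rather than Matlab (the file name and the comma-separated roll, pitch, yaw layout are assumptions, not documented details of the system):

```python
def read_head_angles(path="head_angles.txt"):
    """Read roll, pitch, yaw (in degrees) written by the tracker process.

    Returns None if the file is missing, momentarily empty, or mid-write,
    so the video loop can keep using the previous orientation instead of
    crashing on a partial read.
    """
    try:
        with open(path) as f:
            fields = f.read().strip().split(",")
        roll, pitch, yaw = (float(v) for v in fields)
        return roll, pitch, yaw
    except (OSError, ValueError):
        return None
```

A shared file is a simple but lossy channel; tolerating partial writes, as above, is what keeps the cropping loop responsive.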

6. Discussion

The system is able to compute head movement and display the appropriate portion of the full half-spherical view captured by the fisheye lens. This action is performed at a speed at which the operator does not notice a delay; the Matlab application delivers a smooth and responsive video feed. The camera used produces PAL-format video with a resolution of 720 by 576 pixels. This is reduced to a viewing window of 180 by 144 pixels to give the proper depth-of-field perception. The window size leaves the video coarser but gives the viewer 180 degrees of viewing freedom with real-time response. The wide-angle lens used has a 188° angle of view, providing the full view shown in Fig. 15. The limitation of using a standard-definition camera and a standard wireless video transceiver is that the large view is compressed into a standard video frame. Fig. 16 provides a sample view after cropping the full view of Fig. 15. As discussed in previous sections, lens distortion increases from the center out. From a remote pilot's point of view, the focus will be on targets around the image center, where the distortion is minimal. The distorted parts of the image near the edges do not much hinder the ability to remotely pilot the plane, because the earth's horizon remains useful for attitude perception. Field testing of the system was conducted using a Mini Telemaster R/C hobby airplane. This first prototype of the system performed well. The pilot was immediately able to complete several flights, including successful take-offs and landings. The system allowed the plane to be flown at altitudes and distances that are not possible in normal operation of the R/C vehicle. Landings and take-offs were observed by all flight testers to be much smoother using the FPV system, due to the more intuitive feel the system provides. Fig. 17 shows the in-cockpit view of a pilot looking left.


Fig. 15 Full view of fisheye lens.

Fig. 16 Cropped view for user looking forward.


Fig. 17 The in-cockpit view of a pilot looking left.

Fig. 18 Full view of fisheye lens.

Fig. 19 Cropped view for user looking forward.

Fig. 18 shows a complete fisheye video frame captured by the system, and Fig. 19 shows the corresponding cropped area. The cropped area was experimentally adjusted to keep the correct distance perception. The main components of the system as tested are listed in Table 1. The total cost is $724; the majority of the cost (~$400) is associated with the VR goggles. They were chosen to speed up the development process and reach the testing phase of a proof-of-concept prototype. Goggles without embedded sensors are available at a fraction of the cost. These could be combined with increasingly low-cost MEMS-based accelerometers and a magnetometer to further reduce the total system cost.

7. Future Work

Electromechanical-gimbal-based setups always provide full-resolution images to the pilot. The major drawback of the system presented here is the loss of resolution due to cropping. The system could therefore be improved by utilizing higher-resolution cameras, which are becoming more widely available, and by using a digital communication link for video transmission. Furthermore, a dual fisheye video capturing setup could be used to provide a 3D perspective to the operator. The resulting depth perception can be very valuable for navigating confined spaces. The advantages of the fisheye approach over traditional gimbal setups would increase in significance in this case, due to the mechanical complexity associated with steering two stereoscopic cameras. Finally, although the distortion is not very significant in cropped fisheye views, the system could be enhanced by further developing the video processing algorithm to undo the fisheye distortion.

8. Conclusions

This paper presented a software-based first-person view system using fisheye video that provides the operator a full in-cockpit perspective of a remotely operated vehicle without the use of electromechanical actuators for panning and tilting. Full video is transmitted to a ground-based computer that is responsible for all processing, resulting in a reduction of resource requirements onboard the vehicle compared to traditional gimbaled setups. A proof-of-concept prototype was developed and successfully tested in several flights using an R/C airplane.


Table 1 Parts list.

Manufacturer | Product | Description | Price
Pinnacle | Dazzle DVD Recorder Plus | Video capture | $49.99
Range Video | f2.10m Superwide | Lens | $10.00
Vuzix | iWear® VR920 Video Eyewear | Display headset | $399.95
Range Video | KX191 color and night mode CCD camera | Camera | $105.00
Range Video | 1.3 GHz 300 mW audio/video transmitter | Wireless transmitter | $75.00
Range Video | 1300 MHz dual output receiver | Wireless receiver | $85.00

Acknowledgments

The authors would like to acknowledge the contributions of students affiliated with the Embedded Systems Lab at Oakland University. Special thanks go to Philip Profit (BS EE 2010) and Hong Chul Yang (MS ECE 2010) for their contributions and support.

References

[1] M. Mostafa, J. Hutton, Direct positioning and orientation systems: How do they work? What is the attainable accuracy?, in: Proceedings of the American Society of Photogrammetry and Remote Sensing (ASPRS) Annual Conference, St. Louis, MO, 2001.
[2] B. Grinstead, A. Koschan, M. Abidi, A comparison of pose estimation techniques: Hardware vs. video, in: Proceedings of SPIE Unmanned Ground Vehicle Technology VII, Orlando, FL, 2005, pp. 166-173.
[3] R.V. De Leo, F.W. Hagen, Pressure sensor for determining airspeed, altitude and angle of attack, United States Patent 4096744, 1978.
[4] J.B. West, S. Lahiri, K.H. Maret, R.M. Peters, Jr., C.J. Pizzo, Barometric pressures at extreme altitudes on Mt. Everest: Physiological significance, Journal of Applied Physiology 54 (1983) 1188-1194.
[5] S.E. Hrabar, P.I. Corke, G.S. Sukhatme, K. Usher, J.M. Roberts, Combined optic-flow and stereo-based navigation of urban canyons for a UAV, in: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, Canada, 2005.
[6] RangeVideo KPX191 CCD Camera Product Page, available online at: http://www.rangevideo.com, 2010.
[7] Dazzle DVD Recorder Plus Product Page, available online at: http://www.pinnaclesys.com, 2010.
[8] iWear VR920 Product Information Page, available online at: http://www.vuzix.com/iwear/products_vr920.html, 2010.
[9] iWear Software Development Kit (SDK), available online at: http://vrdeveloper.vr920.com, 2010.