Specialized Architectures for Optical Flow Computation: A Performance Comparison of ASIC, DSP, and Multi-DSP

Thomas Röwekamp, Marco Platzner, and Liliane Peters
GMD – German National Research Center for Information Technology
E-mail: [email protected]

Abstract

In this paper we present three specialized architectures for optical flow computation based on: i) an ASIC in CMOS standard cell technology, ii) a single DSP TMS320C40, and iii) a TMS320C40-based multiprocessor. The pros and cons of each architecture are discussed.

Keywords: optical flow estimation, ASIC, multi-DSP

1 Introduction

Autonomous mobile vehicles, e.g., service robots, must be able to detect dynamic obstacles and to avoid collisions through navigation manœuvres. Most robot vision approaches for this task are based on optical flow methods, leading to systems that consist of two parts, e.g., [4]. In the low-level part, the optical flow is calculated from image sequences. In the subsequent high-level part, parameters like direction-of-impact and time-to-impact are determined from the optical flow.

The most important requirements for an autonomous robot vision system are: i) real-time capability and ii) constrained form factors. Collision avoidance is a task to which hard timing constraints apply. Hence, real-time capability means predictability and, depending on the objects' actual velocities, a sufficiently high computational performance. As the mobile vehicle operates autonomously, the physical form factors of the vision system, e.g., size, weight, and power consumption, must be kept as small as possible. These requirements can only be satisfied by the development of specialized computer architectures. We have therefore developed a smart sensor system for the computation of the optical flow. The core of this sensor system is an ASIC in standard cell technology. To evaluate the performance characteristics of the smart sensor and to prove its suitability, we have designed reference systems based on TMS320C40 processors as computing elements. In this paper we present and compare the different systems for optical flow computation and discuss the pros and cons of each architecture.

Figure 1: Architecture of the ASIC-based smart sensor system (imager with driving circuitry; processing unit with processing pipeline, control unit, and external memory; digital interface carrying the optical flow components u, v and the synchronization signals PIXCLK, HSYNC, VSYNC).

2 Optical Flow Algorithm

There exist numerous computational methods for optical flow estimation; an exhaustive survey has been presented by Beauchemin and Barron [1]. One of the fundamental methods is the technique developed by Horn and Schunck [2]. Their approach is known as a differential technique, as it computes the optical flow from the spatio-temporal derivatives of the image intensities. Assume that the intensity $I$ of a local image region is approximately constant under motion over a short duration of time, $dI/dt = 0$. Then the gradient constraint equation is derived as

\[
I_x u + I_y v + I_t = 0,
\tag{1}
\]

where $u$ and $v$ denote the x- and y-components of the optical flow vector, and $I_x$, $I_y$, and $I_t$ are the spatial and temporal derivatives of the image intensity. They can be approximated as differences in intensity between neighboring pixels in space and time. Equation (1) is under-determined for the computation of the optical flow components.

However, as neighboring object points have similar velocities, a smoothness regularization term can be introduced. This allows the optical flow to be computed by minimizing a cost function derived from the gradient constraint equation and the smoothness regularization term. The minimization problem can be solved iteratively using the relaxation formula

\[
\begin{pmatrix} u \\ v \end{pmatrix}^{n+1}
= \begin{pmatrix} \bar{u} \\ \bar{v} \end{pmatrix}^{n}
- \frac{I_x \bar{u} + I_y \bar{v} + I_t}{I_x^2 + I_y^2 + 4\alpha^2}
\begin{pmatrix} I_x \\ I_y \end{pmatrix},
\tag{2}
\]

where $\alpha$ is a constant that weights the influence of the regularization term, and $\bar{u}$ and $\bar{v}$ denote the averaged local optical flow, approximated from neighboring pixels in space. Originally, $n$ indicates the number of iterations to be performed for one image frame. However, assuming continuous operation and smooth object motion, this number of iterations can be reduced to one, which allows the parameter $n$ in Equation (2) to be interpreted as a frame counter. At start-up, the optical flow estimate then converges to its correct values after an initialization phase of a few frames.
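For concreteness, a minimal C sketch of one such relaxation step follows, with the derivatives approximated by neighboring-pixel differences as described above. All names, the value of alpha, and the four-neighbor averaging kernel are our assumptions; the implementations presented in Section 3 differ in representation and detail.

/* Minimal sketch of one relaxation step (Equation (2)) on a frame.
 * Names and the value of ALPHA are illustrative assumptions. */
#define W 128
#define H 128
#define ALPHA 1.0f                     /* regularization weight (assumed) */

/* prev, curr: intensity frames; ubar, vbar: averaged flow fed back from
 * the previous frame; u, v: updated flow components (output). */
static void flow_step(const float prev[H][W], const float curr[H][W],
                      const float ubar[H][W], const float vbar[H][W],
                      float u[H][W], float v[H][W])
{
    for (int y = 0; y < H - 1; y++) {
        for (int x = 0; x < W - 1; x++) {
            /* derivatives as neighboring-pixel differences */
            float Ix = curr[y][x + 1] - curr[y][x];
            float Iy = curr[y + 1][x] - curr[y][x];
            float It = curr[y][x] - prev[y][x];
            float num = Ix * ubar[y][x] + Iy * vbar[y][x] + It;
            float den = Ix * Ix + Iy * Iy + 4.0f * ALPHA * ALPHA;
            u[y][x] = ubar[y][x] - Ix * num / den;
            v[y][x] = vbar[y][x] - Iy * num / den;
        }
    }
}

/* Four-neighbor average (kernel assumed); applied to u and to v to
 * obtain the ubar, vbar fed back for the next frame. */
static void flow_average(const float f[H][W], float fbar[H][W])
{
    for (int y = 1; y < H - 1; y++)
        for (int x = 1; x < W - 1; x++)
            fbar[y][x] = 0.25f * (f[y - 1][x] + f[y + 1][x] +
                                  f[y][x - 1] + f[y][x + 1]);
}

With $n$ interpreted as a frame counter, flow_step and flow_average are executed once per incoming frame.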

3 Specialized Architectures

3.1 ASIC-based Smart Sensor

Figure 1 presents the architecture of the smart sensor system [6] (German patent pending), which is composed of three basic components: (i) the imager, (ii) the processing unit, and (iii) the digital interface. The imager forms the interface to the visual world and consists of a CMOS photodiode array (FUGA 15) and accompanying driving circuitry. This circuitry adapts the front end of the sensor to the processing unit and generates the line and frame synchronization signals, HSYNC and VSYNC. The processing unit is the core of the sensor system and performs the optical flow computation. It outputs the x- and y-components of the optical flow, each as a serial stream of 8-bit coded values, driven by the pixel clock, PIXCLK. The flow components together with the synchronization signals form the digital interface. The sensor system operates continuously on a sequence of image frames, each of 128 × 128 pixel size. Therefore, the smart sensor can be treated like a digital camera and connected to any computing system for high-level processing.

The data path of the processing unit forms a pipeline with a feedback path, driven by the pixel clock. The latency of the pipeline is four image rows plus 13 pixels, i.e., 525 pixel clock cycles. Figure 2 presents the partitioning of the pipeline into several functional blocks: (i) preprocessing, (ii) computation of the spatio-temporal derivatives, (iii) computation of the optical flow components, and (iv) computation of the local optical flow averages. In the preprocessing block, the image is smoothed to enhance the signal-to-noise ratio. Three frame-sized external SRAM memories are required to store the previous frame for the computation of the temporal derivative and to feed back the averaged flow components for the computation of the subsequent frame. The block that computes the optical flow components implements Equation (2). The data path and the control logic of the processing unit were implemented as an ASIC in 0.7 µm digital CMOS technology using a standard cell library and compiled memory cells (Alcatel MIETEC). The area of the circuit is 47 mm².

Figure 2: Data path of the ASIC (image data enters the preprocessing block; computation of the spatio-temporal derivatives It, Ix, Iy; computation of the optical flow components u, v; local optical flow averages, returned via the feedback path).
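As a quick sanity check of these figures (our arithmetic, not code from the paper), the pipeline latency in clock cycles and the peak throughput at the ASIC's maximum clock of 25 MHz (see Section 4) follow directly from the frame geometry:

#include <stdio.h>

int main(void)
{
    const int    W = 128, H = 128;
    const double f_max = 25e6;                /* max. ASIC clock [Hz], Section 4 */

    int latency_cycles = 4 * W + 13;          /* four image rows plus 13 pixels */
    double peak_fps    = f_max / (double)(W * H);

    printf("pipeline latency: %d cycles\n", latency_cycles);  /* 525 */
    printf("peak throughput:  %.0f frames/sec\n", peak_fps);  /* ~1526, quoted as 1500 */
    return 0;
}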

3.2 DSP-based System

To evaluate the ASIC-based smart sensor, a µP-based reference system was implemented, with the TMS320C40 DSP (Texas Instruments) [3] chosen as processor. DSPs are well suited for image processing algorithms like optical flow calculation, as they offer a high I/O bandwidth, parallel instructions, fast internal memories, zero-overhead loops, etc.

The DSP system is shown in Figure 3. It consists of a standard video (PAL) CCD camera, a TMS320C40 module with an attached RGB frame grabber (Transtech), and a host PC. The DSP executes the image processing loop outlined in Figure 4. In line 2, the DSP transfers an image from the VRAM into an image buffer, optionally converting RGB to intensity. All subsequent image operations, shown in lines 3-6, are performed on image buffers. The host PC is connected to the DSP system via one of the DSP's bidirectional communication channels. The host is used to download the DSP program and as an I/O server. For debugging purposes, the image buffers and the optical flow components can be displayed on the PC.

Figure 3: Architecture of the DSP-based system (video camera, frame grabber with VRAM (R,G,B), TMS320C40 DSP, host PC on the ISA bus). In the multi-DSP system, the DSPs are connected by their bidirectional communication channels.
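In C, this loop might be structured as sketched below; all function and type names are hypothetical placeholders, since the paper does not list its source code.

/* Sketch of the image processing loop of Figure 4. All functions and
 * the Frame type are hypothetical placeholders. */
typedef struct { float px[128][128]; } Frame;

extern void grab_image(Frame *dst);            /* VRAM -> buffer, RGB to intensity */
extern void smooth(const Frame *in, Frame *out);
extern void derivatives(const Frame *prev, const Frame *curr,
                        Frame *It, Frame *Ix, Frame *Iy);
extern void flow(const Frame *It, const Frame *Ix, const Frame *Iy,
                 const Frame *ubar, const Frame *vbar, Frame *u, Frame *v);
extern void average(const Frame *u, const Frame *v, Frame *ubar, Frame *vbar);

void processing_loop(void)
{
    static Frame raw, prev, curr, It, Ix, Iy, u, v, ubar, vbar;
    for (;;) {                                     /* line 1: do forever */
        grab_image(&raw);                          /* line 2 */
        prev = curr;                               /* keep previous frame for It */
        smooth(&raw, &curr);                       /* line 3 */
        derivatives(&prev, &curr, &It, &Ix, &Iy);  /* line 4 */
        flow(&It, &Ix, &Iy, &ubar, &vbar, &u, &v); /* line 5 */
        average(&u, &v, &ubar, &vbar);             /* line 6 */
    }
}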

3.3 Multi-DSP-based System

The loop in Figure 4 is parallelized using the inherent data-parallelism of the image processing functions. The computation of the flow components is a point-wise operator; all other functions are neighborhood operators, either in time or space. This allows the partitioning of the input image into p subimages, assuming p processors are available.

1  do forever
2     grab image
3     smooth image I
4     compute derivatives It, Ix, Iy
5     compute flow components u, v
6     average flow components ū, v̄
7  end do

Figure 4: Basic image processing loop.

The DSP with the frame grabber attached is called the master DSP; it splits the original image into subimages by partitioning the vertical image dimension. The subimages are sent to the other DSPs in the system. Then each DSP executes lines 3-6 of the image processing loop in Figure 4 for its subimage. Finally, the optical flow components are sent back to the master DSP, which merges the p partial optical flow fields into the overall result. The architecture of the multi-DSP system is shown in Figure 3. The DSPs are connected by their bidirectional communication channels [5].
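The row partitioning performed by the master might look as follows; the one-row halo needed by the spatial neighborhood operators and all names are our assumptions, not taken from the paper.

/* Sketch of the master DSP's partitioning of the vertical image
 * dimension into p subimages (our illustration). Spatial neighborhood
 * operators need extra boundary rows; one halo row per side is assumed. */
#define H    128
#define HALO 1

typedef struct { int first_row, num_rows; } Slice;

/* Rows (including halo) that the master sends to DSP i of p. */
static Slice subimage(int i, int p)
{
    int base  = H / p;
    int rem   = H % p;
    int first = i * base + (i < rem ? i : rem);   /* distribute remainder */
    int rows  = base + (i < rem ? 1 : 0);

    int lo = first - HALO; if (lo < 0) lo = 0;    /* clip halo at borders */
    int hi = first + rows + HALO; if (hi > H) hi = H;

    Slice s = { lo, hi - lo };
    return s;
}

Each DSP would then run lines 3-6 of Figure 4 on its slice, with the master dropping the halo rows when merging the returned flow fields.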

4 Comparison, Conclusion

The comparison of the different systems is based on the criteria form factor, flexibility, and performance. Obviously, the ASIC-based smart sensor is minimizable in its form factors, whereas the software-based DSP solutions offer the advantage of greater flexibility. The performance of the systems is measured by throughput and latency. The throughput is the number of image frames processed in a given time interval; this parameter determines the maximum object velocities in the scene. The latency is the time required for the computation of one image frame. For applications where the computation of the optical flow is part of a control loop, the latency is usually the crucial parameter.

Table 1 compares the performance of the different implementations. The ASIC can be operated at clock frequencies of up to 25 MHz, leading to a throughput of 1500 frames/sec for an image of 128 × 128 pixel size. However, experiments with the presented smart sensor revealed that the imager limits the overall system's throughput to 50 frames/sec. The overall latency of the smart sensor is composed of the time to read one image from the imager (20 msec) and the latency of the pipelined ASIC (0.62 msec). In the DSP-based systems, the imager operates at a fixed frame rate of 25 frames/sec (PAL). Although macropipelining is used, i.e., the grabbing of an image and the computation of the optical flow are overlapped, the computations form the bottleneck. However, parallelizing the optical flow algorithm led to the promising speedups of S(2) = 1.93 and S(3) = 2.84.

system based on    throughput [frames/sec]    latency [msec]
ASIC               50                         20.62
1 DSP              5.16                       232.48
2 DSPs             10.05                      139.52
3 DSPs             14.75                      107.79

Table 1: Performance comparison of the different optical flow architectures for an image of 128 × 128 pixel size.
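As a rough cross-check (our arithmetic), speedups computed from the throughput column of Table 1 come out close to the reported values, which refer to the computation time alone:

#include <stdio.h>

int main(void)
{
    const double fps[] = { 5.16, 10.05, 14.75 };   /* 1, 2, 3 DSPs (Table 1) */
    for (int p = 2; p <= 3; p++)
        printf("S(%d) ~ %.2f\n", p, fps[p - 1] / fps[0]);
    /* prints S(2) ~ 1.95 and S(3) ~ 2.86, vs. the reported 1.93 and 2.84 */
    return 0;
}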

To conclude, our experiments revealed that the presented ASIC-based smart sensor achieves a throughput and a latency superior to the DSP-based solutions. This smart sensor provides the high performance required for applications in robotics. However, to fully exploit the ASIC's speed, it must be connected to a faster imager, i.e., a high-speed CCD array.

References

[1] S. S. Beauchemin and J. L. Barron. The Computation of Optical Flow. ACM Computing Surveys, 27(3):443-465, September 1995.
[2] B. K. P. Horn and B. G. Schunck. Determining Optical Flow. Artificial Intelligence, 17, 1981.
[3] M. Graves et al. High speed image processing using the TMS320C40 parallel DSP chip. In Proceedings of the SPIE, volume 2597, pages 70-81, 1995.
[4] N. Ancona et al. A real-time, miniaturized optical sensor for motion estimation and time-to-crash detection. In Proceedings of the AFPAEC, Europto '96, Berlin, October 1996.
[5] R. Simar et al. Floating-Point Processors Join Forces in Parallel Processing Architectures. IEEE Micro, pages 60-69, August 1992.
[6] T. Röwekamp and L. Peters. Intelligent Real-Time Sensor System for Optical Flow Estimation. In Proceedings of the 30th ISATA, Dedicated Conference on Robotics, Motion & Machine Vision, Florence, Italy, June 1997.