Motion Vision Sensor Architecture with Asynchronous Self-Signaling Pixels

Miguel Arias-Estrada, Denis Poussart
Computer Vision and Systems Laboratory
Dept. of Electrical and Computer Engineering
Laval University, Quebec, Qc, Canada, G1K 7P4
[ariasmig,poussart]@gel.ulaval.ca

Marc Tremblay
HexaVision Technologies inc.
2050 Rene-Levesque O., suite 101
Ste-Foy, Qc, Canada, G1V 2K8
[email protected]

Abstract

A custom CMOS imager with integrated motion computation is described. The architecture is based on correlating moving edges in time. Edges are located in time by a custom sensor and correlated in a coprocessing module. The sensor architecture is centered around a compact pixel with analog signal processing and digital self-signaling capabilities. The sensor pixels detect moving edges in the image and communicate their positions using an address-event protocol associated with temporal stamps. The coprocessing module correlates the edges and computes the velocity vector map. The motion sensor could be used in applications such as self-guided vehicles, mobile robotics and smart surveillance systems. The article details the motion sensor architecture, the simulated performance, the VLSI implementation and some preliminary results on fabricated prototypes.

1. Introduction

Visual motion information has varied applications. Detection of objects by relative motion, collision avoidance, tracking, focus-of-expansion and time-to-contact are tasks that need low-level motion processing. Autonomous vehicle and mobile robot applications demand dynamic visual feedback. Unfortunately, motion estimation algorithms are computationally expensive and require specific architectures for real-time execution. Mobile robot systems impose further constraints such as small size, low weight and low power consumption. Computational sensors [1], also called neuromorphic vision chips [2], integrate sensing and processing on a VLSI chip, or a set of intimately coupled chips, into a small-size, low-power, real-time computing front-end. The programming flexibility of a general-purpose computer is traded for the real-time performance and overall convenience of a computational sensor. Analog VLSI techniques are often used in the design of computational sensors because they interface well to the real world, require little silicon area and consume little power [3]. Low-precision analog computation can be exploited in the first stages of motion computation, and it can be combined with digital techniques to improve the overall sensor performance.

There have been several research prototypes of computational motion sensors. An extensive review is given by Koch and Li [4]. The pioneering work by Carver Mead's team implemented a motion-sensitive silicon retina by mapping a gradient-based algorithm to analog circuitry [5]. The first prototypes demonstrated the feasibility of the approach, but they were unsuitable for practical vision machines due to the inaccuracy of analog computation. Delbruck's motion silicon retina [6] was the first 2-D circuit to compute bidimensional motion, but it is tuned to only one spatiotemporal frequency. The circuit has interesting properties, but it is not suited for a general-purpose vision machine. Dron's architecture [7] implements a CCD-based imager with edge extraction capabilities and a digital system to correlate edge positions in time. Correlating edges in time is more robust since edges are insensitive to illumination changes. Etienne-Cummings et al. [8] developed a different approach for computing motion in a focal-plane sensor. Motion is calculated by measuring the time of travel of image edges. The architecture locates edges using spatial filtering and correlates them in time by measuring the time-of-travel between adjacent locations. The measurement is done at the pixel level by integration circuits triggered by the detected edges. The sensor performs well, but the pixel size compromises the sensor resolution. Analog computing circuits developed by Koch's team [9,10] compute motion by measuring the time-of-travel of edges. Edges are located in time and the temporal measurement is done on-pixel using neural-like circuits. Again, pixel area is compromised due to the capacitors required in the implementation.


In our work, a visual motion sensor was investigated. The final design explores an architecture composed of a custom motion detection sensor and a companion digital coprocessor for velocity computation. The sensor detects motion at the pixel level using analog signal processing techniques and a digital communication channel. Velocity is computed in the digital domain using temporal stamps in the coprocessing module. The architecture overcomes some of the limitations and difficulties encountered in previous approaches to motion vision sensors [12]. It is the first reported computational sensor that incorporates the event-address protocol for motion computation. The remainder of the paper is organized as follows: first, a description of the motion computation algorithm is given. Then the architecture is described, from the pixel level to the communication protocol, followed by an overview of the velocity computation process. The next section details the VLSI implementation. Finally, some preliminary results from fabricated prototypes are presented.

2. Motion Computation Algorithm

The motion computation architecture is based on the motion detection pair [8-12]. The particular implementation in our design is a modified version using a time-of-travel measurement (temporal correlation) [8-10]. The motion pair (see figure 1a) is formed by two detection units separated by a fixed distance. The inputs are temporally adaptive pixels [13] which respond to temporally varying signals (i.e. moving edges) without relying on absolute intensity levels. The temporal adaptive pixel outputs are converted to digital levels and the velocity module measures the time difference between the onsets of both branches. The velocity vector is computed from the velocity equation:

v = ∆x / ∆t                                (1)


where ∆x is the inter-pixel distance, and ∆t is the time difference measured by the velocity module. Edge velocity is assumed constant during the edge displacement between pixels. The condition applies for real images if ∆t is small.

Figure 1. a) Motion detection pair, b) Time-of-travel algorithm

In a bidimensional array, pixels can be grouped in sets of four to cover the X and Y directions, as illustrated in figure 2. Time differences are computed in the X and Y motion pairs. Furthermore, it is possible to compute the time differences of the motion pairs formed by the diagonal pairs. Based on the directions covered by the pairs, it is possible to measure velocity in four directions. The measured time differences can be interpolated to increase the accuracy of the vector computation. Only one velocity vector is computed for each set of four pixels.

The time-of-travel measurement has been implemented in several sensor prototypes [8-10]. A reference current is integrated (at each pixel) during the time-of-travel into a velocity-dependent voltage. The disadvantage of this approach is that the pixel design requires more components to implement the velocity computation circuitry, resulting in a large pixel area and a low-resolution sensor. Another option is to convert the edge signals into digital levels and monitor them with a digital timer, so that the resolution and accuracy of the measurement can be programmed. In a high-density pixel array, however, there is the problem of how to monitor the state of all pixels at the same time. To overcome this limitation, the event-address protocol was incorporated: the pixels have the capacity to communicate their coordinates when an illumination change is detected. An external temporal stamp is then assigned to the coordinates and used later to compute the time differences.
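To make equation (1) concrete, the following Python fragment sketches the velocity estimate of a single motion pair; the timing values, the pair_velocity name and the 1-pixel spacing are illustrative assumptions, not part of the sensor design.

```python
# Minimal sketch of the time-of-travel velocity estimate of equation (1).
# All names and numbers are illustrative, not taken from the sensor design.

def pair_velocity(t1, t2, dx=1.0):
    """Velocity of an edge crossing a motion pair.

    t1, t2 : times at which pixel1 and pixel2 signal the moving edge
    dx     : inter-pixel distance in pixels
    Returns a signed velocity (the sign gives the direction of travel).
    """
    dt = t2 - t1
    if dt == 0:
        raise ValueError("edge seen simultaneously; velocity out of range")
    return dx / dt

# Example: an edge reaches pixel1 at t = 10 ms and pixel2 at t = 15 ms,
# with a 1-pixel spacing -> 0.2 pixels/ms toward pixel2.
print(pair_velocity(10.0, 15.0))   # 0.2
```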


Figure 2. Four-pixel set and the four directions in which the motion pairs measure velocities

3. Motion Sensor Architecture

The Motion Vision Sensor is formed by a custom CMOS imager with self-signaling pixels and a companion digital module that computes the velocity vectors. Separating motion detection from velocity computation considerably simplifies the hardware of the motion computation process. Furthermore, the pixel design is simplified compared to previous approaches, allowing higher-resolution sensors with present-day microelectronic technologies. The velocity computation is implemented in a separate module, and the silicon area of the imager is devoted entirely to the sensor design. Communication between the sensor and the velocity module is completely asynchronous, with digital signals. The data transfer rate depends on the generation of motion-based events in the sensor.

3.1 Sensor

Figure 3 is a block diagram of the motion sensor. The architecture consists of an array of self-signaling pixels with an event-address communication protocol to send the pixel coordinates off-chip.

The pixels detect illumination changes in the image using analog VLSI techniques. If motion is detected, the pixel initiates a signalization cycle by sending requests through row and column lines to arbitration circuits.

The event-address system is implemented with two asynchronous arbiter trees [14, 15] that decide on request sequencing, avoiding collisions when multiple pixels signal motion simultaneously. Encoder circuits encode the signaling pixel position onto two coordinate buses. Arbiter circuits implement the off-chip communication with the handshake signals REQ and ACK. In addition, the architecture includes scanning circuitry to read out the illumination value from each pixel. The scanners select an individual pixel through row and column lines and route out a copy of the photoreceptor instantaneous response.

Figure 3. Functional block diagram of the motion sensor

The pixels are composed of a time-adaptive photoreceptor [13], analog conditioning circuitry and a 1-bit digital memory cell (see figure 4a). The photoreceptor uses adaptation to detect temporal illumination changes produced by moving edges in the image. The photoreceptor provides two outputs: an instantaneous illumination output, and a time-adapted output that responds with rapid voltage transitions when temporal illumination changes are sensed. The photoreceptor adaptive output is compared to its instantaneous response, then compared to an external threshold voltage (which sets the sensitivity to the edge spatiotemporal contrast) and finally converted to a digital level. This signal is then used to trigger the pixel memory cell, which indicates the detection of motion. The memory cell is interfaced to row and column lines used to initiate a request signal. Once the request has been served, the surrounding arbitration circuitry resets the memory state by asserting the row and column Acknowledge lines.
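As a rough behavioral sketch of the signal chain described above (an assumption, not the transistor-level circuit), the self-signaling behavior can be modeled in a few lines of Python; the adaptation factor, threshold value and class name are hypothetical.

```python
# Behavioral sketch (assumption) of the self-signaling pixel: an adaptive
# output tracks illumination slowly, and a motion event is latched when the
# instantaneous response departs from it by more than an external threshold.

class PixelModel:
    def __init__(self, v_thresh=0.1, tau=0.95):
        self.v_thresh = v_thresh   # sensitivity to edge spatiotemporal contrast
        self.tau = tau             # adaptation factor per time step (assumed)
        self.adapted = None        # time-adapted output (analogue of Vm)
        self.latched = False       # 1-bit memory cell state

    def step(self, v_inst):
        """Feed one sample of the instantaneous response (analogue of Vo)."""
        if self.adapted is None:
            self.adapted = v_inst
        # a rapid illumination change exceeds the threshold and sets the cell
        if abs(v_inst - self.adapted) > self.v_thresh and not self.latched:
            self.latched = True    # memory set -> row/column request asserted
        # slow adaptation back toward the instantaneous level
        self.adapted = self.tau * self.adapted + (1.0 - self.tau) * v_inst
        return self.latched

    def acknowledge(self):
        """Arbitration served the request: clear the memory cell."""
        self.latched = False
```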


Figure 4. a) Motion-based Self-signaling Pixel, b) 2:1 Cascadable Arbiter Module

The arbiter trees are built around a basic cascadable 2:1 arbitration cell operating asynchronously (see figure 4b). The arbiter module asserts only one acknowledge line if any of the input request lines (or both) are active. Request activity is passed to deeper levels by OR-ing both request signals (Ro). The arbiter module grants a request only if a deeper level enables the module through the Ao signal, that is, if the arbiter at a deeper level has decided on a request. Two encoders send out the coordinates of the signaling pixel during a transfer cycle. The communication process is coordinated externally through a request (REQ) and an acknowledge (ACK) line.

The motion sensor works as follows: when a moving edge triggers one or more pixels in the array, the pixels initiate a communication cycle. Only one pixel is served at a time. The pixel requests are served first by the Y-arbitration tree (row). Once a row is acknowledged, the X-arbitration tree (column) decides on a column and asserts the external REQ line. When the interrupt is detected by the external processor, the X-Y buses are enabled, communicating the pixel coordinates off-chip. External circuitry reads the pixel coordinates and asserts the ACK line to finish the communication cycle. The pixel is reset, releasing the REQ line and leaving the system ready for a new communication cycle.

The Motion Sensor operation is continuous in time. Communication is completely asynchronous, limited internally by the propagation delay of the arbitration and encoding circuitry, and externally by the speed of the coprocessing module in reading out the pixel coordinates.
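The decision logic of the cascadable arbiter cell can be sketched functionally as follows; this is a behavioral abstraction of the asynchronous circuit, and the tie-breaking order toward the first input is an assumption for illustration only.

```python
# Behavioral sketch of the cascadable 2:1 arbiter cell (figure 4b): Ro is the
# OR of the two requests, and one acknowledge is asserted only when the
# deeper level grants the cell through Ao. Tie-breaking is an assumption.

def arbiter_cell(r1, r2, ao):
    ro = r1 or r2                  # pass request activity to the deeper level
    a1 = ao and r1                 # grant side 1 if enabled and requesting
    a2 = ao and r2 and not r1      # otherwise grant side 2
    return ro, a1, a2

def arbitrate(requests):
    """Pick one requester out of a list of booleans with a binary tree."""
    if len(requests) == 1:
        return 0 if requests[0] else None
    mid = len(requests) // 2
    left, right = any(requests[:mid]), any(requests[mid:])
    ro, a_left, a_right = arbiter_cell(left, right, ao=True)
    if not ro:
        return None
    if a_left:
        return arbitrate(requests[:mid])
    return mid + arbitrate(requests[mid:])

# Example: requesters 2 and 5 are active simultaneously; only one is served.
print(arbitrate([False, False, True, False, False, True]))   # -> 2
```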

3.2 Velocity computation module

Velocity computation is carried out externally by a companion digital processor that serves the sensor requests (see Figure 5). The velocity computation module consists of a digital control module, a digital timer, RAM memory and a host interface. The control module serves the requests from the Motion Sensor and uses the timer values to generate the time stamps. For each motion request, the digital module assigns a temporal label (time-stamp) to the RAM location corresponding to the pixel coordinates. The time-stamps are used later to compute the time differences among neighbor pixels and obtain the velocity vectors.

The velocity computation process is executed at a fixed rate. Motion events are captured during a predefined period of time (measurement period) that depends on the scale of motion to be measured. The RAM memory is initially set to zero and then filled with the time stamps from the sensor-coprocessor communication process. At the end of the measurement period, the time-stamp list is scanned and a one-pass algorithm is applied to compute the velocity vectors. A velocity vector table is generated and sent to a host computer through the system interface.

The Motion Vision Sensor is time-scale programmable: the time reference can be programmed to the range and resolution of the velocity computation required for a specific application. Pixel self-signaling allows asynchronous communication at high transfer rates, on the order of microseconds. On the other hand, the time constants of image feature motion are on the order of tens of milliseconds, providing a wide time interval for multiple pixel data transfers. Bottleneck problems are not a concern, even for dense pixel arrays. The communication channel is used efficiently since data is transmitted only when motion is detected.

Figure 5. Velocity computation module. The module assigns a temporal stamp to each pixel request, used later to compute the time differences and the velocity vectors with a one-pass algorithm. One velocity vector is computed for each set of four pixels.
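The sensor-coprocessor exchange of Figure 5 could be sketched as below; the sensor and timer interfaces (read_req, read_xy, assert_ack, now) are hypothetical placeholders, and a polling loop is used for simplicity where the actual module is interrupt-driven.

```python
# Sketch of the coprocessor side of the protocol in Figure 5. The sensor and
# timer objects are assumed stand-ins for the real microcontroller I/O.

def capture_frame(sensor, timer, frame_period):
    """Fill a time-stamp map during one measurement period (frame)."""
    stamps = {}                          # (x, y) -> time stamp (RAM model)
    t_start = timer.now()
    while timer.now() - t_start < frame_period:
        if sensor.read_req():            # REQ asserted by a signaling pixel
            x, y = sensor.read_xy()      # coordinates from the 6-bit buses
            stamps[(x, y)] = timer.now() # temporal label for this event
            sensor.assert_ack()          # finish the cycle; the pixel resets
    return stamps                        # scanned later by the one-pass pass
```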

4. Velocity computation

In a motion pair monitored by a digital timer, the minimum and maximum velocity values are limited by the maximum and minimum time differences measured, respectively. The velocity range is given by:

Vmax = ∆x / ∆tmin                          (2)

Vmin = ∆x / ∆tmax                          (3)

where ∆x is the distance between adjacent photoreceptors, equal to 1 pixel for a 1-D array, and ∆tmax and ∆tmin are the maximum and minimum measured time-of-travel (time difference). Additionally, the measurement of ∆t is confined to a fixed measurement period, called a frame hereafter for convenience, although the concept of frames as in digital sequences is not strictly valid, because edge detection and velocity computation take place in the continuous time between frames.

The minimum time measurement ∆tmin corresponds to the smallest time step (Tstep) used in the time base (digital timer), but it corresponds to an extremely large velocity not used in practical systems. Properly chosen, the minimum time step can be far from the noise limit imposed by the system physics. The maximum velocity that the motion pair can measure has to be limited to avoid spatial aliasing effects. Spatial aliasing occurs when an edge triggers more than one motion pair during the same measurement period, reporting false velocity vectors as a trace; the maximum velocity therefore corresponds to a measurement of only a few time steps of the digital time reference. ∆tmax defines the minimum velocity measured. Given that the minimum velocity a motion pair can measure is 1 pixel/frame, ∆tmax sets the maximum count allowed in the digital time reference during a frame.

The measurement period, or frame, is thus divided into discrete time steps:

TF = NQ · Tstep                            (4)

where TF is the measurement period corresponding to a frame (it can last any arbitrary period of time), NQ is the number of quantization steps needed to measure the minimum velocity of Vmin = 1 pixel/frame, and Tstep is the duration of the quantization step programmed in the time base, that is, the time it takes the timer counter to increment by one count.

In high-density arrays where motion pairs are adjacent to each other, the maximum velocity allowed to avoid spatial aliasing effects is 3 pixels/frame. In a cartesian topology (see figure 2), motion pairs can be formed by grouping the pixels along the X and Y axes (motion pairs 1-2, 3-4, 2-3 and 1-4). Additionally, pairs oriented at 45° (motion pairs formed by pixels 1-3 and 2-4) can be used to enhance the angular resolution. The 360° range is divided into 8 angular regions of 45° each; each motion pair covers an angular resolution of 45°, distributed symmetrically around its own axis in a ±22.5° interval.

Figure 6 presents results from velocity computation simulations. An edge travels over the four-pixel set at different velocities (from 1 to 3 pixels/frame) and with different incidence angles covering 360°. Velocity is computed from the motion pair that measures the largest time difference; this measurement corresponds to the motion pair detecting the edge whose normal travels within ±22.5° of the motion pair axis. Diagonal motion pair measurements are corrected for the inter-pixel distance by considering ∆x = √2 pixels. Circular diagrams (figure 6, left) illustrate the velocity accuracy for two different quantization resolutions (NQ). Angular performance diagrams (figure 6, right) display the measured angle and the error due to edges not traveling perpendicular to a motion pair. Irregularities in the curve with NQ=16 are due to the poor quantization resolution, as confirmed by the more regular curve obtained with NQ=64. Higher values of NQ increase the accuracy of the velocity computation but reduce the time window within which motion events receive the same time-stamp. A good compromise is NQ=64.
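A possible sketch of the per-set computation follows; the selection of the pair with the largest time difference and the √2 diagonal correction follow the description above, but the pixel numbering, pair orientations and function names are illustrative assumptions based on figure 2.

```python
import math

# Sketch of the per-set velocity vector computation. Pixel numbering and pair
# orientations are assumptions based on figure 2; the paper only states that
# the X, Y and diagonal pairs are used and that the pair with the largest
# time difference defines the vector.

PAIRS = {                           # (pixel_a, pixel_b): (angle in deg, dx in pixels)
    (1, 2): (0.0,   1.0),           # X pair
    (1, 4): (90.0,  1.0),           # Y pair
    (1, 3): (45.0,  math.sqrt(2)),  # diagonal pair
    (2, 4): (135.0, math.sqrt(2)),  # diagonal pair
}

def set_velocity(stamps, t_step):
    """stamps: {pixel_id: integer time-stamp in units of Tstep}."""
    best = None
    for (a, b), (angle, dx) in PAIRS.items():
        if a in stamps and b in stamps:
            dt = stamps[b] - stamps[a]
            if dt != 0 and (best is None or abs(dt) > abs(best[0])):
                best = (dt, angle, dx)
    if best is None:
        return None
    dt, angle, dx = best
    speed = dx / (abs(dt) * t_step)          # equation (1)
    if dt < 0:                               # edge moved from b toward a
        angle = (angle + 180.0) % 360.0
    return speed, angle

# Example with Tstep = 1 ms: an edge crossing the set along +X.
print(set_velocity({1: 10, 2: 42, 3: 42, 4: 10}, t_step=1e-3))  # (31.25, 0.0)
```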


Figure 6. Velocity computation based on temporal differences in a four-pixel set. Edges incident at different angles over 360° and velocities between 1 and 3 pixels/frame. a) NQ = 16 quantization points, 0.25 pixels/frame intervals. b) NQ = 64 quantization points, 0.2 pixels/frame intervals.

Figure 7. High-level simulation using real images to validate the architecture. a) Translational motion, camera moving to the left. b) Divergent motion, camera approaching the tree.

The Motion Vision Sensor has been simulated in software to test the algorithmic performance with real image sequences. Several image sequences were used to test and validate the velocity computation process. Simulation results from two sequences are illustrated in Figure 7. The first sequence contains translational motion, which is successfully recovered with some noise caused by random illumination changes in the background texture; false signalizations are due to the low contrast of some features. The second sequence shows divergent motion produced by a camera approaching the scene. The algorithm achieves a good estimation of the motion. The architecture does not deal with the aperture problem, but the velocity field provided by the sensor can easily be processed by higher levels of the system. With real-time operation, further improvements are possible by integrating the motion results from several frames over time, eliminating false detections.

5. VLSI Implementation

The architecture was validated using simulations at different levels. At the pixel level, analog simulations using HSPICE were performed, and the results were confirmed with early VLSI prototypes. The communication protocol and the architecture performance were validated using an HDL. Finally, the whole system was simulated with a programming language to validate the motion computation algorithm.

The architecture was implemented in a 1.5-micron, 2-metal, 2-poly CMOS process. The fabricated prototype is shown in figure 8. The pixel occupies an area of 100 x 100 µm² and contains 33 transistors, including illumination read-out circuitry. The pixel design minimizes the use of capacitances, which do not scale as well as transistors in high-performance microelectronic processes.

The latest prototype is a sensor with a 48 x 48 pixel resolution in a total area of 6.1 x 6.1 mm². Pixel coordinates are encoded on two 6-bit buses and communicated off-chip with the Request and Acknowledge lines. The sensor also integrates scanning circuitry to read out the illumination image through a dedicated analog output line.
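As a rough consistency check (not from the paper), the following arithmetic relates the 100 µm pixel pitch to the reported die sizes, assuming the pitch scales roughly linearly with the process feature size and ignoring differences in periphery overhead.

```python
# Back-of-envelope check (assumption: pixel pitch scales roughly linearly
# with the feature size, periphery overhead kept similar).

pitch = 100e-6                          # 100 um pixel pitch (1.5 um process)
core_48 = 48 * pitch                    # 4.8 mm of pixel core in the 48x48 chip
print(f"48x48 core: {core_48 * 1e3:.1f} mm (reported die side: 6.1 mm)")

pitch_scaled = pitch * (0.5 / 1.5)      # assumed pitch in a 0.5 um process
core_256 = 256 * pitch_scaled           # ~8.5 mm, consistent with a 10x10 mm2 die
print(f"projected 256x256 core: {core_256 * 1e3:.1f} mm per side")
```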

The sensor is mounted in a custom camera for testing purposes. The velocity computation module is implemented with a high-performance 8-bit microcontroller. The microcontroller serves the sensor requests using interrupt routines, and time-stamps are generated with one of the microcontroller timers. The microcontroller serial interface is used to communicate the velocity data to a host computer. Figure 9 presents the custom camera in which the prototypes are being tested.

Figure 8. Fabricated prototype

Figure 9. Prototype camera for sensor tests

A test of an individual pixel is shown in figure 10. The sensor is illuminated with an LED pulsing at low frequency (1 to 30 Hz). The LED is fed with a square wave whose rise and fall times are controlled to simulate different edge/background contrasts. The instantaneous output (Vo) follows the illumination changes, while the time-adapted output (Vm) responds to the edge with a small spike and then adapts to the illumination level. Both signals are compared with a non-linear comparator and the output is used to trigger the static RAM. The comparator output (Vc) is too narrow to be seen. There are some false comparisons on the negative transitions, but they do not affect the static RAM thanks to a special mechanism that avoids retriggering. The state of the memory cell is depicted as an inverted signal (Ry) used to pull down the row request line. The SRAM is set correctly at positive and negative edge transitions, and is reset by an external signal simulating the acknowledge. The whole sensor is under test at the time of writing; results will be presented in another forum.

Figure 10. Self-signaling pixel measurements. Pulsing light at 5 Hz simulating a moving edge. The memory is triggered on the positive and negative illumination transitions (see text).

6. Conclusions

We have developed an alternative architecture for motion computation, combining neuromorphic approaches with asynchronous communication and digital processing. The sensor detects moving edges and communicates their positions in real time. An address-event protocol, combined with a time-stamp scheme, labels each motion event for later velocity computation. Separating velocity computation from motion detection simplifies the sensor pixel design without sacrificing performance, giving the system an optimal pixel size for large-array applications. The use of temporal stamps in the digital domain eliminates the problems associated with on-pixel analog computation of velocity vectors, which is prone to low precision and consumes a large area. High-resolution sensors are feasible with current microelectronics technologies: for example, with a 0.5-micron CMOS process, a 256 x 256 sensor design would fit on a 10 x 10 mm² die. The architecture is a balanced combination of analog VLSI and digital techniques, and the design could be integrated effectively with other focal-plane architectures for smart imagers or complex vision machines.

References

[1] T. Kanade and R. Bajcsy, editors. Computational Sensors, Report from DARPA Workshop, University of Pennsylvania, May 11-12, 1993.
[2] C. Koch, B. Mathur. Neuromorphic vision chips, IEEE Spectrum, May 1996, pp. 38-46.
[3] E.A. Vittoz. Analog VLSI signal processing: why, where and how?, VLSI Signal Processing, vol. 8, pp. 27-44, 1994.
[4] C. Koch and H. Li, editors. Vision Chips: Implementing Vision Algorithms with Analog VLSI Circuits. IEEE Computer Society Press, 1995, p. 511.
[5] J. Tanner and C. Mead. Optical motion detector, in C. Mead (editor), Analog VLSI and Neural Systems, Reading, MA: Addison-Wesley, pp. 229-255, 1989.
[6] T. Delbruck. Silicon retina with correlation-based velocity-tuned pixels, IEEE Trans. on Neural Networks, 4(3), pp. 529-541, May 1993.
[7] L. Dron. Computing 3D motion in custom analog and digital VLSI, Ph.D. thesis, Massachusetts Institute of Technology, 1994.
[8] R. Etienne-Cummings, S. Fernando, N. Takahashi, V. Shtonov, J. Van der Spiegel, P. Mueller. A New Temporal Domain Optical Flow Measurement Technique for Focal Plane VLSI Implementation, Proc. Computer Architectures for Machine Perception, pp. 241-250, 1993.
[9] J. Kramer, R. Sarpeshkar, C. Koch. An analog VLSI velocity sensor, Proc. ISCAS, vol. 1, pp. 413-416, 1995.
[10] J. Kramer, R. Sarpeshkar, C. Koch. Analog VLSI motion discontinuity detectors for image segmentation, Proc. of the IEEE Symposium on Circuits and Systems, vol. 2, pp. 620-623, 1996.
[11] M. Arias-Estrada, M. Tremblay, D. Poussart. A Focal Plane Architecture for Motion Computation, Journal of Real-Time Imaging, Special Issue on Special-Purpose Architectures for Real-Time Imaging, 2(6), pp. 351-360, Dec. 1996.
[12] M. Arias-Estrada, M. Tremblay, D. Poussart. Computational Motion Sensors for Autoguided Vehicles, 30th ISATA, dedicated conference on Robotics, Motion and Machine Vision in the Automotive Industry, pp. 101-108, June 1997.
[13] T. Delbruck and C. Mead. Phototransduction by Continuous-Time, Adaptive, Logarithmic Photoreceptor Circuits, Tech. Rep., California Institute of Technology, Computation and Neural Systems Program, CNS Memorandum 30, Pasadena, CA 91125, 1994.
[14] J. Lazzaro, J. Wawrzynek, M. Mahowald, M. Sivilotti, D. Gillespie. Silicon auditory processors as computer peripherals, IEEE Transactions on Neural Networks, 4(3), pp. 523-527, May 1993.
[15] M. Mahowald. An Analog VLSI System for Stereoscopic Vision, Kluwer International Series in Engineering and Computer Science, Kluwer Academic Publishers, 1994, p. 215.

Acknowledgments

Research on computational sensors for motion is supported by Project MS-4 of the Institute of Robotics and Intelligent Systems of Canada. Design tools and fabrication access were provided by the Canadian Microelectronics Corporation. Miguel Arias-Estrada was supported financially by a CONACYT-Mexico postgraduate scholarship and a grant from the Universidad de Guanajuato, Mexico.