Embedded Software for Autonomous Vehicle Control Using Optical Sensing: SEN002 Contributions and Futures

Greg Michaelson(1), Andy Wallace(2), Kevin Hammond(3), Armelle Bonenfant(3), Zezhi Chen(1), Christoph Herrmann(3) and Benjamin Gorry(1)

(1) Computer Science, Heriot-Watt University
(2) Electrical & Computer Engineering, Heriot-Watt University
(3) School of Computer Science, University of St Andrews

Abstract

The SEN002 project explored both the development of models and analyses of predictable space and time bounds for the Hume programming language, and their deployment in developing novel image processing and control software for AV use. This paper surveys progress in new algorithms for human and vehicle tracking within images and in new techniques for modelling and analysing resource use, presents evaluations of both Hume and automatic worst-case execution time analysis, and looks forward to future research building on the project's contributions.

Keywords: embedded systems, resource bounds, worst-case execution time, motion tracking, AV control.

Introduction

The SEN002 project, 'Embedded software for autonomous vehicle control using optical sensing', was supported by the SEAS DTC from September 2005 to November 2007. Its objectives were to:

1. model time and space resource consumption for high-level language constructs;

2. construct formally verifiable automatic program analyses based on these resource cost models, thereby enhancing system integrity;

3. develop new high-level algorithms for ego-motion computation, target tracking and environmental mapping for autonomous vehicle control, which meet strong resource requirements, and which use our language technology;

4. extend models and analyses to incorporate external components written using either our own or more conventional approaches (such as Matlab-generated C source code).

Connections between these objectives are shown in Figure 1. We have made significant progress against all of our objectives, as reported in [1,2]. Here we survey new results in algorithms, models and analyses, and report on the use of Hume as a language for image processing and on the accuracy of automatic worst-case execution time (WCET) analysis.

Figure 1: SEN002 Objectives

Algorithms

Since the last SEAS DTC presentation [2], we have made a number of changes and improvements to the basic mean-shift [3] and level set [4,5] methodologies, developing a hybrid approach to track humans or other deforming subjects through video sequences. In general, the subject is defined by an enclosing but deforming contour and a colour distribution within that contour; the background may be static (fixed camera) or moving (panning camera) and is defined by another colour distribution. The colour distributions within the foreground and background may change due to a different viewpoint or a change of illumination. There are two algorithmic developments. First, we have made the colour model adapt to changes in the foreground and background distributions; second, we have included a motion vector field as an additional discriminator to cope with cases where the foreground and background colour distributions are very similar. Each of these is important when both the camera and the subject are moving.
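As a point of reference for the mean-shift half of the hybrid, the following is a minimal Python sketch of the iteration (not the project's Hume code): it moves a fixed-size window to the weighted centroid of per-pixel target likelihoods. The Epanechnikov/NCDT kernel weighting is omitted for brevity, the window is assumed to stay inside the frame, and all names are illustrative.

    import numpy as np

    def mean_shift_step(weights, ys, xs):
        """One mean-shift iteration: move the window centre to the
        weighted centroid of the pixel weights inside it."""
        total = weights.sum()
        if total == 0:
            return None  # no support: keep the previous centre
        return (ys * weights).sum() / total, (xs * weights).sum() / total

    def track(frame_weights, centre, half_h, half_w, max_iter=20, eps=0.5):
        """Iterate mean-shift until the centre moves less than eps pixels.
        frame_weights[y, x] is the likelihood that pixel (y, x) belongs
        to the target (e.g. a histogram back-projection ratio)."""
        cy, cx = centre
        for _ in range(max_iter):
            y0, y1 = int(cy - half_h), int(cy + half_h + 1)
            x0, x1 = int(cx - half_w), int(cx + half_w + 1)
            win = frame_weights[y0:y1, x0:x1]
            ys, xs = np.mgrid[y0:y1, x0:x1]
            step = mean_shift_step(win, ys, xs)
            if step is None:
                break
            ny, nx = step
            moved = (ny - cy) ** 2 + (nx - cx) ** 2
            cy, cx = ny, nx
            if moved < eps ** 2:
                break
        return cy, cx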

Review of the Level Set Method

As described earlier [2], our methods included the minimization of an energy-based functional, based initially on colour distributions, to perform segmentation and tracking. Describing image foreground and background regions by a variational model increases the flexibility of the representation, allowing additional features such as shape knowledge, texture and motion vectors. Originally, we assumed a-priori knowledge of the colours of the object to be isolated. Given an N-channel image I = (I^1, ..., I^N) and a set of M different colours/intensities c = (c_1, c_2, ..., c_M), each c_i (i = 1, ..., M) is a vector of length N. The foreground and background colours of the kth channel are c^k_{fg} = (c^k_{f_1}, ..., c^k_{f_{R_f}}) and c^k_{bg} = (c^k_{b_1}, ..., c^k_{b_{R_b}}), with R_f + R_b = M. The energy formulation has the form:

    E(C) = \mu \cdot \mathrm{length}(C)
         + \lambda_{fg} \iint_{\Omega_{fg}} F_{fg}(I(x,y), c_{fg}) \, dx \, dy
         + \lambda_{bg} \iint_{\Omega_{bg}} F_{bg}(I(x,y), c_{bg}) \, dx \, dy    (1)

where C is the boundary curve, \Omega_{fg} = c^k_{f_1} \cup ... \cup c^k_{f_{R_f}} is the foreground (object) inside C (shaded in Figure 2), and its complement \Omega_{bg} = c^k_{b_1} \cup ... \cup c^k_{b_{R_b}} is the background outside C.

Figure 2: An image with N channels and a set of M different colours
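As a concrete reading of equation (1), here is a minimal numerical sketch in Python, assuming F_fg and F_bg are squared distances to representative region colours (one common choice; the paper leaves F abstract), the contour is given as a boolean foreground mask, and length(C) is approximated by counting the mask's boundary pixels. All names are illustrative, not the project's implementation.

    import numpy as np

    def region_energy(image, fg_mask, c_fg, c_bg, mu=1.0, lam_fg=1.0, lam_bg=1.0):
        """Discrete analogue of equation (1) for an N-channel image.

        image:      H x W x N array
        fg_mask:    H x W boolean array, True inside the contour C
        c_fg, c_bg: representative foreground/background colours (length N)
        """
        # F_fg / F_bg as squared colour distances, summed over channels
        d_fg = ((image - np.asarray(c_fg)) ** 2).sum(axis=-1)
        d_bg = ((image - np.asarray(c_bg)) ** 2).sum(axis=-1)

        # Approximate length(C) by counting mask boundary crossings
        m = fg_mask.astype(int)
        boundary = (np.diff(m, axis=0) != 0).sum() + (np.diff(m, axis=1) != 0).sum()

        return (mu * boundary
                + lam_fg * d_fg[fg_mask].sum()
                + lam_bg * d_bg[~fg_mask].sum())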

Adaptive Tracking

The main objective in using colour is to distinguish an object from the background. The problem, particularly with a panning camera and variable illumination, is that the colour distributions can change as tracking proceeds through a sequence. To adapt to changing colour distributions, we use a 'centre-surround' approach to sample pixels from the object and the background. In the first implementation we employed a 16-component adaptive colour model [6] to select the best colour space model (CSM). We sorted the several candidate CSMs using the Bhattacharyya coefficient, an approximate measurement of the amount of overlap between the foreground and background distributions. An inner, rectangular set of pixels is chosen to represent the object, while an equal-area surrounding rectangular annulus of pixels represents the background: for an internal rectangle of size h x w pixels, an outer margin of width (\sqrt{2} - 1)hw/2 pixels forms the background sample. Rather than use the standard Epanechnikov kernel, we also used a kernel weighted by the Normalised Chamfer Distance Transform (NCDT) to improve the accuracy of target representation and localization, as discussed in [7]. However, we have subsequently replaced the 16 components, which are not statistically independent, with 5 different colour spaces, i.e. RGB, Luv, YIQ, YCbCr and Lab. Although the latter is more justifiable, in practice there is little difference between the two models, either in accuracy of tracking or in processing time per frame.
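A minimal sketch of this Bhattacharyya-coefficient ranking, assuming normalised histograms of the inner (object) and surrounding (background) samples have already been computed per colour space; a lower coefficient means less overlap, i.e. a more discriminative colour space. Function and variable names are illustrative.

    import numpy as np

    def bhattacharyya(p, q):
        """Bhattacharyya coefficient between two normalised histograms:
        sum_i sqrt(p_i * q_i), in [0, 1]; 0 = disjoint, 1 = identical."""
        return np.sqrt(p * q).sum()

    def rank_colour_spaces(fg_hists, bg_hists):
        """fg_hists/bg_hists map a colour-space name (e.g. 'RGB', 'Luv')
        to a normalised histogram of the object / background samples.
        Returns names sorted most-discriminative first."""
        scores = {name: bhattacharyya(fg_hists[name], bg_hists[name])
                  for name in fg_hists}
        return sorted(scores, key=scores.get)  # lowest overlap first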

Figure 3: (a) A sample image with concentric boxes delineating the object and background. (b)-(f) The 5 rank-ordered images in YCbCr, RGB, YIQ, Lab and Luv colour spaces. (g)-(j) Applying the adaptive, hybrid technique to the video sequence

Figure 3(a) shows a sample image from an extended sequence with concentric boxes delineating the object and background. The set of all 5 candidate images after rank-ordering the features is shown in Figure 3(b)-(f); the images with the most and least discriminative features are 3(b) and 3(f) respectively. Figures 3(g)-(j) show that the algorithm is robust to clutter and occlusion: two people cross in the third picture, yet the algorithm adapts the contour to track the non-occluded portion of the woman, then re-grows the contour as she re-emerges from behind the man.

Including Motion Vectors

In the above, we used colour as the primary property to distinguish foreground from background. The motion competition model is based on the assumption that objects are defined in terms of homogeneously moving regions, extending the Mumford-Shah [8] functional of piecewise constant intensity to piecewise parametric motion [9]. Hence, the level set method can be adapted to incorporate motion field information given two or more consecutive frames from an image sequence. A simple way to do this is to include the motion vector field within the energy formula (1). In these experiments we used a 5-channel image (R, G, B, Vx, Vy), where Vx and Vy are the x and y components of the motion vectors respectively. We did not use the adaptive colour model.

Figures 4-6 show experimental results from a real image sequence, tracking a vehicle. The segmentation results without and with motion vectors, in addition to colour, are shown in Figures 4 and 5 respectively. The key observation is that segmentation on the basis of colour alone includes both the car and the cube, but if the motion vector field is also included, then the car is uniquely segmented as the region of interest. When the car is tracked using the hybrid mean-shift and level set method through the whole sequence, the shape of the car is maintained as it passes the cube, as shown in Figure 6.
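A minimal sketch of assembling the 5-channel (R, G, B, Vx, Vy) image from two consecutive frames. The paper does not name a flow estimator, so OpenCV's dense Farnebäck optical flow is used here purely as a stand-in; the function and array names are illustrative.

    import cv2
    import numpy as np

    def five_channel_image(prev_bgr, next_bgr):
        """Stack colour and motion into one H x W x 5 array so the
        level set energy (1) can discriminate on both cues at once."""
        prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
        next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
        # Dense optical flow: flow[..., 0] = Vx, flow[..., 1] = Vy
        flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                            pyr_scale=0.5, levels=3,
                                            winsize=15, iterations=3,
                                            poly_n=5, poly_sigma=1.2, flags=0)
        return np.dstack([next_bgr.astype(np.float32), flow])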

Comparing Segmentation and Tracking Algorithms in Matlab and Hume

First, we consider a comparison of the basic mean shift algorithm [3]. In comparison with results presented previously, we have a new, improved Hume compiler at our disposal, developed by Robert Pointon as part of an EU programme. The input data is a sequence of 320x240x3 colour images; the unit of execution time is one second, and the timings include both I/O and processing. The Matlab code was run on a PC with a Pentium 4 CPU at 3.40 GHz and 1GB RAM. The Hume and C++ programs were run on a BWLF01 server with a Pentium 4 CPU at 3.00GHz and 507MB RAM. The results are given in Table 1. As can be seen, the new compiler yields a significant increase in processing speed over the previously reported results, even when I/O is included. For example, it is now possible to track an object within a video sequence at a rate of 8.23 frames.s-1, rather than 0.97 frames.s-1 using the old compiler; this increases to 15.38 frames.s-1 if I/O is not included. Clearly, this is by no means comparable with straight C++ code, which can track 1480 frames.s-1 if I/O is not included, but it is nevertheless useful for experimental work. Similarly, if the image and kernel sizes are changed, then the processing times change. For a small grey-scale image of 32x24 pixels and a target of 19x7 pixels, processing is much faster than with the larger colour image under the 2007 Hume compiler: 223 frames.s-1 including I/O, rising to 249 frames.s-1 without I/O. This code can be altered relatively easily for a smaller image kernel, which may be necessary to run it as an exemplar on a Renesas architecture for full cost analysis.


Figure 4: Segmentation result without motion vectors

Figure 5: Segmentation result with motion vectors

Figure 6: Tracking the car with motion vectors


Table 1: Comparison of execution rates in frames.s-1 for segmentation and tracking using the 2007 Hume compiler

    Execution / Language   Including I/O   Not including I/O
    Humec                  8.23            15.38
    Matlab                 4.705           75
    C++                    6.274           1480

Models and Analyses

Our work aims to construct fully automatic source-level static Worst Case Execution Time (WCET) analyses that are correlated to actual execution costs. Since we need to provide formal guarantees on WCET bounds, we base our work on a high-quality abstract interpretation approach (AbsInt GmbH's aiT tool [10]) to give low-level timing information for bytecode instructions. We combine this with an equally formal, type-based approach that lifts this information to higher-level language constructs so that it can be applied to source programs, without reanalysing bytecode timings for individual programs. This amortised cost approach (e.g. [11]) allows costs to be averaged according to use. The basic intuition is that by amortising over the time costs incurred by common usage patterns (e.g. that for a stack, every push is balanced by a pop), we can construct timings that reflect real worst-case times more accurately.

We have constructed formally-correct, type-based automatic analyses for determining worst-case execution-time costs, based on the amortised cost approach described in [12], and produced a prototype implementation. Each construct in the source program is given a type using a normal type-inference algorithm. At the same time, the usage of potential is calculated for that expression. (Internal) cost variables are automatically associated with each (sub-)expression, and the analysis generates a set of constraints over those variables that give an upper bound on the WCET. The constraint set is solved using a (currently) linear equation solver, and concrete cost solutions are mapped back to the source program. In this way, WCET costs are associated with each expression and each function used in the program. For more details, see [12].
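To make the stack intuition concrete, here is the standard potential-function argument (textbook amortisation in the style of [11], not SEN002-specific figures): take the potential \Phi to be the current stack depth, so each operation's amortised cost is its actual cost plus the change in potential:

    \hat{c} = c + \Phi_{\mathrm{after}} - \Phi_{\mathrm{before}},
    \qquad
    \hat{c}_{\mathrm{push}} = 1 + 1 = 2,
    \qquad
    \hat{c}_{\mathrm{pop}} = 1 - 1 = 0.

Any sequence of n pushes and pops starting from an empty stack therefore costs at most 2n, however the operations interleave; the WCET analysis applies the same idea with clock cycles as the cost measure.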

Example: Drilling Robot

Our case study here is the simulation of a robot for drilling printed circuit boards. The robot can move a drilling head in fixed-sized increments (here represented as integers) in two dimensions:

    pos_ok (xpos, ypos) = if xpos==0 && ypos==0 then 1 else 0;   -- 1 iff the head is back at the origin

    step (xpos, ypos, actions, dps) =
      case actions of
        []      -> (dps, pos_ok (xpos, ypos))                 -- done: drilled positions, origin check
      | (A:acs) -> step (xpos, ypos, acs, ((xpos,ypos):dps))  -- drill: record current position
      | (L:acs) -> step (xpos-1, ypos, acs, dps)              -- move left
      | (R:acs) -> step (xpos+1, ypos, acs, dps)              -- move right
      | (U:acs) -> step (xpos, ypos-1, acs, dps)              -- move up
      | (D:acs) -> step (xpos, ypos+1, acs, dps);             -- move down

At each position, the drilling head can perform a drilling action. After all holes have been drilled, the drilling head is to be moved back to its starting point. The robot can perform five operations: A is the drilling action; the other actions move the head by one position: L leftwards, R rightwards, U up and D down.

Amortisation allows us to avoid calculating the worst-case cost of a list of actions as the length of the list multiplied by the worst-case cost of any individual action. Our amortised analysis gives a WCET bound in clock cycles that depends on the number of occurrences #X of each action constructor X in the input.
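The concrete bound is elided in this excerpt. Schematically, a constructor-counting bound of this kind has the shape

    \mathit{WCET} \le K_{0} + K_A \cdot \#A + K_L \cdot \#L + K_R \cdot \#R + K_U \cdot \#U + K_D \cdot \#D

where the K coefficients here are placeholders standing for the aiT-derived per-constructor cycle costs, not the project's actual figures. Note that for pos_ok to succeed the head must end at the origin, so #L = #R and #U = #D over any accepted action list; this balancing of moves is exactly the usage pattern the amortisation exploits.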