Extracting Interesting Vehicle Sensor Data Using Multivariate Stationarity

Kari Torkkola, Keshu Zhang, Chris Schreiner
Motorola Labs, 2900 S. Diablo Dr., MD ML286, Tempe, AZ 85282
[email protected]

Noel Massey
Motorola Labs, 1295 E. Algonquin Rd., Schaumburg, IL 60196
[email protected]

Abstract— Unsupervised modeling of sequentially sampled sensor data typically allocates modeling resources in proportion to the occurrence of different phenomena in the training data. This is a problem when most of the data is uninteresting but rare interesting events exist. As a consequence of this imbalance, the rare events either will not be well represented in the model or an undesirably large model will be needed to satisfy performance measures. We present an approach that resamples the data in proportion to its interestingness, where interestingness is defined as the multivariate stationarity of a weighted set of important variables. We present a case in modeling vehicle sensor data with the intent of modeling driver actions and traffic situations. We analyzed driving simulator data with this approach and report results where instance selection using the interestingness filtering resulted in models that correspond much better to human classification of different driving situations.

I. INTRODUCTION

Instance selection refers to the transpose problem of feature selection common in data mining and machine learning: the selection of data instances in order to condense a large data set to a manageable size or to preserve some important aspects of the data set [8]. The subject of this paper belongs to the latter category and is concerned with observing a stream of sensor data from which some rare incidents or anomalies need to be detected. Such problems include, for example, network intrusion detection, surveillance, face detection, and driver monitoring.

Constructing accurate models of such data is a challenge for the following reason. Unsupervised modeling, such as clustering, of sequentially sampled sensor data typically results in modeling resources being allocated in proportion to the occurrence of different phenomena in the training data. If most of the training data is uninteresting, most of the model may represent the uninteresting data, too.

We present a case in modeling the sensor data from an automobile with the intent of modeling driver actions and traffic situations. Our aim is to construct sequential, hierarchical, and stochastic models of this data for the purpose of automatically detecting situations that might require assisting the driver or changing the way information is presented to the driver.

We first describe the driving modeling problem based on a hierarchical approach. Subunits of the driving data are discovered by clustering. This requires a balanced training data set.

In order to construct such a set, we then describe how multivariate stationarity can be used to resample the data so that interesting portions of driving are sampled more frequently than uninteresting portions. Experiments are presented comparing clustering results on the original driving simulator data to the resampled data.

II. MODELING NATURALISTIC DRIVING

There are two approaches to collecting large databases of driving sensor data from various driving situations. One can outfit a fleet of cars with sensors and data collection equipment, as has been done in the NHTSA hundred-car study [9]. This has the advantage of being as naturalistic as possible. However, the disadvantage is that potentially interesting driving situations will be extremely rare in the collected data. Realistic driving simulators provide much more controlled environments for experimentation and permit the creation of many interesting driving situations within a reasonable time frame. Furthermore, in a driving simulator it is possible to simulate a large number of potential advanced sensors that would still be too expensive or impossible to install in a real car. This also enables us to study which sensors really are necessary for any particular task and what kind of signal processing is needed to create adequate models given those sensors.

A. Driving Simulator as a Data Source

We collect data in a driving simulator lab, which is an instrumented car in a surround-video virtual world with full visual and audio simulation (although no motion or G-force simulation) of various roads, traffic, and pedestrian activity. The driving simulator consists of a fixed-base car surrounded by five front and three rear screens (Fig. 1). All driver controls such as the steering wheel, brake, and accelerator are monitored and affect the motion through the virtual world in real time. Various hydraulics and motors provide realistic force feedback to the driver controls to mimic actual driving. The basic driving simulator software is a commercial product with a set of simulated sensors that, at the behavioral level, simulate a rich set of current and near-future onboard sensors. This set consists of a radar for locating other traffic, a GPS system for position information,

a camera system for lane positioning and lane marking detection, and a mapping database for road names, directions, locations of points of interest, etc. There is also a complete car status system for determining the state of engine parameters (coolant temperature, oil pressure, etc.) and driving controls (transmission gear selection, steering angle, window and seat belt status, etc.). The simulator setup also has several video cameras, microphones, and infrared eye tracking sensors to record all driver actions during the drive, synchronized with all the sensor output and simulator tracking variables. Altogether there are 425 separate variables describing an extensive scope of driving data: information about the car, the driver, the environment, and associated conditions. An additional video channel is digitally captured in MPEG2 format, consisting of a quad combiner providing four different views of the driver and environment. Combined, these produce around 400 MB of data for each 10 minutes of drive time.

Fig. 1. The driving simulator.

B. Driving Data Annotation

The purpose of the data annotation is to manually label the sensor data with meaningful classes. Supervised learning and modeling techniques then become available with labeled data. For example, one can train classifiers for maneuver detection or inattention detection [12], [13]. We have developed a special-purpose data annotation tool for the driving domain. This was necessary because available video annotation tools do not provide a view of the sensor data, and tools meant for signals, such as speech, do not allow simultaneous and synchronous playback of the video. The major properties of our annotation tool are:
1) Ability to navigate through any portion of the driving sequence.
2) Ability to label (annotate) any portion of the driving sequence with proper time alignment.
3) Synchronization between video and other sensor data.
4) Ability to play back the video corresponding to the selected segment.
5) Ability to visualize any number of sensor variables.
6) Persistent storage of the annotations.
7) Ability to modify existing annotations.
A minimal sketch of the kind of time-aligned annotation record implied by these requirements follows.
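The sketch below illustrates one possible time-aligned annotation record with simple JSON persistence. The field names, the example label, and the file format are illustrative assumptions only, not a description of the actual tool.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Annotation:
    """One time-aligned label over a portion of a driving sequence (hypothetical record)."""
    start_s: float          # segment start, seconds from the beginning of the drive
    end_s: float            # segment end, seconds
    label: str              # e.g. "lane change left"; the label set is project-specific
    annotator: str = "human"

def save_annotations(annotations, path):
    """Persist annotations as JSON (requirement 6); editing and re-saving covers requirement 7."""
    with open(path, "w") as f:
        json.dump([asdict(a) for a in annotations], f, indent=2)

def load_annotations(path):
    with open(path) as f:
        return [Annotation(**d) for d in json.load(f)]
```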

Since manual annotation is a tedious process, we are working on automating parts of the process by taking advantage of classifiers trained for various driving maneuvers. Annotation then becomes an instance of active learning [1]. Only when a classifier is not confident in its decision are its results presented to a human for verification. For the purposes of this paper, we experimented with aiding the human decisions by clustering the data and displaying the clustering results along with the sensor data. We describe these results in Sec. V.

C. Driving Data Modeling

An action taken by the driver, or a driving maneuver, typically consists of a sequence of sensor observations that repeats with variations. These variations, as reflected in the sensor data stream, need to be captured in a sequential model. The speech recognition community has modeled the speech signal using stochastic graphical models, Hidden Markov Models (HMM) [10]. We take the same approach in modeling the sensor stream acquired from an automobile or from a driving simulator.

The existence of well-defined subunits in speech (phonemes, words, and sentences) enables hierarchical modeling of the speech signal. There is no need to construct a distinct acoustic model for each word in the language, as it suffices to construct a model for each phone(me). Words are then modeled as concatenations of phoneme models according to a pronunciation dictionary, and sentences are concatenations of word models according to some kind of a language model. Thus, there is a small number of subunits that are shared in the upper levels of the model hierarchy. In learning these models, the training data is utilized much better because of the parsimony of the representation.

Modeling driving shares some aspects with speech recognition. From the sensor point of view, driving a car produces a stream of parameters from various different sensors. From a driver assistance system point of view, the sensor stream needs to be segmented in time into different context classes that are relevant to driving. The "sentence" of driving should be segmented into "words" of driving, that is, maneuvers. However, no "phonemes" exist for driving, so each different maneuver would have to be modeled as a discrete entity with no shared parts. We attempt to discover subunits that enable sharing for the purposes of modeling driving sensor data. We call these subunits "drivemes" [15]. The general idea is to model each maneuver via HMMs and find the states or sequences of states that are common to the maneuvers. These common states or state sequences represent the drivemes and can be used as building blocks to model the various maneuvers.

The steps involved in learning the drivemes can be summarized as follows (a code sketch of steps 3 and 4 follows the list):
1) Collect "naturalistic" driving data.
2) Annotate the data with different maneuver labels.
3) Build HMMs for each maneuver using the features/sensors that discriminate them best [14].
4) Cluster the states of these HMMs. All states that are similar will belong to the same cluster. These clusters will represent the common patterns among the various maneuver models.
5) Build tied-state maneuver models using these cluster states. The cluster states or cluster state sequences that appear in more than one maneuver characterize the various drivemes.
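The following is a minimal sketch of steps 3 and 4, not the actual system used in this work. It assumes annotated maneuver segments are available as NumPy arrays and uses the hmmlearn and scikit-learn packages; the number of HMM states per maneuver, the number of clusters, and the use of state means as the clustering representation are illustrative assumptions.

```python
import numpy as np
from hmmlearn import hmm
from sklearn.cluster import KMeans

def train_maneuver_hmms(segments_by_maneuver, n_states=5, seed=0):
    """Train one Gaussian HMM per maneuver (step 3).

    segments_by_maneuver: dict mapping a maneuver label to a list of
    2-D arrays (time x sensors), one array per annotated segment.
    """
    models = {}
    for label, segments in segments_by_maneuver.items():
        X = np.vstack(segments)                  # concatenate all segments
        lengths = [len(s) for s in segments]     # segment boundaries for hmmlearn
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=50, random_state=seed)
        m.fit(X, lengths)
        models[label] = m
    return models

def cluster_hmm_states(models, n_clusters=20, seed=0):
    """Cluster the states of all maneuver HMMs (step 4).

    Each state is represented here simply by its mean emission vector;
    states falling into the same cluster are candidate shared "drivemes".
    """
    state_means, owners = [], []
    for label, m in models.items():
        for k in range(m.n_components):
            state_means.append(m.means_[k])
            owners.append((label, k))            # remember which model/state this was
    assignments = KMeans(n_clusters=n_clusters, n_init=10,
                         random_state=seed).fit_predict(np.array(state_means))
    # Group (maneuver, state index) pairs by cluster id.
    clusters = {}
    for (label, k), c in zip(owners, assignments):
        clusters.setdefault(c, []).append((label, k))
    return clusters
```

States of several maneuvers that land in the same cluster are the shared building blocks from which tied-state maneuver models (step 5) would be assembled.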

The problem with this approach is that the "drivemes" would be dominated by driving that we are not interested in modeling. Therefore, we wish to perform instance selection prior to learning any subunits.

III. WHAT IS INTERESTING?

The definition of interestingness for instance selection depends, of course, on the application. The cases may be divided into two categories, unsupervised and supervised. In the unsupervised case, all that is available is the data, without an actual label or target variable. In this case "interestingness" may be defined, for example, by preserving some sufficient statistics. In the supervised case, on the other hand, all that really matters about the data is how it pertains to the target variable. In this case interestingness may again be defined by sufficient statistics, but now these sufficient statistics describe the behavior of the target variable.

A. What Driving Data is Interesting?

The driving data modeling problem lies between these two cases. Eventually the data will be labeled to mark the "maneuvers" or other interesting events, and classifiers for those events will define the interestingness. Before getting there, however, we must operate in an unsupervised fashion. For driving it is straightforward to make statements about what is interesting, and even to code those statements up into automatic detectors. Since we are interested in modeling "maneuvers" and "events", these by definition mean change, and that change must be reflected somewhere in the sensor stream (otherwise the modeling task is not possible). Driving straight is generally not very interesting. Any action that the driver takes in response to some environmental condition is interesting. Any change in the environment is interesting (especially if the driver takes a consequent action). Any change in the driver's condition is interesting. These observations lead to using some kind of a difference operator as the detector. We describe next one such operator, multivariate stationarity. Examples of applying the operator to driving sensor data are presented in Sec. IV.

B. Multivariate Stationarity

The speech research community has used spectral (or cepstral) stationarity as an overall measure of change of the speech spectrum (or cepstrum) [11], [7]. With minor changes, the same measure can be adopted for use with a set of disparate sensor signals. The multivariate stationarity of an N-dimensional sampled sensor signal can be computed by comparing the signal to itself after a time shift. The stationarity is written as

S(t, 2\delta) = 1 - \frac{\sum_{i=1}^{N} |x(t+\delta, i) - x(t-\delta, i)|}{K\delta + \sum_{i=1}^{N} |x(t+\delta, i) + x(t-\delta, i)|},    (1)

where x(t, i) denotes the ith component of the tth sample of the sensor data vector, N is the number of sensors, 2δ is the time shift, and K is a scaling constant. Stationarity gives a scalar measure of the rate of change of the sensor vectors. It is equal to one if the sensors remain constant, and it approaches zero if the rate of change of the N-dimensional sensor vector is extensive during the comparison interval of 2δ + 1.

IV. UNINTERESTINGNESS FILTERING

We now describe our application of stationarity to driving sensor data, and its conversion into a new resampling rate that varies in time according to interestingness. The first step is to select the variables from which the stationarity is computed. For the illustrations and experiments in this paper, we chose to use only four variables: accelerator pedal position (between zero and one, where one denotes full throttle), brake pedal position (between zero and one, where one denotes full braking), steering wheel angle, and distance to the right lane edge. The last variable is a "virtual sensor" in the simulator, which in the real world would be approximated using an existing visual lane detection system.

All variables participating in the computation of stationarity are standardized to equalize their contributions. Each may then be weighted to give optional emphasis to desired variables. As the scaling constant in (1) we used K = 1. The data for these experiments was sampled at a rate of 10 Hz, and the time window is δ = 5. The stationarity function may then be smoothed with a median filter or a low-pass FIR filter to remove insignificant extremes. We used a simple running average with the same window length of five samples.

Figure 2 depicts these four sensor signals (the top four plots) and the computed stationarity. The horizontal axis is time in units of the sampling interval (0.1 seconds); the vertical axis has no particular significance. For constant signals the stationarity remains one. When one of the variables changes, stationarity drops, and when two (or more) variables change simultaneously, or one changes significantly, stationarity drops towards zero.

Stationarity can be converted into a parameter describing a non-constant sampling rate, which can be used to sample interesting sensor data more frequently. This was done as follows:

\hat{s}(t) = \max(0, \min(1, (s(t) - s_{\min}) / (s_{\max} - s_{\min})))
f(t) = f_{\min} + (f_{\max} - f_{\min})(1 - \hat{s}(t)),    (2)

where we have dropped the 2δ from the stationarity for convenience. Thus \hat{s}(t) is simply the clipped and rescaled stationarity, and f(t) is the new relative sampling rate, varying between f_min and f_max. As the constants in (2) we used s_min = 0.4, s_max = 0.9, f_min = 0.03, and f_max = 1. The mapping from stationarity to relative sampling rate described in (2) is illustrated in the bottom part of Fig. 2. Next, we resampled the data according to f(t). The effect of the variable sampling rate is illustrated in Figures 3 and 4 (see captions).
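The following is a minimal sketch of the interestingness filter described above, assuming the four selected sensor channels are already standardized (and optionally weighted) columns of a NumPy array sampled at 10 Hz. The function names and the index-based resampling strategy are illustrative assumptions, not the exact implementation used in the paper.

```python
import numpy as np

def stationarity(x, delta=5, K=1.0):
    """Multivariate stationarity, Eq. (1): 1 = no change, towards 0 = rapid change.

    x: array of shape (T, N), standardized (and optionally weighted) sensor channels.
    Border samples, where t-delta or t+delta is unavailable, are left at 1.
    """
    T = x.shape[0]
    s = np.ones(T)
    ahead, behind = x[2 * delta:], x[:-2 * delta]      # x(t+delta), x(t-delta)
    num = np.abs(ahead - behind).sum(axis=1)
    den = K * delta + np.abs(ahead + behind).sum(axis=1)
    s[delta:T - delta] = 1.0 - num / den
    # Smooth with a running average of length 5, as in the text.
    kernel = np.ones(5) / 5.0
    return np.convolve(s, kernel, mode="same")

def relative_rate(s, s_min=0.4, s_max=0.9, f_min=0.03, f_max=1.0):
    """Map stationarity to a relative sampling rate, Eq. (2)."""
    s_hat = np.clip((s - s_min) / (s_max - s_min), 0.0, 1.0)
    return f_min + (f_max - f_min) * (1.0 - s_hat)

def resample(x, f):
    """Resample rows of x with a time-varying rate f(t) relative to the original 10 Hz.

    The cumulative sum of f gives a warped time axis; stepping through it at unit
    intervals keeps roughly a fraction f(t) of the samples around time t.
    """
    warped = np.cumsum(f)
    picks = np.searchsorted(warped, np.arange(warped[0], warped[-1], 1.0))
    return x[picks]
```

With f_min = 0.03, long stretches of steady driving keep only about 3% of their samples, while highly non-stationary stretches are kept at essentially the original rate.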

Fig. 2. Four sensor signals (accelerator, brake, distance to right lane edge, steering wheel) and the stationarity computed from them. The bottom graph depicts the relative sample rate derived from the stationarity.

Fig. 3. Three variables (brake, steering wheel, speed) plotted using the normal sampling rate (10 Hz). Compare to Fig. 4, which plots the same drive resampled.

To further visualize the effect of resampling, we also resampled video frames using the same function. An example video clip¹ illustrates the effect. Note how the sampling rate is generally low, which gives the effect of watching the video in "fast forward" mode, but interesting portions such as turning, braking, and lane changes occur in more temporal detail, giving the effect of automatically dropping out of "fast forward" mode and into normal "play" mode (or slower, as the situation warrants).

V. CLUSTERING RE-SAMPLED DATA

In order to be able to construct good models of the "interesting" portions of driving, we experimented with clustering the driving data after the non-uniform resampling. Even though with the simulator we are able to control to some extent the amounts and types of maneuvers the driver has to take, most of the driving data is still just dull straight driving. Clustering the data in order to define the sub-units in an unsupervised fashion would result in the majority of clusters representing this uninteresting driving. To give less weight to this kind of driving, we set the sampling rate proportional to the multivariate non-stationarity of steering, braking, accelerator, and the distance to the right lane edge, as described in Sec. IV.

We now present two illustrations of the effect of this instance selection on clustering of the driving data. The first is a visualization of the clustering using the Self-Organizing Map [5]. In the second experiment we compare the clustering to human annotation of the driving data. We will address the relevance of interestingness filtering to model construction in future work.

A. Visualization of the Clustering

The effect of the instance selection on clustering of the driving data is illustrated in Figure 5. As the clustering method we used the Self-Organizing Map (SOM) [5].

¹ http://www.eas.asu.edu/%7Eeee511/mmi/is.avi

Fig. 4. Three variables (brake, steering wheel, speed) plotted using a variable sampling rate based on interestingness. Compared to Fig. 3, the nonlinear warping has expanded the steering portions and the changes in braking, while periods of steady driving have been considerably shrunken.

This method not only clusters multivariate data but also visualizes the clusters in relation to one another. The principles of using the SOM as a tool for exploratory data analysis are well explained in [5], [4], for example. We will not discuss the methodology here, but illustrate how the component planes of the SOM can qualitatively characterize the data.

The left panels of Figure 5 display the U-matrices of the SOMs [17], [16]. The U-matrix visualizes dissimilarities between adjacent nodes of the SOM: large differences between nodes are visualized as light shades, while areas of mutually similar nodes are drawn in darker shades. Two of the SOM component planes, brake pedal and accelerator usage, are visualized in the other panels. In the clustering results of the original driving data there is not much visible in these two component views of the SOM. This signifies that most of the clusters represent more or less incident-free driving. The situation is very different with the resampled driving data: much clearer and more intense braking and accelerator usage clusters are visible in the bottom panels. A much larger number of modeling sub-units would thus be allocated to actual maneuvers rather than to plain driving.
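A minimal sketch of this kind of SOM visualization is given below, using the third-party MiniSom package and matplotlib. The grid size, training parameters, and plotting details are illustrative assumptions, not the original implementation.

```python
import numpy as np
import matplotlib.pyplot as plt
from minisom import MiniSom

def som_overview(data, feature_names, grid=(40, 40), iters=10000, seed=0):
    """Train a SOM and plot the U-matrix plus one component plane per feature.

    data: (n_samples, n_features) array, already standardized (and resampled).
    """
    n_features = data.shape[1]
    som = MiniSom(grid[0], grid[1], n_features, sigma=1.5,
                  learning_rate=0.5, random_seed=seed)
    som.train_random(data, iters)

    fig, axes = plt.subplots(1, n_features + 1, figsize=(4 * (n_features + 1), 4))
    # U-matrix: mean distance of each node to its neighbors (light = cluster border).
    axes[0].imshow(som.distance_map().T, cmap="gray")
    axes[0].set_title("U-matrix")
    # Component planes: one weight dimension per sensor variable.
    weights = som.get_weights()
    for i, name in enumerate(feature_names):
        axes[i + 1].imshow(weights[:, :, i].T, cmap="gray")
        axes[i + 1].set_title(name)
    plt.show()
```

Producing such plots for both the original and the resampled data gives the kind of qualitative comparison shown in Fig. 5.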

B. Correspondence to Expert Annotation

We also made an initial study of how well the clusters found in an unsupervised fashion correspond to annotation decisions made by a human expert. The aim here is to use the cluster identity as an extra variable that the expert consults when deciding on labels. Our hypothesis is that clustering of the resampled data should correspond better to human decisions.

We collected data from one driver on three separate drives in the simulator, which resulted in approximately one hour of driving. The drives were meant to resemble normal real-world driving as much as possible. Therefore, the simulated world was designed to correspond to the local metropolitan area, using local street names and route numbers. In addition, a high density of ambient traffic was added to promote the realism of the driving scenario. A human expert then annotated each drive, using classifications that we determined to represent interesting driving. Some examples included occurrences in which the driver changed lanes, traversed curves, turned at intersections, passed other vehicles, came to a stop, and resumed driving from a stopped position. In all, there were 28 different classifications which defined interesting driving.

In the clustering experiments, we employed k-means [2], the Self-Organizing Map (SOM) [5], and Minimum Conditional Entropy Clustering (MEC) [6]. We should note, however, that it is not appropriate to apply these clustering methods to the driving data directly. Both k-means and the SOM use the Euclidean distance to measure the dissimilarity between samples, but Euclidean distance does not make much sense for the driving data, since it contains both real-valued variables (e.g., velocity) and categorical variables (e.g., brake pedal usage). Although MEC does not use Euclidean distance, it cannot be applied to categorical variables either. In order to apply the aforementioned clustering methods, we first use principal component analysis (PCA) [3] to transform the data and then perform the clustering methods on the transformed data. A sketch of this clustering pipeline is given after Table I.

After the human expert annotated each drive, the results of the original and resampled data were analyzed to determine how closely the unsupervised clustering corresponded to the annotations made by the human expert. These results are shown in Table I.

TABLE I
CORRESPONDENCE OF UNSUPERVISED CLUSTERING TO HUMAN EXPERT DECISIONS

Clustering Method              k-means   SOM      MEC
Clusters from original data    29.91%    25.73%   27.07%
Clusters from resampled data   51.72%    49.72%   68.53%
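A minimal sketch of the PCA-plus-clustering step, using scikit-learn, together with one possible way to score correspondence to the expert annotations. The paper does not specify the exact correspondence measure; cluster purity with respect to the annotation labels is used here purely as an illustrative assumption, and the number of components and clusters are placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def cluster_after_pca(data, n_components=10, n_clusters=40, seed=0):
    """Project the (resampled) driving data with PCA, then cluster with k-means.

    data: (n_samples, n_features) array of sensor variables.
    """
    projected = PCA(n_components=n_components).fit_transform(data)
    return KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=seed).fit_predict(projected)

def correspondence(cluster_ids, expert_labels):
    """Fraction of samples whose cluster's majority expert label matches their own.

    This is cluster purity, used here as a stand-in for the correspondence
    percentages of the kind reported in Table I.
    """
    cluster_ids = np.asarray(cluster_ids)
    expert_labels = np.asarray(expert_labels)
    correct = 0
    for c in np.unique(cluster_ids):
        members = expert_labels[cluster_ids == c]
        _, counts = np.unique(members, return_counts=True)
        correct += counts.max()          # samples carrying the cluster's majority label
    return correct / len(expert_labels)
```

With such a measure, a higher value means that cluster identity alone predicts the expert's label more often, which is what Table I compares between the original and the resampled data.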

A result of 100% would make annotation of the data by an expert unnecessary and would provide a completely automated mechanism for this process. We can see that, although far from this ideal, with all three clustering methods the clusters derived from the resampled data provide a significant improvement over conventional clustering using all available data.

An explanation as to why these numbers are in general so low is as follows. Informal observations of the original and resampled data suggested that the resampling resulted in a significant improvement in the clustering of uninteresting driving. The majority of the clusters from the resampled data that did not correspond to human expert decisions resulted from events in which the driver either came to a stop or started from a stopped position. In these cases, which the human expert classified as one event each, both the original and the resampled data generally produced 6 to 10 clusters. This effect was less pronounced for the resampled data.

VI. CONCLUSIONS

We described a preprocessing method for sequentially sampled data that is able to emphasize interesting rare events occurring in a streaming sensor signal. This mitigates the undesired effect in unsupervised modeling of sequentially sampled sensor data, which typically allocates modeling resources in proportion to the occurrence of different phenomena in the training data, a problem when most of the data is uninteresting. We presented an approach to resample the data in proportion to its interestingness, where interestingness is defined as the multivariate stationarity of a number of important variables. We showed how instance selection using the interestingness filtering resulted in models that correspond much better to human classification of different driving situations.

One application of this approach is to create annotation tools that can decrease the amount of time required for a human to label driving data. The ability to "fast forward" through uninteresting data, and to have suggested labels generated by automated classifiers such as the ones described above, can speed up the labeling process, allowing more time to be spent on problems such as driver modeling.

REFERENCES

[1] D. Cohn, L. Atlas, and R. Ladner. Improving generalization with active learning. Machine Learning, 15(2):201–221, 1994.
[2] A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs, NJ, 1988.
[3] I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 1986.
[4] S. Kaski and T. Kohonen. Exploratory data analysis by the self-organizing map: Structures of welfare and poverty in the world. In A.-P. N. Refenes, Y. Abu-Mostafa, J. Moody, and A. Weigend, editors, Neural Networks in Financial Engineering: Proceedings of the Third International Conference on Neural Networks in the Capital Markets, London, England, 11-13 October 1995, pages 498–507. World Scientific, Singapore, 1996.
[5] T. Kohonen. Self-Organizing Maps. Springer, Berlin, Heidelberg, 1995.
[6] H. Li, K. Zhang, and T. Jiang. Minimum entropy clustering and applications to gene expression analysis. In Proceedings of the 3rd IEEE Computational Systems Bioinformatics Conference, pages 142–151, 2004.
[7] J.-S. Lienard, M. Mlouka, J. Mariani, and J. Sapaly. Real-time segmentation of speech. In Preprints of the Speech Communications Seminar, volume 3, pages 183–187, Stockholm, Sweden, August 1-3, 1974.


Fig. 5. A Self-Organizing Map view of the data. Top panels depict clustering from original driving data, bottom panels from resampled driving data. Leftmost panels depict the U-matrix. White areas in U-matrix denote borders between clusters of data. Darker areas denote homogeneous clusters. The middle panels show one SOM component plane, the time derivative of braking, and the rightmost panels depict another SOM component plane, the time derivative of the accelerator pedal use.

[8] H. Liu and H. Motoda. Instance Selection and Construction for Data Mining. Kluwer Academic Publishers, 2001.
[9] V. Neale, S. Klauer, R. Knipling, T. Dingus, G. Holbrook, and A. Petersen. The 100 car naturalistic driving study: Phase 1 experimental design. Interim Report DOT HS 809 536, Department of Transportation, Washington, D.C., November 2002. Contract No. DTNH22-00-C-07007 by Virginia Tech Transportation Institute.
[10] L. R. Rabiner. A tutorial on Hidden Markov Models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.
[11] K. Torkkola. Automatic alignment of speech with phonetic transcriptions in real time. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP-88), pages 611–614, New York City, USA, April 11-14, 1988.
[12] K. Torkkola, N. Massey, B. Leivian, C. Wood, J. Summers, and S. Kundalkar. Classification of critical driving events. In Proceedings of the International Conference on Machine Learning and Applications (ICMLA), pages 81–85, Los Angeles, CA, USA, June 23-24, 2003.
[13] K. Torkkola, N. Massey, and C. Wood. Driver inattention detection through intelligent analysis of readily available sensors. In Proceedings of the 7th Annual IEEE Conference on Intelligent Transportation Systems (ITSC 2004), Washington, D.C., USA, October 3-6, 2004.
[14] K. Torkkola, S. Venkatesan, and H. Liu. Sensor selection for maneuver classification. In Proceedings of the 7th Annual IEEE Conference on Intelligent Transportation Systems (ITSC 2004), Washington, D.C., USA, October 3-6, 2004.
[15] K. Torkkola, S. Venkatesan, and H. Liu. Sensor sequence modeling for driving. In Proceedings of the 18th International FLAIRS Conference, Clearwater Beach, FL, USA, May 15-17, 2005. AAAI Press.
[16] A. Ultsch. Knowledge extraction from self-organizing neural networks. In O. Opitz, B. Lausen, and R. Klar, editors, Information and Classification, pages 301–306, London, UK, 1993. Springer.
[17] A. Ultsch and H. P. Siemon. Kohonen's self organizing feature maps for exploratory data analysis. In International Neural Network Conference (INNC 90), pages 305–308, Paris, France, July 9-13, 1990. Kluwer Academic Publishers.