A Benchmark for Vehicle Detection on Wide Area Motion Imagery

Joseph Catrambone(a), Ismail Amzovski(a), Pengpeng Liang(a), Erik Blasch(b), Carolyn Sheaff(b), Zhonghai Wang(c), Genshe Chen(c), Haibin Ling(a)

(a) Computer & Information Science Department, Temple University, Philadelphia, PA, USA
(b) Air Force Research Lab, USA
(c) Intelligent Fusion Technology, Inc., Germantown, MD, USA

{pliang,hbling}@temple.edu, {zwang, gchen}@intfusiontech.com, {carolyn.sheaff,erik.blasch.1}@us.af.mil

ABSTRACT

Wide area motion imagery (WAMI) has been attracting an increased amount of research attention due to its large spatial and temporal coverage. An important application is moving target analysis, where vehicle detection is often one of the first steps before advanced activity analysis. While there exist many vehicle detection algorithms, a thorough evaluation of them on WAMI data still remains a challenge, mainly due to the lack of an appropriate benchmark data set. In this paper, we address this research need by presenting a new benchmark for wide area motion imagery vehicle detection data. The WAMI benchmark is based on the recently available Wright-Patterson Air Force Base (WPAFB09) dataset and the Temple Resolved Uncertainty Target History (TRUTH) associated target annotation. Trajectory annotations were provided in the original release of the WPAFB09 dataset, but detailed vehicle annotations were not available with the dataset. In addition, annotations of static vehicles, e.g., in parking lots, are also not identified in the original release. Addressing these issues, we re-annotated the whole dataset with detailed information for each vehicle, including not only a target's location, but also its pose and size. The annotated WAMI data set should be useful to the community as a common benchmark to compare WAMI detection, tracking, and identification methods.

Keywords: Wide-Area Motion Imagery, Benchmark, Vehicle detection, Vehicle tracking

1. INTRODUCTION

Analysis of wide area motion imagery (WAMI) is a challenging task due to its large spatial and temporal coverage, which reduces the pixels on targets. Moving vehicle detection and tracking on WAMI is an important application and has attracted a great amount of research attention [1,2,3,4]. However, these approaches are evaluated on datasets from different groups, which might include significant differences based on the collection geometry, numbers of targets, sensor types, etc. The lack of a reasonably sized and uniformly constructed benchmark dataset has prevented a thorough evaluation and comparison of state-of-the-art WAMI algorithms.

Sensors and Systems for Space Applications VIII, edited by Khanh D. Pham, Genshe Chen, Proc. of SPIE Vol. 9469, 94690F · © 2015 SPIE · CCC code: 0277-786X/15/$18 · doi: 10.1117/12.2178535

Fig. 1. An example of false positives in the original annotation. The boats (marked with red) are annotated as vehicles.

To tackle the problem of a common data set, in this paper we construct a new benchmark for vehicle detection and tracking on wide area motion imagery. This benchmark is built based on the Wright-Patterson Air Force Base (WPAFB09) data collection [15]. Trajectories and locations of moving vehicles are provided with the release of the dataset. Trajectory information is provided by a persistent identifier, which allows position data to be correlated across images and times. Position information is provided in the format of latitude/longitude tuples and (x, y) pairs. However, the original annotation is noisy to some extent. As shown in Fig. 1, the boats in the lake are labeled as vehicles. Also, the annotations of static vehicles are not included in the original ground truth, but static vehicles also play an important role in the evaluation of detection algorithms. For example, Fig. 2 shows the vehicles in the parking lot which are not identified in the original release. In order to improve the original annotation of WPAFB09, we manually remove noisy annotations and label the static vehicles. Besides locations, we also provide the pose of vehicles. The trajectory annotation is also refined with the Temple Resolved Uncertainty Target History (TRUTH) associated target annotation. We will make this benchmark with improved annotation publicly available, and hope that this benchmark dataset has the potential to further stimulate research on analysis of WAMI data.

The rest of the paper is organized as follows. Related work is introduced in Section 2. Section 3 presents the details of the construction of the dataset. Section 4 concludes the paper.


Fig. 2. An example of the lack of annotations for static vehicles in a parking lot.

2. RELATED WORK

With the availability of WAMI data, such as the Columbus Large Image Format (CLIF) 2006 dataset [5, 18], research on understanding large imagery formats has become more and more popular, especially for moving vehicle detection and tracking. Many examples include temporal, spatial, and frequency context. To capture the road information, temporal context is proposed in [3], which is useful to remove false positives that are away from the road. In [6], spatial context is developed by the same group [3] to make use of the information that a true positive is more likely to have other candidates around it than false positives. In [7], Haar features and histogram of gradient (HOG) features are combined with multiple kernel learning to discriminate the target from background. All of these works use the CLIF 2006 dataset [5], and 102 frames in total are used as testing data. The boundary of the car and the boundary of the front windshield are explored in [8] for car detection, and the proposed algorithm is evaluated on 12 image patches of a Washington DC image set containing 320 cars. In [9], vehicle detection results are used to construct trajectories, which are further used to refine the detection results, and the algorithm is also tested on the CLIF dataset [5].

For vehicle tracking on WAMI, there are several works using the CLIF dataset [5]. In [18], the data was registered for subsequent analysis. A pioneering work is done in [11], which uses background subtraction for vehicle detection and associates the candidates by minimizing the total cost based on appearance and motion information. In [4], rank-1 tensor approximation is used for data association. In [10], motion context is used to boost tracking performance. Maximum consistency context is proposed for multiple target association in [11]. In [12], five state-of-the-art single target tracking algorithms are evaluated on CLIF. Likelihoods from different features are fused for tracking in [13].


In [14], the data association problem is formulated as inference in a set of Bayesian networks. Besides the CLIF dataset, WPAFB 2009 [15] is used in [16] to evaluate the proposed persistent tracking algorithm.

Though most of the above work on WAMI vehicle tracking uses the CLIF dataset [5], different groups use different subsets of the dataset, and it is prohibitive to obtain a fair comparison between the results. Different from the work on the CLIF 2006 data, in this paper we focus on constructing a unified benchmark for the evaluation of vehicle detection and tracking algorithms based on the Wright-Patterson Air Force Base (WPAFB09) dataset [15], which includes coarse annotations of vehicle locations and trajectory. The next section describes the data for the WAMI benchmark.

3. DATA

This work builds upon the Wright-Patterson Air Force Base (WPAFB09) dataset [15], succeeding the Columbus Large Image Format WAMI 07 dataset [5]. While the CLIF datasets are still an excellent resource, they lack vehicle or scene tagging for creating a training set, which still requires investing tens or hundreds of human hours manually generating labels. Furthermore, due to restrictions in the distribution of the data, acquiring a usable sample set for testing bordered on impossibility. Similar to CIFAR [20], the annotation is distributed in convenient 'pickled' or gzipped sets. The Wright-Patterson Air Force Base 2009 image set contains on the order of 3072 training, 3072 evaluation, and 3079 test images of sizes as high as 26000 by 21000 pixels. These images are also distributed in smaller, subsampled forms to increase their ease of use. Alongside the images is a set of tracking data, distributed in CSV format, containing GPS coordinates, track information, and positional information. The data, however, is not without issue.

While examining the original data, a number of anomalies were encountered, as shown in Fig. 3. Positional anomalies consist of noise in the (x, y) annotations of the vehicle tracking points. It can be roughly estimated that around one thousand of the seventeen thousand points drifted from the center to either end of the vehicle. Occasionally, vehicles were entirely obstructed by buildings, treetops, or found on bridges/highway passovers. On at least one occasion, vehicles were reported where none were present or visible. In the WAMI benchmark, these outlying points were removed from the dataset. Additionally, static vehicles were unlabeled; that is, not all vehicles visible from the camera array were labeled. Water vehicles (e.g., boats, kayaks, etc.) were included as tracking points. These were removed from the final WAMI benchmark dataset.
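The per-point tracking records described above (a persistent track identifier plus lat/lon and pixel positions) can be grouped by track before any filtering. A minimal sketch follows; the column names and the sample values are assumptions for illustration, not the actual layout of the released CSV files.

```python
import csv
import io

# Hypothetical excerpt of a WPAFB09-style tracking CSV; the real header
# and coordinate values may differ from this illustration.
SAMPLE = """track_id,latitude,longitude,x,y
17,39.7794,-84.0583,10542,8831
17,39.7795,-84.0584,10551,8829
42,39.7801,-84.0590,11230,9102
"""

def load_tracks(csv_text):
    """Group tracking points by their persistent track identifier."""
    tracks = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        point = (int(row["x"]), int(row["y"]),
                 float(row["latitude"]), float(row["longitude"]))
        tracks.setdefault(int(row["track_id"]), []).append(point)
    return tracks

tracks = load_tracks(SAMPLE)
print(len(tracks))      # number of distinct tracks in the sample
print(len(tracks[17]))  # points observed on track 17
```

Grouping by the persistent identifier is what allows position data to be correlated across images and times, as noted in the introduction.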

Fig. 3. False positive labels in the original dataset.


To supplement the existing annotation on the WPAFB09 dataset, additional filtering and information gathering was performed. In addition to rectifying the position data, bounding boxes and orientation information were gathered and included in the WAMI benchmark release. Annotations were created using an HTML5/JS-based frontend with a Python/SQLite backend. This allowed data to be examined even as annotations were being added, and allowed multiple individuals to perform labeling in tandem. The JavaScript interface contained tools to add vehicles, remove errant vehicles, move labels, resize the bounding boxes, and reorient the heading information. Fig. 4 shows the Temple interface of the annotation system.
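Each vehicle label in this scheme carries a location, a bounding box, and a heading. A minimal sketch of what such a SQLite-backed annotation store might look like is given below; the table layout, column names, and sample values are assumptions for illustration, not the authors' actual schema.

```python
import sqlite3

# Hypothetical annotation table: one row per labeled vehicle, holding
# position, bounding-box size, and heading as described in the text.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE annotation (
        id      INTEGER PRIMARY KEY,
        image   TEXT NOT NULL,   -- parent image identifier
        x       REAL NOT NULL,   -- center x (pixels)
        y       REAL NOT NULL,   -- center y (pixels)
        width   REAL NOT NULL,   -- bounding-box width (pixels)
        height  REAL NOT NULL,   -- bounding-box height (pixels)
        heading REAL NOT NULL    -- orientation in degrees
    )
""")
conn.execute(
    "INSERT INTO annotation (image, x, y, width, height, heading) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("frame_000123", 10542.0, 8831.0, 18.0, 9.0, 274.5),
)
row = conn.execute(
    "SELECT width, height, heading FROM annotation WHERE image = ?",
    ("frame_000123",),
).fetchone()
print(row)  # (18.0, 9.0, 274.5)
```

A single-file SQLite database suits this workload well: concurrent labelers issue small transactional writes while the frontend reads partially completed regions.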


Fig. 4 Interface of the annotation system.

The backend was a thin RESTful API sitting in front of a SQLite3 database. Requests would draw from a pool of working data, which contained regions not yet labeled. When a request for work was made, a region was obtained, existing annotation data was loaded for the region, and a segment of the parent image was cropped. This data was encoded in JSON and sent to the JavaScript client. In the case of the image data, Base64 encoding was used as well. The full source of the application is available at: https://github.com/JosephCatrambone/VehicleLabellingTool . After gathering the WAMI target data, an additional filtering step was required to remove obstructed vehicles and non-vehicle entities. The filtering was performed by drawing each vehicle's tracking point as a red dot on the image. Mislabeled points were then 'painted over' in the GNU Image Manipulation Program. Mislabeled points generally fall into one of the following categories: vehicle obstructions (tracking points which labeled vehicles under objects like trees, bridges, etc.), water vehicles, and outlying points. After painting, a difference mask was created, and points contained in the painted regions were removed. Some positive examples are shown in Fig. 5. Negative examples were generated by selecting points at random which were a minimum of 512 pixels from all positive examples, and with non-zero variance. Fig. 6 shows some negative samples. A subset of these examples were, like their positive counterparts, manually vetted to confirm the absence of vehicles.
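The negative-sampling rule above (random points at least 512 pixels from every positive example, with non-zero patch variance) can be sketched as follows. This is a hypothetical re-implementation of the stated rule, not the authors' code; the function and parameter names are my own.

```python
import numpy as np

def sample_negatives(image, positives, n, min_dist=512, patch=64, seed=0):
    """Draw n random patch centers that are at least min_dist pixels from
    every positive example and whose patch has non-zero variance."""
    rng = np.random.default_rng(seed)
    pos = np.asarray(positives, dtype=float)
    h, w = image.shape[:2]
    half = patch // 2
    out = []
    while len(out) < n:
        x = rng.integers(half, w - half)
        y = rng.integers(half, h - half)
        # Reject candidates too close to any labeled vehicle.
        if len(pos) and np.min(np.hypot(pos[:, 0] - x, pos[:, 1] - y)) < min_dist:
            continue
        crop = image[y - half:y + half, x - half:x + half]
        # Reject flat regions (e.g., padding or blank sensor areas).
        if crop.var() == 0:
            continue
        out.append((int(x), int(y)))
    return out

# Toy example: random-noise image with one "positive" at the center.
img = np.random.default_rng(1).integers(0, 255, size=(2048, 2048)).astype(np.uint8)
negs = sample_negatives(img, [(1024, 1024)], n=5)
print(len(negs))  # 5
```

The variance check is what filters out degenerate background patches, which is why a subset still had to be manually vetted for the absence of vehicles.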


We have provided the tracking data (containing IDs, position, heading, and image information) as a SQLite database. In addition, we have provided excerpts of 64 by 64 images in a format that is consistent with CIFAR10, making for easy and direct integration into existing machine learning systems. There are 32768 positive and 32768 negative examples selected from the raw WPAFB09 data, and 17676 positive and 2000 negative examples vetted by human reviewers. Datasets and experimental source code are available at: http://josephcatrambone.com/projects/wpafb09/datasets.html
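CIFAR10-style batches are pickled dictionaries with a flat per-example data array and a label list, so a loader for the 64 by 64 excerpts might look like the sketch below. The key names and row layout (64*64*3 = 12288 values per example, channel-major) follow the CIFAR-10 convention and are assumptions about the released files.

```python
import pickle
import numpy as np

def load_batch(path):
    """Load a CIFAR-style pickled batch of 64x64 RGB patches."""
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    data = batch[b"data"].reshape(-1, 3, 64, 64)  # N x C x H x W
    labels = batch[b"labels"]                     # e.g., 1 = vehicle, 0 = background
    return data, labels

# Round-trip demo with a synthetic two-example batch.
fake = {b"data": np.zeros((2, 12288), dtype=np.uint8), b"labels": [1, 0]}
with open("demo_batch.pkl", "wb") as f:
    pickle.dump(fake, f)

data, labels = load_batch("demo_batch.pkl")
print(data.shape, labels)  # (2, 3, 64, 64) [1, 0]
```

Keeping the CIFAR-10 layout means existing data-loading code written for that dataset can be pointed at these excerpts with only the reshape dimensions changed.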

Fig. 5 Positive examples

Fig. 6 Negative examples

4. CONCLUSION

In this paper, we construct a unified benchmark for vehicle detection and tracking on wide area motion imagery (WAMI). We hope that our work can attract more research attention to fair comparative analysis of wide area motion imagery by making the evaluation of vehicle detection and tracking algorithms easier, more consistent, and comparable. Our future work includes developing a unified evaluation protocol (e.g., images and metrics) so that the evaluation can be fairer, a challenge problem [19], and a forum for multiple presentations on the same data.

REFERENCES

[1] V. Reilly, H. Idrees, and M. Shah, “Detection and tracking of large number of targets in wide area surveillance”, in ECCV, 2010.
[2] J. Xiao, H. Cheng, H. S. Sawhney, and F. Han, “Vehicle detection and tracking in wide field-of-view aerial video”, in CVPR, 2010.
[3] P. Liang, H. Ling, E. Blasch, et al., “Vehicle detection in wide area aerial surveillance using temporal context”, in International Conference on Information Fusion (FUSION), 2013.
[4] X. Shi, H. Ling, J. Xing, et al., “Multi-target tracking by rank-1 tensor approximation”, in CVPR, 2013.
[5] “CLIF 2006,” https://www.sdms.afrl.af.mil/index.php?collection=clif2006.
[6] P. Liang, D. Shen, E. Blasch, et al., “Spatial context for moving vehicle detection in wide area motion imagery with multiple kernel learning”, in SPIE Defense, Security, and Sensing, 2013.
[7] P. Liang, G. Teodoro, H. Ling, et al., “Multiple kernel learning for vehicle detection in wide area motion imagery”, in International Conference on Information Fusion (FUSION), 2012.
[8] T. Zhao and R. Nevatia, “Car detection in low resolution aerial image”, in ICCV, 2001.
[9] X. Shi, H. Ling, E. Blasch, et al., “Context-driven moving vehicle detection in wide area motion imagery”, in ICPR, 2012.
[10] X. Shi, H. Ling, W. Hu, et al., “Multi-target tracking with motion context in tensor power iteration”, in CVPR, 2014.


[11] X. Shi, P. Li, H. Ling, et al., “Using maximum consistency context for multiple target association in wide area traffic scenes”, in ICASSP, 2013.
[12] H. Ling, Y. Wu, E. Blasch, G. Chen, and L. Bai, “Evaluation of visual tracking in extremely low frame rate wide area motion imagery”, in International Conference on Information Fusion (FUSION), 2011.
[13] R. Pelapur, S. Candemir, F. Bunyak, M. Poostchi, G. Seetharaman, and K. Palaniappan, “Persistent target tracking using likelihood fusion in wide-area and full motion video sequences”, in International Conference on Information Fusion (FUSION), 2012.
[14] J. Prokaj, M. Duchaineau, and G. Medioni, “Inferring tracklets for multi-object tracking”, in Workshop of Aerial Video Processing joint with IEEE CVPR, 2011.
[15] C. Cohenour, R. Price, T. Rovito, and F. van Graas, “Camera models for the Wright Patterson Air Force Base (WPAFB) 2009 Wide Area Motion Imagery (WAMI) data set,” IEEE Aerospace and Electronic Systems Magazine, in press.
[16] J. Prokaj and G. Medioni, “Persistent tracking for wide area aerial surveillance”, in CVPR, 2014.
[17] E. Blasch, G. Seetharaman, S. Suddarth, K. Palaniappan, G. Chen, H. Ling, and A. Basharat, “Summary of methods in Wide-Area Motion Imagery (WAMI),” Proc. SPIE, Vol. 9089, 2014.
[18] O. Mendoza-Schrock, J. A. Patrick, and E. Blasch, “Video image registration evaluation for a layered sensing environment,” Proc. IEEE Nat. Aerospace Electronics Conf. (NAECON), 2009.
[19] E. Blasch, P. B. Deignan Jr., S. L. Dockstader, M. Pellechia, K. Palaniappan, and G. Seetharaman, “Contemporary concerns in geographical/geospatial information systems (GIS) processing,” IEEE Nat. Aerospace and Electronics Conference, 2011.
[20] A. Krizhevsky, “Learning multiple layers of features from tiny images,” Tech. Report, University of Toronto, 2009.
