INTRODUCING PRIOR KNOWLEDGE IN TEMPORAL DISTANCES FOR SATELLITE IMAGE TIME SERIES ANALYSIS
François Petitjean†, Jordi Inglada‡, and Pierre Gançarski†
† LSIIT/University of Strasbourg, UMR 7005 – 67412 Illkirch Cedex – France
‡ CNES/CESBIO, UMR 5126 – 31401 Toulouse Cedex 9 – France
ABSTRACT
Satellite Image Time Series are becoming increasingly available and will continue to do so in the coming years thanks to the launch of space missions that aim at covering the Earth every few days with high spatial resolution. In the case of optical imagery, it will be possible to produce land use and land cover change maps with detailed nomenclatures. It has been shown that the Dynamic Time Warping similarity measure is a consistent tool for the comparison of radiometric profiles of temporal evolution. Indeed, it makes it possible to compare time series with both different lengths and different sampling. This property allows us to make the most of partially cloud-covered images, but also to transfer the knowledge learned on one agronomical year in order to classify the next year without using reference data. This article pursues this work on satellite image time series analysis and focuses on the introduction of constraints in the distance in order to fit the expert’s knowledge about the observed phenomena.
Index Terms: Remote Sensing, Crops, Time series analysis, Image classification, Knowledge management.
I. INTRODUCTION
Satellite Image Time Series (SITS, for short) are a precious resource for Earth monitoring. Current time series have either high temporal resolution (SPOT-VEGETATION, MODIS) or high spatial resolution (LANDSAT, SPOT-HRV). In the coming years, SITS with both high temporal and high spatial resolution are going to be widely available thanks to the ESA’s SENTINEL program. Satellites such as the Taiwanese FORMOSAT-2 are already providing similar data, but with a limited coverage of the Earth’s surface and with only four spectral bands. Our research group focuses on the comparison of radiometric time series. The similarity measure is the key tool of many classification algorithms. In this way, having a
∗ The authors would like to thank the French Space Agency (CNES) and Thales Alenia Space for supporting this work under research contract n°1520011594 and the colleagues from CESBIO (Danielle Ducrot, Claire Marais-Sicre, Olivier Hagolle and Mireille Huc) for providing the land-cover maps and the geometrically and radiometrically corrected FORMOSAT-2 images.
similarity measure between radiometric time series makes it possible to easily provide a temporal analysis of the sensed scene. We have recently introduced the Dynamic Time Warping (DTW) similarity measure to the remote sensing community [1]–[3]. We have shown that DTW:
• consistently compares the radiometric time series, by capturing distorted behaviors;
• is robust to the irregular sampling of the SITS (when several images per month are available, and therefore required for new applications, multi-temporal methods should be robust to the low regularity of the sampling at the scale of days);
• makes it possible to remove cloud-contaminated values without losing either the corresponding complete image or the corresponding radiometric series;
• enables the comparison of time series of different lengths, which makes it possible to re-use previously defined reference data, i.e., ground truth, training samples.
DTW thus appears to be well suited for SITS analysis. However, we would like DTW’s “distortion power” to be parametrized in order to fit the expert’s knowledge about the observed phenomena. This work aims at showing how DTW can be constrained in order to avoid inconsistent temporal distortions.
II. DYNAMIC TIME WARPING
When studying radiometric evolutions of sensed areas over time, the core of the process generally consists of comparing data in order to estimate their (dis)similarity, whatever the method. The distance provides an estimation of this similarity. It is a critical tool, on which the results of analysis methods heavily rely. When the data is temporal, the choice of the distance is even more crucial since it completely defines the way the temporality of the data is handled. This work focuses on the Dynamic Time Warping (DTW) similarity measure introduced in [4], [5] for time series comparison. This similarity measure is able to exploit temporal distortions and to compare shifted or distorted evolution profiles.
Moreover, DTW makes it possible to compare time series with different lengths and sampling, thanks to the optimal alignment of radiometric profiles.
DTW finds the optimal coupling between two sequences by using the dynamic programming approach: as the global coupling problem exhibits overlapping sub-problems, the solutions to the sub-problems can be memoized in a two-dimensional matrix (one dimension for each sequence). Given two sequences of lengths S and T, DTW computes their coupling by finding a minimum-cost path (named warping path) linking two opposite corners of the S × T matrix. Each coordinate (i, j) along this warping path corresponds to the association of the i-th element of the first sequence with the j-th element of the second sequence. An example warping path as well as the corresponding alignment are illustrated in Figure 1(a). DTW makes it possible to align shifted elements of the sequences. This ability makes DTW sufficiently robust and flexible to handle satellite image time series [1]. However, the expert may want to introduce his/her knowledge about the observed temporal phenology. For example, the expert might want to guarantee that summer crops are not aligned with winter ones, or that values sensed with a time delay of more than two months are not associated. Constraints on the alignments have already been studied in the literature. The idea was to prevent the warping path from straying away from the diagonal of the matrix. For instance, the warping path can be limited to a certain band around the diagonal of the matrix, named the Sakoe-Chiba band [5] (see Figure 1(b)). Another example is a warping window shaped as a parallelogram, named the Itakura parallelogram [6], which allows more distortion in the middle of the sequences than at the extremities. These global constraints on the search of the warping path limit the alignments by preventing the alignment of two elements that are too distant. The original aim was to decrease the computational complexity; constraining the alignments was the way of doing it.
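As a minimal sketch of the dynamic-programming recurrence described above, the following Python function fills the cumulative-cost matrix and returns the cost of the optimal warping path. The function name `dtw`, the squared-difference local cost and the optional `band` parameter (the half-width w of a Sakoe-Chiba band) are assumptions made for illustration; the text does not prescribe a local cost.

```python
import math


def dtw(a, b, band=None):
    """Cumulative cost of the optimal warping path between two numeric sequences.

    band is an optional Sakoe-Chiba half-width w (assumed name): when given,
    matrix cells with |i - j| > w are excluded from the warping path.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if band is not None and abs(i - j) > band:
                continue  # masked cell: it stays at +inf
            # Local cost: squared difference (an illustrative choice).
            local = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is also aligned with b[j-2]... (repeat b)
                cost[i][j - 1],      # b[j-1] is also aligned with a previous a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# A shifted profile is aligned at no cost without constraint,
# whereas a band of width 0 forces an element-by-element comparison.
shifted_cost = dtw([0, 0, 1], [0, 1, 1])
diagonal_cost = dtw([0, 0, 1], [0, 1, 1], band=0)
```

With an index-based band, two sequences whose lengths differ by more than w cannot be aligned at all, and the function returns +∞ in that case.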
We take the opposite direction here: we introduce constraints in order to avoid inconsistent temporal distortions, and the decrease in computational complexity is a bonus. These global constraints are, however, not suitable for SITS analysis, since the data are irregularly sampled. For example, the Sakoe-Chiba band works by removing all matrix elements that are too distant from the diagonal: the matrix element (i, j) cannot be part of the warping path if |i − j| > w (with w being the width of the band). As a result, the i-th element of the first sequence cannot be linked to the j-th element of the second sequence if they are more than w elements apart. Consequently, this constraint only makes sense if the elements of the sequences are regularly sampled; the number of elements then has a temporal meaning, since w elements correspond to a time delay of w times the sampling period. In the case of SITS, however, limiting the warping window to all elements (i, j) less than w elements apart would have no temporal
meaning, since two consecutive elements (i.e., corresponding to two consecutive sensed images) could be sensed with a time delay of, for instance, one week, one month, or one year. Thus, given a maximum time delay ∆t, we propose to limit the scope of the matrix to all elements (i, j) fulfilling:
$$\mathrm{DATEDIFF}(\mathrm{DATE}(i), \mathrm{DATE}(j)) < \Delta t \tag{1}$$
where DATE(i) and DATE(j) return the date of the i-th image of the first sequence and the date of the j-th image of the second sequence, respectively, and DATEDIFF returns the elapsed time between two dates. In practice, during the computation of every (i, j) element of the matrix, the condition presented in Equation 1 has to be evaluated beforehand: if the condition is true, the classic procedure remains unchanged; otherwise, the matrix element is set to +∞. This procedure corresponds to a masking of the matrix in order to compute the warping path within a warping window. The resulting mask is consistent with the sensing dates of the satellite images and adapted to this kind of irregularly sampled data. Figure 1(c) illustrates this masking procedure as well as the corresponding alignment.
III. MATERIALS AND METHODS
The area of study for this work is located near the town of Toulouse in the South West of France. 46 FORMOSAT-2 images sensed over one agricultural year are used; the temporal distribution and the cloud cover of these images are given in Figure 2. From these images, we use the multi-spectral product at a spatial resolution of 8 m and keep only the three bands Near-Infrared, Red and Green, since the blue channel gives little information about vegetation and is very sensitive to atmospheric artifacts. Before being used in this work, the FORMOSAT-2 products have been orthorectified (guaranteeing that a pixel (x, y) covers the same geographic area throughout the image series) and the digital counts provided by the sensors have been converted into surface reflectances. Moreover, we have a land cover reference map produced by the method described in [7], using a comprehensive ground reference data set. Cloud masks are produced using the cloud screening procedure described in [8]. The dataset is built using the process described in [1]. Then, each sequence is built as the series of tuples (NIR, R, G) for each pixel (x, y) in the image series.
IV. EXPERIMENTS
This article presents the results of several clustering procedures on one agricultural year. The aim is to study the influence of the temporal constraint (∆t) on the quality of the results. To this end, a K-MEANS clustering was computed for various values of ∆t customizing the DTW measure. We thus obtain one clustering map per value of ∆t, whose quality we can then evaluate. The Kappa and the F-measure indexes
Fig. 2. Temporal distribution of images used from the two years. Each spot represents an acquired image. Horizontal axis: day of year (2006), from January to December.
Fig. 3. Evolution of the quality of the classification with the temporal constraint used in the distance. (a) Global evolution of the Kappa index with ∆t. (b) Class-by-class evolution of the F-measure with ∆t. Legend of the classes: broad-leaved tree, wildland, sunflower, temporary meadow, fallow land, low density housing surface, corn, high density housing surface, wheat.
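The two indexes plotted in Figure 3 can be computed from a confusion matrix, as in the sketch below. It assumes that each cluster has already been associated with one reference class, so that the matrix is square; this matching step, like the function names `cohen_kappa` and `f_measure`, is an assumption of this illustration and is not detailed in the text.

```python
def cohen_kappa(confusion):
    """Kappa index of a square confusion matrix.

    confusion[r][c] is the number of pixels of reference class r
    that were assigned to class c.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: proportion of pixels on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Agreement expected by chance, from the row and column totals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(k)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


def f_measure(confusion, cls):
    """F-measure (harmonic mean of precision and recall) of one class."""
    true_positives = confusion[cls][cls]
    if true_positives == 0:
        return 0.0
    precision = true_positives / sum(row[cls] for row in confusion)
    recall = true_positives / sum(confusion[cls])
    return 2 * precision * recall / (precision + recall)


# Toy two-class matrix (made-up counts, for illustration only).
toy = [[40, 10],
       [20, 30]]
toy_kappa = cohen_kappa(toy)
toy_f0 = f_measure(toy, 0)
```

The Kappa index summarizes the whole map in one value, whereas the F-measure is computed class by class, which is why the former is used for the global quality and the latter for the class-specific quality.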
Fig. 1. Several constraints on the warping path on the left and the resulting alignment of the sequences on the right. (a) Unconstrained warping path. (b) Sakoe-Chiba band: warping path constrained on the sequencing. (c) Warping path constrained using the sensing dates of the images.
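The date-based masking of Figure 1(c) and Equation 1 can be sketched as follows. This is an illustrative implementation under stated assumptions: the function name `dtw_date_constrained`, the use of the absolute number of days as DATEDIFF, and the squared Euclidean local cost between (NIR, R, G) tuples are choices made here, not specified in the text.

```python
import math
from datetime import date


def dtw_date_constrained(a, dates_a, b, dates_b, max_delay_days):
    """DTW cost where cell (i, j) is allowed only if the two sensing dates
    are less than max_delay_days apart (the condition of Equation 1).

    a and b are sequences of equal-length tuples, e.g. (NIR, R, G);
    dates_a and dates_b hold the sensing date of each element.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # DATEDIFF is assumed to be the absolute difference in days.
            delay = abs((dates_a[i - 1] - dates_b[j - 1]).days)
            if delay >= max_delay_days:
                continue  # masked cell: the value stays at +inf
            # Squared Euclidean distance between the two tuples (assumed cost).
            local = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Two single-band profiles sensed on the same three dates (made-up values).
dates = [date(2006, 1, 1), date(2006, 2, 1), date(2006, 3, 1)]
early = [(0.0,), (0.0,), (1.0,)]
late = [(0.0,), (1.0,), (1.0,)]
loose = dtw_date_constrained(early, dates, late, dates, max_delay_days=365)
tight = dtw_date_constrained(early, dates, late, dates, max_delay_days=10)
```

With a loose constraint the one-month shift between the two profiles is absorbed by the warping, while a 10-day limit only allows values sensed on the same date to be compared. Unlike an index-based band, the mask depends on the dates alone, so it keeps its meaning when the sampling is irregular.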
are respectively used to assess the global quality and the class-specific quality of the obtained clustering. Figure 3(a) shows the global evolution of the Kappa index with the maximum allowed distortion. The trend shows that too strong a constraint leads to meaningless results, since almost no values remain to be compared with each other. The quality then increases as the constraint is loosened and reaches a maximum for ∆t = 17 days. The map corresponding to this Kappa score of 29 % is depicted in Figure 4(a). Finally, this figure shows that the quality tends to that of the unconstrained version when ∆t increases. This asymptotic behavior as the constraint is loosened further supports the relevance of this procedure. Figure 3(b) shows that a single global constraint cannot maximize the quality of every class at once; 17 days is only the best compromise. Depending on the considered thematic class, the maximum value is not always reached close to this ∆t. In general, the constraint has a small impact on the quality for the corn, wheat and broad-leaved tree classes, contrary to the sunflower class, which is quite sensitive to the constraint. This is mainly due to two different types of this crop class that are cultivated during two different periods.
Fig. 4. (a) Best result obtained with a temporal constraint ∆t of 17 days. (b) Corresponding land cover reference map.
In this way, a strong constraint tends to separate these two sub-classes and yields good results, while loosening the constraint groups the two sub-classes together and also yields fairly good results. Finally, wildland, fallow land and meadows benefit from a fairly strong constraint.
V. CONCLUSION
This paper showed how the Dynamic Time Warping similarity measure can be constrained in order to introduce prior knowledge about the phenology of observed phenomena. This study set out to counterbalance the distortion power of DTW by avoiding inconsistent alignments of the elements of the sequences, in order to fit the knowledge about the observed phenomena. We believe this work opens up a number of research directions. One of these directions could be the use of this constraint in a supervised system, where different time delays ∆t could be used depending on the considered thematic class. Moreover, the constraint could correspond to a varying function in order to match more complex knowledge about the observed temporal phenology. For example, the constraint could be looser during winter, when the vegetation evolves less, than during summer. Finally, learning the constraint from data could also be an interesting research direction.
VI. REFERENCES
[1] F. Petitjean, J. Inglada, and P. Gançarski, “Satellite Image Time Series Analysis under Time Warping,” IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 8, Aug. 2012.
[2] F. Petitjean, J. Inglada, and P. Gançarski, “Temporal Domain Adaptation under Time Warping,” in IEEE International Geoscience and Remote Sensing Symposium, 2011, pp. 3578–3581.
[3] F. Petitjean, A. Ketterlin, and P. Gançarski, “A global averaging method for Dynamic Time Warping, with applications to clustering,” Pattern Recognition, vol. 44, no. 3, pp. 678–693, Mar. 2011.
[4] H. Sakoe and S. Chiba, “A dynamic programming approach to continuous speech recognition,” in Proceedings of the Seventh International Congress on Acoustics, vol. 3, 1971, pp. 65–69.
[5] H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 26, no. 1, pp. 43–49, 1978.
[6] F. Itakura, “Minimum Prediction Residual Principle applied to Speech Recognition,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 23, no. 1, pp. 67–72, 1975.
[7] S. Idbraim, D. Ducrot, D. Mammass, and D. Aboutajdine, “An unsupervised classification using a novel ICM method with constraints for land cover mapping from remote sensing imagery,” International Review on Computers and Software, vol. 4, no. 2, pp. 165–176, Mar. 2009.
[8] O. Hagolle, M. Huc, D. V. Pascual, and G. Dedieu, “A multi-temporal method for cloud detection, applied to FORMOSAT-2, VENµS, LANDSAT and SENTINEL-2 images,” Remote Sensing of Environment, vol. 114, no. 8, pp. 1747–1755, 2010.