
Automatic Color Correction for Multisource Remote Sensing Images with Wasserstein CNN

Jiayi Guo 1,2,3, Zongxu Pan 1,2,3, Bin Lei 1,2,3,* and Chibiao Ding 1,2,3

1 School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Huairou District, Beijing 101408, China; [email protected] (J.G.); [email protected] (Z.P.); [email protected] (C.D.)
2 Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China
3 Key Laboratory of Geo-spatial Information Processing and Application System Technology, Beijing 100190, China
* Correspondence: [email protected]; Tel.: +86-135-8167-8087

Academic Editors: Qi Wang, Nicolas H. Younan, Carlos López-Martínez, Sangram Ganguly and Prasad S. Thenkabail
Received: 19 March 2017; Accepted: 12 May 2017; Published: 15 May 2017

Abstract: In this paper a non-parametric model based on a Wasserstein CNN is proposed for color correction. It is suitable for large-scale remote sensing image preprocessing from multiple sources under various viewing conditions, including illumination variances, atmosphere disturbances, and differing sensor and aspect angles. Color correction aims to alter the color palette of an input image to that of a standard reference which does not suffer from the mentioned disturbances. Most current methods depend heavily on the similarity between the inputs and the references, with respect to both the contents and the imaging conditions, such as illumination and atmospheric conditions. Segmentation is usually necessary to alleviate the color leakage effect on the edges. Different from previous studies, the proposed method matches the color distribution of the input dataset with the references in a probabilistic optimal transportation framework. Multi-scale features are extracted from the intermediate layers of a lightweight CNN model and are utilized to infer the undisturbed distribution. The Wasserstein distance is used as the cost function measuring the discrepancy between two color distributions. The advantage of the method is that no registration or segmentation processes are needed, benefiting from the local texture processing potential of CNN models. Experimental results demonstrate that the proposed method is effective when the input and reference images are of different sources, resolutions, and under different illumination and atmospheric conditions.

Keywords: remote sensing image correction; color matching; optimal transport; CNN

1. Introduction

Large-scale remote sensing content providers aggregate remote sensing imagery from different platforms, providing vast geographical coverage with a range of spatial and temporal resolutions. One of the challenges is that the color correction task becomes more complicated due to the wide differences in viewing angles, platform characteristics, and light and atmosphere conditions (see Figure 1). For further processing purposes, it is often desirable to perform color correction on the images. Histogram matching [1,2] is a cheap way to address this when a reference image with no color errors is available that shares the same coverage of land and the same reflectance distribution. To gain a deeper insight, we first place histogram matching in a broader context as the simplest form of color matching [3]. These methods try to match the color distribution of the input images to a reference, and are also known as color transferring. They can either work by matching


low order statistics [3–5] or by transferring the exact distribution [6–8]. Matching the low order statistics is sensitive to the color space selected [9]. The performance of both approaches is highly related to the similarity between the contents of the input and the reference. Picking an appropriate reference requires manual intervention and may become the bottleneck for processing. A drawback of such methods is that the colors on the edges of the targets would be mixed up [10–12]. Methods exploiting spatial information were proposed to mitigate the problem, but segmentation, spatial matching, and alignment are required [13,14]. Matching the exact distribution is not sensitive to the color space selection, but has to work in an iterative fashion [8]. Both the segmentation and the iteration increase the computation burden and are not suitable for online viewing and querying. For video and stereo cases, extra information from the correlation between frames can be exploited to achieve better color harmony [15,16]. The homography method has been introduced into color transfer to eliminate artifacts [17]. Manifold learning is an interesting framework for finding the similarity between pixels, so that the output color can be more natural and the color leakage can be suppressed as well [18]. Another perspective from which to comprehend the problem is image-to-image translation. Convolutional neural networks have proven to be successful for such applications [19], for example, the auto colorization of grayscale images [20,21]. Recently, deep learning has shown its potential and power in hyper-spectral image understanding applications [22].

Figure 1. Color discrepancy in remote sensing images. (a,b) Digital Globe images on different dates from Google Earth; (c,d) Digital Globe (bottom, right) and NASA (National Aeronautics and Space Administration) Copernicus (top, left) images on the same date from Google Earth; (e) GF1 (Gaofen-1) images from different sensors, same area and date.

Unfortunately, for large-scale applications, it is too strict a requirement that the whole reflectance distribution should be the same between the reference image and the ones to be processed. As a result, such reference histograms are usually not available and have greatly restricted the applications of these sample-based color matching methods. In [23] the authors choose a color correction plan that minimizes the color discrepancy between it and both the input image and the reference image. This is a good solution in stitching applications. However, the purpose of this paper is to eliminate the errors raised by atmosphere, light, etc., so that the result can be further employed in ground reflectance retrieval or atmosphere parameter retrieval. We hope that the output is as close as possible to


the reference images, rather than modifying the ground truth values as in [23]. Since it is usually infeasible to find such a reference, a natural question is: can we develop a universal function which can automatically determine the references directly according to the input images? Once this function is obtained, we can combine it with simple histogram matching or other color transfer methods into a very powerful algorithm.

In this paper, a Wasserstein CNN model is built to infer the reference histograms for remote sensing image color correction applications. The model is completely data driven, and no registration or segmentation is needed in either the training phase or the inference phase. Besides, as will be explained in Section 2, the input and the reference can be of different scales and sources. In Section 2, the details of the proposed method are elaborated in an optimal transport framework [24,25]. In Section 3, experiments are conducted to validate the feasibility of the proposed method, in which images from the GF1 and GF2 satellites are used as the input and the reference datasets, respectively. Section 4 comprises the discussion and comparisons with other color matching (correction) methods. Finally, Section 5 gives the conclusion and points out our future work.

2. Materials and Methods

2.1. Analysis

Given an input image I and a reference image I′ with N_c channels, an automatic color matching algorithm aims to alter the color palette of I to that of I′, the reference. Some of the algorithms require that the reference image is known; these are called sample-based methods. Of course an ideal algorithm should work without knowing I′. The matching can be operated either in the N_c-dimensional color space at once, or in each dimension separately [8,26].

The influence of the light and atmosphere conditions and other factors can be included in a function h(I′, x, y) that acts on the grayscale value of the pixel located at (x, y). Under such circumstances, the problem is converted to learning an inverse transfer function f(I, x, y) that maps the grayscale values of the input image I back to those of the reference image I′, where (x, y) denotes the location of the target pixel inside I. When the input image is divided into patches that each possess a relatively small geographical coverage, the spatial variance of the color discrepancy inside each patch is usually small enough to be neglected. Thus h(I′, x, y) should be the same as h(I′, x′, y′) as long as (x, y) and (x′, y′) share the same grayscale values. Let u_{x,y} and v_{x,y} be the grayscale values of the pixels located at (x, y) in I and I′ accordingly; then h(I′, x, y) can be rewritten as h(I′, v_{x,y}), because the color discrepancy function is not related to the location of the pixel but only to its value.

The following three assumptions are made about the transformation from the input images to the reference images, and some properties which f should satisfy can be derived from them.

Assumption 1: v_{x,y} = v_{x′,y′} ⇒ u_{x,y} = u_{x′,y′}

Assumption 1 suggests that when two pixels in I′ have the same grayscale value, so do the corresponding pixels in I. This assumption is straightforward since in general cases the cameras are well calibrated and the inhomogeneity of light and atmosphere is usually small within a small geographical coverage. It is true that when severe sensor errors occur this assumption may not hold; however, that is not the focus of this paper.
Assumption 2: u_{x,y} = u_{x′,y′} ⇒ v_{x,y} = v_{x′,y′}

Assumption 2 indicates that when two pixels in I have the same grayscale, so do their corresponding pixels in I′. The assumption is based on the fact that the pixel value the sensor records is not related to its context or location, but only to its raw physical intensity.

Assumption 3: u_{x,y} > u_{x′,y′} ⇔ v_{x,y} > v_{x′,y′}

Assumption 3 implies that the transformation is order preserving, i.e., a brighter pixel in I should also be brighter in I′, and vice versa.


According to the above assumptions, we expect the transfer function f to possess the following properties.

Property 1: u_{x,y} = u_{x′,y′} ⇒ f(I, u_{x,y}) = f(I, u_{x′,y′})

Property 2: u_{x,y} > u_{x′,y′} ⇔ f(I, u_{x,y}) > f(I, u_{x′,y′}), i.e., f is order-preserving

Property 3: I_1 ≠ I_2 ⇒ f(I_1, ·) ≠ f(I_2, ·)

Consider that even when two pixels inside I_1 and I_2 share the same grayscale values, the corrected values can still be different according to their ground truth values in the references. Property 3 says that f should be content related. In other words, for different input images, the transfer function values should be different to maintain content consistency. To better explain the point, consider two input images with different contents, say grassland and a lake, that happen to have similar color distributions. The pixels in the lake should be darker and the pixels in the grassland should be brighter in the corresponding reference images. If f were only related to the grayscale values while discarding the input images (the contexts of the pixels), this could not be done, because similar pixels in different input images would have to be mapped to similar output levels.

An issue to take into account is whether the raw image or its histogram should be used on the input and reference sides of the matching. Table 1 lists all possible cases, each of which will be discussed.

Table 1. Different color matching schemes according to the input form and the reference form.

Input        Reference    Scheme
Histogram    Histogram    A
Image        Image        B
Image        Histogram    C
Histogram    Image        D

Scheme A is the case when both the input and the reference are histograms, and this is essentially histogram matching. Many previous studies employ this scheme for simplicity, for example, histogram matching and low order statistics matching in various color spaces. Since histograms do not contain the content information, the corresponding histogram matching is not content related. Concretely speaking, two pixels that belong to two regions with different contents but with the same grayscale fall into the same bin of the histogram, and have to be assigned to the same grayscale value in the output image, which does not meet Property 3. Since one distribution arising from different contexts should be matched to different corresponding distributions, the different transformations cannot be enclosed in one unified mapping (see Figure 2). This is not appropriate for large-scale datasets that demand a high degree of automation.

Scheme B corresponds to the case where both the input and the output are images, which is usually referred to as image-to-image translation. The image certainly contains much more information than its histogram, thus providing the possibility that the mapping is content related. Although Property 3 can be satisfied, this scheme emphasizes the content of the image, and the consequence is that pixels with the same grayscale may be mapped to different grayscales, as their contexts could be different; in this case Property 1 is violated (see Figure 3).


Figure 2. Matching algorithms of "scheme A" take both input and reference in the form of histograms. As this scheme is not content related, two similar distributions with different contexts could not be mapped to their corresponding references with one unified mapping.

Figure 3. Matching algorithms of "scheme B" take both input and reference in the form of images. Similar distributions could be mapped to different corresponding references, as the scheme is content based. However, the same grayscales could be mapped to different grayscales when they are in different contexts, violating Property 1.

Scheme C is the case where the input is an image and the output is a histogram. As mentioned above, scheme A does not satisfy Property 3 because the context of the image is not used, while scheme B violates Property 1. Mapping one image to another, with the constraint that pixels with the same grayscale also have the same grayscale value in the output, is essentially a grayscale-to-grayscale transforming process. Under such circumstances, the output of scheme B is always equivalent to that of scheme C. Since scheme C automatically possesses Properties 1 and 3, the task has now been converted to devising the algorithm so that it possesses Property 2 as well (see Figure 4). The task is addressed under an optimal transporting framework, which will be elaborated in Section 2.2.


Figure 4. Matching algorithms of "scheme C" take the input in the form of images and the reference in the form of histograms. Similar distributions could be mapped to different corresponding references, as the scheme is content related.

The scheme of type D corresponds to the case where the input is the histogram and the output is the image. Since it is nearly impossible to determine a transformation mapping a histogram to an image, we do not take this case into consideration.

2.2. Optimal Transporting Perspective of View

Denote u and v as the input and the reference color distributions, and let T : R^{N_c} → R^{N_c} be a mapping that transforms u to v. The total cost of T(u, v) can be defined as C(u, v) [25–27]:

C(u, v) = \inf_{\pi \in \Pi(u, v)} \int c(x, y) \, d\pi(x, y)    (1)

where c(x, y) is the cost of transporting one unit of mass from x to y, and π(u, v) is the joint probability measure on R_+^{N_c} × R_+^{N_c} having u and v as its marginal distributions. Again, N_c indicates the number of color channels, and Π(u, v) is the collection of every feasible π(u, v).

When c(x, y) is defined as a distance d(x, y), the p-order Wasserstein distance can be defined as [25,27]:

W_p(u, v) = \left( \inf_{\pi \in \Pi(u, v)} \int d(x, y)^p \, d\pi(x, y) \right)^{1/p}    (2)

Finding the transformation T(u, v) that minimizes the total cost C(u, v) is known as Monge's optimal transportation problem, or the MK problem. The solution to the problem is the gradient of some convex function [25,27,28]:

T = \nabla \phi, \quad \text{where } \phi : \mathbb{R}^{N_c} \to \mathbb{R} \text{ is convex}    (3)

Specifically, in one-dimensional cases, this statement is equivalent to monotonicity, and as a consequence it meets Property 2.

For high-dimensional problems, the solution of the MK problem is intractable. In this paper, the distributions of the N_c channels are matched separately. The Wasserstein distance between the inferred values and the ideal values can be calculated in the following way: first sort the pixels on a 1-D axis, and then calculate the distance between each pair of inferred pixels and ideal pixels


accordingly. This is equivalent to using a stacked histogram (see Figure 5). The Wasserstein distance when p equals 2 can be formulated as:

W_2 = \left( \int \left( h_{pred}(f) - h_{ref}(f) \right)^2 df \right)^{1/2}, \quad \text{where } f \text{ is the cumulative frequency}    (4)

Figure 5. Calculation method of the Wasserstein distance between the inferred histograms and the reference. STEP 1: stack the histograms on the frequency axis; STEP 2: subtract the ground-truth stacked histograms, and integrate with respect to the cumulative frequency.
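As a concrete illustration of Equation (4), the following numpy sketch evaluates the 1-D 2-Wasserstein distance between two grayscale histograms by stacking them on the cumulative-frequency axis. The grid size and the function name are illustrative choices, not part of the paper.

```python
import numpy as np

def wasserstein2(h_pred, h_ref, levels=np.arange(256)):
    """2-Wasserstein distance (Eq. 4) between two 1-D histograms over `levels`:
    stack both histograms on the cumulative-frequency axis and integrate the
    squared difference of the grayscale levels found at the same frequency f."""
    f = np.linspace(0.0, 1.0, 2048)                    # cumulative-frequency grid
    q_pred = np.interp(f, np.cumsum(h_pred) / np.sum(h_pred), levels)
    q_ref = np.interp(f, np.cumsum(h_ref) / np.sum(h_ref), levels)
    return np.sqrt(np.trapz((q_pred - q_ref) ** 2, f))
```

For two 256-bin histograms h_a and h_b, wasserstein2(h_a, h_b) returns the per-channel discrepancy that the loss function of Section 2.3 penalizes.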

2.3. The Model Structure

The transformation can be fitted by a CNN model, where the Wasserstein distance plays the role of the loss function. To reduce the memory and computation burden, we used a modified version of Squeeze-net v1.1 [29] (see Figures 6 and 7). In this section we will first introduce the basic modules and then go on to state the major modifications.

Figure 6. Structure of the "fire module" in the Squeeze-net.


Figure 7. Model structure of the proposed model. (Diagram summary: the input (224 × 224 × 3) passes through conv1 (111 × 111 × 96), fire 2, 3 (55 × 55 × 128), fire 4, 5 (27 × 27 × 256), fire 6, 7 (13 × 13 × 384), and fire 8, 9 with conv10 (13 × 13 × 512); multi-scale feature maps, including the input branch trimmed by one row and one column on the margin to 27 × 27 × 3, are resized to 27 × 27, concatenated and flattened into a 725,355-length vector, and fed to three fully-connected layers of 256 units, each followed by a softmax head.)
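The following PyTorch sketch outlines the kind of structure shown in Figure 7 and described in Sections 2.3.1 and 2.3.2: fire modules as the backbone, multi-scale feature maps pooled to a common 27 × 27 grid, concatenation, and one fully-connected softmax head per color channel. The stage widths, pooling choices, and head sizes are illustrative stand-ins; the paper's exact trimmed Squeeze-net v1.1 configuration (including the 725,355-length concatenation and the 256-unit fully-connected layers) is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Fire(nn.Module):
    """SqueezeNet "fire" module [29]: a 1x1 "squeeze" convolution followed by an
    "expand" layer of parallel 1x1 and 3x3 convolutions, concatenated channel-wise."""
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand_ch // 2, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand_ch // 2, kernel_size=3, padding=1)

    def forward(self, x):
        s = F.relu(self.squeeze(x))
        return torch.cat([F.relu(self.expand1x1(s)), F.relu(self.expand3x3(s))], dim=1)

class HistogramPredictor(nn.Module):
    """Reduced stand-in for the WCNN: a small fire-module backbone, multi-scale
    feature maps pooled to a common 27x27 grid, concatenation, and one
    fully-connected softmax head per color channel."""
    def __init__(self, n_channels=3, n_bins=256):
        super().__init__()
        self.conv1 = nn.Conv2d(n_channels, 96, kernel_size=3, stride=2, padding=1)
        self.fire2 = Fire(96, 16, 128)
        self.fire3 = Fire(128, 32, 256)
        flat_len = (96 + 128 + 256) * 27 * 27           # channels of the pooled maps
        self.heads = nn.ModuleList(
            [nn.Linear(flat_len, n_bins) for _ in range(n_channels)])

    def forward(self, x):
        f1 = F.relu(self.conv1(x))                      # ~1/2 resolution, 96 ch
        f2 = F.max_pool2d(self.fire2(f1), 2)            # ~1/4 resolution, 128 ch
        f3 = F.max_pool2d(self.fire3(f2), 2)            # ~1/8 resolution, 256 ch
        pooled = [F.adaptive_avg_pool2d(f, 27) for f in (f1, f2, f3)]
        flat = torch.cat(pooled, dim=1).flatten(1)      # multi-scale concatenation
        # one softmax head per color channel -> (batch, n_channels, n_bins)
        return torch.stack([F.softmax(h(flat), dim=-1) for h in self.heads], dim=1)
```

With this sketch, HistogramPredictor()(torch.randn(1, 3, 224, 224)) yields a (1, 3, 256) tensor of per-channel histograms.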

2.3.1. Basic Modules


The Squeeze-net is a light-weight convolutional neural network. The basic modules of the squeeze-net are called the "fire" modules [29], and each consists of two convolution layers, the "squeeze" layer and the "expand" layer. The kernels in the "squeeze" layers are all of 1 × 1 size to maximally lessen the parameters inside the model and reduce the computational burden. Two types of kernels, 1 × 1 and 3 × 3 filters, comprise the "expand" layer. The "fire" modules prove to be computationally efficient, and also make the network less likely to be over-fitted, as the "squeeze" layers reduce the amount of parameters to a much smaller scale. In our experiment, the final global average pooling layer and the softmax layer of the squeeze-net were removed, and the rest of the parts were used to extract the features from the raw input images.

2.3.2. The Multi-Scale Concatenation and the Histogram Predictors

As stated in Section 2.3.1, we used a modified version of Squeeze-net to extract features from the input images. The layers at different levels in the CNN model extract features at different scales, and each level has its own characteristics. In general, the former layers in the CNN model are more associated with the raw pixels, while the latter ones are more meaningful in semantic senses [30,31]. Besides, the scales of the former feature maps are also different from those of the latter ones.

To utilize the information from different scales and semantic levels, we used a concatenating structure. In order for the feature maps to be concatenated, average pooling and deconvolution operations were applied to resize them to a unified shape (27 × 27). All the padding modes in the pooling layers were "valid", so that the residual parts which could not fill up the pooling kernel were discarded. The strides and kernel sizes within each pooling layer were the same. All the resized shapes were 27 × 27, except for the input branch, whose output was 28 × 28; its last row and column were trimmed in order to be consistent with the other tensors to be concatenated. The concatenated feature maps were then flattened into a 2-dimensional tensor of length 725,355, which was fed into three fully-connected layers separately, one for each channel (blue, green, and red). Each fully-connected layer was then attached to a softmax head to infer the corrected color distribution.

2.4. Data Augmentation

Data augmentation was performed on the original inputs to avoid over-fitting as well as to enclose more patterns of color discrepancy into the model. The augmentation operations include the following (a minimal sketch of these operations is given after the list):

1. Random cropping: a patch of 227 × 227 is cropped at a random position from each 256 × 256 sample. It is worth noting that this implies that no registration is needed in the training process.
2. Random flipping: each sample in the input batch is randomly horizontally and vertically flipped, each with a chance of 50%.
3. Random color augmentation: the brightness, saturation, and gamma values of the input colors are randomly shifted. Small perturbations are added to each color channel. Figure 8 shows an example of such a transformation of the color distribution.
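Below is a minimal numpy sketch of the three augmentation operations, assuming one 256 × 256 patch with values scaled to [0, 1]; the crop geometry, flip probabilities, and perturbation ranges are illustrative, not the authors' exact settings.

```python
import numpy as np

rng = np.random.default_rng()

def augment(patch):
    """One 256x256xC patch with values in [0, 1] -> a randomly cropped,
    flipped, and color-perturbed 227x227xC patch."""
    # 1. random 227x227 crop
    r = rng.integers(0, patch.shape[0] - 227)
    c = rng.integers(0, patch.shape[1] - 227)
    out = patch[r:r + 227, c:c + 227].astype(np.float64)
    # 2. random horizontal / vertical flips, each with a 50% chance
    if rng.random() < 0.5:
        out = out[:, ::-1]
    if rng.random() < 0.5:
        out = out[::-1, :]
    # 3. random brightness / gamma shift, plus a small per-channel perturbation
    out = np.clip(out * rng.uniform(0.9, 1.1), 0.0, 1.0) ** rng.uniform(0.8, 1.2)
    out = out + rng.normal(0.0, 0.01, size=(1, 1, out.shape[2]))
    return np.clip(out, 0.0, 1.0)
```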

Figure 8. Color transforming curves in the random augmentation process.

2.5. Algorithm Flow Chart

The entire model can be trained in an end-to-end fashion with the gradient descent algorithm, as displayed in Algorithm 1.

Algorithm 1. Training process of the automatic color matching WCNN, our proposed algorithm.

Notations: θ, the parameters in the WCNN model; g_θ, the gradients w.r.t. θ; h(·), the predicted color distribution; r, the reference color distribution; L_w(·, ·), the Wasserstein loss.
Required constants: α, the learning rate; m, the batch size.
Required initial values: θ_0, the initial parameters.
1: while θ has not converged do
2:     Sample {x^(i)}_{i=1}^{m} ~ P_in, a batch from the input data
3:     Sample {y^(i)}_{i=1}^{m} ~ P_ref, a batch from the reference data
4:     Apply random augmentation to {x^(i)}_{i=1}^{m}
5:     g_θ ← ∇_θ [ (1/m) Σ_{i=1}^{m} L_w(h(x^(i)), y^(i)) ]
6:     θ ← θ − α · SGD(θ, g_θ)
7: end while
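The following PyTorch sketch mirrors Algorithm 1. The loss shown is the differentiable 1-D Wasserstein closed form for p = 1 (absolute CDF differences); the paper's p = 2 loss (Equation (4)) integrates squared quantile differences instead, but the loop itself is unchanged. Here model, input_loader, ref_loader, and augment are assumed stand-ins for the WCNN, the data pipeline, and the Section 2.4 augmentation.

```python
import torch

def w1_loss(h_pred, h_ref):
    # Differentiable 1-D Wasserstein loss between predicted and reference
    # histograms via CDF differences; exact for p = 1. The paper's p = 2 loss
    # (Eq. 4) integrates squared quantile differences instead.
    return (torch.cumsum(h_pred, -1) - torch.cumsum(h_ref, -1)).abs().sum(-1).mean()

def train(model, input_loader, ref_loader, epochs=10, lr=1e-3):
    # Algorithm 1: sample unregistered input patches and reference histograms,
    # augment, predict the per-channel histograms, and descend on the loss.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in zip(input_loader, ref_loader):      # steps 2-3
            x = augment(x)                              # step 4 (assumed routine)
            loss = w1_loss(model(x), y)                 # L_w(h(x), y)
            opt.zero_grad()
            loss.backward()                             # step 5: g_theta
            opt.step()                                  # step 6: theta update
```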

3. Results

We evaluated our algorithm with satellite images from GF1 and GF2 that cover the same areas. The GF2 images were chosen as the reference. The parameters of the data are listed in Table 2.

Table 2. Parameters of the GF1 and GF2 data in the experiment.

               GF1             GF2
Resolution     8 m             4 m
Band 1         0.45–0.52 µm    0.45–0.52 µm
Band 2         0.52–0.59 µm    0.52–0.59 µm
Band 3         0.63–0.69 µm    0.63–0.69 µm

The direct outputs of the WCNN are the inferred distributions (or histograms, see Figure 9) based on the contents of the input images. The corrected images are obtained by histogram matching (see Figure 10); a sketch of this final step is given below. The reference images are only used in the training process and are unnecessary in practical applications, as the purpose of the WCNN model is to generate the reference histogram when there are no available ones. It is worth noting that the patches were only roughly sliced according to the longitude and latitude information within the GeoTIFF files, so registration was not necessary, and neither was pre-segmentation.
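A standard numpy sketch of applying a predicted reference histogram to one input channel by a monotone lookup table; the function name and construction are a common implementation choice, not necessarily the authors' exact code.

```python
import numpy as np

def match_to_histogram(channel, h_ref):
    """Remap one 2-D uint8 channel so that its histogram follows the predicted
    reference histogram h_ref (length 256), via a monotone lookup table."""
    h_in, _ = np.histogram(channel, bins=256, range=(0, 256))
    cdf_in = np.cumsum(h_in) / h_in.sum()
    cdf_ref = np.cumsum(h_ref) / np.sum(h_ref)
    # for each input level, pick the reference level with the closest CDF value
    lut = np.searchsorted(cdf_ref, cdf_in).clip(0, 255).astype(np.uint8)
    return lut[channel]
```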

Figure 9. Results of matching the color palette of GF1 to GF2. Bars: histograms of input patches; solid lines with color: predicted histograms of our model; dashed lines in black: histograms of reference images; from top to bottom: histograms of images of the same area, but under different illumination and atmospheric conditions.


Figure 10. Color matching results of GF1 and GF2. From top to bottom: satellite images of the same area, but under different illumination and atmospheric conditions; left: input images; middle: output images with the predicted color palette; right: reference images, only needed in the training process to calculate the loss function. The model is able to infer the corrected color palette based on the content of the input images in the absence of a reference, once the model is fully trained.

4. Discussion

4.1. Comparison between KL Divergence and Wasserstein Distance

As mentioned in Section 2.2, the Wasserstein distance is a natural choice to represent the difference between two color distributions. The Kullback–Leibler divergence (also known as KL divergence) is another commonly used measure (but not a metric) in such circumstances. The definition of the KL divergence [27] is:

D_{KL}(u \| v) = \int u(x) \log\left( \frac{u(x)}{v(x)} \right) dx    (5)

and the definition of the 2-Wasserstein distance is:

W_2(u, v) = \left( \inf_{\pi \in \Pi(u, v)} \int \| x - y \|^2 \, d\pi(x, y) \right)^{1/2}    (6)

Consider two simple distributions, u_1 ~ U(−0.5, 0.5) and u_2 ~ U(−0.5 + a, 0.5 + a), as shown in Figure 11. The Kullback–Leibler divergence should be:

D_{KL}(u_1 \| u_2) = \begin{cases} a & \text{if } |a| \le 1 \\ +\infty & \text{if } |a| > 1 \end{cases}    (7)

And the Wasserstein distance is:

W_2(u_1, u_2) = a, \quad \text{where } a \in [-\infty, +\infty]    (8)

Because both the Wasserstein metric and the KL divergence are fully differentiable, there is no difference in the back-propagation pipeline between the two losses. From the above discussion, however, we can see that the Wasserstein distance is more numerically stable compared to the KL divergence.

Figure 11. Two one-dimensional uniform distributions.

4.2. Connection and Comparison with Other Color Matching Methods

Histogram matching can be regarded as the simplest case of color matching. It is widely used in seamless mosaic workflows. The method requires that a reference image is selected for each input, which certainly puts a restriction on applications with large-scale datasets. The Wasserstein CNN is able to directly predict the corrected color distribution, and histogram matching is the final step in the workflow of our proposed method (but not the only choice; other sample-based color matching methods would also do).

Matching low order statistics faces similar problems. Its performance is closely related to the similarity between the input images and the reference images. To handle low-similarity cases, the images may have to be segmented and the color needs to be transferred part to part. Besides, for images with complex contents, color leakage on the edges could be a problem, and the image quality will degrade. Considering these restrictions, such methods may not be appropriate for automatic color matching in remote sensing applications. Matching the exact distribution is more precise than just matching the low order statistics, but is also more complex and computationally expensive. To match two non-Gaussian distributions, iterative approaches have to be exploited, as there are no


closed-form solutions [8]. The Wasserstein CNN method is non-iterative, and is more suitable for large-scale processing.

Poisson image editing (PIE) is another well-known color matching method. Rather than directly matching the color distributions, the PIE method tries to preserve the gradients of the input image and matches the pixel values on the border to those in the reference image. The problem is equivalent to solving a Poisson equation. However, in our case, this idea might not be very appropriate, because the gradients of the input image and the reference image can be very different, especially when the atmosphere visibility is low (see the PIE result in Figure 12).

Comparisons between the color matching methods are displayed in Figure 12. The ground truth was not included in the training set, as it is supposed to be unknown in the color matching problem. Because the PIE, statistics transferring, and histogram matching methods are all sample-based, an image must be selected from the training set to act as the reference. However, all that the WCNN model needs is the input image, thus it can operate without selecting such a reference. As the reference is not likely to be exactly the same as the ground truth, we can see the color discrepancy between the output and the ground truth in the results of PIE, statistics transferring, and histogram matching in Figure 12. Also, several features and descriptors were computed for all input images, output images, and ground truth images in the test set, including the Oriented FAST and Rotated BRIEF (ORB) descriptor, the Scale-Invariant Feature Transform (SIFT) descriptor, and the Binary Robust Invariant Scalable Keypoints (BRISK) descriptor. As a representation of similarity, the distances between the features of the output and the ground truth were computed, and are displayed in the boxplots in Figure 13; a sketch of one such descriptor-distance computation is given below.
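One plausible way to compute such a descriptor distance with OpenCV is sketched below; the paper does not spell out its exact matching procedure, so the keypoint handling here is an assumption.

```python
import cv2
import numpy as np

def orb_l1_distance(img_processed, img_truth, n_features=500):
    """Detect ORB keypoints on the ground truth, describe both images at those
    keypoints, and average the L1 distance between the descriptors."""
    orb = cv2.ORB_create(nfeatures=n_features)
    kp = orb.detect(img_truth, None)
    kp, des_truth = orb.compute(img_truth, kp)
    _, des_proc = orb.compute(img_processed, kp)
    n = min(len(des_truth), len(des_proc))
    diff = np.abs(des_truth[:n].astype(np.int32) - des_proc[:n].astype(np.int32))
    return float(diff.mean())
```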

Figure 12. Comparisons between color matching methods.

From Figure 13 we can see that the processed images are generally closer to the ground truth, in regards to the distances of the feature descriptors, except for the PIE method. One of the reasons why PIE fails to generate high quality results is that the low atmosphere visibility may deteriorate the gradients, resulting in a significant difference between the gradients of the input image and the ground truth. The WCNN model results achieve the maximum similarity to the ground truth, and the model is also the most stable one.


Figure 13. Boxplots of L1-norm distances between the processed images and the ground truth with respect to left: ORB; middle: SIFT; and right: BRISK feature descriptors. The distances represent the dissimilarity between the processed results and the ground truth (the smaller the better). There are five horizontal line segments in each patch, indicating five percentiles of the distances within the processed images by the corresponding method; from top to bottom: the maximum (worst) distance, the worst-25% distance, the median distance, the best-25% distance, and the minimum (best) distance.

4.3. Processing Time and Memory Consumption

The processing time of 512 patches with a size of 227 × 227 × 3 on a single NVIDIA® GeForce® GTX 1080 graphics processing unit is 0.408 s, or 0.8 × 10⁻³ s for a single patch, which means that the method could handle images as large as 2000 × 2000 in real time. A total of 6990 MB of memory is consumed for 512 patches, or 13.7 MB for each.

5. Conclusions

This paper presents a nonparametric color correcting scheme in a probabilistic optimal transport framework, based on the Wasserstein CNN model. The multi-scale features are first extracted from the intermediate layers, and then are used to infer the corrected color distribution which minimizes the error with respect to the ground truth. The experimental results demonstrate that the method is able to handle images of different sources, resolutions, and illumination and atmosphere conditions. With high efficiency in computing speed and memory consumption, the proposed method shows its prospects for utilization in real-time processing of large-scale remote sensing datasets.

We are currently extending the global color matching algorithm to take the local inhomogeneity of illumination into consideration, in order to enhance the precision. Local histogram matching of each band could serve for reflectance retrieval and atmospheric parameter retrieval purposes, and the preliminary results are encouraging.

Acknowledgments: This work was supported by the National Natural Science Foundation of China (Grant No. 61331017).

Author Contributions: Jiayi Guo and Bin Lei conceived and designed the experiments; Jiayi Guo performed the experiments; Jiayi Guo analyzed the data; Bin Lei and Chibiao Ding contributed materials and computing resources; Jiayi Guo and Zongxu Pan wrote the paper.

Conflicts of Interest: The authors declare no conflict of interest.


References

1. Haichao, L.; Shengyong, H.; Qi, Z. Fast seamless mosaic algorithm for multiple remote sensing images. Infrared Laser Eng. 2011, 40, 1381–1386.
2. Rau, J.; Chen, N.-Y.; Chen, L.-C. True orthophoto generation of built-up areas using multi-view images. Photogramm. Eng. Remote Sens. 2002, 68, 581–588.
3. Reinhard, E.; Adhikhmin, M.; Gooch, B.; Shirley, P. Color transfer between images. IEEE Comput. Graph. Appl. 2001, 21, 34–41. [CrossRef]
4. Abadpour, A.; Kasaei, S. A fast and efficient fuzzy color transfer method. In Proceedings of the IEEE Fourth International Symposium on Signal Processing and Information Technology, Rome, Italy, 18–21 December 2004; pp. 491–494.
5. Kotera, H. A scene-referred color transfer for pleasant imaging on display. In Proceedings of the IEEE International Conference on Image Processing, Genova, Italy, 14 November 2005.
6. Morovic, J.; Sun, P.-L. Accurate 3d image colour histogram transformation. Pattern Recognit. Lett. 2003, 24, 1725–1735. [CrossRef]
7. Neumann, L.; Neumann, A. Color style transfer techniques using hue, lightness and saturation histogram matching. In Proceedings of the Computational Aesthetics in Graphics, Visualization and Imaging, Girona, Spain, 18–20 May 2005; pp. 111–122.
8. Pitie, F.; Kokaram, A.C.; Dahyot, R. N-dimensional probability density function transfer and its application to color transfer. In Proceedings of the Tenth IEEE International Conference on Computer Vision, Beijing, China, 17–21 October 2005; pp. 1434–1439.
9. Reinhard, E.; Pouli, T. Colour spaces for colour transfer. In Proceedings of the International Workshop on Computational Color Imaging, Milan, Italy, 20–21 April 2011; Springer: Berlin/Heidelberg, Germany; pp. 1–15.
10. An, X.; Pellacini, F. User-controllable color transfer. In Computer Graphics Forum; Wiley Online Library: Hoboken, NJ, USA, 2010; pp. 263–271.
11. Pouli, T.; Reinhard, E. Progressive color transfer for images of arbitrary dynamic range. Comput. Graph. 2011, 35, 67–80. [CrossRef]
12. Tai, Y.-W.; Jia, J.; Tang, C.-K. Local color transfer via probabilistic segmentation by expectation-maximization. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–26 June 2005; pp. 747–754.
13. HaCohen, Y.; Shechtman, E.; Goldman, D.B.; Lischinski, D. Non-rigid dense correspondence with applications for image enhancement. ACM Trans. Graph. 2011, 30, 70. [CrossRef]
14. Kagarlitsky, S.; Moses, Y.; Hel-Or, Y. Piecewise-consistent color mappings of images acquired under various conditions. In Proceedings of the IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 2311–2318.
15. Bonneel, N.; Sunkavalli, K.; Paris, S.; Pfister, H. Example-based video color grading. ACM Trans. Graph. 2013, 32, 39:1–39:12. [CrossRef]
16. Wang, Q.; Yan, P.; Yuan, Y.; Li, X. Robust color correction in stereo vision. In Proceedings of the 2011 18th IEEE International Conference on Image Processing, Brussels, Belgium, 11–14 September 2011; pp. 965–968.
17. Gong, H.; Finlayson, G.D.; Fisher, R.B. Recoding color transfer as a color homography. arXiv 2016, arXiv:1608.01505.
18. Liao, D.; Qian, Y.; Li, Z.-N. Semisupervised manifold learning for color transfer between multiview images. In Proceedings of the 2016 23rd International Conference on Pattern Recognition, Cancun, Mexico, 4–8 December 2016; pp. 259–264.
19. Isola, P.; Zhu, J.-Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. arXiv 2016, arXiv:1611.07004.
20. Larsson, G.; Maire, M.; Shakhnarovich, G. Learning representations for automatic colorization. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 577–593.
21. Zhang, R.; Isola, P.; Efros, A.A. Colorful image colorization. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 649–666.
22. Wang, Q.; Lin, J.; Yuan, Y. Salient band selection for hyperspectral image classification via manifold ranking. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 1279–1289. [CrossRef] [PubMed]

23. Vallet, B.; Lelégard, L. Partial iterates for symmetrizing non-parametric color correction. ISPRS J. Photogramm. Remote Sens. 2013, 82, 93–101. [CrossRef]
24. Danila, B.; Yu, Y.; Marsh, J.A.; Bassler, K.E. Optimal transport on complex networks. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 2006, 74, 046106. [CrossRef] [PubMed]
25. Villani, C. Optimal Transport: Old and New; Springer: Berlin, Germany, 2008.
26. Pitié, F.; Kokaram, A. The linear monge-kantorovitch linear colour mapping for example-based colour transfer. In Proceedings of the European Conference on Visual Media Production, London, UK, 27–28 November 2007.
27. Frogner, C.; Zhang, C.; Mobahi, H.; Araya, M.; Poggio, T.A. Learning with a wasserstein loss. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 2015; pp. 2053–2061.
28. Cuturi, M.; Avis, D. Ground metric learning. J. Mach. Learn. Res. 2014, 15, 533–564.
29. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.