Article

Matching Multi-Sensor Remote Sensing Images via an Affinity Tensor

Shiyu Chen 1,*, Xiuxiao Yuan 2,3, Wei Yuan 2,4,*, Jiqiang Niu 1, Feng Xu 1 and Yong Zhang 5

1 School of Geographic Sciences, Xinyang Normal University, 237 Nanhu Road, Xinyang 464000, China; [email protected] (S.C.); [email protected] (J.N.); [email protected] (F.X.)
2 School of Remote Sensing and Information Engineering, Wuhan University, 129 Luoyu Road, Wuhan 430079, China; [email protected]
3 Collaborative Innovation Center of Geospatial Technology, Wuhan 430079, China
4 Center for Spatial Information Science, University of Tokyo, Kashiwa 277-8568, Japan
5 Visiontek Research, 6 Phoenix Avenue, Wuhan 430205, China; [email protected]
* Correspondence: [email protected]; Tel.: +86-27-6877-1228

Received: 10 May 2018; Accepted: 26 June 2018; Published: 11 July 2018

 

Abstract: Matching multi-sensor remote sensing images is still a challenging task due to textural changes and non-linear intensity differences. In this paper, a novel matching method is proposed for multi-sensor remote sensing images. To establish feature correspondences, an affinity tensor is used to integrate geometric and radiometric information. The matching process consists of three steps. First, features from an accelerated segment test (FAST) are extracted from both source and target images, and two complete graphs are constructed with their nodes representing these features. Then, the geometric and radiometric similarities of the feature points are represented by a third-order affinity tensor, and the initial feature correspondences are established by tensor power iteration. Finally, a tensor-based mismatch detection process is conducted to purify the initial matched points. The robustness and capability of the proposed method are tested with a variety of remote sensing images such as Ziyuan-3 backward, Ziyuan-3 nadir, Gaofen-1, Gaofen-2, unmanned aerial vehicle platform, and Jilin-1. The experiments show that the average matching recall is greater than 0.5, which outperforms state-of-the-art multi-sensor image-matching algorithms such as SIFT, SURF, NG-SIFT, OR-SIFT and GOM-SIFT.

Keywords: image matching; multi-sensor remote sensing image; graph theory; affinity tensor; matching blunder detection

1. Introduction

Image matching aims to find feature correspondences among two or more images, and it is a crucial step in many remote sensing applications such as image registration, change detection, 3D scene reconstruction, and aerial triangulation. Among all matching methods, feature-based methods receive the most attention. These methods can be summarized in three steps: (1) feature detection; (2) feature description; and (3) descriptor matching. In feature detection, salient features such as corners, blobs, lines, and regions are extracted from images. Commonly used feature detection methods are the Harris corner detector [1], features from accelerated segment test (FAST) [2], difference of Gaussians (DoG) features [3], binary robust invariant scalable keypoints (BRISK) [4], oriented FAST and rotated BRIEF (ORB) [5], Canny [6], and maximally stable extremal regions (MSER) [7]. Among these image features, Harris, FAST, and DoG are point features that are invariant to image rotation; in particular, DoG is invariant to scale and linear illumination changes. Canny and MSER are line and region features, respectively.

Remote Sens. 2018, 10, 1104; doi:10.3390/rs10071104

www.mdpi.com/journal/remotesensing


In feature description, every image feature is assigned a unique signature to distinguish itself from others, so that similar features have close descriptors while different features have far different descriptors. State-of-the-art feature descriptors include scale-invariant feature transform (SIFT) [3], speeded-up robust features (SURF) [8], principal component analysis SIFT (PCA-SIFT) [9], gradient location and orientation histogram (GLOH) [10], local self-similarity (LSS) [11], distinctive order-based self-similarity (DOBSS) [12], local intensity order pattern (LIOP) [13], and the multi-support region order-based gradient histogram (MROGH) [14]. For line features, the mean-standard deviation line descriptor (MSLD) [15] and the line intersection context feature (LICF) [16] have good performance. For region features, Forssén and Lowe [17] gave a solution based on the SIFT descriptor.

In descriptor matching, the similarities between one feature and its candidates are evaluated by the distances of their description vectors. The commonly used distance metrics for descriptor matching are Euclidean distance, Hamming distance, Mahalanobis distance, and the normalized correlation coefficient. To determine the final correspondences, a threshold, such as the nearest neighbor distance ratio (NNDR) [3], is sometimes applied in descriptor matching.

Though many researchers have made massive improvements to the aforementioned matching steps in pursuit of stable and practical matching results, matching multi-sensor remote sensing images remains a challenging task. The difficulties stem from two issues. First, images of the same scene from different sensors often present different radiometric characteristics, known as non-linear intensity differences [12,18]. These intensity differences are exemplified by Figure 1a, which shows a pair of images acquired by the Ziyuan-3 (ZY-3) and Gaofen-1 (GF-1) sensors. Owing to non-linear intensity differences caused by different sensors and illumination, the local descriptors of two conjugate feature points are not close enough in distance, which leads to false matches. Second, images from different sensors are often shot at different moments, which gives rise to considerable textural changes. As shown in Figure 1b, an image pair captured by the ZY-3 and Gaofen-2 (GF-2) sensors was shot more than three years apart, and the rapid changes of a modern city result in abundant textural changes between the images. Textural changes lower the repetition rate of image features and lead to fewer correct matches. Both aspects reduce the matching recall [10] and thus lead to unsatisfying matching results.
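The NNDR test mentioned in the descriptor-matching step above can be sketched in a few lines. This is a generic illustration using Euclidean distances, not code from the paper:

```python
import numpy as np

def nndr_match(desc_s, desc_t, ratio=0.8):
    """Nearest-neighbor distance ratio (NNDR) matching.

    desc_s: (m, d) source descriptors; desc_t: (n, d) target descriptors.
    A source feature is matched to its nearest target neighbor only if that
    neighbor is clearly closer than the second-nearest one.
    Returns a list of (source_index, target_index) pairs.
    """
    matches = []
    for i, d in enumerate(desc_s):
        dists = np.linalg.norm(desc_t - d, axis=1)  # Euclidean distances
        j1, j2 = np.argsort(dists)[:2]              # two nearest neighbors
        if dists[j1] < ratio * dists[j2]:           # the NNDR test [3]
            matches.append((i, int(j1)))
    return matches
```

A stricter ratio (e.g., 0.6) yields fewer but more reliable matches; 0.8 is the classic choice from the SIFT paper.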

Figure 1. Non-linear intensity differences and textural changes between image pairs: (a) an image pair consisting of GF-1 and ZY-3 sensor images and (b) GF-2 and ZY-3 sensor images.

As the most representative feature-based matching method, SIFT has been studied extensively, and its upgraded versions have been successfully applied to matching multi-sensor remote sensing images. Yi et al. [19] proposed gradient orientation modification SIFT (GOM-SIFT), which modifies the gradient orientation of SIFT and restricts scale changes to improve matching precision. A similar strategy was used by Mehmet et al. [20], who proposed an orientation-restricted SIFT (OR-SIFT) descriptor that is only half the length of GOM-SIFT. However, when non-linear intensity differences and considerable textural changes are present between images, simply reversing the gradient orientation of image features cannot increase the number of correct matches.


Hasan et al. [21] took advantage of neighborhood information around SIFT key points. Their experimental results showed that this method could generate more correct matches because the geometric structures between images are relatively static. Saleem and Sablatnig [22] used normalized gradients SIFT (NG-SIFT) for matching multispectral images; their results showed that the proposed method achieved better matching performance against non-linear intensity changes between multispectral images.

Considering the self-similarity of an image patch, local self-similarity (LSS) captures the internal geometric layouts of feature points. LSS is another theory that has been successfully applied to the registration of thermal and visible videos, where it has handled complex intensity variations. Shechtman [11] first proposed the LSS descriptor and used it as an image-based and video-based similarity measure. Kim et al. [23] improved LSS based on the observation that self-similarity within images is less sensitive to modality variations; they proposed a dense adaptive self-correlation (DASC) descriptor for multi-modal and multi-spectral correspondences. In an earlier study, Kim et al. [24] extended the standard LSS descriptor into the frequency domain and proposed the LSS frequency (LSSF) descriptor for matching multispectral and RGB-NIR image pairs. Their experimental results showed that LSSF was invariant to image rotation and outperformed other state-of-the-art descriptors. Despite their attractive illumination-invariance property, both standard and extended LSS have drawbacks in remote sensing image matching because of their relatively low descriptor discriminability. To overcome this shortcoming, Ye and Shan [18] used GOM-SIFT in a coarse registration process and then utilized the LSS descriptor to find feature correspondences in a fine registration process.
Their proposed method alleviated the low-distinctiveness problem of LSS descriptors and achieved satisfactory matching results on multispectral remote sensing image pairs. Sedaghat and Ebadi [12] combined the merits of LSS and an intensity-order pooling scheme, and proposed DOBSS for matching multi-sensor remote sensing images. Similar to Ye and Shan's matching strategy [18], Liu et al. [25] proposed a multi-stage matching approach for multi-source optical satellite imagery. They first estimated the homographic transformation between the source and target images by initial matching, and then used probability relaxation to expand the matches. Though the matching scheme presented in [25] obtained satisfactory results on satellite images, it has the same defect as [18]: if the initial matching fails, the whole matching process ends in failure.

Most feature-based matching methods boil down to finding the closest descriptor in distance, while seldom considering the topological structures between image features. In multi-sensor image pairs, non-linear intensity differences lead to dissimilarities of conjugate features in radiometry, and textures changed by different shooting moments have no feature correspondences at all. Descriptor-based matching recall, a crucial factor for a robust matching method, depends only on the radiometric similarity of feature pairs. On the contrary, although captured by different sensors or shot at different moments, the structures among images remain relatively static [22], and they are invariant to intensity changes. Therefore, structural information can be utilized to obtain robust and reliable matching results. Based on this fact, many researchers cast the feature correspondence problem as a graph matching problem (e.g., [26–29]).
The affinity tensor between graphs, as the core of graph matching [26], paves the way for multi-sensor remote sensing image matching, since the radiometric and geometric information of image features can be easily encoded in a tensor. Duchenne et al. [28] used geometric constraints in feature matching and obtained favorable matching results. Wang et al. [30] proposed a new matching method based on a multi-tensor which merges multi-granularity geometric affinities; their experimental results showed that this method is more robust than that of [28]. Chen et al. [31] observed that outliers are inevitable in matching, so they proposed a weighted affinity tensor for matching poorly textured images and obtained better experimental results than feature-based matching algorithms.

To address the multi-sensor image matching problem, an affinity tensor-based matching (ATBM) method is proposed, in which both the radiometric information between feature pairs and the structural


information between feature tuples are utilized. Compared with traditional descriptor-based matching methods, the main differences are four-fold. First, image features are not isolated in matching; instead, they form ordered triplets used to compute the affinity tensor elements (Section 2.1). Second, topological structures between image features are not abandoned; instead, they are treated as geometric similarities in the matching process, and geometric similarities and radiometric similarities play the same role in matching through vector normalization and balancing (Sections 2.2 and 2.3). Third, the affinity tensor-based matching method inherently has the ability to detect matching blunders (Section 2.4). Last, the tensor-based model is a generalized matching model in which geometric and radiometric information can be easily integrated.

2. Methodology

Given source and target images acquired from different sensors, image matching aims to find point correspondences between the two images. The proposed method involves a four-step process, namely complete graph building [32], affinity tensor construction, tensor power iteration, and matching gross error detection. In the first stage, an upgraded FAST detector (named uniform robust FAST, abbreviated UR-FAST, which will be detailed in Section 2.2) is used to extract image features, and two complete graphs are built in the source and target images, with their nodes representing FAST features. Then, the affinity tensor between the graphs is built, with its elements expressing the node similarities. Next, tensor power iteration is applied to obtain the leading vector, which contains the coarse matching results. Finally, a gross error detection method purifies the matching results. Figure 2 shows the main process of the proposed method.

Figure 2. The workflow of tensor matching and matching blunder detection.

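The tensor power iteration of the third stage can be illustrated with a minimal dense sketch. The `einsum` contraction and the uniform starting vector are illustrative choices, not the paper's exact scheme, and a real affinity tensor would be stored sparsely:

```python
import numpy as np

def tensor_power_iteration(A, n_iter=50, tol=1e-8):
    """Leading vector of a third-order affinity tensor A.

    A: (mn, mn, mn) array; A[I, J, K] scores the joint consistency of
    candidate correspondences I, J, K. The unit-norm leading vector v
    (approximately) maximizes sum_{I,J,K} A[I,J,K] v[I] v[J] v[K];
    large entries of v indicate likely correspondences.
    """
    mn = A.shape[0]
    v = np.full(mn, 1.0 / np.sqrt(mn))            # uniform start
    for _ in range(n_iter):
        v_new = np.einsum('ijk,j,k->i', A, v, v)  # contract two modes with v
        norm = np.linalg.norm(v_new)
        if norm == 0:
            break
        v_new /= norm                             # renormalize each step
        if np.linalg.norm(v_new - v) < tol:
            v = v_new
            break
        v = v_new
    return v
```

Thresholding or discretizing the leading vector (e.g., by a greedy one-to-one assignment) then yields the coarse correspondences.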

2.1. Definition of the Affinity Tensor


Given a source and a target image, F_S and F_T are the two image feature point sets extracted from the source and target image, respectively; the numbers of features in F_S and F_T are m and n; i (i ≤ m) and i′ (i′ ≤ n) are feature indices of F_S and F_T; and G_S and G_T are the two complete graphs built from F_S and F_T (as illustrated in Figure 2). In the mathematical field of graph theory, a complete graph is a simple undirected graph in which every pair of distinct nodes is connected by a unique edge. As can be seen from Figure 2, there are ten and eight features in the source and target image, respectively. According to complete graph theory, there are $C_{10}^{2} = 45$ unique edges that connect node pairs in G_S, and $C_{8}^{2} = 28$ unique edges that connect node pairs in G_T.

The affinity tensor A is built on the two complete graphs (G_S and G_T) and describes the similarity of the two graphs. In this paper, the tensor A can be considered a three-way array of size mn × mn × mn, i.e., $A \in \mathbb{R}^{mn \times mn \times mn}$, where $a_{I;J;K}$ is the tensor element located in column I, row J, and tube K of the tensor cube (as shown in Figure 2). To explicitly express how the tensor elements are constructed, we use (i,i′), (j,j′), and (k,k′) to replace I, J, and K. The indices i, j, and k are feature indices in the source image ranging from 0 to m − 1; i′, j′, and k′ are feature indices in the target image ranging from 0 to n − 1. In this way, the tensor element $a_{I;J;K}$ is rewritten as $a_{i,i';j,j';k,k'}$, located at the (i × n + i′) column, (j × n + j′) row, and (k × n + k′) tube, and it expresses the similarity of triangles $T^{S}_{i,j,k}$ and $T^{T}_{i',j',k'}$, which are two triangles in graphs G_S and G_T, respectively.

The tensor element $a_{i,i';j,j';k,k'}$ can be computed by Equation (1):

$$
a_{i,i';j,j';k,k'} =
\begin{cases}
e^{-\left(w_b\,\|f_{i,j,k}-f_{i',j',k'}\|\right)^2/\varepsilon^2}, & \text{if } i=j=k \text{ and } i'=j'=k' \\
e^{-\|f_{i,j,k}-f_{i',j',k'}\|^2/\varepsilon^2}, & \text{if } i\neq j\neq k \text{ and } i'\neq j'\neq k' \\
0, & \text{otherwise}
\end{cases}
\tag{1}
$$

where $f_{i,j,k}$ and $f_{i',j',k'}$ are the geometric descriptors of $T^{S}_{i,j,k}$ and $T^{T}_{i',j',k'}$. As shown in Figure 3, the geometric descriptors are usually expressed as the cosines of the three interior angles of a triangle (i.e., $f_{i,j,k} = (\cos\theta_i, \cos\theta_j, \cos\theta_k)$). The variable ε represents the Gaussian kernel bandwidth; ‖·‖ is the length of a vector; and w_b is a balance factor which will be detailed below.
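A direct transcription of the triangle descriptor and of the two non-zero branches of Equation (1) might look as follows. The helper names are illustrative, and the zero branch for mixed index patterns is assumed to be handled by the caller:

```python
import numpy as np

def triangle_descriptor(p_i, p_j, p_k):
    """Geometric descriptor: cosines of the three interior angles
    of the triangle with vertices p_i, p_j, p_k (2D numpy arrays)."""
    def cos_at(a, b, c):                      # cosine of the angle at vertex a
        u, v = b - a, c - a
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    return np.array([cos_at(p_i, p_j, p_k),
                     cos_at(p_j, p_k, p_i),
                     cos_at(p_k, p_i, p_j)])

def affinity_element(f_s, f_t, diagonal, w_b=1.0, eps=0.5):
    """Non-zero branches of Equation (1).

    diagonal=True is the degenerate case i = j = k and i' = j' = k',
    where f_s and f_t are radiometric descriptors and w_b scales the
    distance; otherwise f_s and f_t are triangle (geometric) descriptors.
    """
    d2 = float(np.sum((np.asarray(f_s) - np.asarray(f_t)) ** 2))
    if diagonal:
        d2 *= w_b ** 2                        # (w_b * ||f - f'||)^2
    return float(np.exp(-d2 / eps ** 2))
```

Identical descriptors give an affinity of 1, and the affinity decays with the squared descriptor distance at a rate set by ε.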

Figure 3. Diagram of the triangle descriptors.

As shown in Equation (1), triangles $T^{S}_{i,j,k}$ and $T^{T}_{i',j',k'}$ degenerate to a pair of points when i = j = k and i′ = j′ = k′; thus, $f_{i,j,k}$ and $f_{i',j',k'}$ become radiometric descriptors (e.g., the 128-dimensional SIFT descriptor or the 64-dimensional SURF descriptor) of image features i and i′. However, the geometric


descriptors are related to the cosines of angles, while the radiometric descriptors are related to the intensity of image pixels; the two kinds of descriptors have different physical dimensions. In addition, the radiometric descriptors are more distinctive than the geometric descriptors because they have higher dimensions. To balance the differences in physical dimension and descriptor distinctiveness, Wang et al. [30] suggested normalizing all the descriptors to a unit norm. However, simple normalization can only alleviate the physical-dimension differences. Thus, a constant factor w_b is used to balance the differences in descriptor distinctiveness. In this way, the geometric and radiometric information are equally important when encoded in the affinity tensor. The constant factor w_b can be estimated a priori from the correct matches of image pairs:

$$
w_b = \frac{f_r}{f_g}
\tag{2}
$$

where f_r and f_g are the average distances of the normalized radiometric and geometric descriptors, respectively. Though the balance factor w_b differs between image pairs, its order of magnitude stays the same and can be estimated as follows:

(1) Select an image pair from the images to be matched and extract a number of tie points manually or automatically (the automatic matching method could be SIFT, SURF, and so on);
(2) Remove the erroneous tie points by manual checking or a gross error detection algorithm such as random sample consensus (RANSAC) [33];
(3) Estimate the average feature descriptor distance of the matched features, named f_r;
(4) Construct a triangulated irregular network (TIN) from the matched points, and compute the average triangle descriptor distance of the matched triangles, named f_g;
(5) Compute the balance factor w_b with Equation (2).
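Assuming verified matches are already in hand after steps (1) and (2), the remaining steps can be sketched as follows. Here `scipy.spatial.Delaunay` stands in for the TIN construction of step (4), and the inlined cosine helper mirrors the triangle descriptor of Section 2.1:

```python
import numpy as np
from scipy.spatial import Delaunay

def triangle_cosines(a, b, c):
    """Cosines of the interior angles at vertices a, b, c."""
    def cos_at(p, q, r):
        u, v = q - p, r - p
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    return np.array([cos_at(a, b, c), cos_at(b, c, a), cos_at(c, a, b)])

def estimate_wb(desc_s, desc_t, pts_s, pts_t):
    """Estimate w_b = f_r / f_g of Equation (2) from verified matches.

    desc_s/desc_t: unit-normalized radiometric descriptors of the matched
    features; pts_s/pts_t: their point coordinates in the two images.
    """
    # Step (3): average radiometric descriptor distance over matched pairs
    f_r = np.mean(np.linalg.norm(desc_s - desc_t, axis=1))
    # Step (4): average geometric (triangle) descriptor distance over the TIN
    tri = Delaunay(pts_s)                 # stand-in for TIN construction
    dists = []
    for i, j, k in tri.simplices:
        g_s = triangle_cosines(pts_s[i], pts_s[j], pts_s[k])
        g_t = triangle_cosines(pts_t[i], pts_t[j], pts_t[k])
        g_s, g_t = g_s / np.linalg.norm(g_s), g_t / np.linalg.norm(g_t)
        dists.append(np.linalg.norm(g_s - g_t))
    f_g = np.mean(dists)
    return f_r / f_g                      # Step (5): Equation (2)
```

As the text notes, only the order of magnitude of the result matters, so a single sampled image pair suffices for a whole image set.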

In practical matching tasks, calculating w_b for every image pair is time-consuming and unnecessary, since only its order of magnitude is significant. Thus, we sample a pair of images from the image set to be matched, estimate the order of magnitude of w_b, and apply this factor to the rest of the image pairs. The configuration of w_b also follows some principles in tensor matching: if both geometric and radiometric distortions are small, then w_b can be an arbitrary positive real number, because either kind of information is sufficient for matching; if geometric distortions are small, a greater w_b is preferable, because radiometric information should be suppressed in matching; if radiometric distortions are small, a smaller w_b is better, because large geometric distortions will contaminate the affinity tensor if not suppressed.

2.2. Construction of the Affinity Tensor

The affinity tensor of two complete graphs can be constructed by Equation (1). However, a pair of images may contain thousands of image features, and using all the features to construct a complete tensor is very memory- and time-consuming; a partial tensor is often sufficient for matching. Besides, feature repeatability is relatively low in multi-sensor remote sensing images, leading to small overlaps between feature sets. In addition, outliers among the image features mix with inliers and introduce unrelated information into the tensor, which leads to local minima in power iteration [34]. Moreover, to speed up tensor power iteration, the affinity tensor should maintain a certain sparseness. Based on the above requirements, the following four strategies are proposed to construct a small, relatively pure, and sparse tensor.

(1) Extracting highly repetitive and evenly distributed features. Repetitiveness of features is a critical factor for successful matching [2]. We evaluated commonly used image features such as SIFT, SURF, and so on, and found that FAST features had the highest repetitiveness. As mentioned


in Section 2.1, the measure of structural similarity of the graph nodes is the three inner angles of the triangles. However, if any two vertices of a triangle are a small distance apart, a small shift of a vertex may lead to tremendous differences in the inner angles; consequently, noise is introduced into the computation of the tensor elements. Inspired by UR-SIFT (uniform robust SIFT [35]), we design a new version of FAST, named UR-FAST, to extract evenly distributed image features. As shown in Figure 4, the standard FAST features cluster in violently changing textures, while the modified FAST features have a better distribution. It should be noted that the UR-FAST detector is only applied in the source image; the features in the target image are acquired by standard FAST and approximate nearest neighbors (ANN) [36], which will be detailed in strategy (2).

Figure 4. Comparison of standard FAST and UR-FAST: (a) standard FAST and (b) UR-FAST.
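A plausible grid-based selection in the spirit of UR-SIFT, which would spread features away from a few violently textured regions as UR-FAST does, could look like this. The grid size and per-cell quota are illustrative parameters, not the paper's actual UR-FAST design:

```python
import numpy as np

def uniform_select(keypoints, scores, img_shape, grid=(8, 8), per_cell=5):
    """Grid-based uniform feature selection.

    keypoints: (N, 2) array of (row, col) positions; scores: (N,) corner
    scores; img_shape: (height, width). Keeps at most `per_cell` of the
    strongest features in each grid cell and returns their sorted indices.
    """
    h, w = img_shape
    cell_h, cell_w = h / grid[0], w / grid[1]
    # Label each keypoint with the index of the grid cell it falls into
    cells = (keypoints[:, 0] // cell_h).astype(int) * grid[1] \
            + (keypoints[:, 1] // cell_w).astype(int)
    kept = []
    for c in np.unique(cells):
        idx = np.where(cells == c)[0]
        best = idx[np.argsort(scores[idx])[::-1][:per_cell]]  # strongest first
        kept.extend(best.tolist())
    return sorted(kept)
```

A cluster of strong responses in one textured cell is thus capped at `per_cell` features, while isolated features elsewhere survive untouched.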

(2) Reducing the number of image features. The detected features are filtered by the ANN [36] searching algorithm to construct two feature sets F_S and F_T. As illustrated in Figure 5a, there are 3 UR-FAST features and 13 FAST features in the source and target images, respectively. As shown in Figure 5b, to guarantee feature repetitiveness, for every feature element in F_S, the ANN searching algorithm is used to search for its 4 most probable matches among the 13 features in the target image. All the candidate matches of the feature elements in F_S constitute F_T (as shown in Figure 5b, the original feature number in the target image is 13, while the feature number decreases to 9 after filtering). In this way, the tensor size decreases to (3 × 9)^3 from the original size of (3 × 13)^3, which sharply decreases the memory consumption.

Figure 5. An illustration of outlier filtering; the pink dots are outliers: (a) original detected feature points; (b) filtered feature sets.
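Strategy (2) can be sketched as follows; a brute-force nearest-neighbour search stands in for the ANN search of [36], and the one-dimensional toy descriptors are an assumption for illustration only.

```python
import numpy as np

def filter_target_features(desc_src, desc_tgt, k=4):
    """For each source descriptor, keep its k most similar target descriptors;
    the union of these candidates forms the reduced target set F_T."""
    kept = set()
    for d in desc_src:
        dists = np.linalg.norm(desc_tgt - d, axis=1)  # brute force stands in for ANN
        kept.update(np.argsort(dists)[:k].tolist())
    return sorted(kept)
```

For 3 source descriptors and 13 target descriptors with k = 4, at most 12 target features can survive; overlapping candidate sets shrink F_T further, as in the paper's 13-to-9 example.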

(3) Computing a partial tensor. When using the definition of an affinity tensor in Equation (1), the tensor element a_{i,i';j,j';k,k'} means the geometric similarity of T^S_{i,j,k} and T^T_{i',j',k'}. As described in strategy (2), (3 × 9)^3 elements should be computed for the complete tensor. Actually, completely computing the tensor would be redundant and would make the tensor less sparse. Therefore, in this algorithm, only a small part of the tensor elements is computed. As shown in Figure 6, to compute a partial tensor, the ANN searching algorithm is applied once again to find the 3 most similar triangles in G_T for T^S_{1,2,3} (as shown in Figure 6, T^T_{1',2',3'}, T^T_{4',5',6'}, and T^T_{5',6',7'} are the searching results). That is, the tensor elements a_{1,4';2,6';3,5'}, a_{1,5';2,6';3,7'}, and a_{1,1';2,2';3,3'} are non-zero, and the remaining tensor elements are zero. In this way, the effects of outliers are alleviated, and the tensor is much sparser.

Figure 6. ANN searching for triangles.
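Strategy (3) can be sketched as building a sparse dictionary of tensor entries, computed only for ANN-candidate triangle pairs. The Gaussian kernel on inner-angle differences is an assumed form of the similarity in Equation (1), which is not reproduced in this section; the truncation threshold π/5 and bandwidth π/15 follow the settings reported in Section 3.2.

```python
import numpy as np

def inner_angles(p1, p2, p3):
    """Three inner angles of a triangle (the structural measure of Section 2.1)."""
    a = np.linalg.norm(p2 - p3)
    b = np.linalg.norm(p1 - p3)
    c = np.linalg.norm(p1 - p2)
    A = np.arccos(np.clip((b * b + c * c - a * a) / (2 * b * c), -1.0, 1.0))
    B = np.arccos(np.clip((a * a + c * c - b * b) / (2 * a * c), -1.0, 1.0))
    return np.array([A, B, np.pi - A - B])

def partial_tensor(tri_src, candidates, pts_src, pts_tgt, eps=np.pi / 15):
    """Compute tensor entries only for ANN-candidate triangle pairs and store
    them sparsely, keyed by ((i,i'),(j,j'),(k,k'))."""
    tensor = {}
    for (i, j, k) in tri_src:
        ang_s = inner_angles(pts_src[i], pts_src[j], pts_src[k])
        for (i2, j2, k2) in candidates[(i, j, k)]:
            ang_t = inner_angles(pts_tgt[i2], pts_tgt[j2], pts_tgt[k2])
            d = np.linalg.norm(ang_s - ang_t)
            if d < np.pi / 5:  # truncate large angle differences: entry stays zero
                tensor[((i, i2), (j, j2), (k, k2))] = np.exp(-d * d / (2 * eps * eps))
    return tensor
```

A scaled copy of a triangle has identical inner angles and yields a similarity of 1, while a clearly different candidate triangle is truncated away and leaves no entry.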

(4) Making the graph nodes distribute evenly. Though UR-FAST makes image features distribute evenly, there is still some possibility that triangles in the graphs occasionally have small areas, with the vertices of the triangles being approximately collinear. Thus, a threshold of 15 pixels is set to make the triangles have relatively large areas. If a triangle in either graph has an area smaller than 15 pixels, then all the tensor elements related to this triangle are set to zero. Formally, if Area(T^S_{i,j,k}) < 15, then a_{i,.;j,.;k,.} = 0; if Area(T^T_{i',j',k'}) < 15, then a_{.,i';.,j';.,k'} = 0. The threshold (15 pixels) has little relationship with sensor types, and it is not case specific but more of an empirical value. Since UR-FAST avoids clustered image features, the area constraint condition is seldom violated unless the triangle vertices are approximately collinear.
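The area constraint of strategy (4) is a one-line geometric check; a minimal sketch (function names are illustrative):

```python
def triangle_area(p1, p2, p3):
    """Triangle area from the 2-D cross product of two edge vectors."""
    return 0.5 * abs((p2[0] - p1[0]) * (p3[1] - p1[1])
                     - (p2[1] - p1[1]) * (p3[0] - p1[0]))

def passes_area_check(p1, p2, p3, min_area=15.0):
    """Strategy (4): reject near-degenerate triangles; every tensor element
    touching a rejected triangle is set to zero."""
    return triangle_area(p1, p2, p3) >= min_area
```

Approximately collinear vertices give an area near zero and are filtered out, which is exactly the failure case the inner-angle measure is sensitive to.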
2.3. Power Iteration of the Affinity Tensor

From the perspective of graph theory, the feature correspondence problem is to find the maximally similar sub-graphs (i.e., to maximize the summation of the node similarities). The best matching should satisfy

z* = argmax_{z ∈ {0,1}^{mn×1}} Σ_{i,i'} Σ_{j,j'} Σ_{k,k'} a_{i,i';j,j';k,k'} z_{i,i'} z_{j,j'} z_{k,k'},
s.t. Z*1 ≤ 1 and (Z*)^T 1 ≤ 1    (3)

where Z* ∈ {0,1}^{m×n} is the assignment matrix, z_{i,i'} = 1 indicates that i and i' are the indices of matched points, z* is the vectorization of Z*, and 1 is a vector with all elements equal to 1.

The constraint term in Equation (3) shows that all the elements in Z*1 and (Z*)^T 1 are no greater than 1, i.e., every node in G_S has at most one corresponding node in G_T, and every node in G_T has at most one corresponding node in G_S (that is, the node mapping between G_S and G_T is an injective mapping). Equation (3) is actually a sub-graph isomorphism problem, with no known polynomial-time algorithm (it is also known as an NP-complete problem [37]).
However, the solution of Equation (3) can be well approximated by the leading vector of the affinity tensor, which can be obtained by tensor power iteration [34]; the interpretation is as follows.

Based on Equation (1), if a_{i,i';j,j';k,k'} is bigger, then the possibility that T^S_{i,j,k} and T^T_{i',j',k'} are matched triangles is higher, and vice versa. Thus, the tensor elements can be expressed with probability theory under the assumption that the correspondences of the node pairs are independent [37]:

a_{i,i';j,j';k,k'} = P(i,i') P(j,j') P(k,k')    (4)

where P(i,i') is the matching possibility of feature pair i and i'. Equation (4) can be rewritten in a tensor product form [38]

A = p ⊗_K p ⊗_K p    (5)

where p is the vector whose elements express the matching possibilities of the feature pairs, ⊗_K is the Kronecker tensor product symbol, and A is a tensor constructed from two graphs that have no noise or outliers and whose node correspondences are all independent.

Though in practical matching tasks noise and outliers are inevitable, and the assumption that the node correspondences are independent is weak, the matching problem can be approximated by the following function

p* = argmin_p ||A − p ⊗_K p ⊗_K p||_F    (6)

where ||·||_F is the Frobenius norm operator. Though the exact solution of Equation (3) generally cannot be obtained in polynomial time, it can be well approximated by Equation (6), of which the computational complexity is O((mn)^3). The tensor power iteration is illustrated in Algorithm 1.
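Equation (5) says that, for an ideal noise- and outlier-free pair of graphs, A is the triple outer product of the match-probability vector with itself. A quick numerical check (with an arbitrary toy p) also shows that contracting A with p returns a vector parallel to p, which is why the power iteration of Algorithm 1 recovers it:

```python
import numpy as np

# Toy match-probability vector p (values are arbitrary for this demonstration).
p = np.array([0.7, 0.2, 0.1])

# Equation (5): A = p (x)_K p (x)_K p, the triple outer product.
A = np.einsum('i,j,k->ijk', p, p, p)

# For this noise-free A the residual of Equation (6) is exactly zero ...
residual = np.linalg.norm(A - np.einsum('i,j,k->ijk', p, p, p))

# ... and contracting A with p twice yields a vector parallel to p.
q = np.einsum('ijk,j,k->i', A, p, p)
direction = q / np.linalg.norm(q)
```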

Algorithm 1: Affinity tensor power iteration.
Input: Affinity tensor A
Output: Leading eigenvector p* of A
1 begin
2   A^(0) ← A, p^(0) ← 1
3   for n = 0 to NIteration do
4     p^(n+1) ← A^(n) ⊗_K p^(n) ⊗_K p^(n)  (i.e., ∀ i,i': p_{i,i'} ← Σ_{j,j';k,k'} a_{i,i';j,j';k,k'} p_{j,j'} p_{k,k'})
5     p^(n+1) ← p^(n+1) / ||p^(n+1)||_2
6   end
7   p* ← p^(n) / ||p^(n)||_2
8 end

The output of Algorithm 1 is the leading vector of the tensor A (i.e., the solution to Equation (6)). The size of the affinity tensor A is mn × mn × mn (i.e., the tensor has mn columns, mn rows, and mn tubes), so p*, the leading vector of the tensor A, has a length of mn. Besides, the elements of the tensor A are real numbers, so the elements of p* are real numbers too. To obtain the assignment matrix, we should first transform p* into an m × n matrix (this matrix is called a soft assignment matrix because its elements are real numbers; in this manuscript, we denote it by Z), then discretize Z to Z* by use of the greedy algorithm [39] or the Hungarian algorithm [40]. Finally, the correspondences of image features are generated by the assignment matrix Z*.
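Algorithm 1 plus the greedy discretization [39] can be sketched as below; the dense-tensor representation and the toy rank-one example are simplifying assumptions (the paper's tensor is sparse):

```python
import numpy as np

def tensor_power_iteration(A, n_iter=5):
    """Algorithm 1: contract the affinity tensor with the current vector and
    renormalise; returns the approximate leading vector p* of A."""
    mn = A.shape[0]
    p = np.full(mn, 1.0 / mn)  # uniform start
    for _ in range(n_iter):
        # p_{i,i'} <- sum_{j,j';k,k'} a_{i,i';j,j';k,k'} p_{j,j'} p_{k,k'}
        p = np.einsum('ijk,j,k->i', A, p, p)
        p = p / np.linalg.norm(p)
    return p

def greedy_discretize(Z):
    """Greedy discretization [39]: repeatedly take the largest entry of the
    soft assignment matrix Z and block its row and column."""
    Z = Z.astype(float).copy()
    m, n = Z.shape
    out = np.zeros((m, n), dtype=int)
    for _ in range(min(m, n)):
        i, j = np.unravel_index(np.argmax(Z), Z.shape)
        if not np.isfinite(Z[i, j]) or Z[i, j] <= 0:
            break
        out[i, j] = 1
        Z[i, :] = -np.inf
        Z[:, j] = -np.inf
    return out
```

For a rank-one tensor built from a ground-truth vector v (m = n = 2, so mn = 4), the iteration recovers v up to normalisation, and reshaping p* to 2 × 2 and discretizing yields the correct one-to-one assignment.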


2.4. Matching Blunder Detection by the Affinity Tensor

Though the aforementioned tensor matching algorithm gets rid of most of the mismatched points, there are still matching blunders caused by outliers, image distortions, and noise. Thus, this paper proposes an affinity tensor-based method for eliminating erroneous matches.

The proposed method is based on the observation that the structures among images are relatively static. Meanwhile, the affinity tensor includes structural information of the image features. Therefore, the two matched graphs can induce an attributed graph. In the induced graph, every node has an attribute that is computed from the summation of the similarities of the triangles which include that node.
As shown in Figure 7, G_S and G_T are two matched graphs including six pairs of matched nodes, and the node pair 6 and 6' is an erroneous correspondence. Therefore, in the induced graph G_A, the node induced by the erroneous matched node pair (i.e., node 6, which is induced by node pair 6 and 6') has a small attribute, which measures the similarity that comes from the other nodes. In a more formal expression, the attribute of induced node i can be expressed as follows:

s_I = (1/N) Σ_{j,j'} Σ_{k,k'} a_{i,i';j,j';k,k'},  i ≠ j ≠ k, i' ≠ j' ≠ k'    (7)

where s_I is the attribute of node i in G_A; i and i', j and j', and k and k' are the matched node pairs; and N is the number of the matched triangles.

Figure 7. The induced graph of two matched graphs.

Equation (7) is the basic formula for matching blunder detection. In a practical matching task, the number of blunders is larger than illustrated in Figure 7, and it is very hard to distinguish blunders from noise. Thus, we embed Equation (7) in an iterative algorithm: the big blunders are eliminated first, and then the small ones, until the similarities are kept constant. The algorithm is illustrated in Algorithm 2.

In Algorithm 2, I, J, and K are the abbreviations for the feature index pairs (i,i'), (j,j'), and (k,k'); p represents the p-th iteration; and c is a constant, which can be calculated as the average similarity of the triangles constructed from two error-free matched feature sets (the calculating formula is shown in Equation (7)). In this paper, it is empirically set to 0.85.



Algorithm 2: Blunder detection by the affinity tensor.
Input: Affinity tensor A, matched feature set {I, J, K, L, …}
Output: Error-free matching set
1 begin
2   do
3     for i = 1…n, i' = 1…n do
4       s_I ← Σ_J Σ_K A_{I;J;K}, I ≠ J ≠ K
5     end
6     rank {s_i} in descending order
7     remove the node pair corresponding to s_n
8     s_n^(p+1) ← s_n^(p)
9     p ← p + 1
10  while (abs(s_n^(p) − s_n^(p+1)) > 0.01 or s_n^(p+1) < c)
11 end
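The per-node attribute at the heart of Algorithm 2 (Equation (7)) can be computed directly from the sparse tensor restricted to the matched pairs. In this sketch the dict layout and the planted similarity values are assumptions; the point is that a blunder collects visibly less support than the correct matches:

```python
def node_attributes(tensor, matched_pairs):
    """Equation (7): attribute of matched pair (i, i') = average similarity of
    the matched triangles containing it. `tensor` is the sparse dict
    {((i,i'),(j,j'),(k,k')): similarity} restricted to matched pairs."""
    total = {pair: 0.0 for pair in matched_pairs}
    count = {pair: 0 for pair in matched_pairs}
    for triangle_pair, sim in tensor.items():
        for pair in triangle_pair:
            if pair in total:
                total[pair] += sim
                count[pair] += 1
    return {p: (total[p] / count[p] if count[p] else 0.0) for p in matched_pairs}
```

Iteratively removing the pair with the smallest attribute and recomputing, until the smallest attribute stabilises above c, reproduces the loop of Algorithm 2.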

To verify the effectiveness of Algorithm 2, it was compared to the RANSAC algorithm integrated with a homographic transformation (the threshold of re-projected errors was set to 3.0 pixels). The results are shown in Figure 8.

Figure 8. A comparison of RANSAC and tensor-based gross error detection: (a) manually selected tie points; and (b) results of the comparison.

As can be seen from Figure 8a, 30 pairs of evenly distributed tie points were selected from the sub-image pairs. To verify the robustness of the two algorithms, from 10 to 100 pairs of randomly generated tie points were added to the matched feature sets as matching blunders. Two parameters were used to evaluate the effectiveness of the algorithms. The first one is rf (rf = fp/t), where fp is the number of falsely detected blunders (that is, the detected matching blunders are actually correct matches, but the detection algorithm falsely marks them as erroneous matches), and t is the total number of


correctly matched points. In this experiment, t is 30. In the ideal case, rf is 0; a lower rf indicates a better algorithm. The other parameter to measure the usefulness of the algorithms is rc (rc = ct/g), where ct is the number of correctly detected blunders and g is the total number of matching blunders in the matched points; in this experiment, g varies from 10 to 100.

By the experimental results, the rc of both the RANSAC and tensor-based algorithms was 1.0, which means that both algorithms detected all the gross errors. The results for rf are demonstrated in Figure 8b. Although both algorithms can detect all the blunders, the tensor-based algorithm performs better than RANSAC because the average rf of the former is only 0.03, which is far below that of the latter. The essence of RANSAC is random sampling: the global geometric transform model is fitted by the sampled points, and the rest of the points are treated as the verification set. The drawback of RANSAC is that the probability of correct sampling is very low when confronted with considerable mismatches, so RANSAC can become stuck in a local minimum. However, the graph-based detection method avoids these drawbacks, because the node attributes represent the other nodes' support. If a node is an outlier, the support from the other nodes will be very small, and thus the attribute indicates whether the node is an outlier or not.

3. Experimental Results

The proposed ATBM was evaluated with five pairs of images which were captured by ZY-3 backward (ZY3-BWD), ZY-3 nadir (ZY3-NAD), GF-1, GF-2, unmanned aerial vehicle (UAV) platform, Jilin-1 (JL1), SPOT 5, and SPOT 6 sensors. To examine its effectiveness, five descriptor-based matching algorithms were chosen as the competitors, namely, SIFT, SURF, NG-SIFT, OR-SIFT, and GOM-SIFT. These five descriptor matching algorithms are state-of-the-art in matching multi-sensor images. The matcher for these algorithms was ANN searching, and the NNDR threshold was set to 0.8.
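The two evaluation parameters rf and rc reduce to simple set arithmetic; a minimal sketch (the function and variable names are illustrative):

```python
def blunder_detection_rates(detected, true_matches, injected_blunders):
    """rf = fp / t: correct matches falsely rejected, over t correct matches.
       rc = ct / g: blunders correctly detected, over g injected blunders."""
    detected = set(detected)
    fp = len(detected & set(true_matches))
    ct = len(detected & set(injected_blunders))
    return fp / len(true_matches), ct / len(injected_blunders)
```

With t = 30 correct matches, 10 injected blunders, and a detector that flags all 10 blunders plus one correct match, rc is 1.0 and rf is 1/30.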
In the following, evaluation criteria, experimental datasets, implementation details, and experimental results are presented. 3.1. Evaluation Criterion and Datasets In order to evaluate the proposed method, two common criteria including recall and precision [10] were used. The two criteria are defined as follows: recall = CM/C and precision = CM/M, where CM (correct matches) was the number of correctly matched point pairs, C (correspondences) was the total number of existing correspondences in the initial feature point sets, and M (matches) was the number of matched points. However, for large images covering mountainous and urban areas, a global transformation such as projective and second-order polynomial models could not accurately express the geometric transformation among such images [18]. Thus, in the estimation of C and CM, 25 pairs of sub-images grabbed from the complete image pairs were used for evaluation. Each of the sub-image pairs was typically characterized by non-linear intensity differences and considerable textural changes (the details of the complete image pairs were listed in Table 1; the sub-image pairs were shown in Figure 9). C and CM were determined as follows: A skilled operator manually selected 10–30 evenly distributed tie points between image pairs, and an accurate homographic transformation was computed using the selected tie points. Then the computed homographic matrix with the back-projective error threshold of 3.0 pixels was used to determine the number of correct matches and correspondences. Apart from recall and precision, positional accuracy was used to evaluate the matching effectiveness of ATBM and other five matching algorithms. The positional accuracy was also used by Ye and San [18] and was computed as follows. First, TIN was constructed using the matched points of the complete image pair, and 13–38 evenly distributed checkpoints (or CPs, for short) were fed to the Delaunay triangulation. 
Then, an affine transformation model was fitted by the matched triangles. Lastly, positional accuracy was estimated via the root mean square error (RMSE), which was computed through the affine transformation of the CPs.
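The recall/precision protocol above (correctness judged by a 3.0-pixel back-projection error under the fitted homography) can be sketched as:

```python
import numpy as np

def project(H, pts):
    """Apply a 3x3 homography H to (N, 2) points in homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    q = pts_h @ H.T
    return q[:, :2] / q[:, 2:3]

def recall_precision(H, src, tgt, n_correspondences, tol=3.0):
    """recall = CM/C, precision = CM/M; a match is correct when its
    back-projection error under H is below `tol` pixels."""
    err = np.linalg.norm(project(H, src) - tgt, axis=1)
    cm = int((err < tol).sum())
    return cm / n_correspondences, cm / len(src)
```

With an identity homography, a match offset by sqrt(2) pixels counts as correct while one offset by ~70 pixels does not; given C = 4 existing correspondences and M = 3 returned matches of which 2 are correct, recall is 0.5 and precision is 2/3.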


Figure 9. Sub-image pairs for estimating recall, precision, and number of correct matches: (a) sub-image pairs from GF-1 and ZY-3 sensors; (b) GF-2 and ZY-3 sensors; (c) ZY3-NAD and ZY3-BWD sensors; (d) UAV and JL1 sensors; and (e) SPOT 5 and SPOT 6 sensors.

Table 1. Experimental image pairs.

No. | Platform | Acquisition Date | Image Size (Pixels) | Pixel Size (m/Pixel) | Location | CPs Number
1 | GF-1 / ZY-3 | 2013 / 2012 | 18,192 × 18,000 / 24,530 × 24,575 | 2.0 / 2.1 | Xinjiang, China | 30
2 | GF-2 / ZY-3 | 2015 / 2012 | 9376 × 9136 / 24,530 × 24,575 | 1.0 / 2.1 | Beijing, China | 32
3 | ZY3-BWD / ZY3-NAD | 2014 / 2015 | 24,525 × 24,419 / 16,292 × 16,348 | 2.1 / 3.5 | Wuhan, China | 38
4 | UAV / JL1 | 2014 / 2014 | 4990 × 3908 / 4765 × 4070 | 0.5 / 1.0 | Changchun, China | 36
5 | SPOT5 / SPOT6 | 2009 / 2012 | 598 × 551 / 792 × 729 | 2.5 / 1.5 | Barcelona, Spain | 13

3.2. Implementation Details and Experimental Results

In the process of feature extraction, UR-FAST was applied in the source image, and standard

FAST was used in the target image. It was shown by our experiments that this feature extraction strategy could greatly improve the feature repetitiveness compared to commonly used feature detectors such as Harris features, BRISK, DoG, and ORB. In the construction of the affinity tensors, to guarantee a repetitive rate of image features, 50 UR-FAST features were retained in the source image. For every UR-FAST feature, the ANN algorithm was applied to search for its 4 candidate correspondences; for every triangle in the source image, the ANN algorithm was applied again to search for its 3 candidate matched triangles (by our experiments, the matching recall reached its peak when the numbers of candidate matching features and triangles were 4 and 3, respectively). The Gaussian kernel bandwidth ε in Equation (1) was set to π/15, and any computed tensor elements that were higher than π/5 were set to zero. The balanced weighting factor in Equation (1) was empirically set to 0.01 (a more precise balanced weighting factor could be calculated by Equation (2) if necessary). Besides, to trade off computational cost and accuracy, we made 5 trials in the power iterations (the power iteration was shown in Algorithm 1).

The complete image pairs listed in Table 1 were the testing datasets for estimating positional accuracy. There were thousands of feature points in these image pairs. Thus, computing the affinity tensors for such image pairs was very time consuming. Meanwhile, storing all the feature descriptors in computer memory was nearly impossible. Therefore, the SIFT algorithm (SIFT was optional;


other algorithms such as SURF and ORB were also sufficient for the coarse matching) was used to obtain some tie points in down-sampled image pairs. Then, we roughly computed the homographic transformation between the image pairs. Next, the two images of an image pair were gridded into sub-image pairs, and every sub-image pair would be overlapped under the constraint of the homographic transformation (this coarse-to-fine strategy was also applied by the compared algorithms). In the end, all the sub-image pairs were matched by ATBM as illustrated in Figure 2.

ATBM, SIFT, SURF, NG-SIFT, OR-SIFT, and GOM-SIFT were also evaluated with the testing data listed in Table 1. All the test image pairs were selected from different sensors and, hence, had considerable illumination differences, particularly in images with higher resolution. In addition, these test images had different scales and topographic relief, so obvious geometric distortions and considerable texture changes were present.
Allscales the test and imagetopographic pairs were selected from different sensorsgeometric and, hence, had considerable illumination differences particularly in images with higher resolution. In addition, these considerable texture changes were present. test images had different scales and topographic relief, so obvious geometric distortions and Figure 10considerable showed the matching results of ABTM on the five test image pairs. It seemed clear that texture changes were present. ABTM obtained favorable matching results though were intensity differences with Figure 10 showed the matching results of ABTMthere on the five test non-linear image pairs. It seemed clear that ABTM obtained favorable matching results though there were non-linear intensity differences with considerable texture changes and somewhat obvious geometric distortions. considerable texture changes and somewhat obvious geometric distortions.
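The coarse-to-fine strategy above can be sketched in plain NumPy. This is a minimal illustration assuming tie points have already been produced by any coarse matcher (SIFT, SURF, or ORB on the down-sampled pair); `fit_homography`, `project`, and `grid_windows` are hypothetical helper names, and the 512-pixel grid is an arbitrary choice, not the paper's setting.

```python
import numpy as np

def fit_homography(src, dst):
    # Direct linear transform: least-squares homography from >= 4 tie points.
    # src, dst: (N, 2) arrays of corresponding full-resolution coordinates.
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)          # null-space vector = homography entries
    return H / H[2, 2]

def project(H, pts):
    # Apply homography H to an (N, 2) array of points.
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

def grid_windows(shape, H, grid=512):
    # Tile image 1 into grid x grid windows and project each window's
    # corners into image 2, yielding overlapping sub-image pairs.
    h, w = shape
    for y in range(0, h, grid):
        for x in range(0, w, grid):
            corners = np.array([[x, y], [x + grid, y],
                                [x, y + grid], [x + grid, y + grid]], dtype=float)
            yield (x, y), project(H, corners)
```

Each yielded window and its projected counterpart would then form one sub-image pair to be matched by the fine step.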


Figure 10. Matching results of ATBM: (a–e) show the matching results of the GF-1 and ZY-3, GF-2 and ZY-3, ZY3-BWD and ZY3-NAD, UAV and JL1, and SPOT 5 and SPOT 6 sensor pairs; (f–j) show the matching details of the sub-images marked by red rectangles in (a–e).


As shown in Figure 10, ATBM obtained abundant and evenly distributed matches in all of the experimental images. The first pair of images was from Xinjiang, China, and was captured by the GF-1 and ZY-3 sensors; therefore, the illumination differences were obvious, as evidenced by the matching details. Besides, there were also changes in scale caused by the differences in ground sampling distance (GSD) of the sensors and by the vast topographic relief of the Chi-lien Mountains. These two factors resulted in locally non-linear intensity differences and geometric distortions. Nevertheless, ATBM obtained evenly distributed tie points in the sub-image pairs (as shown in Figure 10f). However, in the middle of the first image pair (Figure 10a), there was no tie point. This is because the two images were captured at different moments and the image textures changed dramatically, so there was hardly any feature repetition. Although feature correspondences satisfying the geometric constraints were found in the initial matching process, the tensor-based detection algorithm eliminated all of these false positive matches in the gross error detection step of ATBM.

The second image pair consisted of two images from the GF-2 and ZY-3 sensors. The two images were captured at moments spanning over three years; therefore, considerable textural changes were present as a result of the rapid development of modern cities in China (as shown in Figure 10g). The tie points of the second sub-image pair were clustered in unchanged image patches because ATBM detected feature correspondences using radiometric and geometric constraints: it could find feature matches in the unchanged textures even when they were surrounded by numerous changed textures. Therefore, the tie points of the complete image pair (as shown in Figure 10b) were well distributed.

The third pair of images was captured by the ZY3-BWD and ZY3-NAD sensors. The two images were taken in different seasons, namely early spring and midsummer.
The different seasons led to much more severe intensity changes compared to the first two image pairs. For example, in Figure 10h some of the pixel intensities were clearly reversed. However, ATBM was immune to these intensity changes and obtained stable matching results in this image pair as well.

The fourth pair of images was shot by the UAV and JL1 sensors (the camera on the UAV platform was a Canon EOS 5D Mark III), and it could be seen that the JL1 image was blurrier than the UAV image. The transmission distance between the sensed target and the satellite sensor was much longer than that between the UAV sensor and its target, so more of the reflected light was lost in transmission and the UAV images looked sharper than the satellite images. Besides, as in the second image pair, the two images were captured in different seasons: the JL1 image was covered by thick snow, while the UAV image was covered by growing trees (as shown in Figure 10i). However, both of these challenges had little impact on the matching results, and ATBM still obtained abundant tie points.

The fifth pair of images was from an urban area, and the image resolution was high (2.5 m for SPOT 5 and 1.5 m for SPOT 6); the two images were characterized by structured textures (as shown in Figure 10j). These textures benefitted the FAST corner detector, and thus the matching result was satisfactory too.

Figure 11 gives a quantitative description of ATBM and a comparison with the five feature descriptors. ATBM outperformed the other five matching algorithms in all the evaluation criteria, including matching recall, matching precision, number of correct matches, and positional accuracy, and did so in all the test images, which contained locally non-linear intensity differences, geometric distortions, and numerous textural changes.
The better performance stemmed from the higher matching recall (higher than 0.5 for ATBM, as shown in Figure 11a), which resulted in more correct matches between image pairs. In addition, ATBM used the affinity-tensor-based gross error detection algorithm, which could distinguish true matches among a number of wrong ones. Thus, the matching precision and the number of correct matches were higher too (the average number of correct matches for ATBM was approximately 20 for sub-image pairs). Higher positional accuracy (an average of 2.5 pixels for ATBM) followed, because positional accuracy was mainly determined by the number of correct matches and the matching precision.
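The tensor power iteration at the core of ATBM (Algorithm 1, run for 5 iterations in these experiments) can be sketched as follows. The toy 4 × 4 × 4 tensor and the uniform initialization are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def tensor_power_iteration(T, n_iter=5):
    """Power iteration on a 3rd-order affinity tensor T of shape (m, m, m),
    where m is the number of candidate correspondences. Entries of the
    returned unit vector score how strongly each correspondence is
    supported by consistent triangles."""
    m = T.shape[0]
    v = np.ones(m) / np.sqrt(m)
    for _ in range(n_iter):
        v = np.einsum('ijk,j,k->i', T, v, v)  # v_i <- sum_jk T_ijk v_j v_k
        v /= np.linalg.norm(v)
    return v

# Toy tensor: correspondences 0, 1, 2 form one mutually consistent triangle,
# while correspondence 3 is supported by nothing.
T = np.zeros((4, 4, 4))
for i, j, k in [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]:
    T[i, j, k] = 1.0
scores = tensor_power_iteration(T)  # scores[0:3] high, scores[3] -> 0
```

Thresholding or a greedy assignment over the score vector then yields the initial correspondences, and unsupported candidates drop out naturally.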



Figure 11. Quantitative matching results of test images: (a) matching recall; (b) number of correct matches; (c) matching precision; and (d) positional accuracy.

ATBM tended to match feature points with similarities in both geometry and radiometry. Geometric and radiometric information played the same role in matching texturally fine and well-structured images (i.e., images with small geometric and radiometric distortions). The higher matching recall benefitted from the affinity tensor, which encoded geometric and radiometric information: geometry described the intrinsic relations of the feature points, while radiometry represented their local appearances. When geometric or radiometric distortions were considerable, ATBM could make the two constraints compensate for each other. For example, in sub-image pair 1 the geometric distortions were relatively large, and thus radiometric information played the leading role in matching; in sub-image pair 3, the flat farmlands resulted in relatively small geometric distortions, and the geometric information compensated for the radiometric information in matching.
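How a single tensor element might blend the two cues can be sketched as below. The exponential form and the triangle-angle inputs are illustrative assumptions rather than the paper's Equation (1), though the balance weight (0.01) and the π/5 truncation echo the settings quoted earlier.

```python
import numpy as np

def affinity_element(angles_a, angles_b, desc_dist, lam=0.01, thresh=np.pi / 5):
    """Illustrative affinity between a triangle of features in each image:
    a geometric term from the interior-angle differences and a radiometric
    term from descriptor distance, blended by the balance weight lam."""
    geo = float(np.abs(np.asarray(angles_a) - np.asarray(angles_b)).sum())
    if geo > thresh:        # truncation: clearly dissimilar triangles score 0
        return 0.0
    return float(np.exp(-(geo + lam * desc_dist)))

# Identical triangles with identical descriptors score 1.0; triangles whose
# angle differences exceed pi/5 are zeroed out, keeping the tensor sparse.
```

When one cue degrades (large angle differences or large descriptor distances), the other still contributes to the exponent, which is the compensation effect described above.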
In general, descriptor-based matching algorithms determine matches via the NNDR of the radiometric descriptors. When encountering non-linear intensity differences and changed textures, the drawback of the NNDR strategy in matching multi-sensor images was exposed: the descriptor distance of two matched features was insufficiently small. OR-SIFT and GOM-SIFT modified the descriptor to adapt to textures in which the intensity differences were reversed, but these image pairs were not such a case because the intensity differences were non-linear. NG-SIFT used normalized descriptors and abandoned the gradient magnitude; though it had a certain capability to resist non-linear intensity differences, it had problems in descriptor distinctiveness. Therefore, descriptor-based matching methods such as SIFT, SURF, OR-SIFT, GOM-SIFT, and NG-SIFT had relatively lower matching recall in the experimental image pairs.
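The NNDR rule criticized here is the standard ratio test; a minimal sketch follows, where the 0.8 ratio and the toy descriptors are assumptions.

```python
import numpy as np

def nndr_match(desc1, desc2, ratio=0.8):
    """Nearest-neighbor distance ratio (NNDR) matching: accept feature i
    only when its best descriptor distance is clearly smaller than the
    second best. Under non-linear intensity changes the two distances grow
    similar, so many true matches are discarded."""
    matches = []
    for i, d in enumerate(np.asarray(desc1, dtype=float)):
        dists = np.linalg.norm(np.asarray(desc2, dtype=float) - d, axis=1)
        nearest, second = np.argsort(dists)[:2]
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, int(nearest)))
    return matches
```

The second assertion below shows the failure mode: when the two nearest descriptors are almost equally distant, the ratio test rejects the match even if the nearest one is correct.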
ATBM avoided the NNDR rule and tended to find the best matches in both geometry and radiometry, thus leading to a higher matching recall. Benefitting from the higher matching recall, there were more correct matches for ATBM in the sub-image pairs (shown in Figure 11b), and it could also be concluded that there would be more correct matches in the complete image pairs (evidence is shown in Figure 10a–e). With more correct matches, the positional accuracy was higher too (as shown in Figure 11d).

The experimental results of the five descriptor-based matching algorithms were strongly dependent on the image contents: GOM-SIFT obtained the most correct matches and the highest matching recall among them (Figure 11a,b), while SIFT had the best matching precision (Figure 11c). In most cases, OR-SIFT and


GOM-SIFT had similar matching results, because OR-SIFT built feature descriptors in the same manner as GOM-SIFT; the difference between them was that OR-SIFT had half the descriptor dimensions of GOM-SIFT, and thus OR-SIFT had an efficiency advantage in ANN searching. SIFT had the best matching precision on the experimental data because the SIFT descriptor had higher distinctiveness than the other four feature descriptors. NG-SIFT and SURF showed no obvious regularity on the testing data, though both of them were characterized by scale, rotation, and partial illumination-change invariance, which were also capabilities of SIFT, OR-SIFT, and GOM-SIFT.

ATBM outperformed the other five matching algorithms again in positional accuracy (shown in Figure 11d). Positional accuracy was mainly determined by the matching precision, the number of correct matches, and their spatial distribution. In most cases, more correct matches (as can be seen from Figure 11b, ATBM had the most correct matches) meant a better spatial distribution, and the matching precision of ATBM was higher than that of the other five algorithms too (shown in Figure 11c). Therefore, ATBM had higher positional accuracy than the other five descriptor-based matching algorithms.

4. Conclusions and Future Work

Image matching is a fundamental step in remote sensing image registration, aerial triangulation, and object detection. Although it has been well addressed by feature-based matching algorithms, it remains a challenge to match multi-sensor images because of non-linear intensity differences, geometric distortions, and textural changes. Conventional feature-based algorithms have been prone to failure because they only use radiometric information. This study presented a novel matching algorithm that integrated geometric and radiometric information into an affinity tensor and utilized such a tensor to address matching blunders.
The proposed method involved three steps: graph building, tensor-based matching, and tensor-based mismatch detection. In graph building, the UR-FAST and ANN searching algorithms were applied, and the extracted image features were regarded as graph nodes. Then the affinity tensor was built, with its elements representing the similarities of nodes and triangles, and the initial matching results were obtained by tensor power iteration. Finally, the affinity tensor was used again to eliminate matching errors.

The proposed method was evaluated using five pairs of multi-sensor remote sensing images covering six different sensors: ZY3-BWD, ZY3-NAD, GF-1, GF-2, JL-1, and a UAV platform. Compared with traditionally used feature matching descriptors such as SIFT, SURF, NG-SIFT, OR-SIFT, and GOM-SIFT, the proposed ATBM achieved reliable matching results in terms of matching recall, precision, number of correct matches, and positional accuracy on the experimental data. Because the tensor-based model is a generalized matching model, the proposed method can also be applied to point cloud registration and multi-source data fusion.

However, a few problems should be addressed in future research. The bottleneck in computing efficiency is the ANN algorithm when it is used to search for the n most similar triangles. For example, if a feature set consists of 50 image features, these features constitute 50³ triangles; the KD-tree will then have 50³ nodes, which makes its construction very time consuming, and searching for the n most similar triangles in a KD-tree with 50³ nodes is inherently inefficient. The computational time of ATBM is also higher than that of SIFT and SURF because the considerable size of the affinity tensor increases the number of operations in the power iterations. Further research could introduce new strategies to make the tensor sparser and reduce the computational complexity of the power iterations.
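The combinatorial growth behind this bottleneck is easy to make concrete; the per-feature sampling remedy noted in the comment is the standard one from high-order graph matching, not something implemented in this paper.

```python
from math import comb

def ordered_triples(n):
    """Feature triples counted with order, matching a dense (n, n, n) indexing."""
    return n ** 3

def distinct_triangles(n):
    """Unordered triples without repetition; still cubic growth."""
    return comb(n, 3)

# Even 50 features produce 125,000 ordered triples (19,600 distinct
# triangles); sampling only t triangles per feature would shrink the
# tensor to O(n * t) non-zero entries.
```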
Besides, the power iterations could also be implemented in a GPU-based parallel computing framework, which could immensely speed them up.

Author Contributions: W.Y. advised on the data analysis, provided remote sensing experience, and gave the whole structure of the idea. S.C. developed the algorithm, conducted the primary data analysis, and crafted the manuscript. X.Y. significantly edited the manuscript. J.N., F.X. and Y.Z. revised the manuscript.

Funding: This work was supported by the National Natural Science Foundation of China (Grant No. 41371432, Grant No. 41671405, and Grant No. 41771438).


Acknowledgments: We would like to thank Yifei Kang, Ying Guo, and Cailong Deng for their help in image pre-processing.

Conflicts of Interest: The authors declare no conflict of interest.
