The Effect of Prior Probabilities in the Maximum Likelihood Classification on Individual Classes: A Theoretical Reasoning and Empirical Testing

Zheng Mingguo, Cai Qianguo, and Qin Mingzhou

Abstract
The effect of prior probabilities in the maximum likelihood classification on individual classes receives little attention, and it is addressed in this paper. Prior probabilities are designed only for overlapping spectral signatures. Accordingly, their effect on an individual class is independent of the classes that are spectrally separable from this class. The theoretical reasoning reveals that an increased prior probability, which shifts the decision boundary away from the class mean, will increase the assignment and boost the producer’s accuracy as compared to the use of equal priors; though the change of the user’s accuracy is not constant, it is expected to decrease in most cases. The tendency is just the opposite when a lower prior probability is used. A case study was conducted using Landsat TM data provided along with ERDAS Imagine® software. Two other pieces of evidence derived from the published literature are also presented.

Zheng Mingguo and Cai Qianguo are with the Key Laboratory of Water Cycle and Related Land Surface Processes, Institute of Geographic Sciences & Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China ([email protected]). Qin Mingzhou is with the College of Environment and Planning, Henan University, Kaifeng 475001, China.

Introduction
The maximum likelihood classification (MLC) is the most widely used method of classifying remotely sensed data, and in standard digital image processing it has long been considered the most advanced classification strategy (Maselli et al., 1992). One of the superior properties of the MLC algorithm is that it can make use of prior probabilities derived from ancillary information concerning the area to be classified, so that remotely sensed data can be integrated with data collected conventionally (Maselli et al., 1990). By helping to resolve confusion among classes that are poorly separable (McIver and Friedl, 2002), prior probabilities can be a powerful and effective aid to improving classification accuracy (Strahler, 1980). It is desirable to obtain a reliable prior probability for each class and use it to at least classify the pixels likely to be misclassified (Ediriwickrema and Khorram, 1997). However, it is common practice to perform the MLC with equal prior probabilities, as reliable prior probabilities are not always available.

A large number of studies have been carried out on the incorporation of prior probabilities into MLC (e.g., Strahler, 1980; Maselli et al., 1990; Maselli et al., 1992; Gorte and Stein, 1998; Pedroni, 2003). These studies are generally dedicated to improving the general performance of the classifier over the entire image under examination and frequently culminate in reporting an improved overall accuracy or Kappa coefficient. However, the overall accuracy or Kappa coefficient represents an average accuracy over all classes and provides little information about individual classes. As compared to the use of equal priors, a higher overall classification accuracy is expected if prior probabilities are properly estimated. This does not, however, imply a synchronous improvement in the classification accuracy of all individual classes. In many cases, though an increased Kappa coefficient or overall accuracy can be achieved by inserting prior probabilities into MLC, the producer’s or user’s accuracies decrease for some classes while they increase for others. Obviously, not all classes in a case are of interest; sometimes the purpose of a classification is simply to obtain information about one or several individual classes. Therefore, the effect of prior probabilities on individual classes deserves attention.

The objective of this paper is to understand how prior probabilities affect the MLC discrimination process and particularly to investigate their effect on individual classes. One section of the present paper is dedicated to the discussion of the effect of prior probabilities on the MLC discrimination process. Further, a mathematical reasoning is carried out and some general rules are established on the variation in the classification accuracy of individual classes after the incorporation of prior probabilities into MLC. Finally, a case study is conducted to test these rules using Landsat TM data provided along with ERDAS Imagine® software.

Photogrammetric Engineering & Remote Sensing, Vol. 75, No. 9, September 2009, pp. 1109–1117. © 2009 American Society for Photogrammetry and Remote Sensing

Maximum Likelihood Classification

MLC Assuming Equal Prior Probabilities (MLCEP)
The maximum likelihood decision rule is based on a normalized (Gaussian) estimate of the probability density function of each class. The discriminant function for MLC without considering any prior probabilities, or assuming equal prior probabilities, can be expressed as (Strahler, 1980):

g1i(X) = P(X | vi)    (1)

where vi stands for class i, g1i(X) stands for the discriminant function for vi, X is a pixel vector, and P(X | vi) is the probability density function for pixel vector X to be a member of vi. For classification of a pixel vector X using MLC, g1i(X) must be calculated for each class; X is then assigned to the class for which g1i(X) is the largest. This process takes into account not only the marginal properties of the data sets but also their internal relationships. This is one of the reasons for the great robustness of the process and for its relative insensitivity to distribution anomalies (Maselli et al., 1992). However, this process assigns a specific pixel to a class based only on the similarities between the pixel vector and predefined prototypes. This leads to a suboptimal performance of the MLC, especially in the presence of a large spectral overlap among the classes being considered. This will be detailed below.

MLC Using Actual Prior Probabilities (MLCPP)
The maximum likelihood decision rule can be modified easily to take into account prior probabilities, which are simply the expected area proportions of the classes in a particular scene or stratum (Pedroni, 2003). These prior probabilities are sometimes termed “weights,” since the modified classification rule will tend to weigh more heavily those classes with higher prior probabilities (Strahler, 1980). P(vi), the prior probability of vi, can be inserted into a maximum likelihood process by modifying the discriminant function in the following way (Strahler, 1980):

g2i(X) = P(X | vi) P(vi)    (2)

When prior probabilities are assumed equal or removed, Equation 2 is identical to Equation 1. g2i(X) is directly related to the a posteriori probability, which is the relative probability that a pixel belongs to a given class (Strahler, 1980; Huang and Ridd, 2002). In this way, pixels are allocated to their most likely class of membership. When prior probabilities are used in the maximum likelihood decision rule, even the class mean vector may be classified as a different class; the more extreme the values of the prior probabilities, the less important are the actual observation values (Strahler, 1980). Strahler (1980) and Pedroni (2003) hold that prior probabilities tend to produce larger volumes for classes that are expected to be large and smaller volumes for classes expected to be small. However, the following analysis does not entirely agree with this: even for the class with the next-largest anticipated size, the number of pixels assigned to it may still decrease if it happens to have a large spectral confusion with the class with the largest anticipated size.
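To make Equations 1 and 2 concrete, the two decision rules can be sketched for a single spectral band with Gaussian class densities. This is an illustrative sketch only, not code from the paper; the class names, means, standard deviations, and priors below are invented for the example.

```python
import math

def gaussian_pdf(x, mean, std):
    """P(X | vi): the normal density used by the maximum likelihood rule."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def classify(x, classes, priors=None):
    """Assign x to the class with the largest discriminant.
    priors=None reproduces MLCEP (Equation 1); a dict of prior
    probabilities reproduces MLCPP (Equation 2)."""
    best_class, best_g = None, -1.0
    for name, (mean, std) in classes.items():
        g = gaussian_pdf(x, mean, std)          # Equation 1
        if priors is not None:
            g *= priors[name]                   # Equation 2
        if g > best_g:
            best_class, best_g = name, g
    return best_class

# Hypothetical one-band signatures for two overlapping classes.
classes = {"v1": (40.0, 8.0), "vj": (60.0, 8.0)}

# A pixel nearer the mean of v1 goes to v1 under equal priors...
print(classify(48.0, classes))                                 # → v1
# ...but a strong prior for vj tips the decision (cf. Equation 2).
print(classify(48.0, classes, priors={"v1": 0.1, "vj": 0.3}))  # → vj
```

The same pixel vector is thus labelled differently under the two rules, which is exactly the boundary shift discussed in the next section.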

Theoretical Analysis of the Effect of Prior Probabilities on the Maximum Likelihood Classification

The Effect of Prior Probabilities on the Discrimination Process
Figure 1 illustrates the effects of prior probabilities in one-dimensional feature space. Classes v1 and vj are the two classes considered during the classification procedure, and in this section it is assumed that class vj is significantly mixed spectrally only with class v1. It should be noted that P(X | v1) and P(X | vj) are actually discrete, and they are plotted in the style of a continuous function simply for a clear presentation; this amounts to the assumption of an infinitesimal radiometric resolution.

Figure 1. An example illustrating the effects of prior probabilities in one-dimensional feature space (modified from Zheng et al. (2005)). Classes v1 and vj are two classes considered during the classification procedure. P(X | v1) and P(X | vj) are their probability density functions. X, Xa, Xt, and Xe are one-dimensional feature vectors in this case. Points z and y are points on the curves of P(X | v1) and P(X | vj), respectively. Line XeXe represents the decision boundary when prior probabilities are assumed equal; line XaXa represents the decision boundary when the decision boundary shifts away from the mean of class vj; line XtXt represents the decision boundary when the decision boundary shifts towards the mean of class vj. The value of P(X | vj) at point a is extremely small, so point a can be considered as the lower limit of the distribution of class vj in this feature (for instance, spectral band); also, point b can be considered as the upper limit of the distribution of class v1. Thus, the spectral overlap between these two classes can be considered to occur roughly within the interval (a, b). A more detailed explanation can be found in the text.

P(X | v1) is equal to P(X | vj) at Xe; the decision boundary is therefore located at XeXe when MLCEP is performed. It is assumed that xz = 2xy; this implies that the probability that vector X occurs in v1 is twice as large as the probability in vj. As a result, pixel vector X is assigned to v1. However, this decision cannot lead to an optimal result. For example, suppose that the expected area proportions of v1 and vj are 10 percent and 30 percent, respectively. In this case, the number of pixels in vj is expected to be 1.5 times the number in v1 at vector X; vj is therefore favored for vector X over v1. Suppose that the expected area proportions above of v1 and vj are assigned as their prior probabilities. In addition, simply for a clear presentation in Figure 1, we expand the discriminant functions of the MLCPP for all classes by a factor of ten, which obviously has no effect on the discrimination process and classification results. Now the discriminant functions for v1 and vj are P(X | v1) and 3P(X | vj), respectively. Consequently, as shown in Figure 1, the decision boundary will shift away from the mean of class vj from XeXe to XaXa. This implies that to the right of line XaXa, class vj is expected to possess more pixels than class v1 at any pixel vector; and to the left of line XaXa, class v1 is expected to possess more pixels than class vj at any vector. Actually, the right-hand term of Equation 2 will simply be the expected number of pixels in one class at the examined vector if multiplied by the total number of pixels in the image. Pixel vector X is then assigned to vj; the MLCPP procedure is therefore expected to produce more correct assignments than the MLCEP (Zheng et al., 2005).

In Figure 1, the spectral overlap between classes v1 and vj can be considered to occur roughly within the interval (a, b). Hence, it is reasonable to assume that the decision boundary will not go beyond the interval (a, b) unless an extreme prior probability is assigned. Accordingly, in Figure 2, though any modification in prior probabilities will always be accompanied by a change of the border between Classes A and B on the classified image derived from MLCPP, the border will generally not be outside the zone aa′b′b. For the same reason, the assigned class labels for points a1 and b1 in Figure 2 remain unchanged regardless of the assigned values of the prior probabilities, unless they have an overlapping spectral signal as in the zone aa′b′b. Therefore, it can be concluded that prior probabilities in MLC are designed only for overlapping spectral signatures and have no effect where the spectral signatures are widely separated (Pedroni, 2003). As a result, the effect of prior probabilities on one individual class is independent of the classes that are spectrally separable from it, and is determined by both its assigned prior probability (anticipated size) and the prior probabilities assigned to the other classes having overlapping signatures with it.

Figure 2. A virtual image illustrating the effects of prior probabilities. The area to the left of line aa′ is attributed to Class A and the area to the right of line bb′ is attributed to Class B. The uncertainty occurs in the zone aa′b′b. Points a1 and b1 are isolated pixels.

The Effect of the Shift of the Decision Boundary on the Classification Result
Producer’s accuracy and user’s accuracy are often used to measure the classification accuracy for an individual class. Producer’s accuracy indicates the probability of a reference pixel being correctly classified and is a measure of omission error. User’s accuracy is indicative of the probability that a pixel classified on the map actually represents that class on the ground and is a measure of commission error (Congalton, 1991). Clearly, the effect of prior probabilities is achieved by shifting the decision boundary in the feature space, as will be discussed below in terms of two sub-cases, i.e., shifting away from the class mean and shifting towards the class mean. Class vj in Figure 1 will be used to illustrate the effect in the following discussion. However, in this section, it is assumed that more than one class is spectrally confused with vj, and only one of them, namely v1, is plotted in Figure 1. For most individual classes, there are expected to be two decision boundaries in the one-dimensional case. To decrease the complexity of the subsequent reasoning, the right boundary of class vj in Figure 1 is assumed to be fixed and located at positive infinity along the X axis. Accordingly, the decision boundary of class vj in the discussion below simply refers to its left decision boundary. Actually, the conclusions remain the same wherever the right boundary of class vj is assumed to be located. It is also assumed that when MLCEP is performed, the decision boundary of class vj is located at XeXe as shown in Figure 1. The resultant user’s accuracy (UA(j)) and producer’s accuracy (PA(j)) for vj are:

UA(j) = Xjj / Σ_{i=1}^{n} Xji = N aj Pj / Σ_{i=1}^{n} N ai Pi = Pj / Σ_{i=1}^{n} (ai/aj) Pi    (3a)

PA(j) = Xjj / (N aj) = Pj    (3b)

where N is the total number of pixels in the image to be classified and n is the total number of classes considered; ai is the relative area proportion of class vi; Xji represents the number of pixels classified as vj that truly belong to vi; and Pi represents the area under the probability density curve of class vi in the interval (Xe, +∞), i.e., Pi = ∫_{Xe}^{+∞} P(X | vi) dx. Also, we define Pia to represent the area under the probability density curve of class vi in the interval (Xa, Xe), i.e., Pia = ∫_{Xa}^{Xe} P(X | vi) dx. When the decision boundary for vj shifts from line XeXe to XaXa (away from its mean), the user’s accuracy and producer’s accuracy for class vj become:

UAa(j) = (Pj + Pja) / Σ_{i=1}^{n} (ai/aj)(Pi + Pia)    (4a)

PAa(j) = Pj + Pja.    (4b)
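The shift of the decision boundary and its effect on Pj, and hence on PA(j) through Equations 3b and 4b, can be checked numerically for two equal-variance Gaussian classes. The signatures and priors below are invented for illustration and are not taken from the paper.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical one-band signatures (equal standard deviations), as in Figure 1:
# class v1 on the left, class vj on the right.
m1, mj, s = 40.0, 60.0, 8.0

def boundary(prior1, priorj):
    """Decision boundary where prior1*P(X|v1) = priorj*P(X|vj);
    closed form for equal-variance Gaussians."""
    return 0.5 * (m1 + mj) + (s * s / (mj - m1)) * math.log(prior1 / priorj)

def producer_accuracy_j(x_boundary):
    """Equation 3b: PA(j) = Pj, the area of P(X|vj) to the right
    of the (left) decision boundary."""
    return 1.0 - norm_cdf((x_boundary - mj) / s)

x_e = boundary(0.5, 0.5)   # equal priors: boundary XeXe (here exactly 50.0)
x_a = boundary(0.1, 0.3)   # priors favouring vj: boundary shifts left to XaXa
print(x_e, x_a)
# PA(j) rises (from about 0.894 to about 0.954 with these numbers)
# as the boundary moves away from the mean of vj: PAa(j) = Pj + Pja.
print(producer_accuracy_j(x_e), producer_accuracy_j(x_a))
```

The gain producer_accuracy_j(x_a) − producer_accuracy_j(x_e) is exactly the term Pja of Equation 4b.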

With the shift of the decision boundary away from the class mean, class vj will receive increasing assignments, including increasing correct assignments and increasing incorrect assignments. This will lead to a larger area and a higher producer’s accuracy in the classification result. Equations 3b and 4b reveal that the producer’s accuracy of class vj is improved by Pja when the decision boundary shifts from line XeXe to XaXa. Unlike the change of the producer’s accuracy, however, since both numerator and denominator in Equation 3a increase, the change of the user’s accuracy is not constant: only when Pj Σ_{i=1}^{n} ai Pia > Pja Σ_{i=1}^{n} ai Pi is UA(j) > UAa(j); otherwise UA(j) < UAa(j). Clearly, class vj generally presents a higher occurrence probability in the interval (Xe, +∞) than in the interval (Xa, Xe), i.e., N aj Pj / Σ_{i=1}^{n} N ai Pi > N aj Pja / Σ_{i=1}^{n} N ai Pia, where N ai Pi represents the number of pixels contained in class vi in the interval (Xe, +∞) and N ai Pia represents the number of pixels contained in class vi in the interval (Xa, Xe). Consequently, though the change of the user’s accuracy is data-dependent, the user’s accuracy is expected to decrease in most cases when the decision boundary shifts away from the class mean.

Especially when only one class, e.g., class v1 in Figure 1, spectrally overlaps class vj, its user’s accuracy will surely decrease in response to the shift of the decision boundary from line XeXe to XaXa. That is simply because Pj/P1 is larger than 1 and Pja/P1a is smaller than 1 in this case. When the decision boundary shifts towards the mean of class vj from XeXe to XtXt (Figure 1), the user’s accuracy and producer’s accuracy are:

UAt(j) = (Pj − Pjt) / Σ_{i=1}^{n} (ai/aj)(Pi − Pit)    (5a)

PAt(j) = Pj − Pjt    (5b)

where Pit = ∫_{Xe}^{Xt} P(X | vi) dx. Obviously, with the shift of the decision boundary towards the class mean, class vj will receive decreasing assignments, including decreasing correct assignments and decreasing incorrect assignments. This will lead to a smaller area and a lower producer’s accuracy in the classification result. Equations 3b and 5b reveal that the producer’s accuracy of class vj is decreased by Pjt in this case. The change in the user’s accuracy is also data-dependent, as both numerator and denominator in Equation 3a become smaller simultaneously: only when Pj Σ_{i=1}^{n} ai Pit > Pjt Σ_{i=1}^{n} ai Pi is UA(j) < UAt(j); otherwise UA(j) > UAt(j). In the interval (Xe, +∞), P(X | vj) is the greatest among all classes. In the neighborhood of point Xe, however, at least one class, for example class v1 in Figure 1, has more or less the same probability density as vj; in the interval (Xe, +∞), P(X | vj) is significantly larger than the probability density of all other classes only in the region somewhat far away from point Xe. This implies that in the interval (Xe, +∞), the probability that class vj occurs in the neighborhood of point Xe is not so large as that in the region somewhat far away from point Xe. Hence, class vj generally possesses a higher probability of occurring in the interval (Xe, +∞) than in the interval (Xe, Xt), i.e., N aj Pj / Σ_{i=1}^{n} N ai Pi > N aj Pjt / Σ_{i=1}^{n} N ai Pit. Consequently, in most cases, a higher user’s accuracy is expected when the decision boundary shifts towards the class mean.

The Effect of Prior Probabilities on the Classification Result
As previously discussed, the effect of prior probabilities on an individual class depends on both its anticipated size and the anticipated sizes of the other classes spectrally mixed with it. Under the condition that prior probabilities are acceptably estimated, or at least adequately indicate the rank order of the magnitudes of class sizes, if one class is mostly spectrally mixed with classes having a smaller size (class MMWSS), the decision boundary will shift away from its mean when MLCPP is performed; on the contrary, the decision boundary will shift towards the class mean if one class is mostly spectrally mixed with classes having a larger size (class MMWLS). Consequently, for a class MMWSS, MLCPP will assign more pixels to it, accompanied by a higher producer’s accuracy and in most cases a lower user’s accuracy compared to MLCEP; and for a class MMWLS, MLCPP will assign fewer pixels to it, accompanied by a lower producer’s accuracy and in most cases a higher user’s accuracy. In practice, the true values of prior probabilities are often unknown. Based on the analysis above, it is known

that the overestimation of a prior probability for a class will boost its producer’s accuracy and in most cases reduce its user’s accuracy, and that underestimation will reduce the producer’s accuracy and in most cases boost the user’s accuracy, provided that the prior probabilities of the other classes, or at least of the classes spectrally mixed with it, remain constant. Only when the true values of the prior probabilities are used can a global optimization be achieved. This can also be illustrated using the two-class case in Figure 1. Suppose that the decision boundary between classes v1 and vj is at line XaXa when the true values of the prior probabilities are used. When the decision boundary shifts left, away from the mean of class vj, though the assignment to class vj increases and the assignment to class v1 decreases all the time, the cases are different on the two sides of line XaXa. To the right of line XaXa, for every shift in the decision boundary, most of the increased assignments to vj are correct assignments and most of the decreased assignments to v1 are incorrect assignments; as a result, the correct assignments increase all the time in the global sense. However, once the decision boundary goes to the left side of line XaXa, the increased incorrect assignments outnumber the increased correct assignments for class vj, and the decreased correct assignments also outnumber the decreased incorrect assignments for class v1; the correct assignments will therefore decrease in the global sense. Consequently, the maximum number of correct assignments, and thus the global optimization, is expected when the decision boundary is located exactly at line XaXa.
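The claim that correct assignments peak when the boundary sits exactly at XaXa can be verified with a small sweep. This is again a two-class Gaussian sketch with invented parameters: the fraction of correctly assigned pixels with boundary x is a1·Φ((x − m1)/s) + aj·(1 − Φ((x − mj)/s)), whose derivative vanishes where a1·P(x | v1) = aj·P(x | vj), i.e., at the true-prior boundary.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Invented two-class scene: v1 to the left, vj to the right,
# with true area proportions a1 and aj.
m1, mj, s = 40.0, 60.0, 8.0
a1, aj = 0.25, 0.75

def correct_fraction(x):
    """Fraction of the scene correctly labelled when the decision
    boundary is at x (v1 assigned left of x, vj right of x)."""
    return a1 * norm_cdf((x - m1) / s) + aj * (1.0 - norm_cdf((x - mj) / s))

# Sweep candidate boundaries over [40, 60] and keep the best one.
best_x = max((40.0 + 0.01 * k for k in range(2001)), key=correct_fraction)

# Closed-form boundary for the true priors (where a1*P(x|v1) = aj*P(x|vj)).
x_true = 0.5 * (m1 + mj) + (s * s / (mj - m1)) * math.log(a1 / aj)

# The empirical optimum coincides with the true-prior boundary XaXa.
print(best_x, x_true)
```

Moving the boundary to either side of x_true lowers correct_fraction, mirroring the argument made above for the two sides of line XaXa.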

Experiment
The “lanier.img” file provided along with ERDAS Imagine® software was selected to test the validity of the established rules. A reference classification map, the “inlandc.img” file, is also provided with the software. Lanier.img is a Landsat TM image of Lake Lanier, Georgia, obtained by the Landsat-5 sensor. It is 512 columns by 512 rows in size, and ten classes are present on it: deciduous, pine, deciduous/pine mixed, water, agriculture, bare ground, grass, urban/developed, shade, and cloud. The three vegetation classes are mixed with each other and are difficult to distinguish on the image. To ensure the correctness of the reference, they were merged into one single class named “vegetation.” Hence, the reference data involve eight classes, as do the two classification processes, MLCEP and MLCPP. For MLCPP, prior probabilities were derived from the class area proportions in the recoded inlandc.img file. The two classifications were undertaken using the same training set, which was selected directly from the recoded inlandc.img file. The classified images derived from these two classification procedures are presented in Figure 3. No cosmetic filtering was applied to them. In view of the fact that the sampled reference data may not give a completely faithful reflection of the classification, the accuracy assessment was carried out with the whole image. By referring to the whole recoded inlandc.img file, two confusion matrices were generated (Tables 1 and 2). The changes in the number of assigned pixels and in the producer’s and user’s accuracies for each class are displayed in Table 3, along with the class area proportions derived from the recoded inlandc.img file. Of all eight classes, “vegetation” has the maximum area and thus the highest assigned prior probability, implying that it can only be spectrally confused with smaller-sized classes and thus acts as the class MMWSS.
As shown in Table 3, the number of pixels labeled as “vegetation” increases from 154,102 to 161,284, accompanied by an increase of about 3 percent in its producer’s accuracy and a decrease of about 1 percent in

Figure 3. The classified images derived from (a) the MLCEP and (b) the MLCPP of the lanier.img file. A color version of this figure is available at the ASPRS website: www.asprs.org.



TABLE 1. Confusion Matrix of the MLCEP of the lanier.img File (rows: classified data; columns: reference data)

Class              vegetation    water   agriculture   bare ground   grass   urban/developed   shade   cloud
vegetation             152113      184          1779             0       0                 0      26       0
water                     110    20864             0             0       0                 0     659       0
agriculture             14557       34         23712            71    1707               998       0       0
bare ground              6526     1418           457          6229    2147              1184     104       1
grass                     435        3           699           110    7829               147       0       0
urban/developed          1085        2           311           976     767              9217      11       0
shade                     305      409             0             0       0                 2    2397       0
cloud                    1856      156            28            14      68                88       0     349

TABLE 2. Confusion Matrix of the MLCPP of the lanier.img File (rows: classified data; columns: reference data)

Class              vegetation    water   agriculture   bare ground   grass   urban/developed   shade   cloud
vegetation             157504      338          3254             0       0                 0     188       0
water                     112    20950             0             0       0                 0     798       0
agriculture             11066       48         22595            84    2101              1147       0       0
bare ground              5609     1381           348          5965    1788               917     106       1
grass                     520        3           521           136    7751               132       0       0
urban/developed          1305        4           265          1208     860              9380      12       0
shade                      99      276             0             0       0                 1    2093       0
cloud                     772       70             3             7      18                59       0     349

TABLE 3. Variation of the Areas and Producer’s and User’s Accuracies between the Two Classification Results for the lanier.img File

                    Area proportion        Number of assigned pixels       Producer’s accuracy       User’s accuracy
Class               value   rank order     MLCEP     MLCPP                 MLCEP    MLCPP            MLCEP    MLCPP
vegetation          67.5%   1              154102    161284  ↑            85.9%    89.0%  ↑         98.7%    97.7%  ↓
water                8.8%   3               21633     21860  ↑            90.4%    90.8%  ↑         96.4%    95.8%  ↓
urban/developed      4.4%   5               12369     13034  ↑            79.2%    80.6%  ↑         74.5%    72.0%  ↓
agriculture         10.3%   2               41079     37041  ↓            87.9%    83.7%  ↓         57.7%    61.0%  ↑
grass                4.8%   4                9223      9063  ↓            62.5%    61.9%  ↓         84.9%    85.5%  ↑
bare ground          2.8%   6               18066     16115  ↓            84.2%    80.6%  ↓         34.5%    37.0%  ↑
shade                1.2%   7                3113      2469  ↓            75.0%    65.5%  ↓         77.0%    84.8%  ↑
cloud                0.1%   8                2559      1278  ↓            99.7%    99.7%  –         13.6%    27.3%  ↑

Area proportions are derived from the recoded inlandc.img file and used as prior probabilities in MLCPP. ↑ means increase, ↓ means decrease, – means no variation.
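The producer’s and user’s accuracies reported in Table 3 follow directly from the confusion matrices. As a sketch (not code from the paper), the snippet below recomputes them for “vegetation” and “cloud” from the MLCEP matrix of Table 1, where rows are classified classes and columns are reference classes.

```python
# MLCEP confusion matrix from Table 1 (rows: classified; columns: reference).
CLASSES = ["vegetation", "water", "agriculture", "bare ground",
           "grass", "urban/developed", "shade", "cloud"]
MLCEP = [
    [152113,   184,  1779,    0,    0,    0,   26,   0],
    [   110, 20864,     0,    0,    0,    0,  659,   0],
    [ 14557,    34, 23712,   71, 1707,  998,    0,   0],
    [  6526,  1418,   457, 6229, 2147, 1184,  104,   1],
    [   435,     3,   699,  110, 7829,  147,    0,   0],
    [  1085,     2,   311,  976,  767, 9217,   11,   0],
    [   305,   409,     0,    0,    0,    2, 2397,   0],
    [  1856,   156,    28,   14,   68,   88,    0, 349],
]

def user_accuracy(matrix, j):
    """Correct pixels over all pixels assigned to class j (row sum)."""
    return matrix[j][j] / sum(matrix[j])

def producer_accuracy(matrix, j):
    """Correct pixels over all reference pixels of class j (column sum)."""
    return matrix[j][j] / sum(row[j] for row in matrix)

for name in ("vegetation", "cloud"):
    j = CLASSES.index(name)
    print(name, round(producer_accuracy(MLCEP, j), 3),
          round(user_accuracy(MLCEP, j), 3))
# vegetation: PA 0.859, UA 0.987; cloud: PA 0.997, UA 0.136 (cf. Table 3)
```

The row sums of the matrix likewise reproduce the MLCEP “number of assigned pixels” column of Table 3 (e.g., 154,102 for vegetation and 2,559 for cloud).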

its user’s accuracy. Class “cloud” has the minimum area of all classes, implying that it is expected to act as a class MMWLS. As expected, prior probabilities assign fewer pixels to it. However, its producer’s accuracy, remaining unchanged, does not show the expected decrease, although its user’s accuracy does show the expected increase. This may be attributed to its narrow spectral distribution. When equal prior probabilities are assumed, almost all of the pixels belonging to “cloud” (349 out of 350) have been correctly classified (Table 1), suggesting the decision boundary may be quite far from its mean. After the MLCPP is performed, the incorrect assignment to “cloud” from each of the other classes decreases unless it is equal to zero (see the last rows in Tables 1 and 2; this results in the increase in its user’s accuracy), which suggests the decision boundary has shifted toward the mean of class “cloud.” However, there may be no pixels belonging to “cloud” between the two decision boundaries of the MLCPP and MLCEP in the feature space. Consequently, no pixels belonging to “cloud” were

relabeled as other classes and the producer’s accuracy of class “cloud” thus holds constant. When there are a number of classes in the image, any class is likely to be spectrally overlapped with both largersized and smaller-sized classes. However, the variation in the number of assigned pixels indicates the general effect of spectral confusion: the class which receives greater assignments after prior probabilities are used is expected to belong to the class MMWSS and the class that receives smaller assignments is expected to belong to the class MMWLS. According to this, the class of “vegetation,” “water,” and “urban/developed” are grouped as the class MMWSS, and the other classes are grouped as the class MMWLS. Except for “cloud,” as expected, both the number of assigned pixels and the producer’s accuracy increase for class MMWSS and decrease for class MMWLS, and the user’s accuracy decreases for class MMWSS and increases for class MMWLS. Classes of “shade” and “vegetation” are selected to investigate the effect of the overestimated or underestimated PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING

class mostly spectrally overlapped with class “vegetation.” As shown in Tables 1 and 2, class “vegetation” is mostly spectrally confused with class “agriculture” and the relative area proportion of class “agriculture” is just 10.3 percent (Table 3). This suggests that when the prior probability of class “vegetation” was set to 0.1, the decision boundary between classes “vegetation” and “agriculture” was located at approximately the same place in the feature space as using equal priors. For the same reason, class “shade” presents approximately the same producer’s and user’s accuracies as the ones resulting from MLCEP at the value of 0.1. And this value is indicative of the size of class “water” (8.8 percent), which has substantial spectral confusion with class “shade”.

Other Evidence

Figure 4. The effect of the overestimated or underestimated prior probabilities on individual classes: (a) Class “shade,” and (b) Class “vegetation”. The dotted lines indicate the true value of prior probabilities derived from the recoded inlandc.img file.

prior probabilities on individual classes. Class “shade” represents the class MMWLS and the class “vegetation” represents the class MMWSS. The producer’s and user’s accuracies for them in Figure 4 result from the modification of the assigned values of their prior probabilities while prior probabilities of the other classes remained the same as used in the MLCPP above. As shown in Figure 4, both for class “shade” and for class “vegetation,” the producer’s accuracy increases and the user’s decreases with increasing prior probability. Consequently, compared to the MLCPP above, an overestimated prior probability boosted the producer’s accuracy and reduced the user’s accuracy, and an underestimated prior probability reduced the producer’s accuracy and boosted the user’s accuracy. Note that in Figure 4b, class “vegetation” presents more or less the same producer’s and user’s accuracies as the ones resulting from MLCEP at the value of 0.1. This value is indicative of the size of the PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING

The data in Table 4 are derived from Tables 2 and 3 in Maselli et al. (1990), and the data in Table 5 are from Tables 2 and 4 in Maselli et al. (1992). MLCPP in Maselli et al. (1990) was carried out using the previously classified images resulting from MLCEP as references and the error probabilities derived from the confusion matrix found on the training pixels as priors. In Maselli et al. (1992), prior probabilities were derived from a nonparametric process proposed by Skidmore and Turner (1988). Except for Class 4 in Table 5, the number of the assigned pixels and the producer’s accuracy show the same change tendency for each of the other classes, and as expected, the user’s accuracy shows an inverse change. One possible reason for the only exception is that during the MLC in Maselli et al. (1992) the pixels would not be assigned to any class if their gray value was beyond the range of mean ⫾3 standard deviations of the relevant spectral signature in all features considered. A substantial spectral confusion is expected to occur at these pixels, so that these pixels, though only accounting for a small number, have the potential to confound the interpretation of the effect of the prior probabilities. For the large-sized class in the image, the smaller-sized class is expected to outnumber the larger-sized class. Statistically, the large-sized class is therefore subject to spectral confusion with the smaller-sized class and thus acts as the class MMWSS. For example, the largest-sized class will surely be confused with the smaller-sized class unless it is spectrally distinct. Likewise, the small-sized class is subject to confusion with the larger-size class and thus acts as the class MMWLS. This can be seen clearly in Tables 3, 4, and 5. However, when the large-sized class spectrally overlaps a larger-sized class or the small-sized class overlaps a smallersized class, an inverse tendency will be observed. 
For example, class “agriculture” is significantly spectrally mixed with the largest-sized class, “vegetation,” in the lanier.img case. Hence, though class “agriculture” is the second largest of all eight classes, it acts as the class MMWLS rather than the class MMWSS.
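The producer’s and user’s accuracies compared throughout follow directly from a confusion matrix. A minimal sketch, using a hypothetical three-class matrix (the numbers are illustrative, not from the cited studies):

```python
import numpy as np

# Hypothetical confusion matrix: rows = reference class, columns =
# classified class (illustrative values only).
cm = np.array([
    [80, 15,  5],
    [10, 70, 20],
    [ 5, 10, 85],
])

# Producer's accuracy: correct pixels divided by the reference total of
# the class (off-diagonal entries of the row are omission errors).
producers = np.diag(cm) / cm.sum(axis=1)

# User's accuracy: correct pixels divided by the total assigned to the
# class (off-diagonal entries of the column are commission errors).
users = np.diag(cm) / cm.sum(axis=0)

print(producers)  # [0.8, 0.7, 0.85]
print(users)      # [80/95, 70/95, 85/110]
```

Shifting pixels into a class inflates its column total, which is why a higher prior tends to raise the producer’s accuracy while depressing the user’s accuracy.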

Conclusions

When prior probabilities are inserted into the maximum likelihood process, the absolute number of pixels at the examined vector, rather than the relative probability that the vector occurs in each class, is used as the discriminant criterion. In this way, an improved classification, or a greater correct assignment, is expected as compared to the use of equal priors if the prior probabilities are properly estimated. Prior probabilities are designed only for overlapping spectral signatures, and they have no effect where the spectral signatures are highly separable in the feature space. Accordingly, the effect of the prior probability on

TABLE 4. VARIATION OF AREAS AND PRODUCER’S AND USER’S ACCURACIES BETWEEN MLCEP AND MLCPP FOR TM DATA OF THE ISLAND OF SARDINIA

        Number of assigned pixels    Producer's accuracy        User's accuracy
Class   MLCEP   MLCPP   Change       MLCEP   MLCPP   Change     MLCEP   MLCPP   Change
1        3223    4254     ↑          63.6%   76.3%     ↑        83.7%   76.0%     ↓
2        2351    2393     ↑          88.8%   89.6%     ↑        92.0%   91.3%     ↓
3        1660    1651     ↓          82.9%   82.9%     ↓        71.0%   71.4%     ↑
4        2300    1902     ↓          61.7%   57.7%     ↓        56.7%   64.1%     ↑
5        2277    1608     ↓          80.6%   68.5%     ↓        56.7%   68.2%     ↑

Derived from Tables 2 and 3 in Maselli et al. (1990).

TABLE 5. VARIATION OF AREAS AND PRODUCER’S AND USER’S ACCURACIES BETWEEN MLCEP AND MLCPP FOR TM DATA OF THE SOUTH OF FLORENCE, ITALY

        Number of assigned pixels    Producer's accuracy        User's accuracy
Class   MLCEP   MLCPP   Change       MLCEP   MLCPP   Change     MLCEP   MLCPP   Change
1       26932   10720     ↓          82.3%   74.2%     ↓        27.8%   62.9%     ↑
2       24863   44309     ↑          48.5%   81.7%     ↑        92.5%   86.9%     ↓
3        4206    3623     ↓          80.6%   73.1%     ↓        60.8%   63.2%     ↑
4       13112   15794     ↑          68.1%   67.2%     ↓        68.9%   64.3%     ↓
5        6226    2356     ↓          43.7%   20.2%     ↓        16.7%   19.9%     ↑

Derived from Tables 2 and 4 in Maselli et al. (1992).

an individual class is independent of the classes that are spectrally separable from it, and is determined both by its anticipated size and by the anticipated sizes of the other classes significantly spectrally mixed with it. In most cases, prior probabilities will not improve both the producer’s and the user’s accuracies of an individual class at the same time. An increase in the assigned value of a class’s prior probability will shift the decision boundary away from its mean and therefore boost the producer’s accuracy; though the change of the user’s accuracy is data-dependent, it is expected to decrease in most cases. Conversely, a decrease in the assigned value of the class prior probability will shift the decision boundary towards the class mean and therefore reduce the producer’s accuracy and, in most cases, boost the user’s accuracy. Consequently, a large-sized class spectrally mixing with small-sized classes generally tends to receive more assigned pixels and presents a higher producer’s accuracy and a lower user’s accuracy when prior probabilities are used in the classification. For a small-sized class spectrally mixing with large-sized classes, the tendency is just the opposite.

In classification practice, if modifying the assigned value of the prior probability does not produce the expected change for an individual class in the classified result, the validity of the reference data, and thus the reliability of the derived user’s and producer’s accuracies involving this class or the classes spectrally mixed with it, can be considered suspect. In this way, the established rule can find its use in operational applications.
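The boundary-shift mechanism summarized above can be illustrated with a minimal one-dimensional, two-class Gaussian sketch (all numbers are illustrative assumptions, not taken from the case study):

```python
import numpy as np

def discriminant(x, mean, std, prior):
    """Log of prior times Gaussian likelihood, constants dropped."""
    return np.log(prior) - np.log(std) - 0.5 * ((x - mean) / std) ** 2

def boundary(mean_a, mean_b, std, prior_a, prior_b):
    """Decision boundary between two equal-variance 1-D classes: the x
    at which the two discriminants are equal (closed form)."""
    mid = 0.5 * (mean_a + mean_b)
    shift = std ** 2 * np.log(prior_a / prior_b) / (mean_b - mean_a)
    return mid + shift

# Class A centered at 40, class B at 60, common std of 8.
b_equal = boundary(40.0, 60.0, 8.0, 0.5, 0.5)   # equal priors -> midpoint
b_high_a = boundary(40.0, 60.0, 8.0, 0.8, 0.2)  # raise A's prior

print(b_equal)   # 50.0
print(b_high_a)  # > 50.0: the boundary moves away from A's mean
```

Raising class A’s prior from 0.5 to 0.8 pushes the boundary toward class B, so more pixels, correct and incorrect alike, fall on A’s side, which is exactly the producer’s/user’s accuracy trade-off described above.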

Acknowledgments

The research is supported by the National Natural Science Foundation of China (40871138), the Innovation Project of the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences (O7V70030SZ), and the National Basic Research Program of China (No. 2007CB407207). The authors are grateful to the two PE&RS referees and the five IJRS referees; their comments were invaluable in improving the manuscript, and some of them rewrote parts of the manuscript and greatly improved the English. Dr. Xiaojun Zhang’s efforts with the written English and Mr. Guanfu Han’s work on Figure 2 are also appreciated.

References

Congalton, R.G., 1991. A review of assessing the accuracy of classifications of remotely sensed data, Remote Sensing of Environment, 37:35–46.

Ediriwickrema, J., and S. Khorram, 1997. Hierarchical maximum-likelihood classification for improved accuracy, IEEE Transactions on Geoscience and Remote Sensing, 35:810–816.

Gorte, B., and A. Stein, 1998. Bayesian classification and class area estimation of satellite images using stratification, IEEE Transactions on Geoscience and Remote Sensing, 36(3):803–812.

Huang, M.H., and M.K. Ridd, 2002. A subpixel classifier for urban land-cover mapping based on a maximum-likelihood approach and expert system rules, Photogrammetric Engineering & Remote Sensing, 68(11):1173–1180.

Maselli, F., C. Conese, and G. Zipoli, 1990. Use of error probabilities to improve area estimates based on maximum likelihood classification, Remote Sensing of Environment, 31:155–160.

Maselli, F., C. Conese, L. Petkov, and R. Resti, 1992. Inclusion of prior probabilities derived from a nonparametric process into the maximum likelihood classifier, Photogrammetric Engineering & Remote Sensing, 58:201–207.

McIver, D.K., and M.A. Friedl, 2002. Using prior probabilities in decision-tree classification of remotely sensed data, Remote Sensing of Environment, 81:253–261.

Pedroni, L., 2003. Improved classification of Landsat Thematic Mapper data using modified prior probabilities in large and complex landscapes, International Journal of Remote Sensing, 24:91–113.

Skidmore, A.K., and B.J. Turner, 1988. Forest mapping accuracies are improved using a supervised nonparametric classifier with SPOT data, Photogrammetric Engineering & Remote Sensing, 54(12):1415–1421.

Strahler, A.H., 1980. The use of prior probabilities in maximum likelihood classification of remotely sensed data, Remote Sensing of Environment, 10:135–163.

Zheng, M.G., Q.G. Cai, and Z.J. Wang, 2005. Effect of prior probabilities on maximum likelihood classifier, Proceedings of the 25th IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 25–29 July, Seoul, Korea, pp. 3753–3756.