A NEW COMPLEXITY PRIOR FOR MULTIRESOLUTION IMAGE DENOISING

Juan Liu and Pierre Moulin

University of Illinois at Urbana-Champaign
Beckman Institute and ECE Department
405 N. Mathews Ave., Urbana, IL 61801
Email: j-liu, [email protected]

ABSTRACT

Application of the Minimum Description Length (MDL) principle to multiresolution image denoising has been somewhat unsuccessful to date. This disappointing performance is due to the crudeness of the underlying prior image models, which lead to overly sparse solutions. We propose a new family of complexity priors based on Rissanen's universal prior for integers, which produces estimates with better sparsity properties. This method vastly outperforms previous MDL schemes and is competitive with Bayesian estimators using Generalized Gaussian priors on wavelet coefficients.

1. INTRODUCTION

Rissanen's Minimum Description Length (MDL) principle [1, 2] provides a powerful theoretical framework for signal and image estimation. The method may be viewed as a penalized-likelihood approach, where the penalty represents the complexity of the unknown signal and is measured by the length of a binary string used to encode the signal. This notion of complexity is applicable to both random and nonrandom signals. The method possesses "universality" properties, in the sense that it delivers satisfactory performance over a broad class of signals, for suitably constructed codebooks [3, 4]. The application of this method to image estimation is particularly attractive, because there exist image representations, such as wavelet representations, which provide reasonable measures of image complexity. There is a close relationship between MDL methods and the universal thresholding estimation techniques championed by Donoho and Johnstone [5], in that both methods include explicit mechanisms for quantifying the sparsity of wavelet representations. In fact, MDL estimation using the simple coding technique used by Saito [6] is equivalent to a hard thresholding method with a threshold slightly larger than Donoho and Johnstone's universal threshold [4]. Additionally, MDL estimation may be viewed as Bayesian estimation using a (possibly improper) prior.

Despite the attractive properties of MDL estimators in a variety of theoretical frameworks, results obtained on actual images have been somewhat disappointing. In this paper, we identify two main reasons for this problem and present a powerful, universal alternative to Saito's coding scheme. This study is applied to the classical problem of estimating images corrupted by additive white Gaussian noise with zero mean and known variance σ².

(This work was supported by the National Science Foundation under CAREER award MIP-9732995.)

2. COMPLEXITY PRIORS

The MDL estimator implicitly constructs complexity priors by associating a coding complexity (measured in number of bits) to each candidate signal in a discrete set A. The cost function for MDL estimation is the sum of the negative log-likelihood and the complexity penalty. Sparse wavelet models can be represented by encoding both the amplitude and the location of each nonzero coefficient. The scheme introduced by Saito [6] implies a penalty of (3/2) K log2 N bits on the K nonzero coefficients. Based on this penalty, the estimate is obtained simply by hard thresholding every noisy coefficient, with threshold λ = σ√(3 ln N) [8] (22.5% larger than Donoho and Johnstone's universal hard threshold σ√(2 ln N)). The penalty of (3/2) log2 N for each nonzero coefficient accounts for (1/2) log2 N bits for coding the amplitude (the standard MDL penalty) and log2 N bits for coding the location of the coefficient out of N possible locations. In addition, a log2 N bit overhead is used for encoding the value of K. Two underlying assumptions are made in this model:

A1. Each coefficient is bounded in magnitude by A√N, where A is some constant, and the distribution of the quantized coefficients is uniform.

A2. The locations of the nonzero coefficients are equally probable among the N possible locations.

For real-world images, these two assumptions are often too crude, and result in overly sparse wavelet models and in poor overall estimation performance. In this paper, we refer to this estimator as "U-LA", since it uniformly encodes both the location and the amplitude of each nonzero coefficient.
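As a quick numerical check of the two thresholds just mentioned, the short sketch below (ours, not part of the paper) evaluates them for the setting used later in the experiments, N = 512 × 512 coefficients and σ = 5; the ratio is √(3/2) ≈ 1.2247, i.e., about 22.5%.

    import math

    N = 512 * 512   # number of wavelet coefficients (512 x 512 image)
    sigma = 5.0     # noise standard deviation

    # Threshold implied by Saito's (3/2) log2 N per-coefficient penalty (U-LA)
    lam_saito = sigma * math.sqrt(3.0 * math.log(N))

    # Donoho-Johnstone universal hard threshold
    lam_universal = sigma * math.sqrt(2.0 * math.log(N))

    print(f"U-LA threshold      : {lam_saito:.2f}")
    print(f"Universal threshold : {lam_universal:.2f}")
    print(f"ratio               : {lam_saito / lam_universal:.4f}")  # sqrt(3/2) ~ 1.2247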

In view of the weakness of these assumptions, we propose an alternative coding scheme for quantized wavelet coefficients. The coding scheme we consider is based upon the so-called "universal prior for integers" proposed by Rissanen. This prior is an ideal one in a certain asymptotic entropy sense [2, Sec. 2.2.4]. The length of the codeword for a positive integer j is

    log*(j) = log2 j + log2 log2 j + ... + log2 c0,

where the summation stops at the first negative term, and c0 ≈ 2.865 is computed to satisfy Kraft's inequality with equality, i.e., Σ_j 2^(−log*(j)) = 1. The quantity P*(j) := 2^(−log*(j)) may be interpreted as a probability distribution over the strictly positive integers.
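The codelength log*(j) is easy to evaluate by iterating the logarithm. The following sketch (ours, not from the paper) computes it and numerically checks that, with c0 ≈ 2.865, the implied probabilities 2^(−log*(j)) sum to approximately 1 over the positive integers.

    import math

    C0 = 2.865064  # Rissanen's normalizing constant (approximately)

    def log_star(j: float) -> float:
        """Universal codelength (bits) for a positive integer j, including log2(c0)."""
        length, term = math.log2(C0), math.log2(j)
        while term > 0:
            length += term
            term = math.log2(term)
        return length

    # Truncated Kraft sum: slowly approaches 1 as the range grows,
    # since 2**(-log_star(j)) decays only slightly faster than 1/j.
    kraft = sum(2.0 ** -log_star(j) for j in range(1, 200000))
    print(f"log*(1) = {log_star(1):.3f} bits, log*(10) = {log_star(10):.3f} bits")
    print(f"truncated Kraft sum = {kraft:.4f}")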

This technique can be extended to encode all integers, including zero. One needs to use a sign bit as well as a special codeword for zero. (Rissanen [2] ignores the latter.) Define the codeword lengths as

    L_q(j) = −log2(q),                          j = 0,
             −log2(1−q) + 1 + log*(|j|),        j ≠ 0,          (1)

where q ∈ (0,1) determines the length of the zero codeword. The probability q of the zero codeword controls the sparsity of the wavelet model: the higher q is, the sparser the model. The probability P_q(j) = 2^(−L_q(j)) has very heavy tails, decaying only slightly faster than |j|^(−1), for any value of q. It is possible to choose q so that the codelength function L_q(j) does not exhibit an abrupt change at j = 0. A typical choice used in our experiments is q = 0.2. Fig. 1 plots the underlying P_q(j) in this case. Thus the quantized amplitudes of all coefficients, zero or nonzero, are directly encoded. This scheme is referred to as "log*-A".

Figure 1: Probability model implied by the log*-A coding scheme (q = 0.2).

If the probability of the zero symbol is instead chosen to be q = 1 − N^(−1) ln N (q very close to 1), the underlying wavelet model is highly sparse. With this particular choice of q, it can be shown that the coding scheme is equivalent to uniformly coding the location of each nonzero coefficient (assumption A2) and coding the quantized amplitudes using Rissanen's universal prior for nonzero integers. We refer to this coding scheme as "log*-LA".
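As a concrete illustration (ours, not part of the paper), the snippet below evaluates the codelengths of Eq. (1) and the implied probabilities P_q(j) for the q = 0.2 model plotted in Fig. 1; it assumes the log_star helper from the previous sketch is in scope.

    import math

    def L_q(j: int, q: float = 0.2) -> float:
        """Codelength (bits) assigned to quantized amplitude j by Eq. (1)."""
        if j == 0:
            return -math.log2(q)
        # one sign bit plus the universal codelength of |j|
        return -math.log2(1.0 - q) + 1.0 + log_star(abs(j))  # log_star: previous sketch

    for j in range(-3, 4):
        print(f"j = {j:2d}   L_q(j) = {L_q(j):5.2f} bits   P_q(j) = {2.0 ** -L_q(j):.4f}")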

As noted by Rissanen, the construction of discrete universal priors can easily be extended to define a probability distribution over the real line. We extend the coding scheme (1) and define the complexity prior

    P_q(θ) = 2^(−L_q(0)) δ_0(θ) + Σ_{j=1}^{∞} 2^(−L_q(j)) I_[0,1)(|θ|/δ − j),        (2)

where δ_0(·) denotes a unit point mass at zero, δ is the discretization step, and I_[0,1)(·) is the indicator function of the interval [0,1). We set the discretization step size δ = σ, a standard MDL choice.

Using these complexity penalties in the MDL estimation, we derive basic properties of the resulting estimator. Since the distribution for the coefficients is assumed to be separable, the MDL estimator is separable, in the sense that estimates of individual wavelet coefficients are obtained by applying a nonlinearity (shrinkage function) to the empirical, noisy coefficients. Due to the use of the log* function, there does not exist a closed-form solution for the shrinkage function. However, we can prove the following basic results:

1. The shrinkage functions associated with the new complexity prior (2) with q > 1/2 exhibit a threshold λ_q ≈ σ√(2 ln(q/(1−q))). The threshold λ_q, which controls the sparsity of the estimates, increases with q.

2. The shrinkage function associated with the new complexity prior (2) with q → 1 is asymptotically equivalent to a hard threshold with threshold λ = σ√(2 ln (1−q)^(−1)). In particular, the estimator resulting from the choice q = 1 − N^(−1) ln N (log*-LA) is asymptotically equivalent to Donoho and Johnstone's universal hard thresholding technique as N → ∞.

3. A necessary condition for consistency of the estimates is that q → 1 as the sample size N → ∞.
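Although the shrinkage function has no closed form, it can be evaluated numerically by minimizing the per-coefficient MDL cost over the quantized amplitudes. The sketch below (ours; it reuses log_star and L_q from the previous sketches) assumes the cost is the Gaussian residual codelength in bits plus L_q, with step size δ = σ. It is a simplified illustration rather than the authors' implementation, so its exact transition points need not coincide with those reported in Section 3.

    import math

    SIGMA = 5.0                      # noise standard deviation (as in Table 1)
    DELTA = SIGMA                    # discretization step: the standard MDL choice
    N = 512 * 512                    # number of coefficients in a 512 x 512 image

    def mdl_shrink(y: float, q: float) -> float:
        """MDL estimate of one noisy coefficient y: argmin of the total codelength over j*DELTA."""
        def cost(j: int) -> float:
            # Gaussian residual codelength (bits) + amplitude codelength L_q (previous sketch)
            return (y - j * DELTA) ** 2 / (2.0 * SIGMA ** 2 * math.log(2)) + L_q(j, q)
        candidates = range(-int(abs(y) / DELTA) - 2, int(abs(y) / DELTA) + 3)
        return min(candidates, key=cost) * DELTA

    q_la = 1.0 - math.log(N) / N     # the log*-LA choice of q
    for y in (10.0, 20.0, 25.0, 30.0, 60.0):
        print(f"y = {y:5.1f}  ->  log*-LA estimate {mdl_shrink(y, q_la):6.1f}")
    # The zero/nonzero transition falls slightly above sigma*sqrt(2 ln N) ~ 25, in line with property 2.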

3. IMAGE DENOISING EXPERIMENTS

We have applied the methods above to a variety of 512 × 512 test images. We report results on Barbara and Peppers corrupted by additive white Gaussian noise with variance σ² = 25 and σ² = 49. The wavelet decomposition was a 4-level decomposition using Daubechies' 4-tap filters. Numerical values for the MSE of the various estimators are reported in Tables 1 and 2. The performance of the universal hard thresholding technique (HT in the tables) was very poor: the MSE was actually worse than the MSE of the noisy image data themselves. We present results for three MDL schemes: the original scheme based on Saito's coding technique (U-LA), and two instances of our new scheme based on Rissanen's universal prior for integers: log*-A (with q = 0.2) and log*-LA (with q = 1 − N^(−1) ln N).

Due to the very large threshold used by U-LA (approximately 22.5% larger than the universal HT threshold), the MSE performance of U-LA was unsatisfactory. It suffers from the underlying assumption that the image is highly sparse. The log*-LA scheme produced results similar to the universal HT. The corresponding shrinkage function is shown in Fig. 2a. The discontinuities are due to quantization effects, and the threshold λ = 26.4 is slightly larger than σ√(2 ln N) = 24.98. Both U-LA and log*-LA suffer from the weakness of assumption A2. In contrast, the log*-A scheme, which encodes the amplitudes of all wavelet coefficients, outperforms the previous two schemes. Its shrinkage function also exhibits a threshold, which is much smaller (λ ≈ 1.86σ), as shown in Fig. 2b.

Fig. 3 shows denoised images. Fig. 3a is the noisy Barbara with σ = 7. Universal HT produced strong oversmoothing artifacts, as can be seen in Fig. 3b. The results obtained by U-LA and log*-LA are visually similar. We show the log*-A result in Fig. 3c. The image appears much clearer, with sharper edges. The results are comparable to Bayesian estimation results using a Generalized Gaussian prior on the wavelet coefficients [7]. However, the main advantage of the MDL scheme over Bayesian estimators is that the underlying prior is universal. Very competitive performance has been obtained for a variety of test images in addition to Barbara. This attests to the robustness of the new complexity priors and to the reasonable nature of the heavy-tailed log* distribution for image inference.

    Method         λ       MSE (Barbara)   MSE (Peppers)
    Universal HT   24.98    45.01           37.16
    U-LA           30.6     68.67           44.57
    log*-LA        26.4     55.73           39.65
    log*-A          9.3     21.75           21.29

Table 1: Estimation of 512 × 512 images corrupted by AWGN with standard deviation σ = 5. Here HT = hard threshold, λ = threshold value, and MSE = (1/N) ||θ̂ − θ||² corresponds to one particular realization of the noise.

    Method         λ       MSE (Barbara)   MSE (Peppers)
    Universal HT   35.0     72.60           48.40
    U-LA           42.8    109.97           60.92
    log*-LA        36.9     90.94           53.32
    log*-A         13.0     38.29           34.69

Table 2: Estimation of 512 × 512 images corrupted by AWGN with standard deviation σ = 7.

Figure 2: Shrinkage functions corresponding to the complexity prior (2). Image size is N = 512², noise variance is σ² = 25. (a) log*-LA coding scheme (threshold λ = 26.4). (b) log*-A coding scheme with zero-symbol probability q = 0.2; the threshold is obtained numerically as λ = 9.3 ≈ 1.86σ.

Figure 3: Denoising of Barbara (σ = 7): (a) noisy image, MSE = 48.94; (b) universal hard threshold, MSE = 72.60; (c) MDL estimate using the log*-A prior, MSE = 38.29.
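For completeness, a possible end-to-end experiment in the spirit of this section is sketched below with PyWavelets (our illustration, not the authors' code). The wavelet name "db2" (Daubechies' 4-tap filter), the choice to leave the coarsest approximation band untouched, and the reuse of mdl_shrink from the previous sketch are assumptions made for the sake of the example.

    import numpy as np
    import pywt

    def denoise_mdl(noisy: np.ndarray, q: float) -> np.ndarray:
        """Wavelet-domain MDL denoising: shrink every detail coefficient, keep the approximation."""
        coeffs = pywt.wavedec2(noisy, wavelet="db2", level=4)      # 4-level, Daubechies 4-tap
        approx, details = coeffs[0], coeffs[1:]
        shrink = np.vectorize(lambda y: mdl_shrink(float(y), q))   # mdl_shrink: previous sketch
        shrunk = [tuple(shrink(band) for band in level) for level in details]
        return pywt.waverec2([approx] + shrunk, wavelet="db2")

    # Example usage (image loading not shown):
    # noisy = clean + np.random.normal(0.0, 5.0, clean.shape)      # AWGN, sigma = 5
    # denoised = denoise_mdl(noisy, q=0.2)                         # the log*-A scheme
    # mse = np.mean((denoised - clean) ** 2)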

4. REFERENCES

[1] J. Rissanen, "Universal Coding, Information, Prediction, and Estimation," IEEE Trans. on Info. Theory, Vol. 30, pp. 629-636, 1984.


[2] J. Rissanen, Stochastic Complexity in Statistical Inquiry, World Scientific, Singapore, 1992.

[3] A. R. Barron and T. M. Cover, "Minimum Complexity Density Estimation," IEEE Trans. on Info. Theory, Vol. 37, No. 4, pp. 1034-1054, 1991.

[4] P. Moulin, "Model Selection Criteria and the Orthogonal Series Method for Function Estimation," Proc. IEEE Int. Symp. on Information Theory, Whistler, B.C., Sep. 1995.

[5] D. L. Donoho and I. M. Johnstone, "Ideal Spatial Adaptation Via Wavelet Shrinkage," Biometrika, Vol. 81, pp. 425-455, 1994.

[6] N. Saito, "Simultaneous Noise Suppression and Signal Compression Using a Library of Orthonormal Bases and the MDL Criterion," in Wavelets in Geophysics, E. Foufoula-Georgiou and P. Kumar, Eds., pp. 299-324, Academic Press, 1994.

[7] P. Moulin and J. Liu, "Analysis of Multiresolution Image Denoising Schemes Using Generalized-Gaussian Priors," Proc. IEEE-SP Symp. on TFTS, Pittsburgh, 1998.

[8] J. Liu and P. Moulin, "Complexity-Regularized Image Denoising," Proc. of ICIP'97, Vol. II, Santa Barbara, CA, Oct. 1997, pp. 370-373.