
IOP PUBLISHING

INVERSE PROBLEMS

Inverse Problems 28 (2012) 125005 (31pp)

doi:10.1088/0266-5611/28/12/125005

A variational Bayesian approach for unsupervised super-resolution using mixture models of point and smooth sources applied to astrophysical map-making

Hacheme Ayasso 1, Thomas Rodet 2 and Alain Abergel 1

1 Institut d’Astrophysique Spatiale (UMR 8617 CNRS-Univ Paris sud 11), Centre Universitaire d’Orsay, Bat 120-121, F-91405 Orsay Cedex, France
2 Laboratoire des Signaux et Systèmes (UMR 8506 CNRS-SUPELEC-Univ Paris sud 11), 3 rue Joliot-Curie, F-91192 Gif-sur-Yvette Cedex, France

E-mail: [email protected], [email protected] and [email protected]

Received 27 April 2012, in final form 10 October 2012
Published 14 November 2012
Online at stacks.iop.org/IP/28/125005

Abstract
We present a new unsupervised method for joint image super-resolution and separation between smooth and point sources. For this purpose, we propose a Bayesian approach with a Markovian model for the smooth part and Student’s t-distribution for point sources. All model and noise parameters are considered unknown and are estimated jointly with the images. However, joint estimators (joint MAP or posterior mean) are intractable and an approximation is needed. Therefore, a new gradient-like variational Bayesian method is applied to approximate the true posterior by a free-form separable distribution. A parametric form is obtained for the approximating marginals, but with shaping parameters that are mutually dependent. Their optimal values are obtained by iterating them until convergence. The method was tested on model-generated data and on a real dataset from the Herschel space observatory.

(Some figures may appear in colour only in the online journal)

1. Introduction

Research into methods that provide images with the highest possible resolution, taking into account the instrument properties and the observing modes, has gained a lot of interest over the last two decades. Super-resolution (Park et al 2003) techniques have been applied in many fields such as medical imaging (Robinson et al 2010), radar (Borison et al 1992), microscopy (Wilson and Hewlett 1991), satellite imaging (Kasetkasem et al 2005) and astronomy (Willett et al 2004). Moreover, with increasing complexity in the observed objects and higher sensitivity, automatic component separation methods have become essential for a proper analysis of the data. One can find applications for source separation in medical imaging (Zibulevsky and Pearlmutter 2001), spectroscopy (Moussaoui et al 2006), astronomy (Nuzillard and Bijaoui 2000) and telecommunications (Gorokhov and Loubaton 1997).

In this work, we develop a new method based on super-resolution and source separation for map-making of astronomical data taken by the SPIRE instrument onboard the Herschel space observatory (Pilbratt et al 2010) of the European Space Agency. Launched in 2009, Herschel carries the largest space telescope currently in operation. Herschel is designed to observe the cool universe, to study early epoch galaxy building, the formation of stars and planetary systems, the late stages of stellar evolution and the physico-chemical processes in the interstellar medium, both locally in our own Galaxy and in external galaxies. Herschel performs mapping and spectroscopy in the far infrared and submillimetre part of the spectrum with unprecedented sensitivity and angular resolution. In maps, different components (such as dust clouds and faraway galaxies) must be separated for a proper analysis of their properties. This is quite a challenge due to the limited resolution, the presence of drifts in the detector response and the lack of information in the data to disentangle the different astrophysical components, the instrument transfer function and the measurement noise. Hence, a new method is needed that accounts for as much of the available information on the instrument and the observed skies as possible, for a faithful reconstruction.

Several methods have been proposed in the literature to address the problem of super-resolution. An interested reader may refer to Park et al (2003) and Mohammad-Djafari (2009) for surveys of existing general-purpose methods and to Murtagh and Starck (2006) for methods used in astronomy. For Herschel, however, high-resolution map-making methods fall into three categories.
The first is the so-called naïve method (co-addition), which consists of averaging the measurements falling on the same sky pixel. The second groups maximum likelihood methods such as MADmap (Cantalupo et al 2010) and SANEPIC (Patanchon et al 2008), which assume a correlated noise model for the measurement error. The third comprises Bayesian methods such as that of Orieux et al (2011).³ However, these methods suffer from several drawbacks. For example, maps estimated with the first and second methods have limited spatial resolution, since they do not account for the instrument optical transfer function in their models. Furthermore, the smooth only method is limited to extended emission modelling, which creates several artefacts around point sources.

As for the separation of smooth and point sources, it has been performed as a post-treatment step once the map is obtained, as in the CLEAN method (Högbom 1974), which iteratively removes the beam effect for the points with the highest intensities. Despite being dedicated to the reconstruction of maps of impulsive sources, this method is incapable of going beyond the instrument resolution. Furthermore, in the presence of a regular component, low-intensity sources in cold areas (background) are undetectable in comparison with sources in warmer areas (dust cloud), which leads to serious problems in constructing a histogram of coherent sources.

Other methods are based on unions of bases, as in Bobin et al (2007) and Kowalski and Torrésani (2008). The morphological component analysis (MCA) (Bobin et al 2007) proposes to decompose the map over two bases: wavelets for texture and curvelets for contours. In the context of separation between smooth and point source components, the latter already has a sparse structure and there is no need to look for a sparse representation in a curvelet basis. Furthermore, the maps considered in this work do not have sharp edges suited to the curvelet representation, and a different approach should be considered.
³ We refer to this method as the smooth only method.

Therefore, we propose in this paper to tackle the problems of high-resolution map-making and component separation jointly in a Bayesian framework where prior information on the different components is seamlessly integrated into the probabilistic form. Several works have been devoted to the problem of smooth and point sources separation (De Mol and Defrise 2004, Giovannelli and Coulais 2005). Different priors have been proposed for the point source component, such as the total variation prior used by Chen et al (1999). Heavy-tailed priors have also been used, such as the Laplace distribution (Giovannelli and Coulais 2005), the Bernoulli–Gaussian distribution (Rabaste and Chonavel 2007, Goussard and Demoment 1989), finite Gaussian mixtures (Snoussi and Mohammad-Djafari 2004) or infinite Gaussian mixtures (Wainwright and Simoncelli 2000). In this work, we have opted for Student’s t-distribution for the point source component, as in Chantas et al (2008), and a correlated Gaussian field for the smooth component. The mixture prior was first introduced for the image deconvolution problem (Rodet and Zheng 2009).

Nevertheless, the joint posterior distribution of the sky components and the other hyperparameters (prior and noise parameters) has a complex expression, and neither the joint maximum a posteriori (JMAP) nor the posterior mean (PM) has a tractable form. Hence, an approximation of the true posterior becomes necessary. Two main families of methods are generally used: stochastic, as in Markov chain Monte Carlo (MCMC) methods (Robert and Casella 2004), and deterministic, as in the variational Bayesian approach (VBA) (Šmídl and Quinn 2006). The idea of the former is to draw random samples from the posterior distribution and then use them to calculate the moments numerically once a sufficient number of samples is obtained. These methods provide a very good exploration of the posterior space, which helps to avoid local solutions. However, they are highly time consuming, so their application to large-scale problems is infeasible.
The variational Bayesian technique approximates the true posterior analytically by a free-form separable distribution which minimizes the Kullback–Leibler divergence. The approach leads, in our case, to a parametric form with shaping parameters that are mutually dependent. In the standard scheme, their optimal values are reached by updating them one at a time, iteratively, until convergence. We propose here a deterministic approach based on the variational Bayesian technique which updates the shaping parameters simultaneously in a gradient-like framework (Fraysse and Rodet 2011, 2012) to accelerate the convergence of the method.

The novelties of our approach are the joint super-resolution and component separation framework, the unsupervised approach where almost all the hyperparameters are estimated jointly with the unknowns, the joint estimation of the drift in the detectors and the application of the new variational Bayesian technique, which provides an efficient estimator for high-dimensional datasets.

The rest of this paper is organized as follows. In section 2, we present our Bayesian approach to the problem with the mixture prior model. Then, the estimation problem is addressed in section 3 using the VBA. Application to simulated and real data from the SPIRE/Herschel observatory is discussed in section 4. Finally, we conclude this work in section 5 and give several perspectives.

2. Bayesian approach

In the context of the current work, we want to reconstruct a high-resolution image from several shifted observations. The detector, composed of $N_d$ pixels, has a field of view (FOV) that is too small to cover the whole zone of interest. Therefore, a scanning system (basically a straight-line translation, see figure 1) is used to acquire a number of samples $y_1, \dots, y_T$, where $y_t = \{y_{t1}, \dots, y_{tN_d}\}$ is the acquisition of the detector at time index $t \in \{1, \dots, T\}$ and $T$ is the number of acquisitions (low-resolution images). The translation distance ($\Delta = \{\Delta_\alpha, \Delta_\beta\}$) is considered to be known.

Figure 1. Schematic representation of the scanning process in the telescope.

In the following, we present the forward model, which accounts for the optical transfer function of the telescope, the pointing process and the temperature drift model. Other instrument effects are considered to be corrected by a pretreatment phase. Then, the prior models for the different unknowns are given with the resulting joint posterior distribution.

2.1. Forward model

We suppose in this work a linear model for the acquisition, where the instrument observes the sky $x$ at an instant $t$ with the pointing positions $\alpha_t, \beta_t$ and obtains the data $y_t$ with an additive error $n_t$. Hence, the acquisition model can be written as

$$y_t = H_t x + n_t, \qquad (1)$$

where $H_t$ is a matrix whose elements are calculated to perform the discrete convolution between the point spread function (PSF) $c(\alpha, \beta)$ and the sky $x(\alpha, \beta)$ taken at the points $(\alpha_{ti}, \beta_{ti})$, $i \in \{1, \dots, N_d\}$. These points are assumed to fall on integer pixel positions. We write the forward model based on the acquisition model (equation (1)) by concatenating all the observations $y_t$, $t = 1, \dots, T$, in one column vector as follows:

$$y = Hx + n \qquad (2)$$

with

$$y = \begin{bmatrix} y_1 \\ \vdots \\ y_t \\ \vdots \\ y_T \end{bmatrix}, \quad H = \begin{bmatrix} H_1 \\ \vdots \\ H_t \\ \vdots \\ H_T \end{bmatrix}, \quad n = \begin{bmatrix} n_1 \\ \vdots \\ n_t \\ \vdots \\ n_T \end{bmatrix}. \qquad (3)$$

The system matrix $H$ can be decomposed into two matrices:

$$H = UC, \qquad (4)$$

where $C$ is a circulant matrix representing the convolution with the optical part and $U$ is a pointing matrix which defines the FOV at acquisition. Moreover, there is a risk of drift in the detector values during observations, which should be estimated to avoid artefacts in the final map. So, we choose to represent this by an unknown offset for each detector, and the error model is set as white Gaussian with an unknown mean $o_t = \{o_{1t}, \dots, o_{dt}\}$ and variance $\rho_n^{-1}$. So, we can write

$$P(n|\rho_n) \propto \exp\left(-\frac{\rho_n \|n - o\|_2^2}{2}\right). \qquad (5)$$

Hence, the corresponding observation likelihood is given as

$$P(y|\rho_n, H, x) \propto \exp\left(-\frac{\rho_n \|y - Hx - o\|_2^2}{2}\right). \qquad (6)$$
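The decomposition $H = UC$ can be illustrated with a minimal sketch, which is not the SPIRE pipeline: it assumes a PSF array of the same shape as the sky and centred in that array, periodic boundary conditions for $C$, and pointing given as integer pixel indices. The names `forward_model`, `rows`, `cols` and `offset` are hypothetical.

```python
import numpy as np

def forward_model(x, psf, rows, cols, offset=0.0):
    """Sketch of y = U C x + o for one set of pointed samples (noise-free)."""
    # C: circular convolution with the PSF, computed in the Fourier domain.
    # ifftshift moves the PSF centre to index (0, 0) so the sky is not translated.
    otf = np.fft.fft2(np.fft.ifftshift(psf))
    blurred = np.real(np.fft.ifft2(np.fft.fft2(x) * otf))
    # U: read the blurred sky at the integer pixel positions seen by the detectors.
    return blurred[rows, cols] + offset

# Usage: a Dirac PSF leaves the sky unchanged, so only pointing and offset act.
rng = np.random.default_rng(0)
sky = rng.normal(size=(8, 8))
dirac = np.zeros((8, 8))
dirac[4, 4] = 1.0
samples = forward_model(sky, dirac, np.array([1, 1, 5]), np.array([2, 2, 7]), offset=0.5)
```

Repeated indices in `rows` and `cols` correspond to several acquisitions falling on the same sky pixel, which is the redundancy exploited by the scanning strategy.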

The mixture model supposes that the sky can be written as the sum of a smooth part $s$ and a sparse one $p$:

$$x = s + p. \qquad (7)$$

We assign a prior distribution that favours slowly varying structure for the smooth part. Meanwhile, the sparse component is modelled by a heavy-tailed prior. We give in the next section the expression of the chosen prior distributions for the unknown variables.

2.2. Prior models

The choice of the prior is crucial for the separation process, since no distinguishing information is present in the data and the separation depends only on the properties of each component. Therefore, we chose a Markovian model for the smooth component $s$, accounting for homogeneity, and a heavy-tailed prior for $p$, accounting for sparsity. The smooth part $s$ is modelled by a Markovian prior which takes the form of a correlated Gaussian:

$$P(s|\rho_s) \propto \exp\left(-\frac{\rho_s \left(\|D_\alpha s\|_2^2 + \|D_\beta s\|_2^2\right)}{2}\right), \qquad (8)$$

where $\rho_s$ is the spatial correlation parameter, and $D_\alpha$ and $D_\beta$ are the finite-difference matrices in the $\alpha$ and $\beta$ directions, respectively. For $p$, we have chosen a centred separable homogeneous Student’s t-prior, which favours sparse maps:

$$P(p|\rho_p, \nu) \propto \prod_i \left(1 + \frac{\rho_p p_i^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \qquad (9)$$

where $\rho_p$ is a scale parameter and $\nu$ is the number of degrees of freedom, which determines the shape of the tail. This distribution can be obtained by marginalization of the three-parameter normal-Gamma distribution

$$P(p|\rho_p, \nu) = \int_{\mathbb{R}^{N_p}} P(p|\rho, \rho_p) P(\rho|\nu)\, d\rho = \prod_i \int_{\mathbb{R}} \mathcal{N}\left(0, (\rho_p \rho_i)^{-1}\right) \mathcal{G}\left(\frac{2}{\nu}, \frac{\nu}{2}\right) d\rho_i$$
$$\propto \prod_i \int_{\mathbb{R}} \rho_i^{\frac{1}{2}} \exp\left(-\frac{\rho_p \rho_i p_i^2}{2}\right) \rho_i^{\frac{\nu-2}{2}} \exp\left(-\frac{\nu \rho_i}{2}\right) d\rho_i, \qquad (10)$$



Figure 2. Hierarchical graphical model.

where $N_p$ is the number of elements of $p$, $\mathcal{N}(\cdot)$ is the Gaussian distribution, $\mathcal{G}(\cdot)$ is the Gamma distribution and $\rho = \{\rho_1, \dots, \rho_i, \dots, \rho_{N_p}\}$ is an auxiliary precision parameter. This property makes dealing with the t-distribution easier, as we need to deal only with Gaussian and Gamma distributions, which are conjugate with respect to the likelihood. Therefore, we choose to solve the extended problem as in Chantas et al (2008), where $s$, $p$ and $\rho$ are estimated jointly. However, this comes at the price of increasing the number of unknowns to be estimated. This approach can be seen as replacing the t-distribution by an infinite mixture of Gaussians.

We introduced in our model several factors (hyperparameters) $\theta = \{\rho_n, \rho_s, o\}$ which affect the quality of the estimation. Therefore, an automatic estimation of these parameters is needed in order to have a more robust method. Hence, we assign conjugate prior distributions to the hyperparameters, which read

$$P(\rho_n|\gamma_n, \phi_n) = \mathcal{G}(\gamma_n, \phi_n), \qquad (11)$$

$$P(\rho_s|\gamma_s, \phi_s) = \mathcal{G}(\gamma_s, \phi_s), \qquad (12)$$

$$P(o|m_o, \rho_o) = \mathcal{N}(m_o, \rho_o^{-1} I). \qquad (13)$$
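The scale-mixture identity behind equation (10) can be checked numerically. The sketch below is only an illustration under stated assumptions: the Gamma density is taken with shape $\nu/2$ and scale $2/\nu$, the integral over $\rho_i$ is approximated with a trapezoidal rule on a truncated grid, and the function names and grid settings (`n_grid`, `rho_max`) are hypothetical.

```python
import math
import numpy as np

def student_t_pdf(p, rho_p, nu):
    """Normalized Student's t density with scale parameter rho_p, as in equation (9)."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                + 0.5 * math.log(rho_p / (nu * math.pi)))
    return math.exp(log_norm) * (1 + rho_p * p**2 / nu) ** (-(nu + 1) / 2)

def mixture_pdf(p, rho_p, nu, n_grid=200_000, rho_max=60.0):
    """Integrate N(p | 0, 1/(rho_p*rho)) * Gamma(rho | shape nu/2, scale 2/nu) over rho."""
    rho = np.linspace(1e-9, rho_max, n_grid)
    gauss = np.sqrt(rho_p * rho / (2 * np.pi)) * np.exp(-0.5 * rho_p * rho * p**2)
    shape, scale = nu / 2, 2 / nu
    log_gamma_pdf = ((shape - 1) * np.log(rho) - rho / scale
                     - math.lgamma(shape) - shape * math.log(scale))
    f = gauss * np.exp(log_gamma_pdf)
    # Trapezoidal rule written out explicitly.
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(rho)))

# Usage: the two values should agree up to the quadrature error.
closed_form = student_t_pdf(1.0, rho_p=2.0, nu=3.0)
by_mixture = mixture_pdf(1.0, rho_p=2.0, nu=3.0)
```

Agreement of the two densities is what allows the t-prior on $p$ to be replaced by a Gaussian prior conditioned on the auxiliary precisions $\rho_i$.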

Hence, the graphical model equivalent to the likelihood and the hierarchical prior over the unknown map and hyperparameters is given in figure 2. All the necessary ingredients to form the joint posterior distribution are ready and its expression is given in the next section, where the problem of joint estimation is also discussed.

2.3. Posterior distribution

In the previous two subsections, we presented the necessary elements to calculate the joint posterior. By the application of Bayes’ rule, we obtain

$$P(u|y) \propto P(y|s, p, \rho_n, o)\, P(s|\rho_s)\, P(p|\rho_p, \rho)\, P(\rho|\nu)\, P(\rho_n|\gamma_n, \phi_n)\, P(\rho_s|\gamma_s, \phi_s)\, P(o|m_o, \rho_o) \propto \exp(J), \qquad (14)$$

where

$$u = \{s, p, \rho, \theta\},$$

$$J = -0.5 \Big[ \rho_n \|y - H(s + p) - o\|_2^2 + \rho_s \left(\|D_\alpha s\|_2^2 + \|D_\beta s\|_2^2\right) + \sum_i \left[\rho_p \rho_i p_i^2 - \log(\rho_i) + \nu \rho_i - (\nu - 2)\log(\rho_i)\right]$$
$$\quad - N_y \log(\rho_n) + \frac{\rho_n}{\gamma_n} - (\phi_n - 1)\log(\rho_n) - N_s \log(\rho_s) + \frac{\rho_s}{\gamma_s} - (\phi_s - 1)\log(\rho_s) + \rho_o \|o - m_o\|_2^2 \Big], \qquad (15)$$

where $N_s$ is the number of pixels in the smooth component and $N_y$ is the number of data points. Then, the estimation can be performed using one of the classical estimators (e.g. JMAP or the PM). However, both these estimators have an intractable form and an approximation is needed to find an efficient solution. We propose a deterministic method based on the VBA. In the next section, we briefly present this approach and give the estimator form.

3. Variational Bayesian approximation

One of the main difficulties in obtaining a tractable form for the estimator is the mutual dependence between the different unknowns. Therefore, the true posterior needs to be approximated by a separable distribution, which facilitates the calculation of the estimators. The main idea of the VBA is to approximate the true posterior by a free-form distribution that minimizes the Kullback–Leibler (KL(·)) divergence (Hinton and van Camp 1993). In other words, we approximate $P(u|y)$ by $Q(u) = \prod_i Q_i(u_i)$ in such a way that

$$Q(u) = \arg\min_Q \mathrm{KL}(P\|Q), \qquad \mathrm{KL}(P\|Q) = \int Q(u) \log\frac{Q(u)}{P(u|y)}\, du, \qquad (16)$$

where $u_i$ is a subset determined by the chosen separation. The latter divergence can be written as the sum of two terms:

$$\mathrm{KL}(P\|Q) = \log\left(P(y|M)\right) - F(Q), \qquad (17)$$

where $P(y|M)$ is the model evidence and $F(Q)$ is the negative free energy. When working with distributions from the exponential family, the solution of this functional optimization problem can be written as follows:

$$Q(u_i) \propto \exp\left(\langle J \rangle_{\prod_{j \neq i} Q(u_j)}\right), \qquad (18)$$

where $\langle A \rangle_B$ is the expectation of the variable $A$ with respect to the distribution $B$. This leads to a parametric form for $Q(u)$ whose shaping parameters are mutually dependent. Hence, the optimal solution can be obtained by iterating the values of the shaping parameters until convergence. For more details about this approach, an interested reader can refer to Šmídl and Quinn (2006).

Recently, a gradient-like version of this approach was proposed by Fraysse and Rodet (2011) in order to accelerate the convergence towards the optimal solution. The concept is to update some of the shaping parameters simultaneously. In this case, the approximating distribution at iteration $k$ is given as

$$\check{Q}^k(u_i) \propto \left(\check{Q}^{k-1}(u_i)\right)^{1-\lambda} \exp\left(\lambda \langle J \rangle_{\prod_{j \neq i} \check{Q}^{k-1}(u_j)}\right), \qquad (19)$$

where $\lambda > 0$ is the gradient step. For $\lambda = 1$, this version gives the same update procedure as the standard VBA. We give in the next section our choice for the separation, where we propose a strong separation layout.
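For Gaussian factors, the update in equation (19) has a simple closed form, sketched below for a scalar unknown. The sketch assumes that both the previous factor and the standard VBA target of equation (18) are one-dimensional Gaussians; the function and argument names are hypothetical.

```python
def gradient_like_gaussian_update(m_prev, v_prev, m_star, v_star, lam):
    """Mean and variance of q_prev**(1 - lam) * q_star**lam for 1-D Gaussians.

    q_prev is the factor at iteration k-1 and q_star is the standard VBA
    update of equation (18); lam is the gradient step of equation (19).
    """
    # Powers and products of Gaussians combine linearly in the natural
    # parameters: the precision 1/v and the precision-weighted mean m/v.
    precision = (1 - lam) / v_prev + lam / v_star
    if precision <= 0:
        raise ValueError("step too large: the updated factor is not normalizable")
    v_new = 1.0 / precision
    m_new = v_new * ((1 - lam) * m_prev / v_prev + lam * m_star / v_star)
    return m_new, v_new

# Usage: lam = 1 recovers the standard update, lam = 0.5 lies in between.
full_step = gradient_like_gaussian_update(0.0, 4.0, 2.0, 1.0, lam=1.0)
half_step = gradient_like_gaussian_update(0.0, 4.0, 2.0, 1.0, lam=0.5)
```

The interpolation acts on the inverse variance rather than on the variance itself, which is why a step larger than 1 can produce a non-positive precision and has to be guarded against.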


3.1. Separation choice

The choice of separation layout plays an important role in determining the estimation quality and complexity. We have chosen here a strong one, which imposes that all unknowns be independent a posteriori. So, the approximating distribution is given as

$$Q(u) = Q(s, p, \rho, \theta) = Q(\rho_n)\, Q(\rho_s) \prod_j Q(o_j) \prod_i Q(s_i)\, Q(p_i)\, Q(\rho_i), \qquad (20)$$

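which guarantees simple update equations. However, this comes at the price of the quality of the approximation, since we lose all the information about the posterior correlation between different variables. In the following section, the expressions for the approximating marginals are given with the update formulae for the shaping parameters, using the gradient-like variational approach presented earlier.

3.2. Approximating marginals

In order to calculate the approximating marginals using equation (19) for each unknown variable, a functional optimization problem should be solved. Thanks to the conjugate form of the prior, the marginals take a parametric form in the same family as the prior (derivation steps are given in appendix B). We choose to update the approximating posterior by six groups ($p$, $s$, $\rho$, $\rho_n$, $\rho_s$ and $o$), which requires six different gradient steps ($\lambda_s$, $\lambda_p$, $\lambda_\rho$, $\lambda_{\rho_n}$, $\lambda_{\rho_s}$, $\lambda_o$). Nevertheless, since there is no dependence between elements of the same group for $\rho$, $\rho_n$, $\rho_s$ and $o$, their gradient steps can be set to 1 to accelerate the convergence ($\lambda_\ell = 1$, $\ell \in \{\rho, \rho_n, \rho_s, o\}$). So $Q(u)$ can be written as

$$Q(s) = \mathcal{N}(\check{m}_s, \check{V}_s), \qquad Q(p) = \mathcal{N}(\check{m}_p, \check{V}_p),$$
$$Q(\rho) = \prod_i \mathcal{G}(\check{\gamma}_i, \check{\phi}_i), \qquad Q(\rho_n) = \mathcal{G}(\check{\gamma}_n, \check{\phi}_n),$$
$$Q(\rho_s) = \mathcal{G}(\check{\gamma}_s, \check{\phi}_s), \qquad Q(o) = \mathcal{N}(\check{m}_o, \check{V}_o),$$

where the shaping parameters are given as follows (quantities without an iteration index are taken from the previous iteration):

$$\check{V}_s^k = \left[(1 - \lambda_s)\check{V}_s^{-1} + \lambda_s\, \mathrm{Diag}\left(\bar{\rho}_n H^t H + \bar{\rho}_s \left(D_\alpha^t D_\alpha + D_\beta^t D_\beta\right)\right)\right]^{-1}, \qquad (21)$$

$$\check{m}_s^k = \check{m}_s + \lambda_s \check{V}_s^k \left[\bar{\rho}_n H^t\left(\tilde{y} - H(\check{m}_p + \check{m}_s)\right) - \bar{\rho}_s \left(D_\alpha^t D_\alpha + D_\beta^t D_\beta\right)\check{m}_s\right], \qquad (22)$$

$$\check{V}_p^k = \left[(1 - \lambda_p)\check{V}_p^{-1} + \lambda_p\, \mathrm{Diag}\left(\bar{\rho}_n H^t H + \rho_p \check{\gamma}\check{\phi}^t\right)\right]^{-1}, \qquad (23)$$

$$\check{m}_p^k = \check{m}_p + \lambda_p \check{V}_p^k \left(\bar{\rho}_n H^t\left(\tilde{y} - H(\check{m}_s + \check{m}_p)\right) - \rho_p\, \check{\gamma} \circ \check{\phi} \circ \check{m}_p\right), \qquad (24)$$

$$\check{\phi}_i^k = \frac{\nu + 1}{2}, \qquad (25)$$

$$\check{\gamma}_i^k = \left[\frac{\nu}{2} + \frac{\rho_p\left(\check{m}_{p_i}^2 + \check{v}_{p_i}\right)}{2}\right]^{-1}, \qquad (26)$$

$$\check{\phi}_n^k = \phi_n + \frac{N_y}{2}, \qquad (27)$$

$$\check{\gamma}_n^k = \left[\frac{2\gamma_n^{-1} + \|\tilde{y} - H(\check{m}_s + \check{m}_p)\|_2^2 + \|\check{V}_o\|_1 + H^t H : (\check{V}_s + \check{V}_p)}{2}\right]^{-1}, \qquad (28)$$

$$\check{\phi}_s^k = \phi_s + \frac{N_s}{2}, \qquad (29)$$

$$\check{\gamma}_s^k = \left[\gamma_s^{-1} + \frac{\|D_\alpha \check{m}_s\|_2^2 + \|D_\beta \check{m}_s\|_2^2 + \left(D_\alpha^t D_\alpha + D_\beta^t D_\beta\right) : \check{V}_s}{2}\right]^{-1}, \qquad (30)$$

$$\check{V}_o^k = (\rho_o + \bar{\rho}_n N_d)^{-1}, \qquad (31)$$

$$\check{m}_o^k = \check{V}_o^k \left(m_o \rho_o + \bar{\rho}_n \sum_{i \in \Omega} (y_i - \check{y}_i)\right), \qquad (32)$$

where

$$\bar{\rho}_n = \langle \rho_n \rangle_{Q(\rho_n)} = \check{\gamma}_n \check{\phi}_n, \qquad \bar{\rho}_s = \langle \rho_s \rangle_{Q(\rho_s)} = \check{\gamma}_s \check{\phi}_s, \qquad (33)$$

$$N_y = N_d \times N_T = \mathrm{Dim}(y), \qquad N_s = \mathrm{Dim}(s), \qquad (34)$$

$$\check{y} = H(\check{m}_s + \check{m}_p), \qquad \tilde{y} = y - \check{m}_o, \qquad (35)$$

$$\|A\|_1 = \sum_{i,j} a_{ij}, \qquad A : B = \sum_{i,j} a_{ij} b_{ij}, \qquad (36)$$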
where $\lambda_s$ and $\lambda_p$ are the step values for $s$ and $p$, respectively, whose values are given in appendix A, $B = \mathrm{Diag}(A)$ gives a diagonal matrix $B$ out of the diagonal elements of $A$, $a \circ b$ is the Hadamard (element-wise) product between vectors $a$ and $b$, and $\Omega$ is the set of observations for which the offset is considered constant.

As observed from the previous equations, the shaping parameters are mutually dependent. Therefore, their optimal values should be found iteratively. At convergence (iteration $K$), all parameters can be estimated directly from the approximating marginal $Q(u)$, using one of the conventional estimators (MAP or PM). For example, the smooth component $s$ has the same estimate $\hat{s} = \check{m}_s^K$ for the MAP and PM estimators. We have used the PM estimator for all the parameters of the model. In order to establish a stopping criterion, the negative free energy $F(Q)$ is evaluated and the algorithm is stopped when its value has stabilized. In the next section, the expression of this energy is given as a function of the shaping parameters and the other problem constants.

3.3. Negative free energy

The Kullback–Leibler divergence cannot be evaluated most of the time because of the presence of the model evidence, which is often intractable. We can decompose this divergence (equation (17)) into a function of the model evidence $P(y|M)$ and the negative free energy $F(Q)$, which is given as

$$F(Q) = \langle \log P(y, u|M) \rangle_Q + \mathcal{H}(Q). \qquad (37)$$

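Here, $\mathcal{H}(Q)$ is the entropy of $Q$. This energy can be calculated from the shaping parameters of the approximating posteriors and the other problem constants, and its evolution corresponds to the evolution of the divergence since the evidence is constant for a given model. After development, the following expression is found:

$$F(Q) = -\frac{\bar{\rho}_n}{2}\left[\|y - \check{m}_o - H(\check{m}_s + \check{m}_p)\|_2^2 + H^t H : (\check{V}_s + \check{V}_p) + \|\check{V}_o\|_1\right]$$
$$\quad - \frac{\bar{\rho}_s}{2}\left[\|D_\alpha \check{m}_s\|_2^2 + \|D_\beta \check{m}_s\|_2^2 + \left(D_\alpha^t D_\alpha + D_\beta^t D_\beta\right) : \check{V}_s\right]$$
$$\quad - \frac{\rho_p}{2}\sum_i \check{\gamma}_i \check{\phi}_i \left(\check{m}_{p_i}^2 + \check{v}_{p_i}\right) - \frac{N_y + N_s + N_p}{2}\log(2\pi) + \frac{1}{2}\log\det\left(D_\alpha^t D_\alpha + D_\beta^t D_\beta\right)$$
$$\quad + \sum_i \left[\frac{\nu - 1}{2}\left(\psi(\check{\phi}_i) + \log(\check{\gamma}_i)\right) - \frac{\nu}{2}\check{\gamma}_i \check{\phi}_i - \log\left(\Gamma\left(\tfrac{\nu}{2}\right)\left(\tfrac{2}{\nu}\right)^{\frac{\nu}{2}}\right)\right]$$
$$\quad + \frac{N_s + N_p}{2}\log(2\pi e) + \frac{1}{2}\log\det(\check{V}_s) + \frac{1}{2}\log\det(\check{V}_p) + \sum_i \left[\check{\phi}_i + \log\left(\check{\gamma}_i \Gamma(\check{\phi}_i)\right) + (1 - \check{\phi}_i)\psi(\check{\phi}_i)\right]$$
$$\quad - \frac{\check{\gamma}_n \check{\phi}_n}{\gamma_n} - \phi_n \log\left(\gamma_n \Gamma(\phi_n)\right) + \left(\check{\phi}_n + \log\left(\check{\gamma}_n \Gamma(\check{\phi}_n)\right) + (1 - \check{\phi}_n)\psi(\check{\phi}_n)\right) + \left(\phi_n + \frac{N_y}{2} - 1\right)\left(\psi(\check{\phi}_n) + \log(\check{\gamma}_n)\right)$$
$$\quad - \frac{\check{\gamma}_s \check{\phi}_s}{\gamma_s} + \left(\check{\phi}_s + \log\left(\check{\gamma}_s \Gamma(\check{\phi}_s)\right) + (1 - \check{\phi}_s)\psi(\check{\phi}_s)\right) + \left(\phi_s + \frac{N_s}{2} - 1\right)\left(\psi(\check{\phi}_s) + \log(\check{\gamma}_s)\right) - \phi_s \log\left(\gamma_s \Gamma(\phi_s)\right), \qquad (38)$$

where $\Gamma(x)$ is the gamma function and $\psi(x) = \frac{\partial}{\partial x}\log(\Gamma(x))$ is the digamma function. Despite the length of the free-energy expression, most of its terms are already computed during the iterations. Therefore, its calculation does not add any significant computational burden. Besides helping to determine the stopping criterion, this expression is used to obtain the optimal step value, as shown in appendix A.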
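A stopping rule based on the stabilization of $F(Q)$ could be sketched as follows. The text only states that the algorithm stops when the free energy has stabilized; the relative tolerance `rel_tol` and the number of consecutive iterations `window` used here are assumptions made for illustration, as is the function name.

```python
def has_converged(free_energy_history, rel_tol=1e-6, window=3):
    """Return True when the last `window` changes of F(Q) are all relatively small."""
    if len(free_energy_history) <= window:
        return False  # not enough iterations to judge stability
    recent = free_energy_history[-(window + 1):]
    for previous, current in zip(recent[:-1], recent[1:]):
        # Relative change, with a floor of 1.0 to avoid dividing by a tiny value.
        scale = max(abs(previous), abs(current), 1.0)
        if abs(current - previous) > rel_tol * scale:
            return False
    return True

# Usage: called once per iteration on the list of F(Q) values computed so far.
stable = has_converged([-100.0, -10.0, -1.0, -1.0, -1.0, -1.0])
```

Requiring several consecutive small changes, rather than a single one, is a design choice that protects against stopping on a temporary plateau.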

3.4. Algorithm layout

After giving the theoretical fundamentals of our approach in the previous sections, we discuss herein its practical implementation. An initialization phase is always needed because of the iterative nature of the problem. This phase is very important, since the optimization problem in its parametric form is non-convex and an arbitrary initialization may lead to a local solution. Moreover, a bad initialization of the variances $\check{V}_s^0$ and $\check{V}_p^0$ might lead to the divergence of the method. Therefore, we propose an initialization phase based on the maximum likelihood estimation of the different variables. The method gives a satisfactory initialization of the parameters for all the treated datasets despite the big difference in final values. The mean value for the smooth part $\check{m}_s^0$ is initialized by a normalized back-projection (i.e. $\check{m}_s^0 = U^t y / U^t 1$, with element-wise division), which is known in the literature as the co-addition map maker. The point source part is initialized by a null image. The noise and smooth source precisions ($\bar{\rho}_n^0$, $\bar{\rho}_s^0$) are determined empirically from the initial means. Meanwhile, the variances $\check{V}_s^0$ and $\check{V}_p^0$ are set to the following values:

$$\check{V}_s^0 = \frac{3}{\mathrm{Diag}\left(\bar{\rho}_n^0 H^t H + \bar{\rho}_s^0 \left(D_\alpha^t D_\alpha + D_\beta^t D_\beta\right)\right)}, \qquad (39)$$

$$\check{V}_p^0 = \frac{3}{\mathrm{Diag}\left(\bar{\rho}_n^0 H^t H + \rho_p \check{\gamma}\check{\phi}^t\right)}, \qquad (40)$$

which have the same form as the limit of the sequence defined by the update equations (21) and (23), and the offset value $\check{m}_o^0$ is set to zero. Moreover, the hyperparameters $\gamma_s$, $\phi_s$, $\gamma_n$, $\phi_n$, $\rho_o$ and $m_o$ are chosen to give flat priors (i.e. $\phi_s \ll N_s$, $\phi_n \ll N_y$, $\gamma_s \gg \left(\|D_\alpha s^0\|^2 + \|D_\beta s^0\|^2\right)^{-1}$, $\gamma_n \gg \|y - Hx^0\|^{-2}$ and $\rho_o \ll \rho_n$). In practice, we set these values to $\phi_s = N_s/20$, $\phi_n = N_y/20$, $\gamma_s = 10^6 / \left(\|D_\alpha s^0\|^2 + \|D_\beta s^0\|^2\right)$, $\gamma_n = 10^6 / \|y - Hx^0\|^2$, $\rho_o = 10^{-6}\bar{\rho}_n^0$ and $m_o = 0$. The complete outline of the algorithm is given in algorithm 1.
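The co-addition initialization $U^t y / U^t 1$ can be sketched for a flattened sky as follows. This is an illustration only: it assumes that each sample carries the integer index of the sky pixel it falls on, and the names `coaddition_map` and `pixel_index` are hypothetical. Pixels that are never observed are left at zero, which is a choice made here and not one stated in the text.

```python
import numpy as np

def coaddition_map(y, pixel_index, n_pixels):
    """Average all samples falling on the same sky pixel: (U^t y) / (U^t 1)."""
    hits = np.bincount(pixel_index, minlength=n_pixels)               # U^t 1
    summed = np.bincount(pixel_index, weights=y, minlength=n_pixels)  # U^t y
    out = np.zeros(n_pixels)
    np.divide(summed, hits, out=out, where=hits > 0)  # unobserved pixels stay at 0
    return out

# Usage: two samples hit pixel 0, one hits pixel 2, pixels 1 and 3 are unobserved.
m_s0 = coaddition_map(np.array([1.0, 3.0, 5.0]), np.array([0, 0, 2]), n_pixels=4)
```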


Algorithm 1. Calculate Q(s, p, ρ, θ).

(i) Initialize the parameters $\check{m}_s^0$, $\check{m}_p^0$, $\check{V}_s^0$, $\check{V}_p^0$, $\check{m}_o^0$, $\check{\gamma}^0$, $\check{\gamma}_s^0$, $\check{\gamma}_n^0$.
(ii) For iteration $k$, compute the following values:
(iii) $\lambda_s$ from equation (A.17),
(iv) $\check{V}_s^k$, $\check{m}_s^k$ from equations (21) and (22),
(v) $\lambda_p$ from equation (A.29),
(vi) $\check{V}_p^k$, $\check{m}_p^k$ from equations (23) and (24),
(vii) $\check{m}_o$ from equation (32),
(viii) $\check{\gamma}^k$ from equation (26),
(ix) $\check{\gamma}_n^k$ from equation (28),
(x) $\check{\gamma}_s^k$ from equation (30),
(xi) $F(Q)$ from equation (38).
(xii) Repeat the previous steps until convergence (iteration $K$).

The estimated values are $\hat{s} = \check{m}_s^K$ for the smooth part and $\hat{p} = \check{m}_p^K$ for the point source part, which leads to the total estimated image $\hat{x} = \hat{s} + \hat{p}$.
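Step (viii) of the algorithm is a closed-form update. A minimal sketch of equations (25) and (26), together with the posterior mean $\check{\gamma}_i \check{\phi}_i$ of each auxiliary precision, is given below; it assumes that the means and variances of $Q(p)$ are stored as one-dimensional arrays, and the function name is hypothetical.

```python
import numpy as np

def update_student_precisions(m_p, v_p, rho_p, nu):
    """Gamma shaping parameters of Q(rho_i) and the resulting posterior means."""
    phi = np.full_like(m_p, (nu + 1) / 2, dtype=float)    # equation (25)
    gamma = 1.0 / (nu / 2 + rho_p * (m_p**2 + v_p) / 2)   # equation (26)
    rho_bar = gamma * phi                                 # posterior mean of rho_i
    return gamma, phi, rho_bar

# Usage: a pixel with a large estimated flux receives a small precision,
# so the Gaussian penalty on that pixel is relaxed.
gamma, phi, rho_bar = update_student_precisions(
    np.array([0.0, 3.0]), np.array([0.0, 1.0]), rho_p=2.0, nu=4.0)
```

The product `gamma * phi` is the quantity that enters equations (23) and (24) through $\check{\gamma} \circ \check{\phi}$.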

In the following section, the proposed algorithm is evaluated on simulated data and on real data from the Herschel telescope.

4. Application

In the previous section, we discussed the theoretical basis of our approach, where the estimator equations are given and the algorithm layout is explained. In order to demonstrate the validity of our approach, the algorithm is tested on simulated and real data. The objective of the simulation step is to show its capacity not only to reconstruct the sky map but also to estimate the hyperparameters (noise variance and smooth part variance). We have tested our algorithm on four simulated skies covering several scenarios. For the real data test, we have used data obtained by the photometer of the SPIRE instrument of the Herschel space telescope (Pilbratt et al 2010). This test allows us to verify the performance in the real application and the robustness with respect to errors in our theoretical model.

4.1. Simulations

The simulated data are prepared using four skies, each composed of the sum of a smoothly varying source and some broad point sources⁴. Each sky is then passed through the forward model⁵ and white noise is added. In the following, the structure of each of the test objects and its restoration results using our method are discussed. We also compare with more conventional methods such as the co-addition and a Bayesian map maker based on a smooth only prior (Orieux et al 2011).

4.1.1. Model-generated sky (I1).

In this test, the smooth part is obtained using a sample of a Markovian field with a correlation parameter ρs = 200. Meanwhile, the point source component is composed of 13 sources with the following fluxes:

Source   1      2      3      4      5      6      7      8      9      10
Flux     16.4   22.9   439.0  434.0  198.0  48.0   563.0  274.0  522.0  430.0

Source   11     12     13
Flux     190.0  282.0  416.0

⁴ A broad point source occupies more than one pixel, which is the case for most point sources studied in the real application. Furthermore, it makes the analysis more robust to changes in the pixel size.
⁵ The same forward model of the SPIRE/Herschel instrument is used to generate the simulated data. The PSF has a six-pixel full-width at half-maximum.


Figure 3. Comparison between different map-making methods for test sky I1. (a) True sky, (b) co-addition map, (c) smooth only map maker and (d) proposed method $\hat{x}$.

The variance of the white noise used is $\rho_n^{-1} = 1.3 \times 10^{-3}$. This gives a signal to noise ratio $\mathrm{SNR}_s = 10\log(\rho_n \|s\|^2) = 6$ dB for the smooth part and $\mathrm{SNR}_p = 10\log(\rho_n \|p_{\text{for a source}}\|^2)$ between 20 and 50 dB for the point sources. These values are fixed according to the SNR levels estimated in real data. This dataset allows us to verify the quality of reconstruction for a sky corresponding to our prior. It also allows us to assess the quality of the hyperparameter estimation.

Figure 3 shows a comparison between the true map and the maps estimated using the different methods. All the methods were able to restore the sky in an acceptable form. However, the co-added map has a lower spatial resolution, since it does not account for the optical transfer function of the instrument. This can be seen first in the shape of the point sources, which look like spots. The smooth only method gives a better spatial resolution. Nevertheless, the sources still have a spot shape, although narrower than in the co-added map. Moreover, a dark ring appears around each source due to the mismatch between the smooth prior and the strong variation of the point sources. Meanwhile, our method was able to give an estimate of the map which is very close to the true one, without any artefacts.


Figure 4. (a) Slice cut comparison for test sky I1 at coordinates β = 442, α ∈ [550, 650] (given in pixels). (b) Relative error for both smooth and point source components versus the signal to noise ratio in dB.

Figure 5. Comparison between point sources for test sky I1: (a) true sky and (b) proposed method $\hat{p}$.

In order to push the analysis further, we have made a cut across one of the point sources (figure 4(a)) to see in detail how each method reacts to it. As we can see, the co-addition gives a bell-shaped form of the point source with a width that corresponds to the width of the main lobe of the PSF. The smooth only method gives a narrower main lobe but with some oscillations (Gibbs phenomena), which correspond to the dark ring artefact. Our method reconstructs the point source better. Nevertheless, the peak value of the intensity is higher than that of the true source and it drops quickly due to the sparse prior. However, this modification of the source shape does not affect the estimated source flux, as demonstrated in figure 9.


Figure 6. Comparison between smooth source for test sky I1: (a) true sky and (b) proposed method $\hat{s}$.

Figure 7. Hyperparameter evolution through iterations for test sky I1: (a) noise precision ρn and (b) extended source precision ρs. Legend: Estimated, Real.

The important property of our method is also its capacity to separate point emission pˆ from the extended sˆ, which is highly interesting for the astrophysical analysis. Figures 5 and 6 display a comparison between the true and the estimated point and smooth sources, respectively. The method gives very good quality compared to the true data (relative error, s−mˇ Ks 22 , is only 4% between real and estimated smooth component). er = s22 Furthermore, we study the performance of the method with a varying signal to noise ratio (figure 4(b)). As expected, relative error increases with increasing noise. Moreover, the smooth component is more resistant to noise than the point source one because of the Markovian prior 14


Figure 8. More hyperparameters for test sky I1: (a) Student's t precision ρ and (b) a log-variance map for the residual, $\log(V_{res}) = \log\left(\frac{U^t |y - H\hat{x}|^2}{U^t \mathbf{1}}\right)$.

Figure 9. Extracted sources properties for test sky I1: (a) centres (axis α versus β) and (b) fluxes versus source number. Legend: Estimated, True.

which guarantees the smoothness. Meanwhile, at high noise levels, more noise points are interpreted as point sources, which makes the error increase at a higher rate. In addition to the main variables of interest s and p, our method is capable of estimating the hyperparameters of the model, such as the noise precision and the source precision, which determine the trade-off between data fitting and prior. Figure 7 shows the evolution of the hyperparameter estimates with iterations compared to their true values. At convergence, the parameters reach values close to the true ones; as expected, the exact values are hardly attainable because of the uncertainty on s and p (see footnote 6). The values of γs and γn are systematically under-estimated since their estimation involves second-order moments of s, p and o. However, these errors on the hyperparameters have a small effect on the quality of the sky components. For example, in comparison with a supervised reconstruction using the true hyperparameter values, the relative error between smooth components is only 2%.

6 The uncertainty of variables is reflected by the terms $\|\check{V}_o\|_1$, $H^t H : (\check{V}_s + \check{V}_p)$ and $(D_\alpha^t D_\alpha + D_\beta^t D_\beta) : \check{V}_s$.
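The relative error quoted in this section can be illustrated with a minimal Python sketch. It assumes the definition $e_r = \|s - \hat{s}\|_2^2 / \|s\|_2^2$ and represents each map as a flat list of pixel values; the function name `relative_error` and the toy data are hypothetical and serve only as an illustration.

```python
def relative_error(true_map, estimate):
    """Return ||s - s_hat||_2^2 / ||s||_2^2 for two flat sequences of pixels."""
    if len(true_map) != len(estimate):
        raise ValueError("maps must have the same number of pixels")
    # Squared Euclidean norm of the difference between the two maps.
    numerator = sum((s - e) ** 2 for s, e in zip(true_map, estimate))
    # Squared Euclidean norm of the reference (true) map.
    denominator = sum(s ** 2 for s in true_map)
    if denominator == 0:
        raise ValueError("the true map has zero energy")
    return numerator / denominator


# Toy example (made-up values): only the last pixel differs.
s_true = [1.0, 2.0, 3.0, 4.0]
s_hat = [1.0, 2.0, 3.0, 5.0]
error = relative_error(s_true, s_hat)
```

Multiplying the returned value by 100 gives a percentage comparable to the 4% and 2% figures reported in the text.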


Figure 10. Comparison of estimated map for I2 showing sources with shortest distances: (a) true, (b) co-add, (c) smooth only and (d) proposed method x̂.

Figure 11. Slice view through sources for two distances: (a) our method resolution corresponding to 2 pixels and (b) nominal resolution of the instrument corresponding to full-width at half-maximum of the PSF (herein 6 pixels).

Figure 8(a) shows the precision parameter ρ introduced by Student's t-distribution as a byproduct. Although it is not useful for the astrophysical analysis, presenting it helps to understand how this prior works. For the observed region, the precision takes high values, which forces the corresponding point source map p to zero. Around sources, the precision drops in a way that lets the point source escape to match the value given by the data. The map variances $\check{V}_p$ and $\check{V}_s$ are also a useful byproduct of the algorithm to help establish a confidence value. Nevertheless, these values depend more on the configuration of the measurement (coverage map $\mathrm{Diag}(H^t H)$) and on the model precision parameter. In a real application, one is more interested in an error estimate for each pixel. A variance map of the residual, $V_{res} = \frac{U^t |y - H\hat{x}|^2}{U^t \mathbf{1}}$, is shown in figure 8(b); it gives the error distribution over


Figure 12. Comparison between different map-making methods for test sky I3: (a) true sky, (b) co-addition map, (c) smooth only map maker and (d) proposed method x̂.

the map pixels, so pixels associated with a large error (outliers) can be rejected. As we can see, the variance map has a quasi-uniform distribution over the observed area, with undetermined values corresponding to non-observed pixels. One of the essential quantities studied by astrophysicists is the distribution of source fluxes (see footnote 7). Hence, it is very important to verify that our method conserves fluxes. Therefore, the position and the flux of each source are computed and compared to the true values. Figure 9 shows the agreement in source properties between the original and the restored sources. Although reconstructed sources tend to have higher intensities than the true value at the central pixels, their neighbours have lower values and the integral is conserved.

4.1.2. Stars only sky (I2). This sky is composed only of point sources. They are distributed in 12 groups of twin sources with the same flux (160), arranged so that the distance

7 The source flux $F_p$ is defined as the integral of its intensity over its support, i.e. $F_p = \int_{\mathrm{sup}(p)} p(\alpha, \beta)\,d\alpha\,d\beta$.


Figure 13. Comparison between point sources for test sky I3: (a) true sky and (b) proposed method p̂.

Figure 14. Comparison between smooth source for test sky I3: (a) true sky and (b) proposed method ŝ.

between each twin increases linearly from 1 pixel to 12 pixels. This test helps define the spatial resolution of our method (i.e. the smallest distance between two point sources detectable without confusion). The variance of the white noise used is $\rho_n^{-1} = 1.4 \times 10^{-3}$. The reconstruction results (figure 10) show that our method is able to resolve sources that are only 2 pixels apart. Meanwhile, co-addition and the smooth only method have confused these sources. We can see this clearly by taking a slice view through the centre of the sources (figure 11). For a distance of two pixels between the sources, the reconstructed profile has two distinct peaks


Figure 15. (a) Slice view comparison for test sky I3 at coordinates α = 510, β ∈ [400, 500] (given in pixel number); the legend distinguishes the true sky, the co-add, the smooth only and the smooth+point reconstructions. (b) Circular mean of power spectra for the cirrus part in the test sky I3. The black '+' line denotes the spectrum of the true cirrus. The blue line represents the spectrum obtained by co-addition of all the data. The green '×' line gives the spectrum of the co-added map using special data based on the cirrus only, without the point sources. The red line is the spectrum of the smooth part ŝ given by our method.

with the value between them dropping to less than half maximum while other methods give only one peak. Hence, the gain of resolution is at least three times compared to the nominal resolution of the instrument, which corresponds to full-width at half-maximum of the PSF (figure 11(b)).
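The resolution criterion used above (two distinct peaks, with the value between them dropping below half of the maximum) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slice is a plain list of intensities, and the helper names `local_maxima` and `is_resolved` are hypothetical, not part of the original method.

```python
def local_maxima(profile):
    """Indices of interior samples that are higher than their left neighbour
    and at least as high as their right neighbour."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] > profile[i - 1] and profile[i] >= profile[i + 1]]


def is_resolved(profile):
    """True if the slice shows two peaks separated by a dip below half maximum."""
    # Keep the two highest local maxima of the slice.
    peaks = sorted(local_maxima(profile), key=lambda i: profile[i], reverse=True)[:2]
    if len(peaks) < 2:
        return False
    left, right = sorted(peaks)
    # Lowest value found between (and including) the two peaks.
    valley = min(profile[left:right + 1])
    return valley < 0.5 * max(profile)


# Toy slices (made-up values): twin sources 2 pixels apart, and a blended blob.
twin_slice = [0.0, 10.0, 2.0, 10.0, 0.0]
blended_slice = [0.0, 6.0, 10.0, 6.0, 0.0]
```

With these toy slices, `is_resolved(twin_slice)` holds while `is_resolved(blended_slice)` does not, which mirrors the contrast between the proposed method and the co-added profile described in the text.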

4.1.3. Cirrus with stars (I3). The goal of this simulation is to study the quality of our method when the studied sky is not model generated. Therefore, the simulated sky is composed of a high-resolution image of a galactic cirrus superposed with six sources, whose fluxes are as follows:

Source   1       2       3      4       5       6
Flux     562.0   248.0   86.2   153.0   110.0   55.4

and the noise variance used is $\rho_n^{-1} = 1.67 \times 10^{-3}$, which gives SNRs = 20 dB and SNRp = 40 dB. Figure 12 shows the global reconstruction results, which confirm a good reconstruction quality compared to the real sky. Furthermore, the separation between the smooth and point sources performs well compared to the original maps (figures 14 and 13, respectively). This shows that the proposed Markovian model for the smooth part corresponds well to real astrophysical structures like the cirrus.

In this part, we take a closer look at the smooth part restoration quality and flux conservation. A cut across the cirrus (figure 15(a)) shows that our method is able to restore most of its spatial details, as does the smooth only reconstruction method. Meanwhile, the co-addition map fails to capture these details since it does not correct for the PSF.
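The flux conservation check can be illustrated with a short Python sketch. It assumes a discrete version of the flux definition of footnote 7 (sum of pixel intensities over a square window around the source centre, times the pixel area); the function name `source_flux`, the window-based support and the toy images are assumptions made for illustration only.

```python
def source_flux(image, centre, half_width, pixel_area=1.0):
    """Sum the intensities in a square window around `centre` (row, column)
    and multiply by the pixel area, as a discrete stand-in for the integral."""
    n_rows, n_cols = len(image), len(image[0])
    row0, col0 = centre
    total = 0.0
    # Clip the window to the image borders.
    for row in range(max(0, row0 - half_width), min(n_rows, row0 + half_width + 1)):
        for col in range(max(0, col0 - half_width), min(n_cols, col0 + half_width + 1)):
            total += image[row][col]
    return total * pixel_area


# Toy example (made-up values): the estimate is more peaked than the true
# source but its neighbours are lower, so the integrated flux is unchanged.
true_source = [[10.0, 20.0, 10.0],
               [20.0, 40.0, 20.0],
               [10.0, 20.0, 10.0]]
estimated_source = [[5.0, 15.0, 5.0],
                    [15.0, 80.0, 15.0],
                    [5.0, 15.0, 5.0]]
true_flux = source_flux(true_source, (1, 1), 1)
estimated_flux = source_flux(estimated_source, (1, 1), 1)
```

In this toy case both fluxes equal 160 although the central pixel of the estimate is twice as bright, which is the behaviour reported for the reconstructed point sources.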


Figure 16. Comparison between different map-making methods for test sky I4: (a) true sky, (b) co-addition map, (c) smooth only map maker and (d) proposed method x̂.

To better understand the limit of the spatial resolution gain for the smooth part, we study the circular mean of the power spectrum (CMPS; see footnote 8) of different maps (see figure 15(b)). All power spectra agree with the true value for spatial frequencies lower than $10^{-3}$ pixel$^{-1}$. However, for higher frequencies, the CMPS of the co-add map for the cirrus only drops because of the instrument PSF. Meanwhile, the power spectrum given by our method matches the true one over a wider frequency interval. This interval is generally determined by the frequency at which noise dominates the signal. The CMPS is a tool widely used by astrophysicists to study the statistical properties of the spatial structures of dust clouds. Nonetheless, the existence of galaxies and stars in the analysed map will affect the quality of the results. This can be seen from the CMPS for the co-added map with stars (figure 15(b)). Higher frequencies have higher power than the true value

8 $\mathrm{CMPS}(s) = f_s(r) = \int_0^{2\pi} |FT[s](r, \theta)|^2\,d\theta$, where $FT[s]$ is the two-dimensional Fourier transform expressed in polar coordinates.


Figure 17. Enlarged bottom right corner (magenta rectangle in figure 16(a)) for test sky I4 to study the effect of offset: (a) true sky, (b) co-addition map, (c) smooth only map maker and (d) proposed method x̂.

for the smooth part because of the existence of point sources. Hence the importance of our method for a proper study of dust properties.

4.1.4. Galaxy image (I4). In this last simulation, we study the capacity of our method to estimate detector offsets in the context of a realistic sky. We have chosen a real optical image of Messier galaxy containing a mixture of smooth and point sources. In addition, we introduced a random vector of offsets on the detectors in order to verify the quality of the offset estimation proposed by the method. The noise variance is set to $\rho_n^{-1} = 0.14$, leading to SNRs = 41 dB. A general comparison (figure 16) of the reconstruction results confirms the capacity of our method to restore high frequencies. Moreover, taking a closer look (figure 17), we observe a strip effect for the methods that do not account for offsets, since different detectors have


Figure 18. Offset estimation study for test sky I4: (a) real and estimated offsets comparison and (b) relative error.

Figure 19. Smooth and point images for test sky I4: (a) smooth part and (b) point part (magnified by a factor of 4).

different values when observing the same sky region. Nevertheless, this effect was corrected by the proposed method. Furthermore, comparing the estimated offsets with the true values (figure 18) confirms the quality of the estimation, with a maximum relative error below 2%. Moreover, the bright stars inside the galaxy were separated from the smooth structure, as shown in figure 19.
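The offset comparison can be summarised by a maximum relative error, sketched below in Python. The per-detector definition |estimate − true| / |true| is an assumption made here (the text does not spell it out), and the function name and toy offsets are hypothetical.

```python
def max_relative_error(true_offsets, estimated_offsets):
    """Largest |estimate - true| / |true| over all detectors."""
    if len(true_offsets) != len(estimated_offsets):
        raise ValueError("offset vectors must have the same length")
    if any(t == 0 for t in true_offsets):
        raise ValueError("relative error is undefined for a zero true offset")
    return max(abs(e - t) / abs(t)
               for t, e in zip(true_offsets, estimated_offsets))


# Toy example (made-up values): every estimate is within about 1% of the truth.
offsets_true = [1.0, -2.0, 4.0]
offsets_estimated = [1.01, -1.98, 4.04]
worst = max_relative_error(offsets_true, offsets_estimated)
```

A value of `worst` below 0.02 would correspond to the "below 2%" statement made for test sky I4.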


Figure 20. Comparison between estimated maps for a part of Polaris flare in short wavelength band (PSW) (colour scale is saturated to see details in co-added): (a) co-added map and (b) proposed method x̂.

4.2. Real data

Real data tests were performed using SPIRE data sent by Herschel. We study two fields: the Polaris flare, which is a nearby high Galactic latitude cirrus cloud presenting no sign of star-formation activity, and an extragalactic field ‘17p723’ (courtesy Hervé Dole). Data were pretreated by HIPE (level-1; see footnote 9). For the optical model, the simulated PSF proposed by Sibthorpe et al (2011) was used. We see in figures 20–23 that all the fuzzy point-like sources seen in the co-added images are identified as point sources by our method. Most of them are due to unresolved distant galaxies. However, the faintest ones can also be due to noise since, in real data, the noise has a 1/f component which is not properly taken into account in our model (see section 2.1), besides pointing and other model errors. For the Polaris flare, we also detect a few point sources which are associated with bright filamentary structures. They are due to an increase of the local gradient and do not necessarily correspond to real point-like objects (for instance pre-star forming cores) located within these structures. Finally, the comparison of the co-added and smooth maps obtained with our method illustrates the gain in angular resolution. The small scale structures in the smooth maps of both fields (left panels of figures 21 and 23) are expected to reveal confused unresolved galaxies, but the noise can also contribute. A detailed analysis is currently under way to quantify the gain in source detection and angular resolution. For the Polaris flare, the measured signal to noise ratios are SNRs = 7 dB and SNRp = 40 dB for the smooth and the point source images, respectively, which correspond to the levels set in the first simulation set (section 4.1.1). For the extragalactic field, the signal to noise ratio is lower (SNRs = −16 dB) since this field contains less extended emission than the Polaris flare.
9 Herschel Interactive Processing Environment (HIPE) is the official data processing tool for the Herschel telescope to correct effects not included in our model (electronics,...).

Figure 21. Point and smooth images for field Polaris flare in short wavelength band (PSW): (a) smooth source map ŝ and (b) point source map p̂.

Figure 22. Comparison between estimated maps for field ‘17p723’ in short wavelength band (PSW) (colour scale is saturated to see details in co-added): (a) co-added map and (b) proposed method x̂.

The execution time varies according to the field and pixel sizes. For a medium field (≈ 45 × 45 arcmin) like ‘17p723’ with a 0.1 × 0.1 arcmin pixel size, the reconstruction takes 20 min using Matlab® running on a personal computer with a 2.66 GHz Intel Core 2 Duo processor. Meanwhile, it may take up to 30 h for a huge field like the Polaris flare (≈ 240 × 240 arcmin).


Figure 23. Smooth and point images for field ‘17p723’: (a) smooth source map ŝ and (b) point source map p̂ (magnified by a factor of 4).

5. Conclusion

We have presented in this paper a new Bayesian approach for unsupervised super-resolution map-making. Furthermore, the method jointly performs a separation between smooth and point sources. This is achieved by choosing a Markovian prior for the smooth part and a heavy-tailed Student's t-distribution for point sources. Joint estimation is performed using a new gradient-like variational Bayesian method which approximates the true posterior distribution by a free-form separable distribution. The estimates are given in an iterative form. The performance of the proposed method is studied by means of several simulated and real datasets. The results show a clear improvement in the quality of estimation compared to conventional methods. Indeed, the spatial resolution was enhanced by a factor of 3. Furthermore, the power spectrum study confirms the resolution gain through the restoration of a wider range of frequencies. The conservation of photometry, which is very important for astrophysical studies, was also verified. A study of the resulting improvements in cosmological parameter estimation is in progress and will be published soon.

Nevertheless, several enhancements can be made to the method. For example, in the astrophysical context, additive noise is correlated, so a Markov chain model for the noise would be better suited. Moreover, the prior assigned to the smooth part supposes stationarity over the image space. This might be untrue for certain fields where we have small dust clouds over a large dark sky. Therefore, we are looking into non-stationary priors. Furthermore, for some applications, point source positions and intensities are required rather than their map. For this reason, we are considering a new prior which accounts directly for these parameters.

Acknowledgments

The authors are grateful to Hervé Dole for the real dataset and for the helpful discussion.


Appendix A. Optimal step values derivation

The gradient-like VBA introduces a new variable λ which regulates the update rate between the approximating posterior Q(·) and the joint distribution mean value, as given in equation (19). This step value can be calculated in an optimal way to guarantee fast convergence of the shaping parameters towards their final values. For this purpose, we apply Newton's method to the negative free-energy function, so the optimal step is written as

$$\lambda^{opt} = -\left.\frac{\nabla F(\lambda)}{\nabla^2 F(\lambda)}\right|_{\lambda=0}, \tag{A.1}$$

where $\nabla F = \partial F/\partial\lambda$ and $\nabla^2 F = \partial^2 F/\partial\lambda^2$. Before applying the Newton method, we rewrite the update equations for p and s with more convenient notations:

$$\check{V}_s^k = \left((1-\lambda_s)\check{V}_s^{-1} + \lambda_s R_s^{-1}\right)^{-1}, \tag{A.2}$$

$$\check{m}_s^k = \check{m}_s + \lambda_s \check{V}_s^k d_s, \tag{A.3}$$

$$\check{V}_p^k = \left((1-\lambda_p)\check{V}_p^{-1} + \lambda_p R_p^{-1}\right)^{-1}, \tag{A.4}$$

$$\check{m}_p^k = \check{m}_p + \lambda_p \check{V}_p^k d_p, \tag{A.5}$$

where

$$R_s = \mathrm{Diag}\left(\bar\rho_n H^t H + \bar\rho_s\left(D_\alpha^t D_\alpha + D_\beta^t D_\beta\right)\right), \tag{A.6}$$

$$d_s = \bar\rho_n H^t\left(y - H(\check{m}_p + \check{m}_s)\right) - \bar\rho_s\left(D_\alpha^t D_\alpha + D_\beta^t D_\beta\right)\check{m}_s, \tag{A.7}$$

$$R_p = \mathrm{Diag}\left(\bar\rho_n H^t H + \rho_p \check\gamma \check\phi^t\right), \tag{A.8}$$

$$d_p = \bar\rho_n H^t\left(y - H(\tilde{m}_s + \tilde{m}_p)\right) - \rho_p\, \tilde\gamma \circ \tilde\phi \circ \tilde{m}_p. \tag{A.9}$$

Then, we rewrite the negative free energy in two versions, as a function of λs and of λp, in order to use them to obtain the optimal values.

A.1. Smooth component optimal step value λs

First, we reduce the expression of the negative free energy to a function of the smooth part step λs, so it reads

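$$F_s(\lambda_s) = -\frac{\bar\rho_n}{2}\left(\left\|H(\check{m}_s^k + \check{m}_p^k)\right\|_2^2 - 2\check{m}_s^{k\,t} H^t y + H^t H : \check{V}_s^k\right) - \frac{\bar\rho_s}{2}\left(\left\|D_\alpha \check{m}_s^k\right\|_2^2 + \left\|D_\beta \check{m}_s^k\right\|_2^2\right) + 0.5\log\det(\check{V}_s^k). \tag{A.10}$$

Then, we calculate the first and the second derivatives, which yield the following:

$$\frac{\partial F_s(\lambda_s)}{\partial\lambda_s} = d_s^t \check{m}_s^{\prime k} + 0.5\sum_i\left(\frac{\check{v}_{s_i}^{\prime k}}{\check{v}_{s_i}^{k}} - r_{s_i}^{-1}\check{v}_{s_i}^{\prime k}\right), \tag{A.11}$$

$$\frac{\partial^2 F_s(\lambda_s)}{\partial\lambda_s^2} = \check{m}_s^{\prime\prime k\,t} d_s - \bar\rho_n \check{m}_s^{\prime k\,t} H^t H \check{m}_s^{\prime k} - \bar\rho_s \check{m}_s^{\prime k\,t}\left(D_\alpha^t D_\alpha + D_\beta^t D_\beta\right)\check{m}_s^{\prime k} + 0.5\sum_i\left(\frac{\check{v}_{s_i}^{\prime\prime k}\check{v}_{s_i}^{k} - \left(\check{v}_{s_i}^{\prime k}\right)^2}{\left(\check{v}_{s_i}^{k}\right)^2} - r_{s_i}^{-1}\check{v}_{s_i}^{\prime\prime k}\right), \tag{A.12}$$

where $v_i$ is diagonal element i of matrix V and

$$\check{m}_s^{\prime k} = \check{V}_s^k d_s + \lambda_s \check{V}_s^{\prime k} d_s, \tag{A.13}$$

$$\check{m}_s^{\prime\prime k} = 2\check{V}_s^{\prime k} d_s + \lambda_s \check{V}_s^{\prime\prime k} d_s, \tag{A.14}$$

$$\check{V}_s^{\prime k} = \left(\check{V}_s^{-1} - R_s^{-1}\right)\left(\check{V}_s^k\right)^2, \tag{A.15}$$

$$\check{V}_s^{\prime\prime k} = \left(\check{V}_s^{-1} - R_s^{-1}\right)^2\left(\check{V}_s^k\right)^3, \tag{A.16}$$

and the optimal value in this case becomes

$$\lambda_s^{opt} = \frac{d_s^t \check{m}_{s0}^{\prime} + 0.5\sum_i t_{s_i}^2}{\check{m}_{s0}^{\prime\prime\,t} d_s + \check{m}_{s0}^{\prime\,t}\left(\rho_n H^t H + \rho_s\left(D_\alpha^t D_\alpha + D_\beta^t D_\beta\right)\right)\check{m}_{s0}^{\prime} + 0.5\sum_i t_{s_i}^3} \tag{A.17}$$

with

$$t_{s_i} = r_{s_i}^{-1}\check{v}_{s0_i} - 1,\ \forall i, \qquad \check{m}_{s0}^{\prime} = \left.\frac{\partial \check{m}_s}{\partial\lambda_s}\right|_{\lambda=0} = \check{V}_{s0} d_s, \qquad \check{m}_{s0}^{\prime\prime} = \left.\frac{\partial^2 \check{m}_s}{\partial\lambda_s^2}\right|_{\lambda=0} = 2\check{V}_{s0}^{\prime} d_s, \tag{A.18}$$

$$\check{V}_{s0}^{\prime} = \left.\frac{\partial \check{V}_s}{\partial\lambda_s}\right|_{\lambda=0} = -\check{V}_{s0} T_s. \tag{A.19}$$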
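The element-wise behaviour of the updates (A.2)–(A.3) and of the derivative in (A.19) can be illustrated with a minimal Python sketch. It assumes diagonal covariances stored as lists and reads (A.2) as $\check{V}^k = ((1-\lambda)\check{V}^{-1} + \lambda R^{-1})^{-1}$; the function names and numerical values are hypothetical and are not taken from the original implementation.

```python
def gradient_like_update(m, v, r, d, lam):
    """Element-wise form of (A.2)-(A.3) for diagonal covariances:
    v_k = 1 / ((1 - lam) / v + lam / r)  and  m_k = m + lam * v_k * d."""
    v_k = [1.0 / ((1.0 - lam) / vi + lam / ri) for vi, ri in zip(v, r)]
    m_k = [mi + lam * vki * di for mi, vki, di in zip(m, v_k, d)]
    return m_k, v_k


def variance_slope_at_zero(v, r):
    """Return t_i = v_i / r_i - 1 and the slope dv_i/dlam at lam = 0,
    which equals -v_i * t_i for the update above (compare with (A.19))."""
    t = [vi / ri - 1.0 for vi, ri in zip(v, r)]
    slope = [-vi * ti for vi, ti in zip(v, t)]
    return t, slope


# Toy values (made up): lam = 0 leaves the current estimate unchanged,
# lam = 1 jumps to the variance r and shifts the mean by r * d.
m0, v0, r0, d0 = [1.0, -1.0], [4.0, 0.25], [0.5, 2.0], [0.2, 0.4]
unchanged = gradient_like_update(m0, v0, r0, d0, 0.0)
full_step = gradient_like_update(m0, v0, r0, d0, 1.0)
```

The step λ therefore interpolates, in the precision domain, between the previous approximation and the fully updated one, which is why a well-chosen λ controls the convergence speed.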

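A.2. Point source part optimal step λp

As for the smooth component, we reduce the negative free energy to a function of λp, $F_p(\lambda_p)$. The new function reads

$$F_p(\lambda_p) = -\frac{\bar\rho_n}{2}\left(\left\|H(\check{m}_s^k + \check{m}_p^k)\right\|_2^2 - 2\check{m}_p^{k\,t} H^t y + H^t H : \check{V}_p^k\right) - \frac{\rho_p}{2}\sum_i \check\gamma_i\check\phi_i\left(\left(\check{m}_{p_i}^{k}\right)^2 + \check{v}_{p_i}^{k}\right) + 0.5\log(\det(\check{V}_p^k)). \tag{A.20}$$

After arranging the terms including the variance $\check{V}_p$, we find

$$F_p(\lambda_p) = -\frac{\bar\rho_n}{2}\left(\left\|H(\check{m}_s^k + \check{m}_p^k)\right\|_2^2 - 2\check{m}_p^{k\,t} H^t y\right) - \frac{\rho_p}{2}\left(\check{m}_p^{k}\right)^{2\,t}\left(\check\gamma\circ\check\phi\right) - \frac{\rho_n H^t H + \rho_p\check\gamma\check\phi^t}{2} : \check{V}_p^k + 0.5\log\det\check{V}_p^k \tag{A.21}$$

$$= -\frac{\bar\rho_n}{2}\left(\left\|H(\check{m}_s^k + \check{m}_p^k)\right\|_2^2 - 2\check{m}_p^{k\,t} H^t y\right) - \frac{\rho_p}{2}\left(\check{m}_p^{k}\right)^{2\,t}\left(\check\gamma\circ\check\phi\right) + 0.5\sum_i\left(\log\check{v}_{p_i}^{k} - r_{p_i}^{-1}\check{v}_{p_i}^{k}\right). \tag{A.22}$$

Calculating the first and the second derivatives of this function gives the following expressions:

$$\frac{\partial F_p(\lambda_p)}{\partial\lambda_p} = \check{m}_p^{\prime k\,t} d_p^k + 0.5\sum_i\left(\frac{\check{v}_{p_i}^{\prime k}}{\check{v}_{p_i}^{k}} - r_{p_i}^{-1}\check{v}_{p_i}^{\prime k}\right), \tag{A.23}$$

$$\frac{\partial^2 F_p(\lambda_p)}{\partial\lambda_p^2} = \check{m}_p^{\prime\prime k\,t} d_p^k - \bar\rho_n\check{m}_p^{\prime k\,t} H^t H\check{m}_p^{\prime k} - \rho_p\left(\check{m}_p^{\prime k}\right)^{2\,t}\left(\check\gamma\circ\check\phi\right) + 0.5\sum_i\left(\frac{\check{v}_{p_i}^{\prime\prime k}\check{v}_{p_i}^{k} - \left(\check{v}_{p_i}^{\prime k}\right)^2}{\left(\check{v}_{p_i}^{k}\right)^2} - r_{p_i}^{-1}\check{v}_{p_i}^{\prime\prime k}\right), \tag{A.24}$$

where

$$\check{m}_p^{\prime k} = \check{V}_p^k d_p + \lambda_p\check{V}_p^{\prime k} d_p, \tag{A.25}$$

$$\check{m}_p^{\prime\prime k} = 2\check{V}_p^{\prime k} d_p + \lambda_p\check{V}_p^{\prime\prime k} d_p, \tag{A.26}$$

$$\check{V}_p^{\prime k} = \left(\check{V}_p^{-1} - R_p^{-1}\right)\left(\check{V}_p^k\right)^2, \tag{A.27}$$

$$\check{V}_p^{\prime\prime k} = \left(\check{V}_p^{-1} - R_p^{-1}\right)^2\left(\check{V}_p^k\right)^3, \tag{A.28}$$

and the optimal value according to Newton's method reads

$$\lambda_p^{opt} = \frac{\check{m}_{p0}^{\prime\,t} d_p + 0.5\sum_i t_{p_i}^2}{\check{m}_{p0}^{\prime\prime\,t} d_p + \rho_n\check{m}_{p0}^{\prime\,t} H^t H\check{m}_{p0}^{\prime} + \rho_p\left(\check{m}_{p0}^{\prime}\right)^{2\,t}\left(\check\gamma\circ\check\phi\right) + 0.5\sum_i t_{p_i}^3}, \tag{A.29}$$

with

$$t_{p_i} = r_{p_i}^{-1}\check{v}_{p_i} - 1,\ \forall i, \qquad \check{m}_{p0}^{\prime} = \left.\frac{\partial\check{m}_p}{\partial\lambda_p}\right|_{\lambda_p=0} = \check{V}_{p0} d_p, \tag{A.30}$$

$$\check{m}_{p0}^{\prime\prime} = \left.\frac{\partial^2\check{m}_p}{\partial\lambda_p^2}\right|_{\lambda_p=0} = 2\check{V}_{p0}^{\prime} d_p, \qquad \check{V}_{p0}^{\prime} = \left.\frac{\partial\check{V}_p}{\partial\lambda_p}\right|_{\lambda_p=0} = -\check{V}_{p0} T_p. \tag{A.31}$$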
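The Newton step of equation (A.1), $\lambda^{opt} = -\nabla F(0)/\nabla^2 F(0)$, can be illustrated numerically with a small Python sketch. It replaces the closed-form derivatives of this appendix with central finite differences, which is a substitution made here purely for illustration; the function names and the quadratic toy objective are hypothetical.

```python
def newton_step_at_zero(objective, h=1e-3):
    """Approximate -F'(0) / F''(0) with central finite differences of width h."""
    first = (objective(h) - objective(-h)) / (2.0 * h)
    second = (objective(h) - 2.0 * objective(0.0) + objective(-h)) / (h * h)
    if second == 0.0:
        raise ZeroDivisionError("zero curvature at lambda = 0")
    return -first / second


def toy_free_energy(lam):
    """Made-up concave objective whose maximum is located at lam = 0.3."""
    return -(lam - 0.3) ** 2


lam_opt = newton_step_at_zero(toy_free_energy)
```

For a quadratic objective a single Newton step lands on the stationary point, so `lam_opt` is close to 0.3 here; for the actual free energy the step is only locally optimal around λ = 0.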

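Appendix B. Approximating marginals' derivation

In this appendix, the details of the approximation of the posterior marginals are provided. The key equation used to obtain these forms is equation (19). First, the logarithm of the distribution at iteration k is calculated as a function of the distributions at iteration k − 1 and the other model parameters (for readability, we only denote parameters at iteration k with a superscript 'k'). Then, the shaping parameters are obtained by identification.

Q(s_i):

$$\log(Q^k(s_i)) \propto (1-\lambda_s)\log(Q(s_i)) + \lambda_s\left\langle \log P(y,u)\right\rangle_{\prod_{j\neq i} Q(s_j)\,Q(u_{/s})}$$

$$\propto (1-\lambda_s)\left(\check{v}_{s_i} s_i^2 - 2\check{m}_{s_i} s_i\right) + \lambda_s\Bigg[\sum_j\left(\bar\rho_n h_{ji}^2 + \bar\rho_s\left(d_{\alpha ji}^2 + d_{\beta ji}^2\right)\right)s_i^2 - 2 s_i\Bigg(\bar\rho_n\sum_j h_{ji}\Big(y_j - \check{m}_{o_j} - \sum_{l\neq i} h_{jl}\check{m}_{x_l} + h_{ji}\check{m}_{p_i}\Big) + \bar\rho_s\sum_{j,\,l\neq i}\left(d_{\alpha ji}d_{\alpha jl} + d_{\beta ji}d_{\beta jl}\right)\check{m}_{s_l}\Bigg)\Bigg] \tag{B.1}$$

with $Q(u_{/s}) = Q(p)Q(\rho)Q(o)Q(\rho_n)Q(\rho_s)$ and $\check{m}_x = \check{m}_s + \check{m}_p$. By identification, we find

$$\check{v}_{s_i}^k = \left((1-\lambda_s)\check{v}_{s_i} + \lambda_s\sum_j\left(\bar\rho_n h_{ji}^2 + \bar\rho_s\left(d_{\alpha ji}^2 + d_{\beta ji}^2\right)\right)\right)^{-1}, \tag{B.2}$$

$$\frac{\check{m}_{s_i}^k}{\check{v}_{s_i}^k} = (1-\lambda_s)\check{m}_{s_i} + \lambda_s\left(\bar\rho_n\sum_j h_{ji}\Big(\tilde{y}_j - \sum_{l\neq i} h_{jl}\check{m}_{x_l} + h_{ji}\check{m}_{p_i}\Big) + \bar\rho_s\sum_{j,\,l\neq i}\left(d_{\alpha ji}d_{\alpha jl} + d_{\beta ji}d_{\beta jl}\right)\check{m}_{s_l}\right). \tag{B.3}$$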

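By adding the missing $\check{m}_{s_i}$ to the right-hand side of the previous equation and putting it in a vectorial form, we obtain the final form given in equations (21) and (22).

Q(p_i):

$$\log Q^k(p_i) \propto (1-\lambda_p)\log(Q(p_i)) + \lambda_p\left\langle \log P(y,u)\right\rangle_{\prod_{j\neq i} Q(p_j)\,Q(u_{/p})}$$

$$\propto (1-\lambda_p)\left(\check{v}_{p_i} p_i^2 - 2\check{m}_{p_i} p_i\right) + \lambda_p\Bigg[\Big(\bar\rho_n\sum_j h_{ji}^2 + \rho_p\check\gamma_i\check\phi_i\Big)p_i^2 - 2\Bigg(\bar\rho_n\sum_j h_{ji}\Big(y_j - \check{m}_{o_j} - \sum_{l\neq i} h_{jl}\check{m}_{x_l} + h_{ji}\check{m}_{s_i}\Big) + \rho_p\check\gamma_i\check\phi_i\Bigg)p_i\Bigg]. \tag{B.4}$$

By identification, we find

$$\check{v}_{p_i}^k = \left((1-\lambda_p)\check{v}_{p_i} + \lambda_p\Big(\bar\rho_n\sum_j h_{ji}^2 + \rho_p\check\gamma_i\check\phi_i\Big)\right)^{-1}, \tag{B.5}$$

$$\frac{\check{m}_{p_i}^k}{\check{v}_{p_i}^k} = (1-\lambda_p)\check{m}_{p_i} + \lambda_p\left(\bar\rho_n\sum_j h_{ji}\Big(\tilde{y}_j - \sum_{l\neq i} h_{jl}\check{m}_{x_l} + h_{ji}\check{m}_{s_i}\Big) + \rho_p\check\phi_i\check\gamma_i\right). \tag{B.6}$$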

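By adding the missing $\check{m}_{p_i}$ to the right-hand side of the previous equation and putting it in a vectorial form, we obtain the final form given in equations (23) and (24).

Q(ρ_i):

$$\log Q^k(\rho_i) \propto (1-\lambda_\rho)\log(Q(\rho_i)) + \lambda_\rho\left\langle \log P(y,u)\right\rangle_{\prod_{j\neq i} Q(\rho_j)\,Q(u_{/\rho})}$$

$$\propto (1-\lambda_\rho)\left(-\frac{\rho_i}{\check\gamma_i} + (\check\phi_i - 1)\log(\rho_i)\right) + \lambda_\rho\left(-\frac{\rho_p}{2}\left\langle p_i^2\right\rangle\rho_i + \frac{\nu-1}{2}\log(\rho_i) - \frac{\nu\rho_i}{2}\right). \tag{B.7}$$

By identification, we find

$$\check\phi_i^k = (1-\lambda_\rho)\check\phi_i + \lambda_\rho\frac{\nu+1}{2}, \tag{B.8}$$

$$\check\gamma_i^k = \left[\frac{(1-\lambda_\rho)}{\check\gamma_i} + \lambda_\rho\frac{\nu + \rho_p\left(\check{m}_{p_i}^2 + \check{v}_{p_i}\right)}{2}\right]^{-1}. \tag{B.9}$$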
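The updates (B.8)–(B.9) for one pixel can be sketched in Python as follows. The sketch assumes the shape/scale reading of the Gamma density suggested by (B.7), under which the mean precision is the product φγ; the function name, argument names and numerical values are hypothetical.

```python
def update_student_precision(phi, gamma, m_p, v_p, nu, rho_p, lam):
    """One gradient-like update of the Gamma shaping parameters of rho_i:
    phi_k   = (1 - lam) * phi + lam * (nu + 1) / 2
    gamma_k = 1 / ((1 - lam) / gamma + lam * (nu + rho_p * (m_p**2 + v_p)) / 2)."""
    phi_k = (1.0 - lam) * phi + lam * (nu + 1.0) / 2.0
    second_moment = m_p ** 2 + v_p  # <p_i^2> under the Gaussian approximation
    gamma_k = 1.0 / ((1.0 - lam) / gamma + lam * (nu + rho_p * second_moment) / 2.0)
    return phi_k, gamma_k


# Toy comparison (made-up values) between an empty pixel and a bright pixel,
# using a full step lam = 1 from the same starting point.
phi_bg, gamma_bg = update_student_precision(1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0)
phi_src, gamma_src = update_student_precision(1.0, 1.0, 3.0, 0.0, 1.0, 1.0, 1.0)
mean_precision_background = phi_bg * gamma_bg
mean_precision_source = phi_src * gamma_src
```

In this toy case the mean precision is 2 on the empty pixel and 0.2 on the bright one, which mirrors the behaviour described for figure 8(a): the precision stays high where the point source map is zero and drops around sources.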

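Q(ρ_n):

$$\log Q^k(\rho_n) \propto (1-\lambda_{\rho_n})\log(Q(\rho_n)) + \lambda_{\rho_n}\left\langle \log P(y,u)\right\rangle_{Q(u_{/\rho_n})}$$

$$\propto (1-\lambda_{\rho_n})\left(-\frac{\rho_n}{\check\gamma_n} + (\check\phi_n - 1)\log(\rho_n)\right) + \lambda_{\rho_n}\left(-\frac{\left\langle\|y - o - H(s+p)\|_2^2\right\rangle}{2}\rho_n + \left(\frac{N_y}{2} + \phi_n - 1\right)\log(\rho_n) - \frac{\rho_n}{\gamma_n}\right). \tag{B.10}$$

By identification, we find

$$\check\phi_n^k = (1-\lambda_{\rho_n})\check\phi_n + \lambda_{\rho_n}\left(\frac{N_y}{2} + \phi_n\right), \tag{B.11}$$

$$\check\gamma_n^k = \left[\frac{(1-\lambda_{\rho_n})}{\check\gamma_n} + \lambda_{\rho_n}\left(\gamma_n^{-1} + \frac{\left\|\tilde{y} - H(\check{m}_s + \check{m}_p)\right\|_2^2 + \sum_i\check{v}_{o_i} + \sum_{i,j} h_{ji}^2\left(\check{v}_{s_i} + \check{v}_{p_i}\right)}{2}\right)\right]^{-1}. \tag{B.12}$$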

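Q(ρ_s):

$$\log(Q^k(\rho_s)) \propto (1-\lambda_{\rho_s})\log(Q(\rho_s)) + \lambda_{\rho_s}\left\langle \log P(y,u)\right\rangle_{Q(u_{/\rho_s})}$$

$$\propto (1-\lambda_{\rho_s})\left(-\frac{\rho_s}{\check\gamma_s} + (\check\phi_s - 1)\log(\rho_s)\right) + \lambda_{\rho_s}\left(-\frac{\left\langle\|D_\alpha s\|_2^2 + \|D_\beta s\|_2^2\right\rangle}{2}\rho_s + \left(\frac{N_s}{2} + \phi_s - 1\right)\log(\rho_s) - \frac{\rho_s}{\gamma_s}\right). \tag{B.13}$$

By identification, we find

$$\check\phi_s^k = (1-\lambda_{\rho_s})\check\phi_s + \lambda_{\rho_s}\left(\frac{N_s}{2} + \phi_s\right), \tag{B.14}$$

$$\check\gamma_s^k = \left[\frac{(1-\lambda_{\rho_s})}{\check\gamma_s} + \lambda_{\rho_s}\left(\gamma_s^{-1} + \frac{\|D_\alpha\check{m}_s\|_2^2 + \|D_\beta\check{m}_s\|_2^2 + \sum_{i,j}\left(d_{\alpha ji}^2 + d_{\beta ji}^2\right)\check{v}_{s_i}}{2}\right)\right]^{-1}. \tag{B.15}$$

Q(o_i):

$$\log(Q^k(o_i)) \propto (1-\lambda_o)\log(Q(o_i)) + \lambda_o\left\langle \log P(y,u)\right\rangle_{\prod_{j\neq i} Q(o_j)\,Q(u_{/o})}$$

$$\propto (1-\lambda_o)\left(\check{v}_{o_i} o_i^2 - 2\check{m}_{o_i} o_i\right) + \lambda_o\left[(N_d\bar\rho_n + \rho_o)o_i^2 - 2 o_i\left(\rho_o m_o + \bar\rho_n\sum_{j}\Big(y_j - \sum_l h_{jl}\left(\check{m}_{s_l} + \check{m}_{p_l}\right)\Big)\right)\right], \tag{B.16}$$

where the sum over j runs over the set of observation positions for which the detector offset $o_i$ is considered constant (the whole scan, a scan leg, ...) and $N_d$ is the dimension of this set. By identification, we find

$$\check{v}_{o_i}^k = \left((1-\lambda_o)\check{v}_{o_i} + \lambda_o(N_d\bar\rho_n + \rho_o)\right)^{-1}, \tag{B.17}$$

$$\frac{\check{m}_{o_i}^k}{\check{v}_{o_i}^k} = (1-\lambda_o)\check{m}_{o_i} + \lambda_o\left(\rho_o m_o + \bar\rho_n\sum_{j}\Big(y_j - \sum_l h_{jl}\left(\check{m}_{s_l} + \check{m}_{p_l}\right)\Big)\right). \tag{B.18}$$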

References

Bobin J, Starck J-L, Fadili J, Moudden Y and Donoho D 2007 Morphological component analysis: an adaptive thresholding strategy IEEE Trans. Image Process. 16 2675–81

Borison S L, Bowling S B and Cuomo K M 1992 Super-resolution methods for wideband radar Linc. Lab. J. 5 441–61 (http://adsabs.harvard.edu/abs/1992LLabJ...5..441B)

Cantalupo C, Borrill J, Jaffe A, Kisner T and Stompor R 2010 Madmap: a massively parallel maximum likelihood cosmic microwave background map-maker Astrophys. J. Suppl. Ser. 187 212

Chantas G, Galatsanos N, Likas A and Saunders M 2008 Variational Bayesian image restoration based on a product of t-distributions image prior IEEE Trans. Image Process. 17 1795–805

Chen S, Donoho D and Saunders M 1999 Atomic decomposition by basis pursuit SIAM J. Sci. Comput. 20 33–61

De Mol C and Defrise M 2004 Inverse imaging with mixed penalties Proc. Int. Symp. on Electromagnetic Theory (Pisa, Italy) pp 798–800

Fraysse A and Rodet T 2011 A gradient-like variational Bayesian algorithm IEEE Statistical Signal Processing Workshop pp 605–8

Fraysse A and Rodet T 2012 A measure-theoretic variational Bayesian algorithm for large dimensional problems Technical Report hal-00702259 Laboratoire des Signaux et Systèmes UMR 8506 (http://hal-archives-ouvertes.fr/docs/00/70/22/59/PDF/var_bayV8.pdf)

Giovannelli J F and Coulais A 2005 Positive deconvolution for superimposed extended source and point sources Astron. Astrophys. 439 401–12

Gorokhov A and Loubaton P 1997 Subspace-based techniques for blind separation of convolutive mixtures with temporally correlated sources IEEE Trans. Circuits Syst. I 44 813–20

Goussard Y and Demoment G 1989 Recursive deconvolution of Bernoulli–Gaussian processes using a MA representation IEEE Trans. Geosci. Remote Sens. 27 384–94

Hinton G E and van Camp D 1993 Keeping the neural networks simple by minimizing the description length of the weights Proc. Sixth Annu. Conf. on Computational Learning Theory (New York: ACM) pp 5–13

Högbom J 1974 Aperture synthesis with a non-regular distribution of interferometer baselines Astron. Astrophys. Suppl. 15 417

Kasetkasem T, Arora M K and Varshney P K 2005 Super-resolution land cover mapping using a Markov random field based approach Remote Sens. Environ. 96 302–14

Kowalski M and Torrésani B 2008 Random models for sparse signals expansion on unions of bases with application to audio signals IEEE Trans. Signal Process. 56 3468–81

Mohammad-Djafari A 2009 Super-resolution: a short review, a new method based on hidden Markov modeling of HR image and future challenges Comput. J. 52 126–41

Moussaoui S, Brie D, Mohammad-Djafari A and Carteret C 2006 Separation of non-negative mixture of non-negative sources using a Bayesian approach and MCMC sampling IEEE Trans. Signal Process. 54 4133–45

Murtagh F and Starck J 2006 Astronomical Image and Data Analysis (Berlin: Springer)

Nuzillard D and Bijaoui A 2000 Blind source separation and analysis of multispectral astronomical images Astron. Astrophys. Suppl. 147 129–38

Orieux F, Giovannelli J, Rodet T, Abergel A, Ayasso H and Husson M 2012 Super-resolution in map-making based on a physical instrument model and regularized inversion. Application to SPIRE/Herschel Astron. Astrophys. 539 16

Park S C, Park M K and Kang M G 2003 Super-resolution image reconstruction: a technical overview IEEE Signal Process. Mag. 20 21–36

Patanchon G et al 2008 Sanepic: a map-making method for time stream data from large arrays Astrophys. J. 681 708

Pilbratt G L et al 2010 Herschel Space Observatory: an ESA facility for far-infrared and submillimetre astronomy Astron. Astrophys. 518 L1

Rabaste O and Chonavel T 2007 Estimation of multipath channels with long impulse response at low SNR via an MCMC method IEEE Trans. Signal Process. 55 1312–25

Robert C and Casella G 2004 Monte Carlo Statistical Methods (Berlin: Springer)

Robinson M D, Chiu S J and Lo J Y 2010 Novel applications of super-resolution in medical imaging Super-Resolution Imaging (Boca Raton, FL: CRC Press) pp 384–412

Rodet T and Zheng Y 2009 Approche Bayésienne variationnelle: application à la déconvolution conjointe d'une source ponctuelle dans une source étendue Proc. 22nd Conf. GRETSI (Signal Processing and Image) (Dijon, France, 8–11 Sep.) (http://hal.archives-ouvertes.fr/docs/00/44/39/80/PDF/Article.pdf)

Sibthorpe B, Ferlet M, Bendo G, Papageorgiou A and SPIRE ICC 2011 Spire Beam Model Release Note version 1.1

Šmídl V and Quinn A 2006 The Variational Bayes Method in Signal Processing (Berlin: Springer)

Snoussi H and Mohammad-Djafari A 2004 Bayesian unsupervised learning for source separation with mixture of Gaussians prior J. VLSI Signal Process. Syst. 37 263–79

Wainwright M and Simoncelli E 2000 Scale mixtures of Gaussians and the statistics of natural images Adv. Neural Inf. Proc. Syst. 12 855–61

Willett R, Jermyn I, Nowak R and Zerubia J 2004 Wavelet-based super-resolution in astronomy Astronomical Data Analysis Software and Systems (ADASS) XIII vol 314 (Astronomical Society of the Pacific) p 107

Wilson T and Hewlett S J 1991 Super-resolution in confocal scanning microscopy Opt. Lett. 16 1062–4

Zibulevsky M and Pearlmutter B A 2001 Blind source separation by sparse decomposition in a signal dictionary Neural Comput. 13 863–82
