IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 5, MAY 2007


Nonorthogonal Joint Diagonalization/Zero Diagonalization for Source Separation Based on Time-Frequency Distributions
El Mostafa Fadaili, Nadège Thirion-Moreau, and Eric Moreau, Member, IEEE

Abstract—This paper deals with the blind separation of instantaneous mixtures of source signals using time-frequency distributions (TFDs). We propose iterative algorithms to perform the nonorthogonal zero diagonalization and/or joint diagonalization of given sets of matrices. As an application, we show that source separation can be realized by applying one of these algorithms to a set of spatial quadratic TFD matrices corresponding only to the so-called cross-source terms and/or to the so-called auto-source terms. The determination of the above matrices to be jointly decomposed first requires an automatic selection procedure of useful time-frequency points. Regarding this last point, we also propose a new selection procedure and a modification of an existing one, and we provide a comparison with other existing ones. The main advantage of the nonorthogonal joint diagonalization and/or zero-diagonalization algorithms is that they do not require (in the blind source separation context) a prewhitening stage, which allows them to work even with a class of correlated signals and generally provides improved separation performance. Finally, an analytical example and computer simulations are provided in order to illustrate the effectiveness of the proposed approach and to compare it with classical ones.

Index Terms—Automatic time-frequency points detection, blind source separation, joint diagonalization, joint zero diagonalization, quadratic time-frequency distributions.

I. INTRODUCTION

We consider the blind separation of instantaneous mixtures of signals called sources (see, e.g., [1]–[4]). This problem has found numerous solutions in the past 20 years. However, more recently, a growing interest in solutions based on the use of time-frequency distributions (TFDs) could be noted [5]–[8], [16]–[18], and [20]–[23]. One of the main reasons is that they constitute an alternative to other existing methods, making it possible to consider a wider class of source signals than the classical one (statistically independent random source signals). Notice, however, that other approaches have been suggested to handle the case of nonstationary signals (see, e.g., [24]). One can distinguish between two main classes of TFDs: linear and quadratic. The use of quadratic TFDs has led to useful algorithms based on the joint diagonalization (JD) and/or the joint zero diagonalization (JZD) of some particular sets of matrices (see, e.g., [6] and [7]).

Manuscript received December 5, 2005; revised June 8, 2006. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Ercan E. Kuruoglu. The authors are with the STD, ISITV, Université du Sud Toulon, av. G. Pompidou, BP56, F-83162 La Valette du Var, Cedex, France (e-mail: [email protected]; [email protected]; [email protected]). Digital Object Identifier 10.1109/TSP.2006.889469

One of the first approaches to have emerged [5] proposes to jointly diagonalize a set of spatial quadratic TFD matrices calculated at some specific time-frequency (t-f) points: the ones that correspond only to auto-source terms (there are no cross-source terms at such t-f points). However, a prewhitening stage is required. As shown in [9] and [15], such a prewhitening stage imposes a limit on the attainable performance in the context of blind source separation (BSS). Later, in [8], [21], and [22], it was shown that this prewhitening stage can be eliminated, leading to various advantages, among which the fact that better performance is generally obtained and that the separation of "correlated" source signals can be considered. Let us notice that another, purely algebraic, approach has also been developed in [23]. It does not require a prewhitening stage either and, with the help of additional assumptions on the sources, it makes it possible to treat the underdetermined case, i.e., fewer sensors than sources. At the same time, a complementary approach was developed in [6], consisting of jointly zero-diagonalizing another set of spatial quadratic TFD matrices calculated, again, at some specific points. This time, however, these t-f points correspond only to cross-source terms. Again, a prewhitening stage is required. Finally, a solution that combines JD and JZD was also proposed [7]. This paper has three main objectives. First, we intend to generalize the joint zero-diagonalization approach to the case of nonorthogonal matrices. The resulting algorithm is based on the optimization of a quadratic criterion. Second, we also provide a solution to combine nonorthogonal JD and nonorthogonal JZD. Finally, we illustrate the usefulness of these algorithms by showing that they find applications in BSS based on spatial quadratic TFDs (SQTFDs).
We compare different automatic selection procedures of particular t-f points (those corresponding to auto-source terms only and/or cross-source terms only). In fact, most of the methods based on the use of SQTFDs rely on this preliminary stage. Let us remark that several works are at least partially dedicated to the problem of the automatic selection of auto-source terms (see, e.g., [6], [11], [20]) or to the problem of the automatic selection of one single auto-source term (see, e.g., [18], [22], and [23]). In comparison, less literature is devoted to the problem of cross-source terms selection (see, e.g., [6], [10], [20], and [21]). The paper is organized as follows. The basic notations and definitions for SQTFDs are given in Section II. Section III deals with the model and the related assumptions. In Section IV, we present the algebraic derivations leading to the proposed nonorthogonal JZD algorithm. We also explain how to combine the nonorthogonal JZD with the nonorthogonal JD. In Section V, the proposed algorithms are applied to the source

1053-587X/$25.00 © 2007 IEEE

separation problem. Practical procedures used to determine the useful t-f points are then presented; a new one is introduced and an existing one is modified. In Section VI, the performance of the proposed methods is evaluated and compared with some other existing methods by computer simulations. Finally, in Section VII, conclusions are given.

Now, let us give four important examples. First, let us recall the expression of the spatial pseudo-Wigner (SPW) transformation:

D_x^{PW}(t, ν) = ∫ h(τ) x(t + τ/2) x^H(t − τ/2) e^{−2iπντ} dτ   (4)

II. A REVIEW OF SPATIAL QUADRATIC TIME-FREQUENCY DISTRIBUTIONS

Since the proposed developments are based on the use of spatial quadratic transforms (SQTs) of signals and their properties [5], [12], [13], [19], [26], let us now briefly recall the points important for our developments. Considering an (m,1) complex deterministic vectorial signal x(t), the SQT is given by an (m,m) matrix written as

D_x(t, ν) = ∫∫ K(u − t, τ) x(u + τ/2) x^H(u − τ/2) e^{−2iπντ} du dτ   (1)

where (·)^H denotes the complex conjugate transpose operator. This matrix is defined componentwise by

D_{x_i x_j}(t, ν) = ∫∫ K(u − t, τ) x_i(u + τ/2) x_j^*(u − τ/2) e^{−2iπντ} du dτ   (2)

for all i, j ∈ {1, …, m}, where (·)^* stands for the complex conjugate operator. The diagonal terms of the SQT are called by the generic name of auto-terms, while the off-diagonal ones are called by the generic name of cross-terms. The function K(u, τ), which is generally a complex function, is referred to as the kernel of the transform. With regard to the source separation application presented in Sections V and VI (and more precisely the t-f points selection procedure), we are interested in SQTs satisfying the following Hermitian symmetry:

D_x(t, ν) = D_x^H(t, ν)   (3)

which implies, in particular, that the diagonal components of the considered SQT are real. This property is reflected as a constraint on the kernel, which has to satisfy K(u, −τ) = K^*(u, τ). In spite of this constraint, many known transformations remain available, such as the Wigner(–Ville) transformation (W and WV), the Choï–Williams transformation, the pseudo-Wigner(–Ville) transformation (PW and PWV), the smoothed pseudo-Wigner(–Ville) transformation (provided conditions on the windows [13]), the Born–Jordan distribution, etc. The auto-terms correspond to the same quadratic transform associated with different scalar deterministic signals. This quadratic transform is termed energetic if its double integral over t and ν is equal to the energy of the considered signal, i.e., for a scalar signal x(t), ∫∫ D_x(t, ν) dt dν = ∫ |x(t)|² dt. Such energetic transforms form the basis of quadratic time-frequency distributions (QTFDs). Commonly, additional properties of invariance are imposed.
Hence, Cohen's class is the class of energetic TFDs covariant under time and frequency shifts, whereas the affine class is the class of energetic TFDs covariant under time scalings and shifts.

where h(τ) is any smoothing window satisfying h(−τ) = h^*(τ) and h(0) = 1. Notice that the spatial Wigner transformation (SW), denoted D_x^W(t, ν), is directly obtained by considering h(τ) = 1 for all τ. The spatial Wigner–Ville transformation (SWV), denoted D_x^{WV}(t, ν), is the spatial Wigner transformation applied to the (complex) analytic signal z(t) = x(t) + iH{x(t)}, where H{·} represents the Hilbert transform. The spatial pseudo-Wigner–Ville transformation (SPWV), denoted D_x^{PWV}(t, ν), is the spatial pseudo-Wigner transformation applied to the analytic signal z(t). Most simulations in Section VI are performed using the latter transformation.
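As an illustration of these definitions, a minimal discrete-time sketch of a spatial pseudo-Wigner matrix at a single t-f point is given below. The function name, the discrete convention x(t + τ) x^H(t − τ) with exponent e^{−4iπντ}, and the Hann window choice are our own assumptions, not the paper's exact implementation:

```python
import numpy as np

def spatial_pwd(x, t, nu, L=16):
    """Spatial pseudo-Wigner matrix of the vector signal x at one t-f point.

    x  : (m, T) array holding m sensor signals of length T.
    t  : time index; nu : normalized frequency; L : window half-length.
    Discrete convention: D(t, nu) = sum_tau h(tau) x[:, t+tau] x[:, t-tau]^H
    e^{-4i pi nu tau}, with h a real, even (Hann) smoothing window.
    """
    m, T = x.shape
    taus = np.arange(-L, L + 1)
    h = np.hanning(2 * L + 3)[1:-1]          # even, strictly positive window
    D = np.zeros((m, m), dtype=complex)
    for h_tau, tau in zip(h, taus):
        if 0 <= t + tau < T and 0 <= t - tau < T:
            D += h_tau * np.outer(x[:, t + tau], x[:, t - tau].conj()) \
                 * np.exp(-4j * np.pi * nu * tau)
    return D
```

Because h is real and even, the kernel constraint stated above is met, so the returned matrix satisfies the Hermitian symmetry (3) and its diagonal auto-terms are real.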

III. DETERMINISTIC SOURCE SEPARATION

We consider the classical instantaneous blind source separation problem where n source signals are received on m sensors. We also suppose that m ≥ n; the model is then called "overdetermined." In matrix and vector notation, the input/output relationship of the mixing model reads, in the noiseless case,

x(t) = A s(t)   (5)

with A the (m,n) complex mixing matrix, which is assumed to have full column rank, x(t) the (m,1) observations vector, and s(t) the (n,1) complex deterministic vector of sources. The problem of blind source separation consists of estimating the "separating" matrix, say B, which, applied to the observations, yields an estimate of the source signals. Defining G = BA as the matrix of the global system, the source separation problem is solved when one has found a separating matrix B such that BA = PΔ, where Δ is an invertible diagonal matrix that corresponds to arbitrary attenuations of the restored sources and P is a permutation matrix that corresponds to an arbitrary order of restitution of the source signals. It is also necessary to impose assumptions on the source signals. In this paper, we consider the two following ones.

Assumption D: There exist points in the time-frequency plane such that each of them corresponds to a single auto-source term for all source signals. In other words, we suppose that there exist N_D couples (t_i, ν_i), i ∈ {1, …, N_D}, such that each matrix D_s(t_i, ν_i) is diagonal with a single nonzero diagonal entry, where, for all k ∈ {1, …, n}, there exists at least one i such that the auto-source term D_{s_k s_k}(t_i, ν_i) is nonzero.   (6)

Assumption Z: There exist points in the time-frequency plane such that each of them corresponds to all two-by-two cross-source


terms. In other words, we suppose that there exist N_Z couples (t_j, ν_j), j ∈ {1, …, N_Z}, such that


(7)

where, for all k, l ∈ {1, …, n} such that k ≠ l, there exists at least one j such that the cross-source term D_{s_k s_l}(t_j, ν_j) is nonzero. Notice that each of the above assumptions for deterministic signals plays the role of the classical statistical independence assumption for random signals. It is clear that a "known" discriminating property of the source signals is always required for a blind separation. Here, we consider deterministic signals whose quadratic TFDs do not overlap too much, two by two, in the above sense. In other words, the signatures of the sources in the time-frequency plane are "sufficiently" different for t-f points satisfying the considered assumptions to be found. Finally, an analytical example in Section V-A, as well as computer simulations performed on both synthetic and speech signals in Section VI, are provided in order to illustrate the validity in practice of such assumptions.
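The model (5) and the separation ambiguities can be sketched numerically as follows; the sizes, the seed, and the oracle choice B = A^# are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 2, 3, 200                      # n sources, m sensors, m >= n
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
s = rng.standard_normal((n, T)) + 1j * rng.standard_normal((n, T))
x = A @ s                                # (5): noiseless instantaneous mixture

# Separation is achieved up to a permutation P and an invertible diagonal
# Delta: a matrix B solves the problem when B A = P Delta.  With the oracle
# choice B = A^# (the pseudoinverse), the global system G = B A is identity.
B = np.linalg.pinv(A)
G = B @ A
```

Any row permutation or row rescaling of this B is an equally valid separator, which is exactly the PΔ indeterminacy described above.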


where ‖·‖ stands for the Euclidean (Frobenius) norm of the matrix argument and where D and Z stand for the sets of diagonal and zero-diagonal matrices, respectively, under consideration. Notice that the first cost function, in (10), was used in [25] to derive nonorthogonal JD algorithms. However, another way can be considered. Indeed, multiplying M_i in (8) (respectively, N_j in (9)) on the left by the pseudoinverse (Moore–Penrose generalized matrix inverse) A^# of A (respectively, A′^# of A′) and on the right by (A^#)^H (respectively, (A′^#)^H) leads to

A^# M_i (A^#)^H = D_i, for all i ∈ {1, …, N_D}   (12)

and

A′^# N_j (A′^#)^H = Z_j, for all j ∈ {1, …, N_Z}.   (13)
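A quick numerical check of the pseudoinverse identity (12), under the factorization M_i = A D_i A^H with a real A (the dimensions and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 3
A = rng.standard_normal((m, n))           # full column rank (almost surely)
D1 = np.diag(rng.standard_normal(n))      # one diagonal factor
M1 = A @ D1 @ A.T                         # factorization (8), real case

Apinv = np.linalg.pinv(A)                 # Moore-Penrose pseudoinverse A^#
# (12): left/right multiplication by A^# and its transpose recovers D1
# exactly, because A^# A = I_n when A has full column rank
assert np.allclose(Apinv @ M1 @ Apinv.T, D1)
```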

IV. JOINT DIAGONALIZATION AND/OR ZERO DIAGONALIZATION

A. Criteria

It is necessary to consider two sets of matrices, one related to the JD problem and one to the JZD problem. The first set, denoted by M, is a set of N_D (m,m) matrices M_i, i ∈ {1, …, N_D}, which are assumed to all admit the following factorization: there exists an (m,n) full column rank matrix A and a set of N_D (n,n) diagonal matrices D_i, i ∈ {1, …, N_D}, such that

Hence, for the goal of directly estimating the pseudoinverse of the matrix A (respectively, A′), one can consider the following two new quadratic cost functions:

Σ_{i=1}^{N_D} ‖B M_i B^H − D_i‖²   (14)

and

Σ_{j=1}^{N_Z} ‖B N_j B^H − Z_j‖².   (15)

M_i = A D_i A^H, for all i ∈ {1, …, N_D}.   (8)

The second set, denoted by N, is a set of N_Z (m,m) matrices N_j, j ∈ {1, …, N_Z}, which are assumed to all admit the following factorization: there exists an (m,n) full column rank matrix A′ and a set of N_Z (n,n) zero-diagonal matrices Z_j, j ∈ {1, …, N_Z}, such that

N_j = A′ Z_j A′^H, for all j ∈ {1, …, N_Z}.   (9)

A zero-diagonal matrix is a square matrix whose diagonal components are all null, i.e., Z is a zero-diagonal matrix if and only if Z_{kk} = 0 for all k. The JD problem consists in the estimation of the matrices A and D_i, i ∈ {1, …, N_D}, using only the matrix set M. Similarly, the JZD problem consists of estimating the matrices A′ and Z_j, j ∈ {1, …, N_Z}, using only the matrix set N. If A = A′, we define the joint diagonalization/zero-diagonalization (JDZD) problem, consisting in the estimation of A and of the matrices D_i and Z_j using the two matrix sets M and N. Knowing these matrix factorizations, a rather classical way to solve the above JD and JZD problems consists of the minimization of the two following quadratic cost functions, respectively:

Notice that, if B (respectively, B′) is a unitary matrix, then the criterion in (14) coincides with the one in (10) (respectively, (15) with (11)), since the Euclidean norm is invariant under multiplication of its matrix argument by a unitary matrix. Otherwise, the criteria are different in general. The cost function in (14) was used in [14] to derive a nonorthogonal JD algorithm. For simplicity reasons, we will use the cost functions in (14) and (15). This is because, in this case, their minimization with respect to, respectively, the matrices D_i and Z_j has a direct solution when the matrix B is fixed. Indeed, defining

M′_i = B M_i B^H, for all i ∈ {1, …, N_D}   (16)

and

N′_j = B N_j B^H, for all j ∈ {1, …, N_Z}   (17)

it is quite straightforward to see that the optimal factors are

D_i = Diag(M′_i), for all i ∈ {1, …, N_D}   (18)

and

Z_j = Zdiag(N′_j), for all j ∈ {1, …, N_Z}   (19)

Σ_{i=1}^{N_D} ‖M_i − A D_i A^H‖²   (10)

and

Σ_{j=1}^{N_Z} ‖N_j − A′ Z_j A′^H‖²   (11)

where the matrix operators Diag(·) and Zdiag(·) are defined, for a square matrix M, componentwise as [Diag(M)]_{kl} = M_{kl} δ_{kl} and [Zdiag(M)]_{kl} = M_{kl}(1 − δ_{kl}), δ_{kl} being the Kronecker symbol.


Thus, using (18) in (14), we have

C_JD(B) = Σ_{i=1}^{N_D} ‖Zdiag(B M_i B^H)‖².   (20)

Also, using (19) in (15), we have

C_JZD(B) = Σ_{j=1}^{N_Z} ‖Diag(B N_j B^H)‖².   (21)

The goal is now to find a matrix argument of the minimization of (20) and/or (21). For generality, and if the searched matrix is the same in both cases, we introduce the combined cost function C(B), defined in (22) as a weighted sum of (20) and (21). It takes into consideration both a diagonalization aspect and a zero-diagonalization one.

B. Proposed Algorithm

Before presenting our algorithm, we derive a useful result.

Proposition 1: The cost function C(B) in (22) reads

C(B) = Σ_{k=1}^{m} b_k Q_k b_k^H   (23)

where b_k denotes the k-th row of B and Q_k is a Hermitian matrix, given in (24), that combines a JD part, defined in (25), and a JZD part, defined in (26).

Proof: Let us rewrite the two cost functions in (22). For (20), we obtain the quadratic form (27), whose matrix is the one defined in (25). For (21), we obtain the quadratic form (28), whose matrix is the one defined in (26). Finally, using (27) and (28) in (22), we directly get (23).

Looking at (23), it can be noticed that the minimization of C(B) can be realized row by row. For a given row, i.e., for k fixed, one of the simplest ways to find a minimum of the quadratic form in (23) consists of determining the normalized eigenvector associated with the lowest (nonzero) eigenvalue of Q_k. However, since the matrix Q_k, for a given k, also depends on the vector b_k, and because the estimation is performed row by row, we propose the use of the following iterative procedure.

Given B(0), a (good) initial matrix with unit-norm rows, at each iteration and for each k ∈ {1, …, m}, do 1) and 2):
1) calculate the matrix Q_k from the current rows of B;
2) find the k-th lowest eigenvalue and the associated unit-norm eigenvector of Q_k; this eigenvector becomes the new row b_k.
Stop when ‖B(it) − B(it − 1)‖ < ε, where ε is a given small positive threshold.

The computational burden mainly involves the calculation of eigenvectors at each iteration. When only the JZD (respectively, JD) part is considered, the proposed algorithm is denoted by JZD (respectively, JD). When the two are considered altogether, it is denoted by JD/JZD (in the simulations (Section VI),
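A sketch of the JZD part of such a row-by-row procedure is given below. It relies on the identity |b N b^H|² = b (N b^H b N^H) b^H, so that the criterion restricted to one row b is a quadratic form whose matrix depends on b itself; each row is then updated as the unit-norm eigenvector associated with the smallest eigenvalue of that matrix. This is our simplified reading (it ignores the JD coupling and the exact matrices of (24)-(26)), not the authors' exact algorithm:

```python
import numpy as np

def jzd(Ns, m, B0=None, eps=1e-8, max_iter=100, seed=0):
    """Sketch of a nonorthogonal JZD via row-by-row eigenvector updates.

    Ns : list of (m, m) matrices to be jointly zero-diagonalized.
    Returns a matrix B with unit-norm rows such that the diagonals of
    B N_j B^H are (approximately) null.
    """
    rng = np.random.default_rng(seed)
    B = np.array(B0, dtype=complex) if B0 is not None \
        else rng.standard_normal((m, m)).astype(complex)
    B /= np.linalg.norm(B, axis=1, keepdims=True)      # unit-norm rows
    for _ in range(max_iter):
        B_old = B.copy()
        for k in range(m):
            b = B[k:k + 1, :]                           # current row, (1, m)
            # |b N b^H|^2 = b (N b^H b N^H) b^H : the row criterion is a
            # quadratic form with the Hermitian, PSD matrix Q below
            Q = sum(N @ b.conj().T @ b @ N.conj().T for N in Ns)
            w, V = np.linalg.eigh(Q)                    # ascending eigenvalues
            B[k, :] = V[:, 0].conj()                    # smallest-eigenvalue vector
        if np.linalg.norm(B - B_old) < eps:             # stopping rule
            break
    return B
```

As the paper recommends, a good initialization matters; starting near a valid zero-diagonalizer, the iteration stays at the solution (each row spans the null space of its own Q).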


we will consider only one particular weighting in (22)). Finally, it is worth mentioning that the JD algorithm is close to the one proposed in [14]. However, the matrix set under consideration, in our case, is not "whitened."

C. Discussion and Computer Simulations

1) Discussion: The existence of a solution that minimizes the criterion is guaranteed because it is a continuous function of B and it is bounded below by zero. The nonincreasing nature of the criterion is also locally ensured at each iteration. However, this is not sufficient (as is well known) to ensure global convergence to a global minimum. This is known to be a difficult problem, and it remains to be further considered. That is why the choice of the initial point remains important. One possible way to initialize is to consider the solution given by the orthogonal joint (zero) diagonalization [6], so as to start in the neighborhood of the solution. With regard to the JD, another possible way is to consider the solution given by the generalized eigendecomposition of two matrices coming from the set M (or built from matrices of this set). One of the interesting aspects of this nonorthogonal iterative algorithm is its simple implementation. It also makes it possible to find a general (not necessarily orthogonal) matrix B that is a diagonalizer and/or a zero diagonalizer, with the matrices in M not required to be positive definite as in [24]. Finally, addressing the issue of the computational load per iteration, this algorithm is not that computationally demanding. Even though it is more demanding than the orthogonal approach in [6], it remains less demanding than the nonorthogonal approach in [25].

2) Computer Simulations: To quantify the performances of the different algorithms, the performance index given in [4, eq. (42)] is used:

Fig. 1. A comparison of the proposed JD algorithm and the nonorthogonal JD algorithm of [25]. Top: the set D is built from a set of N = 20, 100, 400 randomly chosen (Gaussian law) (3,3) diagonal matrices. Bottom: D is built from a set of N = 20, 100, 400 randomly chosen (Gaussian law) (3,3) nondiagonal matrices whose diagonal terms remain preponderant. The square mixing matrix A is used. The displayed results are averaged over ten Monte Carlo runs.

(29)

This index measures how close the global matrix G = BA is to a matrix of the form PΔ. It is given in decibels, i.e., 10 log₁₀ of the index. In Fig. 1 (for JD) and Fig. 2 (for JZD), the performance index is displayed versus the number of iterations. With regard to the JD, two cases are studied.
1) The set D is a set of N_D = 20 (respectively, 100 and 400) (3,3) diagonal matrices with random entries chosen from a uniform distribution on the interval [0,1].
2) 20 (respectively, 100 and 400) (3,3) matrices with random entries, chosen from a Gaussian distribution with zero mean and variance 0.01, are added to the above matrices.
The set M is then built from D using (30) and the square matrix A given in Section VI-B. The obtained results are displayed on the top of Fig. 1 for the first case (respectively, on its bottom for the second case). With regard to the JZD, two cases are also studied.
1) The set Z is a set of N_Z = 20 (respectively, 100 and 400) (3,3) zero-diagonal symmetric matrices with random entries chosen from a uniform distribution.
2) 20 (respectively, 100 and 400) (3,3) matrices with random entries, chosen from a Gaussian distribution with zero mean and variance 0.01, are added to the previous matrices.
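One common form of such a global-system index is sketched below; the exact normalization used in [4, eq. (42)] may differ, so the constant in front is an assumption:

```python
import numpy as np

def perf_index(G):
    """Distance of the global matrix G = BA from a permutation-scaling matrix.

    Returns 0 when each row and each column of |G|^2 has a single nonzero
    entry (i.e., G = P Delta); positive otherwise.
    """
    P = np.abs(G)**2
    n = P.shape[0]
    row = (P / P.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    col = (P / P.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return (row.sum() + col.sum()) / (2.0 * n * (n - 1))
```

For a perfect separation up to permutation and scaling, e.g. G with one nonzero entry per row and column, the index is exactly zero; it grows as the rows and columns of G become less selective.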

Fig. 2. JZD algorithm. Top: the set Z is built from a set of N = 20, 100, 400 randomly chosen (3,3) zero-diagonal matrices. Bottom: Z is built from a set of N = 20, 100, 400 randomly chosen (3,3) matrices that are not zero-diagonal but whose diagonal terms remain small. The rectangular mixing matrix A is used. The displayed results are averaged over ten Monte Carlo runs.

The set N is then built from Z using (30) and the rectangular matrix A given in Section VI-B. The obtained results are displayed on the top of Fig. 2 for the first case (respectively, on its bottom for the second case). All the displayed results have been averaged over ten Monte Carlo runs. One can observe that only a few iterations, depending on the number of matrices to be jointly (zero-) diagonalized and on their size, are generally needed for convergence to be reached. In Fig. 1, we also compare the proposed JD algorithm with the nonorthogonal JD algorithm developed in [25]. When the same initial point is chosen for both algorithms, they provide close, yet different, results. This is not surprising, since the criteria on which they are based are different. Let us finally say that similar results are


obtained in all the cases, the square one and the rectangular one, with either the JD, the JZD, or the JD/JZD algorithm.

V. AUTOMATIC TIME-FREQUENCY POINTS SELECTION PROCEDURE

where δ is the Dirac distribution, 1 denotes a unit distribution equal to 1 for all of its arguments, and · denotes the product of two distributions. From now on, we will consider this particular case, which finally leads to

Here, we show how the above algorithms find an application in the BSS problem. Thanks to a formal analytical example, we illustrate the need for an automatic t-f points selection procedure. Some existing procedures are then recalled, a new one is proposed, and an existing one is slightly modified.

A. Matrix Decomposition and an Example

From now on, the mixing matrix A is assumed real. Using (5), it is easy to see that the SQT D_x(t, ν) of the observation signals vector directly admits the following decomposition:

D_x(t, ν) = A D_s(t, ν) A^T   (30)

where D_s(t, ν) is the SQT of the source signals vector. Notice that, in general, the matrix D_s(t, ν) has no special structure for any (t, ν). Nevertheless, there can exist some t-f points where this matrix has a specific structure. In particular, this occurs when one considers t-f points conforming to assumption D or Z. One can see that, according to assumption D, the matrix D_s(t, ν) is diagonal, while according to assumption Z, it is zero diagonal. Under the mixing effect, these properties are lost (the matrix D_x(t, ν) is generally no longer (zero) diagonal). Our goal is to take advantage of these properties and of the decomposition (30) at the same time, in order to be able to estimate the searched separating matrix and, then, to restore the unknown sources vector. But first, let us detail an analytical example to better appreciate the pertinence of the considered assumptions D and Z. We consider as source signals the two following analytic multicomponent exponential waves:

(31)

with the corresponding amplitudes and frequencies. Since the considered sources are already analytic signals, one can show, using (4), that

Notice that there exist t-f points where the matrix D_s(t, ν) has a particular structure even when partially overlapping signals are considered. For the t-f points corresponding to one single auto-source term, the matrix D_s(t, ν) is diagonal, and for such points a JD procedure can be considered. For the t-f points corresponding to cross-source terms only, the matrix D_s(t, ν) is zero diagonal and generally complex, and for such points a JZD procedure can be considered. At t-f points corresponding to both cross-source terms and auto-source terms, as occurs in the example, the matrix D_s(t, ν) has no particular algebraic structure. Finally, t-f points corresponding to neither cross-source terms nor auto-source terms (all the other time-frequency points) are obviously not interesting either. Thus, there exist four different types of t-f points: those corresponding to cross-source terms only, to auto-source terms only, to no cross-source term and no auto-source term, and to both cross-source terms and auto-source terms. Only the first two types of t-f points are of interest with regard to BSS based on the use of SQTFDs. In the following subsection, we address the issue of how to detect those two classes of t-f points.

B. Selection Procedures

We have to find the two categories of interesting t-f points. Notice that this is not globally a simple problem, since a property of the unknown SQTFD matrices of the source signals has to be determined using only the SQTFD matrices of the observed signals or of the whitened observed signals. A certain number of t-f points selection procedures can be encountered in the literature. However, in most cases, they require a prewhitening step [6], [18], [20]. In the nonprewhitened case, very few solutions have been proposed (see, e.g., [21]–[23]). Depending on the kernel, it appears that, for


any signals, the SQTFD matrix proves to be a complex matrix (see, e.g., the example at the end of Section III). Nevertheless, according to the considered Hermitian symmetry (see (3)), and because the mixing matrix A is assumed real, the SQTFDs of the source signals and of the (whitened) observed signals at t-f points corresponding to auto-source terms only are real and nonzero matrices. They are generally complex at t-f points corresponding to cross-source terms. The idea is then to directly exploit these properties. As noticed in [20] and [23], after prewhitening of the observations, Trace{D_x̄(t, ν)} = Trace{D_s(t, ν)} (D_x̄ denoting the SQTFD of the whitened observations), because the trace is invariant under a unitary transformation. Among the different possibilities, one of the simplest and most direct ways to select the SQTFD matrices (of the whitened observed signals) to be jointly zero-diagonalized consists in choosing those associated with t-f points leading to matrices with a "very" low (lower than a fixed level ε1) absolute value of the trace, together with a high (higher than a fixed level ε2) Euclidean norm of the imaginary part. This can be stated as: for JZD, choose the (t, ν) such that

|Trace{D_x̄(t, ν)}| < ε1  and  ‖Im{D_x̄(t, ν)}‖ > ε2.   (32)

In the same manner, a way to select the SQTFD matrices to be jointly diagonalized consists of considering "almost" real matrices, according to the following rule (ε3 and ε4 being fixed thresholds too): for JD, choose the (t, ν) such that

|Trace{D_x̄(t, ν)}| > ε3  and  ‖Im{D_x̄(t, ν)}‖ < ε4.   (33)
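A sketch of such threshold-based rules is given below, with illustrative threshold values and a hypothetical `D_of` callback returning the SQTFD matrix of the (whitened) observations at a given t-f point; the exact thresholds are chosen empirically in the paper:

```python
import numpy as np

def select_points(D_of, grid, eps1=0.1, eps2=0.1, eps3=0.1):
    """Split a grid of t-f points into two matrix sets.

    JZD set  : cross-terms only (small |trace|, large imaginary energy).
    JD set   : "almost real" matrices (large |trace|, small imaginary energy).
    D_of     : callable (t, nu) -> (m, m) SQTFD matrix (hypothetical helper).
    """
    jd_set, jzd_set = [], []
    for (t, nu) in grid:
        D = D_of(t, nu)
        tr = abs(np.trace(D))
        im = np.linalg.norm(D.imag)
        if tr < eps1 and im > eps2:
            jzd_set.append(D)             # candidate for zero diagonalization
        elif tr > eps1 and im < eps3:
            jd_set.append(D)              # candidate for diagonalization
    return jd_set, jzd_set
```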

The values of the different thresholds are chosen empirically, since the calculation of their optimal values is a difficult task that remains to be further considered (yet, in the simulations, we will show that the performances are not too affected by these thresholds). This detection procedure [(32) and/or (33)] is generically denoted C1. Let us notice that such a decision criterion is still not restrictive enough, since a problem may occur when cross-source terms do exist (their real part is not null) but their imaginary part is null (see, for example, what occurs at some t-f points in the analytical example of the previous section). At such t-f points, if there are cross-source terms only, the corresponding SQTFD matrices should be jointly zero-diagonalized (whereas we decide to do nothing), and if there are both auto-source terms and cross-source terms, nothing should be done, whereas we decide to jointly diagonalize. With regard to the JZD, this is not really a problem: it only means that the set of matrices that will be jointly zero-diagonalized is smaller than the one that could have been considered. Concerning the JD, the set of matrices that will be jointly diagonalized is bigger than the one that should have been considered. We now propose to introduce an additional condition, under the shape of an additional threshold, in the decision algorithm in order to eliminate this last ambiguity. It plays upon the imaginary part of the SQTFD matrices after rectification and filtering by a 2-D low-pass filter (such an approach makes it possible to make a distinction in the t-f plane between areas where the cross-source terms are null in a whole neighborhood of the


studied t-f point and areas where the cross-source terms are cancelled only at that single t-f point). The modified decision rule with regard to JD can then be expressed as follows: for JD, choose the (t, ν) such that

|Trace{D_x̄(t, ν)}| > ε3,  ‖Im{D_x̄(t, ν)}‖ < ε4  and  ‖(|Im{D_x̄}| ∗ F)(t, ν)‖ < ε5   (34)

where ∗ stands for the convolution operation, F is the impulse response of the 2-D low-pass filter, and ε5 is the additional threshold. This procedure [(32) and/or (34)] is generically denoted C2. In the computer simulations section, our two detection procedures proposed above will be compared with five other ones. Let us also notice that one can find other detection procedures that take the noise into account in [10], [11], and [20]. They will not be presented here, as we only consider the noiseless case.
1) The first one is described in [6] and operates on prewhitened observations. The underlying idea is to detect t-f points with sufficient energy (thanks to (35)) where auto-source terms are present (thanks to (37)) or absent (thanks to (36)). It reads: keep the (t, ν) such that

(35)

For JZD, choose among the kept points the (t, ν) such that

|Trace{D_x̄(t, ν)}| / ‖D_x̄(t, ν)‖ < ε   (36)

For JD, choose among the kept points the (t, ν) such that

|Trace{D_x̄(t, ν)}| / ‖D_x̄(t, ν)‖ > ε   (37)

where ε is a small positive scalar. This detection procedure [(35)+(36) and/or (37)] is generically denoted C3.
2) The second detection procedure can be found in [23]. It is based on the assumption that the sources are quasi-disjoint (i.e., only a small overlapping of the t-f supports of the different sources is allowed). With regard to JD, the underlying idea is to keep the t-f points with sufficient energy and then to use a rank-one property to detect single auto-source terms, as in [18] and [22]. It reads: keep the (t, ν) such that

(38)

For JZD, choose among the kept points the (t, ν) such that

(39)

For JD, choose among the kept points the (t, ν) such that

(40)

where ε is a small positive scalar (a typical value is indicated by its authors) and λ_max{·} represents the largest eigenvalue of the matrix in the bracket. This detection procedure [(38)+(39) and/or (40)] is generically denoted C4. Keeping the same threshold for both JD and JZD amounts, somehow, to considering only three classes of t-f points instead of four.
3) We suggest modifying the previous decision rule by replacing one quantity in (40) with a normalized counterpart. It leads to the third detection procedure: for JD, choose the (t, ν) such that

(41)


Fig. 3. First line: the trace of the sources' SPWV (left); the sum of the off-diagonal terms of the imaginary part of the sources' SPWV (right). Lines 2-5: t-f points selected for joint diagonalization (first and third columns) and for joint zero diagonalization (second and fourth columns) with the different detection procedures. Second line: left (first and second columns), by hand; right (third and fourth columns), using C1. Third line: left, using C2; right, using C3. Fourth line: left, using C4; right, using C5. Fifth line: left, using C6; right, using C7.

This detection procedure [(38)+(39) and/or (41)] is generically denoted C5. Given its inherent normalization, this selection scheme should be less signal dependent than the other t-f points selection schemes.
4) The fourth detection procedure, based on a rank-one property, is dedicated to the search of single auto-source terms

only. It can be found in [18], and it operates on prewhitened observations: keep the (t, ν) such that

|Trace{D_x̄(t, ν)}| > moy{|Trace{D_x̄(t, ν)}|}   (42)


TABLE I. The first case of sources and mixture is considered (Section VI-A): a summary of the SQTFD matrix structure versus the considered t-f point (t, ν).

TABLE II. Summary of the thresholds used versus the considered t-f points selection procedure, and number of matrices kept for joint diagonalization and/or joint zero diagonalization.

TABLE III. Comparison of the performance indexes reached with the different joint zero diagonalization and/or joint diagonalization algorithms on different SQTFD matrix sets.

where moy{ Trace{ · } } stands for the mean of the trace over the whole t-f plane. For JD, choose (t, ν)

such that max_i λ_i{ D(t, ν) } / Σ_i λ_i{ D(t, ν) } > 1 − ε (43)

with λ_i the i-th eigenvalue of the matrix in the bracket and ε a positive constant close to 0. This detection procedure [(42)+(43)] is denoted C .
5) The fifth detection procedure, also based on a rank-one property, is likewise dedicated to the search of single auto-source terms, but it operates without prewhitening of the observations. It can be found in [22]: for JD, choose (t, ν)

such that (44)

where the first constant is positive and close to 0, while the second positive constant ensures that the denominator in the first equation is not null. This detection procedure (44) is denoted C .
In the light of our analytical example, notice that problems can occur using the JD part of C : while a nonnegligible value of the trace ensures the presence of auto-source terms, it does not exclude the presence of cross-source terms. Yet, if auto-source terms and cross-source terms are simultaneously present at a t-f point, the SQTFD of the sources does

not have a particular algebraic structure any more. Moreover, even if the sources are disjoint in the t-f plane, nothing guarantees the absence of t-f points where auto-source terms and cross-source terms are simultaneously present, as exemplified by our analytical example.

C. Summary of the Method

The proposed method is summarized by the following four points.
1) Estimate the SQTFDs of the observation vector for a given set of t-f points.
2) Determine the useful t-f points thanks to one of the selection procedures described in Section V-B (C , C , C , C , C , C , or C ).
3) Jointly diagonalize and/or zero-diagonalize the corresponding SQTFD matrices thanks to the proposed nonorthogonal procedure described in Section IV (JD , JZD , or JD/JZD ).
4) Recover the unknown sources using the estimated separating matrix.

VI. COMPUTER SIMULATIONS

In this section, considering the case where , the performance of the proposed JZD , JD , and JD/JZD algorithms is analyzed via computer simulations in the context of BSS based on SQTFDs. They are compared with other


Fig. 4. First line: the trace of the sources SPWV (left); the sum of the off-diagonal terms of the imaginary part of the sources SPWV (right). Lines 2, 3, 4: t-f points selected for joint diagonalization (first and third columns) and for joint zero diagonalization (second and fourth columns) thanks to the different selection procedures. Second line: using C (first and second columns), using C (third and fourth columns). Third line: t-f points selected thanks to C (first and second columns), thanks to C (third and fourth columns). Fourth line: t-f points selected thanks to C (first and second columns), thanks to C (third and fourth columns).

algorithms: the orthogonal JD and JZD algorithms and their combination used in [6] and [7] (denoted JD , JZD , and JD/JZD ), all applied after whitening of the observations, and the nonorthogonal JD algorithm developed in [25], denoted JD and already used in [21] and [22]. These different algorithms are compared on matrix sets built thanks to the different t-f point selection procedures (C , C , C , C , C , C , and C ). These approaches are also compared with the TFBSS algorithm [18].1
1Downloaded from http://www-sigproc.eng.cam.ac.uk/~cf269/TFBSS_pack.html. This algorithm is based on the use of the spatial smoothed pseudo-Wigner–Ville distribution, the rank-one detection procedure C , and the JD algorithm.
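Step 3) of the method can be illustrated in the special case of two exactly jointly diagonalizable matrices, where a closed-form nonorthogonal joint diagonalizer follows from a generalized eigendecomposition and no whitening of any kind is needed. The sketch below is our own toy example (names and data are illustrative); the paper's iterative algorithms handle larger, noisy matrix sets.

```python
import numpy as np

def joint_diag_two(D1, D2):
    """Nonorthogonal joint diagonalizer of two matrices.

    If D1 = A diag(d1) A^H and D2 = A diag(d2) A^H with A invertible and
    the ratios d1/d2 distinct, then D1 @ inv(D2) = A diag(d1/d2) inv(A),
    so its eigenvectors are the columns of A up to scale and permutation.
    The inverse of the eigenvector matrix is then a separating matrix.
    """
    _, vecs = np.linalg.eig(D1 @ np.linalg.inv(D2))
    return np.linalg.inv(vecs)

# Toy example: two sources received on two sensors, no whitening anywhere.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))              # unknown (nonorthogonal) mixing matrix

def auto_term(d):                            # auto-term matrix A diag(d) A^H
    return A @ np.diag(d) @ A.T

B = joint_diag_two(auto_term([3.0, 0.5]), auto_term([0.2, 2.0]))
G = np.abs(B @ A)                            # global matrix ~ permutation * diagonal
```

Up to the usual scaling and permutation ambiguities of BSS, each row of G carries a single dominant entry, i.e., B separates the toy mixture.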

A. First Case of Sources and Mixture

We consider source signals for which we collect 128 time samples. We consider as source signals the two multicomponent exponential waves described in Section III, with , and . The signatures of the two sources do not overlap in the time-frequency plane. The spatial pseudo-Wigner–Ville [SPWV, [13], [19], cf. (4)] distribution is used as SQTFD because it affords a good compromise. In fact, it is preferred to the spatial smoothed pseudo-Wigner–Ville distribution, for example (or to any other kind of SQTFD belonging to Cohen's class with a reduced number of interferences), to make sure that a sufficient number of t-f points will be kept to


TABLE IV SUM UP OF THE THRESHOLDS USED VERSUS THE CONSIDERED t-f POINTS SELECTION PROCEDURE. NUMBER OF MATRICES KEPT FOR JOINT DIAGONALIZATION AND/OR JOINT ZERO DIAGONALIZATION

TABLE V COMPARISON OF THE PERFORMANCE INDEXES REACHED DUE TO THE DIFFERENT METHODS ON THE DIFFERENT SETS OF t-f MATRICES CONSIDERED IN THE SECOND EXPERIMENT, WITH A SQUARE OR RECTANGULAR MIXING MATRIX

build a correct set of matrices to be jointly zero-diagonalized. It is preferred to the SWV distribution because we do not want too many t-f points to be kept by the automatic selection schemes. It is computed over 64 frequency bins and with a Hamming window of length 33. These source signals are mixed by the following mixing matrix:

Hence, we consider that they are received on sensors. In Table I, we have summed up the structure of the SQTFD matrices versus the considered t-f points. We consider eight different scenarios to build the matrix sets to be jointly diagonalized and/or zero-diagonalized, depending on the t-f point selection procedure that is used (selection by hand, C , C , C , C , C , C , and C ). In Table II, we have given the thresholds used versus the considered t-f point selection procedure, as well as the number of matrices kept for JD and/or JZD. In each scenario (one given column of Table III), all the JD algorithms (JD , JD , and JD ) are compared on the same set of t-f points; all the JZD algorithms (JZD , JZD ) are compared on the same set of t-f points; all the JD/JZD algorithms (JD/JZD , JD/JZD ) are compared on the same set of t-f points. In the first scenario, the t-f points used have been chosen by hand to avoid possible errors due to the selection scheme; they are displayed on the second line (left) of Fig. 3. On its first quadrant are displayed the 18 t-f points used in building the set of matrices to be jointly diagonalized, whereas on its second quadrant are displayed the 18 time-frequency points used in building the

Fig. 5. Two speech sources.

set of matrices to be jointly zero-diagonalized. For an easier interpretation, we have also given, in the first line of Fig. 3, the trace of the sources SPWV (left) and the sum of the off-diagonal terms of the imaginary part of the sources SPWV (right). In the second scenario, the t-f points have been obtained in an automatic mode using C . They are displayed on the second line (right) of Fig. 3. In the third scenario, they are obtained using C (a 7 × 7 2-D low-pass filter whose reduced cutoff frequency equals 0.3 is applied) and displayed on the third line of Fig. 3 (left). In the fourth scenario, they are obtained


TABLE VI PERFORMANCE OF JOINT-DIAGONALIZATION ALGORITHMS VERSUS THE TWO THRESHOLDS USED IN C

TABLE VII PERFORMANCE OF JOINT ZERO-DIAGONALIZATION ALGORITHMS VERSUS THE TWO THRESHOLDS USED IN C

using C and displayed on the third line of Fig. 3 (right). In the fifth scenario, they are obtained using C and displayed on the fourth line of Fig. 3 (left). In the sixth scenario, they are obtained using C and displayed on the fourth line of Fig. 3 (right). In the seventh scenario, they are obtained using C and displayed on the fifth line of Fig. 3 (left). Finally, in the eighth scenario, they are obtained using C and displayed on the fifth line of Fig. 3 (right). The resulting performance indexes are given in Table III. C and C enable us to build the set of matrices to be jointly diagonalized only. However, by using the set of matrices obtained thanks to the detection procedure leading to the best performance with regard to the JZD [C in this specific case (for example, ), C in the two other cases of sources and mixtures (for example, and )], we have given (in italic) the results obtained for the combination of JD and JZD. With respect to these results, one can notice that, in this case, the methods based on nonorthogonal JZD and/or nonorthogonal JD always perform better than those based on the orthogonality assumption, except when t-f points are badly selected (C and C for the JD part; in fact, orthogonal J(Z)D algorithms seem to be less sensitive to possible errors in the building of the sets to be jointly (zero-)diagonalized). Moreover, one can even observe that the methods based on a nonorthogonal JZD perform as well as those based on a nonorthogonal JD [when the t-f points are well selected in both cases, because they are selected by hand or using a more elaborate automatic mode (see, e.g., C and C )], or even better [because of possible errors in the set of matrices to be jointly diagonalized, a problem that does not occur with the set of matrices to be jointly zero-diagonalized (see, e.g., C and

C )]. Finally, a combination of nonorthogonal JD and JZD appears less sensitive to possible errors in the selection of t-f points, always leading to relatively satisfying performance. Let us also notice that the best performance is obtained when the t-f points are selected by hand or thanks to more elaborate detection schemes, e.g., C or C . Notice also that the TFBSS algorithm provides the expected results since it is based on a unitary JD algorithm and requires a prewhitening of the observations.

B. Second Case of Sources and Mixtures

We consider real source signals of 128 time samples. The first one is the sum of two sinusoidal signals ( and ), the second one is the real part of a sinusoidal frequency-modulation signal, and the third one is the real part of a linear frequency-modulation signal. In these simulations, the SPWV representation is used. It is computed over 64 frequency bins with a Hamming window of length 33. Two mixing matrices are considered. The first one is denoted and the second one

Hence, we consider that the three sources are received on 3 sensors in the first case and on 4 sensors in the second case. We consider two experiments: in the first one, the square


Fig. 6. Left: the real part of the 2 × 2 SPWV distribution of the observations; right: the imaginary part of the 2 × 2 SPWV distribution of the observations.

TABLE VIII SUM UP OF THE THRESHOLDS USED VERSUS THE CONSIDERED t-f POINTS SELECTION PROCEDURE. NUMBER OF MATRICES KEPT FOR JOINT-DIAGONALIZATION AND/OR JOINT ZERO-DIAGONALIZATION

mixing matrix is used, while the rectangular matrix is used in the second one. In Table IV, we have given the thresholds used versus the considered t-f point selection procedure, as well as the number of matrices kept for JD and/or JZD for each experiment (square and rectangular mixing matrix). The t-f points selected thanks to C (respectively, C , C , C , C , and C ) are displayed on the second line (left) [respectively, second line (right), third line (left), third line (right), fourth line (left), and fourth line (right)] of Fig. 4. The resulting performance indexes are given in Table V. The conclusions are the same as in the first case, whether a square or a rectangular mixing matrix is considered. The purpose of this study is also to better understand the influence of the thresholds involved in C and to show that, to some degree, the BSS performance is not too affected by their choice. We consider the same set of sources and the square mixture. The performance versus the thresholds is summed up in Table VI (for JD) and in Table VII (for JZD). With regard to JD, one can observe that with badly adapted thresholds, the performance may decrease. On the one hand, the selection procedure may be too selective, which results in too few t-f points being selected and a fall in the performance (for example, with and , only eight matrices are kept). On the other hand, the selection procedure may not be sufficiently selective, which results in too many t-f points being selected and, again, a decrease in the performance (for example, with and , 2217 matrices are kept). With regard to JZD, one can observe that, in this case, the performance does not depend too much on the two chosen thresholds (the best performance (−49.88 dB) being obtained with and ,

which leads to the selection of 122 matrices to be jointly zero-diagonalized). Algorithms based on nonorthogonal JZD always perform better than those based on orthogonal zero diagonalization, whatever the chosen thresholds.

C. Third Case of Sources and Mixtures

This last experiment deals with two speech signals. Its aim is to show that these techniques apply to signals whose representations in the t-f plane are more "complex" than those of the more academic signals used in the two previous experiments. These two speech signals correspond to the two words "interference" and "diagonalization" pronounced by the same speaker. They have been sampled at 2000 Hz and mixed according to the following mixing matrix:

There is no noise. The plots of the two individual speech signals are shown in Fig. 5. The SPWV distribution is used. It is computed over 1024 frequency bins and 2048 time samples using a Hamming window of length 65. The real (respectively, imaginary) part of the SPWV distribution of the mixed sources is displayed on the left (respectively, right) of Fig. 6. In Table VIII, we have given the thresholds used versus the considered t-f point selection procedure, as well as the number of matrices kept for JD and/or JZD. The resulting performance indexes are given in Table IX. The conclusions are the same as in the first two cases when we compare the nonorthogonal JZD and/or


TABLE IX COMPARISON OF THE PERFORMANCE INDEXES REACHED THANKS TO THE DIFFERENT JOINT ZERO-DIAGONALIZATION AND/OR JOINT-DIAGONALIZATION ALGORITHMS ON DIFFERENT SQTFD MATRICES SETS

the nonorthogonal JD with their orthogonal equivalents. In this specific case, however, the nonorthogonal JD seems to provide better results than the nonorthogonal JZD.

VII. DISCUSSION AND CONCLUSION

In this paper, we have proposed new algorithms to perform the nonorthogonal J(Z)D of a given set of matrices, as well as the combination of both nonorthogonal JZD and nonorthogonal JD of a given set of matrices. We have also shown that these algorithms can be applied to the problem of blind source separation based on spatial quadratic time-frequency representations. Such an approach generally achieves better performance than that obtained using algorithms based on the orthogonality constraint, especially when the source signals are correlated. It even allows one to totally eliminate the classical preliminary whitening stage when it is combined with a t-f point selection procedure operating on non-prewhitened observations. Finally, BSS methods based on SQTFDs enable one to treat the case of mixtures of nonstationary correlated sources (even over short time intervals).

REFERENCES

[1] C. Jutten and J. Hérault, "Blind separation of sources, Part I: An adaptive algorithm based on neuromimetic architecture," Signal Process., vol. 24, pp. 1–10, 1991.
[2] J.-F. Cardoso and A. Souloumiac, "Blind beamforming for non-Gaussian signals," Proc. Inst. Elect. Eng.-F, vol. 140, pp. 362–370, 1993.
[3] P. Comon, "Independent component analysis: A new concept?," Signal Process., vol. 36, pp. 287–314, 1994.
[4] E. Moreau, "A generalization of joint-diagonalization criteria for source separation," IEEE Trans. Signal Process., vol. 49, no. 3, pp. 530–541, Mar. 2001.
[5] A. Belouchrani and M. G. Amin, "Blind source separation based on time-frequency signal representations," IEEE Trans. Signal Process., vol. 46, no. 11, pp. 2888–2897, Nov. 1998.
[6] A. Belouchrani, K. Abed-Meraim, M. G. Amin, and A. M. Zoubir, "Joint anti-diagonalization for blind source separation," in Proc.
Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), Salt Lake City, UT, May 2001, pp. 2789–2792.
[7] A. Belouchrani, K. Abed-Meraim, M. G. Amin, and A. M. Zoubir, "Blind separation of nonstationary sources," IEEE Signal Process. Lett., vol. 11, no. 7, pp. 605–608, Jul. 2004.
[8] A. Bousbia-Salah, A. Belouchrani, and H. Bousbia-Salah, "A one step time-frequency blind identification," in Proc. Int. Symp. Signal Processing Its Applications (ISSPA), Paris, France, Jul. 2003, vol. I, pp. 581–584.
[9] J.-F. Cardoso, "On the performance of orthogonal source separation algorithms," in Proc. EUSIPCO, Edinburgh, U.K., 1994, pp. 776–779.

[10] L. Cirillo, A. Zoubir, N. Ma, and M. G. Amin, "Automatic classification of auto- and cross-terms of time-frequency distributions in antenna arrays," in Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), Orlando, FL, May 2002, vol. II, pp. 1445–1448.
[11] L. Cirillo and M. G. Amin, "Auto-term detection using time-frequency array processing," in Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), Hong Kong, Apr. 2003, vol. VI, pp. 465–468.
[12] T. Claasen and W. Mecklenbrauker, "The Wigner distribution—A tool for time-frequency signal analysis. Part I: Continuous-time signals," Philips J. Res., vol. 35, pp. 217–250, 1980.
[13] L. Cohen, Time-Frequency Analysis. Englewood Cliffs, NJ: Prentice-Hall, 1995.
[14] S. Dégerine, "Sur la diagonalisation conjointe approchée par un critère des moindres carrés," in Proc. 18ème Colloque GRETSI sur le Traitement du Signal et des Images, Toulouse, France, Sep. 2001, pp. 311–314.
[15] L. De Lathauwer, "Signal processing based on multilinear algebra," Ph.D. dissertation, Elect. Eng. Dept., K.U. Leuven, Leuven, Belgium, 1997.
[16] E. M. Fadaili, N. Thirion-Moreau, and E. Moreau, "Non-orthogonal zero-diagonalization for source separation based on time-frequency representation," in Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), Philadelphia, PA, Mar. 2005, vol. IV, pp. 297–300.
[17] E. M. Fadaili, N. Thirion-Moreau, and E. Moreau, "Combined non-orthogonal joint zero-diagonalization and joint diagonalization for source separation," in Proc. IEEE Workshop Statistical Signal Processing (SSP), Bordeaux, France, Jul. 2005.
[18] C. Févotte and C. Doncarli, "Two contributions to blind source separation using time-frequency distributions," IEEE Signal Process. Lett., vol. 11, no. 3, pp. 386–389, Mar. 2004.
[19] P. Flandrin, Time-Frequency/Time-Scale Analysis. New York: Academic, 1999.
[20] L. Giulieri, N. Thirion-Moreau, and P.-Y.
Arquès, "Blind sources separation based on bilinear time-frequency representations: A performance analysis," in Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), Orlando, FL, May 2002, pp. 1649–1652.
[21] L. Giulieri, N. Thirion-Moreau, and P.-Y. Arquès, "Blind sources separation based on quadratic time-frequency representations: A method without pre-whitening," in Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), Hong Kong, Apr. 2003, vol. V, pp. 289–292.
[22] L. Giulieri, H. Ghennioui, N. Thirion-Moreau, and E. Moreau, "Nonorthogonal joint-diagonalization of spatial quadratic time-frequency matrices for source separation," IEEE Signal Process. Lett., vol. 12, no. 5, pp. 415–418, May 2005.
[23] N. Linh-Trung, A. Belouchrani, K. Abed-Meraim, and B. Boashash, "Separating more sources than sensors using time-frequency distributions," EURASIP J. Appl. Signal Process., pp. 2828–2847, 2005.
[24] D.-T. Pham and J.-F. Cardoso, "Blind separation of instantaneous mixtures of non-stationary sources," IEEE Trans. Signal Process., vol. 45, no. 10, pp. 2608–2612, Oct. 1997.
[25] A. Yeredor, "Non-orthogonal joint diagonalization in the least-squares sense with application in blind source separation," IEEE Trans. Signal Process., vol. 50, no. 7, pp. 1545–1553, Jul. 2002.
[26] Y. Zhang, W. Mu, and M. G. Amin, "Subspace analysis of spatial time-frequency distribution matrices," IEEE Trans. Signal Process., vol. 49, no. 4, pp. 747–759, Apr. 2001.


El Mostafa Fadaili was born in Mohammedia, Morocco, in 1978. In 2002, he graduated from the Institut des Sciences de l'Ingénieur de Toulon et du Var (ISITV), La Valette, France, and he received the Master's degree in signal and digital communications (Sicom) from the University of Nice Sophia Antipolis, France. He is currently working towards the Ph.D. degree at the Université du Sud Toulon Var (USTV), La Garde, France. His main research interests are in blind source separation.

Nadège Thirion-Moreau was born in Montbéliard, France. She graduated from the Ecole Nationale Supérieure de Physique (ENSPG) of the Institut National Polytechnique de Grenoble (INPG), Grenoble, France, in 1992, where she received the D.E.A. degree in 1992 and the Ph.D. degree in 1995, both in the field of signal processing. From 1996 to 1998, she was an Assistant Professor with the Ecole Supérieure des Procédés Electroniques et Optiques (ESPEO), Orléans, France. Since 1998, she has been an Assistant Professor in the Department of Telecommunications with the Institut des Sciences de l'Ingénieur de Toulon et du Var (ISITV), La Valette, France. Her main research interests are in deterministic and statistical signal processing, including blind source separation, high-order statistics, nonstationary signals, time-frequency representations, decision, and classification.


Eric Moreau (M’01) was born in Lille, France. He graduated from the Ecole Nationale Supérieure des Arts et Métiers” (ENSAM), Paris, France, in 1989. He received the Agrégation de Physique degree from the Ecole Normale Supérieure de Cachan, Cachan, in 1990 and the D.E.A. and Ph.D. degrees, both in the field of signal processing, from the University of Paris-Sud, France, in 1991 and 1995, respectively. From 1995 to 2001, he was an Assistant Professor with the Telecommunications Department of the Engineering School Institut des Sciences de l’Ingénieur de Toulon et du Var (ISITV), La Valette, France. Currently, he is a Professor with the University of Toulon, France. His main research interests are in statistical signal processing using high-order statistics.