Complex-valued signal processing – essential models, tools and statistics

Esa Ollila∗† and Visa Koivunen∗†        H. Vincent Poor†

∗Department of Signal Processing and Acoustics, SMARAD CoE, Aalto University, P.O. Box 13000, FI-00076 Aalto, Finland. Email: {esollila,visa}@wooster.hut.fi

Abstract—Complex-valued signals arise in many diverse fields such as communications, radar, biomedical sciences, physical sciences, and related fields. This paper briefly reviews some important tools, statistics, models and estimators that are useful for handling complex-valued random signals. Over the past four decades, circularity (i.e. invariance of the distribution under multiplication by a unit complex number) or second-order circularity (i.e. uncorrelatedness of the random vector with its complex conjugate) has been a common implicit assumption. Hence, in this paper a special emphasis is put on the circularity property, as optimal signal processing methods for circular and non-circular signals are often different and choosing the right type of processing can provide significant performance gains. Topics reviewed in this paper include different types of circularity measures and detectors of circularity, complex elliptical symmetry of random variables, Cramér-Rao lower bounds on the estimation of complex-valued parameters, optimization of a real-valued cost function with respect to complex-valued parameters using CR-calculus, and complex-valued independent component analysis.

†Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA. Email: [email protected]

I. INTRODUCTION

Complex-valued signals are encountered in a wide variety of applications, including wireless communications, sensor array signal processing, biomedical sciences and physics. Consequently, there is an increasing need in science and engineering for a statistical and mathematical theory for processing complex-valued random signals. For example, most practical modulation schemes in communications are complex-valued, and applications such as radar, spectral analysis of time series, and magnetic resonance imaging lead to data that are inherently complex-valued. In some applications, on the other hand, great simplifications can be achieved by representing the observed 2-dimensional real-valued data matrix as a complex vector and then conducting the analysis in the complex domain (instead of the real domain). The complex-valued representation is also compact and simpler in terms of notation and algebraic manipulation, and is convenient for computation. It is evident that the need for expertise in the processing, statistical modelling and estimation of complex-valued multivariate signals and phenomena is rapidly increasing; see e.g. the recent textbooks [1] and [2] and the references therein.

Analysis in the complex domain presents a number of challenges, since solid mathematical and statistical foundations, tools and algorithms for handling complex-valued signals are lacking, or are simply too scattered in the literature. There appears to be a need for a concise, unified, and rigorous treatment of such topics. Several recent research papers have profoundly widened our knowledge and understanding of complex-valued random signals (cf. the references herein). This paper briefly reviews some recent developments in this field by surveying some recently proposed tools, statistics, models and estimators that are useful for handling complex-valued random signals. Since circularity (i.e. invariance of the distribution under multiplication by a unit complex number) or second-order circularity (i.e. uncorrelatedness of the random vector with its complex conjugate) has been a common implicit assumption during the last four decades, we put special emphasis on detectors of circularity, as the optimal signal processing methods for circular and non-circular signals are often different and choosing the right type of processing can provide significant performance gains.

The paper is organized as follows. In Section II we discuss differentiation of a complex-valued function using the CR-calculus (or Wirtinger calculus), while Section III reviews important descriptive statistics, tools and bounds that can be used, e.g., to characterize and analyse complex-valued random vectors and estimators. The circularity assumption is also recalled and its implications are discussed. Section IV reviews the important class of complex elliptically symmetric random vectors and the ML estimation of their parameters. Detectors of circularity are discussed in Section V, while Section VI reviews some recent developments in complex-valued ICA. Finally, Section VII presents our conclusions.

Notation: Recall that the set of complex numbers, denoted by C, is the plane R × R = R² equipped with complex addition and complex multiplication, making it the complex field.
The complex conjugate of z = (x, y) = x + jy ∈ C is defined as z∗ = (x, −y) = x − jy, where j = √−1 denotes the imaginary unit. Recall that any nonzero complex number has a polar representation, z = |z| exp(jθ), where θ = Arg(z) ∈ [−π, π) is called the (principal) argument of z and |z| ≜ √(zz∗) = √(x² + y²) denotes the modulus of z. Ω = {z ∈ C : |z| ≤ 1} denotes the closed unit disk, ∂Ω its boundary (the unit circle), and Ω₀ = Ω \ ∂Ω denotes the open unit disk. Let (·)^T denote the transpose and (·)^H the Hermitian (conjugate) transpose of their arguments.
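As a quick numerical illustration of the notation just introduced, the identities |z| = √(zz∗) and z = |z| exp(jθ) can be checked with a short Python sketch (the data and variable names here are our own):

```python
import cmath
import math

# Minimal numeric illustration of the notation above: conjugate, modulus
# and polar representation of a complex number z = x + jy.
z = 3.0 + 4.0j

conj_z = z.conjugate()                   # z* = x - jy
modulus = math.sqrt((z * conj_z).real)   # |z| = sqrt(z z*) = sqrt(x^2 + y^2)
theta = cmath.phase(z)                   # principal argument Arg(z)

# Any nonzero z has the polar representation z = |z| exp(j*theta).
z_polar = modulus * cmath.exp(1j * theta)

print(modulus)                    # 5.0
print(abs(z_polar - z) < 1e-12)   # True
```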

II. COMPLEX DIFFERENTIATION: CR-CALCULUS

Let f = u + jv : U → C denote a complex function, where U denotes an open set in C, and suppose that the real and imaginary parts, u = u(x, y) ≜ u(z) : U → R and v = v(x, y) ≜ v(z) : U → R, possess first real partial derivatives at c ∈ U. Define

  ∂f/∂x (c) ≜ ∂u/∂x (c) + j ∂v/∂x (c)   and   ∂f/∂y (c) ≜ ∂u/∂y (c) + j ∂v/∂y (c).

Observe that these are directional derivatives of f at c in the directions t = 1 = (1, 0) ∈ C and t = j = (0, 1) ∈ C; that is, they describe the rates of change of f(z) as z moves towards c along the real axis and the imaginary axis, respectively. Further define

  ∂f/∂z (c) ≜ (1/2)[∂f/∂x (c) − j ∂f/∂y (c)]   and   ∂f/∂z∗ (c) ≜ (1/2)[∂f/∂x (c) + j ∂f/∂y (c)],

which are called the complex partial derivatives (c.p.d.'s) of f with respect to (w.r.t.) z and z∗ at c, respectively. In [3], [4], the c.p.d.'s are called the R-derivative and the conjugate R-derivative, respectively. The differential calculus based on these operators is known as Wirtinger calculus [5], [6], or, as we prefer, the CR-calculus [3], [4], [7]; see also [9], [8], [10], [11], to mention only a few. A function f is said to have a derivative at c ∈ U if

  lim_{h→0} [f(c + h) − f(c)]/h ∈ C

exists; the value of the limit is denoted by f′(c). Despite the apparent similarity to the real case, the complex case is remarkably different: h may approach 0 from any direction without affecting the value of the limit. Possession of a derivative implies that f′(c) = ∂f/∂z (c) and that the Cauchy–Riemann (C-R) equations,

  ∂u/∂x = ∂v/∂y   and   ∂v/∂x = −∂u/∂y   ⇔   ∂f/∂z∗ = 0,        (1)

hold at c. Then recall that a function f is said to be holomorphic or C-differentiable on U if f has a derivative at every point c ∈ U. If f = u + jv is holomorphic on U, then f satisfies the C-R equations (1) in U, f is infinitely C-differentiable, u and v are harmonic functions (i.e. they satisfy Laplace's equation in U), and for each c ∈ U, the power series

  ∑_{n=0}^{∞} a_n (z − c)^n        (2)

with a_n = f^(n)(c)/n! converges to f(z) for all z ∈ B(c, r) ≜ {z ∈ C : |z − c| < r} ⊂ U; see e.g. [5]. It is thus clear that holomorphic functions form a rather restricted class of complex functions; the concept of C-differentiability may be considered too stringent a condition for many signal processing applications. For example, consider a real-valued function of a complex variable, e.g. a cost function in optimization that arises naturally in a number of signal

processing applications [8], [9], e.g. in solving for the maximum likelihood (ML) estimates of a complex parameter. However, if a real-valued function is C-differentiable on U, then the function is constant. Moreover, as we shall see later, from the point of view of the probability theory of complex random variables, it is the non-holomorphic functions that are of major importance. A less restrictive notion is real-sense differentiability, or R-differentiability [12]: f : U → C is R-differentiable at c ∈ U if there exists an R-linear (or widely linear [13]) function L(h) = αh + βh∗, α, β ∈ C, such that

  lim_{|h|→0} [f(c + h) − f(c) − L(h)]/|h| = 0.        (3)

The R-linear function L is called the R-differential of f at c and we denote it by L_{f,c}. If f is R-differentiable at c ∈ U, then the first-order partial derivatives of u and v exist at c and

  L_{f,c}(h) = ∂f/∂z (c) · h + ∂f/∂z∗ (c) · h∗.

The function f is said to be R-differentiable if it is R-differentiable at every point c ∈ U. As with the usual partial derivatives, an important application of c.p.d.'s is related to optimization. It is known [9], [10] that both c.p.d.'s (and their multivariate extensions defined below) vanish at stationary points of a function, but it is the conjugate c.p.d. ∂/∂z∗ that defines the direction of the maximum rate of change, i.e. it defines the complex gradient. When computing partial derivatives of a function f it is more convenient to express the function as f(z, z∗), as if z and z∗ were independent variables. The partial derivatives can then be evaluated by treating z∗ (equivalently z) as a constant in f. For example, the partial derivative of the function f(z) = |z|² can be computed by writing it as f(z, z∗) = zz∗, and then ∂f/∂z = z∗ and ∂f/∂z∗ = z using the above rule of thumb. Indeed, the usefulness of the c.p.d.'s stems from the easily verifiable fact that they follow formally the same sum, product, and quotient rules as the ordinary partial derivatives. For example, by the product rule and induction, one has the usual rules for polynomials:

  ∂/∂z (z^n z^∗m) = n z^(n−1) z^∗m   and   ∂/∂z∗ (z^n z^∗m) = m z^n z^∗(m−1).

However, it is easy to verify that the chain rule for the composite function (f ◦ g)(c) = f(g(c)) is of the form

  ∂(f ◦ g)/∂z = (∂f/∂z)(g) · ∂g/∂z + (∂f/∂z∗)(g) · ∂g∗/∂z,        (4)
  ∂(f ◦ g)/∂z∗ = (∂f/∂z)(g) · ∂g/∂z∗ + (∂f/∂z∗)(g) · ∂g∗/∂z∗.        (5)

Hence one should be cautious, as simple and direct adaptation of results derived for real-domain problems to the complex domain can lead to wrong results and conclusions. If u and v possess first partial derivatives in some set U, then ∂f/∂z and ∂f/∂z∗ are complex functions from U to C, and thus they themselves can have complex partial derivatives w.r.t. z and z∗ at c ∈ U. These are called higher-order c.p.d.'s. For example, ∂/∂z (∂f/∂z∗) ≜ ∂²f/∂z∂z∗ is a second-order c.p.d. (one among the four), and the total number of c.p.d.'s of order k ≥ 1 that can be formed is 2^k.

The theory of CR-calculus extends to complex-valued functions of vector or matrix arguments. For a function f : U ⊂ C^d → C we define ∂f/∂z = (∂f/∂z₁, · · · , ∂f/∂z_d)^T, and ∂f/∂z∗ is defined similarly. In optimization problems, f is a real-valued function and, although both partial derivatives vanish at a stationary point of the function, only ∇_z∗ ≡ ∂/∂z∗ points in the direction of the maximum rate of change, i.e. it defines the complex gradient. Similarly, for a complex-valued function of a matrix variable Z = (z_ij), f : C^(n×m) → C, we define ∂f/∂Z as an n × m complex matrix whose (i, j)th entry is ∂f/∂z_ij. As noted earlier, when calculating the c.p.d.'s it is useful to write f(Z) as f(Z, Z∗) instead of f(Z). For example, suppose the function f : C^(n×n) → C is of the form Tr(AZ⁻¹), det(Z), or det(Z∗). Then the matrix derivatives are

  ∂Tr(AZ⁻¹)/∂Z = −(Z⁻¹AZ⁻¹)^T,   ∂Tr(AZ⁻¹)/∂Z∗ = 0,        (6)
  ∂det(Z)/∂Z = det(Z) Z^(−T),   ∂det(Z)/∂Z∗ = 0,        (7)
  ∂det(Z∗)/∂Z = 0,   ∂det(Z∗)/∂Z∗ = det(Z∗) Z^(−H).        (8)

Additional matrix differentiation rules are collected in [14].

III. COMPLEX RANDOM VARIABLES – ESSENTIAL TOOLS AND STATISTICS

A. Complex distributions

A complex random vector (r.v.) z = x + jy ∈ C^d is comprised of a pair of real r.v.'s x and y in R^d. The distribution of z on C^d determines the joint real 2d-variate distribution of x and y on R^2d, and conversely, due to the isomorphism between C^d and R^2d. Hence the distribution of z is identified with the joint (real 2d-variate) distribution of the composite real r.v. z̄ = (x^T, y^T)^T,

  F_z(c) ≜ Pr(x ≤ a, y ≤ b),   where c = a + jb ∈ C^d.

In a similar manner, the probability density function (p.d.f.) of z = x + jy is identified with the joint p.d.f. f(x, y) of x and y. As was noted earlier, in some applications (e.g. for optimization purposes) it is preferable to write the p.d.f. f(z) using the dual notation f(z, z∗) that separates z and its conjugate z∗ as if they were independent variates. Now recall that the mean (or expectation) of a complex r.v. z is defined as E[z] = E[x] + jE[y]. For simplicity of presentation we now assume throughout the paper that the r.v. z = x + jy has zero mean. Recall that the expectation can be used to define an important alternative characterization of the real r.v. z̄ via the concept of the characteristic function (c.f.). The c.f. of the composite real r.v. z̄ can be represented with complex notation as follows:

  Φ_z(c) = E[exp{j Re(c^H z)}] = E[exp{(j/2)(c^H z + c^T z∗)}],

where c ∈ C^d. Note that Φ_z : C^d → C is, in general, not C-differentiable, and therefore the CR-calculus is needed in developments of complex cumulants, for example [4].
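The equality of the two expressions for the c.f. follows from (c^H z)∗ = c^T z∗, so that Re(c^H z) = (1/2)(c^H z + c^T z∗); a short numerical sketch (with simulated data of our own choosing) confirms that the two forms coincide:

```python
import numpy as np

# Verify numerically that E exp{j Re(c^H z)} = E exp{(j/2)(c^H z + c^T z*)}.
rng = np.random.default_rng(0)
n, d = 1000, 3

# A noncircular complex sample z = x + jy (rows are observations).
x = rng.standard_normal((n, d))
y = 0.3 * x + 0.5 * rng.standard_normal((n, d))
z = x + 1j * y

c = rng.standard_normal(d) + 1j * rng.standard_normal(d)

phi1 = np.mean(np.exp(1j * np.real(z @ np.conj(c))))              # E exp{j Re(c^H z)}
phi2 = np.mean(np.exp(0.5j * (z @ np.conj(c) + np.conj(z) @ c)))  # E exp{(j/2)(c^H z + c^T z*)}

print(abs(phi1 - phi2))  # ~0: the two forms are algebraically identical
```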

A complex r.v. can be characterized via symmetry properties of its distribution. The most commonly made symmetry assumption in the statistical signal processing literature is that of circular symmetry; see e.g. [15]. Circularity, or the lack of it (non-circularity), is the fundamental concept differentiating complex signal analysis from the real case. A complex r.v. z is said to be circular, or to have a circularly symmetric distribution about the origin, if its distribution remains invariant under multiplication by any (complex) number on the unit complex circle, i.e.

  z =_d e^(jθ) z,   ∀ θ ∈ R,

where the notation =_d should be read "has the same distribution as". A circular r.v. z, in general, does not necessarily possess a density. However, if it does, then its p.d.f. f(z) satisfies

  f(e^(jθ) z) = f(z),   ∀ θ ∈ R.

In the univariate case (d = 1), this is equivalent to saying that the composite r.v. (x, y)^T is spherically symmetric. The p.d.f. f(z) = f(x, y) is then a function of |z|² = x² + y² only, i.e. f(z) = C · g(|z|²) for some non-negative function g(·) and normalizing constant C [16]. Hence the contours of constant density are circles in the complex plane, thus justifying the name for this class of distributions. In the vector case, however, the term "circular" is a bit misleading, since for d ≥ 2 it does not imply that the contours of constant density are spheres in complex Euclidean d-space. An important class of circular distributions, called the circular complex elliptically symmetric (CES) distributions, is reviewed in Section IV-A.

B. Descriptive measures

1) Univariate case: A full second-order description of a complex random variable (r.va.) z = x + jy is obtained from the real 2 × 2 covariance matrix of z̄ = (x, y)^T,

  E[z̄ z̄^T] = [ E[x²], E[xy] ; E[xy], E[y²] ] = (σ²/2) [ 1 + Re(ϱ), Im(ϱ) ; Im(ϱ), 1 − Re(ϱ) ],

or, equivalently as indicated above, from the variance σ² ≜ E[|z|²] > 0 and the circularity quotient [17]

  ϱ ≜ E[z²]/σ² ∈ Ω        (9)

of z. If z is circular, then ϱ = 0. Hence a r.va. z with vanishing circularity quotient is said to be second-order circular [15] or proper [18], [19]. The modulus of the circularity quotient, |ϱ| ∈ [0, 1], is called the circularity coefficient [20] of z. The circularity coefficient measures the degree of circularity in the sense that |ϱ| = 0 when z is second-order circular and |ϱ| = 1 when z is maximally non-circular (i.e. x or y is equal to zero with probability 1, or x is a linear function of y). The asymptotic distribution of the ML estimator (MLE) of the circularity coefficient was recently studied in [21] and [22]. A r.va. z has p + 1 pth-order moments, namely α_{0;p}, α_{1;p−1}, α_{2;p−2}, . . . , α_{p;0}, where

  α_{n;m} ≜ E[z^n z^∗m],

for m, n ∈ N₀ = {0, 1, 2, . . .}. Note that symmetric moments are redundant in the sense that α_{m;n} = α∗_{n;m}. The relationship between the moments and the c.f. Φ_z(c) of a complex r.va. z is [4, Theorem 5.2]:

  α_{n;m} = (2/j)^(m+n) · ∂^(m+n)Φ_z / (∂c^m ∂c^∗n) (0)        (10)

for all non-negative integers n, m such that n + m ≤ p, when z is a r.va. whose pth-order moment exists; see also [23]. Equation (10) and applying the Taylor R-series [4, Theorem 3.3] at zero then gives an expansion for the c.f. of z:

  Φ_z(c) = 1 + ∑_{m=1}^{p} ∑_{n=0}^{m} (j/2)^m [c^∗n c^(m−n) / (n!(m − n)!)] α_{n;m−n} + o(|c|^p)

as c → 0 and provided that z has a finite pth-order moment; see [12], [4], [7] for details, proofs and examples.

Another useful descriptive statistic of a complex r.va. is the kurtosis. Let

  γ(z) ≜ E[|z|⁴] / (E[|z|²])²        (11)

denote the normalized fourth-order moment of z. The real-valued quantity

  κ(z) ≜ γ(z) − |ϱ(z)|² − 2

is the most commonly used generalization of the kurtosis for a complex r.va. z (e.g. [24], [6], [25]). In [12] and [4] it was pointed out (based on the definition of complex fourth-order cumulants) that there exist three natural measures of complex kurtosis. All of the mentioned kurtosis definitions possess the natural property that if z is purely real-valued (i.e. y = Im[z] = 0 with probability one), then κ reduces to the kurtosis κ(x) = γ(x) − 3 of the real r.va. x, and that they vanish when z has a complex normal (Gaussian) distribution.

2) Multivariate case: The statistics defined earlier can be readily extended to the multivariate case. A complete second-order description of a complex r.v. z is obtained from the covariance matrix

  C ≜ E[zz^H] = E[xx^T] + E[yy^T] + j(E[yx^T] − E[xy^T])

and the pseudo-covariance matrix [18]

  P ≜ E[zz^T] = E[xx^T] − E[yy^T] + j(E[xy^T] + E[yx^T]).

The notations cov(z) and pcov(z) will also sometimes be used for these quantities. The pseudo-covariance matrix is also called the relation matrix in [15] or the complementary covariance matrix in [19].
Note that P is a symmetric matrix and C is a positive semidefinite Hermitian matrix (or a positive definite Hermitian (PDH) matrix, given that z is non-degenerate in any subspace of C^d). If C is a PDH d × d matrix and P is a complex symmetric d × d matrix, then the pair (C, P) is the covariance matrix and pseudo-covariance matrix of some r.v. z if and only if C − PC^(−∗)P∗ is positive semidefinite [26]. The circularity coefficients of a r.v. z are defined as the singular values of the symmetric matrix

  K ≜ C^(−1/2) P (C^(−1/2))^T,

where C^(−1/2) can be any matrix square-root such that C^(−1) = (C^(−1/2))^H C^(−1/2). To be more specific, since K is complex symmetric, it has a special form of singular value decomposition (SVD), called the Takagi factorization [27],

  K = U diag(λ₁, . . . , λ_d) U^T,        (12)

where U is a unitary matrix (i.e. U^H U = I) and 1 ≥ λ₁ ≥ · · · ≥ λ_d ≥ 0 are the (ordered) circularity coefficients of z. Alternatively, the circularity coefficients can be defined as the square-roots of the eigenvalues of the matrix RR∗, where R ≜ C^(−1)P; see [28, Theorem 2]. Note that the set {λ_i, i = 1, . . . , d} does not in general coincide (except for d = 1) with the set {ϱ_i, i = 1, . . . , d}, where ϱ_i = ϱ(z_i) denotes the circularity coefficient of the marginal r.va. z_i, i = 1, . . . , d. It should also be highlighted that the circularity coefficients are the canonical correlations between z and z∗ [29] and are invariant under non-singular linear transformations of the data. The r.v. z is said to be second-order circular or proper if P = 0, or equivalently, if

  E[xx^T] = E[yy^T]   and   E[xy^T] = −E[yx^T],        (13)

or equivalently, if λ_i = 0 for i = 1, . . . , d. In view of the above criteria for (second-order) circularity, the circularity coefficients are the most natural descriptive statistics of circularity, as they remain invariant under non-singular linear transformations of the data. This is desirable, as circularity (or second-order circularity) is preserved under such transformations. Define W ≜ (C^(−1/2))^H U, where U is defined in (12); then it is easy to verify that the transformed data s = W^H z has strongly-uncorrelated components in the sense that C(s) = I and P(s) = Λ ≜ diag(λ₁, . . . , λ_d). If the r.v. z is not second-order circular (i.e. P(z) ≠ 0), then the conventional whitening transform C^(−1/2) does not fully decorrelate the components of z, in the sense that the correlations within and between the real part x and the imaginary part y of z do not all vanish. In such cases the matrix W, called the strong-uncorrelating transform (SUT) [30], [20], is needed. A generalization of the SUT was described in [28].

C. Complex Cramér-Rao lower bound

Let f(z|θ) denote the p.d.f. of the r.v. z ∈ C^d depending on an unknown complex parameter θ ∈ C^k. Let us define the information matrix and the pseudo-information matrix as

  I_θ ≜ E[ ∇_θ∗ ln f(z; θ) {∇_θ∗ ln f(z; θ)}^H ]   and   P_θ ≜ E[ ∇_θ∗ ln f(z; θ) {∇_θ∗ ln f(z; θ)}^T ].

Only if the pseudo-information matrix vanishes (P_θ = 0) does I_θ^(−1) give a Cramér-Rao bound for an unbiased estimator t of θ (in the sense that cov(t) ≥ I_θ^(−1), where the notation C ≥ D means that the matrix C − D is positive semidefinite); otherwise the bound depends on the pseudo-information matrix as well [31, Corollary 1]. See also [8], [32], [33], [34], [35] for studies concerning the complex Cramér-Rao bound.

IV. COMPLEX ELLIPTICALLY SYMMETRIC (CES) DISTRIBUTIONS

A. Circular CES distributions

Let us first recall that an r.v. z = x + jy is said to have a centered (symmetric about zero) circular complex normal (CN) distribution if its p.d.f. is of the form

  fCN(z|C) = π^(−d) det(C)^(−1) exp(−z^H C^(−1) z).        (14)

We shall denote such an r.v. with the notation z ∼ CN_d(C). The p.d.f. closely resembles that of the classical real multivariate normal distribution, which is made possible by imposing the assumption that the composite real r.v. z̄ = (x^T, y^T)^T has a real multivariate normal distribution with a real covariance matrix of the special form (13), i.e. P = 0. A natural extension of the centered circular CN distribution is obtained by allowing x and y to possess a 2d-variate centered real elliptically symmetric (RES) distribution with the restriction (13) on its scatter matrix parameter. This class of distributions is called the circular complex elliptically symmetric (CES) distributions, the properties of which are studied in [36], [37]. The p.d.f. is then of the form

  fCE(z|Σ) = C · det(Σ)^(−1) g(z^H Σ^(−1) z),

where g(·) is a non-negative function, called the density generator, Σ is a positive definite Hermitian matrix, called the scatter matrix, and C is a normalizing constant that could be absorbed into the function g, except that with the above notation g can be chosen independently of the dimension d. We shall denote z ∼ CE_d(Σ, g). The covariance matrix exists if g(·) satisfies ∫₀^∞ u^d g(u) du < ∞, in which case the scatter matrix parameter Σ is equal, up to a positive real scalar, to the complex covariance matrix, i.e. aΣ = cov(z) for some a > 0. The circular CN distribution (14) is obtained with g(t) = exp(−t) (and C = π^(−d)), in which case Σ = C.

Since Σ is proportional to C (provided the latter exists), an MLE of the scatter matrix Σ provides a competitor to the conventional sample covariance matrix (SCM) Ĉ = (1/n) ∑_{i=1}^n z_i z_i^H in applications where an estimate of the covariance matrix C is required only up to a constant. For example, the MVDR beamformer weights require the covariance matrix only up to a constant, and thus a scatter matrix estimator can be used instead of the conventional sample covariance matrix [38].

Let z₁, . . . , z_n be an independent and identically distributed (i.i.d.) sample (n > d) from a circular CES distribution CE_d(Σ, g). The MLE of Σ is found by minimizing the negative log-likelihood function

  L(Σ) ≜ −∑_{i=1}^n log fCE(z_i|Σ) = n log det(Σ) − ∑_{i=1}^n log g(z_i^H Σ^(−1) z_i),

where we have omitted the constant term log(C) since it does not depend on the unknown parameter Σ. Let us now illustrate how the CR-calculus can be used in deriving the MLE in this case. Differentiating L(Σ) w.r.t. Σ by using the complex matrix differentiation rules (6) and (7) and equating to zero shows that the MLE is a solution of the estimating equation

  Σ = (1/n) ∑_{i=1}^n ψML(z_i^H Σ^(−1) z_i) z_i z_i^H,        (15)

where

  ψML(t) ≜ −g′(t)/g(t)        (16)

is a weight function that depends on the density generator g(·) of the underlying circular CES distribution, and g′(t) = (d/dt) g(t) denotes the derivative of g. The MLE Σ̂ solves the estimating equation (15) and thus can be interpreted as a weighted covariance matrix. Note, however, that equation (15) is implicit, as the weights on the right-hand side depend on Σ̂. In general, the obtained MLE is robust if the corresponding weight function ψML(·) descends to zero. This is needed so that small weights are given to observations z_i that are highly outlying in terms of the distance z_i^H Σ̂^(−1) z_i.

In the case of the circular CN distribution, g(t) = exp(−t), which yields ψML ≡ 1. This shows the well-known result that Ĉ is also the MLE of the parameter Σ (= C in this case). Another example is the d-variate circular t-distribution with ν degrees of freedom (d.f.), denoted z ∼ Ct_{d,ν}(Σ), obtained with g(t) = (1 + 2t/ν)^(−(2d+ν)/2). The value ν = 1 corresponds to the circular Cauchy distribution, and the circular CN distribution is obtained in the limit ν → ∞. Based on (16), the MLE of Σ, labelled MLT(ν), is now obtained with the weight function

  ψML(t) = (2d + ν)/(ν + 2t).        (17)

Note that MLT(1) is the highly robust estimator corresponding to the MLE of Σ for the complex circular Cauchy distribution (ν = 1), and that MLT(ν) → Ĉ as ν → ∞. This means that the robustness of MLT(ν) estimators decreases with increasing values of ν (as expected). MLT(ν) estimators can be computed by a simple iterative algorithm outlined in [39].

[Figure: the weight function (17) of MLT(ν) estimators, ψML(t) for t ∈ [0, 8], for selected values of ν (ν = 1, 5, 25, . . .).]

Note that the larger the value of the d.f. parameter ν, the closer the weight function is to the unity weight ψML ≡ 1 corresponding to the SCM Ĉ.

As in the real case, we can generalize the ML estimating equation (15) by defining a complex M-estimator of scatter [39], denoted by C̃, as the PDH d × d matrix solving the estimating equation

  C̃ = (1/n) ∑_{i=1}^n ψ(z_i^H C̃^(−1) z_i) z_i z_i^H,

where ψ is any real-valued weight function on [0, ∞). Hence M-estimators constitute a wide class of scatter matrix estimators that include the MLEs Σ̂ of the scatter parameter Σ of the circular CES distributions as important special cases. M-estimators can be computed by a simple iterative algorithm; see [39] and [38]. For example, the choice ψ(x) = d/x yields a highly robust estimator that is an extension of Tyler's M-estimator to the complex case; see [39].

B. Generalized CES distribution

In the seminal works [40] and [26], an intuitive complex-valued expression for the normal density was derived without the unnecessary second-order circularity assumption (13). The extension of the RES distribution to the non-circular case was proposed and studied in [41], [22]. In these works, e.g. in [40], [41], the derived extension is called the generalized CN distribution or the generalized CES distribution. Herein we drop the prefix "generalized" for convenience. A key step in deriving the complex form of the RES distribution is the augmented signal model, namely the real-to-complex (invertible) linear transformation

  z̃ = (z ; z∗) = [ I, jI ; I, −jI ] (x ; y).

An r.v. z = x + jy is said to have a centered CES distribution if its p.d.f. is of the form

  f(z|Γ) = C · det(Γ)^(−1/2) g((1/2) z̃^H Γ^(−1) z̃),

where

  Γ = [ Σ, Ω ; Ω∗, Σ∗ ]        (18)

is a PDH 2d × 2d matrix and, as earlier, g(·) is a non-negative function, called the density generator, and C is a normalizing constant. The parameter Σ is a PDH d × d matrix, called the scatter matrix, and the parameter Ω is a complex symmetric d × d matrix, called the pseudo-scatter matrix. We shall write z ∼ CE_d(Σ, Ω, g) to denote this property. Note that the circular CES model is obtained if Ω = 0, and thus CE_d(Σ, g) ≡ CE_d(Σ, 0, g). The scatter matrix and pseudo-scatter matrix are equal to the covariance matrix and pseudo-covariance matrix (provided they exist) up to a positive real constant, i.e. cov(z) = aΣ and pcov(z) = aΩ for some a > 0. As in the circular case, the CN distribution and the complex t-distribution with ν d.f. are obtained with density generators g(t) = exp(−t) and g(t) = (1 + 2t/ν)^(−(2d+ν)/2), respectively. The former will be denoted by CN_d(C, P) for short.

V. TESTING FOR CIRCULARITY ASSUMPTIONS

Recall that the optimal detection and estimation techniques are often different for the circular and non-circular cases. Circularity detectors have therefore been under active research in the recent literature; see [41], [42], [29], [17], [43], [44], [45], [4], [46], [22], [47]. Herein we describe the generalized likelihood ratio test (GLRT) of circularity, assuming complex normal data, derived in [41] and further studied in [29], [43], [45] and [46]. Also discussed is the adjusted GLRT of circularity [45]. If the null hypothesis of circularity is rejected, it is then useful to test the extent of non-circularity, i.e. how many circularity coefficients differ from zero. The GLRT for such hypotheses developed in [48] will also be reviewed.

A. GLRT of circularity

Let z₁, . . . , z_n denote an i.i.d. random sample drawn from a CN_d(Σ, Ω) distribution and write Z_n = (z₁ · · · z_n). We first consider testing the hypothesis

  H₀ : P = 0 ⇔ λ_i = 0, i = 1, . . . , d        (19)

against the general alternative

  H₁ : P ≠ 0 ⇔ λ_i ≠ 0 for at least one i ∈ {1, . . . , d},        (20)

where λ_i, i = 1, . . . , d, are the circularity coefficients of the r.v. z, i.e. the square-roots of the eigenvalues of RR∗, where R = C^(−1)P. Under H₀ the joint p.d.f. of the sample Z_n is simply

  f₀(Z_n; C) = π^(−dn) det(C)^(−n) exp{−n Tr(C^(−1) Ĉ)},

where Ĉ is the sample covariance matrix. The joint p.d.f. under H₁ becomes

  f₁(Z_n; C, P) = π^(−dn) det(Γ)^(−n/2) exp{−(n/2) Tr(Γ^(−1) Γ̂)},

where Γ̂ is the sample covariance matrix of the augmented sample z̃_i = (z_i^T, z_i^H)^T, i = 1, . . . , n, and Γ is defined in (18). Recall from Section IV-A that the MLE of the covariance matrix C is the sample covariance matrix Ĉ. Similarly, the MLE of Γ is its sample estimate Γ̂. The GLRT statistic is then ℓ_n ≜ −2 log[f₀(Z_n; Ĉ)/f₁(Z_n; Ĉ, P̂)], i.e. the constant −2 times the logarithm of the ratio of the p.d.f.'s maximized under the null H₀ and the general (unrestricted) alternative H₁. The statistic can be expressed in the form [41], [29]:

  ℓ_n = −n ln[det(I − R̂R̂∗)] = −n ln ∏_{i=1}^{d} (1 − λ̂_i²),

where λ̂_i, i = 1, . . . , d, are the MLEs of the circularity coefficients and R̂ = Ĉ^(−1)P̂. It should be noted that the first expression for ℓ_n is computationally simpler, as it avoids the explicit computation of the circularity coefficients. The latter representation nicely reveals that the test statistic is invariant under invertible linear transformations of the data (since it is a function of the circularity coefficients only). Based on the standard ML theory, the statistic ℓ_n possesses an asymptotic chi-squared distribution with p = d(d + 1) degrees of freedom under the null hypothesis [45], [43]. In

[43, Sect. VII-B] it was further shown that using the multiplier (n − d) instead of n in ℓn provides a small-sample adjustment for the test. The GLRT rejects the null if ℓn exceeds the (1 − α)th quantile of a chi-squared distribution with p d.f. The above test is then asymptotically valid with probability of false alarm (PFA) (or, type I error) equal to α. The adjusted GLRT statistic [45] of circularity is obtained by dividing the GLRT statistic ℓn by an adjustment factor

    δn ≜ (1/d) Σ_{j=1}^{d} γ̂j / (2 + |ϱ̂j|²),   (21)

where ϱ̂j and γ̂j are the sample estimates of the circularity coefficient ϱ(zj) and the standardized 4th-order moment γ(zj) defined in (11) of the jth marginal variable zj, respectively (j = 1, . . . , d). The adjusted GLRT statistic, ℓn,adj ≜ ℓn/δn, possesses the same asymptotic χ²_p-distribution under H0 as the non-adjusted test under the more general assumption of sampling from an unspecified CES distribution CEd(C, P, g) with finite 4th-order moments.

B. Testing the extent of non-circularity

If the hypothesis of circularity H0 is rejected, then it is of interest to study the extent of non-circularity (EONC), i.e. how many circularity coefficients differ from zero. Thus we test the null hypothesis that the d − k smallest circularity coefficients vanish:

    H0^(k) : λ_{k+1} = · · · = λd = 0

against the general alternative H1 in (20). Note that H0^(0) ≡ H0. We assume that z1, . . . , zn is an i.i.d. random sample from the CN distribution CNd(C, P). A GLRT statistic becomes [48]

    ℓn(k) ≜ −2 ln[ max_{C,P^(k)} f1(Zn; C, P^(k)) / f1(Zn; Ĉ, P̂) ]
          = −n ln ∏_{i=k+1}^{d} (1 − λ̂i²),   (22)

where P^(k) denotes a pseudo-covariance matrix with rank r(P^(k)) = k and f1 is the p.d.f. of the sample under H1 as stated earlier. Based on standard likelihood ratio testing theory, the degrees of freedom of the limiting χ² distribution of the test statistic ℓn(k) is

    pk ≜ d(d + 1) − k(2d − k + 1) = (d − k)(d − k + 1).

Note that if k = 0, the test statistic reduces to the usual GLRT of circularity. The GLRT for H0^(k) rejects the null if ℓn(k) exceeds the (1 − α)th quantile of the χ² distribution with pk d.f. We can now devise a (GLRT-based) detector for the number k of non-circular sources that proceeds as follows [48]: Choose the PFA (e.g., α = 0.05) and iterate for k = 0, 1, 2, . . .
1) Calculate ℓn(k).
2) Set k̂ to the first value of k for which ℓn(k) ≤ χ²_{pk;1−α}, where pk = (d − k)(d − k + 1), i.e. iterate until the null hypothesis H0^(k) is not rejected.
The detector above is referred to as the EONC detector. We note that PFA α = 0.05 is a recommended choice. An estimator based on the minimum description length (MDL) principle was also derived in [48].
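To make the procedure concrete, the statistic ℓn(k) and the EONC detector can be sketched as follows (a minimal sketch; the function names are illustrative, the data are assumed zero-mean, and `scipy.stats.chi2` supplies the quantiles):

```python
import numpy as np
from scipy.stats import chi2

def circularity_glrt(Z, k):
    """GLRT statistic for H0^(k): the d-k smallest circularity coeffs vanish.

    Z : (n, d) complex array of i.i.d. zero-mean observations.
    Returns (statistic, degrees of freedom p_k).
    """
    n, d = Z.shape
    C = Z.conj().T @ Z / n                  # sample covariance matrix
    P = Z.T @ Z / n                         # sample pseudo-covariance matrix
    R = np.linalg.solve(C, P)               # R = C^{-1} P
    # circularity coefficients: square roots of the eigenvalues of R R*
    lam = np.sqrt(np.sort(np.abs(np.linalg.eigvals(R @ R.conj())))[::-1])
    lam = np.clip(lam, 0.0, 1.0 - 1e-12)    # guard the logarithm below
    stat = -n * np.sum(np.log1p(-lam[k:] ** 2))
    dof = (d - k) * (d - k + 1)
    return stat, dof

def eonc_detector(Z, alpha=0.05):
    """EONC detector: smallest k for which H0^(k) is not rejected."""
    d = Z.shape[1]
    for k in range(d):
        stat, dof = circularity_glrt(Z, k)
        if stat <= chi2.ppf(1.0 - alpha, dof):
            return k
    return d
```

The small-sample multiplier (n − d) or the adjustment factor δn of (21) could be applied to the statistic before thresholding; both are omitted here for brevity.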

VI. COMPLEX-VALUED INDEPENDENT COMPONENT ANALYSIS (ICA)

The complex-valued ICA model is needed for source separation of complex-valued data arising e.g. in magnetic resonance imaging or in antenna array signal processing of communications and radar signals. We assume that the observed r.v. z = (z1, . . . , zd)^T follows the complex-valued ICA model z = As, where the r.v. s = (s1, . . . , sd)^T contains the statistically independent unobserved complex sources (or IC's) and A represents the unknown complex d × d invertible mixing matrix. Each source r.v. si is assumed to be non-degenerate, at most one source can possess a circular CN density, and there cannot be two complex CN sources having the same circularity coefficient (i.e. ϱ(si) ≠ ϱ(sj) for all i ≠ j). Under these assumptions, the goal of ICA is to estimate the demixing matrix W = A⁻¹ (or, separating matrix) giving the source r.v. as the inverse linear transform s = Wz. It should be noted that under the above assumptions it is possible to identify A (resp. W) up to scaling, phase-shift and permutation of its column (resp. row) vectors [20]. The row vectors wi^H of W = (w1 · · · wd)^H are called the demixing vectors and si = wi^H z, i = 1, . . . , d.

There are in general two main approaches to solving the ICA problem: optimization of a contrast function, and algebraic methods. These two approaches are discussed briefly herein. For example, ML-estimation [49], [50] and complex FastICA [51], [52] belong to the former group and the FOBI method [53] and the DOGMA method [31] to the latter group. These methods will be described next.

A. ML approach to ICA

Let us first discuss the ML estimation approach studied e.g. in [49] and [50]. This leads to minimization of the negative log-likelihood function, namely, a real-valued function of a complex-valued matrix parameter. Suppose that z = x + jy follows the ICA model as described above. Then it is easy to verify that the p.d.f. f(z) = f(x, y) of z = x + jy = As can be written as

    f(z) = |det(W) det(W∗)| g(Wz),

where g(s) = g(u, v) = ∏_{i=1}^{d} gi(si) is the p.d.f. of the source r.v. s = u + jv and gi(s) = gi(u, v) denotes the p.d.f. of the ith source r.v. si. Then the negative log-likelihood function becomes

    ℓ(W, W∗) ≜ −log f(z, z∗) = −log |det(W) det(W∗)| − Σ_{j=1}^{d} log gj(wj^H z, wj^T z∗),   (23)

where we have expressed the function ℓ(W) using the dual notation ℓ(W, W∗) for optimization purposes. Straightforward use of the CR-calculus and the related matrix differential rules such as (8) gives

    ∂ℓ/∂W∗ = −W^(−H) + ϕ(Wz)z^H,   (24)

where ϕ(s) ≜ ϕ(s, s∗) = (ϕ1(s1), . . . , ϕd(sd))^T and

    ϕj(s) ≜ −∂ log gj(s, s∗)/∂s∗
          = −(1/2)[ ∂ log gj(u, v)/∂u + j ∂ log gj(u, v)/∂v ]   (25)

is the score function of the jth source variable, j = 1, . . . , d. Suppose now that z1, . . . , zn is an i.i.d. random sample from the ICA model. Then (1/n) times the negative log-likelihood function of the sample is L(W) = (1/n) Σ_{i=1}^{n} ℓi(W), where ℓi denotes the log-likelihood function (23) of the ith sample vector zi. Now using (24) the gradient for the sample can be expressed as

    ∂L/∂W∗ = −W^(−H) + (1/n) Σ_{i=1}^{n} ϕ(Wzi)zi^H.

The ML estimator of the demixing matrix, defined as the minimizer of the negative log-likelihood function of the sample, Ŵ_ML = arg min_W L(W), satisfies the estimating equation

    ∂L/∂W∗ = 0  ⇔  (1/n) Σ_{i=1}^{n} ϕ(Wzi)(Wzi)^H = I,   (26)

that is, the sample covariance between the estimated sources Ŵ_ML zi and their score vectors ϕ(Ŵ_ML zi) is equal to an identity matrix. This is a natural criterion since, using a similar approach as in the real case [54, Lemma 1], we have that

    E[si ϕj(sj)∗] = 1 if i = j, and 0 if i ≠ j,

where si and sj are the source variables and ϕj the score function (25) of the jth source. The ML-estimate can be found using e.g. the extension of the relative gradient algorithm to the complex case, which iterates, for small enough step-size μ [50],

    W ← W − μ (∂L/∂W∗) W^H W

      = W + μ [ I − (1/n) Σ_{i=1}^{n} ϕ(Wzi)(Wzi)^H ] W,

until convergence. Comparing the above update rule and the estimating equation (26), one observes that the algorithm stops when the current iterate of W satisfies the estimating equations (26) up to some predetermined numerical accuracy, which can be measured by any matrix norm.

B. Complex FastICA

The complex FastICA algorithm [51], [52] is among the most used methods for solving the complex ICA problem. It is commonly formulated for whitened data, but herein we formulate the method without the unnecessary pre-whitening stage, as in [55]. Furthermore, we take a different approach compared to [51] and [52] and focus on deriving the FastICA estimating equations using the CR-calculus. The estimating equations reveal interesting new properties of the FastICA estimator as well as some surprising connections to algebraic methods such as FOBI and DOGMA.
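Before turning to the 1-unit approach, note that the relative-gradient ML iteration above admits a compact implementation. The sketch below is illustrative only: the score ϕ(s) = s/(2|s|), which corresponds to an assumed circular source density g(s) ∝ exp(−|s|), as well as the step size, iteration count and stopping tolerance, are ad hoc choices rather than those of [50]:

```python
import numpy as np

def relative_gradient_ica(Z, phi=lambda s: s / (2.0 * np.abs(s)),
                          mu=0.1, n_iter=500, tol=1e-6):
    """Relative-gradient iteration W <- W + mu * (I - (1/n) sum phi(y_i) y_i^H) W.

    Z   : (n, d) complex array of zero-mean observations; y_i = W z_i.
    phi : elementwise score of the assumed source density (illustrative default).
    """
    n, d = Z.shape
    W = np.eye(d, dtype=complex)
    for _ in range(n_iter):
        Y = Z @ W.T                                  # rows hold y_i^T = (W z_i)^T
        G = np.eye(d) - (phi(Y).T @ Y.conj()) / n    # I - (1/n) sum phi(y_i) y_i^H
        W = W + mu * (G @ W)
        if np.linalg.norm(G) < tol:                  # estimating eq. (26) met
            break
    return W
```

At a fixed point G ≈ 0, i.e. the sample covariance between the estimated sources and their scores equals the identity matrix, which is exactly the estimating equation (26).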

We start with a review of the FastICA 1-unit approach, in which only a single demixing vector w ∈ C^d is sought. To simplify the presentation, let us assume now that the r.v. z is centered so that E[z] = 0 holds. The 1-unit FastICA estimator (functional) w_{g,1}(F) is defined as a solution to the constrained optimization problem

    w1 = arg max_{w^H Cw = 1} | E[G(|w^H z|²)] |,   (27)

where G : R → R is a smooth, twice continuously differentiable even function and C = cov(z). The constraint on the scale of w is necessary for the problem to be well-posed, and the criterion w^H Cw = 1 is natural as it also implies that the found source s = w^H z is scaled to be of unit variance. Note that a more accurate notation would be to write w_{G,1} instead of w1 to emphasize the dependence of the solution on the choice of G: for example, the robustness properties of the estimator depend heavily on the choice of G. The 1-unit FastICA functional w1 is a vector w optimizing the Lagrangian function L(w) ≜ E[G(|w^H z|²)] − λ(w^H Cw − 1), where λ is the Lagrange multiplier. The FastICA functional w1 can be found as the solution of the estimating equation (obtained by calculating the differential of the Lagrangian w.r.t. w∗ using the CR-calculus and equating it to zero):

    E[g(|w^H z|²) zz^H] w − λCw = 0  ⇔  C⁻¹[Q(w)]w = λw,

where g denotes the derivative of G and

    Q(w) ≜ E[g(|w^H z|²) zz^H]

is a d × d positive definite Hermitian (PDH) matrix. Multiplying both sides of the equation by w^H C from the left and recalling the constraint w^H Cw = 1 shows that

    λ = E[g(|w^H z|²)|w^H z|²] = w^H [Q(w)]w,

and thus the Lagrange multiplier depends on g(·) and on the solution w as well. To conclude, the 1-unit FastICA functional w1 solves the following estimating equation:

    C⁻¹[Q(w)]w = (w^H [Q(w)]w)w.   (28)

In other words, the 1-unit FastICA solution w1 is the eigenvector of C⁻¹Q(w1) with the corresponding eigenvalue given by λ = w1^H [Q(w1)]w1, where the eigenvector is assumed to be standardized to satisfy w^H Cw = 1 (instead of the conventional unit norm standardization). The estimating equation reveals the connection with the DOGMA method, which will be elaborated in the next section. Define the scalar

    a(w) ≜ E[g(|w^H z|²)] − E[g′(|w^H z|²)|w^H z|²].

With the above notation, the constrained 1-unit FastICA optimization problem can be solved by the 1-unit FastICA algorithm [51] (assuming circular sources) that finds w1 by iterating the steps

    Step 1. w ← C⁻¹[Q(w)]w − [a(w)]w
    Step 2. w ← w / (w^H Cw)^{1/2}

until convergence, i.e. until the current and previously found vectors are practically parallel. Note that herein we have presented the FastICA algorithm without the (unnecessary) whitening step. Observe now that the FastICA algorithm can be interpreted as a modified power method for finding an eigenvector: the first term in the update in Step 1 is a regular power method update, and the second term a(w)w can be interpreted as a correction factor adjusting the direction in order to find the eigenvector corresponding to the optimum of E[G(|w^H z|²)].

Extracting more than a single source is possible using a deflationary approach in which another source is found using additional uncorrelatedness constraints with the previously extracted sources. To be more specific, at the kth deflation stage, a constraint of orthogonality of w measured w.r.t. C with the previously found FastICA demixing vector estimators w1, . . . , w_{k−1} is imposed, i.e.

    w^H Cwi = 0  for i = 1, . . . , k − 1.   (29)

This means that w yields a projection s = w^H z that is uncorrelated with the previously found sources wi^H z, i = 1, . . . , k − 1. The kth FastICA functional wk is defined as

    wk = arg max_{w^H Cw = 1} | E[G(|w^H z|²)] |  subject to (29).

The Lagrangian of the problem is L(w) = E[G(|w^H z|²)] − λk(w^H Cw − 1) − Σ_{j=1}^{k−1} λj w^H Cwj, where λ1, . . . , λk are the Lagrange multipliers. Again, using the same approach as in the 1-unit case (i.e. equating the differential of the Lagrangian w.r.t. w∗ obtained via the CR-calculus to zero and recalling the constraints) shows that the kth FastICA functional solves the estimating equation:

    ( C⁻¹ − Σ_{j=1}^{k−1} wj wj^H ) [Q(w)]w = (w^H [Q(w)]w)w,

that is, the FastICA solution wk is the eigenvector of (C⁻¹ − Σ_{j=1}^{k−1} wj wj^H) Q(wk) with the corresponding eigenvalue λ = wk^H [Q(wk)]wk, where the eigenvector is standardized to satisfy w^H Cw = 1.

C. Algebraic methods

In the FOBI method [53], the demixing matrix is calculated algebraically from the matrix product of the inverse of the covariance matrix C = C(z) and the kurtosis matrix

    K ≜ E[(z^H C⁻¹z) zz^H],

where z is assumed to be centered. The kurtosis matrix has the following properties: (i) equivariance under invertible linear transformations, i.e. K(Gz) = GK(z)G^H for any non-singular d × d matrix G (i.e. it is a scatter matrix [31]), and (ii) the IC-property: if s has independent components of zero mean, then K(s) reduces to a diagonal matrix. Let γi = γ(si) denote the normalized 4th-order moment (11) of the ith source si, i = 1, . . . , d. Then, provided that γi ≠ γj holds for all i ≠ j, it can be shown that [31, Theorem 4]

    [C⁻¹K] W^H = W^H diag(γ1 + d − 1, . . . , γd + d − 1),

that is, the demixing vectors w1, . . . , wd are the eigenvectors of the matrix C⁻¹K and the corresponding eigenvalues are γi + d − 1 (i = 1, . . . , d). The eigenvectors ei of the matrix C⁻¹K are then called the FOBI demixing vector estimators (i = 1, . . . , d). Now recall that the 1-unit FastICA demixing estimator w is an eigenvector of the matrix C⁻¹E[g(|w^H z|²)zz^H]. The difference is that the latter equation is implicit, since the matrix E[g(|w^H z|²)zz^H] depends on the eigenvector w as well. In the case of FOBI, the eigenvectors of C⁻¹E[(z^H C⁻¹z)zz^H] can be calculated explicitly via any standard eigenvector extraction routine. FOBI is arguably among the simplest and computationally most efficient methods for solving the ICA problem.

The FOBI method was generalized in [31] by showing that any two scatter matrices with the IC-property, say C and K, can be used in place of the covariance matrix C and the kurtosis matrix K in obtaining a demixing matrix estimator. Furthermore, in [31] it was shown that the assumption of affine equivariance of the second scatter matrix can be relaxed to equivariance under unitary linear transformations. For example, two distinct robust complex M-estimators of scatter can be used in obtaining a robust demixing matrix estimator for ICA. We refer the reader to [31] for examples and details.

VII. CONCLUSIONS AND DISCUSSIONS

Complex random signals play an increasingly important role in array, communications, and biomedical signal processing and related fields. The wider deployment of complex-valued signal processing is often hindered by the fact that the concepts, tools and algorithms for handling complex-valued signals are lacking, or are simply too scattered in the literature. Due to extensive research in this area during the past few years, as reviewed here, these obstacles no longer exist, or are at least less pronounced.
We also wish to point out that due to lack of space, several important topics in complex-valued signal processing were not discussed here, for example widely linear processing and filters; see the seminal papers [13], [15] and the recent textbook [2].

ACKNOWLEDGMENT

Esa Ollila and Visa Koivunen would like to thank the Academy of Finland for supporting their research. The paper was prepared in part under the support of the U.S. Office of Naval Research under Grant N00014-09-1-0342.

REFERENCES

[1] P. Schreier and L. Scharf, Statistical Signal Processing of Complex-Valued Data: The Theory of Improper and Noncircular Signals. Cambridge, UK: Cambridge University Press, 2010.
[2] D. P. Mandic and V. S. L. Goh, Complex Valued Nonlinear Adaptive Filters: Noncircularity, Widely Linear and Neural Models. New York: Wiley, 2009.
[3] K. Kreutz-Delgado, “The complex gradient operator and the CR-calculus,” Lecture Notes Supplement [online], 2007.
[4] J. Eriksson, E. Ollila, and V. Koivunen, “Essential statistics and tools for complex random variables,” IEEE Trans. Signal Processing, vol. 58, no. 10, pp. 5400–5408, 2010.
[5] R. Remmert, Theory of Complex Functions. New York: Springer, 1991.

[6] H. Li and T. Adali, “A class of complex ICA algorithms based on the kurtosis cost function,” IEEE Trans. Neural Networks, vol. 19, no. 3, pp. 408–420, 2008.
[7] E. Ollila, “Contributions to independent component analysis, sensor array and complex-valued signal processing,” Ph.D. dissertation, Aalto University School of Science and Technology, Espoo, Finland, 2011, ISBN 978-952-60-3030-2.
[8] S. M. Kay, Fundamentals of Statistical Signal Processing. Upper Saddle River, NJ: Prentice-Hall, 1993.
[9] D. H. Brandwood, “A complex gradient operator and its applications in adaptive array theory,” IEE Proc. F and H, vol. 1, pp. 11–16, 1983.
[10] A. van den Bos, “Complex gradient and Hessian,” IEE Proc.-Vis. Image Signal Process., vol. 141, no. 6, pp. 380–382, 1994.
[11] K. Kreutz-Delgado and Y. Isukapalli, “Use of the Newton method for blind adaptive equalization based on the constant modulus algorithm,” IEEE Trans. Signal Processing, vol. 56, no. 8, pp. 3983–3995, 2008.
[12] J. Eriksson, E. Ollila, and V. Koivunen, “Statistics for complex random variables revisited,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP’09), Taipei, Taiwan, 2009, pp. 3565–3568.
[13] B. Picinbono and P. Chevalier, “Widely linear estimation with complex data,” IEEE Trans. Signal Processing, vol. 43, no. 8, pp. 2030–2033, 1995.
[14] A. Hjorungnes and D. Gesbert, “Complex-valued matrix differentiation: Techniques and key results,” IEEE Trans. Signal Processing, vol. 55, pp. 2740–2746, 2007.
[15] B. Picinbono, “On circularity,” IEEE Trans. Signal Processing, vol. 42, no. 12, pp. 3473–3482, 1994.
[16] K.-T. Fang, S. Kotz, and K. W. Ng, Symmetric Multivariate and Related Distributions. London: Chapman and Hall, 1990.
[17] E. Ollila, “On the circularity of a complex random variable,” IEEE Signal Processing Letters, vol. 15, pp. 841–844, 2008.
[18] F. D. Neeser and J. L. Massey, “Proper complex random processes with applications to information theory,” IEEE Trans. Inform.
Theory, vol. 39, no. 4, pp. 1293–1302, 1993.
[19] P. J. Schreier and L. L. Scharf, “Second-order analysis of improper complex random vectors and processes,” IEEE Trans. Signal Processing, vol. 51, no. 3, pp. 714–725, 2003.
[20] J. Eriksson and V. Koivunen, “Complex random vectors and ICA models: Identifiability, uniqueness and separability,” IEEE Trans. Inform. Theory, vol. 52, no. 3, pp. 1017–1029, 2006.
[21] J.-P. Delmas and H. Abeida, “Asymptotic distribution of circularity coefficients estimate of complex random variables,” Signal Processing, vol. 89, no. 12, pp. 2670–2675, 2009.
[22] E. Ollila, J. Eriksson, and V. Koivunen, “Complex elliptically symmetric random variables – generation, characterization, and circularity tests,” IEEE Trans. Signal Processing, vol. 59, no. 1, pp. 58–69, 2011.
[23] P. Amblard, M. Gaeta, and L. Lacoume, “Statistics for complex variables and signals - part I: variables,” Signal Processing, vol. 53, no. 1, pp. 1–13, 1996.
[24] S. C. Douglas, “Fixed-point algorithms for the blind separation of arbitrary complex-valued non-Gaussian signal mixtures,” EURASIP J. Advances in Signal Processing, vol. 2007, no. 1, pp. 83–83, 2007.
[25] E. Ollila and V. Koivunen, “Robust estimation techniques for complex-valued random vectors,” in Adaptive Signal Processing: Next Generation Solutions, T. Adali and S. Haykin, Eds. New York: Wiley, 2010.
[26] B. Picinbono, “Second order complex random vectors and normal distributions,” IEEE Trans. Signal Processing, vol. 44, no. 10, pp. 2637–2640, 1996.
[27] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, UK: Cambridge University Press, 1985.
[28] E. Ollila and V. Koivunen, “Complex ICA using generalized uncorrelating transform,” Signal Processing, vol. 89, no. 5, pp. 365–377, 2009.
[29] P. J. Schreier, L. L. Scharf, and A. Hanssen, “A generalized likelihood ratio test for impropriety of complex signals,” IEEE Signal Processing Letters, vol. 13, no. 7, pp. 433–436, 2006.
[30] J. Eriksson and V.
Koivunen, “Complex-valued ICA using second order statistics,” in Proc. IEEE Workshop on Machine Learning for Signal Processing (MLSP’04), Sao Luis, Brazil, 2004.
[31] E. Ollila, H. Oja, and V. Koivunen, “Complex-valued ICA based on a pair of generalized covariance matrices,” Comp. Stat. & Data Anal., vol. 52, no. 7, pp. 3789–3805, 2008.
[32] A. van den Bos, “A Cramér-Rao lower bound for complex parameters,” IEEE Trans. Signal Processing, vol. 42, no. 10, p. 2859, 1994.

[33] E. de Carvalho, J. Cioffi, and D. T. M. Slock, “Cramér-Rao bounds for blind multichannel estimation,” in Proc. IEEE Global Telecommunications Conference, San Francisco, CA, USA, Nov. 27 – Dec. 1, 2000.
[34] A. K. Jagannatham and B. D. Rao, “Cramér-Rao lower bound for constrained complex parameters,” IEEE Signal Proc. Lett., vol. 11, no. 11, pp. 875–878, 2004.
[35] S. T. Smith, “Statistical resolution limits and the complexified Cramér-Rao bound,” IEEE Trans. Signal Processing, vol. 53, no. 5, pp. 1597–1609, 2005.
[36] P. R. Krishnaiah and J. Lin, “Complex elliptically symmetric distributions,” Comm. Statist. - Th. and Meth., vol. 15, pp. 3693–3718, 1986.
[37] C. G. Khatri and C. D. Bhavsar, “Some asymptotic inferential problems connected with complex elliptical distribution,” J. Mult. Anal., vol. 35, pp. 66–85, 1990.
[38] E. Ollila and V. Koivunen, “Influence function and asymptotic efficiency of scatter matrix based array processors: Case MVDR beamformer,” IEEE Trans. Signal Processing, vol. 57, no. 1, pp. 247–259, 2009.
[39] ——, “Robust antenna array processing using M-estimators of pseudo-covariance,” in Proc. 14th IEEE Int. Symp. on Personal, Indoor and Mobile Radio Comm. (PIMRC’03), Beijing, China, Sep. 7–10, 2003, pp. 2659–2663.
[40] A. van den Bos, “The multivariate complex normal distribution - a generalization,” IEEE Trans. Inform. Theory, vol. 41, no. 2, pp. 537–539, 1995.
[41] E. Ollila and V. Koivunen, “Generalized complex elliptical distributions,” in Proc. Third IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM’04), Barcelona, Spain, Jun. 18–21, 2004, pp. 460–464.
[42] P. J. Schreier, L. L. Scharf, and C. T. Mullis, “Detection and estimation of improper complex random signals,” IEEE Trans. Inform. Theory, vol. 51, no. 1, pp. 306–312, 2005.
[43] A. T. Walden and P. Rubin-Delanchy, “On testing for impropriety of complex-valued Gaussian vectors,” IEEE Trans. Signal Processing, vol. 57, no. 3, pp. 835–842, 2009.
[44] M.
Novey, T. Adali, and A. Roy, “Circularity and Gaussianity detection using the complex generalized Gaussian distribution,” IEEE Signal Processing Lett., vol. 16, no. 11, pp. 993–996, 2009.
[45] E. Ollila and V. Koivunen, “Adjusting the generalized likelihood ratio test of circularity robust to non-normality,” in Proc. IEEE Workshop on Signal Processing Advances in Wireless Communications (SPAWC’09), Perugia, Italy, Jun. 21–24, 2009, pp. 558–562.
[46] J. Delmas, A. Oukaci, and P. Chevalier, “Asymptotic distribution of GLR for impropriety of complex signals,” in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP’10), Dallas, TX, USA, 2010, pp. 3594–3597.
[47] E. Ollila, V. Koivunen, and H. V. Poor, “A robust estimator and detector of circularity of complex signals,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP’11), Prague, Czech Republic, 2011.
[48] M. Novey, E. Ollila, and T. Adalı, “On testing the extent of noncircularity,” IEEE Trans. Signal Processing, 2011 (to appear).
[49] J.-F. Cardoso and T. Adalı, “The maximum likelihood approach to complex ICA,” in Proc. Int. Conf. Acoustics, Speech and Signal Processing (ICASSP’06), vol. V, Toulouse, France, 2006, pp. 673–676.
[50] T. Adalı, H. Li, M. Novey, and J.-F. Cardoso, “Complex ICA using nonlinear functions,” IEEE Trans. Signal Processing, vol. 56, no. 9, pp. 4536–4544, 2008.
[51] E. Bingham and A. Hyvärinen, “A fast fixed-point algorithm for independent component analysis of complex-valued signals,” Int. J. of Neural Systems, vol. 10, no. 1, pp. 1–8, 2000.
[52] M. Novey and T. Adali, “On extending the complex FastICA algorithm to noncircular sources,” IEEE Trans. Signal Processing, vol. 56, no. 5, pp. 2148–2154, 2008.
[53] J. F. Cardoso, “Source separation using higher order moments,” in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP’89), Glasgow, UK, 1989, pp. 2109–2112.
[54] E. Ollila, H.-J. Kim, and V.
Koivunen, “Compact Cramér-Rao bound expression for independent component analysis,” IEEE Trans. Signal Processing, vol. 56, no. 4, pp. 1421–1428, April 2008.
[55] E. Ollila, “The deflation-based FastICA estimator: statistical analysis revisited,” IEEE Trans. Signal Processing, vol. 58, no. 3, pp. 1527–1541, 2010.