IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 28, NO. 10, OCTOBER 2017


Lagrange Programming Neural Network for Nondifferentiable Optimization Problems in Sparse Approximation

Ruibin Feng, Chi-Sing Leung, Senior Member, IEEE, Anthony G. Constantinides, Life Fellow, IEEE, and Wen-Jun Zeng

Abstract— The major limitation of the Lagrange programming neural network (LPNN) approach is that the objective function and the constraints should be twice differentiable. Since sparse approximation involves nondifferentiable functions, the original LPNN approach is not suitable for recovering sparse signals. This paper proposes a new formulation of the LPNN approach based on the concept of the locally competitive algorithm (LCA). Unlike the classical LCA, which can solve unconstrained optimization problems only, the proposed LPNN approach can solve constrained optimization problems. Two problems in sparse approximation are considered: basis pursuit (BP) and constrained BP denoise (CBPDN). We propose two LPNN models, namely, BP-LPNN and CBPDN-LPNN, to solve these two problems. For both models, we show that the equilibrium points of the models are the optimal solutions of the two problems, and that the optimal solutions of the two problems are the equilibrium points of the models. In addition, the equilibrium points are stable. Simulations are carried out to verify the effectiveness of the two LPNN models.

Index Terms— Lagrange programming neural networks (LPNNs), locally competitive algorithm (LCA), optimization.

I. INTRODUCTION

USING analog neural networks to solve nonlinear constrained optimization problems has been studied for decades [1]–[5]. The analog neural network approach is particularly effective when real-time solutions are required [1]–[3], [6]. The use of neural networks for optimization dates back at least to the 1980s [2], [7]. In [7], the Hopfield model was demonstrated to solve several optimization problems. In [3], a canonical nonlinear programming circuit was proposed to solve nonlinear programming problems with inequality constraints. A recurrent neural network model [8] was proposed for quadratic optimization with bound constraints, and its convergence proof was given in [9].

Manuscript received November 27, 2015; revised May 28, 2015, January 12, 2016, and April 9, 2016; accepted May 19, 2016. Date of publication November 27, 2015; date of current version September 15, 2017. This work was supported by the Research Grants Council, Hong Kong, under Grant CityU 115612. R. Feng, C.-S. Leung, and W.-J. Zeng are with the Department of Electronic Engineering, City University of Hong Kong, Hong Kong (e-mail: rfeng4-c@my.cityu.edu.hk; [email protected]; [email protected]). A. G. Constantinides is with Imperial College London, London SW7 2AZ, U.K. (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNNLS.2016.2575860

Based on the concept of variational inequalities [10]–[12], a number of projection neural network models [13]–[16] for constrained optimization problems were proposed, in which a projection circuit is required. For simple constraints, such as a box set, the projection circuit is very simple. However, when complicated constraints are considered, the projection circuit is difficult to implement. Recently, a projection neural network model [17] was proposed for handling complex variables. In [18]–[20], the concept of projection neural networks was extended to handle l1-norm problems. When we use these models (in which the objective function contains an l1-norm term), the number of neurons is doubled.

Many existing models are designed for solving a particular form of optimization problem. For example, in [5], the model was designed for the quadratic programming problem with equality constraints. The Lagrange programming neural network (LPNN) approach [21]–[25] provides a general framework for solving various nonlinear constrained optimization problems. Furthermore, with the augmented term concept, the LPNN approach is able to solve nonconvex optimization problems. Although the LPNN approach was developed in the early 1990s, the formal proof [26], [27] of its global convergence for convex problems was given only in the early 2000s. Recently, some new applications of the LPNN approach [24], [25], including target localization in multiple-input multiple-output (MIMO) systems and waveform design in radar systems, were reported. In these signal processing applications, the optimization problems are nonconvex, and the LPNN approach is superior to the traditional numerical approaches. However, the major limitation of the LPNN approach is that it cannot handle nondifferentiable objective functions and constraints.

Sparse approximation [28], [29] aims at recovering an unknown sparse signal. Its objective function or constraint is usually nondifferentiable.
There are many digital (numerical) algorithms for sparse approximation. For example, log-barrier and spectral projected gradient (SPG) [30] are two representative methods in some sparse approximation packages [31], [32]. In [33], the locally competitive algorithm (LCA) was proposed for solving the unconstrained basis pursuit denoise problem [34]. Unlike conventional models, the LCA is difficult to analyze because its activation function is nonsmooth, unbounded, and not strictly increasing. The LCA



properties were reported in [35] and [36]. To circumvent the differentiability requirement, the LCA introduces the internal state concept and then defines the dynamics on the internal states. However, the LCA is able to handle unconstrained problems only, while in sparse approximation there are many constrained problems.

There are some neural-based optimization algorithms for nondifferentiable functions [37]–[40]. In these algorithms, the dynamics are directly related to the subdifferential of the objective function. However, there is no discussion of how to select an appropriate subgradient when the system state is at a nondifferentiable point. In addition, choosing a fixed subgradient is not an option, because doing so cannot ensure that an optimal solution is a stable equilibrium.

This paper develops two LPNN models for handling two sparse approximation problems: BP and constrained BP denoise (CBPDN). Since the l1-norm term in these problems is not differentiable, we adopt the LCA concept to circumvent the differentiability requirement. We propose two models: BP-LPNN for the BP problem and CBPDN-LPNN for the CBPDN problem. For these two models, we show that the equilibrium points of the networks are the optimal solutions of the two problems. Furthermore, we show that the equilibrium points are stable.

This paper is organized as follows. Section II presents the background of sparse approximation and LPNNs. Sections III and IV present the BP-LPNN model and the CBPDN-LPNN model, respectively. Section V presents our simulation results. Section VI discusses some existing analog network algorithms and how our approach can handle other sparse approximation problems. Section VII concludes our results.

II. BACKGROUND

A. Subdifferential, Sparse Approximation, and LCA

The definition of the subdifferential [41], [42] is given by the following.

Definition 1: Let f : R^n → R be a convex function. A vector ρ is a subgradient of f at x ∈ dom f if

f(y) ≥ f(x) + ρ^T (y − x), ∀ y ∈ dom f.   (1)

Definition 2: The subdifferential ∂f(x) at x is the set of all subgradients, given by

∂f(x) = {ρ : ρ^T (y − x) ≤ f(y) − f(x), ∀ y ∈ dom f}.   (2)

We use two examples to explain the subdifferential concept.

Example 1: For f(x) = |x|, x ∈ R, we have

∂|x| = sign(x) for x ≠ 0, and ∂|x| = [−1, 1] for x = 0.   (3)

Example 2: For f(x) = ‖x‖₁, x ∈ R^n, we have

∂‖x‖₁ = [∂|x_1|, …, ∂|x_n|]^T.   (4)

In sparse approximation [28], [29], we need to estimate an unknown sparse vector x ∈ R^n from the observation b = Φx, where b ∈ R^m is the observation vector and Φ ∈ R^{m×n} is the measurement matrix with rank m, m < n. The estimation problem can be stated as

min_x ‖x‖₀, s.t. Φx = b.   (5)

Unfortunately, the problem stated in (5) is NP-hard. Therefore, we usually replace the l0-norm with the l1-norm. The problem then becomes the well-known BP problem [34], [43]:

min_x ‖x‖₁, s.t. Φx = b.   (6)

When residue is allowed in the constraint of (6), the problem becomes the CBPDN:

min_x ‖x‖₁, s.t. ‖b − Φx‖₂² ≤ ϵ   (7)

where ϵ > 0. In many situations, the measurement process may contain noise, given by b = Φx + ε, where ε = [ε_1, …, ε_m]^T and the ε_i's are independent and identically distributed random variables with zero mean and variance σ². In this case, we can set ϵ = mσ². That means, (7) becomes

min_x ‖x‖₁, s.t. ‖b − Φx‖₂² ≤ mσ².   (8)

An LCA network [33], [35], [36] contains n neurons. Their internal states and outputs are denoted by u and x, respectively. The LCA aims at minimizing the following objective function:

L_lca = (1/2)‖b − Φx‖₂² + κ‖x‖₁   (9)

where κ is a tradeoff parameter. The mapping from u to x is defined by a threshold function, given by

x_i = T_κ(u_i) = 0 for |u_i| ≤ κ, and x_i = u_i − κ sign(u_i) for |u_i| > κ.   (10)

The LCA embeds the tradeoff parameter into the threshold function T_κ(·). Some threshold functions with different values of κ are shown in Fig. 1.

Fig. 1. Threshold function.

For |u_i| > κ, the mapping from u_i to x_i is one-to-one. For |u_i| ≤ κ, the mapping is many-to-one. In [33] and [36], the relationship among ∂|x_i|, x_i, and u_i is established. For x_i ≠ 0, the inverse mapping T_κ⁻¹ is one-to-one. For x_i = 0, the inverse mapping is one-to-many, i.e., T_κ⁻¹(0) is equal to a set, given by [−κ, κ]. From (10), given an x_i not equal to zero, u_i − x_i = κ sign(u_i) = κ sign(x_i). Also, from (10), given x_i = 0, u_i belongs to the set [−κ, κ]. On the other hand, from (3), given an x_i, ∂|x_i| = sign(x_i) for x_i ≠ 0, and ∂|x_i| = [−1, 1] for x_i = 0. Hence, given an x, we have

u − x = κ ∂‖x‖₁.   (11)
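The threshold mapping (10) and the subgradient relation (11) are easy to check numerically. The following is an illustrative sketch (our own, not code from the paper; NumPy assumed): it implements T_κ and verifies that (u − x)/κ is a valid subgradient of ‖x‖₁ at x = T_κ(u).

```python
import numpy as np

def soft_threshold(u, kappa):
    """Elementwise threshold function T_kappa of (10):
    0 for |u_i| <= kappa, else u_i - kappa*sign(u_i)."""
    return np.where(np.abs(u) <= kappa, 0.0, u - kappa * np.sign(u))

# Relation (11): for any internal state u, the vector (u - x)/kappa is a
# valid subgradient of ||x||_1 at x = T_kappa(u): it equals sign(x_i)
# where x_i != 0, and lies in [-1, 1] where x_i = 0.
kappa = 0.5
u = np.array([1.2, -0.3, 0.5, -2.0])
x = soft_threshold(u, kappa)          # [0.7, 0.0, 0.0, -1.5]
g = (u - x) / kappa
active = x != 0
assert np.allclose(g[active], np.sign(x[active]))
assert np.all(np.abs(g[~active]) <= 1.0)
```

This is exactly why the LCA state u is useful: u − x is always one concrete, implementable member of the set κ∂‖x‖₁.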


If κ = 1, then u − x = ∂‖x‖₁. Of course, given a known u,

(u − x) ∈ ∂‖x‖₁.   (12)

The generalized gradient of (9) is given by

∂_x L_lca = κ∂‖x‖₁ − Φ^T(b − Φx).   (13)

The LCA defines the dynamics on u, given by

du/dt = −∂_x L_lca = −κ∂‖x‖₁ + Φ^T(b − Φx).   (14)

Practically, it is impossible to implement ∂‖x‖₁, since ∂‖x‖₁ may be equal to a set. However, by introducing the internal state u, from (11), we can replace κ∂‖x‖₁ with u − x. Then, the dynamics become

du/dt = −u + x + Φ^T(b − Φx).   (15)

The advantage of introducing the internal state is that we do not need to implement ∂‖x‖₁ directly. However, the limitation of the LCA is that it is designed for solving unconstrained optimization problems only. One may suggest directly implementing dx/dt = −∂_x L_lca. However, in this direct approach, we face the subgradient selection problem, which will be discussed in Section V-A.

B. LPNN

The LPNN approach is able to solve a general nonlinear constrained optimization problem given by

min_x f(x), s.t. h(x) = 0   (16)

where f : R^n → R is the objective function and h : R^n → R^m (m < n) describes the m equality constraints. The two functions f and h are assumed to be twice differentiable. In the LPNN approach, we first set up a Lagrangian function, given by

L_ep = f(x) + λ^T h(x)   (17)

where λ = [λ_1, …, λ_m]^T is the Lagrange multiplier vector. There are two kinds of neurons: variable neurons and Lagrange neurons. The variable neurons hold the variable vector x, and the Lagrange neurons hold the Lagrange multiplier vector λ. The dynamics of the neurons are given by

τ₀ dx/dt = −∂L_ep/∂x, and τ₀ dλ/dt = ∂L_ep/∂λ   (18)

where τ₀ is the time constant of the circuit. Without loss of generality, we take τ₀ = 1. With (18), the network settles down at a stable state [21] if it satisfies some mild conditions. Although the LPNN approach provides a general framework for various kinds of optimization problems, the objective function and the constraints should be differentiable.

III. BP-LPNN

A. Properties of the BP Problem

Recall that the BP problem is given by

min_x ‖x‖₁, s.t. Φx = b.   (19)

The BP problem is nondifferentiable but convex. We have the following result for the BP problem [41], [44]–[47].

Proposition 1: A point x* is an optimal solution of the BP problem, if and only if, there exists a λ* (Lagrange multiplier vector) such that

0 ∈ ∂‖x*‖₁ + Φ^T λ*   (20a)
Φx* = b.   (20b)

In Proposition 1, (20) summarizes the Karush–Kuhn–Tucker (KKT) conditions. Since the problem is convex, the KKT conditions are sufficient and necessary.

B. BP-LPNN Dynamics

One may suggest constructing the BP-LPNN dynamics from the Lagrangian function L_bp = ‖x‖₁ + λ^T(b − Φx). However, this Lagrangian is a first-order function of x, and it may create a stability problem around an equilibrium. To resolve this, we introduce an augmented term (1/2)‖b − Φx‖₂² into the Lagrangian function:

L_bp = ‖x‖₁ + (1/2)‖b − Φx‖₂² + λ^T(b − Φx).   (21)

Introducing the augmented term does not affect the objective value at an equilibrium x*, because b − Φx* = 0. The gradients of L_bp are given by

∂_x L_bp = ∂‖x‖₁ − Φ^T(b − Φx) − Φ^T λ   (22a)
∂L_bp/∂λ = b − Φx.   (22b)

Following the concept of the LCA, we introduce the internal state vector u. The relationship between u and x is given by

x_i = T₁(u_i) = 0 for |u_i| ≤ 1, and x_i = u_i − sign(u_i) for |u_i| > 1.

From (11), (18), and (22), we obtain the BP-LPNN dynamics

du/dt = −u + x + Φ^T(b − Φx) + Φ^T λ   (23a)
dλ/dt = b − Φx.   (23b)
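To illustrate how (23) behaves, the dynamics can be integrated with a simple forward-Euler scheme. The following is our own sketch under assumed settings (problem sizes, step size, and horizon are arbitrary choices, not values taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 32, 20, 3                    # signal length, measurements, sparsity
Phi = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)  # normalized +/-1 matrix
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.choice([-5.0, 5.0], size=k)
b = Phi @ x_true

def T1(u):                             # threshold function with kappa = 1
    return np.where(np.abs(u) <= 1.0, 0.0, u - np.sign(u))

u = np.zeros(n)                        # variable (internal-state) neurons
lam = np.zeros(m)                      # Lagrange neurons
dt = 0.01                              # Euler step in characteristic time units
for _ in range(200_000):
    x = T1(u)
    r = b - Phi @ x
    u = u + dt * (-u + x + Phi.T @ r + Phi.T @ lam)   # (23a)
    lam = lam + dt * r                                 # (23b)

err = np.max(np.abs(T1(u) - x_true))   # reconstruction error of the BP solution
```

Because a fixed point of the discrete update is exactly an equilibrium of (23), the recovered T₁(u) should approach the sparse generator of b whenever BP recovery holds for the drawn matrix; the step size and horizon above are untuned.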

C. Property of the Dynamics

For the BP-LPNN, we are interested in two issues. The first is whether an equilibrium point of (23) satisfies the KKT conditions of the BP problem. The second is whether the equilibrium is stable. The optimal solutions of the BP problem are in terms of x, while the equilibrium points of (23) involve the hidden state u. Hence, we need Theorem 1 to show explicitly that the equilibrium points are the optimal solutions.

Theorem 1: Let {u*, λ*} be an equilibrium point of the BP-LPNN dynamics (23). At the equilibrium point, the KKT conditions (20) of the BP problem are satisfied. Since the KKT conditions of the BP problem are necessary and sufficient, the equilibrium point of (23) is equivalent to the optimal solution of the BP problem.

Proof: According to the definition of equilibrium points,

du*/dt = 0, and dλ*/dt = 0.   (24)

From (23) and (24), we have

−u* + x* + Φ^T(b − Φx*) + Φ^T λ* = 0   (25a)
b − Φx* = 0.   (25b)

Clearly, (25b) is identical to (20b). With (25b), (25a) becomes

−u* + x* + Φ^T λ* = 0.   (26)

On the other hand, from (12),

−u* + x* + Φ^T λ* ∈ −∂‖x*‖₁ + Φ^T λ*.   (27)

Hence, from (26) and (27), we obtain

0 ∈ −∂‖x*‖₁ + Φ^T λ*.   (28)

That means, (20a) is also satisfied. In a similar way, one can prove that (20) leads to (25). The proof is completed. □

In order to discuss the stability, we introduce the concepts of active neurons and inactive neurons from [35] and [36].

Definition 1: For the active neurons, the magnitudes of their internal states u_i are greater than 1. The collection of indices of the active neurons is denoted by Γ = {i ∈ [1, n] : |u_i| > 1}. Also, we define Φ_Γ as the matrix composed of the columns of Φ indexed by Γ.

Definition 2: For the inactive neurons, the magnitudes of their internal states u_i are less than or equal to 1, and the corresponding outputs x_i are equal to 0. The collection of indices of the inactive neurons is denoted by Γᶜ = {i ∈ [1, n] : |u_i| ≤ 1}. Also, we define Φ_Γᶜ as the matrix composed of the columns of Φ indexed by Γᶜ.

Let {u*, x* = T₁(u*), λ*} be an equilibrium point (optimal solution). Furthermore, let Γ and Γᶜ be the active set and the inactive set of x*, respectively. We define two constants for them, given by

γ = min_{i∈Γ} |x*_i| and α = min_{i∈Γᶜ} (1 − |u*_i|).   (29)

Also, we introduce the notations ũ = u − u*, x̃ = x − x*, and λ̃ = λ − λ*, and three balls around the equilibrium point, given by

B_x = {(x̃, λ̃) : ‖x̃‖₂² + ‖λ̃‖₂² ≤ γ²}   (30)
B_u = {ũ_Γᶜ : ‖ũ_Γᶜ‖₂² ≤ α²}   (31)
B_λ = {(x̃, λ̃) : ‖x̃‖₂² + ‖λ̃‖₂² ≤ α²/(ς² + ω²)}.   (32)

We define two energy functions, given by

V₁(x̃(t), λ̃(t)) = ‖x̃(t)‖₂² + ‖λ̃(t)‖₂²   (33)
V₂(ũ_Γᶜ(t)) = ‖ũ_Γᶜ(t)‖₂²   (34)

where ς = ‖Φ_Γᶜ^T Φ_Γ‖₂ and ω = ‖Φ_Γᶜ^T‖₂.   (35)

Theorem 2 tells us about the stability of the equilibrium points.

Theorem 2: For the BP-LPNN model, when there is a small perturbation on an equilibrium point, the state converges to the optimal solution of the BP problem.

Proof: Since the proof is complicated, we give an overview first. The proof contains three parts. In Part 1, we show that if the initial state {x̃(0), λ̃(0)} is inside B_x, then for t ≥ 0, V₁(t) decreases with time, as long as there are no inactive neurons switching to be active. Furthermore, as V₁(t) < V₁(0) for t > 0, we have

‖x̃(t)‖₂² + ‖λ̃(t)‖₂² < γ², ∀ t > 0.   (36)

In Part 2, we show that if, in addition, ũ_Γᶜ(0) is inside B_u and {x̃(0), λ̃(0)} is inside B_x and B_λ, then for t > 0, there are no inactive neurons switching to be active. In the proof, we first show that for t > 0, ũ_Γᶜ(t) remains inside B_u. When ũ_Γᶜ(t) is inside B_u, we have

−α ≤ u_i(t) − u*_i ≤ α, ∀ i ∈ Γᶜ.   (37)

From (29), (37) becomes

−1 ≤ u_i(t) ≤ 1, ∀ i ∈ Γᶜ.   (38)

That means, there are no inactive neurons switching to be active. Based on Parts 1 and 2, in Part 3, we prove that if the initial state is close to an equilibrium point (optimal point), then lim_{t→∞} x̃(t) = 0, and λ̃(t) and ũ_Γᶜ(t) converge.

(Proof of Part 1): The dynamics of the inactive neurons and active neurons can be rewritten as

dũ_Γᶜ/dt = −ũ_Γᶜ(t) − Φ_Γᶜ^T Φ_Γ x̃_Γ(t) + Φ_Γᶜ^T λ̃(t)   (39)
dx̃_Γ/dt = −ũ_Γ(t) + x̃_Γ(t) − Φ_Γ^T Φ_Γ x̃_Γ(t) + Φ_Γ^T λ̃(t)   (40)
dλ̃/dt = −Φ_Γ x̃_Γ(t).   (41)

For a point {x̃(t), λ̃(t)} inside B_x,

dV₁/dt = 2 Σ_{i∈Γ} (−x̃_i(t) ũ_i(t) + x̃_i²(t)) − 2 x̃_Γ(t)^T Φ_Γ^T Φ_Γ x̃_Γ(t).   (42)

Notice that x̃_i and ũ_i have the same sign and |ũ_i| > |x̃_i|. Hence, we have dV₁/dt ≤ 0. In sparse approximation, we usually use random matrices as the measurement matrix, and the number of nonzero elements in a sparse solution is much less than the number m of measurements. Let n_a be the number of active neurons in


the equilibrium point, and n_a ≤ m. If rank(Φ_Γ) = n_a, then Φ_Γ^T Φ_Γ is positive definite and rank(Φ_Γ^T Φ_Γ) = n_a. From the fact [48] that for n_a ≤ m, when the elements of Φ are independent and identically distributed Gaussian random variables or ±1 random variables, the probability that rank(Φ_Γ) = n_a tends to 1. Hence, Φ_Γ^T Φ_Γ is positive definite and dV₁/dt < 0 for x̃_Γ(t) ≠ 0. Hence, as long as there are no inactive neurons switching to be active, V₁(t) strictly decreases with time, i.e., V₁(t) < V₁(0) for t > 0. Furthermore, we have

V₁(t) = ‖x̃(t)‖₂² + ‖λ̃(t)‖₂² < γ²   (43)

for t > 0. Inequality (43) means that |x̃_i(t)| < γ, ∀ i ∈ Γ, i.e., no active neuron switches to be inactive.

(Proof of Part 2): First of all, we state an important fact. For the inactive neurons,

dV₂/dt ≤ −‖ũ_Γᶜ(t)‖₂² + ‖ũ_Γᶜ(t)‖₂ (ς‖x̃_Γ(t)‖₂ + ω‖λ̃(t)‖₂).   (44)

Clearly, if ς‖x̃_Γ(t)‖₂ + ω‖λ̃(t)‖₂ is less than ‖ũ_Γᶜ(t)‖₂, then dV₂/dt < 0.

Consider that ũ_Γᶜ(0) is inside B_u, and that {x̃(0), λ̃(0)} is inside B_x and B_λ, that is,

‖x̃(0)‖₂² + ‖λ̃(0)‖₂² < min(γ², α²/(ς² + ω²)).   (45)

In the following, we will show that ũ_Γᶜ(t) stays inside B_u. From Part 1,

V₁(t) = ‖x̃(t)‖₂² + ‖λ̃(t)‖₂²   (46)

strictly decreases with time, and it is less than min(γ², α²/(ς² + ω²)) for t > 0. The function V₂(t) = ‖ũ_Γᶜ(t)‖₂² may decrease or increase with time. However, ũ_Γᶜ(t) must stay inside B_u. Even if V₂(t) reaches α² at some time t_b, we still have

V₁(t_b) = ‖x̃(t_b)‖₂² + ‖λ̃(t_b)‖₂² < min(γ², α²/(ς² + ω²))   (47)

because V₁(t) decreases with time. Notice that for any positive η₁, η₂, μ₁, and μ₂,

(η₁² + η₂²)(μ₁² + μ₂²) ≥ (η₁μ₁ + η₂μ₂)².   (48)

Hence, V₁(t_b) < α²/(ς² + ω²) implies that

ς‖x̃(t_b)‖₂ + ω‖λ̃(t_b)‖₂ < α.   (49)

From (44), if (49) holds, then V₂(t) starts to decrease with time from t_b. Hence, for t > 0, ũ_Γᶜ(t) stays inside B_u, i.e., there are no inactive neurons switching to be active.

(Proof of Part 3): Suppose that ũ_Γᶜ(0) is inside B_u, and {x̃(0), λ̃(0)} is inside B_x and B_λ. From Parts 1 and 2, V₁(t) is strictly decreasing with time. Since V₁(t) is lower bounded, it converges to a state with V₁(t) = 0, i.e., lim_{t→∞} x̃(t) = 0 and lim_{t→∞} x_Γ(t) = x*_Γ. Furthermore, from Part 2, no inactive neurons switch to active. Hence, lim_{t→∞} x(t) = x*.

To complete the analysis, we would like to know the behavior of the Lagrange neurons and the hidden states of the inactive neurons [36]. From (23b) and Part 1 (V₁(t) < V₁(0)), when lim_{t→∞} x(t) = x*, we have lim_{t→∞} λ(t) = λᵒ. Define

x̃(t) = x(t) − x*   (50)
λ̄(t) = λ(t) − λᵒ   (51)
uᵒ = −Φ_Γᶜ^T Φ x* + Φ_Γᶜ^T λᵒ + Φ_Γᶜ^T b.   (52)

From (23a) and (50)–(52), the dynamics of the inactive neurons are

du_Γᶜ/dt = −u_Γᶜ(t) + uᵒ − Φ_Γᶜ^T Φ x̃(t) + Φ_Γᶜ^T λ̄(t).   (53)

The solution of the dynamics [36] is given by

u_Γᶜ(t) = uᵒ + e^{−t}(u_Γᶜ(0) − uᵒ) − e^{−t} ∫₀ᵗ e^s Φ_Γᶜ^T Φ x̃(s) ds + e^{−t} ∫₀ᵗ e^s Φ_Γᶜ^T λ̄(s) ds.   (54)

Define

ϑ(t) = e^{−t} ∫₀ᵗ e^s Q x̃(s) ds, where Q = Φ_Γᶜ^T Φ   (55)
θ(t) = e^{−t} ∫₀ᵗ e^s Φ_Γᶜ^T λ̄(s) ds.   (56)

Consider ‖ϑ(t)‖₂ ≤ e^{−t} ‖Q‖₂ ∫₀ᵗ e^s ‖x̃(s)‖₂ ds. Since lim_{t→∞} x(t) = x* (lim_{t→∞} x̃(t) = 0), given any ξ > 0, there exists a t_c ≥ 0 such that ‖x̃(t)‖₂ ≤ ξ for all t ≥ t_c. Let κ be the maximum of ‖x̃(t)‖₂ for t < t_c. For all t ≥ 2t_c, we have

‖ϑ(t)‖₂ ≤ e^{−t} ‖Q‖₂ (κ ∫₀^{t_c} e^s ds + ξ ∫_{t_c}^{t} e^s ds)   (57)
≤ ‖Q‖₂ (κ(e^{−t/2} − e^{−t}) + ξ).   (58)

Clearly, as t → ∞, the first term tends to 0. Besides, ξ is an arbitrarily small value. Hence, lim_{t→∞} ϑ(t) = 0. Similarly, we can prove that lim_{t→∞} θ(t) = 0. From (54), we obtain lim_{t→∞} u_Γᶜ(t) = uᵒ.

Now, we have lim_{t→∞} x(t) = x*, lim_{t→∞} u_Γ(t) = u*_Γ, lim_{t→∞} λ(t) = λᵒ, and lim_{t→∞} u_Γᶜ(t) = uᵒ. Since every equilibrium corresponds to an optimal solution, {x*, u*_Γ, uᵒ, λᵒ} is an optimal solution too. If there is only one optimal solution, then {x*, u*_Γ, uᵒ, λᵒ} = {x*, u*_Γ, u*_Γᶜ, λ*}. The proof is complete. □

IV. CBPDN-LPNN

A. Property of the CBPDN Problem

Recall that the CBPDN problem is given by

min_x ‖x‖₁, s.t. ‖b − Φx‖₂² ≤ mσ².   (59)

Again, the problem is convex. From the well-known convex optimization results [41], [45]–[47], we have the following proposition for the CBPDN.

Proposition 2: A point x* is an optimal solution of (59), if and only if, there exists a β (Lagrange multiplier) such that

0 ∈ ∂‖x*‖₁ − 2βΦ^T(b − Φx*)   (60a)
‖b − Φx*‖₂² − mσ² ≤ 0   (60b)
β ≥ 0   (60c)
β(‖b − Φx*‖₂² − mσ²) = 0.   (60d)
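The conditions (60) can be sanity-checked on a scalar instance (n = m = 1, Φ = φ), where the CBPDN solution is available in closed form: shrink |x| until the constraint becomes active. The concrete numbers below are our own toy values, not from the paper:

```python
import numpy as np

phi, b, eps = 2.0, 5.0, 1.0           # scalar CBPDN instance (assumed values)
# Since b^2 > eps, shrink |x| until the constraint is active:
# phi*x = b - sign(b)*sqrt(eps).
x = (b - np.sign(b) * np.sqrt(eps)) / phi
r = b - phi * x                        # residual, equals sign(b)*sqrt(eps)
beta = np.sign(x) / (2 * phi * r)      # from (60a): sign(x) = 2*beta*phi*r

assert np.isclose(r**2, eps)           # (60b) holds with equality
assert beta > 0                        # (60c)
assert np.isclose(beta * (r**2 - eps), 0.0)   # (60d)
```

Here x = 2 and β = 0.25, so all four conditions of (60) hold simultaneously; note that (60b) is active, which is exactly the simplification exploited by Theorem 3 below when ‖b‖₂² > mσ².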


For the CBPDN problem, we do not need to consider ‖b‖₂² ≤ mσ². If ‖b‖₂² ≤ mσ², then ‖b − Φ0‖₂² ≤ mσ², the constraint in (59) is satisfied, and the trivial solution x = 0 suffices. Hence, we only need to consider ‖b‖₂² > mσ². Then, the KKT conditions can be simplified, and the result is stated in Theorem 3.

Theorem 3: Given that ‖b‖₂² > mσ², the optimization problem (59) becomes

min_x ‖x‖₁, s.t. ‖b − Φx‖₂² = mσ².   (61)

Besides, x* is an optimal solution, if and only if, there exists a β (Lagrange multiplier) such that

0 ∈ ∂‖x*‖₁ − βΦ^T(b − Φx*)   (62a)
‖b − Φx*‖₂² − mσ² = 0   (62b)
β > 0.   (62c)

Proof: We first show that, given ‖b‖₂² > mσ², β in Proposition 2 must be greater than zero. From Proposition 2 [see (60)], β is either greater than zero or equal to zero. We use contradiction to exclude the possibility of β = 0 when ‖b‖₂² > mσ². From (60a), if β = 0, then 0 ∈ ∂‖x*‖₁. That implies x* = 0.¹ It follows that ‖b‖₂² ≤ mσ² [see (60b)]. That contradicts our assumption ‖b‖₂² > mσ². Hence, β must be strictly greater than zero.

As β is strictly greater than zero, from (60d), we obtain ‖b − Φx*‖₂² − mσ² = 0. Hence, if ‖b‖₂² > mσ², then the KKT conditions of Proposition 2 (necessary and sufficient) become

0 ∈ ∂‖x*‖₁ − βΦ^T(b − Φx*)
‖b − Φx*‖₂² − mσ² = 0
β > 0.   (63)

Furthermore, since ‖b − Φx*‖₂² − mσ² = 0, the optimization problem (59) can be rewritten as

min_x ‖x‖₁, s.t. ‖b − Φx‖₂² = mσ².   (64)

The proof is complete. □

¹Given an x_i, ∂|x_i| either is equal to sign(x_i) or belongs to the interval [−1, 1]. This means that 0 ∈ ∂|x_i| implies x_i = 0.

With Theorem 3, the inequality constraint becomes an equality constraint. Hence, we can directly use the LPNN concept to solve the CBPDN problem.

B. CBPDN-LPNN Dynamics

For the CBPDN problem, the Lagrangian function is

L_bpdn = ‖x‖₁ + λ²(‖b − Φx‖₂² − mσ²).   (65)

Introducing λ² rather than β in the second term is to ensure that the convexity property holds at an equilibrium point. The gradients of L_bpdn are given by

∂_x L_bpdn = ∂‖x‖₁ − 2λ²Φ^T(b − Φx)   (66a)
∂L_bpdn/∂λ = 2λ(‖b − Φx‖₂² − mσ²).   (66b)

Following the method used in the BP-LPNN, we obtain the CBPDN-LPNN dynamics:

du/dt = −u + x + 2λ²Φ^T(b − Φx)   (67a)
dλ/dt = 2λ(‖b − Φx‖₂² − mσ²).   (67b)

C. Properties of CBPDN-LPNN

Again, the optimal solutions of the CBPDN problem, stated in Theorem 3, are in terms of x, while the equilibrium points of (67) involve the hidden state u. Hence, we need Theorem 4 to show explicitly that the equilibrium points of the CBPDN-LPNN are the optimal solutions. Theorem 4 does not tell us that the equilibrium points are achievable, so we also need to investigate this issue. Theorem 5 tells us that the equilibrium points are stable, i.e., the equilibrium points are achievable.

Theorem 4: Let {u*, λ*} be an equilibrium point of (67) with u* ≠ 0. At the equilibrium point, the KKT conditions (62) of Theorem 3 are satisfied. Since the KKT conditions of Theorem 3 are necessary and sufficient, the equilibrium point of (67) is equivalent to the optimal solution of the CBPDN problem.

Proof: Denote x* as the output of u*. First of all, if {u*, λ*} is an equilibrium point, then from (67), we obtain

−u* + x* + 2λ*²Φ^T(b − Φx*) = 0   (68)
λ*(‖b − Φx*‖₂² − mσ²) = 0.   (69)

Besides, from (11) and (12),

−u* + x* + 2λ*²Φ^T(b − Φx*) ∈ −∂‖x*‖₁ + 2λ*²Φ^T(b − Φx*).   (70)

Since −u* + x* + 2λ*²Φ^T(b − Φx*) = 0, we obtain

0 ∈ −∂‖x*‖₁ + 2λ*²Φ^T(b − Φx*).   (71)

This means that (62a) is satisfied. Since λ*(‖b − Φx*‖₂² − mσ²) = 0, λ* is either equal to 0 or not. We show λ* ≠ 0 by contradiction. Assume that λ* = 0. From (68), we obtain u* = x*. The only solution of u* = x* is u* = x* = 0 (see Fig. 1). That contradicts our assumption u* ≠ 0. Therefore, λ* ≠ 0, and then from (69), we obtain (62b). As λ* ≠ 0, we conclude that λ*² > 0. This means that there exists a β = λ*² greater than zero. Hence, (62c) is satisfied. In addition, the KKT conditions of Theorem 3 are sufficient and necessary. Hence, any equilibrium point {u*, λ*} with u* ≠ 0 is an optimal solution of the CBPDN problem. In a similar way, we can also show that (62) leads to (68) and (69). The proof is complete. □

Another issue in the LPNN approach is whether the equilibrium points are stable. Theorem 5 summarizes the stability of the equilibrium points.

Theorem 5: Given that ‖b‖₂² > mσ² and an equilibrium point {u*, λ*} of (67) with u* ≠ 0, the equilibrium point is an asymptotically stable point.


Proof: As the proof is a bit complicated, we first give a brief overview. Since the inactive neurons have no effect on the active neurons and the Lagrange neuron, the dynamics can be rewritten as

du_Γ/dt = −u_Γ + x_Γ + 2λ²Φ_Γ^T(b − Φ_Γ x_Γ)   (72a)
dλ/dt = 2λ(‖b − Φ_Γ x_Γ‖₂² − mσ²)   (72b)
du_Γᶜ/dt = −u_Γᶜ + 2λ²Φ_Γᶜ^T(b − Φ_Γ x_Γ).   (72c)

From Appendix A, the linearization of (72) around the equilibrium point is

[du_Γ/dt ; dλ/dt ; du_Γᶜ/dt] |_{(u*,λ*)} = −H [u_Γ − u*_Γ ; λ − λ* ; u_Γᶜ − u*_Γᶜ]   (73)

where

H = [ 2λ*² Φ_Γ^T Φ_Γ        −4λ* Φ_Γ^T(b − Φx*)    ∅ ]
    [ 4λ* (b − Φx*)^T Φ_Γ          0               ∅ ]
    [ 2λ*² Φ_Γᶜ^T Φ_Γ       −4λ* Φ_Γᶜ^T(b − Φx*)   I ].   (74)

From classical optimization theory, if all the eigenvalues of H have positive real parts, then the equilibrium point is an asymptotically stable point. In the following, we show that all the eigenvalues of H have positive real parts. We first define

G = [ 2λ*² Φ_Γ^T Φ_Γ        −4λ* Φ_Γ^T(b − Φx*) ]
    [ 4λ* (b − Φx*)^T Φ_Γ          0            ].   (75)

Then, H can be rewritten as

H = [ G  ∅ ; B  I ]   (76)

where B = [2λ*² Φ_Γᶜ^T Φ_Γ | −4λ* Φ_Γᶜ^T(b − Φx*)]. The proof consists of two parts. In the first part, we show that all the eigenvalues of G have positive real parts. Based on the first part, we then show that all the eigenvalues of H have positive real parts too.

Eigenvalues of G: Clearly, 2λ*² Φ_Γ^T Φ_Γ is positive definite or positive semidefinite. In the proof of Theorem 2, we already discussed that the probability that rank(Φ_Γ) = n_a tends to 1 for large m. When Φ_Γ^T Φ_Γ is positive definite, the matrix G is full rank, i.e., rank(G) = n_a + 1 (see Appendix B). Denote ṽ as the conjugate of v. Let ζ be an eigenvalue of G, and (χ^T, τ)^T ≠ (0^T, 0)^T be the corresponding eigenvector; being an eigenvector of G, it cannot be a zero vector. Now, we are going to show that χ ≠ 0. We use contradiction. Assume that χ = 0. From the definition of an eigenvector, we have

G [0 ; τ] = ζ [0 ; τ].   (77)

From the definition of G [see (75)], we also have

G [0 ; τ] = [−4τλ* Φ_Γ^T(b − Φx*) ; 0].   (78)

At the equilibrium point (u*, λ*), from (72), we have

−u*_Γ + x*_Γ = −2λ*² Φ_Γ^T(b − Φx*).   (79)

For the active neurons, we have |x*_i| < |u*_i| for i ∈ Γ. Besides, λ* ≠ 0, and then we have

Φ_Γ^T(b − Φx*) ≠ 0.   (80)

Therefore, from (77), (78), and (80), we obtain τ = 0. That contradicts the eigenvector assumption. That means, χ ≠ 0. Now, we show that the real part of the eigenvalue is positive. Since (χ^T, τ)^T is an eigenvector, we have

Re([χ̃^T, τ̃] G [χ ; τ]) = Re(ζ)(‖χ‖₂² + |τ|²).   (81)

From the definition of G [see (75)], we also have

Re([χ̃^T, τ̃] G [χ ; τ]) = Re(2λ*² χ̃^T Φ_Γ^T Φ_Γ χ).   (82)

That means, we have

Re(2λ*² χ̃^T Φ_Γ^T Φ_Γ χ) = Re(ζ)(‖χ‖₂² + |τ|²).   (83)

Since λ*² Φ_Γ^T Φ_Γ is positive definite and χ ≠ 0, the left-hand side must be greater than zero, i.e., Re(ζ) > 0. That means, all the eigenvalues of G have positive real parts.

Eigenvalues of H: As G is full rank, it can be diagonalized as

G = V Υ V⁻¹   (84)

where Υ is a diagonal matrix whose diagonal elements Υ_i are the eigenvalues of G, the column vectors of V are the right eigenvectors of G, and the row vectors of V⁻¹ are the left eigenvectors of G. As shown in the first part, the real parts of the eigenvalues of G are positive. Consider a matrix

Ω = [ V  ∅ ; ∅  I ], and Ω⁻¹ = [ V⁻¹  ∅ ; ∅  I ].   (85)

Define H̃ as H̃ = Ω⁻¹ H Ω. From (76), we obtain

H̃ = [ Υ  ∅ ; BV  I ].   (86)

Clearly, H̃ is a block lower triangular matrix, so its diagonal elements

{Υ₁, …, Υ_{n_a+1}, 1, …, 1}  ((n − n_a) ones)   (87)

are the eigenvalues. Besides, their real parts are greater than zero (from the first part). From Appendix C, H and H̃ have the same set of eigenvalues. This means, the real parts of the eigenvalues of H are positive. Hence, the equilibrium point {u*, λ*} is an asymptotically stable point. The proof is complete. □

Note that the global convergence properties of the CBPDN-LPNN model are not known yet. However, this does not limit its application or performance. The experimental results in Section V show that the performance of the CBPDN-LPNN model is identical to that of the two numerical methods.
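The structure of G in (75), and Theorem 5's claim that its eigenvalues have positive real parts, can be probed numerically by reverse-engineering an equilibrium: fix an active set, a sign pattern, and λ*, then pick the residual so that (68) and (69) hold exactly. All concrete values below are our own assumptions, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 30, 20
Phi = rng.standard_normal((m, n)) / np.sqrt(m)

# Reverse-engineer an equilibrium {u*, lambda*}: choose the active set and
# sign pattern, then the residual r so that Phi_G^T r = s / (2 lam^2),
# which makes (68) hold with u* - x* = sign(x*) on the active set.
idx = np.array([2, 7, 11])                    # active set Gamma (assumed)
s = np.array([1.0, -1.0, 1.0])                # sign pattern of x* on Gamma
lam = 0.8                                     # lambda* (assumed nonzero)
Pg = Phi[:, idx]
r = Pg @ np.linalg.solve(Pg.T @ Pg, s) / (2 * lam**2)
x = np.zeros(n); x[idx] = 5.0 * s             # equilibrium output x*
b = Phi @ x + r
sigma2 = (r @ r) / m                          # so ||b - Phi x*||^2 = m sigma^2, (69)

# Matrix G of (75) at this equilibrium.
v = Pg.T @ r                                  # = Phi_G^T (b - Phi x*)
G = np.block([[2 * lam**2 * (Pg.T @ Pg), -4 * lam * v[:, None]],
              [4 * lam * v[None, :],      np.zeros((1, 1))]])
eig = np.linalg.eigvals(G)
print(eig.real.min())   # Theorem 5: every eigenvalue has a positive real part
```

The positive symmetric part of G comes entirely from the block 2λ*²Φ_Γ^TΦ_Γ; the off-diagonal blocks are skew-symmetric up to sign, which is why they contribute nothing to the real parts, mirroring the argument in (81)–(83).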

Fig. 2. Simulation results for the BP-problem, where n = 512 and n = 4096. (a) and (d)–(f) MSE performances among the three methods. The experiments are repeated 100 times using different random matrices. (b), (c), and (g)–(i) Some dynamics examples for the BP-LPNN model.

V. SIMULATIONS

A. Setting

We use standard configurations [31], [49] to test the two proposed models. We consider two signal lengths, n = 512 and n = 4096. For n = 512, 15 data points have nonzero values (±5). For n = 4096, the numbers of nonzero values (±5) are {75, 100, 125}. The measurement matrix Φ is a ±1 random matrix, further normalized by the signal length. We repeat each experiment 100 times with different random matrices, initial states, and sparse signals.

B. BP-LPNN

We compare our analog method with two digital numerical approaches: the primal-dual interior point method from the L1Magic package [31] and the SPG method from the SPGL1 package [32]. We would like to investigate whether the analog BP-LPNN method produces the same MSE performance as the two digital numerical approaches. Fig. 2(a) and (d)–(f) shows the MSE performances. When the number of measurements reaches a threshold, the reconstruction errors become very small. This phenomenon agrees with the well-known property of sparse approximation. The MSE performance of the BP-LPNN is quite similar to that of the two traditional digital methods. All three methods have a similar threshold. There is no significant difference

among the BP-LPNN, the L1Magic package, and the SPGL1 package. For n = 512, when the number of measurements is less than or equal to 70, the reconstruction errors are greater than 0.2. When 95 or more measurements are used, the reconstruction errors of the three models are much less than 0.001, as shown in the zoomed-in subfigure in Fig. 2(a). For n = 4096, when there are 75 nonzero elements, around 450 measurements are required. When 465 or more measurements are used, the reconstruction errors of the three methods are very small. They are much smaller than 0.001, as shown in the zoomed-in subfigure in Fig. 2(d). Note that when the number of measurements is greater than the threshold, there are some small differences in the reconstruction errors among the three methods. This is because the two numerical methods have some tuning parameters that affect the accuracy of the solution. Fig. 2(b), (c), and (g)–(i) shows the output x_i(t) values of the active set of the equilibrium point under different settings, where i ∈ Ω. We would like to see when the output x_i(t) values settle down. Since the nonzero elements of the original signal are equal to ±5, the outputs converge to values close to ±5. For n = 512 and the selected settings, the outputs settle down within around 50–150 characteristic time units. After that, there are no big changes in the outputs. For n = 4096 and the selected settings, there are no big changes in the outputs after 150 characteristic time units.
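As a quick digital cross-check that needs neither the L1Magic nor the SPGL1 package, BP can be solved by casting min ||x||_1 s.t. Φx = b as a linear program via the split x = x⁺ - x⁻ with x⁺, x⁻ ≥ 0. A minimal sketch with illustrative sizes (not the paper's n = 512/4096 setting), assuming SciPy's linprog and one common choice of normalization for the ±1 matrix:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, k = 60, 30, 3                    # signal length, measurements, sparsity

# +/-1 measurement matrix; 1/sqrt(n) is one common normalization (assumption)
Phi = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(n)

x0 = np.zeros(n)                       # sparse ground truth with +5 spikes
x0[rng.choice(n, size=k, replace=False)] = 5.0
b = Phi @ x0

# BP: min ||x||_1 s.t. Phi x = b  ->  LP in the stacked variable [x+; x-] >= 0
c = np.ones(2 * n)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None))
x_hat = res.x[:n] - res.x[n:]

print(np.linalg.norm(Phi @ x_hat - b))   # feasibility residual, ~0
```

Since x0 is feasible, the LP optimum always satisfies ||x_hat||_1 ≤ ||x0||_1; with enough measurements, the recovered x_hat typically coincides with x0, mirroring the threshold behavior in Fig. 2.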

Fig. 3. Simulation results for the CBPDN-problem, where n = 512. First row: MSE performances among the three methods. The experiments are repeated 100 times using different random matrices. Second row: some dynamics examples.

Fig. 4. Simulation results for the CBPDN-problem, where n = 4096. First–third rows: MSE performances among the three methods. The experiments are repeated 100 times using different random matrices. Fourth row: some dynamics examples.

C. CBPDN-LPNN

Two digital numerical approaches, the log barrier method from the L1Magic package and the SPG method from the SPGL1 package, are used for comparison. We expect the three methods to have similar performance. Three noise levels are considered: σ² ∈ {−26 dB, −32 dB, −46 dB}. Figs. 3 and 4 show their performances.

From Figs. 3 and 4, when the number of measurements reaches a threshold, the reconstruction errors drop to a very small value. The threshold mainly depends on the number


Fig. 5. Some recovered signals from nonsparse signals. Note that there are no obvious visual differences among the three reconstruction methods.
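The transform-domain setup behind these nonsparse-signal experiments can be sketched with a small orthonormal Haar matrix Q; the short piecewise-constant test vector below stands in for the CameraMan row used in the paper (an illustrative substitution):

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar transform matrix; n must be a power of 2."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])               # averaging rows
    bot = np.kron(np.eye(n // 2), [1.0, -1.0]) # detail rows
    H = np.vstack([top, bot])
    return H / np.linalg.norm(H, axis=1, keepdims=True)

Q = haar_matrix(8)
z = np.array([3.0, 3.0, 3.0, 3.0, -1.0, -1.0, -1.0, -1.0])  # nonsparse signal
x = Q @ z                                                    # x = Qz is sparse

print(np.allclose(Q @ Q.T, np.eye(8)))        # Q is orthonormal -> True
print(np.sum(np.abs(x) > 1e-10))              # only 2 nonzero coefficients -> 2
```

Recovery then runs CBPDN on the sparse coefficients x with the composite matrix ΦQ, and the nonsparse signal is synthesized back as z = Qᵀx since Q is orthonormal.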

TABLE I. MSE OF THE RECOVERY SIGNALS FROM NONSPARSE SIGNALS. THE EXPERIMENTS ARE REPEATED 100 TIMES USING DIFFERENT RANDOM MATRICES.

of nonzero elements and is not sensitive to the noise level. This phenomenon agrees with the well-known property of sparse approximation. The MSE performance of the CBPDN-LPNN is quite similar to that of the two traditional digital methods. All three methods have a similar threshold. From Fig. 3, for n = 512, around 95 measurements are required for all noise levels. The noise level affects the reconstruction quality only and does not much affect the number of required measurements. For instance, with the noise level equal to −26 dB, when 100 or more measurements are used, the MSEs of the three methods are ∼0.005–0.015. With a smaller noise level σ² = −46 dB, when 100 or more measurements are used, the MSEs are much smaller than 0.001. For n = 4096, when there are 75 nonzero elements, from the first row of Fig. 4, around 450 measurements are required, regardless of the noise level. When we increase the number of nonzero elements to 125, we should use 650 measurements, as shown in the third row of Fig. 4. Again, the noise level affects the reconstruction quality only. For instance, with 75 nonzero elements and a noise level equal to σ² = −26 dB, when 600 or more measurements are used, the MSEs of the three methods are ∼0.005–0.015. With a smaller noise level σ² = −46 dB, when 600 or more measurements are used, the MSEs of the three methods are much smaller than 0.0005. There are small MSE differences among the three methods, because the two numerical methods have some tuning parameters that affect the accuracy of the solution. The convergent behavior of the CBPDN-LPNN model is shown in the second row of Fig. 3 and the fourth row of Fig. 4, which plot the output x_i(t) values of the active set of the equilibrium point under different settings, where i ∈ Ω. For n = 512, the outputs of the active set settle down within around 7–15 characteristic time units. For n = 4096, the outputs of the active set settle down within 10–60 characteristic time units.

D. Recovery for Nonsparse Signal

Sparse approximation can be extended to handle nonsparse signals [50]. Let z be a nonsparse signal, and let Q be an invertible transform. By setting x = Qz, we can use the LPNN to recover nonsparse signals. The measurement signal is given by b = \Phi Q z. The problem becomes

\min_z \|Qz\|_1, \quad \text{s.t.} \ \|b - \Phi Q z\|_2^2 \leq \varepsilon  (88)

where \varepsilon > 0 bounds the squared residual. The 1-D signal that we considered is extracted from one row of the image CameraMan. The transform used is the Haar transform. We vary the number of measurements. The MSE values of the reconstructed signals are summarized in Table I. Besides, some reconstructed signals are shown in Fig. 5. From Table I, when more measurements are used, we obtain a better reconstruction. Again, the performance of the CBPDN-LPNN is very similar to that of the L1Magic and SPGL1 packages. Also, there are no obvious visual differences among the reconstructed signals from the three methods, as shown in Fig. 5.

VI. DISCUSSION

A. Other Recurrent Neural Network Models

There are some recurrent neural network models [37]–[40] for nonsmooth convex problems. However, they may not be suitable for sparse approximation. Since their dynamics are directly related to the subdifferential, there is no simple way to select a proper subgradient from the subdifferential such that the optimal solution is an equilibrium point. We use the algorithm in [40] to illustrate that these methods are not suitable for sparse approximation. Let us consider a simplified version of [40], given by

\min_x f(x), \quad \text{s.t.} \ h(x) \geq 0.  (89)

According to (8) in [40], the dynamics are given by

dx/dt \in -2(x - \tilde{x}) \quad \text{and} \quad d\lambda/dt = -(\lambda - \tilde{\lambda})  (90)

where \tilde{x} = x - \partial f(x) + \lambda \partial h(x) and \tilde{\lambda} = \max(0, \lambda - h(x)). Recall that in our case, the problem can be rewritten as

\min_x \|x\|_1, \quad \text{s.t.} \ m\sigma^2 - \|b - \Phi x\|_2^2 \geq 0.  (91)

Let us consider the following two cases. Case 1: Φ = [1, 5/8], b = 5.5, σ² = 0.25, with optimal solution x* = [5, 0]ᵀ. Case 2: Φ = [−1, −5/8], b = 5.5, σ² = 0.25, with optimal solution x* = [−5, 0]ᵀ. The contour plot of these two cases is shown in Fig. 6.
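These two cases can be verified numerically. The helper below (hypothetical, not from the paper) computes the subgradient η = 2λΦᵀ(b − Φx*) forced by stationarity, fixing λ from the nonzero component, where η must equal sign(x₁*):

```python
import numpy as np

def required_subgradient(Phi, b, x_star):
    """Subgradient of ||x||_1 forced by stationarity: eta = 2*lam*Phi^T(b - Phi x*).
    lam is determined by the first (nonzero) component of x*."""
    r = (b - Phi @ x_star).item()        # scalar residual b - Phi x*
    g = 2.0 * Phi.flatten() * r          # eta / lam
    lam = np.sign(x_star[0]) / g[0]      # enforce eta_1 = sign(x*_1)
    return lam * g

# Case 1
eta1 = required_subgradient(np.array([[1.0, 5.0 / 8.0]]), np.array([5.5]),
                            np.array([5.0, 0.0]))
# Case 2
eta2 = required_subgradient(np.array([[-1.0, -5.0 / 8.0]]), np.array([5.5]),
                            np.array([-5.0, 0.0]))
print(eta1, eta2)   # [1, 5/8] and [-1, -5/8]
```

In both cases the entry for the zero component is ±5/8, a valid but case-dependent member of the interval [−1, 1], which is exactly the subgradient selection problem discussed below.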

Fig. 6. Contour of \|x\|_1 and the feasible regions in the two situations.

From the approach of [40], the dynamics are

dx/dt \in -2\left( \partial\|x\|_1 - 2\lambda \Phi^T (b - \Phi x) \right)  (92a)
d\lambda/dt = -(\lambda - \tilde{\lambda})  (92b)

where \tilde{\lambda} = (\lambda - m\sigma^2 + \|b - \Phi x\|_2^2)^+. According to (92a), at the optimal solution x*, we should choose a suitable subgradient η from the subdifferential \partial\|x^*\|_1 such that η = 2\lambda^* \Phi^T (b - \Phi x^*). Otherwise, the optimal solution is not stable. In Case 1, the optimal point x* is at [5, 0]ᵀ. We should choose η = [1, 5/8]ᵀ, i.e., we set ∂|x₂| to 5/8 when x₂ = 0. On the other hand, in Case 2, the optimal point x* is at [−5, 0]ᵀ. We should choose η = [−1, −5/8]ᵀ, i.e., we set ∂|x₂| to −5/8 when x₂ = 0. Clearly, the subgradient should be chosen carefully. Otherwise, an optimal point may not be stable. The above example shows that when the optimal solution contains some zero elements, we have a subgradient selection problem. Similarly, when an optimal point contains multiple zero elements, we cannot use a fixed subgradient. Besides, an equilibrium point with a particular subgradient may become unstable when we select another subgradient. In contrast, our method does not have this selection problem.

B. Least Absolute Shrinkage and Selection Operator Problem

This section briefly discusses the way to use the LPNN approach for the least absolute shrinkage and selection operator (LASSO) problem [51]–[53], given by

\min_x \|b - \Phi x\|_2^2, \quad \text{s.t.} \ \|x\|_1 \leq \psi  (93)

where ψ > 0. Under some conditions, the LASSO problem becomes

\min_x \|b - \Phi x\|_2^2, \quad \text{s.t.} \ \|x\|_1 - \psi = 0.  (94)

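As a digital reference for (93) (this is not the LPNN model itself), the LASSO problem can be handed to an off-the-shelf solver after the smooth split x = p − q with p, q ≥ 0, which turns the ℓ1 constraint into a linear one. A sketch assuming SciPy, with hypothetical data:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
m, n, psi = 8, 4, 2.0
Phi = rng.standard_normal((m, n))
b = Phi @ np.array([1.5, 0.0, -0.5, 0.0]) + 0.01 * rng.standard_normal(m)

# Split x = p - q, p, q >= 0; then ||x||_1 <= psi becomes linear: sum(p + q) <= psi.
def obj(v):
    x = v[:n] - v[n:]
    r = b - Phi @ x
    return r @ r

cons = [{"type": "ineq", "fun": lambda v: psi - v.sum()}]
res = minimize(obj, np.zeros(2 * n), bounds=[(0, None)] * (2 * n),
               constraints=cons, method="SLSQP")
x_hat = res.x[:n] - res.x[n:]

print(res.success, np.abs(x_hat).sum() <= psi + 1e-6)
```

For small ψ the ℓ1 constraint is active at the optimum, which is the regime where the equality-constrained form (94) applies.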

Although the constraint in the LASSO problem is not differentiable, we can use the LPNN framework to solve the problem. Its dynamics are given by

du/dt = \Phi^T (b - \Phi x) - \lambda (u - x), \quad \text{and} \quad d\lambda/dt = \|x\|_1 - \psi.  (95)

VII. CONCLUSION

This paper proposed two LPNN models for solving optimization problems in sparse approximation. The BP-LPNN model is designed for the BP problem, while the CBPDN-LPNN model is designed for the CBPDN problem. For the BP-LPNN model, we showed that the equilibrium points of the network correspond to the optimal solution of the BP problem, and that the equilibrium points are stable. For the CBPDN-LPNN model, the equilibrium points of the network correspond to the optimal solution of the CBPDN problem. Besides, the CBPDN-LPNN model is locally stable. The experimental results showed that, in terms of MSE performance, there are no significant differences between our analog approach and the traditional numerical methods. Besides, we briefly discuss the way to use the LPNN approach to solve the LASSO problem. Although we only prove that the CBPDN-LPNN model is locally stable, this does not mean that the CBPDN-LPNN model is not globally stable. The global convergence property is not known yet. Hence, for completeness, one of the future directions is to theoretically investigate the global convergence of the CBPDN-LPNN model. Another important extension is to investigate nonconvex problems, which involve the ℓ_p norm with p < 1 in the objective function and constraints, in sparse approximation.

APPENDIX A
LINEARIZATION OF THE DYNAMICS

Consider a vector-valued function F: R^n → R^m, F(y) = [F_1(y), \ldots, F_m(y)]^T, and a system given by

\dot{y} = dy/dt = F(y).  (96)

The linearized system at y^* is given by

dy/dt \approx F(y^*) + J(y^*)(y - y^*)  (97)

where J(y^*) is the Jacobian matrix, given by

J(y^*) = \begin{pmatrix} \partial F_1/\partial y_1 & \cdots & \partial F_1/\partial y_n \\ \vdots & \ddots & \vdots \\ \partial F_m/\partial y_1 & \cdots & \partial F_m/\partial y_n \end{pmatrix} \bigg|_{y = y^*}.  (98)

If y^* is an equilibrium point, then F(y^*) is equal to a zero vector and the linearized system becomes

dy/dt \approx J(y^*)(y - y^*).  (99)

For our case, the Jacobian matrix at an equilibrium point is

J(u^*, \lambda^*) = \begin{pmatrix} d\dot{u}_\Omega/du_\Omega & d\dot{u}_\Omega/d\lambda & d\dot{u}_\Omega/du_{\Omega^c} \\ d\dot{\lambda}/du_\Omega & d\dot{\lambda}/d\lambda & d\dot{\lambda}/du_{\Omega^c} \\ d\dot{u}_{\Omega^c}/du_\Omega & d\dot{u}_{\Omega^c}/d\lambda & d\dot{u}_{\Omega^c}/du_{\Omega^c} \end{pmatrix} \bigg|_{(u^*, \lambda^*)}.  (100)

Define H = -J(u^*, \lambda^*). After evaluating the elements in -J(u^*, \lambda^*), we obtain

H = \begin{pmatrix} 2\lambda^{*2} \Phi_\Omega^T \Phi_\Omega & -4\lambda^* \Phi_\Omega^T (b - \Phi x^*) & \emptyset \\ 4\lambda^* (b - \Phi x^*)^T \Phi_\Omega & 0 & \emptyset \\ 2\lambda^{*2} \Phi_{\Omega^c}^T \Phi_\Omega & -4\lambda^* \Phi_{\Omega^c}^T (b - \Phi x^*) & I \end{pmatrix}.  (101)


In the derivation of the Jacobian matrix, we use the facts that for active neurons, dx_i/du_i = 1, and that for nonactive neurons, dx_i/du_i = 0. Also, at an equilibrium point, \|b - \Phi x^*\|_2^2 - m\sigma^2 = 0.

APPENDIX B
PROOF OF THE RANK OF G EQUAL TO n_a + 1

Recall that

G = \begin{pmatrix} 2\lambda^{*2} \Phi_\Omega^T \Phi_\Omega & -4\lambda^* \Phi_\Omega^T (b - \Phi x^*) \\ 4\lambda^* (b - \Phi x^*)^T \Phi_\Omega & 0 \end{pmatrix}.  (102)

Since -u_i^* + x_i^* \neq 0 for i \in \Omega, from (79) we have -4\lambda^* \Phi_\Omega^T (b - \Phi x^*) \neq 0. Without loss of generality, we consider

G = \begin{pmatrix} A & \upsilon \\ -\upsilon^T & 0 \end{pmatrix}  (103)

where A is an n_a \times n_a symmetric positive definite matrix and \upsilon is a nonzero column vector. Since A is invertible, we have

\begin{pmatrix} A & \upsilon \\ -\upsilon^T & 0 \end{pmatrix} = \begin{pmatrix} A & \emptyset \\ -\upsilon^T & 1 \end{pmatrix} \begin{pmatrix} I & A^{-1}\upsilon \\ \emptyset & \upsilon^T A^{-1} \upsilon \end{pmatrix}.  (104)

Taking the determinant of both sides, we obtain

\begin{vmatrix} A & \upsilon \\ -\upsilon^T & 0 \end{vmatrix} = |A| \cdot |\upsilon^T A^{-1} \upsilon|.  (105)
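The determinant identity (105), and hence the rank claim, can be confirmed numerically on a random instance (illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(2)
na = 4

M = rng.standard_normal((na, na))
A = M @ M.T + na * np.eye(na)          # symmetric positive definite
v = rng.standard_normal(na)            # nonzero column vector

# G = [[A, v], [-v^T, 0]], the block structure of (103)
G = np.block([[A, v[:, None]], [-v[None, :], np.zeros((1, 1))]])

lhs = np.linalg.det(G)
rhs = np.linalg.det(A) * (v @ np.linalg.solve(A, v))   # |A| * v^T A^{-1} v

print(np.isclose(lhs, rhs), np.linalg.matrix_rank(G) == na + 1)
```

Since A is positive definite, v^T A^{-1} v > 0 for any nonzero v, so the determinant is nonzero and G has full rank n_a + 1.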

Since A^{-1} is positive definite, |\upsilon^T A^{-1} \upsilon| is nonzero. This means that the determinant of G is nonzero, and then the rank of G is equal to n_a + 1. The proof is complete. ∎

APPENDIX C
PROPERTY OF EIGENVALUES OF MATRICES

Given a full-rank square matrix H (symmetric or nonsymmetric), it can be diagonalized as H = U \Lambda U^{-1}, where \Lambda is a diagonal matrix whose diagonal elements \Lambda_i are the eigenvalues of H, the column vectors of U are the right eigenvectors of H, and the row vectors of U^{-1} are the left eigenvectors of H. Consider \tilde{H} = \Theta H \Theta^{-1}, where \Theta is an invertible matrix. The matrix \tilde{H} can be diagonalized too, given by \tilde{H} = \Theta U \Lambda U^{-1} \Theta^{-1} = (\Theta U) \Lambda (\Theta U)^{-1} = \tilde{U} \Lambda \tilde{U}^{-1}. Considering the normalization of the column vectors of \tilde{U}, we obtain \tilde{U} = \ddot{U} \Delta, where \Delta is a diagonal matrix whose elements are greater than zero and the length of each column vector of \ddot{U} is equal to one. With the normalization,

\tilde{H} = \tilde{U} \Lambda \tilde{U}^{-1} = \ddot{U} \Delta \Lambda \Delta^{-1} \ddot{U}^{-1} = \ddot{U} \Lambda \ddot{U}^{-1}.  (106)
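The similarity invariance of the spectrum used here can be sanity-checked numerically:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
H = rng.standard_normal((n, n))                      # generic full-rank matrix
Theta = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned, invertible

H_tilde = Theta @ H @ np.linalg.inv(Theta)           # similarity transform

ev = np.linalg.eigvals(H)
ev_tilde = np.linalg.eigvals(H_tilde)

# Every eigenvalue of H appears among the eigenvalues of H_tilde
ok = all(np.min(np.abs(ev_tilde - lam)) < 1e-6 for lam in ev)
print(ok)
```

The eigenvalues are matched pairwise by distance rather than sorted, since generic eigenvalues are complex and their computed ordering is not guaranteed.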

Hence, H and \tilde{H} have the same set of eigenvalues.

REFERENCES

[1] A. Cichocki and R. Unbehauen, Neural Networks for Optimization and Signal Processing. London, U.K.: Wiley, 1993. [2] J. J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” Proc. Nat. Acad. Sci. USA, vol. 79, no. 8, pp. 2554–2558, Jan. 1982. [3] L. O. Chua and G.-N. Lin, “Nonlinear programming without computation,” IEEE Trans. Circuits Syst., vol. 31, no. 2, pp. 182–188, Feb. 1984. [4] Y. Xia, G. Feng, and J. Wang, “A novel recurrent neural network for solving nonlinear optimization problems with inequality constraints,” IEEE Trans. Neural Netw., vol. 19, no. 8, pp. 1340–1353, Aug. 2008.

[5] Q. Liu and J. Wang, “A one-layer recurrent neural network with a discontinuous hard-limiting activation function for quadratic programming,” IEEE Trans. Neural Netw., vol. 19, no. 4, pp. 558–570, Apr. 2008. [6] S. Bharitkar, K. Tsuchiya, and Y. Takefuji, “Microcode optimization with neural networks,” IEEE Trans. Neural Netw., vol. 10, no. 3, pp. 698–703, May 1999. [7] D. Tank and J. Hopfield, “Simple ‘neural’ optimization networks: An A/D converter, signal decision circuit, and a linear programming circuit,” IEEE Trans. Circuits Syst., vol. 33, no. 5, pp. 533–541, May 1986. [8] A. Bouzerdoum and T. R. Pattison, “Neural network for quadratic optimization with bound constraints,” IEEE Trans. Neural Netw., vol. 4, no. 2, pp. 293–304, Mar. 1993. [9] X.-B. Liang, “A complete proof of global exponential convergence of a neural network for quadratic optimization with bound constraints,” IEEE Trans. Neural Netw., vol. 12, no. 3, pp. 636–639, May 2001. [10] M. Fukushima, “Equivalent differentiable optimization problems and descent methods for asymmetric variational inequality problems,” Math. Program., vol. 53, no. 1, pp. 99–110, Jan. 1992. [11] T. L. Friesz, D. H. Bernstein, N. J. Mehta, R. L. Tobin, and S. Ganjalizadeh, “Day-to-day dynamic network disequilibria and idealized traveler information systems,” Oper. Res., vol. 42, no. 6, pp. 1120–1136, Jun. 1994. [12] B. He and H. Yang, “A neural network model for monotone linear asymmetric variational inequalities,” IEEE Trans. Neural Netw., vol. 11, no. 1, pp. 3–16, Jan. 2000. [13] X. Hu and J. Wang, “Solving pseudomonotone variational inequalities and pseudoconvex optimization problems using the projection neural network,” IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1487–1499, Nov. 2006. [14] X. B. Gao, “Exponential stability of globally projected dynamic systems,” IEEE Trans. Neural Netw., vol. 14, no. 2, pp. 426–431, Mar. 2003. [15] Y. 
Xia, “An extended projection neural network for constrained optimization,” Neural Comput., vol. 16, no. 4, pp. 863–883, Apr. 2004. [16] X. Hu and J. Wang, “A recurrent neural network for solving a class of general variational inequalities,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 37, no. 3, pp. 528–539, Jun. 2007. [17] S. Zhang, Y. Xia, and J. Wang, “A complex-valued projection neural network for constrained optimization of real functions in complex variables,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 12, pp. 3227–3238, Dec. 2015. [18] Y. Xia, “A compact cooperative recurrent neural network for computing general constrained L1 norm estimators,” IEEE Trans. Signal Process., vol. 57, no. 9, pp. 3693–3697, Sep. 2009. [19] Y. Xia, C. Sun, and W. X. Zheng, “Discrete-time neural network for fast solving large linear L1 estimation problems and its application to image restoration,” IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 5, pp. 812–820, May 2012. [20] Y. Xia and J. Wang, “Low-dimensional recurrent neural network-based Kalman filter for speech enhancement,” Neural Netw., vol. 67, pp. 131–139, Jul. 2015. [21] S. Zhang and A. G. Constantinides, “Lagrange programming neural networks,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 39, no. 7, pp. 441–452, Jul. 1992. [22] X. Zhu, S.-W. Zhang, and A. G. Constantinides, “Lagrange neural networks for linear programming,” J. Parallel Distrib. Comput., vol. 14, no. 3, pp. 354–360, Mar. 1992. [23] V. Sharma, R. Jha, and R. Naresh, “An augmented Lagrange programming optimization neural network for short-term hydroelectric generation scheduling,” Eng. Optim., vol. 37, pp. 479–497, Jul. 2005. [24] J. Liang, H. C. So, C. S. Leung, J. Li, and A. Farina, “Waveform design with unit modulus and spectral shape constraints via Lagrange programming neural network,” IEEE J. Sel. Topics Signal Process., vol. 9, no. 8, pp. 1377–1386, Dec. 2015. [25] J. Liang, C. S. Leung, and H. C.
So, “Lagrange programming neural network approach for target localization in distributed MIMO radar,” IEEE Trans. Signal Process., vol. 64, no. 6, pp. 1574–1585, Mar. 2016. [26] Y. Xia, “Global convergence analysis of Lagrangian networks,” IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 50, no. 6, pp. 818–822, Jun. 2003. [27] X. Lou and J. A. K. Suykens, “Stability of coupled local minimizers within the Lagrange programming network framework,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 2, pp. 377–388, Feb. 2013.


[28] D. L. Donoho and X. Huo, “Uncertainty principles and ideal atomic decomposition,” IEEE Trans. Inf. Theory, vol. 47, no. 7, pp. 2845–2862, Nov. 1999. [29] D. L. Donoho and M. Elad, “Optimally sparse representation in general (nonorthogonal) dictionaries via 1 minimization,” Proc. Nat. Acad. Sci. USA, vol. 100, no. 5, pp. 2197–2202, Mar. 2003. [30] E. van den Berg and M. P. Friedlander, “Sparse optimization with leastsquares constraints,” SIAM J. Optim., vol. 21, no. 4, pp. 1201–1229, 2011. [31] E. Candès and J. Romberg. (Oct. 2005). 1 -MAGIC: Recovery of Sparse Signals via Convex Programming. [Online]. Available: http://users.ece.gatech.edu/justin/l1magic/downloads/l1magic.pdf [32] E. van den Berg and M. P. Friedlander. (Jun. 2007). SPGL1: A Solver for Large-Scale Sparse Reconstruction. [Online]. Available: http://www. cs.ubc.ca/labs/scl/spgl1 [33] C. J. Rozell, D. H. Johnson, R. G. Baraniuk, and B. A. Olshausen, “Sparse coding via thresholding and local competition in neural circuits,” Neural Comput., vol. 20, no. 10, pp. 2526–2563, Oct. 2008. [34] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM J. Sci. Comput., vol. 20, no. 1, pp. 33–61, Jan. 1998. [35] A. Balavoine, C. J. Rozell, and J. Romberg, “Global convergence of the locally competitive algorithm,” in Proc. IEEE Digit. Signal Process. Workshop, IEEE Signal Process. Edu. Workshop (DSP/SPE), Sedona, AZ, USA, Jan. 2011, pp. 431–436. [36] A. Balavoine, J. Romberg, and C. J. Rozell, “Convergence and rate analysis of neural networks for sparse approximation,” IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 9, pp. 1377–1389, Sep. 2012. [37] Q. Liu and J. Wang, “A one-layer recurrent neural network for constrained nonsmooth optimization,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 41, no. 5, pp. 1323–1333, Oct. 2011. [38] M. Forti, P. Nistri, and M. Quincampoix, “Generalized neural network for nonsmooth nonlinear programming problems,” IEEE Trans. 
Circuits Syst. I, Reg. Papers, vol. 51, no. 9, pp. 1741–1754, Sep. 2004. [39] L. Cheng, Z. G. Hou, M. Tan, X. Wang, Z. Zhao, and S. Hu, “A recurrent neural network for non-smooth nonlinear programming problems,” in Proc. IEEE IJCNN, Aug. 2007, pp. 596–601. [40] L. Cheng, Z.-G. Hou, Y. Lin, M. Tan, W. C. Zhang, and F.-X. Wu, “Recurrent neural network for non-smooth convex optimization problems with application to the identification of genetic regulatory networks,” IEEE Trans. Neural Netw., vol. 22, no. 5, pp. 714–726, May 2011. [41] G. Gordon and R. Tibshirani, “Karush–Kuhn–Tucker conditions,” in Proc. Optim. Fall Lecture Notes, 2012, pp. 1–26. [Online]. Available: https://www.cs.cmu.edu/~ggordon/10725-F12/slides/16-kkt.pdf [42] B. Guenin, J. Könemann, and L. Tunçel, A Gentle Introduction to Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2014. [43] E. J. Candès and M. B. Wakin, “An introduction to compressive sampling,” IEEE Signal Process. Mag., vol. 25, no. 2, pp. 21–30, Mar. 2008. [44] S. Boyd and L. Vandenberghe, Convex Optimization. New York, NY, USA: Cambridge Univ. Press, 2004. [45] J. J. Fuchs, “Convergence of a sparse representations algorithm applicable to real or complex data,” IEEE J. Sel. Topics Signal Process., vol. 1, no. 4, pp. 598–605, Dec. 2007. [46] J. Dutta and C. S. Lalitha, “Optimality conditions in convex optimization revisited,” Optim. Lett., vol. 7, no. 2, pp. 221–229, 2013. [47] A. Dhara and J. Dutta, Optimality Conditions in Convex Optimization. New York, NY, USA: Taylor & Francis, 2011. [48] X. Feng and Z. Zhang, “The rank of a random matrix,” Appl. Math. Comput., vol. 185, no. 1, pp. 689–694, Jan. 2007. [49] S. Ji, Y. Xue, and L. Carin, “Bayesian compressive sensing,” IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2346–2356, Jun. 2008. [50] Y. Tsaig and D. L. Donoho, “Extensions of compressed sensing,” Signal Process., vol. 86, no. 3, pp. 549–571, Mar. 2006. [51] H. Zhang, W. Yin, and L. 
Cheng, “Necessary and sufficient conditions of solution uniqueness in 1-norm minimization,” J. Optim. Theory Appl., vol. 164, no. 1, pp. 109–122, 2015. [52] A. M. Bruckstein, D. L. Donoho, and M. Elad, “From sparse solutions of systems of equations to sparse modeling of signals and images,” SIAM Rev., vol. 51, no. 1, pp. 34–81, Feb. 2009. [53] E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.


Ruibin Feng is currently pursuing the Ph.D. degree with the Department of Electronic Engineering, City University of Hong Kong, Hong Kong. His current research interests include neural networks and machine learning.

Chi-Sing Leung (M’05–SM’15) received the Ph.D. degree in computer science from the Chinese University of Hong Kong, Hong Kong, in 1995. He is currently a Professor with the Department of Electronic Engineering, City University of Hong Kong, Hong Kong. He has authored over 120 journal papers in the areas of digital signal processing, neural networks, and computer graphics. His current research interests include neural computing and computer graphics. Dr. Leung was a member of the Organizing Committee of ICONIP2006. He received the 2005 IEEE Transactions on Multimedia Prize Paper Award for his paper titled The Plenoptic Illumination Function. He was the Program Chair of ICONIP2009 and ICONIP2012. He is/was the Guest Editor of several journals, including Neural Computing and Applications, Neurocomputing, and Neural Processing Letters. He is a Governing Board Member of the Asian Pacific Neural Network Assembly (APNNA) and the Vice President of APNNA. Anthony G. Constantinides (S’68–M’74–SM’78–F’98) is currently the Professor of Communications and Signal Processing with Imperial College London, London, U.K. He has been actively involved in research in various aspects of digital signal processing for more than 45 years. He has authored several books and over 400 articles in digital signal processing. Prof. Constantinides is a fellow of the Royal Academy of Engineering, the Institute of Electrical and Electronics Engineers, USA, and the Institution of Electrical Engineers, U.K. He has served as the First President of the European Association for Signal Processing and has contributed in this capacity to the establishment of the European Journal for Signal Processing. He received the Medal of the Association, Palmes Academiques in 1986, and the Medal of the University of Tienjin, Shanghai, China, in 1981. He received honorary doctorates from European and Far Eastern Universities.
Among these, he values highly the honorary doctorate from the National Technical University of Athens, Athens, Greece. He has organized the first international series of meetings on Digital Signal Processing, London, initially in 1967, and in Florence with Prof. V. Cappellini at the University of Florence, Florence, Italy, since 1972. In 1985, he was decorated by the French government with the Honour of Chevalier, Palmes Academiques, and in 1996 with the elevation to Officer, Palmes Academiques. His life work has been recorded in a series of audio and video interviews for the IEEE (USA) Archives as a Pioneer of Signal Processing. He has acted as an Advisor to many organizations and governments on modern technology and development. He has served on the Professorial Selection Committees around the world (15 during the last five years) and the EU University Appraising Panels, and as a member of IEE/IEEE Awards Committees and the Chair (or Co-Chair) of international conferences. Wen-Jun Zeng (S’10–M’11) received the M.S. degree in electrical engineering from Tsinghua University, Beijing, China, in 2008. He was a Research Assistant with Tsinghua University, from 2006 to 2009. From 2009 to 2011, he was a Faculty Member with the Department of Communication Engineering, Xiamen University, Xiamen, China. He is currently a Senior Research Associate with the Department of Electronic Engineering, City University of Hong Kong, Hong Kong. His current research interests include mathematical signal processing, including convex optimization, array processing, sparse approximation, and inverse problem, with applications to wireless radio, and underwater acoustic communications.