Norm Descent Conjugate Gradient Methods for Solving Symmetric Nonlinear Equations

Yunhai Xiao∗, Chunjie Wu∗, and Soon-Yi Wu†

Abstract. The nonlinear conjugate gradient method is very popular for solving large-scale unconstrained minimization problems because of its simple iterative form and low storage requirement. In recent years, it has been successfully extended to solve high-dimensional monotone nonlinear equations. Nevertheless, research on conjugate gradient methods for symmetric equations is only beginning. This study aims to develop, analyze, and validate a family of nonlinear conjugate gradient methods for symmetric equations. The proposed algorithms are based on the latest, state-of-the-art descent conjugate gradient methods for unconstrained minimization. The proposed methods are derivative-free: Jacobian information is not needed at any stage of the iteration. We prove that the proposed methods converge globally under some appropriate conditions. Numerical results with different parameter values, together with performance comparisons with the solver CGD, are reported to demonstrate the superiority and effectiveness of the proposed algorithms.

Key words. unconstrained optimization, symmetric equations, conjugate gradient method, backtracking line search

AMS subject classifications. 65F22, 65J22, 65K05, 90C06, 90C25, 90C30

1. Introduction

In this study, we consider the following symmetric nonlinear system

    F(x) = 0,                                                                   (1.1)

where F : R^n → R^n is a continuously differentiable mapping and the symmetry means that the Jacobian J(x) := ∇F(x) is symmetric, i.e., J(x) = J(x)^T. Problems of this type arise as the problem of finding a stationary point of an unconstrained minimization problem, where F is the gradient of the objective function, or of finding a KKT point of an equality constrained minimization problem. Moreover, the problem has many practical backgrounds in scientific and engineering computation, such as saddle point problems, the discretized two-point boundary value problem, and the discretized elliptic boundary value problem. A large number of efficient solvers for large-scale symmetric nonlinear equations have been proposed, analyzed, and tested in the last decade. Among them, the most classic one is entirely due to Li & Fukushima [11].

∗ Institute of Applied Mathematics, Henan University, Kaifeng 475000, China (Email: [email protected], [email protected]).
† National Center for Theoretical Sciences (South), National Cheng Kung University, Tainan 700, Taiwan (Email: [email protected]).


In [11], a Gauss-Newton-based BFGS method is developed, and its global and superlinear convergence are established. Subsequently, its performance was further improved by Gu et al. [8], where a norm descent BFGS method is designed. Since then, norm descent BFGS-type methods, especially ones cooperating with trust region strategies, have appeared regularly in the literature and have shown moderate effectiveness experimentally [15, 14]. However, matrix storage, and even the solution of a linear system, is involved in all of the reviewed algorithms. The recently designed nonmonotone spectral gradient algorithm [3] falls within the matrix-free framework; it uses the well-known spectral gradient method [2] to determine its search direction.

More recently, conjugate gradient methods for symmetric nonlinear equations have received ever-increasing attention and made good progress. Li & Wang [12] proposed a modified Fletcher-Reeves conjugate gradient method based on the work of Zhang et al. [17]. The reported comparison results illustrate that their proposed conjugate gradient method is a powerful algorithmic tool. Because of this, further studies on conjugate gradient methods for solving large-scale symmetric nonlinear equations were inspired. Zhou & Shen [18] extended the descent three-term Polak-Ribière-Polyak method of Zhang et al. [16] to solving (1.1) by combining it with the work of Li & Fukushima [11]. Meanwhile, the classic Polak-Ribière-Polyak method was also successfully used to solve the symmetric equations (1.1) by Zhou & Shen [19]. Extensive numerical experiments showed that each reviewed conjugate gradient method performs quite well. Additionally, conjugate gradient methods for general nonlinear equations and for convex constrained monotone equations can be found in [4] and [13], respectively.

The main contribution of this paper is to construct a series of fast conjugate gradient methods for solving (1.1). The proposed methods are based on the well-known descent conjugate gradient method of Hager & Zhang [9] or, more precisely, mainly on its latest general formulation by Dai & Kou [7]. In other words, our proposed methods can be thought of as an extension of a state-of-the-art descent conjugate gradient method to solving symmetric nonlinear equations. We show that the proposed methods possess some attractive properties: they require neither Jacobian information nor the storage of any matrix at each iteration. Hence, they have great potential for solving large-scale problems. Under quite reasonable technical assumptions, we establish a global convergence theorem. We present experimental results and performance comparisons with CGD [13] to illustrate that the proposed methods are effective and promising.

The rest of this paper is organized as follows. In Section 2, we briefly recall the latest descent conjugate gradient method for unconstrained minimization and subsequently construct our algorithm. In Section 3, we show that the proposed method converges globally. In Section 4, we provide computational experiments to show its practical performance. Finally, we conclude the paper in Section 5. Throughout this paper, ‖·‖ denotes the Euclidean norm of a vector. For the sake of simplicity, we abbreviate F(x_k) and J(x_k) as F_k and J_k, respectively.

2. Algorithm

In this section, we briefly review the latest family of conjugate gradient methods of Dai & Kou [7] for unconstrained optimization, as well as the Gauss-Newton-based method of Li & Fukushima [11] for symmetric nonlinear systems, and then construct our method step by step. Consider

    min f(x),   x ∈ R^n,                                                        (2.1)

where f : R^n → R is a nonlinear smooth function. The recently designed method of Dai & Kou [7] generates a sequence {x_k} of the form

    x_{k+1} = x_k + α_k d_k,                                                    (2.2)

where α_k > 0 is a steplength. The search direction d_k is generated by

    d_0 = -∇f(x_0),
    d_k = -∇f(x_k) + β_k^{DK}(τ_{k-1}) d_{k-1},   k ≥ 1,                        (2.3)

where β_k^{DK}(τ_{k-1}) is defined as

    β_k^{DK}(τ_{k-1}) = (∇f(x_k)^T ỹ_{k-1}) / (d_{k-1}^T ỹ_{k-1})
                        - ( τ_{k-1} + ‖ỹ_{k-1}‖² / (s_{k-1}^T ỹ_{k-1}) - (s_{k-1}^T ỹ_{k-1}) / ‖s_{k-1}‖² ) (∇f(x_k)^T s_{k-1}) / (d_{k-1}^T ỹ_{k-1}),      (2.4)

where ỹ_{k-1} = ∇f(x_k) - ∇f(x_{k-1}), s_{k-1} = x_k - x_{k-1}, and τ_{k-1} is a parameter corresponding to the scaling parameter in the scaled memoryless BFGS method. As shown by Dai & Kou, formula (2.4) contains the well-known Dai-Liao method [6] and the Hager-Zhang method [9] as special cases. It is also shown that, if s_{k-1}^T ỹ_{k-1} ≠ 0, the following inequality always holds:

    ∇f(x_k)^T d_k ≤ - min{ τ_{k-1} ‖s_{k-1}‖² / (s_{k-1}^T ỹ_{k-1}), 3/4 } ‖∇f(x_k)‖².

Additionally, if τ_{k-1} is specially chosen as

    τ_{k-1}^A = ‖ỹ_{k-1}‖² / (s_{k-1}^T ỹ_{k-1})   and   τ_{k-1}^B = (s_{k-1}^T ỹ_{k-1}) / ‖s_{k-1}‖²,                      (2.5)

or their particular variants, it is proved that

    ∇f(x_k)^T d_k ≤ -c ‖∇f(x_k)‖²                                               (2.6)

for some positive constant c > 0, provided that the gradient of f is Lipschitz continuous on a certain bounded level set. Furthermore, it is shown that if τ_{k-1} is a convex combination of τ_{k-1}^A and τ_{k-1}^B, i.e., τ_{k-1} = θ τ_{k-1}^A + (1-θ) τ_{k-1}^B with θ ∈ [0, 1], then (2.6) holds with c = 3/4. Incorporating an improved line search and a dynamic restart strategy, their conjugate gradient method is experimentally shown to be very promising and to perform dramatically better than the distinguished solver CG_DESCENT [9, 10].

Now we are ready to turn our attention to the symmetric nonlinear system (1.1). Generally, if we let f(x) = (1/2)‖F(x)‖², problem (1.1) can be converted into the global optimization problem (2.1). Starting at x_0, the Newton method generates a sequence of iterates {x_k} by the iterative process x_{k+1} = x_k + α_k d_k, where the Newton direction d_k is a solution of the linear system

    J_k d + F_k = 0.                                                            (2.7)

It is shown that if J_k is nonsingular, then ∇f(x_k)^T d_k = -F_k^T J_k^T J_k^{-T} F_k = -‖F_k‖² < 0, which implies that d_k is a descent direction of f at x_k. The general quasi-Newton method determines the search direction by solving the linear system

    B_k d + F_k = 0,

where B_k is an approximation of J_k. The attractive property of quasi-Newton methods is that some nice properties are preserved without requiring Jacobian information. Nevertheless, d_k is not necessarily a descent direction even if B_k is nonsingular. Similar to (2.7), the approximate descent method of Li & Fukushima [11] is based on the following equivalent linear system:

    J_k J_k^T d + J_k F_k = 0.                                                  (2.8)

Specifically, if J_k J_k^T is replaced by B_k, then the search direction determined above satisfies ∇f(x_k)^T d_k = -F_k^T J_k^T B_k^{-T} J_k F_k < 0 provided that B_k is positive definite, which indicates that d_k is a descent direction. Moreover, for a sufficiently small scalar α_{k-1}, it holds that

    J_k F_k ≈ [ F(x_k + α_{k-1} F_k) - F_k ] / α_{k-1}.                          (2.9)

Subsequently, the linear system (2.8) reduces to (note that B_k := J_k J_k^T)

    B_k^T d + [ F(x_k + α_{k-1} F_k) - F_k ] / α_{k-1} = 0.                      (2.10)
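For illustration, the right-hand side of (2.9), which is later reused as the surrogate gradient g_k, can be formed with two extra evaluations of F and no Jacobian at all. The following is a minimal Python/NumPy sketch under that reading; the function name is ours and is not part of [11] or of our Matlab implementation:

```python
import numpy as np

def surrogate_grad(F, x, alpha_prev):
    """Finite-difference surrogate of grad f(x) = J(x) F(x), as in (2.9)-(2.10):
    g = [F(x + alpha_prev * F(x)) - F(x)] / alpha_prev, Jacobian-free."""
    Fx = F(x)
    return (F(x + alpha_prev * Fx) - Fx) / alpha_prev
```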

Clearly, the direction obtained from (2.10) does not require computing the Jacobian matrix; it thus belongs to the derivative-free framework. Also from [11], letting γ_k = F_{k+1} - F_k and noting s_k = x_{k+1} - x_k, it holds that γ_k ≈ J_{k+1} s_k. Moreover, letting

    y_k = F(x_k + γ_k) - F_k,                                                   (2.11)

it follows that y_k ≈ J_{k+1} γ_k ≈ J_{k+1} J_{k+1} s_k or, equivalently,

    y_k ≈ B_{k+1} s_k.                                                          (2.12)

Recalling that, with y_k = ∇f(x_{k+1}) - ∇f(x_k), the quasi-Newton equation in the optimization literature reads

    y_k = B_{k+1} s_k.                                                          (2.13)

Comparing (2.12) with (2.13), we conclude that y_k defined in (2.11) is the best surrogate for ∇f(x_{k+1}) - ∇f(x_k) in the case f(x) = (1/2)‖F(x)‖². On the basis of the above analysis, it is natural to define the following search direction for solving (1.1):

    d_0 = -g_0,
    d_k = -g_k + β_k^{DK}(τ_{k-1}) d_{k-1},   k ≥ 1,                            (2.14)

where

    β_k^{DK}(τ_{k-1}) = (g_k^T y_{k-1}) / (d_{k-1}^T y_{k-1})
                        - ( τ_{k-1} + ‖y_{k-1}‖² / (s_{k-1}^T y_{k-1}) - (s_{k-1}^T y_{k-1}) / ‖s_{k-1}‖² ) (g_k^T s_{k-1}) / (d_{k-1}^T y_{k-1}),           (2.15)

where y_{k-1} is defined in (2.11), and g_k = [ F(x_k + α_{k-1} F_k) - F_k ] / α_{k-1} with the previous steplength α_{k-1}, which will be given later (the evaluation g_0 = F_0 is used at the first iteration). The parameter τ_{k-1} is a convex combination of τ_{k-1}^A and τ_{k-1}^B, i.e.,

    τ_{k-1} = θ τ_{k-1}^A + (1 - θ) τ_{k-1}^B,   0 ≤ θ ≤ 1,                     (2.16)

where τ_k^A and τ_k^B are defined analogously to (2.5) with ỹ replaced by y. Note that Lemmas 2.1-2.3 in [7] hold independently of the definition of y_k. Hence, the following assertion follows directly provided that d_{k-1}^T y_{k-1} ≠ 0:

    g_k^T d_k ≤ -(3/4) ‖g_k‖².                                                  (2.17)
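To make the construction concrete, the following Python/NumPy sketch assembles the search direction (2.14)-(2.16); it is a direct transcription of the formulas under the assumption that d_{k-1}^T y_{k-1} ≠ 0 and s_{k-1}^T y_{k-1} ≠ 0, and all helper names are illustrative only:

```python
import numpy as np

def beta_dk(g, y, s, d_prev, theta):
    """Coefficient (2.15) with tau_{k-1} the convex combination (2.16) of
    tau^A = ||y||^2 / (s'y) and tau^B = (s'y) / ||s||^2 (cf. (2.5)).
    Assumes d_prev'y != 0 and s'y != 0."""
    dy = d_prev @ y
    sy = s @ y
    tau_a = (y @ y) / sy
    tau_b = sy / (s @ s)
    tau = theta * tau_a + (1.0 - theta) * tau_b
    return (g @ y) / dy - (tau + tau_a - tau_b) * (g @ s) / dy

def cgsneq_direction(g, y_prev, s_prev, d_prev, theta):
    """Search direction (2.14): -g_0 at the first iteration, otherwise
    -g_k + beta_k^{DK}(tau_{k-1}) d_{k-1}; by (2.17), g'd <= -(3/4)||g||^2."""
    if d_prev is None:
        return -g
    return -g + beta_dk(g, y_prev, s_prev, d_prev, theta) * d_prev
```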

Inequality (2.17) shows that d_k is a descent direction of f (i.e., of ‖F(x)‖) at the point x_k, which is very important for an iterative method to be globally convergent [1]. In light of (2.14), we define the following line search rule: find a stepsize α_k = max{ ξ_k ρ^i | i = 0, 1, 2, ... } such that

    f(x_k + α_k d_k) - f(x_k) ≤ δ α_k g_k^T d_k + η_k,

or, equivalently,

    ‖F(x_k + α_k d_k)‖² - ‖F(x_k)‖² ≤ -(3δ/2) α_k ‖g_k‖² + 2η_k,                 (2.18)

where δ, ρ ∈ (0, 1), ξ_k = ξ ‖g_k‖² / ‖d_k‖² with ξ > 0, and {η_k} is a given positive sequence such that

    Σ_{k=0}^{∞} η_k ≤ η < ∞.                                                    (2.19)
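A possible implementation of the backtracking rule (2.18) is sketched below. The cap on the number of backtracking steps is a practical safeguard and not part of the rule itself, and all names and default values are illustrative:

```python
import numpy as np

def line_search(F, x, d, g, eta_k, xi=10.0, rho=0.3, delta=0.001, max_backtracks=60):
    """Backtracking rule (2.18): accept the largest alpha in {xi_k * rho^i} with
    ||F(x + alpha d)||^2 - ||F(x)||^2 <= -(3*delta/2) * alpha * ||g||^2 + 2*eta_k,
    where xi_k = xi * ||g||^2 / ||d||^2."""
    Fx = F(x)
    Fx_sq = Fx @ Fx
    g_sq = g @ g
    alpha = xi * g_sq / (d @ d)          # initial trial step xi_k
    for _ in range(max_backtracks):
        Ft = F(x + alpha * d)
        if Ft @ Ft - Fx_sq <= -1.5 * delta * alpha * g_sq + 2.0 * eta_k:
            return alpha
        alpha *= rho
    return alpha                          # safeguard: return the last trial step
```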

In light of all the derivations above, we now formally state the steps of the conjugate gradient algorithm for solving the symmetric nonlinear equations (1.1) (CGSNEq for short).

Algorithm 2.1 (CGSNEq).
Step 0. Choose an arbitrary initial point x_0, constants ξ > 0, δ ∈ (0, 1), ρ ∈ (0, 1), and a positive sequence {η_k} satisfying (2.19). Set k := 0.
Step 1. Stop if F(x_k) = 0. Otherwise, determine d_k by (2.14).
Step 2. Compute α_k by the line search (2.18).
Step 3. Set x_{k+1} = x_k + α_k d_k.
Step 4. Set k := k + 1. Go to Step 1.
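The pieces above can be assembled into a driver loop for Algorithm 2.1. The sketch below assumes the surrogate_grad, cgsneq_direction, and line_search helpers from the preceding sketches; it is meant only to show how g_k, y_k, s_k, and α_k are chained from one iteration to the next, not to reproduce our Matlab code:

```python
import numpy as np

def cgsneq(F, x0, theta=0.5, tol=1e-3, max_iter=2000,
           eta=lambda k: 1.0 / (k + 1) ** 2):
    """Driver for Algorithm 2.1 (CGSNEq); eta_k ~ 1/k^2 is shifted by one
    here simply to avoid division by zero at k = 0."""
    x = np.asarray(x0, dtype=float)
    Fx = F(x)
    g = Fx.copy()                                # g_0 = F_0
    d_prev = y_prev = s_prev = None
    k = 0
    for k in range(max_iter):
        if np.linalg.norm(Fx) <= tol:            # Step 1 (practical form of F(x_k) = 0)
            break
        d = cgsneq_direction(g, y_prev, s_prev, d_prev, theta)   # (2.14)-(2.16)
        alpha = line_search(F, x, d, g, eta(k))                  # Step 2, rule (2.18)
        x_new = x + alpha * d                                    # Step 3
        F_new = F(x_new)
        gamma = F_new - Fx                       # gamma_k = F_{k+1} - F_k
        y_prev = F(x + gamma) - Fx               # y_k of (2.11)
        s_prev = x_new - x                       # s_k = x_{k+1} - x_k
        g = surrogate_grad(F, x_new, alpha)      # g_{k+1}, using the previous step alpha_k
        x, Fx, d_prev = x_new, F_new, d          # Step 4
    return x, np.linalg.norm(Fx), k
```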

3. Convergence Analysis

In this section, we establish the global convergence of Algorithm 2.1. For this purpose, we first define the level set

    S = { x : ‖F(x)‖ ≤ ‖F(x_0)‖ + η }.                                          (3.1)

It is clear that the sequence {x_k} generated by Algorithm 2.1 belongs to S, i.e., x_k ∈ S for all k. We assume that F satisfies the following assumptions, which have been used in previous works such as [11, 12, 18].

Assumption 3.1. The mapping F is continuously differentiable on an open convex set containing the level set S.

Assumption 3.2. The Jacobian J(x) is symmetric and bounded on S, that is, there exists a positive constant M such that ‖J(x)‖ ≤ M for all x ∈ S.

Assumption 3.3. The Jacobian J(x) is uniformly nonsingular on S, that is, there exists a constant m > 0 such that

    m ‖d‖ ≤ ‖J(x) d‖,   ∀ x ∈ S, d ∈ R^n.

Obviously, Assumptions 3.1-3.3 imply that there exist positive constants M ≥ m > 0 such that

    m ‖x - y‖ ≤ ‖F(x) - F(y)‖ = ‖J(θx + (1-θ)y)(x - y)‖ ≤ M ‖x - y‖,   ∀ x, y ∈ S,

where θ ∈ [0, 1].

Lemma 3.1. Suppose that Assumptions 3.1-3.3 hold. Then we have

    lim_{k→∞} α_k ‖g_k‖² = 0.                                                   (3.2)

Proof. Summing both sides of (2.18) over k yields

    Σ_{k=0}^{∞} ( ‖F(x_k + α_k d_k)‖² - ‖F(x_k)‖² ) ≤ -(3δ/2) Σ_{k=0}^{∞} α_k ‖g_k‖² + 2 Σ_{k=0}^{∞} η_k,

or, equivalently,

    ‖F(x_*)‖² - ‖F(x_0)‖² ≤ -(3δ/2) Σ_{k=0}^{∞} α_k ‖g_k‖² + 2η,

which shows that

    Σ_{k=0}^{∞} α_k ‖g_k‖² < ∞.

Hence, the desired result (3.2) holds.

The following result is from [11, Lemma 2.3]; its proof is reproduced here only for completeness of the paper.

Lemma 3.2. Suppose that Assumptions 3.1-3.3 hold. Then we have

    ‖y_k‖ ≤ M² ‖s_k‖.                                                           (3.3)

Additionally, if s_k → 0, then there exists a constant m̃ > 0 such that, for all k sufficiently large,

    y_k^T d_k ≥ m̃ ‖d_k‖ ‖s_k‖.                                                  (3.4)

Proof. By the definitions of y_k and γ_k and Assumption 3.2, we see that

    ‖y_k‖ = ‖ ∫_0^1 J(x_k + t γ_k) γ_k dt ‖ ≤ M ‖γ_k‖ = M ‖ ∫_0^1 J(x_k + t s_k) s_k dt ‖ ≤ M² ‖s_k‖.

By the mean-value theorem, we have

    y_k^T s_k = [ F(x_k + γ_k) - F(x_k) ]^T s_k
              = s_k^T ∫_0^1 J(x_k + t γ_k) dt γ_k
              = s_k^T ∫_0^1 J(x_k + t γ_k) dt ∫_0^1 J(x_k + t s_k) dt s_k
              = s_k^T [ ∫_0^1 J(x_k + t s_k) dt ]² s_k
                + s_k^T ∫_0^1 [ J(x_k + t γ_k) - J(x_k + t s_k) ] dt ∫_0^1 J(x_k + t s_k) dt s_k
              ≥ ‖F(x_{k+1}) - F(x_k)‖²
                - ‖ ∫_0^1 [ J(x_k + t γ_k) - J(x_k + t s_k) ] dt s_k ‖ ‖ ∫_0^1 J(x_k + t s_k) dt s_k ‖
              ≥ m² ‖s_k‖² - M ‖s_k‖² ∫_0^1 ‖ J(x_k + t γ_k) - J(x_k + t s_k) ‖ dt
              = ( m² - M ∫_0^1 ‖ J(x_k + t γ_k) - J(x_k + t s_k) ‖ dt ) ‖s_k‖².

If s_k → 0, then γ_k = F(x_{k+1}) - F(x_k) → 0. By the continuity of J, the last expression is bounded below by m̃ ‖s_k‖² for some m̃ > 0 and all k sufficiently large; since s_k = α_k d_k with α_k > 0, this yields (3.4).

Lemma 3.3. Suppose that Assumptions 3.1-3.3 hold. Then there exists a constant C̃ such that

    (3/4) ‖g_k‖ ≤ ‖d_k‖ ≤ C̃ ‖g_k‖.                                              (3.5)

Proof. By (2.17), it is easy to see that

    -‖g_k‖ ‖d_k‖ ≤ g_k^T d_k ≤ -(3/4) ‖g_k‖².

Therefore, the left-hand side of (3.5) holds. On the other hand, from (3.3) and (3.4), it is easy to deduce that

    τ_{k-1}^A = ‖y_{k-1}‖² / (s_{k-1}^T y_{k-1}) ≤ M⁴ ‖s_{k-1}‖² / (m̃ ‖s_{k-1}‖²) =: c_1
    and
    τ_{k-1}^B = (s_{k-1}^T y_{k-1}) / ‖s_{k-1}‖² ≤ M² ‖s_{k-1}‖² / ‖s_{k-1}‖² =: c_2.

Subsequently, we obtain τ_{k-1}^A ≤ c and τ_{k-1}^B ≤ c with c = max{c_1, c_2}. Moreover, by (2.16), formula (2.15) can be reformulated as

    β_k^{DK}(τ_{k-1}) = (g_k^T y_{k-1}) / (d_{k-1}^T y_{k-1}) - [ (1 + θ) τ_{k-1}^A - θ τ_{k-1}^B ] (g_k^T s_{k-1}) / (d_{k-1}^T y_{k-1}).

Consequently, we have

    ‖d_k‖ ≤ ‖g_k‖ + |β_k^{DK}(τ_{k-1})| ‖d_{k-1}‖
           = ‖g_k‖ + | (g_k^T y_{k-1}) / (d_{k-1}^T y_{k-1}) - [ (1 + θ) τ_{k-1}^A - θ τ_{k-1}^B ] (g_k^T s_{k-1}) / (d_{k-1}^T y_{k-1}) | ‖d_{k-1}‖
           ≤ ( 1 + M² ‖s_{k-1}‖ ‖d_{k-1}‖ / (d_{k-1}^T y_{k-1}) + (1 + 2θ) c ‖s_{k-1}‖ ‖d_{k-1}‖ / (d_{k-1}^T y_{k-1}) ) ‖g_k‖
           ≤ ( 1 + (M² + (1 + 2θ) c) / m̃ ) ‖g_k‖.

Hence, the conclusion of the lemma holds.

Lemma 3.4. Suppose that Assumptions 3.1-3.3 hold. Then we have

    α_k ≥ min{ ρ ξ ‖g_k‖² / ‖d_k‖², [ (3/4 - (3/2)ρδ) ‖g_k‖² - 2ρ ζ_k ‖F_k‖ ‖d_k‖ ] / ( M² ‖d_k‖² ) },          (3.6)

where

    ζ_k = ∫_0^1 ‖ J(x_k + t (α_k/ρ) d_k) - J(x_k + t α_{k-1} F_k) ‖ dt.                                          (3.7)

Proof. If α_k = ξ_k, then (3.6) holds clearly by the definition of ξ_k. If α_k ≠ ξ_k, then α'_k = α_k/ρ does not satisfy (2.18), that is,

    ‖F(x_k + α'_k d_k)‖² - ‖F(x_k)‖² > -(3δ/2) α'_k ‖g_k‖² + 2η_k.                                               (3.8)

On the other hand, we have

    ‖F(x_k + α'_k d_k)‖² - ‖F(x_k)‖² = [ F(x_k + α'_k d_k) + F(x_k) ]^T [ F(x_k + α'_k d_k) - F(x_k) ]
                                      = ‖F(x_k + α'_k d_k) - F(x_k)‖² + 2 F(x_k)^T [ F(x_k + α'_k d_k) - F(x_k) ]
                                      ≤ M² ‖α'_k d_k‖² + 2 α'_k F(x_k)^T ∫_0^1 J(x_k + t α'_k d_k) d_k dt.

Combining this with (3.8) and dividing by α'_k = α_k/ρ > 0, we get

    α_k ≥ [ -(3/2)ρδ ‖g_k‖² - 2ρ F_k^T ∫_0^1 J(x_k + t α'_k d_k) dt d_k ] / ( M² ‖d_k‖² )
        = [ -(3/2)ρδ ‖g_k‖² - 2ρ F_k^T ∫_0^1 J(x_k + t α'_k d_k) dt d_k + g_k^T d_k - g_k^T d_k ] / ( M² ‖d_k‖² )
        ≥ [ (3/4 - (3/2)ρδ) ‖g_k‖² - 2ρ F_k^T ∫_0^1 J(x_k + t α'_k d_k) dt d_k + g_k^T d_k ] / ( M² ‖d_k‖² )
        = [ (3/4 - (3/2)ρδ) ‖g_k‖² - 2ρ F_k^T ∫_0^1 ( J(x_k + t α'_k d_k) - J(x_k + t α_{k-1} F_k) ) dt d_k ] / ( M² ‖d_k‖² )
        ≥ [ (3/4 - (3/2)ρδ) ‖g_k‖² - 2ρ ζ_k ‖F_k‖ ‖d_k‖ ] / ( M² ‖d_k‖² ),

which implies the desired result.

Now we are ready to establish the global convergence of Algorithm 2.1.

Theorem 3.1. Suppose that Assumptions 3.1-3.3 hold and 2ρδ < 1. Then the sequence {x_k} generated by Algorithm 2.1 satisfies

    lim_{k→∞} ‖F_k‖ = 0.                                                        (3.9)

Proof. Assume, on the contrary, that there exists a constant ε > 0 such that

    ‖F_k‖ ≥ ε,   ∀ k ≥ 0.                                                       (3.10)

Then it is easy to see that

    ‖g_k‖ = ‖ [ F(x_k + α_{k-1} F_k) - F_k ] / α_{k-1} ‖ = ‖ ∫_0^1 J(x_k + t α_{k-1} F_k) F_k dt ‖ ≥ m ‖F_k‖ ≥ m ε.

Hence, (3.2) implies that

    lim_{k→∞} α_k = 0,                                                          (3.11)

which, by the definition of ζ_k in (3.7), furthermore indicates that

    lim_{k→∞} ζ_k = 0.

Since the sequence {x_k} is bounded, there exist positive constants M_1 and M_2 such that

    ε ≤ ‖F_k‖ ≤ M_1   and   m ε ≤ ‖g_k‖ ≤ M_2,

and, by (3.5),

    (3/4) m ε ≤ ‖d_k‖ ≤ C̃ M_2.

These bounds, together with (3.6) and ζ_k → 0, yield

    lim inf_{k→∞} α_k ≥ min{ ρ ξ m² ε² / ( C̃² M_2² ), (3/4 - (3/2)ρδ) m² ε² / ( M² C̃² M_2² ) } > 0,

where the positivity of the second term uses 2ρδ < 1. This contradicts (3.11), and hence (3.9) holds.

4. Numerical Experiments

In this section, we use some available examples to illustrate the feasibility and effectiveness of Algorithm CGSNEq. Moreover, we also test against the recently designed solver CGD [13] to show the superiority of the proposed method. The algorithm is implemented in Matlab 2009a in double precision arithmetic. All runs are performed under Windows XP Home Service Pack 1 on a PC with an Intel Core 2 Quad CPU at 2.5 GHz and 4GB SDRAM. For each test problem, the termination condition is

    ‖F(x_k)‖ ≤ 10^{-3}.                                                         (4.1)

The process is also stopped if the number of iterations exceeds 2000. Our experiments are performed on a set of nonlinear unconstrained minimization problems from the CUTEr library [5] that have second derivatives available. Here, we reformulate each unconstrained minimization problem through its gradient and subsequently solve ∇f(x) = 0 by our proposed method.


Table 1: Test results of CGD and CGSNEq with different values of θ (Iter = number of iterations, Time = CPU time in seconds, Fnorm = final norm of the equations). Entries for COSIN and DENSCHNB are reported on a shared row, given as "COSIN value / DENSCHNB value"; only one CGD entry is reported for that row.

Prob             Dim    CGD                          θ = .2                       θ = .4
                        Iter   Time    Fnorm         Iter   Time    Fnorm         Iter   Time    Fnorm
ARWHEAD          1000   14     0.31    5.73742e-4    40     0.84    9.94612e-04   40     0.70    9.94612e-04
DIXMAANA         1000   28     1.01    7.24421e-4    140    2.09    9.79743e-04   147    1.87    9.38396e-04
DIXMAANB         1000   19     0.61    1.80377e-4    93     1.21    8.76145e-04   98     0.89    8.86076e-04
DIXMAANC         1000   26     0.69    9.17289e-4    132    2.15    9.31946e-04   140    1.07    9.44098e-04
EDENSCH          1000   258    5.42    9.98743e-4    211    4.27    9.62738e-04   199    1.76    9.70733e-04
ENGVAL1          1000   833    22.13   9.96913e-4    228    7.91    9.97498e-04   251    3.06    9.89039e-04
COSIN/DENSCHNB   1000   27     0.17    5.31761e-4    313/175  3.96/38.81  9.80812e-04/9.98217e-04   224/253  0.89/17.65  9.25674e-04/9.84724e-04
GENROSE          1000   2000   27.00   3.10239e+3    1806   77.12   9.92889e-04   2000   26.49   5.14059e-03
PENALTY          1000   12     0.10    5.85195e-4    148    1.19    9.04860e-04   148    0.44    9.04860e-04
DIXON3DQ         1000   9      0.06    6.88239e-4    60     0.67    9.01485e-04   60     0.32    9.01485e-04
GENHUMPS         1000   328    2.33    5.25158e-4    278    8.03    9.77566e-04   256    2.09    9.89529e-04
BDEXP            1000   1724   4.19    9.99716e-4    1      0.01    6.64126e-11   1      0.00    6.64126e-11

Prob             Dim    θ = .6                       θ = .8                       θ = 1
                        Iter   Time    Fnorm         Iter   Time    Fnorm         Iter   Time    Fnorm
ARWHEAD          1000   40     0.82    9.94612e-04   40     0.70    9.94612e-04   40     0.82    9.94612e-04
DIXMAANA         1000   141    2.93    9.29292e-04   141    1.95    9.46500e-04   141    1.89    9.59384e-04
DIXMAANB         1000   89     1.60    9.65381e-04   87     0.80    9.94213e-04   98     1.84    9.09805e-04
DIXMAANC         1000   140    2.12    9.88205e-04   140    1.10    9.64333e-04   146    2.43    9.28887e-04
EDENSCH          1000   208    5.95    9.52548e-04   202    1.80    9.44337e-04   214    4.36    9.22746e-04
ENGVAL1          1000   229    8.11    9.95088e-04   231    2.81    9.92377e-04   262    9.82    9.29494e-04
COSIN/DENSCHNB   1000   2000/262  14.32/57.27  2.05232e-02/9.89876e-04   2000/232  4.50/16.01  1.62608e-02/9.85453e-04   2000/270  14.89/58.80  1.98393e-02/9.93775e-04
GENROSE          1000   450    19.31   9.97657e-04   1800   23.91   9.86762e-04   286    9.27    8.89324e-04
PENALTY          1000   148    1.51    9.04860e-04   148    0.44    9.04860e-04   148    0.80    9.04860e-04
DIXON3DQ         1000   60     0.81    9.01485e-04   60     0.32    9.01485e-04   60     1.12    9.01485e-04
GENHUMPS         1000   276    7.51    9.63161e-04   254    2.11    9.76499e-04   252    6.01    9.98443e-04
BDEXP            1000   1      0.01    6.64126e-11   1      0.00    6.64126e-11   1      0.01    6.64126e-11

We implemented CGSNEq with the following parameter values: ξ = 10, ρ = 0.3, δ = 0.001, which were verified to be the best choices while preparing the experiments. Additionally, we set η_k = 1/k² in this test. The numerical results are reported in Table 1, which contains the name of each problem (Prob), the dimension of the problem (Dim), the number of iterations (Iter), the CPU time required in seconds (Time), and the final norm of the equations (Fnorm).

Observing Table 1, we clearly see that the proposed algorithm worked successfully and derived final solutions for almost all problems in each case. Specifically, on problem BDEXP only one step is required for each θ, because the initial point is already a solution, as can be seen from CUTEr. On the other hand, we also see that the proposed algorithm performed successfully on problem GENROSE except in the case θ = .4, and succeeded on problem COSIN only in the cases θ = .2 and θ = .4. Finally, it can also be seen that our proposed method is competitive with, and even slightly better than, the algorithm CGD. All in all, these limited numerical experiments illustrate that the proposed algorithm provides an efficient approach to solving large-scale symmetric nonlinear equations.
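For concreteness, the following sketch shows how the driver outlined in Section 2 might be invoked with the above parameter values on a small synthetic symmetric system, namely the gradient system of f(x) = (1/2) x^T A x - Σ_i cos(x_i) with A the one-dimensional discrete Laplacian, so that J(x) = A + diag(cos(x)) is symmetric. This toy problem and all identifiers are illustrative and are not part of the CUTEr test set or of our Matlab implementation:

```python
import numpy as np

# Synthetic symmetric system F(x) = grad f(x) = A x + sin(x), with A symmetric.
n = 1000
A = (np.diag(2.0 * np.ones(n))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1))

def F(x):
    return A @ x + np.sin(x)

# Parameters as in the experiments: xi = 10, rho = 0.3, delta = 0.001 (defaults of the
# line_search sketch), eta_k = 1/k^2, theta = 0.2, stopping test ||F(x_k)|| <= 1e-3.
x, fnorm, iters = cgsneq(F, x0=np.ones(n), theta=0.2, tol=1e-3, max_iter=2000)
print(iters, fnorm)
```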

5. Conclusions

In this paper, we proposed, analyzed, and tested a family of efficient sufficient-descent algorithms for solving large-scale symmetric nonlinear equations. Problems of this type arise as an equivalent form of finding a stationary point in unconstrained minimization or a KKT point in equality constrained minimization, among others. Additionally, the problem may arise in unconstrained minimization when an explicit formulation of the objective function is unavailable. The earliest methods for this type of problem are Newton-type schemes, in which Jacobian information, or the solution of a resulting linear system, is required at each iteration; such methods are therefore not suitable for high-dimensional problems, and derivative-free methods are urgently needed. In this paper, we extended the latest, state-of-the-art algorithm of Dai & Kou [7] to solve symmetric nonlinear equations. Our motivation is natural, and it largely comes from the fact that their conjugate gradient method is very effective for solving large-scale unconstrained optimization problems and is faster than the well-known method CG_DESCENT [9, 10]. The nice property of the proposed method is that it requires neither the Jacobian of the equations nor the storage of any matrix at each iteration. Hence, it also has the potential to solve non-smooth equations. Under some technical conditions, we showed that the proposed method converges globally. Numerical experiments on a series of modified problems from the CUTEr library [5], with different values of the parameter θ, illustrated that the proposed method is competitive with the recently designed solver CGD [13].

Acknowledgement. We would like to thank the two anonymous referees for their constructive suggestions, which improved the paper greatly. This work is supported by the Natural Science Foundation of Henan Province under grants 13HASTIT050 and 2011GGJS030.

References

[1] X.M. An, D.H. Li, and Y. Xiao, Sufficient descent directions in unconstrained optimization, Comput. Optim. Appl., 48 (2011), 515-532.
[2] J. Barzilai and J.M. Borwein, Two-point step size gradient methods, IMA J. Numer. Anal., 8 (1988), 141-148.
[3] W. Cheng and Z. Chen, Nonmonotone spectral method for large-scale symmetric nonlinear equations, Numer. Algor., 62 (2013), 149-162.
[4] W. Cheng, Y. Xiao, and Q. Hu, A family of derivative-free conjugate gradient methods for large-scale nonlinear systems of equations, J. Comput. Appl. Math., 224 (2009), 11-19.
[5] A.R. Conn, N.I.M. Gould, and Ph.L. Toint, CUTE: constrained and unconstrained testing environment, ACM Trans. Math. Softw., 21 (1995), 123-160.
[6] Y.-H. Dai and L.-Z. Liao, New conjugacy conditions and related nonlinear conjugate gradient methods, Appl. Math. Optim., 43 (2001), 87-101.
[7] Y.-H. Dai and C.-X. Kou, A nonlinear conjugate gradient algorithm with an optimal property and an improved Wolfe line search, SIAM J. Optim., 23 (2013), 296-320.
[8] G.-Z. Gu, D.-H. Li, L. Qi, and S.-Z. Zhou, Descent directions of quasi-Newton methods for symmetric nonlinear equations, SIAM J. Numer. Anal., 40 (2002), 1763-1774.
[9] W.W. Hager and H. Zhang, A new conjugate gradient method with guaranteed descent and an efficient line search, SIAM J. Optim., 16 (2005), 170-192.
[10] W.W. Hager and H. Zhang, Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent, ACM Trans. Math. Softw., 32 (2006), 113-137.
[11] D.-H. Li and M. Fukushima, A globally and superlinearly convergent Gauss-Newton-based BFGS method for symmetric nonlinear equations, SIAM J. Numer. Anal., 37 (1999), 152-172.
[12] D.-H. Li and X. Wang, A modified Fletcher-Reeves-type derivative-free method for symmetric nonlinear equations, Numer. Algebra Control Optim., 1 (2011), 71-82.
[13] Y. Xiao and H. Zhu, A conjugate gradient method to solve convex constrained monotone equations with applications in compressive sensing, J. Math. Anal. Appl., 405 (2013), 310-319.
[14] G. Yuan and X. Lu, A new backtracking inexact BFGS method for symmetric nonlinear equations, Comput. Math. Appl., 55 (2008), 11-129.
[15] G. Yuan, X. Lu, and Z. Wei, BFGS trust-region method for symmetric nonlinear equations, J. Comput. Appl. Math., 230 (2009), 44-58.
[16] L. Zhang, W. Zhou, and D. Li, A descent modified Polak-Ribière-Polyak conjugate gradient method and its global convergence, IMA J. Numer. Anal., 26 (2006), 629-640.
[17] L. Zhang, W. Zhou, and D.-H. Li, Global convergence of a modified Fletcher-Reeves conjugate gradient method with Armijo-type line search, Numer. Math., 104 (2006), 561-572.
[18] W. Zhou and D. Shen, Convergence properties of an iterative method for solving symmetric nonlinear equations, J. Optim. Theory Appl., DOI 10.1007/s10957-014-0547-1.
[19] W. Zhou and D. Shen, An inexact PRP conjugate gradient method for symmetric nonlinear equations, Numer. Funct. Anal. Optim., 35 (2014), 370-388.
