A Modified BFGS Algorithm for Unconstrained Optimization

IMA Journal of Numerical Analysis (1991) 11, 325-332

YA-XIANG YUAN

Computing Centre, Academia Sinica, Beijing 100080, China

[Received 25 April 1989 and in revised form 16 October 1990]

In this paper we present a modified BFGS algorithm for unconstrained optimization. The BFGS algorithm updates an approximate Hessian which satisfies the most recent quasi-Newton equation. The quasi-Newton condition can be interpreted as the interpolation condition that the gradient value of the local quadratic model matches that of the objective function at the previous iterate. Our modified algorithm requires that the function value, instead of the gradient value, is matched at the previous iterate. The modified algorithm preserves the global and local superlinear convergence properties of the BFGS algorithm. Numerical results are presented, which suggest that a slight improvement has been achieved.

1. Introduction

Variable metric algorithms for unconstrained optimization are a class of numerical algorithms for solving the following problem:

$$\min_{x \in \mathbb{R}^n} f(x). \qquad (1.1)$$

They are iterative. On the $k$th iteration an approximation point $x_k$ and an $n \times n$ matrix $B_k$ are available. A search direction

$$d_k = -B_k^{-1} \nabla f(x_k) \qquad (1.2)$$

is calculated, then a step-length $\alpha_k > 0$ is calculated to satisfy certain line search conditions, and the next iterate $x_{k+1}$ is set to $x_k + \alpha_k d_k$. One of the important features of the method is the choice of the matrices $B_k$. Variable metric algorithms require $B_k$ to be positive definite and to satisfy the quasi-Newton equation

$$B_{k+1}\delta_k = y_k, \qquad (1.3)$$

where $\delta_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$.

In the one-dimensional case, matching the function value of the quadratic model $\phi_k(\cdot)$ at the previous iterate gives

$$c_k = \frac{2[f(x_{k-1}) - f(x_k) - f'(x_k)(x_{k-1} - x_k)]}{(x_k - x_{k-1})^2}. \qquad (2.5)$$

Because $x_k - f'(x_k)/c_k$ is the minimum of $\phi_k(\cdot)$, we let the next iterate be

$$x_{k+1} = x_k - \frac{f'(x_k)}{c_k} = x_k - \frac{f'(x_k)(x_k - x_{k-1})}{2\{f'(x_k) - [f(x_k) - f(x_{k-1})]/(x_k - x_{k-1})\}}, \qquad (2.6)$$

which is a variant of formula (2.2). It is always true that $c_k > 0$ if the objective function $f(x)$ is strictly convex. Both iteration formulae (2.2) and (2.6) can be viewed as approximations to Newton's iteration

$$x_{k+1} = x_k - f'(x_k)/f''(x_k). \qquad (2.7)$$

In other words, (2.2) is a modified Newton's method where the approximation

$$\frac{f'(x_k) - f'(x_{k-1})}{x_k - x_{k-1}} \approx f''(x_k) \qquad (2.8)$$

is used, and in (2.6) one uses

$$\frac{2\{f'(x_k) - [f(x_k) - f(x_{k-1})]/(x_k - x_{k-1})\}}{x_k - x_{k-1}} \approx f''(x_k). \qquad (2.9)$$

The leading errors in (2.8) and (2.9) are $-\tfrac{1}{2}f'''(x_k)(x_k - x_{k-1})$ and $-\tfrac{1}{3}f'''(x_k)(x_k - x_{k-1})$ respectively. Hence (2.6) is a better approximation to (2.7) than (2.2). It can be shown that the iterative scheme (2.6) also has local Q-superlinear convergence properties. Assuming $x_k$ converges to $x^*$ at which $f'(x^*) = 0$, $f''(x^*) \neq 0$, and $f'''(x^*) \neq 0$, from (2.6) we can prove that

$$|x_{k+1} - x^*| = \left|\frac{f'''(x^*)}{3f''(x^*)}\right| |x_k - x^*|\,|x_{k-1} - x^*| + o(|x_k - x^*|\,|x_{k-1} - x^*|), \qquad (2.10)$$

which implies that

$$\lim_{k \to \infty} \frac{|x_{k+1} - x^*|}{|x_k - x^*|^{\tau}} = \left|\frac{f'''(x^*)}{3f''(x^*)}\right|^{\tau - 1}, \qquad (2.11)$$

where again $\tau = \tfrac{1}{2}(\sqrt{5} + 1)$. The convergence rates (2.11) and (2.3) indicate that (2.6) is slightly faster than (2.2). We consider a simple example that minimizes the function $f(x) = -x\,\mathrm{e}^{-x}$, which has the unique minimum $x^* = 1$. Initial points $x_1 = 0.0$ and $x_2 = 0.1$ are chosen, and the numerical results of both methods (2.2) and (2.6) are presented in Table 1. Only the errors $1.0 - x_k$ ($k \le 10$) are given to compare the performances of both methods.
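Both one-dimensional iterations are straightforward to implement. The following Python sketch is our own illustration, not code from the paper (function and variable names are ours); it applies (2.2) and (2.6) to $f(x) = -x\,\mathrm{e}^{-x}$ from the starting values above and, in IEEE double precision, reproduces the errors reported in Table 1.

```python
import math

def f(x):
    # Test function f(x) = -x * exp(-x), unique minimizer x* = 1.
    return -x * math.exp(-x)

def fp(x):
    # f'(x) = (x - 1) * exp(-x).
    return (x - 1.0) * math.exp(-x)

def run(method, x_prev, x, steps=8):
    # Iterate x_{k+1} = x_k - f'(x_k)/c_k, with c_k chosen by `method`.
    for k in range(steps):
        d = x - x_prev
        if method == "2.2":   # secant estimate (2.8) of f''(x_k)
            c = (fp(x) - fp(x_prev)) / d
        else:                 # function-value estimate (2.9) of f''(x_k)
            c = 2.0 * (fp(x) - (f(x) - f(x_prev)) / d) / d
        x_prev, x = x, x - fp(x) / c
        print(f"1.0 - x_{k + 3} = {1.0 - x:.9e}")  # errors, as in Table 1
    return x

run("2.2", 0.0, 0.1)   # method (2.2)
run("2.6", 0.0, 0.1)   # method (2.6)
```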

TABLE 1
The methods (2.2) and (2.6) for $f(x) = -x\,\mathrm{e}^{-x}$

Error         (2.2)                   (2.6)
1.0 - x_1     1.000000000 × 10^0      1.000000000 × 10^0
1.0 - x_2     0.900000000 × 10^0      0.900000000 × 10^0
1.0 - x_3     0.461341340 × 10^0      0.450000000 × 10^0
1.0 - x_4     0.244721116 × 10^0      0.211038490 × 10^0
1.0 - x_5     0.832019761 × 10^-1     0.606665134 × 10^-1
1.0 - x_6     0.174604885 × 10^-1     0.881302355 × 10^-2
1.0 - x_7     0.138265830 × 10^-2     0.373191911 × 10^-3
1.0 - x_8     0.239160474 × 10^-4     0.223267244 × 10^-5
1.0 - x_9     0.330444768 × 10^-7     0.557076149 × 10^-9
1.0 - x_10    0.790284505 × 10^-12    0.111022302 × 10^-15

The results in Table 1 show that in this simple example the numerical performance of (2.6) is better than that of (2.2). This motivates us to investigate updating formulae based on (1.10) for n-dimensional unconstrained optimization.

3. A modified BFGS algorithm

The BFGS algorithm for the unconstrained optimization problem (1.1) uses the search direction (1.2), and the matrices $B_k$ ($k = 1, 2, \ldots$) are updated by the BFGS formula

$$B_{k+1} = B_k - \frac{B_k \delta_k \delta_k^{\mathrm{T}} B_k}{\delta_k^{\mathrm{T}} B_k \delta_k} + \frac{y_k y_k^{\mathrm{T}}}{\delta_k^{\mathrm{T}} y_k}, \qquad (3.1)$$

which satisfies the quasi-Newton equation (1.3).
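For reference, here is a minimal NumPy sketch of the standard update (3.1). The function name is ours, and we assume $\delta_k^{\mathrm{T}} y_k > 0$, which the line search conditions discussed below guarantee.

```python
import numpy as np

def bfgs_update(B, delta, y):
    """Standard BFGS update (3.1).

    B     : current n x n positive definite Hessian approximation B_k
    delta : step x_{k+1} - x_k
    y     : gradient difference grad f(x_{k+1}) - grad f(x_k)

    Requires delta^T y > 0, which keeps B_{k+1} positive definite.
    """
    Bd = B @ delta
    return (B
            - np.outer(Bd, Bd) / (delta @ Bd)
            + np.outer(y, y) / (delta @ y))
```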

The BFGS algorithm is one of the most efficient algorithms for solving the unconstrained optimization problem (1.1). More details can be found in Fletcher (1980). This algorithm is globally convergent when the objective function $f(x)$ is convex, if the inexact line search conditions

$$f(x_k + \alpha_k d_k) \le f(x_k) + c_1 \alpha_k d_k^{\mathrm{T}} \nabla f(x_k), \qquad (3.2)$$

$$d_k^{\mathrm{T}} \nabla f(x_k + \alpha_k d_k) \ge c_2 d_k^{\mathrm{T}} \nabla f(x_k) \qquad (3.3)$$

are satisfied, where $c_1$ and $c_2$ are two constants such that $0 < c_1 \le c_2 < 1$ and $c_1 < 0.5$. Assume the sequence $x_k$ ($k = 1, 2, \ldots$) generated by the BFGS algorithm converges to a solution $x^*$ where $\nabla^2 f(x^*)$ is positive definite. If the stepsize $\alpha_k = 1$ is chosen whenever it satisfies the line search conditions (3.2)-(3.3), it can be shown that $x_k$ converges to $x^*$ Q-superlinearly, that is

$$\lim_{k \to \infty} \frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|} = 0. \qquad (3.4)$$

More details can be found in Powell (1976) and Dennis & Moré (1977). We consider the case when the search direction $d_k$ is a solution of the quadratic subproblem (1.6), but this subproblem satisfies the interpolation condition (1.10), instead of (1.9). (We recall from Section 1 that (1.9) is equivalent to the


quasi-Newton equation (1.3).) Using definition (1.6), the interpolation condition (1.10) with $k$ increased by one is

$$\delta_k^{\mathrm{T}} B_{k+1} \delta_k = 2[f(x_k) - f(x_{k+1}) + \delta_k^{\mathrm{T}} \nabla f(x_{k+1})]. \qquad (3.5)$$

The positive definiteness of $B_{k+1}$ requires

$$f(x_k) - f(x_{k+1}) > -\delta_k^{\mathrm{T}} \nabla f(x_{k+1}). \qquad (3.6)$$

Inequality (3.6) is trivial if the objective function $f(x)$ is strictly convex, and it is also true if the step-length $\alpha_k$ is chosen by an exact line search, which requires $\delta_k^{\mathrm{T}} \nabla f(x_{k+1}) = 0$. We update $B_{k+1}$ by the formula

$$B_{k+1} = B_k - \frac{B_k \delta_k \delta_k^{\mathrm{T}} B_k}{\delta_k^{\mathrm{T}} B_k \delta_k} + t_k \frac{y_k y_k^{\mathrm{T}}}{\delta_k^{\mathrm{T}} y_k}, \qquad (3.7)$$

which is a slight modification of the BFGS formula. Now relation (3.5) implies (since (3.7) gives $\delta_k^{\mathrm{T}} B_{k+1} \delta_k = t_k\, \delta_k^{\mathrm{T}} y_k$)

$$t_k = \frac{2[f(x_k) - f(x_{k+1}) + \delta_k^{\mathrm{T}} \nabla f(x_{k+1})]}{\delta_k^{\mathrm{T}} y_k}. \qquad (3.8)$$

It is easily seen that $t_k \in [0, 2]$ if $f(x)$ is convex. Furthermore, $t_k = 1$ if $f(x)$ is quadratic on the line segment between $x_k$ and $x_{k+1}$. Since $B_{k+1}$ is positive definite if and only if $t_k > 0$, we require $t_k > 0$, which is equivalent to condition (3.6). For a general nonlinear objective function $f(x)$, one can modify the line search conditions so that (3.6) is satisfied. Assuming the line search conditions (3.2)-(3.3) are used, we restrict $t_k$ to the interval

$$0.01 \le t_k \le 100. \qquad (3.9)$$
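The modified update differs from the standard one only by the scalar $t_k$. A minimal NumPy sketch of (3.7)-(3.9), again with our own naming conventions:

```python
import numpy as np

def modified_bfgs_update(B, delta, y, f_k, f_k1, g_k1, t_min=0.01, t_max=100.0):
    """Modified BFGS update (3.7), with t_k from (3.8) truncated as in (3.9).

    f_k, f_k1 : f(x_k), f(x_{k+1})
    g_k1      : grad f(x_{k+1})
    With t_k = 1 this reduces to the standard BFGS formula (3.1).
    """
    dty = delta @ y
    t = 2.0 * (f_k - f_k1 + delta @ g_k1) / dty   # (3.8)
    t = min(max(t, t_min), t_max)                 # truncation (3.9)
    Bd = B @ delta
    return (B
            - np.outer(Bd, Bd) / (delta @ Bd)
            + t * np.outer(y, y) / dty)
```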

Now we can state a modified BFGS algorithm with inexact line searches as follows.

ALGORITHM 3.1. (A modified BFGS algorithm)
Step 0: Given $x_1 \in \mathbb{R}^n$, $B_1$ positive definite, $k := 1$.
Step 1: Calculate $\nabla f(x_k)$ and
$$d_k = -B_k^{-1} \nabla f(x_k).$$
Step 2: Calculate $\alpha_k > 0$ such that conditions (3.2)-(3.3) are satisfied, and set $x_{k+1} = x_k + \alpha_k d_k$.
Step 3: Calculate $t_k$ by (3.8); if $t_k < 0.01$ then set $t_k = 0.01$; if $t_k > 100$ then set $t_k = 100$. Update $B_{k+1}$ by (3.7), set $k := k + 1$, and go to Step 1.

Because $t_k$ is truncated so that inequality (3.9) is satisfied, by slightly modifying the proof in Powell (1976) it can be shown that Algorithm 3.1 converges globally for convex objective functions with inexact line searches. Assume $x_k$ converges to a strict local minimum $x^*$ where $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive definite, and


that $f(x)$ is twice continuously differentiable. Then it can be proved that

$$\lim_{k \to \infty} t_k = 1. \qquad (3.10)$$

Thus it is reasonable to hope that the local Q-superlinear convergence of the BFGS algorithm can be extended to the modified algorithm where updating formula (3.7) is used. Details of local analyses of the BFGS algorithm can be found in Dennis & Moré (1977). A sketch of the complete iteration appears below.
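The following Python sketch assembles Algorithm 3.1. It is our own illustration, not the paper's FORTRAN implementation: SciPy's Wolfe line search stands in for the quadratic-approximation search described in Section 4, and the stopping rule and fallback step size are our assumptions.

```python
import numpy as np
from scipy.optimize import line_search

def modified_bfgs(f, grad, x, B=None, tol=1e-8, max_iter=200):
    """Algorithm 3.1: modified BFGS with inexact (Wolfe) line searches.

    Stops when ||grad f(x_k)|| <= tol, mirroring the convergence
    criterion used in Section 4.
    """
    B = np.eye(x.size) if B is None else B       # Step 0: B_1 = I
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        d = np.linalg.solve(B, -g)               # Step 1: d_k = -B_k^{-1} grad f(x_k)
        # Step 2: alpha_k satisfying (3.2)-(3.3); c1 = 0.01, c2 = 0.9 as in Section 4.
        alpha = line_search(f, grad, x, d, gfk=g, c1=0.01, c2=0.9)[0]
        if alpha is None:                        # line search failure: crude fallback
            alpha = 1e-8
        x_new = x + alpha * d
        g_new = grad(x_new)
        delta, y = x_new - x, g_new - g
        # Step 3: t_k by (3.8), truncated to [0.01, 100] as in (3.9); update by (3.7).
        dty = delta @ y
        t = 2.0 * (f(x) - f(x_new) + delta @ g_new) / dty
        t = min(max(t, 0.01), 100.0)
        Bd = B @ delta
        B = B - np.outer(Bd, Bd) / (delta @ Bd) + t * np.outer(y, y) / dty
        x, g = x_new, g_new
    return x

# Example: Rosenbrock's function (Problem 1 of Section 4).
rosen = lambda z: 100.0 * (z[1] - z[0]**2)**2 + (1.0 - z[0])**2
rosen_grad = lambda z: np.array([-400.0 * z[0] * (z[1] - z[0]**2) - 2.0 * (1.0 - z[0]),
                                 200.0 * (z[1] - z[0]**2)])
print(modified_bfgs(rosen, rosen_grad, np.array([-1.2, 1.0])))
```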

4. Numerical results and discussions

A FORTRAN subroutine was programmed to test the modified BFGS algorithm presented in the previous section. The following test problems are used.

Problem 1. (Rosenbrock's function)

$$f(x_1, x_2) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2, \qquad (4.1)$$

starting point: $(-1.2, 1.0)^{\mathrm{T}}$, solution: $(1, 1)^{\mathrm{T}}$.

Problem 2. (Powell's function of four variables)

$$f(x_1, x_2, x_3, x_4) = (x_1 + 10x_2)^2 + 5(x_3 - x_4)^2 + (x_2 - 2x_3)^4 + 10(x_1 - x_4)^4, \qquad (4.2)$$

starting point: $(3, -1, 0, 1)^{\mathrm{T}}$, solution: $(0, 0, 0, 0)^{\mathrm{T}}$.

Problem 3. (Wood's function)

$$f(x_1, x_2, x_3, x_4) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2 + 90(x_4 - x_3^2)^2 + (1 - x_3)^2 + 10.1[(x_2 - 1)^2 + (x_4 - 1)^2] + 19.8(x_2 - 1)(x_4 - 1), \qquad (4.3)$$

starting point: $(-3, -1, -3, -1)^{\mathrm{T}}$, solution: $(1, 1, 1, 1)^{\mathrm{T}}$.

Problem 4. (A quartic function)

$$f(x_1, x_2, x_3, x_4) = \sum_{i=1}^{4} \left(10^{i-1} x_i^4 + x_i^3 + 10^{1-i} x_i^2\right), \qquad (4.4)$$

starting point: $(1, 1, 1, 1)^{\mathrm{T}}$, solution: $(0, 0, 0, 0)^{\mathrm{T}}$.

Problem 5. (A sine-valley function)

$$f(x_1, x_2) = 100[x_2 - \sin(x_1)]^2 + 0.25 x_2^2, \qquad (4.5)$$

starting point: $(\tfrac{1}{2}\pi, -1)^{\mathrm{T}}$, solution: $(0, 0)^{\mathrm{T}}$.

Problems 1-3 are well-known test problems for unconstrained optimization (for example, see Fletcher & Freeman, 1977; Fletcher, 1980). Problem 4 is so constructed that the Hessian of the objective function at the starting point is very


different from that at the solution, namely

$$\nabla^2 f(1, 1, 1, 1) = \mathrm{diag}\,[20,\ 126.2,\ 1206.02,\ 12006.002], \qquad (4.6)$$

$$\nabla^2 f(0, 0, 0, 0) = \mathrm{diag}\,[2,\ 0.2,\ 0.02,\ 0.002], \qquad (4.7)$$

while Problem 5 is a sine-valley function having a valley along the curve $x_2 = \sin(x_1)$. The calculations were carried out on an IBM 4341 machine with double-precision arithmetic. The convergence criterion is

$$\|\nabla f(x_k)\| \le \varepsilon. \qquad (4.8)$$

For each problem we run with both $\varepsilon = 10^{-8}$ and $\varepsilon = 10^{-12}$, and the initial matrix $B_1$ is chosen to be the unit matrix $I$. The step-length $\alpha_k > 0$ satisfying conditions (3.2)-(3.3), with $c_1 = 0.01$ and $c_2 = 0.9$, is calculated by quadratic approximation with bracketing techniques. More details can be found in Fletcher (1980). All problems are solved successfully, and the numbers of iterations and function evaluations are given in Table 2. We also solved these problems by the BFGS algorithm, and the numerical results of the BFGS algorithm are also given in Table 2. In Table 2, NI and NF are the numbers of iterations and function evaluations respectively. The numerical results show that the modified algorithm is slightly better than the original BFGS algorithm on this collection of test problems. One possible disadvantage of the modified algorithm is that the calculation of $t_k$ in formula (3.8) may lose accuracy due to rounding errors. That is, when $x_k$ converges to a solution $x^*$ superlinearly, it follows that $f(x_k) - f(x_{k+1}) =$