A QUASI-NEWTON QUADRATIC PENALTY METHOD FOR MINIMIZATION SUBJECT TO NONLINEAR EQUALITY CONSTRAINTS

THOMAS F. COLEMAN, JIANGUO LIU, AND WEI YUAN

Abstract. We present a modified quadratic penalty function method for equality constrained optimization problems. The pivotal feature of our algorithm is that at every iterate we invoke a special change of variables to improve the ability of the algorithm to follow the constraint level sets. This change of variables gives rise to a suitable block diagonal approximation to the Hessian, which is then used to construct a quasi-Newton method. We show that the complete algorithm is globally convergent. Preliminary computational results are reported.

Key words. nonlinearly constrained optimization, equality constraints, quasi-Newton methods, BFGS, quadratic penalty function, reduced Hessian approximation

AMS(MOS) subject classifications. 65K05, 65K10, 65H10, 90C30, 90C05, 68L10

1. Introduction. One of the great success stories in continuous optimization is

the development of effective quasi-Newton methods for unconstrained minimization (at least for problems of moderate size). Three important reasons for this success are:

• Line search rules that ensure global convergence are consistent with positive definite quasi-Newton updates: the approximating matrix can be updated at every iteration (even at points far from the solution) and positive definiteness can be maintained.
• Ultimately the line search rules allow for unit step sizes, which facilitates rapid local convergence.
• The true Hessian matrix is positive definite in a neighborhood of a strong local minimizer (thus "justifying" the preservation of positive definiteness of the approximating matrices).

Unfortunately, adaptation of the unconstrained quasi-Newton technology to the nonlinearly constrained problem,

(1.1)    minimize $\{f(x) : c(x) = 0\}$,

where $f:\mathbb{R}^n \to \mathbb{R}$ and $c:\mathbb{R}^n \to \mathbb{R}^m$, has proved considerably more difficult.

Algorithm 2.1.
    Choose $\mu_1 > 0$; $0 < \eta < 1 - 1/\sqrt{2}$; $0 < \sigma < 1$, and $\sigma < \omega < 1$. Choose a point
    $x_0 \in \mathbb{R}^n$ and an $n \times n$ positive definite matrix $B_1^{(0)}$. Set $i \leftarrow 1$.
    while ( $\mu_i$ is not sufficiently small )
        Set $k \leftarrow 0$; $x_i^{(0)} \leftarrow x_{i-1}$;
        while either of the criteria in (2.21) does not hold
            if (2.21b) does not hold
                Solve $(R_i^{(k)})^T d_v^{(k)} = -c(x_i^{(k)})$;
                Backtrack: find a $\beta_i^{(k)} > 0$ satisfying (2.19);
                $x_i^{(k+)} \leftarrow x_i^{(k)} + \beta_i^{(k)} Y_i^{(k)} d_v^{(k)}$;
            else
                $x_i^{(k+)} \leftarrow x_i^{(k)}$;
            end;
            Solve $B_i^{(k)} d_h^{(k)} = -(Z_i^{(k+)})^T \nabla f(x_i^{(k+)})$;
            Path search: find an $\alpha_i^{(k)} > 0$ satisfying (2.14) and (2.15);
            $x_i^{(k+1)} \leftarrow u(\alpha_i^{(k)} d_h^{(k)})$;
            $y_i^{(k)} \leftarrow \nabla_h p_{\mu_i}(x_i^{(k+1)}) - \nabla_h p_{\mu_i}(x_i^{(k+)})$;
            $B_i^{(k+1)} \leftarrow \mathrm{BFGS}(B_i^{(k)},\, \alpha_i^{(k)} d_h^{(k)},\, y_i^{(k)})$;
            $k \leftarrow k + 1$;
        end;
        $x_i \leftarrow x_i^{(k)}$; $\mu_{i+1} \leftarrow \max\{\mu_i^{6/5},\, \eta \|(Z_i)^T \nabla f_i\|^2\}$; $i \leftarrow i + 1$;
    end;
    Set $x^* \leftarrow x_i$ and STOP.

Figure 2.1: Algorithm 2.1
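The two building blocks that Algorithm 2.1 treats as primitives, the quadratic penalty function $p_\mu$ and the update $\mathrm{BFGS}(B, s, y)$, can be illustrated in isolation. The sketch below is not the authors' code: the test functions `f` and `c` are hypothetical stand-ins, and the curvature safeguard is a common implementation convention rather than part of Algorithm 2.1.

```python
import numpy as np

def quadratic_penalty(f, c, mu):
    """Classical quadratic penalty: p_mu(x) = f(x) + ||c(x)||^2 / (2*mu)."""
    def p(x):
        return f(x) + np.dot(c(x), c(x)) / (2.0 * mu)
    return p

def bfgs_update(B, s, y):
    """Standard BFGS update of a positive definite matrix B,
    given step s and gradient difference y (requires s^T y > 0)."""
    Bs = B @ s
    sBs = s @ Bs
    sy = s @ y
    if sy <= 1e-12 * np.linalg.norm(s) * np.linalg.norm(y):
        return B  # skip the update when the curvature condition fails
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / sy

# Tiny illustration with hypothetical f and c.
f = lambda x: x[0]**2 + x[1]**2
c = lambda x: np.array([x[0] + x[1] - 1.0])
p = quadratic_penalty(f, c, mu=0.1)

B = np.eye(2)
s = np.array([0.1, 0.0])          # step taken
y = np.array([0.25, 0.05])        # gradient difference along the step
B = bfgs_update(B, s, y)
print(p(np.array([0.5, 0.5])), np.all(np.linalg.eigvalsh(B) > 0))
```

Because $s^T y > 0$ here, the updated matrix stays positive definite, which is the property the line search rules in the introduction are designed to preserve.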

A monotone decrease result, for fixed $\mu_i$, is easy to establish.

Lemma 2.5. Assume a sequence $\{x_i^{(k)}\}$ is generated by Algorithm 2.1 with index $i$ fixed. Then

(2.23)    $p_{\mu_i}(x_i^{(k+1)}) - p_{\mu_i}(x_i^{(k)}) \le 0.$

Furthermore, if

(2.24)    $\|c(x_i^{(k)})\| > \tau_i^{(k)} \mu_i,$

where $\tau_i^{(k)} = \max\{\|\lambda(x_i^{(k)})\|/\eta,\, 1\}$, then

(2.25)    $\nabla p_{\mu_i}(x_i^{(k)})^T Y_i^{(k)} d_v^{(k)} = \lambda(x_i^{(k)})^T c(x_i^{(k)}) - \frac{1}{\mu_i}\|c(x_i^{(k)})\|^2.$

Proof. The direction $d_h^{(k)}$ satisfies $B_i^{(k)} d_h^{(k)} = -\nabla_h p_{\mu_i}(u(0))$; this, along with (2.14) and the positive definiteness of $B_i^{(k)}$, implies that

(2.26)    $p_{\mu_i}(x_i^{(k+1)}) - p_{\mu_i}(x_i^{(k+)}) \le -\sigma \alpha_i^{(k)} \nabla_h p_{\mu_i}(u(0))^T (B_i^{(k)})^{-1} \nabla_h p_{\mu_i}(u(0)) \le 0.$

If (2.24) does not hold, then $x_i^{(k+)} = x_i^{(k)}$ and (2.23) follows from (2.26). If (2.24) holds, equation (2.25) follows from (2.11) and $d_v^{(k)} = -(R_i^{(k)})^{-T} c_i^{(k)}$. Since $\eta < 1$, it follows from (2.20) that

(2.27)    $p_{\mu_i}(x_i^{(k+)}) - p_{\mu_i}(x_i^{(k)}) \le \eta \beta_i^{(k)} \Big[\lambda(x_i^{(k)})^T c(x_i^{(k)}) - \frac{1}{\mu_i}\|c(x_i^{(k)})\|^2\Big] \le \eta \beta_i^{(k)} \Big[\|\lambda(x_i^{(k)})\|\,\|c(x_i^{(k)})\| - \frac{1}{\mu_i}\|c(x_i^{(k)})\|^2\Big] \le -\eta(1-\eta) \beta_i^{(k)} \frac{1}{\mu_i}\|c(x_i^{(k)})\|^2 \le 0,$

where the last inequality uses $\|\lambda(x_i^{(k)})\| \le \eta\tau_i^{(k)}$ and, by (2.24), $\tau_i^{(k)} < \|c(x_i^{(k)})\|/\mu_i$.

3. Global Convergence. In this section we analyze the global convergence of Algorithm 2.1. We call $x^* \in D$ a stationary point of problem (1.1) if $c(x^*) = 0$ and $Z(x^*)^T \nabla f(x^*) = 0$.

Suppose that the backtracking condition (2.19) fails at a trial step $\tilde\alpha > 0$, so that

(3.30)    $p_{\mu_i}(x_i^{(k)} + \tilde\alpha Y_i^{(k)} d_v^{(k)}) - p_{\mu_i}(x_i^{(k)}) > \eta \tilde\alpha \,[\nabla p_{\mu_i}(x_i^{(k)})]^T Y_i^{(k)} d_v^{(k)},$

and $\nu\tilde\alpha \le \beta_i^{(k)}$, where $\nu \in (0,1)$ is the backtracking reduction factor. Since the matrices $\nabla^2 c_i(x)$ and $\nabla^2 f(x)$ are bounded and

$\nabla^2 p_{\mu_i}(x) = \nabla^2 f(x) + \sum_{i=1}^{m} \frac{c_i(x)}{\mu_i} \nabla^2 c_i(x) + \frac{1}{\mu_i} A(x) A(x)^T,$

Taylor's theorem yields that for some $\tilde x_i^{(k)}$ near $x_i^{(k)}$,

(3.31)    $p_{\mu_i}(x_i^{(k)} + \tilde\alpha Y_i^{(k)} d_v^{(k)}) - p_{\mu_i}(x_i^{(k)}) = \tilde\alpha \,[\nabla p_{\mu_i}(x_i^{(k)})]^T Y_i^{(k)} d_v^{(k)} + \frac{\tilde\alpha^2}{2} (Y_i^{(k)} d_v^{(k)})^T [\nabla^2 p_{\mu_i}(\tilde x_i^{(k)})] (Y_i^{(k)} d_v^{(k)}) \le \tilde\alpha \,[\nabla p_{\mu_i}(x_i^{(k)})]^T Y_i^{(k)} d_v^{(k)} + \frac{\tilde\alpha^2}{2} \frac{K}{\mu_i} \|Y_i^{(k)} d_v^{(k)}\|^2,$

where $K$ is a constant independent of $\mu_i$. Noting that $Y_i^{(k)} d_v^{(k)} = -Y_i^{(k)} (R_i^{(k)})^{-T} c(x_i^{(k)})$, it then follows, using (3.30) and (3.29), that

(3.32)    $-(1-\eta)[\nabla p_{\mu_i}(x_i^{(k)})]^T Y_i^{(k)} d_v^{(k)} < \frac{\tilde\alpha K}{2\mu_i} \|Y_i^{(k)} d_v^{(k)}\|^2 \le \frac{\tilde\alpha K K_0}{2\mu_i} \|c(x_i^{(k)})\|^2.$

On the other hand, since $\|\lambda_i^{(k)}\| \le \eta\tau_i^{(k)} < \eta\|c(x_i^{(k)})\|/\mu_i$ whenever $\beta_i^{(k)}$ is computed, equation (2.25) implies that

(3.33)    $\nabla p_{\mu_i}(x_i^{(k)})^T Y_i^{(k)} d_v^{(k)} \le \|c(x_i^{(k)})\|\,\|\lambda_i^{(k)}\| - \frac{1}{\mu_i}\|c(x_i^{(k)})\|^2 \le -\frac{1-\eta}{\mu_i} \|c(x_i^{(k)})\|^2.$

Combining (3.32) and (3.33), we obtain

$\beta_i^{(k)} \ge \nu\tilde\alpha > \frac{2\nu(1-\eta)^2}{K K_0}.$
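The Hessian expression above contains the rank-$m$ term $(1/\mu_i) A A^T$, so the penalty Hessian grows increasingly ill-conditioned as $\mu_i \to 0$; this is the classical numerical difficulty of quadratic penalty methods. A small numerical sketch of this effect follows; the objective and constraint here are hypothetical illustrations, not data from the paper.

```python
import numpy as np

# Hypothetical problem: f(x) = x1^2 + 2*x2^2, one constraint c(x) = x1 + x2 - 1.
def penalty_hessian(x, mu):
    hess_f = np.diag([2.0, 4.0])          # Hessian of f
    c = np.array([x[0] + x[1] - 1.0])     # c(x)
    A = np.array([[1.0], [1.0]])          # A(x) = gradient of c (constant here)
    hess_c = np.zeros((2, 2))             # Hessian of c vanishes for a linear constraint
    # Hessian of p_mu: hess_f + (c_1/mu) * hess_c + (1/mu) * A A^T
    return hess_f + (c[0] / mu) * hess_c + (A @ A.T) / mu

x = np.array([0.3, 0.4])
for mu in (1.0, 1e-2, 1e-4):
    H = penalty_hessian(x, mu)
    print(mu, np.linalg.cond(H))          # condition number grows roughly like 1/mu
```

The growing condition number is one motivation for the block diagonal Hessian approximation described in the abstract.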

Next we show that for any fixed $\mu_i > 0$ the criteria in Algorithm 2.1, i.e., (2.21), can be satisfied after a finite number of iterations. As in most convergence analyses for quasi-Newton methods, we need to make some boundedness assumptions on the matrices $\{B_i^{(k)}\}$. It should be noted that such assumptions may not always hold, say, for the matrices generated by the BFGS update.

Lemma 3.2. Let Assumption 3.1 hold, suppose that the sequence $\{x_i^{(k)}\}$ is generated by Algorithm 2.1, and $\mu_i$ is held at a constant value (by Algorithm 2.1). Furthermore, assume that there exists a constant $M > 0$ such that

(3.34)    $\mathrm{eigmax}(B_i^{(k)}) \le M^{1/2}, \qquad \mathrm{eigmin}(B_i^{(k)}) \ge M^{-1/2},$

where eigmax and eigmin denote the greatest and the least eigenvalues, respectively. Then there exists an integer $\bar k$ such that for $k \ge \bar k$,

(3.35)    $\|c(x_i^{(k)})\| \le \tau_i^{(k)} \mu_i \quad \text{and} \quad \|Z(x_i^{(k)})^T \nabla f(x_i^{(k)})\|^2 \le \mu_i.$

Proof. First we prove, by contradiction, that there exists an integer $\bar k > 0$ such that for all $k \ge \bar k$,

(3.36)    $\|c(x_i^{(k)})\| \le \tau_i^{(k)} \mu_i.$

If (3.36) does not hold for all $k$ sufficiently large, there exists a subsequence $\{k_s\}$ such that

$\|c(x_i^{(k_s)})\| > \tau_i^{(k_s)} \mu_i \ge \mu_i.$

Thus, it follows from (2.23), (2.26), and (2.27) that

$p_{\mu_i}(x_i^{(k_s+1)}) - p_{\mu_i}(x_i^{(k_s)}) \le p_{\mu_i}(x_i^{(k_s+)}) - p_{\mu_i}(x_i^{(k_s)}) \le -\eta(1-\eta)\beta_i^{(k_s)} \frac{\|c(x_i^{(k_s)})\|^2}{\mu_i} < -\eta(1-\eta)\beta_i^{(k_s)} \mu_i.$

Thus Lemma 3.1 implies that

$p_{\mu_i}(x_i^{(k_s+1)}) - p_{\mu_i}(x_i^{(k_s)}) \le -\eta(1-\eta)\bar\beta \mu_i,$

where $\bar\beta > 0$ is the step-length lower bound of Lemma 3.1, which contradicts the fact that $p_{\mu_i}(x)$ is bounded below for any fixed $\mu_i > 0$.

Since $\mu_i$ is not further decreased by Algorithm 2.1, by assumption, and (3.36) holds for all $k \ge \bar k$, it must be that

(3.37)    $\|(Z_i^{(k)})^T \nabla f(x_i^{(k)})\| > \mu_i^{1/2}$

for all $k \ge \bar k$. Inequality (2.14) and the positive definiteness of $B_i^{(k)}$ yield that

$p_{\mu_i}(x_i^{(k+1)}) - p_{\mu_i}(x_i^{(k)}) \le \sigma [Z(x_i^{(k)})^T \nabla p_{\mu_i}(x_i^{(k)})]^T (\alpha_i^{(k)} d_h^{(k)}) \le -\sigma \alpha_i^{(k)} (d_h^{(k)})^T B_i^{(k)} d_h^{(k)} < 0.$

Since $p_{\mu_i}(x)$ is bounded below, it follows that

$\sum_{k=0}^{\infty} \big|[(Z_i^{(k)})^T \nabla f(x_i^{(k)})]^T (\alpha_i^{(k)} d_h^{(k)})\big| < +\infty.$

Therefore, $\lim_{k\to\infty} |[(Z_i^{(k)})^T \nabla f(x_i^{(k)})]^T (\alpha_i^{(k)} d_h^{(k)})| = 0$. Combining this with (3.37), we get $\lim_{k\to\infty} \|\alpha_i^{(k)} d_h^{(k)}\| = 0$. Inequality (2.15) implies that

(3.38)    $-[Z(x_i^{(k)})^T \nabla f(x_i^{(k)})]^T (\alpha_i^{(k)} d_h^{(k)}) \le \frac{[\nabla_h p_{\mu_i}(u(\alpha_i^{(k)} d_h^{(k)})) - \nabla_h p_{\mu_i}(u(0))]^T [\alpha_i^{(k)} d_h^{(k)}]}{1-\omega} \le \frac{\|\nabla_h p_{\mu_i}(u(\alpha_i^{(k)} d_h^{(k)})) - \nabla_h p_{\mu_i}(u(0))\|\,\|\alpha_i^{(k)} d_h^{(k)}\|}{1-\omega}.$

It follows from (3.37), (3.38), and the uniform continuity assumption that, as $k \to \infty$,

(3.39)    $\frac{\big|[Z(x_i^{(k)})^T \nabla f(x_i^{(k)})]^T (\alpha_i^{(k)} d_h^{(k)})\big|}{\|Z(x_i^{(k)})^T \nabla f(x_i^{(k)})\|\,\|\alpha_i^{(k)} d_h^{(k)}\|} \le \frac{\|\nabla_h p_{\mu_i}(u(\alpha_i^{(k)} d_h^{(k)})) - \nabla_h p_{\mu_i}(u(0))\|}{\mu_i^{1/2}(1-\omega)} \to 0.$

On the other hand, since $Z(x_i^{(k)})^T \nabla f(x_i^{(k)}) = -B_i^{(k)} d_h^{(k)}$, it follows from (3.34) that for every $k$,

$\frac{-[Z(x_i^{(k)})^T \nabla f(x_i^{(k)})]^T (\alpha_i^{(k)} d_h^{(k)})}{\|(Z_i^{(k)})^T \nabla f(x_i^{(k)})\|\,\|\alpha_i^{(k)} d_h^{(k)}\|} = \frac{(d_h^{(k)})^T B_i^{(k)} d_h^{(k)}}{\|B_i^{(k)} d_h^{(k)}\|\,\|d_h^{(k)}\|} \ge \frac{1}{M} > 0.$

This inequality contradicts (3.39).

It clearly follows from Lemma 3.2 that Algorithm 2.1 generates an infinite sequence of finite sequences:

$\{x_i^{(k)}\} = \{x_1^{(0)}, x_1^{(1)}, \ldots, x_1^{(k_1-1)}, x_1^{(k_1)} = x_2^{(0)}, x_2^{(1)}, \ldots, x_2^{(k_2-1)}, x_2^{(k_2)} = x_3^{(0)}, \ldots\}.$

Lemma 3.3. Let Assumption 3.1 hold and assume that there exists a positive constant $M$ such that (3.34) is valid. Then

(3.40)    $\lim_{i\to\infty} \big[\|Z(x_i^{(k_i)})^T \nabla f(x_i^{(k_i)})\| + \|c(x_i^{(k_i)})\|\big] = 0.$

Proof. By Lemma 3.2, Algorithm 2.1 generates an infinite sequence of iterates satisfying (2.21) for values of $\mu = \mu_i$ converging to zero. But by our assumptions $\{\tau_i^{(k)}\}$ is bounded: the result follows.

Before we show that all limit points are stationary points, we establish a required boundedness result.

Lemma 3.4. Suppose that the assumptions in Lemma 3.3 hold. Then

(3.41)    $\sum_{i=1}^{\infty} \sum_{k=0}^{k_i-1} \big[p_{\mu_i}(x_i^{(k)}) - p_{\mu_i}(x_i^{(k+1)})\big] < +\infty.$

Proof. It follows from Assumption 3.1 that there exists a constant $N_1 > 0$ such that for all integers $i > 0$ and $0 \le k \le k_i$, $\tau_i^{(k)} \le N_1$. Thus

$\|c(x_i^{(0)})\| \le \tau_{i-1}^{(k_{i-1})} \mu_{i-1} \le N_1 \mu_{i-1}.$

Notice that since $x_i^{(k_i)} = x_{i+1}^{(0)}$, it follows that

$\sum_{i=1}^{\infty} \sum_{k=0}^{k_i-1} \big[p_{\mu_i}(x_i^{(k)}) - p_{\mu_i}(x_i^{(k+1)})\big] = \sum_{i=1}^{\infty} \big[p_{\mu_i}(x_i^{(0)}) - p_{\mu_i}(x_{i+1}^{(0)})\big] \le \sum_{i=2}^{\infty} \big[p_{\mu_i}(x_i^{(0)}) - p_{\mu_{i-1}}(x_i^{(0)})\big] + N_2,$

where the inequality follows by telescoping the terms $p_{\mu_{i-1}}(x_i^{(0)}) - p_{\mu_i}(x_{i+1}^{(0)})$, and $N_2 = p_{\mu_1}(x_1^{(0)}) - \inf_x\{p_{\mu_1}(x)\}$ is a constant since $p_{\mu_1}(x)$ is bounded below. It follows from Algorithm 2.1 and (2.22) that

$p_{\mu_i}(x_i^{(0)}) - p_{\mu_{i-1}}(x_i^{(0)}) = \frac{1}{2}\Big[\frac{1}{\mu_i} - \frac{1}{\mu_{i-1}}\Big]\|c(x_i^{(0)})\|^2 \le \frac{1}{2\mu_i} N_1^2 \mu_{i-1}^2 \le \frac{1}{2} N_1^2 \mu_{i-1}^{4/5},$

using $\mu_i \ge \mu_{i-1}^{6/5}$. Therefore, (2.22) implies that

$\sum_{i=1}^{\infty} \sum_{k=0}^{k_i-1} \big[p_{\mu_i}(x_i^{(k)}) - p_{\mu_i}(x_i^{(k+1)})\big] \le N_1^2 \frac{\mu_1^{4/5}}{1 - \mu_1^{4/5}} + N_2.$

We can now prove that every limit point of the sequence $\{x_i^{(k)}\}$ is a stationary point of problem (1.1).

Theorem 3.1. Suppose that the conditions in Assumptions 3.1 and 3.2 are satisfied. Define the sequence $\{x_k\}$ to be the entire sequence, relabeled; i.e.,

$\{x_k\} = \{x_1^{(0)}, x_1^{(1)}, \ldots, x_1^{(k_1-1)}, x_1^{(k_1)} = x_2^{(0)}, x_2^{(1)}, \ldots\}.$

Then

(3.42)    $\lim_{k\to\infty} \|c(x_k)\| = 0$

and

(3.43)    $\lim_{k\to\infty} \|Z(x_k)^T \nabla f(x_k)\| = 0.$

Proof. To prove (3.42), we define

$\phi(x) = \begin{cases} \frac{1}{\mu_i}\|c(x)\|^2 - c(x)^T \lambda(x) & \text{if } \|c(x)\| > \tau_i^{(k)} \mu_i, \\ 0 & \text{otherwise}, \end{cases}$

where $\mu_i$ is the penalty parameter in force at the iterate in question, and set $\phi_k = \phi(x_k)$. Therefore $\phi_k \ge 0$. It is obvious from Lemma 3.1 and Lemma 2.2 that

$\eta\bar\beta \phi_i^{(k)} \le p_{\mu_i}(x_i^{(k)}) - p_{\mu_i}(x_i^{(k+)}) \le p_{\mu_i}(x_i^{(k)}) - p_{\mu_i}(x_i^{(k+1)}),$

which, with Lemma 3.4, implies that

$\sum_{k=k_0}^{\infty} \phi_k \le \frac{1}{\eta\bar\beta} \sum_{i=1}^{\infty} \sum_{k=0}^{k_i-1} \big[p_{\mu_i}(x_i^{(k)}) - p_{\mu_i}(x_i^{(k+1)})\big] < +\infty.$

Thus $\lim_{k\to\infty} \phi_k = 0$. Since $\mu_i \to 0$ and $\|\lambda_i^{(k)}\|$ is bounded, (3.42) follows from the definition of $\{\phi_k\}$.

To prove (3.43), note that from Lemma 3.4 it follows that for all $0 \le k \le k_i - 1$,

$p_{\mu_i}(x_i^{(k)}) - p_{\mu_i}(x_i^{(k+1)}) \to 0 \quad \text{as } i \to \infty.$

Similar to the proof of Lemma 3.2, it follows that for all $0 \le k \le k_i$,

$-[Z(x_i^{(k)})^T \nabla f(x_i^{(k)})]^T (\alpha_i^{(k)} d_h^{(k)}) \to 0 \quad \text{as } i \to \infty.$

Assume that (3.43) does not hold. Then there exist an $\epsilon > 0$ and a subsequence $k_i \in S$ such that $\|Z(x_{k_i})^T \nabla f(x_{k_i})\| \ge \epsilon$ for $k_i \in S$; then $\|\alpha_{k_i} h_{k_i}\| \to 0$ for $k_i \in S$. And since $y_{k_i} = \alpha_{k_i} B_{k_i+1} h_{k_i}$ (the secant condition), similar to the proof of Lemma 3.2 it follows that, for $k_i \in S$,

$\frac{-[Z(x_{k_i})^T \nabla f(x_{k_i})]^T (\alpha_{k_i} h_{k_i})}{\|Z(x_{k_i})^T \nabla f(x_{k_i})\|\,\|\alpha_{k_i} h_{k_i}\|} \le \frac{\|\nabla_h p(u(\alpha_{k_i} h_{k_i})) - \nabla_h p(u(0))\|}{\epsilon(1-\omega)} = \frac{\|y_{k_i}\|}{\epsilon(1-\omega)} \le \frac{M^{1/2}\|\alpha_{k_i} h_{k_i}\|}{\epsilon(1-\omega)} \to 0,$

which contradicts the lower bound of $1/M$ obtained as in the proof of Lemma 3.2.

Finally, if there are only a finite number of limit points of problem (1.1), then the sequence $\{x_i^{(k)}\}$ converges.

Theorem 3.2. Suppose that the conditions in Assumptions 3.1 and 3.2 are satisfied and that the sequence $\{x_k\}$ is the one described in Theorem 3.1. Moreover, suppose that every stationary point of (1.1) is isolated. Then

(3.44)    $\lim_{k\to\infty} x_k = x^*$

holds, where $x^*$ is a stationary point of (1.1).

Proof. Since $\{x_k\}$ is bounded, there exists a subsequence $\{x_{k_j}\}$ such that $\lim_{j\to\infty} x_{k_j} = x^*$, where $x^* \in D$ is an accumulation point of $\{x_k\}$. But by Lemma 3.5, (3.28) holds at $x^*$; that is, $x^*$ is a stationary point of (1.1). Therefore, $x^*$ is an isolated accumulation point of $\{x_k\}$. Now we prove (3.44) by contradiction. Suppose $\{x_k\}$ does not converge. Since $x^*$ is an isolated accumulation point of $\{x_k\}$, there exist a subsequence $\{x_{k_j}\}$ of $\{x_k\}$ and an $\epsilon > 0$ such that $\|x_{k_j+1} - x_{k_j}\| \ge \epsilon$ (Lemma 4.10, [16]). But using Theorem 3.1 it follows from (3.41) that $\lim_{k\to\infty} \|h_k\| = 0$ and $\lim_{k\to\infty} \|v_k\| = 0$. Hence $\lim_{k\to\infty} \|x_{k+1} - x_k\| = 0$, a contradiction.

3.1. Remarks. In [4] Byrd and Nocedal propose algorithms based on reduced Hessian methods and prove that, for their algorithms,

(3.45)    $\lim_{k\to\infty} \big[\|Z(x_k)^T \nabla f(x_k)\| + \|c(x_k)\|\big] = 0$

under an assumption stronger than condition (3.34). In particular, they assume that there exists a constant $\gamma > 0$ such that

(3.46)    $\mathrm{eigmin}(Z_k^T \nabla^2 L(x, \lambda_k) Z_k) \ge \gamma$ for all $x$ in the line search segment.

Moreover, the algorithms in [4] cannot preserve the positive definiteness of $B_k$ without assumption (3.46). However, assumption (3.46) is rarely satisfied when $x_k$ is far away from the solution. Therefore, in contrast to Algorithm 2.1, the algorithms in [4] may fail when applied to general nonlinear functions.

4. Numerical Results. In this section we present results of numerical experiments illustrating the performance of Algorithm 2.1. The problem set consists of a number of nonlinear equality constrained problems selected from the CUTE collection [3] and two problems generated by the authors. All numerical experiments discussed in this section were performed in MATLAB Version 4.1 on a Sun 4/670 workstation.

    problem     n    m    nnz(A)  constraints
    BT6         5    2    5       nonlinear
    BT11        5    3    8       nonlinear
    DIPIGRI     7    4    19      nonlinear
    DTOC2       58   36   144     nonlinear
    DTOC4       29   18   65      nonlinear
    DTOC6       21   10   31      nonlinear
    GENHS28     300  298  894     linear
    HS100       7    4    19      nonlinear
    MWRIGHT     5    3    8       nonlinear
    ORTHREGA    517  256  1792    nonlinear
    ORTHREGC    505  250  1750    nonlinear
    ORTHREGD    203  100  500     nonlinear
    TEST1       200  160  dense   quadratic
    TEST2       200  160  dense   nonlinear

Table 1: Description of Problems

All test problems are briefly described in Table 1. Most problems in Table 1 (all except TEST1 and TEST2) are from the CUTE collection [3]. Problem TEST1 is minimization of a Rosenbrock function [9] with quadratic equality constraints, i.e.,

minimize $\sum_{i=1}^{n-1} \big[(1 - x_i)^2 + 100(x_{i+1} - x_i^2)^2\big]$
subject to $a_i^T x + 0.5\, x^T M_i x = 0, \quad i = 1, \ldots, m,$

where $a_i \in \mathbb{R}^n$
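A problem of the TEST1 form can be generated along the following lines. The data `a_i` and `M_i` below are random placeholders, since the authors' generator is not reproduced in this excerpt, and the dimensions are scaled down from the paper's $n = 200$, $m = 160$.

```python
import numpy as np

def rosenbrock(x):
    """Generalized Rosenbrock objective: sum of (1 - x_i)^2 + 100 (x_{i+1} - x_i^2)^2."""
    return np.sum((1.0 - x[:-1])**2 + 100.0 * (x[1:] - x[:-1]**2)**2)

def make_quadratic_constraints(n, m, rng):
    """Quadratic equality constraints a_i^T x + 0.5 x^T M_i x = 0 with random data."""
    a = [rng.standard_normal(n) for _ in range(m)]
    M = []
    for _ in range(m):
        S = rng.standard_normal((n, n))
        M.append(0.5 * (S + S.T))         # symmetrize each M_i
    def c(x):
        return np.array([a[i] @ x + 0.5 * x @ (M[i] @ x) for i in range(m)])
    return c

rng = np.random.default_rng(0)
n, m = 10, 4                              # small stand-in dimensions
c = make_quadratic_constraints(n, m, rng)
x0 = np.zeros(n)
print(rosenbrock(x0), c(x0))              # at x = 0 every constraint value is 0
```

Note that $x = 0$ is always feasible for constraints of this form, which makes it a convenient starting point for testing feasibility-tracking behavior.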