
Mathematical Programming 31 (1985) 269-285 North-Holland

ON THE SUPERLINEAR CONVERGENCE OF A TRUST REGION ALGORITHM FOR NONSMOOTH OPTIMIZATION

Y. YUAN

Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge CB3 9EW, England

Received 3 November 1983
Revised manuscript received 15 September 1984

It is proved that the second order correction trust region algorithm of Fletcher [5] ensures superlinear convergence if some mild conditions are satisfied.

Key words: Trust Region Algorithms, Nonsmooth Optimization, Superlinear Convergence.

1. Introduction

Fletcher [5] presents a trust region algorithm with a second order correction to solve the following composite optimization problem:

$$\min_x \phi(x) \equiv f(x) + h(c(x)), \tag{1.1}$$

where $f(x)$ from $\mathbb{R}^n$ to $\mathbb{R}$ and $c(x)$ from $\mathbb{R}^n$ to $\mathbb{R}^m$ are twice continuously differentiable functions and $h(c)$ from $\mathbb{R}^m$ to $\mathbb{R}$ is a polyhedral convex function of the form

$$h(c) = \max_i \, (h_i^T c + b_i), \tag{1.2}$$

$h_i$ and $b_i$ being given vectors and constants respectively. The algorithm is based on a model algorithm of Fletcher [3] and it is iterative. At the beginning of the $k$-th iteration, $x^{(k)}$, $\lambda^{(k)}$ and $\rho^{(k)}$ are available, where $x^{(k)}$ is an estimate of the solution of (1.1), $\lambda^{(k)}$ is an estimate of the Lagrangian multipliers at the solution and $\rho^{(k)}$ is a trust region bound. The algorithm requires the solution of the following subproblem:

$$\min_d \psi^{(k)}(d) \equiv q^{(k)}(d) + h(c(x^{(k)}) + A(x^{(k)}) d) \tag{1.3}$$

subject to

$$\|d\| \le \rho^{(k)}, \tag{1.4}$$

where

$$q^{(k)}(d) = f(x^{(k)}) + \nabla^T f(x^{(k)}) d + \tfrac{1}{2} d^T W^{(k)} d, \qquad W^{(k)} = \nabla^2 f(x^{(k)}) + \sum_{i=1}^{m} \lambda_i^{(k)} \nabla^2 c_i(x^{(k)}), \tag{1.5}$$


Y. Yuan / Trust region algorithms

$\|\cdot\|$ is any given norm and $A = \nabla^T c$ is the Jacobian of $c$. Let $d^{(k)}$ be a solution of (1.3) and (1.4), then

$$r^{(k)} = \frac{\phi(x^{(k)}) - \phi(x^{(k)} + d^{(k)})}{\psi^{(k)}(0) - \psi^{(k)}(d^{(k)})} \tag{1.6}$$

is calculated, which is the ratio between the actual reduction and the predicted reduction of the objective function. On some iterations the algorithm also solves the following 'second order correction' subproblem:

$$\min_d \hat\psi^{(k)}(d) = q^{(k)}(d^{(k)} + d) + h(c(x^{(k)} + d^{(k)}) + A(x^{(k)}) d) \tag{1.7}$$

subject to

$\|d^{(k)} + d\|$ [...] $0.9$ then $\rho^{(k+1)} := 4\rho^{(k)}$ else $\rho^{(k+1)} := 2\rho^{(k)}$; go to Step 9.


Step 8. $\rho^{(k+1)} := \rho^{(k)}$.

Step 9. Set $x^{(k+1)} := x^{(k)} + d^{(k)}$, generate $\lambda^{(k+1)}$. End of $k$-th iteration.

The value of $\alpha_k$ in Step 5 is an estimate of the number $\alpha^*$ which satisfies

$$\phi(x^{(k)} + \alpha^* d^{(k)}) = \min_{\ldots \le \alpha \le 0.5} \phi(x^{(k)} + \alpha d^{(k)}).$$

Fletcher [5] gives a specific choice of $\alpha_k$, which does not require any line searches but only depends on the value of $r^{(k)}$. Further, he lets $\lambda^{(k+1)}$ be $\lambda^{(k)}$ in Step 6 and $\lambda^{(k+1)}$ be the multipliers from either the subproblem (1.3)-(1.4) or the subproblem (1.7)-(1.8) in Step 9. Fletcher [5] shows that if $\{x^{(k)}\}$ ($k = 1, 2, \ldots$) are all in a bounded set in $\mathbb{R}^n$ then $\{x^{(k)}\}$ is not bounded away from the stationary points, where a stationary point $x^*$ means a point at which

$$f(x^*) + h(c(x^*)) \le f(x^*) + \nabla^T f(x^*) d + h(c(x^*) + \nabla^T c(x^*) d)$$

holds for all $d \in \mathbb{R}^n$. The condition that $\{x^{(k)}\}$ is bounded is usually satisfied, specifically if $x^{(1)}$ is so chosen that

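[...]

$$d^T W^* d > 0 \quad \text{for all } d \in G^*, \tag{1.11}$$

where

$$G^* = \Big\{ d : \max_{\lambda \in \partial h^*} d^T (g^* + (A^*)^T \lambda) = 0, \ d \ne 0 \Big\}, \tag{1.12}$$

$$W^* = \nabla^2 f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla^2 c_i(x^*).$$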

Under the above assumptions, Fletcher [5] proves that, if the trust region bound is inactive for all large $k$, then the algorithm converges quadratically. Yuan [16], however, gives examples of trust region methods that converge only linearly, so it is important to investigate the effect of the trust region bound when $k$ is large. Since our result is strongly dependent on the condition $\lambda^{(k)} \to \lambda^*$, and since we are not able to prove that the original choice of $\lambda^{(k)}$ in Fletcher [5] ensures this condition, throughout this paper we assume that the generation of $\lambda^{(k+1)}$ gives the limit

$$\lambda^{(k)} \to \lambda^*. \tag{1.13}$$

Many suitable methods for estimating Lagrangian multipliers are known, for example Murray and Overton [10, 11]. The condition (1.13) admits many estimation techniques, such as

$$\lambda^{(k+1)} = \arg\min \, \| g^{(k)} + (A^{(k)})^T \lambda \|. \tag{1.14}$$

Some lemmas are stated and the main theorem is proved in the following section. The proofs of the lemmas are given in the Appendix.
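As an illustration of the quantities defined in (1.1)-(1.6), the following is a minimal Python sketch. It is not taken from Fletcher [5] or from this paper: the function names (`polyhedral_h`, `phi`, `psi_model`, `reduction_ratio`, `toy_example`), the one-dimensional toy problem $\phi(x) = |x^2 - 1|$, the multiplier estimate $\lambda = 1$ and the trial step $d = -0.75$ are all assumptions chosen only to make the definitions concrete. The sketch evaluates the model and the ratio for a given step; it does not solve the subproblem (1.3)-(1.4).

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product of two equally long vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def polyhedral_h(c: Sequence[float], H: List[List[float]], b: List[float]) -> float:
    """h(c) = max_i (h_i^T c + b_i), the polyhedral convex function (1.2)."""
    return max(dot(h_i, c) + b_i for h_i, b_i in zip(H, b))


def phi(x, f, c, H, b) -> float:
    """Composite objective phi(x) = f(x) + h(c(x)) of (1.1)."""
    return f(x) + polyhedral_h(c(x), H, b)


def psi_model(d, fx, g, W, cx, A, H, b) -> float:
    """Model psi(d) = q(d) + h(c + A d) of (1.3), with q as in (1.5)."""
    Wd = [dot(row, d) for row in W]
    q = fx + dot(g, d) + 0.5 * dot(d, Wd)
    linearized_c = [ci + dot(a_row, d) for ci, a_row in zip(cx, A)]
    return q + polyhedral_h(linearized_c, H, b)


def reduction_ratio(phi_old, phi_new, psi_zero, psi_d) -> float:
    """r of (1.6): actual reduction divided by predicted reduction."""
    return (phi_old - phi_new) / (psi_zero - psi_d)


def toy_example() -> float:
    """Hypothetical 1-D problem: f = 0, c(x) = x^2 - 1, h(c) = |c|."""
    H, b = [[1.0], [-1.0]], [0.0, 0.0]   # |c| = max(c, -c)
    f = lambda x: 0.0
    c = lambda x: [x[0] ** 2 - 1.0]
    x = [2.0]
    g = [0.0]                            # gradient of f at x
    A = [[2.0 * x[0]]]                   # Jacobian of c at x
    lam = 1.0                            # assumed multiplier estimate
    W = [[lam * 2.0]]                    # Hessian of f (zero) + lam * Hessian of c
    d = [-0.75]                          # trial step, assumed given
    x_new = [x[0] + d[0]]
    return reduction_ratio(
        phi(x, f, c, H, b),
        phi(x_new, f, c, H, b),
        psi_model([0.0], f(x), g, W, c(x), A, H, b),
        psi_model(d, f(x), g, W, c(x), A, H, b),
    )


if __name__ == "__main__":
    print(toy_example())
```

For this toy problem $c$ is quadratic and the assumed multiplier matches the active piece of $h$, so the model reproduces the objective exactly along the step and the ratio evaluates to 1. In general the ratio only approaches 1 as the model becomes accurate.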

2. The result

In this section, we demonstrate superlinear convergence of the algorithm by showing that the trust region bound is eventually inactive. To make the proof of the main theorem straightforward, we give some lemmas without proofs, and only prove the main theorem. The proofs of the lemmas can be found in the Appendix. The following Lemma 2.1 is due to Fletcher [5], Lemma 2.2 is a generalization of Lemma 4.3 of Yuan [15], and Lemma 2.3 is the main result which indicates that the trust region bound is bounded away from zero.

Lemma 2.1 (Fletcher [5]). There exists a positive constant $c_1$ and a neighbourhood of $x^*$, such that for all $x$ in the neighbourhood the inequality

$$\phi(x) - \phi(x^*) \ge c_1 \|x - x^*\|^2 \tag{2.1}$$

holds.

Lemma 2.2. For any given $a_0 \in (0, 1)$, there exists a neighbourhood of $x^*$ such that, for all $x$ in the neighbourhood,

$$\phi(x) - \min_{\|d\| \le \|x - x^*\|} \big[ f(x) + g(x)^T d + h(c(x) + A(x) d) \big] \ \ldots \tag{2.2}$$

holds.

Lemma 2.3. Let $d^{(k)}$ solve (1.3)-(1.4) and $\hat d^{(k)}$ solve (1.7)-(1.8), then

(1) $\|d^{(k)}\| = O(\|x^{(k)} - x^*\|)$;

(2) $\|\hat d^{(k)}\| = O(\|x^{(k)} - x^*\|)$;

(3) $\Delta\psi^{(k)} \ge c_2 \|d^{(k)}\|^2$ for some positive constant $c_2$;

(4) $r_e^{(k)} \to 1$ as $k \to +\infty$;


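(5) For any subsequence $\{k_j\}$, if $r^{(k_j)}$ [...]

$$
\begin{aligned}
\ldots \ & + o(\|d^{(k_j)}\|^2) \\
&= \phi(x^*) + f(x^{(k_j)} + d^{(k_j)}) - f(x^*) + \sum_{i=1}^{m} \lambda_i^* \big[ c_i(x^{(k_j)} + d^{(k_j)}) - c_i(x^*) \big] + o(\|d^{(k_j)}\|^2) \\
&= \phi(x^*) + \tfrac{1}{2} (x^{(k_j)} + d^{(k_j)} - x^*)^T W^* (x^{(k_j)} + d^{(k_j)} - x^*) + o(\|d^{(k_j)}\|^2) \\
&= \phi(x^*) + \tfrac{1}{2} d'^T W^* d' \, \|d^{(k_j)}\|^2 + o(\|d^{(k_j)}\|^2)
\end{aligned}
\tag{A.29}
$$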

for all sufficiently large $k$. (A.29) contradicts (A.21) (using (A.19)) since $d'^T W^* d' > 0$, which proves that (1) is true.


The proof of (2) is similar to that of (1). Assuming that (2) is not true, there exist $k_j$ ($j = 1, 2, \ldots$) such that

$$\|x^{(k_j)} - x^*\| = o(\|\hat d^{(k_j)}\|), \tag{A.30}$$

and consequently, from (1),

$$\|d^{(k_j)}\| = o(\|\hat d^{(k_j)}\|). \tag{A.31}$$

Recalling $\|d^{(k_j)} + \hat d^{(k_j)}\|$ [...]

$$\ldots \ \phi(x^*) + \tfrac{1}{2} (d')^T W^* d' \, \|\hat d^{(k_j)}\|^2 + o(\|\hat d^{(k_j)}\|^2), \tag{A.38}$$

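for all large $j$, which contradicts (A.32) (using (A.30)). Therefore (2) is true.

We now prove (3). By (1), there exists $M_1 \ge 1$ such that $\|d^{(k)}\|$ [...]

$$\ldots \ \tfrac{1}{2} \min\big[ \Delta_\sigma^{(k)}, \ (\Delta_\sigma^{(k)})^2 / (M_0 \sigma^2) \big] \tag{A.40}$$

for all $\sigma > 0$, where $M_0$ is an upper bound on $\{ |d^T W^{(k)} d| : \|d\| = 1 \}$, and where

$$
\begin{aligned}
\Delta_\sigma^{(k)} &= \psi^{(k)}(0) - \min_{\|d\| \le \sigma} \big[ \psi^{(k)}(d) - \tfrac{1}{2} d^T W^{(k)} d \big] \\
&= \psi^{(k)}(0) - \min_{\|d\| \le \sigma} \big[ f(x^{(k)}) + g^{(k)T} d + h(c^{(k)} + A^{(k)} d) \big].
\end{aligned}
\tag{A.41}
$$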

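Since for a general convex function $F(\cdot)$,

$$\max_{\|d\| \le \alpha} [F(x) - F(x + d)] \ge \min\Big[ 1, \frac{\alpha}{\beta} \Big] \max_{\|d\| \le \beta} [F(x) - F(x + d)]$$

holds for any $\alpha, \beta > 0$, the convexity of $\psi^{(k)}(d) - \tfrac{1}{2} d^T W^{(k)} d$ gives

$$\Delta^{(k)}_{\|d^{(k)}\|} \ge \min\big[ 1, \ \|d^{(k)}\| / \|x^{(k)} - x^*\| \big] \, \Delta^{(k)}_{\|x^{(k)} - x^*\|},$$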

consequently Lemma 2.1, Lemma 2.2 (using $a_0 = \tfrac{1}{2}$), (A.39), and the fact that $M_1 \ge 1$ give the following inequality, which holds for all large $k$,

$$\Delta^{(k)}_{\|d^{(k)}\|} \ge \frac{c_1}{2 M_1} \|x^{(k)} - x^*\| \, \|d^{(k)}\|. \tag{A.42}$$
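The inequality for a general convex function used in this step can be verified in a few lines. The following sketch is added for the reader and is not part of the original proof. It assumes that $F$ is a finite convex function on $\mathbb{R}^n$; the symbols $d_\beta$ and $t$ are introduced here only for this argument.

```latex
Let $d_\beta$ attain $\max_{\|d\| \le \beta} [F(x) - F(x + d)]$. This maximum is
nonnegative because $d = 0$ is feasible.

Case $\alpha \ge \beta$: the ball $\{\|d\| \le \alpha\}$ contains
$\{\|d\| \le \beta\}$ and $\min[1, \alpha/\beta] = 1$, so the inequality is immediate.

Case $\alpha < \beta$: set $t = \alpha/\beta \in (0, 1)$. Then
$\|t d_\beta\| \le \alpha$, and convexity gives
\[
  F(x + t d_\beta) \le (1 - t) F(x) + t F(x + d_\beta).
\]
Hence
\[
  \max_{\|d\| \le \alpha} [F(x) - F(x + d)]
  \ge F(x) - F(x + t d_\beta)
  \ge t \, [F(x) - F(x + d_\beta)]
  = \frac{\alpha}{\beta} \max_{\|d\| \le \beta} [F(x) - F(x + d)].
\]
```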

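It follows from (A.40) and (A.42) that there exists $M_2 > 0$ such that

$$\psi^{(k)}(0) - \psi^{(k)}(d^{(k)}) \ge \frac{c_1}{4 M_1} \min\Big[ 1, \ \frac{c_1 \|x^{(k)} - x^*\|}{2 M_0 M_1 \|d^{(k)}\|} \Big] \|x^{(k)} - x^*\| \, \|d^{(k)}\| \tag{A.43}$$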

for all $k \ge M_2$. Thus (3) follows from (A.43) and (A.39).

We now prove (4). For all sufficiently large $k$, since

$$c(x^{(k)} + d^{(k)}) - c(x^{(k)}) - A^{(k)} d^{(k)} = O(\|d^{(k)}\|^2), \tag{A.44}$$

and since $\mathrm{Rank}(A^{(k)}) = m$ for all large $k$, there exists $\bar d^{(k)}$ such that

$$c(x^{(k)} + d^{(k)}) - c(x^{(k)}) - A^{(k)} d^{(k)} = A^{(k)} \bar d^{(k)} \tag{A.45}$$

and

$$\|\bar d^{(k)}\| = O(\|d^{(k)}\|^2). \tag{A.46}$$

Define

$$\beta_k = \max\big[ \|d^{(k)}\|, \ \|d^{(k)} + \hat d^{(k)}\| \big], \tag{A.47}$$


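then by the definition (1.9) of $r_e^{(k)}$, it follows that

$$
\begin{aligned}
r_e^{(k)} &= \frac{1}{\Delta\psi^{(k)}} \Big\{ \phi(x^{(k)}) - \phi(x^{(k)} + d^{(k)}) + q^{(k)}(d^{(k)}) + h(c(x^{(k)} + d^{(k)})) \\
&\qquad\qquad - \min_{\|d^{(k)} + d\| \le \beta_k} \big[ q^{(k)}(d^{(k)} + d) + h(c(x^{(k)} + d^{(k)}) + A^{(k)} d) \big] \Big\} \\
&= \frac{1}{\Delta\psi^{(k)}} \Big\{ \phi(x^{(k)}) + q^{(k)}(d^{(k)}) - f(x^{(k)} + d^{(k)}) \\
&\qquad\qquad - \min_{\|d^{(k)} + d\| \le \beta_k} \big[ q^{(k)}(d^{(k)} + d) + h(c^{(k)} + A^{(k)}(d + d^{(k)} + \bar d^{(k)})) \big] \Big\} \\
&= \frac{1}{\Delta\psi^{(k)}} \Big[ \phi(x^{(k)}) + q^{(k)}(d^{(k)}) - f(x^{(k)} + d^{(k)}) + g^{(k)T} \bar d^{(k)} - \min_{\|d - \bar d^{(k)}\| \le \beta_k} \psi^{(k)}(d) + o(\|d^{(k)}\|^2) \Big].
\end{aligned}
\tag{A.48}
$$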

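By the definition (1.5) of $q^{(k)}(\cdot)$ we have that

$$q^{(k)}(d^{(k)}) - f(x^{(k)} + d^{(k)}) = \tfrac{1}{2} d^{(k)T} \sum_{i=1}^{m} \lambda_i^{(k)} \nabla^2 c_i^{(k)} d^{(k)} + o(\|d^{(k)}\|^2), \tag{A.49}$$

and from (1.10), (1.13), (A.45) and (A.46), it follows that

$$
\begin{aligned}
g^{(k)T} \bar d^{(k)} &= - \sum_{i=1}^{m} \lambda_i^{(k)} \nabla^T c_i^{(k)} \bar d^{(k)} + o(\|d^{(k)}\|^2) \\
&= - \sum_{i=1}^{m} \lambda_i^{(k)} \big[ c_i(x^{(k)} + d^{(k)}) - c_i(x^{(k)}) - \nabla^T c_i^{(k)} d^{(k)} \big] + o(\|d^{(k)}\|^2) \\
&= - \tfrac{1}{2} d^{(k)T} \sum_{i=1}^{m} \lambda_i^{(k)} \nabla^2 c_i^{(k)} d^{(k)} + o(\|d^{(k)}\|^2).
\end{aligned}
\tag{A.50}
$$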

Hence from (A.47)-(A.50) and (3), noticing /3k 1 ~,';~(o) -,P'-"(d `j') + o( II d'J' II -')

/> ,~'-~'(-a'"' - ,~"') - ~(-,?~')+ o(ll a*" I12).

(A.68)

Thus,

~u,(~(ii)