Steepest descent algorithm on orthogonal Stiefel manifolds

arXiv:1709.06295v1 [math.OC] 19 Sep 2017

Petre Birtea, Ioan Caşu and Dan Comănescu
Department of Mathematics, West University of Timişoara
Bd. V. Pârvan, No 4, 300223 Timişoara, România
[email protected], [email protected], [email protected]

Abstract

Considering orthogonal Stiefel manifolds as constraint manifolds, we give an explicit description of a set of local coordinates that also generate a basis for the tangent space at any point of the orthogonal Stiefel manifolds. We show how this construction depends on the choice of a full rank submatrix. Embedding a gradient vector field on an orthogonal Stiefel manifold into the ambient space, we give explicit necessary and sufficient conditions for a critical point of a cost function defined on such manifolds. We explicitly describe the steepest descent algorithm on the orthogonal Stiefel manifold using the ambient coordinates, not the local coordinates of the manifold. We point out the dependence of the recurrence sequence that defines the algorithm on the choice of a full rank submatrix. We illustrate the algorithm in the case of Brockett cost functions.

MSC: 53Bxx, 65Kxx, 90Cxx

Keywords: Steepest descent algorithm, Optimization, Constraint manifold, Orthogonal Stiefel manifold, Brockett cost function.

1 Introduction

In Section 2 we construct an atlas for the orthogonal Stiefel manifolds $St^n_p = \{U \in M_{n\times p}(\mathbb{R}) \mid U^TU = I_p\}$ following an idea from [14]. The local charts that we introduce depend crucially on the choice of a full rank submatrix of the elements of the orthogonal Stiefel manifold. More precisely, for $U \in St^n_p$, if $\mathcal{I}_p$ is the set of row indexes that form a full rank submatrix of the matrix $U$, then we define the vector subspace
$$\mathfrak{W}_{\mathcal{I}_p} = \{\Omega = [\omega_{ij}] \in \mathfrak{Skew}_{n\times n}(\mathbb{R}) \mid \omega_{ij} = 0 \text{ for all } i \notin \mathcal{I}_p \text{ and } j \notin \mathcal{I}_p\}.$$
The local charts are defined by $\varphi_U : \mathfrak{W}_{\mathcal{I}_p} \to St^n_p$, $\varphi_U(\Omega) := C(\Omega)U$, where $C(\Omega) = (I_n+\Omega)(I_n-\Omega)^{-1}$ is the Cayley transform. These local charts provide us with a basis for the tangent spaces to the orthogonal Stiefel manifolds.

In Section 3 we present necessary and sufficient conditions for a critical point of a cost function defined on an orthogonal Stiefel manifold using the embedded vector field method [5], [4], [6]. We describe necessary and sufficient conditions for critical points in the case of Procrustes and Penrose regression cost functions, sums of heterogeneous quadratic forms, and Brockett cost functions. We also compare our findings with existing results in the literature [25], [11], [8].

In the last section we give an explicit description of the steepest descent algorithm taking into account the specific geometry of the orthogonal Stiefel manifold. On a general Riemannian manifold $(S, \mathbf{g}_S)$ the iterative scheme of the steepest descent algorithm is given by
$$x_{k+1} = \mathcal{R}_{x_k}(-\lambda_k \nabla_{\mathbf{g}_S}\widetilde{G}(x_k)),$$
where $\widetilde{G} : S \to \mathbb{R}$ is the cost function that we want to minimize, $\mathcal{R} : TS \to S$ is a smooth retraction, and $\lambda_k \in \mathbb{R}$ is a chosen step length. For the case of orthogonal Stiefel manifolds, we write the vector $\nabla_{\mathbf{g}_S}\widetilde{G}(x_k)$ as a vector in the ambient space $T_{x_k}M$ using the embedded gradient vector field $\partial G(x_k)$ (see [4] and [6]), i.e., $\nabla_{\mathbf{g}_S}\widetilde{G}(x_k) = \partial G(x_k)$. The explicit description of the vector $\partial G(U_k)$ on an orthogonal Stiefel manifold depends on the chosen basis for $T_{U_k}St^n_p$, which in turn depends on the chosen full rank submatrix of $U_k$. In order to write the vector $-\lambda_k \partial G(U_k)$ as a tangent vector in $T_{U_k}St^n_p$ we have to solve the matrix equation $-\lambda_k \partial G(U_k) = \Omega_k U_k$ for the unknown skew-symmetric matrix $\Omega_k \in \mathfrak{W}_{\mathcal{I}_p(U_k)}$. Once we have solved for $\Omega_k$, we construct the next term of the iterative sequence as $U_{k+1} = \left(I_n + \frac{1}{2}\Omega_k\right)\left(I_n - \frac{1}{2}\Omega_k\right)^{-1}U_k$. Moreover, using an appropriate permutation matrix at each step of the algorithm we give an explicit, elegant solution of the matrix equation $-\lambda_k \partial G(U_k) = \Omega_k U_k$, which makes the steepest descent algorithm easier to implement. We exemplify the form of the steepest descent algorithm that we have constructed on orthogonal Stiefel manifolds for the case of two Brockett cost functions.

Another method to construct numerical algorithms in the presence of the orthogonality constraints of the Stiefel manifolds is presented in [17], where the authors use the so-called Alternating Direction Method of Multipliers (ADMM); see [9] and [26] for a general description. ADMM is a variant of the Augmented Lagrangian Method of Multipliers introduced in [16]; see also [15] for a historical presentation of the method. A deep convergence result for the extension of ADMM to multi-block convex minimization problems is proved in [10].

2 Local charts on the orthogonal Stiefel manifolds

In this section we will construct a local chart around every point $U \in St^n_p$ and a basis for the tangent space $T_U St^n_p$. We will follow the idea presented in [14], where the authors have constructed a local chart around points close to $\begin{bmatrix} I_p & O_{(n-p)\times p}\end{bmatrix}^T \in St^n_p$. This corresponds to the particular situation when the full rank submatrix of the point $U \in St^n_p$ is formed with the first $p$ rows. For a general $U \in St^n_p$ a modification of the construction presented in [14] is necessary.

Let $U \in St^n_p$ and $1 \le i_1 < ... < i_p \le n$ be the indexes of the rows that form a full rank submatrix $\overline{U}$ of $U$. We denote $\mathcal{I}_p = \{i_1, ..., i_p\}$. Let $\mathfrak{Skew}_{n\times n}(\mathbb{R})$ be the $\frac{n(n-1)}{2}$-dimensional vector space of the real skew-symmetric $n\times n$ matrices. We introduce the following $\left(np - \frac{p(p+1)}{2}\right)$-dimensional subspace of $\mathfrak{Skew}_{n\times n}(\mathbb{R})$:
$$\mathfrak{W}_{\mathcal{I}_p} := \Big\{\Omega \in \mathfrak{Skew}_{n\times n}(\mathbb{R}) \,\Big|\, \Omega = \sum_{i<j;\, i,j\in\mathcal{I}_p} \omega_{ij}(e_i\otimes e_j - e_j\otimes e_i) + \sum_{i\in\mathcal{I}_p;\, j\notin\mathcal{I}_p} \omega_{ij}(e_i\otimes e_j - e_j\otimes e_i)\Big\},$$
where the vectors $e_1, ..., e_n$ form the canonical basis of the Euclidean space $\mathbb{R}^n$. The $n\times n$ matrix $e_i\otimes e_j$ has $1$ on the $i$-th row and $j$-th column and $0$ in all remaining positions. An equivalent description of the subspace $\mathfrak{W}_{\mathcal{I}_p}$ is given by
$$\mathfrak{W}_{\mathcal{I}_p} = \{\Omega = [\omega_{ij}] \in \mathfrak{Skew}_{n\times n}(\mathbb{R}) \mid \omega_{ij} = 0 \text{ for all } i \notin \mathcal{I}_p \text{ and } j \notin \mathcal{I}_p\}.$$
Around the point $U \in St^n_p$ chosen above we construct the local chart
$$\varphi_U : \mathfrak{W}_{\mathcal{I}_p} \to St^n_p, \quad \varphi_U(\Omega) := C(\Omega)U, \tag{2.1}$$
where $C(\Omega) = (I_n+\Omega)(I_n-\Omega)^{-1}$ is the Cayley transform. We notice that $\varphi_U$ is a smooth map with $\varphi_U(0) = U$. In order to prove that $\varphi_U$ is a local chart it is sufficient to prove that $\varphi_U$ is locally injective around $0 \in \mathfrak{W}_{\mathcal{I}_p}$, which in turn is implied by the injectivity of the linear map $d\varphi_U(0)$. The latter condition is equivalent to the vectors $\frac{\partial \varphi_U}{\partial \omega_{ij}}(0)$ being linearly independent.

In what follows we use the notation
$$\Lambda_{ij} := e_i\otimes e_j - e_j\otimes e_i \in M_{n\times n}(\mathbb{R}), \quad \text{for any } 1 \le i,j \le n,\ i \ne j.$$
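To make the construction concrete, the following numpy sketch (our own illustration, not part of the paper) instantiates the chart (2.1): it assembles a random element of $\mathfrak{W}_{\mathcal{I}_p}$, applies the Cayley transform, and checks that $\varphi_U(\Omega)$ lands again on $St^n_p$. The helper `full_rank_rows` is a naive brute-force choice of the index set $\mathcal{I}_p$, adequate only for small sizes.

```python
import itertools
import numpy as np

def full_rank_rows(U, tol=1e-10):
    """Return p row indexes of U forming an invertible p x p submatrix
    (brute force over combinations; fine for the small sizes of this sketch)."""
    n, p = U.shape
    for rows in itertools.combinations(range(n), p):
        if abs(np.linalg.det(U[list(rows), :])) > tol:
            return list(rows)
    raise ValueError("U has rank < p")

def cayley(Omega):
    """Cayley transform C(Omega) = (I + Omega)(I - Omega)^{-1}."""
    I = np.eye(Omega.shape[0])
    return (I + Omega) @ np.linalg.inv(I - Omega)

rng = np.random.default_rng(0)
n, p = 5, 2
U, _ = np.linalg.qr(rng.standard_normal((n, p)))    # a point on St^n_p
Ip = full_rank_rows(U)

# A random element of W_{I_p}: skew-symmetric, zero when both indexes are outside I_p.
Omega = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        if i in Ip or j in Ip:
            Omega[i, j] = rng.standard_normal()
            Omega[j, i] = -Omega[i, j]

V = cayley(Omega) @ U                               # the chart value phi_U(Omega)
print(np.allclose(V.T @ V, np.eye(p)))              # True: V is again on St^n_p
```

Note that $C(\Omega)$ is orthogonal for every skew-symmetric $\Omega$, so the orthonormality check would hold without restricting to $\mathfrak{W}_{\mathcal{I}_p}$; the restriction is what makes $\varphi_U$ injective near $0$, as proved next.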

An easy computation shows that (see [21]∗)
$$\begin{aligned}
\frac{\partial \varphi_U}{\partial \omega_{ij}}(\Omega) &= \left(\Lambda_{ij}(I_n-\Omega)^{-1} + (I_n+\Omega)(I_n-\Omega)^{-1}\Lambda_{ij}(I_n-\Omega)^{-1}\right)U \\
&= \left(I_n + (I_n+\Omega)(I_n-\Omega)^{-1}\right)\Lambda_{ij}(I_n-\Omega)^{-1}U \\
&= \left((I_n-\Omega)(I_n-\Omega)^{-1} + (I_n+\Omega)(I_n-\Omega)^{-1}\right)\Lambda_{ij}(I_n-\Omega)^{-1}U \\
&= 2(I_n-\Omega)^{-1}\Lambda_{ij}(I_n-\Omega)^{-1}U.
\end{aligned}$$
Consequently, we have
$$\frac{\partial \varphi_U}{\partial \omega_{ij}}(0) = 2\Lambda_{ij}U.$$
For proving the linear independence of the vectors $\frac{\partial \varphi_U}{\partial \omega_{ij}}(0)$, we consider the equation
$$\sum_{i<j;\, i,j\in\mathcal{I}_p} \alpha_{ij}\frac{\partial \varphi_U}{\partial \omega_{ij}}(0) + \sum_{r\in\mathcal{I}_p;\, s\notin\mathcal{I}_p} \beta_{rs}\frac{\partial \varphi_U}{\partial \omega_{rs}}(0) = O_{n\times p},$$
which is equivalent with†
$$O_{n\times p} = \sum_{i<j;\, i,j\in\mathcal{I}_p} \alpha_{ij}\Lambda_{ij}U + \sum_{r\in\mathcal{I}_p;\, s\notin\mathcal{I}_p} \beta_{rs}\Lambda_{rs}U.$$
Writing $U = \sum_{k\notin\mathcal{I}_p;\, b\in\{1,...,p\}} u_{kb}\, e_k\otimes f_b + \sum_{q\in\mathcal{I}_p;\, a\in\{1,...,p\}} u_{qa}\, e_q\otimes f_a$, the right hand side becomes
$$\begin{aligned}
&\sum_{i<j;\, i,j\in\mathcal{I}_p} \alpha_{ij}(e_i\otimes e_j - e_j\otimes e_i)\Big(\sum_{\substack{k\notin\mathcal{I}_p \\ b\in\{1,...,p\}}} u_{kb}\, e_k\otimes f_b + \sum_{\substack{q\in\mathcal{I}_p \\ a\in\{1,...,p\}}} u_{qa}\, e_q\otimes f_a\Big) \\
&\quad + \sum_{r\in\mathcal{I}_p;\, s\notin\mathcal{I}_p} \beta_{rs}(e_r\otimes e_s - e_s\otimes e_r)\Big(\sum_{\substack{k\notin\mathcal{I}_p \\ b\in\{1,...,p\}}} u_{kb}\, e_k\otimes f_b + \sum_{\substack{q\in\mathcal{I}_p \\ a\in\{1,...,p\}}} u_{qa}\, e_q\otimes f_a\Big) \\
&= \sum_{\substack{i<j;\, i,j\in\mathcal{I}_p \\ k\notin\mathcal{I}_p;\, b\in\{1,...,p\}}} \alpha_{ij}u_{kb}(\delta_{jk}\, e_i\otimes f_b - \delta_{ik}\, e_j\otimes f_b) + \sum_{\substack{i<j;\, i,j\in\mathcal{I}_p \\ q\in\mathcal{I}_p;\, a\in\{1,...,p\}}} \alpha_{ij}u_{qa}(\delta_{jq}\, e_i\otimes f_a - \delta_{iq}\, e_j\otimes f_a) \\
&\quad + \sum_{\substack{r\in\mathcal{I}_p;\, s\notin\mathcal{I}_p \\ k\notin\mathcal{I}_p;\, b\in\{1,...,p\}}} \beta_{rs}u_{kb}(\delta_{sk}\, e_r\otimes f_b - \delta_{rk}\, e_s\otimes f_b) + \sum_{\substack{r\in\mathcal{I}_p;\, s\notin\mathcal{I}_p \\ q\in\mathcal{I}_p;\, a\in\{1,...,p\}}} \beta_{rs}u_{qa}(\delta_{sq}\, e_r\otimes f_a - \delta_{rq}\, e_s\otimes f_a) \\
&= \sum_{\substack{i<j;\, i,j\in\mathcal{I}_p \\ a\in\{1,...,p\}}} \alpha_{ij}(u_{ja}\, e_i\otimes f_a - u_{ia}\, e_j\otimes f_a) + \sum_{\substack{r\in\mathcal{I}_p;\, s\notin\mathcal{I}_p \\ a\in\{1,...,p\}}} \beta_{rs}(u_{sa}\, e_r\otimes f_a - u_{ra}\, e_s\otimes f_a).
\end{aligned}$$

∗ We have used the following formula for the derivative of the inverse of a matrix: $\frac{\partial A^{-1}}{\partial x} = -A^{-1}\frac{\partial A}{\partial x}A^{-1}$.

† The vectors $f_1, ..., f_p$ form the canonical basis of the Euclidean space $\mathbb{R}^p$. We use the rule for matrix multiplication $(u\otimes v_\ell)\cdot(v_d\otimes w) = \delta_{\ell d}\, u\otimes w$, where $v_\ell$ and $v_d$ belong to the same vector space.

Decomposing the above matrix equality on the subspaces $\mathrm{Span}\{e_s\otimes f_a \mid s\notin\mathcal{I}_p\}$ and $\mathrm{Span}\{e_l\otimes f_a \mid l\in\mathcal{I}_p\}$, we have
$$\sum_{\substack{i<j;\, i,j\in\mathcal{I}_p \\ a\in\{1,...,p\}}} \alpha_{ij}(u_{ja}\, e_i\otimes f_a - u_{ia}\, e_j\otimes f_a) + \sum_{\substack{r\in\mathcal{I}_p;\, s\notin\mathcal{I}_p \\ a\in\{1,...,p\}}} \beta_{rs}u_{sa}\, e_r\otimes f_a = O_{n\times p}; \tag{2.2}$$
$$\sum_{\substack{s\notin\mathcal{I}_p \\ a\in\{1,...,p\}}} \Big(\sum_{r\in\mathcal{I}_p} \beta_{rs}u_{ra}\Big)\, e_s\otimes f_a = O_{n\times p}. \tag{2.3}$$
Considering now the matrix $[\beta] \in M_{p\times(n-p)}(\mathbb{R})$, $[\beta] := \sum_{a\in\{1,...,p\};\, s\notin\mathcal{I}_p} \beta_{\tau^{-1}(a)s}\, f_a\otimes h_{\sigma(s)}$‡, we can rewrite the equality (2.3) in the condensed matrix form
$$[\beta]^T\overline{U} = O_{(n-p)\times p}.$$
Indeed, we have
$$\begin{aligned}
[\beta]^T\overline{U} &= \Big(\sum_{\substack{s\notin\mathcal{I}_p \\ b\in\{1,...,p\}}} \beta_{\tau^{-1}(b)s}\, h_{\sigma(s)}\otimes f_b\Big)\Big(\sum_{\substack{r\in\mathcal{I}_p \\ a\in\{1,...,p\}}} u_{ra}\, f_{\tau(r)}\otimes f_a\Big) \\
&= \sum_{\substack{r\in\mathcal{I}_p;\, s\notin\mathcal{I}_p \\ a,b\in\{1,...,p\}}} \beta_{\tau^{-1}(b)s}\,u_{ra}\,\delta_{b\tau(r)}\, h_{\sigma(s)}\otimes f_a
= \sum_{\substack{r\in\mathcal{I}_p;\, s\notin\mathcal{I}_p \\ a\in\{1,...,p\}}} \beta_{rs}u_{ra}\, h_{\sigma(s)}\otimes f_a \\
&= \sum_{\substack{s\notin\mathcal{I}_p \\ a\in\{1,...,p\}}} \Big(\sum_{r\in\mathcal{I}_p} \beta_{rs}u_{ra}\Big)\, h_{\sigma(s)}\otimes f_a
= O_{(n-p)\times p}.
\end{aligned}$$
Since the matrix $\overline{U}$ is invertible, we obtain that $[\beta] = O_{p\times(n-p)}$, which implies that $\beta_{rs} = 0$ for all $r\in\mathcal{I}_p$ and all $s\notin\mathcal{I}_p$. Substituting these last equalities in (2.2), it simplifies to
$$O_{n\times p} = \sum_{\substack{i<j;\, i,j\in\mathcal{I}_p \\ a\in\{1,...,p\}}} \alpha_{ij}(u_{ja}\, e_i\otimes f_a - u_{ia}\, e_j\otimes f_a). \tag{2.4}$$
We introduce now the matrix $[\alpha] \in M_{p\times p}(\mathbb{R})$, $[\alpha] := \sum_{i<j;\, i,j\in\mathcal{I}_p} \alpha_{ij}\left(f_{\tau(i)}\otimes f_{\tau(j)} - f_{\tau(j)}\otimes f_{\tau(i)}\right)$.

‡ We relabel the set $\{1,...,n\}\setminus\mathcal{I}_p$ using the unique strictly increasing function $\sigma : \{1,...,n\}\setminus\mathcal{I}_p \to \{1,...,n-p\}$. Analogously, we relabel the set $\mathcal{I}_p$ using the unique strictly increasing function $\tau : \mathcal{I}_p \to \{1,...,p\}$. The vectors $h_1, ..., h_{n-p}$ form the canonical basis of $\mathbb{R}^{n-p}$.

We have the following computations:
$$\begin{aligned}
[\alpha]\overline{U} &= \sum_{i<j;\, i,j\in\mathcal{I}_p} \alpha_{ij}\left(f_{\tau(i)}\otimes f_{\tau(j)} - f_{\tau(j)}\otimes f_{\tau(i)}\right)\Big(\sum_{\substack{k\in\mathcal{I}_p \\ a\in\{1,...,p\}}} u_{ka}\, f_{\tau(k)}\otimes f_a\Big) \\
&= \sum_{\substack{i<j;\, i,j\in\mathcal{I}_p \\ k\in\mathcal{I}_p;\, a\in\{1,...,p\}}} \alpha_{ij}u_{ka}\left(\delta_{jk}\, f_{\tau(i)}\otimes f_a - \delta_{ik}\, f_{\tau(j)}\otimes f_a\right) \\
&= \sum_{\substack{i<j;\, i,j\in\mathcal{I}_p \\ a\in\{1,...,p\}}} \alpha_{ij}\left(u_{ja}\, f_{\tau(i)}\otimes f_a - u_{ia}\, f_{\tau(j)}\otimes f_a\right).
\end{aligned}$$
By selecting from the matrix equality (2.4) the rows with indexes in $\mathcal{I}_p$, we obtain that $[\alpha]\overline{U} = O_{p\times p}$, and since the matrix $\overline{U}$ is invertible it follows that $[\alpha] = O_{p\times p}$ and therefore $\alpha_{ij} = 0$ for all $i,j\in\mathcal{I}_p$ with $i<j$.

Thus, we have proved the linear independence of the vectors $\frac{\partial \varphi_U}{\partial \omega_{ij}}(0)$, which also form a basis for the tangent space $T_U St^n_p$.

Proposition 2.1. Let $U \in St^n_p$ and $1 \le i_1 < ... < i_p \le n$ be the indexes of the rows that form a full rank submatrix $\overline{U}$ of $U$. Then the vectors
$$\Gamma_{i'j'}(U) := \Lambda_{i'j'}U, \quad i',j' \in \mathcal{I}_p,\ i'<j',$$
$$\Gamma_{i''j''}(U) := \Lambda_{i''j''}U, \quad i'' \in \mathcal{I}_p,\ j'' \notin \mathcal{I}_p,$$
form a basis for the tangent space $T_U St^n_p$.

As a consequence, we have the following description of the tangent space to an orthogonal Stiefel manifold.

Theorem 2.2. Let $U \in St^n_p$. Then
$$T_U St^n_p = \{\Omega U \mid \Omega = [\omega_{ij}] \in \mathfrak{Skew}_{n\times n}(\mathbb{R}),\ \omega_{ij} = 0 \text{ for all } i \notin \mathcal{I}_p \text{ and } j \notin \mathcal{I}_p\}.$$
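As a quick numerical illustration of Proposition 2.1 and Theorem 2.2 (a sketch of ours, under the assumption that the first $p$ rows of $U$ form a full rank submatrix, which holds generically), one can verify that the vectors $\Lambda_{ij}U$ are tangent at $U$ and that their number and rank equal $\dim St^n_p = np - \frac{p(p+1)}{2}$:

```python
import numpy as np

def Lambda(i, j, n):
    """Lambda_ij = e_i (x) e_j - e_j (x) e_i."""
    L = np.zeros((n, n))
    L[i, j], L[j, i] = 1.0, -1.0
    return L

rng = np.random.default_rng(1)
n, p = 5, 2
U, _ = np.linalg.qr(rng.standard_normal((n, p)))
Ip = [0, 1]  # assumed index set: first p rows full rank (generic for random U)

basis = [Lambda(i, j, n) @ U for i in Ip for j in Ip if i < j]
basis += [Lambda(i, j, n) @ U for i in Ip for j in range(n) if j not in Ip]

# Tangency: V in T_U St^n_p iff U^T V + V^T U = 0 (differentiate U^T U = I_p).
print(all(np.allclose(U.T @ V + V.T @ U, 0) for V in basis))       # True
# The count and the linear independence match dim St^n_p.
M = np.stack([V.ravel() for V in basis])
print(len(basis) == n*p - p*(p+1)//2 == np.linalg.matrix_rank(M))  # True
```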

3 Critical points of smooth functions defined on orthogonal Stiefel manifolds

In this section, we give necessary and sufficient conditions for critical points of a smooth cost function defined on orthogonal Stiefel manifolds using the embedded gradient vector field method introduced and used in [5], [4], and [6]. We apply these results to well-known cost functions such as the Procrustes and Penrose regression cost functions, sums of heterogeneous quadratic forms, and Brockett cost functions. We also compare our results with previous results existing in the literature.

For a matrix $U \in M_{n\times p}(\mathbb{R})$, we denote by $u_1, ..., u_p \in \mathbb{R}^n$ the vectors formed with the columns of the matrix $U$; consequently, $U$ has the form $U = [u_1, ..., u_p]$. If $U \in St^n_p = \{U \in M_{n\times p}(\mathbb{R}) \mid U^TU = I_p\}$, then the vectors $u_1, ..., u_p \in \mathbb{R}^n$ are orthonormal. We identify $M_{n\times p}(\mathbb{R})$ with $\mathbb{R}^{np}$ using the isomorphism $\mathrm{vec} : M_{n\times p}(\mathbb{R}) \to \mathbb{R}^{np}$ defined by $\mathrm{vec}(U) = \mathbf{u} := (u_1^T, ..., u_p^T)$.

The constraint functions $F_{aa}, F_{bc} : \mathbb{R}^{np} \to \mathbb{R}$ that describe the Stiefel manifold as a preimage of a regular value are given by:
$$F_{aa}(\mathbf{u}) = \frac{1}{2}||u_a||^2, \quad 1 \le a \le p, \tag{3.1}$$
$$F_{bc}(\mathbf{u}) = \langle u_b, u_c \rangle, \quad 1 \le b < c \le p. \tag{3.2}$$

More precisely, we have $F : \mathbb{R}^{np} \to \mathbb{R}^{\frac{p(p+1)}{2}}$, $F := (\ldots, F_{aa}, \ldots, F_{bc}, \ldots)$, and
$$St^n_p \simeq F^{-1}\left(\ldots, \frac{1}{2}, \ldots, 0, \ldots\right) \subset \mathbb{R}^{np}.$$
Consider a smooth cost function $\widetilde{G} : St^n_p \to \mathbb{R}$. In what follows we will address the problem of finding the critical points of the cost function $\widetilde{G}$ defined on the Stiefel manifold. In order to solve this problem, we consider a smooth extension $G : \mathbb{R}^{np} \to \mathbb{R}$ of the cost function $\widetilde{G} = G|_{St^n_p}$ and we use the embedded gradient vector field method presented in [5], [4], and [6]. The embedded gradient vector field is defined on the open set formed by the regular leaves of the constraint functions and it has the formula
$$\partial G(\mathbf{u}) = \nabla G(\mathbf{u}) - \sum_{1\le a\le p} \sigma_{aa}(\mathbf{u})\nabla F_{aa}(\mathbf{u}) - \sum_{1\le b<c\le p} \sigma_{bc}(\mathbf{u})\nabla F_{bc}(\mathbf{u}),$$
where $\sigma_{aa}, \sigma_{bc}$ are the Lagrange multiplier functions.

Using the property $(\partial G)|_{St^n_p} = \nabla_{\mathbf{g}_{ind}}\widetilde{G}$ proved in [5] and [4], we have the following necessary and sufficient condition for a critical point of the cost function $\widetilde{G}$.

Theorem 3.1. An element $U \in St^n_p$ is a critical point of the cost function $\widetilde{G}$ if and only if $\partial G(\mathbf{u}) = 0$.

In the case of orthogonal constraints the Lagrange multiplier functions, see [5], are given by the formulas:
$$\sigma_{aa}(\mathbf{u}) = \langle \nabla G(\mathbf{u}), \nabla F_{aa}(\mathbf{u})\rangle = \left\langle \frac{\partial G}{\partial u_a}(\mathbf{u}), u_a \right\rangle; \qquad
\sigma_{bc}(\mathbf{u}) = \langle \nabla G(\mathbf{u}), \nabla F_{bc}(\mathbf{u})\rangle = \frac{1}{2}\left(\left\langle \frac{\partial G}{\partial u_c}(\mathbf{u}), u_b \right\rangle + \left\langle \frac{\partial G}{\partial u_b}(\mathbf{u}), u_c \right\rangle\right). \tag{3.3}$$
Note that in general $\left\langle \frac{\partial G}{\partial u_c}(\mathbf{u}), u_b \right\rangle \ne \left\langle \frac{\partial G}{\partial u_b}(\mathbf{u}), u_c \right\rangle$.

If $U$ is a critical point of $\widetilde{G}$, then $\sigma_{aa}(\mathbf{u})$, $\sigma_{bc}(\mathbf{u})$ become the classical Lagrange multipliers. The embedded gradient vector field is a more explicit form of the equivalent projected gradient vector field described in [23]. The solutions of the equation $\partial G(\mathbf{u}) = 0$ are critical points of the function $G$ restricted to regular leaves of the constraint functions. Consequently, using again the identification $\mathrm{vec}(U) = \mathbf{u}$, a matrix $U \in M_{n\times p}(\mathbb{R})$ is a critical point of the cost function $\widetilde{G} = G|_{St^n_p}$ if and only if $\partial G(\mathbf{u}) = 0$ and $U^TU = I_p$, or equivalently,
$$\begin{cases} \nabla G(\mathbf{u}) - \sum\limits_{1\le a\le p} \sigma_{aa}(\mathbf{u})\nabla F_{aa}(\mathbf{u}) - \sum\limits_{1\le b<c\le p} \sigma_{bc}(\mathbf{u})\nabla F_{bc}(\mathbf{u}) = 0 \\ U^TU = I_p. \end{cases} \tag{3.4}$$

Next, we give necessary and sufficient conditions for a critical point.

Theorem 3.2. A matrix $U \in M_{n\times p}(\mathbb{R})$ is a critical point of the cost function $\widetilde{G} = G|_{St^n_p}$ if and only if the following conditions are simultaneously satisfied:

(i) $\left\langle \frac{\partial G}{\partial u_c}(\mathbf{u}), u_b \right\rangle = \left\langle \frac{\partial G}{\partial u_b}(\mathbf{u}), u_c \right\rangle$, for all $1 \le b < c \le p$;

(ii) $\frac{\partial G}{\partial u_a}(\mathbf{u}) \in \mathrm{Span}\{u_1, ..., u_p\}$, for all $1 \le a \le p$;

(iii) $U^TU = I_p$,

where $\mathbf{u} = \mathrm{vec}(U)$.

Proof. First, we will prove that the conditions (i), (ii) and (iii) of the Theorem are necessary. By straightforward computations, the hypothesis $\partial G(\mathbf{u}) = 0$ is equivalent with the following system of equations:
$$\begin{cases} \dfrac{\partial G}{\partial u_a}(\mathbf{u}) - \sum\limits_{d=a+1}^{p} \sigma_{ad}(\mathbf{u})u_d - \sum\limits_{d=1}^{a} \sigma_{da}(\mathbf{u})u_d = 0, \quad \forall a \in \{1, ..., p-1\} \\[4pt] \dfrac{\partial G}{\partial u_p}(\mathbf{u}) - \sum\limits_{d=1}^{p} \sigma_{dp}(\mathbf{u})u_d = 0, \end{cases} \tag{3.5}$$
where the Lagrange multiplier functions $\sigma_{ad}$ are given by (3.3). From the above equalities it follows (ii). From (ii) and the hypothesis that $U \in St^n_p$ we have:
$$\frac{\partial G}{\partial u_a}(\mathbf{u}) = \sum_{d=1}^{p} \left\langle \frac{\partial G}{\partial u_a}(\mathbf{u}), u_d \right\rangle u_d. \tag{3.6}$$
Substituting (3.6) into (3.5) and using the linear independence of the vectors $u_1, ..., u_p$ formed with the columns of the matrix $U$, we obtain (i).

For sufficiency, we solve the set of equations (i) and (ii). Condition (ii) implies that
$$\frac{\partial G}{\partial u_a}(\mathbf{u}) = \sum_{d=1}^{p} \lambda_d u_d.$$
Among these solutions, we choose the ones that belong to $St^n_p$, and consequently for those solutions we obtain
$$\lambda_d = \left\langle \frac{\partial G}{\partial u_a}(\mathbf{u}), u_d \right\rangle.$$
Using (i) and the equality
$$\frac{\partial G}{\partial u_a}(\mathbf{u}) = \sum_{d=1}^{p} \left\langle \frac{\partial G}{\partial u_a}(\mathbf{u}), u_d \right\rangle u_d,$$
we obtain the desired equality $\partial G(\mathbf{u}) = 0$.

If the condition (iii) of the above theorem is satisfied, then the condition (ii) can be replaced with
$$(I_n - UU^T)\frac{\partial G}{\partial u_a}(\mathbf{u}) = 0, \quad \text{for all } 1 \le a \le p, \tag{3.7}$$
where the matrix $I_n - UU^T$ is associated to the orthogonal projection in $\mathbb{R}^n$ onto the subspace normal to $\mathrm{Span}\{u_1, ..., u_p\}$. The necessity of conditions (i) and (ii) has been previously discovered in [11] in the context of the orthogonal Procrustes problem and the Penrose regression problem. We will present the details later in the paper.

The above theorem shows that, in order to find the critical points of the cost function $\widetilde{G} = G|_{St^n_p}$, it is necessary and sufficient to solve the system of equations (i) and (ii) and, among the solutions, choose the ones that belong to the orthogonal Stiefel manifold $St^n_p$.

The necessary and sufficient conditions of the above theorem are natural for orthogonal Stiefel manifolds in the sense that orthogonal Stiefel manifolds sit in-between the sphere ($p = 1$) and the orthogonal group ($p = n$). In the case when $p = 1$, the necessary and sufficient conditions of Theorem 3.2 reduce to the radial condition $\frac{\partial G}{\partial \mathbf{u}}(\mathbf{u}) = \lambda\mathbf{u}$, $\lambda \in \mathbb{R}$, for a critical point of a function restricted to a sphere. When $p = n$, we are in the case of the orthogonal group $SO(n)$ and the necessary and sufficient conditions of Theorem 3.2 reduce to the symmetry condition (i).

In order to formulate the necessary and sufficient conditions of Theorem 3.2 in matrix form, we need to write the embedded gradient vector field $\partial G$ in matrix form. For the following considerations we make the notations
$$\nabla G(U) := \mathrm{vec}^{-1}(\nabla G(\mathbf{u})) \in M_{n\times p}(\mathbb{R}), \qquad \partial G(U) := \mathrm{vec}^{-1}(\partial G(\mathbf{u})) \in M_{n\times p}(\mathbb{R}).$$

We introduce the symmetric matrix $\Sigma(U) := [\sigma_{bc}(\mathbf{u})] \in M_{p\times p}(\mathbb{R})$, where we define $\sigma_{cb}(\mathbf{u}) := \sigma_{bc}(\mathbf{u})$ for $1 \le b < c \le p$. For the particular case of the orthogonal Stiefel manifold, by a straightforward computation using (3.3), we have
$$\Sigma(U) = \frac{1}{2}\left(\nabla G(U)^TU + U^T\nabla G(U)\right). \tag{3.8}$$
Computing $\mathrm{vec}^{-1}(\nabla F_{aa}(\mathbf{u}))$ and $\mathrm{vec}^{-1}(\nabla F_{bc}(\mathbf{u}))$, the matrix form of the embedded gradient vector field is given by
$$\partial G(U) = \nabla G(U) - U\Sigma(U). \tag{3.9}$$
From the geometrical point of view, the vector $\nabla G(U) \in M_{n\times p}(\mathbb{R})$ does not in general belong to the tangent space $T_USt^n_p$. The vector $-U\Sigma(U)$ is a correcting term such that $\partial G(U) \in T_USt^n_p$ for every $U \in St^n_p$. This has been proved in [5].

The matrix form of the system of equations (3.4) is given by
$$\begin{cases} \nabla G(U) - U\Sigma(U) = 0 \\ U^TU = I_p. \end{cases} \tag{3.10}$$
The conditions of Theorem 3.2 can be written in matrix form in the following way.

Theorem 3.3. A matrix $U \in M_{n\times p}(\mathbb{R})$ is a critical point of the cost function $\widetilde{G} = G|_{St^n_p}$ if and only if the following conditions are simultaneously satisfied:
$$\text{(i)}\ U^T\nabla G(U) = \nabla G(U)^TU; \qquad \text{(ii)}\ \nabla G(U) = UU^T\nabla G(U); \qquad \text{(iii)}\ U^TU = I_p. \tag{3.11}$$
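Formulas (3.8)-(3.9) and the tests of Theorem 3.3 translate directly into code; the following numpy sketch (ours, with hypothetical helper names) also illustrates the remark above, namely that $\partial G(U)$ is tangent at every $U \in St^n_p$, whatever the extension $G$:

```python
import numpy as np

def Sigma(U, gradG):
    """Sigma(U) = (gradG(U)^T U + U^T gradG(U)) / 2, formula (3.8)."""
    return 0.5 * (gradG.T @ U + U.T @ gradG)

def embedded_gradient(U, gradG):
    """partial G(U) = gradG(U) - U Sigma(U), formula (3.9)."""
    return gradG - U @ Sigma(U, gradG)

def is_critical(U, gradG, tol=1e-8):
    """Conditions (i)-(iii) of Theorem 3.3."""
    return (np.allclose(U.T @ gradG, gradG.T @ U, atol=tol)          # (i)
            and np.allclose(gradG, U @ (U.T @ gradG), atol=tol)      # (ii)
            and np.allclose(U.T @ U, np.eye(U.shape[1]), atol=tol))  # (iii)

# The correcting term -U Sigma(U) makes partial G tangent at any U on St^n_p:
rng = np.random.default_rng(3)
n, p = 5, 2
U, _ = np.linalg.qr(rng.standard_normal((n, p)))
gradG = rng.standard_normal((n, p))          # gradient of an arbitrary extension G
dG = embedded_gradient(U, gradG)
print(np.allclose(U.T @ dG + dG.T @ U, 0))   # True: dG lies in T_U St^n_p
print(is_critical(U, gradG))                 # False for generic data
```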

Using the classical Lagrange multiplier approach for constrained optimization problems, in [25] the equations that have to be solved in order to find the critical points of the cost function $\widetilde{G} = G|_{St^n_p}$ are
$$\begin{cases} \nabla G(U) - U\nabla G(U)^TU = O_{n\times p} \\ U^TU = I_p, \end{cases} \tag{3.12}$$
which is an equivalent matrix form of the system of equations (3.11) in the case of the orthogonal Stiefel manifold. Note that the vector field $\nabla G(U) - U\nabla G(U)^TU$ differs from $\partial G(U)$ when $U \in St^n_p$ is not a critical point of $G$. Also, in the same paper [25] the following equivalent necessary and sufficient conditions for critical points have been obtained:
$$\begin{cases} \nabla G(U)U^T - U\nabla G(U)^T = O_{n\times n} \\ U^TU = I_p. \end{cases} \tag{3.13}$$
We give a short proof of the equivalence between the equations (3.12) and (3.13). Multiplying (3.13) on the right by the matrix $U$ and using the Stiefel condition $U^TU = I_p$, we obtain (3.12). Now assume that (3.12) holds, i.e., $\nabla G(U) = U\nabla G(U)^TU = U\left(U^T\nabla G(U)U^T\right)U = UU^T\nabla G(U)$. Multiplying on the right by $U^T$ we obtain $\nabla G(U)U^T = U\left(U^T\nabla G(U)U^T\right) = U\nabla G(U)^T$, which is exactly the first equation of (3.13). Consequently, the necessary and sufficient conditions of Theorem 3.3 are equivalent with the necessary and sufficient conditions (3.13) obtained in [25]. The difference between the two sets of necessary and sufficient conditions is that in Theorem 3.3 the equations imply natural relations involving the columns of the matrix $U$, i.e., the components of the orthonormal vectors $u_1, ..., u_p$, while (3.13) involves equations containing the rows of the matrix $U$. Using the particularities of the Stiefel constraints, other conditions equivalent with those from (3.11) are given in [22].

Critical points for the orthonormal Procrustes cost function. We consider the following optimization problem:
$$\underset{U^TU = I_p}{\text{Minimize}}\ ||AU - B||^2, \tag{3.14}$$
where $A \in M_{m\times n}(\mathbb{R})$, $B \in M_{m\times p}(\mathbb{R})$, $U \in M_{n\times p}(\mathbb{R})$, and $||\cdot||$ is the Frobenius norm. The cost function associated to this optimization problem is $\widetilde{G} : St^n_p \to \mathbb{R}$, with natural extension $G : \mathbb{R}^{np} \to \mathbb{R}$,
$$G(\mathbf{u}) = \frac{1}{2}||AU - B||^2 = \frac{1}{2}\mathrm{tr}(U^TA^TAU) - \mathrm{tr}(U^TA^TB) + \frac{1}{2}\mathrm{tr}(B^TB).$$
In the following, we give the specifics of the necessary and sufficient conditions of Theorem 3.3 in the case of the Procrustes cost function. By a straightforward computation we have that $\nabla G(U) = A^TAU - A^TB$. Consequently, condition (i) is equivalent with the symmetry of the matrix $U^TA^TAU - B^TAU$. As the matrix $U^TA^TAU$ is symmetric, we obtain that condition (i) of Theorem 3.3 is equivalent with the symmetry of the matrix $B^TAU$. This condition has been previously obtained in [11]. Condition (ii) of Theorem 3.3 is equivalent with $(I_n - UU^T)(A^TAU - A^TB) = 0$, a condition also previously obtained in [11]. The following result shows that the necessary conditions presented in [11] for the Procrustes cost function are also sufficient.

Theorem 3.4. A matrix $U \in St^n_p$ is a critical point for the Procrustes cost function if and only if:

(i) the matrix $B^TAU$ is symmetric;

(ii) $(I_n - UU^T)(A^TAU - A^TB) = 0$.

A different approach for studying the critical points of the Procrustes problem, using normal and secular equations, has been undertaken in [13].
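Both conditions of Theorem 3.4 are directly computable. The following sketch (with randomly generated data, our own choice) evaluates them at a generic point, where they fail, as expected away from critical points:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p = 6, 5, 2
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
U, _ = np.linalg.qr(rng.standard_normal((n, p)))  # a generic, non-critical point

gradG = A.T @ A @ U - A.T @ B
cond_i  = np.allclose(B.T @ A @ U, (B.T @ A @ U).T)      # (i): B^T A U symmetric
cond_ii = np.allclose((np.eye(n) - U @ U.T) @ gradG, 0)  # (ii): projection vanishes
print(cond_i and cond_ii)  # False for a random U; True exactly at critical points
```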

Critical points for the Penrose regression cost function. The Penrose regression problem is the following optimization problem:
$$\underset{U^TU = I_p}{\text{Minimize}}\ ||AUC - B||^2, \tag{3.15}$$
where $A \in M_{m\times n}(\mathbb{R})$, $B \in M_{m\times q}(\mathbb{R})$, $C \in M_{p\times q}(\mathbb{R})$, $U \in M_{n\times p}(\mathbb{R})$, and $||\cdot||$ is the Frobenius norm. The cost function associated to this optimization problem is $\widetilde{G} : St^n_p \to \mathbb{R}$, with natural extension $G : \mathbb{R}^{np} \to \mathbb{R}$,
$$G(\mathbf{u}) = \frac{1}{2}||AUC - B||^2 = \frac{1}{2}\mathrm{tr}(C^TU^TA^TAUC) - \mathrm{tr}(C^TU^TA^TB) + \frac{1}{2}\mathrm{tr}(B^TB).$$
By a straightforward computation, we have that $\nabla G(U) = A^T(AUC - B)C^T$. The necessary and sufficient conditions of Theorem 3.3 for critical points become in this case:

Theorem 3.5. A matrix $U \in St^n_p$ is a critical point for the Penrose regression cost function if and only if:

(i) the matrix $C(AUC - B)^TAU$ is symmetric;

(ii) $(I_n - UU^T)A^T(AUC - B)C^T = 0$.

These conditions have been previously found in [11] as necessary conditions for critical points of the Penrose regression cost function.

Critical points for sums of heterogeneous quadratic forms. Consider the following optimization problem on the orthogonal Stiefel manifold $St^n_p$, extensively studied in [3] and [22]:
$$\underset{U^TU = I_p}{\text{Minimize}}\ \sum_{i=1}^{p} u_i^TA_iu_i, \tag{3.16}$$

where $A_i$ are $n\times n$ symmetric matrices and $u_i$ are the column vectors of the matrix $U \in St^n_p$. By a straightforward computation, we have that $\nabla G(U) = [A_1u_1, ..., A_pu_p]$. The necessary and sufficient conditions of Theorem 3.3 for critical points become in this case:

Theorem 3.6. A matrix $U \in St^n_p$ is a critical point for the cost function $\sum_{i=1}^{p} u_i^TA_iu_i$ if and only if:

(i) $U^T\mathcal{A}(U) = \mathcal{A}(U)^TU$;

(ii) $\mathcal{A}(U) = UU^T\mathcal{A}(U)$,

where we have made the notation $\mathcal{A}(U) = [A_1u_1, ..., A_pu_p]$.

The above necessary and sufficient conditions are the same conditions discovered in [8] (eq. (3.3) and (3.4) from the proof of Theorem 3.1 therein).
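A short sketch (ours) of the tests of Theorem 3.6, with the matrix $\mathcal{A}(U)$ assembled column by column; the example data are hypothetical:

```python
import numpy as np

def AU(As, U):
    """A(U) with i-th column A_i u_i."""
    return np.column_stack([Ai @ U[:, i] for i, Ai in enumerate(As)])

def is_critical_hqf(As, U, tol=1e-10):
    M = AU(As, U)
    return (np.allclose(U.T @ M, M.T @ U, atol=tol)        # (i)
            and np.allclose(M, U @ (U.T @ M), atol=tol))   # (ii)

# Example with two symmetric matrices sharing the eigenvectors e_1, e_2:
As = [np.diag([1.0, 2.0, 3.0]), np.diag([5.0, 4.0, 6.0])]
U = np.eye(3)[:, :2]           # columns e_1, e_2
print(is_critical_hqf(As, U))  # True: each A_i u_i stays in Span{u_1, u_2}
```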

A particular case of the cost function $\sum_{i=1}^{p} u_i^TA_iu_i$ is when $A_i = \mu_iA$, where $0 \le \mu_1 \le ... \le \mu_p$ and $A$ is an $n\times n$ symmetric matrix. Thus, we obtain the Brockett cost function
$$G_B = \sum_{i=1}^{p} \mu_iu_i^TAu_i.$$
For this cost function the conditions (i) and (ii) of Theorem 3.2 become:

(i) $(\mu_b - \mu_c)u_b^TAu_c = 0$, $\forall b,c \in \{1, ..., p\}$, $b \ne c$;

(ii) $\mu_aAu_a \in \mathrm{Span}\{u_1, ..., u_p\}$, $\forall a \in \{1, ..., p\}$.

Depending on the parameters $\mu_1, ..., \mu_p$, the above two conditions can be further detailed.

I. If the parameters $\mu_1, ..., \mu_p$ are pairwise distinct and strictly positive, then $U \in St^n_p$ is a critical point of the Brockett cost function if and only if every column vector of the matrix $U$ is an eigenvector of the matrix $A$.

II. For the case when among the strictly positive parameters $\mu_1, ..., \mu_p$ we have multiplicity, the set of critical points becomes larger. More precisely, we have
$$0 < \mu_1 = ... = \mu_{s_1} < \mu_{s_1+1} = ... = \mu_{s_1+s_2} < \cdots < \mu_{s_1+...+s_{q-1}+1} = ... = \mu_{s_1+...+s_q},$$
where $s_1 \ge 1, ..., s_q \ge 1$ and $s_1 + ... + s_q = p$. We denote $J_1 = \{1, ..., s_1\}$, ..., $J_q = \{s_1 + s_2 + ... + s_{q-1} + 1, ..., p\}$. By an elementary computation, the matrix $U \in St^n_p$ is a critical point of the Brockett cost function if and only if $\mathrm{Span}\{u_k \mid k \in J_l\}$ is an invariant subspace of $A$ for all $l \in \{1, ..., q\}$.

We illustrate the above results on a simple case of a Brockett cost function defined on $St^3_2$. Assume that the matrix $A$ is in diagonal form with distinct entries. If $0 < \mu_1 < \mu_2$, then we are in case I, and $U = [u_1, u_2]$ is a critical point for the Brockett cost function if and only if $u_1 = \pm e_i$ and $u_2 = \pm e_j$ with $i,j \in \{1,2,3\}$ and $i \ne j$. If $0 < \mu_1 = \mu_2$, then we are in case II, and $U = [u_1, u_2]$ is a critical point for the Brockett cost function if and only if the set $\{u_1, u_2\}$ is an orthonormal frame of a coordinate plane $\mathrm{Span}\{e_i, e_j\}$ with $i \ne j$.
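The following sketch (our own) checks case I numerically for a Brockett cost on $St^4_2$, with the diagonal matrix $A$ and the weights $\mu$ that are used in the example of the next section, via the matrix conditions of Theorem 3.3 and the gradient $\nabla G_B(U) = 2AU\,\mathrm{diag}(\mu)$:

```python
import numpy as np

A = np.diag([1.0, 2.0, 3.0, 4.0])
mu = np.diag([1.0, 2.0])

def brockett_critical(U, tol=1e-10):
    """Conditions (i)-(ii) of Theorem 3.3 for G_B, with gradG(U) = 2 A U diag(mu)."""
    gradG = 2 * A @ U @ mu
    return (np.allclose(U.T @ gradG, gradG.T @ U, atol=tol)
            and np.allclose(gradG, U @ (U.T @ gradG), atol=tol))

e = np.eye(4)
print(brockett_critical(np.column_stack([-e[:, 1], e[:, 0]])))   # True: [-e2, e1]
print(brockett_critical(np.column_stack([e[:, 2], e[:, 3]])))    # True: [e3, e4]
Q, _ = np.linalg.qr(np.random.default_rng(4).standard_normal((4, 2)))
print(brockett_critical(Q))                                      # False: generic frame
```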

4 Steepest descent algorithm

Let $(S, \mathbf{g}_S)$ be a smooth Riemannian manifold and $\widetilde{G} : S \to \mathbb{R}$ be a smooth cost function. The iterative scheme of steepest descent is given by
$$x_{k+1} = \mathcal{R}_{x_k}(-\lambda_k \nabla_{\mathbf{g}_S}\widetilde{G}(x_k)), \tag{4.1}$$
where $\mathcal{R} : TS \to S$ is a smooth retraction, a notion introduced in [24] (see also [2]), and $\lambda_k \in \mathbb{R}$ is a scalar called step length. For the case when the manifold $S$ is the preimage of a regular value of a set of constraint functions we have that $\nabla_{\mathbf{g}_S}\widetilde{G}(x_k) = \partial G(x_k)$, where $G$ is an extension to the ambient space $M$ of the cost function $\widetilde{G}$, $\partial G$ is the embedded gradient vector field introduced in [5], and $\mathbf{g}_S$ is the Riemannian metric induced on $S$ by the ambient space $M$. The vector $\partial G(x_k)$ is written in the coordinates of the ambient space $M$, but it belongs to the tangent space $T_{x_k}S$ viewed as a subspace of $T_{x_k}M$.

If the manifold $S$ is locally diffeomorphic with a manifold $N$ via a local diffeomorphism $f$ and we know a retraction $\mathcal{R}^N$ for the manifold $N$, then $\mathcal{R}^f = f \circ \mathcal{R}^N \circ (Df)^{-1}$ is a retraction for the manifold $S$. A particular case of the above construction is when we replace the local diffeomorphism with local charts. For the case of an orthogonal Stiefel manifold we will use the local charts $\varphi_U$ defined by (2.1). More precisely, the retraction induced by a local chart $\varphi_U$ is given by
$$\mathcal{R}^{\varphi_U}(\Omega U) = C\left(\frac{1}{2}\Omega\right)U, \tag{4.2}$$
where $\Omega \in \mathfrak{W}_{\mathcal{I}_p}$.

Steepest descent algorithm on orthogonal Stiefel manifolds:

1. For an $n\times p$ matrix $U$ construct the vector $\mathbf{u} := \mathrm{vec}(U) = (u_1^T, ..., u_p^T) \in \mathbb{R}^{np}$.
2. Consider a smooth prolongation $G : \mathbb{R}^{np} \to \mathbb{R}$ of the cost function $\widetilde{G} : St^n_p \to \mathbb{R}$.
3. Compute $\nabla G(\mathbf{u})$ and construct the $n\times p$ matrix $\nabla G(U) = \mathrm{vec}^{-1}(\nabla G(\mathbf{u}))$.
4. Compute the Lagrange multiplier functions
$$\sigma_{aa}(\mathbf{u}) = \left\langle \frac{\partial G}{\partial u_a}(\mathbf{u}), u_a \right\rangle, \quad \sigma_{bc}(\mathbf{u}) = \frac{1}{2}\left(\left\langle \frac{\partial G}{\partial u_c}(\mathbf{u}), u_b \right\rangle + \left\langle \frac{\partial G}{\partial u_b}(\mathbf{u}), u_c \right\rangle\right).$$
5. Construct the symmetric $p\times p$ matrix $\Sigma(U) = [\sigma_{bc}(\mathbf{u})]$.
6. Compute the $n\times p$ matrix $\partial G(U) = \nabla G(U) - U\Sigma(U)$.
7. Input $U_0 \in St^n_p$ and $k = 0$.
8. repeat
   • Compute the $n\times p$ matrix $\partial G(U_k)$.
   • Determine a set $\mathcal{I}_p(U_k)$ containing the indexes of the rows that form a full rank submatrix of $U_k$.
   • Construct a generic $n\times n$ skew-symmetric matrix $\Omega_k = [\omega_{ij}]$ in
   $$\mathfrak{W}_{\mathcal{I}_p(U_k)} = \{\Omega = [\omega_{ij}] \in \mathfrak{Skew}_{n\times n}(\mathbb{R}) \mid \omega_{ij} = 0 \text{ for all } i \notin \mathcal{I}_p(U_k),\ j \notin \mathcal{I}_p(U_k)\}.$$
   • Choose a step length $\lambda_k \in \mathbb{R}$ and solve the matrix equation in the $np - \frac{p(p+1)}{2}$ variables $\omega_{ij}$
   $$-\lambda_k\partial G(U_k) = \Omega_kU_k.$$
   • Using the solution $\Omega_k$ of the above equation, compute
   $$U_{k+1} = \left(I_n + \frac{1}{2}\Omega_k\right)\left(I_n - \frac{1}{2}\Omega_k\right)^{-1}U_k.$$
   until $U_{k+1}$ sufficiently minimizes $\widetilde{G}$.

The matrix equation
$$-\lambda_k\partial G(U_k) = \Omega_kU_k \tag{4.3}$$
has a unique solution, since $-\lambda_k\partial G(U_k) \in T_{U_k}St^n_p$ and this tangent vector is uniquely written as
$$-\lambda_k\partial G(U_k) = \sum_{i'<j';\, i',j'\in\mathcal{I}_p} \omega_{i'j'}\Lambda_{i'j'}U_k + \sum_{i''\in\mathcal{I}_p;\, j''\notin\mathcal{I}_p} \omega_{i''j''}\Lambda_{i''j''}U_k,$$
see Proposition 2.1.

Next we will describe a method for finding the explicit solution of the equation (4.3). Once we have computed $U_k$, we choose a set of indexes $\mathcal{I}_p(U_k) = \{i_1, ..., i_p\}$ that gives a full rank submatrix of $U_k$ (the set of indexes $\mathcal{I}_p(U_k)$ is in general not unique). We consider a permutation $\nu_k : \{1, ..., n\} \to \{1, ..., n\}$ such that $\nu_k(i_1) = 1, ..., \nu_k(i_p) = p$ and we introduce the permutation matrix
$$P_{\nu_k^{-1}} = \begin{bmatrix} e_{\nu_k^{-1}(1)} \\ \vdots \\ e_{\nu_k^{-1}(n)} \end{bmatrix} \in M_{n\times n}(\mathbb{R}). \tag{4.4}$$

We make the following notations:
$$\widetilde{U}_k := P_{\nu_k^{-1}}\cdot U_k \quad \text{and} \quad \widetilde{\partial G}(U_k) := P_{\nu_k^{-1}}\cdot \partial G(U_k).$$
The matrix $\widetilde{U}_k \in St^n_p$ has the form
$$\widetilde{U}_k = \begin{bmatrix} \overline{U}_k \\ \widehat{U}_k \end{bmatrix},$$
where $\overline{U}_k \in M_{p\times p}(\mathbb{R})$ is an invertible matrix. According to Theorem 2.2, the tangent vectors in $T_{\widetilde{U}_k}St^n_p$ are of the form $\widetilde{\Omega}_k\widetilde{U}_k$, where
$$\widetilde{\Omega}_k = \begin{bmatrix} \overline{\Omega}_k & \widehat{\Omega}_k \\ -\widehat{\Omega}_k^T & O \end{bmatrix}.$$
The equation $-\lambda_k\widetilde{\partial G}(U_k) = \widetilde{\Omega}_k\widetilde{U}_k$ has the equivalent form
$$-\lambda_k \begin{bmatrix} \overline{\partial G}(U_k) \\ \widehat{\partial G}(U_k) \end{bmatrix} = \begin{bmatrix} \overline{\Omega}_k & \widehat{\Omega}_k \\ -\widehat{\Omega}_k^T & O \end{bmatrix}\begin{bmatrix} \overline{U}_k \\ \widehat{U}_k \end{bmatrix},$$
where we have denoted
$$\widetilde{\partial G}(U_k) := \begin{bmatrix} \overline{\partial G}(U_k) \\ \widehat{\partial G}(U_k) \end{bmatrix}.$$
By a straightforward computation, the above system has the solution
$$\begin{cases} \overline{\Omega}_k = -\lambda_k\left(\overline{\partial G}(U_k) + \overline{U}_k^{-T}\widehat{\partial G}(U_k)^T\widehat{U}_k\right)\overline{U}_k^{-1} \\ \widehat{\Omega}_k = \lambda_k\,\overline{U}_k^{-T}\widehat{\partial G}(U_k)^T. \end{cases} \tag{4.5}$$
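Formula (4.5) translates directly into code. The sketch below (ours; block shapes as in the text, with $\overline{U}_k$ assumed invertible) solves the system and verifies, on synthetic tangent data, that the assembled $\widetilde{\Omega}_k$ reproduces $-\lambda_k\widetilde{\partial G}(U_k)$:

```python
import numpy as np

def solve_blocks(Ubar, Uhat, dGbar, dGhat, lam):
    """Blocks of Omega~_k from formula (4.5); Ubar must be invertible."""
    Uinv = np.linalg.inv(Ubar)
    Omega_hat = lam * Uinv.T @ dGhat.T
    Omega_bar = -lam * (dGbar + Uinv.T @ dGhat.T @ Uhat) @ Uinv
    return Omega_bar, Omega_hat

# Consistency check on synthetic data: pick a tangent vector W @ Ut (Theorem 2.2)
# and confirm that the recovered Omega~ reproduces it.
rng = np.random.default_rng(5)
n, p, lam = 5, 2, 0.1
Ut, _ = np.linalg.qr(rng.standard_normal((n, p)))  # plays the role of U~_k
W = rng.standard_normal((n, n)); W = W - W.T       # skew-symmetric
W[p:, p:] = 0                                      # zero block outside I_p
dG = -(W @ Ut) / lam                               # then -lam*dG = W @ Ut is tangent
Ob, Oh = solve_blocks(Ut[:p], Ut[p:], dG[:p], dG[p:], lam)
Omega = np.block([[Ob, Oh], [-Oh.T, np.zeros((n - p, n - p))]])
print(np.allclose(-lam * dG, Omega @ Ut))          # True (and Omega equals W)
```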

Next, we will prove that the skew-symmetric matrix
$$\Omega_k := P_{\nu_k^{-1}}^T\cdot\widetilde{\Omega}_k\cdot P_{\nu_k^{-1}}$$
is the unique solution of equation (4.3). Indeed,
$$\Omega_kU_k = P_{\nu_k^{-1}}^T\widetilde{\Omega}_kP_{\nu_k^{-1}}P_{\nu_k^{-1}}^T\widetilde{U}_k = P_{\nu_k^{-1}}^T\widetilde{\Omega}_k\widetilde{U}_k = P_{\nu_k^{-1}}^T\left(-\lambda_k\widetilde{\partial G}(U_k)\right) = -\lambda_kP_{\nu_k^{-1}}^TP_{\nu_k^{-1}}\cdot\partial G(U_k) = -\lambda_k\partial G(U_k),$$
where we have used the property that the permutation matrices are invertible and their inverse equals the transpose matrix. According to (4.2), we obtain at each iteration
$$\begin{aligned}
U_{k+1} &= \left(I_n + \frac{1}{2}\Omega_k\right)\left(I_n - \frac{1}{2}\Omega_k\right)^{-1}U_k \\
&= \left(I_n + \frac{1}{2}P_{\nu_k^{-1}}^T\widetilde{\Omega}_kP_{\nu_k^{-1}}\right)\left(I_n - \frac{1}{2}P_{\nu_k^{-1}}^T\widetilde{\Omega}_kP_{\nu_k^{-1}}\right)^{-1}P_{\nu_k^{-1}}^T\widetilde{U}_k \\
&= P_{\nu_k^{-1}}^T\left(I_n + \frac{1}{2}\widetilde{\Omega}_k\right)\left(I_n - \frac{1}{2}\widetilde{\Omega}_k\right)^{-1}\widetilde{U}_k.
\end{aligned}$$

The following is an alternative box that describes the steepest descent algorithm on orthogonal Stiefel manifolds:

1. For an $n\times p$ matrix $U$ construct the vector $\mathbf{u} := \mathrm{vec}(U) = (u_1^T, ..., u_p^T) \in \mathbb{R}^{np}$.
2. Consider a smooth prolongation $G : \mathbb{R}^{np} \to \mathbb{R}$ of the cost function $\widetilde{G} : St^n_p \to \mathbb{R}$.
3. Compute $\nabla G(\mathbf{u})$ and construct the $n\times p$ matrix $\nabla G(U) = \mathrm{vec}^{-1}(\nabla G(\mathbf{u}))$.
4. Compute the Lagrange multiplier functions
$$\sigma_{aa}(\mathbf{u}) = \left\langle \frac{\partial G}{\partial u_a}(\mathbf{u}), u_a \right\rangle, \quad \sigma_{bc}(\mathbf{u}) = \frac{1}{2}\left(\left\langle \frac{\partial G}{\partial u_c}(\mathbf{u}), u_b \right\rangle + \left\langle \frac{\partial G}{\partial u_b}(\mathbf{u}), u_c \right\rangle\right).$$
5. Construct the symmetric $p\times p$ matrix $\Sigma(U) = [\sigma_{bc}(\mathbf{u})]$.
6. Compute the $n\times p$ matrix $\partial G(U) = \nabla G(U) - U\Sigma(U)$.
7. Input $U_0 \in St^n_p$ and $k = 0$.
8. repeat
   • Compute the $n\times p$ matrix $\partial G(U_k)$.
   • Determine a set $\mathcal{I}_p(U_k)$ containing the indexes of the rows that form a full rank submatrix of $U_k$. Construct the permutation matrix $P_{\nu_k^{-1}}$ using formula (4.4).
   • Compute $\widetilde{U}_k = P_{\nu_k^{-1}}\cdot U_k$ and $\widetilde{\partial G}(U_k) = P_{\nu_k^{-1}}\cdot \partial G(U_k)$. Write $\widetilde{U}_k$ and $\widetilde{\partial G}(U_k)$ in the block matrix forms $\begin{bmatrix} \overline{U}_k \\ \widehat{U}_k \end{bmatrix}$ and, respectively, $\begin{bmatrix} \overline{\partial G}(U_k) \\ \widehat{\partial G}(U_k) \end{bmatrix}$.
   • Choose a step length $\lambda_k \in \mathbb{R}$ and compute
   $$\begin{cases} \overline{\Omega}_k = -\lambda_k\left(\overline{\partial G}(U_k) + \overline{U}_k^{-T}\widehat{\partial G}(U_k)^T\widehat{U}_k\right)\overline{U}_k^{-1} \\ \widehat{\Omega}_k = \lambda_k\,\overline{U}_k^{-T}\widehat{\partial G}(U_k)^T. \end{cases}$$
   • Form the matrix $\widetilde{\Omega}_k = \begin{bmatrix} \overline{\Omega}_k & \widehat{\Omega}_k \\ -\widehat{\Omega}_k^T & O \end{bmatrix}$.
   • Compute
   $$U_{k+1} = P_{\nu_k^{-1}}^T\left(I_n + \frac{1}{2}\widetilde{\Omega}_k\right)\left(I_n - \frac{1}{2}\widetilde{\Omega}_k\right)^{-1}\widetilde{U}_k.$$
   until $U_{k+1}$ sufficiently minimizes $\widetilde{G}$.
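As an illustration, here is a compact numpy implementation of the box above, applied to the Brockett cost used in the case study that follows. The fixed step length, the greedy pivoting used to select $\mathcal{I}_p(U_k)$, and the fixed iteration count are our own illustrative choices; the paper leaves $\lambda_k$ and the choice of the full rank submatrix open.

```python
import numpy as np

def grad_brockett(U, A, mu):
    """Euclidean gradient of G(U) = sum_i mu_i u_i^T A u_i (A symmetric)."""
    return 2 * A @ U @ np.diag(mu)

def embedded_gradient(U, gradG):
    """partial G(U) = gradG(U) - U Sigma(U), with Sigma(U) from (3.8)."""
    return gradG - U @ (0.5 * (gradG.T @ U + U.T @ gradG))

def full_rank_permutation(U):
    """Permutation matrix P moving the rows I_p(U) on top, with I_p chosen by a
    greedy pivoted elimination (our own heuristic for picking I_p)."""
    n, p = U.shape
    M, rows = U.copy(), []
    for j in range(p):
        i = max((r for r in range(n) if r not in rows), key=lambda r: abs(M[r, j]))
        rows.append(i)
        M = M - np.outer(M[:, j], M[i, :]) / M[i, j]  # eliminate column j
    order = sorted(rows) + [r for r in range(n) if r not in rows]
    return np.eye(n)[order, :]

def steepest_descent_step(U, gradG, lam):
    n, p = U.shape
    dG = embedded_gradient(U, gradG)
    P = full_rank_permutation(U)
    Ut, dGt = P @ U, P @ dG                      # U~_k and dG~(U_k)
    Uinv = np.linalg.inv(Ut[:p])
    Oh = lam * Uinv.T @ dGt[p:].T                # Omega_hat, formula (4.5)
    Ob = -lam * (dGt[:p] + Uinv.T @ dGt[p:].T @ Ut[p:]) @ Uinv
    Om = np.block([[Ob, Oh], [-Oh.T, np.zeros((n - p, n - p))]])
    I = np.eye(n)
    Unew = np.linalg.solve(I - 0.5 * Om, (I + 0.5 * Om) @ Ut)  # Cayley update
    return P.T @ Unew

# Case I data of the next subsection: A = diag(1,2,3,4), mu = (1,2).
A = np.diag([1.0, 2.0, 3.0, 4.0])
mu = np.array([1.0, 2.0])
U = np.linalg.qr(np.random.default_rng(0).standard_normal((4, 2)))[0]
for _ in range(300):
    U = steepest_descent_step(U, grad_brockett(U, A, mu), lam=0.1)
print(np.round(U, 6))                            # close to signed columns e_i
print(np.trace(np.diag(mu) @ U.T @ A @ U))       # cost; 4 at the global minimum
```

The update solves the linear system $(I_n - \frac{1}{2}\widetilde{\Omega}_k)X = (I_n + \frac{1}{2}\widetilde{\Omega}_k)\widetilde{U}_k$ instead of forming the inverse; since both factors are rational functions of the same matrix, the two orderings of the Cayley factors agree.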

An intrinsic way to construct an update for the steepest descent algorithm is to use a geodesic-like update. Using QR-decomposition, this has been constructed in [12]. A quasi-geodesic update has been introduced in [20] and [25] for computational efficiency. An interesting retraction and its associated quasi-geodesic curves have been constructed in [18] in relation to interpolation problems on Stiefel manifolds. An extrinsic method to update the algorithm is to use a projection-like retraction. For computational reasons, various projection-like retraction updates have been constructed in [1], [19], [7].

Brockett cost function case. For case I, we consider the following particular cost function on $St^4_2$:
$$G(U) = \mu_1u_1^TAu_1 + \mu_2u_2^TAu_2,$$
where $\mu_1 = 1$, $\mu_2 = 2$, and $A = \mathrm{diag}(1, 2, 3, 4)$. The cost function, being quadratic, is invariant under sign changes of the vectors that give the columns of the matrix $U$, but it is not invariant under the order of these column vectors. The set of critical points is given by:
• four critical points generated by $[e_2, e_1]$ (i.e., $[e_2, e_1]$, $[-e_2, e_1]$, $[e_2, -e_1]$, and $[-e_2, -e_1]$), with the value of the cost function equal to 4, which is the global minimum;
• eight critical points generated by $[e_1, e_2]$ and $[e_3, e_1]$, with the value of the cost function equal to 5;
• four critical points generated by $[e_4, e_1]$, with the value of the cost function equal to 6;
• eight critical points generated by $[e_1, e_3]$ and $[e_3, e_2]$, with the value of the cost function equal to 7;
• eight critical points generated by $[e_2, e_3]$ and $[e_4, e_2]$, with the value of the cost function equal to 8;
• four critical points generated by $[e_1, e_4]$, with the value of the cost function equal to 9;
• eight critical points generated by $[e_2, e_4]$ and $[e_4, e_3]$, with the value of the cost function equal to 10;
• four critical points generated by $[e_3, e_4]$, with the value of the cost function equal to 11, which is the global maximum.

For case I, we have run the algorithm for several initial points; the table below shows the convergence of the sequence of iterations toward the corresponding critical points.

$$\begin{array}{c|c|c|c}
U_0 & U_{300} & U_{cr}\ \text{(critical point)} & G(U_{cr}) \\ \hline
\begin{bmatrix} 0 & \frac{\sqrt{2}}{2} \\ -\frac{\sqrt{2}}{2} & 0 \\ 0 & -\frac{\sqrt{2}}{2} \\ -\frac{\sqrt{2}}{2} & 0 \end{bmatrix} &
\begin{bmatrix} 0 & 1.0000000 \\ -1.0000000 & 0 \\ 0 & -7.8365183\cdot 10^{-171} \\ -6.1260222\cdot 10^{-166} & 0 \end{bmatrix} &
[-e_2, e_1] & 4 \\ \hline
\begin{bmatrix} 0 & \frac{\sqrt{3}}{3} \\ -\frac{\sqrt{2}}{2} & \frac{\sqrt{3}}{3} \\ 0 & 0 \\ -\frac{\sqrt{2}}{2} & -\frac{\sqrt{3}}{3} \end{bmatrix} &
\begin{bmatrix} -0.00021656 & -0.99999998 \\ -1.0000000 & 0.00021656 \\ 0 & 0 \\ -1.8444858\cdot 10^{-10} & 3.6515800\cdot 10^{-14} \end{bmatrix} &
[-e_2, -e_1] & 4 \\ \hline
\begin{bmatrix} \frac{\sqrt{3}}{3} & -\frac{\sqrt{2}}{2} \\ 0 & 0 \\ -\frac{\sqrt{3}}{3} & -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{3}}{3} & 0 \end{bmatrix} &
\begin{bmatrix} -1.4227613\cdot 10^{-13} & -1.0000000 \\ 0 & 0 \\ -1.0000000 & 1.4227614\cdot 10^{-13} \\ -1.7746315\cdot 10^{-14} & -1.9382582\cdot 10^{-14} \end{bmatrix} &
[-e_3, -e_1] & 5
\end{array}$$

For case II, when $\mu_1 = \mu_2 = 1$ and the same matrix $A$, we obtain continuous families of critical points. Starting from the initial point
$$U_0 = \begin{bmatrix} \frac{1}{2} & 0 \\ -\frac{1}{2} & \frac{\sqrt{2}}{2} \\ -\frac{1}{2} & 0 \\ -\frac{1}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix},$$
the algorithm goes after 300 iterations to
$$U_{300} = \begin{bmatrix} 0.97862435 & 0.20565599 \\ 0.20565598 & -0.97862437 \\ 8.8491189\cdot 10^{-11} & -4.4716231\cdot 10^{-11} \\ 9.2851360\cdot 10^{-12} & -4.6922465\cdot 10^{-12} \end{bmatrix} \approx \begin{bmatrix} 0.9786243 & 0.2056559 \\ 0.2056559 & -0.9786243 \\ 0 & 0 \\ 0 & 0 \end{bmatrix},$$
which is a rotation of the frame $\{e_1, e_2\}$ by an angle $\theta \approx 0.207$ radians, and the value of the cost function equals 3, which is a global minimum.

Acknowledgment. This work was supported by a grant of the Ministry of Research and Innovation, CNCS-UEFISCDI, project number PN-III-P4-ID-PCE-2016-0165, within PNCDI III.


References

[1] P.A. Absil, R. Mahony, R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2008.
[2] R.L. Adler, J.-P. Dedieu, J.Y. Margulies, M. Martens, M. Shub, Newton's method on Riemannian manifolds and a geometric model for the human spine, IMA J. Numer. Anal., Vol. 22 (2002), pp. 359-390.
[3] J. Balogh, T. Csendes, T. Rapcsák, Some Global Optimization Problems on Stiefel Manifolds, Journal of Global Optimization, Vol. 30, Issue 1 (2004), pp. 91-101.
[4] P. Birtea, D. Comănescu, Geometric dissipation for dynamical systems, Comm. Math. Phys., Vol. 316, Issue 2 (2012), pp. 375-394.
[5] P. Birtea, D. Comănescu, Hessian Operators on Constraint Manifolds, J. Nonlinear Science, Vol. 25, Issue 6 (2015), pp. 1285-1305.
[6] P. Birtea, D. Comănescu, Newton Algorithm on Constraint Manifolds and the 5-Electron Thomson Problem, J. Optim. Theor. Appl., Vol. 173, Issue 2 (2017), pp. 563-583.
[7] Bo Jiang, Yu-Hong Dai, A framework of constraint preserving update schemes for optimization on Stiefel manifold, Math. Program., Ser. A, Vol. 153, Issue 2 (2015), pp. 535-575.
[8] M. Bolla, G. Michaletzky, G. Tusnády, M. Ziermann, Extrema of Sums of Heterogeneous Quadratic Forms, Linear Algebra and its Applications, Vol. 269, Issues 1-3 (1998), pp. 331-365.
[9] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, Foundations and Trends in Machine Learning, Vol. 3, Issue 1 (2010), pp. 1-122.
[10] Caihua Chen, Bingsheng He, Yinyu Ye, Xiaoming Yuan, The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent, Math. Program., Ser. A, Vol. 155, Issues 1-2 (2016), pp. 57-79.
[11] M.T. Chu, N.T. Trendafilov, The orthogonally constrained regression revisited, J. Comput. and Graphical Statistics, Vol. 10, Issue 4 (2001), pp. 746-771.
[12] A. Edelman, T.A. Arias, S.T. Smith, The geometry of algorithms with orthogonality constraints, SIAM J. Matrix Anal. Appl., Vol. 20, Issue 2 (1998), pp. 303-353.
[13] L. Eldén, H. Park, A Procrustes problem on the Stiefel manifold, Numer. Math., Vol. 82 (1999), pp. 599-619.
[14] C. Fraikin, K. Hüper, P. Van Dooren, Optimization over the Stiefel manifold, Proc. in Appl. Math. Mech., Vol. 7, Issue 1 (2007).
[15] R. Glowinski, On alternating direction methods of multipliers: a historical perspective. In: W. Fitzgibbon, Y.A. Kuznetsov, P. Neittaanmaki, O. Pironneau (eds.), Modeling, Simulation and Optimization for Science and Technology, Computational Methods in Applied Sciences, Vol. 34, pp. 59-82, Springer, Dordrecht (2014).
[16] R. Glowinski, A. Marrocco, Sur l'approximation par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires, Rev. Française Automat. Inf. Rech. Opérationnelle, Vol. 9, Issue 2 (1975), pp. 41-76.
[17] T. Kanamori, A. Takeda, Non-convex Optimization on Stiefel Manifold and Applications to Machine Learning, Neural Information Processing - 19th International Conference, ICONIP 2012, Doha, Qatar, Proceedings, Part I, pp. 109-116, 2012.
[18] K.A. Krakowski, L. Machado, F.S. Leite, J. Batista, A modified Casteljau algorithm to solve interpolation problems on Stiefel manifolds, Journal of Computational and Applied Mathematics, Vol. 311 (2017), pp. 84-99.
[19] J.H. Manton, Optimization algorithms exploiting unitary constraints, IEEE Trans. Signal Process., Vol. 50 (2002), pp. 635-650.
[20] Y. Nishimori, S. Akaho, Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold, Neurocomputing, Vol. 67 (2005), pp. 106-135.
[21] K.B. Petersen, M.S. Pedersen, The Matrix Cookbook, 2012.
[22] T. Rapcsák, On minimization on Stiefel manifolds, European Journal of Operational Research, Vol. 143 (2002), pp. 365-376.
[23] J.B. Rosen, The Gradient Projection Method for Nonlinear Programming. Part II. Nonlinear Constraints, Journal of the Society for Industrial and Applied Mathematics, Vol. 9, Issue 4 (1961), pp. 514-532.
[24] M. Shub, Some remarks on dynamical systems and numerical analysis. In: Dynamical Systems and Partial Differential Equations (Caracas, 1984), pp. 69-91. Univ. Simon Bolivar, Caracas (1986).
[25] Z. Wen, W. Yin, A feasible method for optimization with orthogonality constraints, Math. Program., Ser. A, Vol. 142, Issue 1 (2013), pp. 397-434.
[26] Y. Zhang, Recent advances in alternating direction methods: Theory and practice. In: IPAM Workshop: Numerical Methods for Continuous Optimization. UCLA, Los Angeles (2010).