ELE539A: Optimization of Communication Systems
Lecture 18: Interior Point Algorithm

Professor M. Chiang
Electrical Engineering Department, Princeton University

March 30, 2007

Lecture Outline

• Newton's method
• Equality constrained convex optimization
• Inequality constrained convex optimization
• Barrier function and central path
• Barrier method
• Summary of topics covered in the course
• Final projects
• Conclusions

Newton Method

Newton step:
$$\Delta x_{\mathrm{nt}} = -\nabla^2 f(x)^{-1} \nabla f(x)$$

Positive definiteness of ∇²f(x) implies that ∆x_nt is a descent direction.

Interpretation: linearize the optimality condition ∇f(x*) = 0 near x:
$$\nabla f(x+v) \approx \nabla f(x) + \nabla^2 f(x)\, v = 0$$
Solving this linear equation in v gives v = ∆x_nt. The Newton step is the increment that must be added to x to satisfy the linearized optimality condition.

Main Properties

• Affine invariance: given nonsingular T ∈ R^{n×n}, let f̄(y) = f(Ty). Then the Newton step for f̄ at y is ∆y_nt = T^{-1}∆x_nt, and x + ∆x_nt = T(y + ∆y_nt)
• Newton decrement:
$$\lambda(x) = \left(\nabla f(x)^T \nabla^2 f(x)^{-1} \nabla f(x)\right)^{1/2} = \left(\Delta x_{\mathrm{nt}}^T\, \nabla^2 f(x)\, \Delta x_{\mathrm{nt}}\right)^{1/2}$$
Let f̂ be the second order approximation of f at x. Then
$$f(x) - p^* \approx f(x) - \inf_y \hat f(y) = f(x) - \hat f(x + \Delta x_{\mathrm{nt}}) = \frac{1}{2}\lambda(x)^2$$

Newton Method

GIVEN a starting point x ∈ dom f and tolerance ε > 0
REPEAT
1. Compute Newton step and decrement:
   ∆x_nt = −∇²f(x)^{-1}∇f(x)  and  λ = (∇f(x)^T ∇²f(x)^{-1} ∇f(x))^{1/2}
2. Stopping criterion: QUIT if λ²/2 ≤ ε
3. Line search: choose a step size t > 0
4. Update: x := x + t∆x_nt

Advantages of Newton's method: fast, robust, scalable. A minimal sketch of the method is below.
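The following is a minimal sketch of the algorithm above, assuming a twice-differentiable convex f with callable gradient and Hessian; the names `grad`, `hess`, and the example objective are my illustrations, not from the slides.

```python
import numpy as np

def newton(f, grad, hess, x, eps=1e-8, alpha=0.25, beta=0.5, max_iter=50):
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        dx = np.linalg.solve(H, -g)          # Newton step: -H^{-1} g
        lam2 = float(-g @ dx)                # squared Newton decrement lambda^2
        if lam2 / 2 <= eps:                  # stopping criterion
            break
        t = 1.0                              # backtracking line search
        while f(x + t * dx) > f(x) - alpha * t * lam2:
            t *= beta
        x = x + t * dx
    return x

# Example: minimize f(x) = x^T x - sum(log(x)) over x > 0
f = lambda x: x @ x - np.sum(np.log(x)) if np.all(x > 0) else np.inf
grad_f = lambda x: 2 * x - 1 / x
hess_f = lambda x: np.diag(2 + 1 / x**2)
x_star = newton(f, grad_f, hess_f, 3.0 * np.ones(5))
```

Returning +∞ outside the domain makes the backtracking loop shrink the step until the iterate is feasible, a standard trick for barrier-type objectives.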

Error vs. Iteration for Problems in R^100 and R^10000

$$\text{minimize} \quad c^T x - \sum_{i=1}^{m} \log(b_i - a_i^T x), \qquad x \in \mathbf{R}^{100} \ \text{and} \ x \in \mathbf{R}^{10000}$$

[Figure: two plots of the error f(x^(k)) − p* versus iteration k, one for the R^100 instance and one for the R^10000 instance; in both, Newton's method converges quadratically, reaching high accuracy within roughly 10 and 20 iterations respectively.]

Equality Constrained Problems

Solve a convex optimization with equality constraints:

minimize    f(x)
subject to  Ax = b

f : R^n → R is twice differentiable, and A ∈ R^{p×n} with rank p < n.

Optimality condition: the KKT equations, n + p equations in the n + p variables x*, ν*:
$$Ax^* = b, \qquad \nabla f(x^*) + A^T \nu^* = 0$$

Approach 1: eliminate the equality constraints, turning the problem into an unconstrained optimization; a sketch of the elimination is below.
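A sketch of Approach 1 (my illustration, not from the slides): parameterize the feasible set {x : Ax = b} as x = x̂ + Fz, where the columns of F span the null space of A, then minimize the unconstrained reduced objective g(z) = f(x̂ + Fz).

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 1.0, 1.0]])                # one equality constraint in R^3
b = np.array([1.0])

x_hat = np.linalg.lstsq(A, b, rcond=None)[0]   # any particular solution of Ax = b
F = null_space(A)                              # basis of null(A), shape (3, 2)

f = lambda x: float(x @ x)                     # example objective ||x||^2
g = lambda z: f(x_hat + F @ z)                 # unconstrained reduced problem

# Any unconstrained method now applies to g. For ||x||^2 over Ax = b the
# minimizer is the least-norm solution, so here z* = 0 and x* = x_hat.
```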

Approach 2: Dual Solution

Dual function: g(ν) = −b^T ν − f*(−A^T ν)

Dual problem: maximize −b^T ν − f*(−A^T ν)

Example: solve

minimize    −∑_{i=1}^n log x_i
subject to  Ax = b

Dual problem:
$$\text{maximize} \quad -b^T \nu + \sum_{i=1}^{n} \log(A^T \nu)_i + n$$

Recover the primal variable from the dual variable: x_i(ν) = 1/(A^T ν)_i

Example With Analytic Solution

Convex quadratic minimization over equality constraints:

minimize    (1/2) x^T P x + q^T x + r
subject to  Ax = b

Optimality condition:
$$\begin{bmatrix} P & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} x^* \\ \nu^* \end{bmatrix} = \begin{bmatrix} -q \\ b \end{bmatrix}$$

If the KKT matrix is nonsingular, there is a unique optimal primal-dual pair x*, ν*. If the KKT matrix is singular but the system is solvable, any solution yields an optimal pair x*, ν*. If the system has no solution, the primal problem is unbounded below. A numerical sketch follows.
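A minimal numerical sketch (assuming numpy; the problem data are illustrative) of the analytic solution: assemble and solve the KKT system for the equality constrained QP above.

```python
import numpy as np

P = np.array([[2.0, 0.5], [0.5, 1.0]])   # positive definite
q = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0]])               # single constraint x1 + x2 = 1
b = np.array([1.0])
p, n = A.shape

# Assemble and solve the (n+p) x (n+p) KKT system
K = np.block([[P, A.T], [A, np.zeros((p, p))]])
rhs = np.concatenate([-q, b])
sol = np.linalg.solve(K, rhs)
x_star, nu_star = sol[:n], sol[n:]

# Optimality check: A x* = b and P x* + q + A^T nu* = 0
assert np.allclose(A @ x_star, b)
assert np.allclose(P @ x_star + q + A.T @ nu_star, 0)
```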

Approach 3: Direct Derivation of Newton Method

Make sure the initial point is feasible and that A∆x_nt = 0. Replace the objective with its second order Taylor approximation near x:

minimize    f̂(x + v) = f(x) + ∇f(x)^T v + (1/2) v^T ∇²f(x) v
subject to  A(x + v) = b

Find the Newton step ∆x_nt by solving:
$$\begin{bmatrix} \nabla^2 f(x) & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} \Delta x_{\mathrm{nt}} \\ w \end{bmatrix} = \begin{bmatrix} -\nabla f(x) \\ 0 \end{bmatrix}$$
where w is the associated optimal dual variable of Ax = b.

Newton's method (Newton decrement, affine invariance, and stopping criterion) stays the same; a sketch of one feasible Newton step is below.
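A sketch (my illustration) of the feasible-start Newton step for minimize f(x) subject to Ax = b: each step solves the KKT system above, and the iterates stay feasible because A∆x = 0.

```python
import numpy as np

def eq_newton_step(grad, hess, A, x):
    """One Newton step for an equality constrained problem at feasible x."""
    n, p = x.size, A.shape[0]
    K = np.block([[hess(x), A.T], [A, np.zeros((p, p))]])
    rhs = np.concatenate([-grad(x), np.zeros(p)])
    sol = np.linalg.solve(K, rhs)
    dx, w = sol[:n], sol[n:]        # Newton step and dual variable of Ax = b
    lam2 = float(-grad(x) @ dx)     # squared Newton decrement
    return dx, w, lam2
```

Plugging this step into the Newton loop sketched earlier (line search plus the λ²/2 ≤ ε stopping rule) gives the equality constrained variant.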

Inequality Constrained Minimization

Let f_0, ..., f_m : R^n → R be convex and twice continuously differentiable, and A ∈ R^{p×n} with rank p < n:

minimize    f_0(x)
subject to  f_i(x) ≤ 0,  i = 1, ..., m
            Ax = b

Assume the problem is strictly feasible and an optimal x* exists.

Idea: reduce it to a sequence of linear equality constrained problems and apply Newton's method. First, we need to approximately formulate the inequality constrained problem as an equality constrained problem.

Barrier Function

Make inequality constraints implicit in the objective:

minimize    f_0(x) + ∑_{i=1}^m I_−(f_i(x))
subject to  Ax = b

where I_− is the indicator function:
$$I_-(u) = \begin{cases} 0 & u \le 0 \\ \infty & u > 0 \end{cases}$$

There are no inequality constraints left, but the objective function is not differentiable. Approximate the indicator function by a differentiable, closed, and convex function:
$$\hat I_-(u) = -(1/t)\log(-u), \qquad \mathrm{dom}\, \hat I_- = -\mathbf{R}_{++}$$
where a larger parameter t gives a more accurate approximation; Î_− increases to ∞ as u increases to 0. A quick numerical look at the approximation is below.
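A quick numerical look (my illustration) at the approximation: for any fixed u < 0, Î_−(u) = −(1/t) log(−u) tends to 0 = I_−(u) as t grows, while for any fixed t it still blows up to +∞ as u → 0⁻.

```python
import numpy as np

u = np.array([-2.0, -1.0, -0.1, -0.01])
for t in (1.0, 10.0, 100.0):
    print(f"t={t:6.1f}:", np.round(-(1.0 / t) * np.log(-u), 4))
```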

Log Barrier

[Figure: the approximation Î_−(u) = −(1/t) log(−u) plotted over u < 0, approaching the indicator function I_− as t grows.]

Use Newton's method to solve the approximation:

minimize    f_0(x) + ∑_{i=1}^m Î_−(f_i(x))
subject to  Ax = b

Log Barrier

Log barrier function:
$$\phi(x) = -\sum_{i=1}^{m} \log(-f_i(x)), \qquad \mathrm{dom}\,\phi = \{x \in \mathbf{R}^n \mid f_i(x) < 0,\ i = 1, \ldots, m\}$$

The approximation is better if t is large, but then the Hessian of f_0 + (1/t)φ varies rapidly near the boundary of the feasible set: an accuracy-stability tradeoff. Solve a sequence of approximations with increasing t, using Newton's method for each step of the sequence.

Gradient and Hessian of the log barrier function:
$$\nabla \phi(x) = \sum_{i=1}^{m} \frac{1}{-f_i(x)} \nabla f_i(x)$$
$$\nabla^2 \phi(x) = \sum_{i=1}^{m} \frac{1}{f_i(x)^2} \nabla f_i(x) \nabla f_i(x)^T + \sum_{i=1}^{m} \frac{1}{-f_i(x)} \nabla^2 f_i(x)$$

Central Path

Consider the family of optimization problems parameterized by t > 0:

minimize    t f_0(x) + φ(x)
subject to  Ax = b

Central path: the solutions x*(t) of the above problem, characterized by:

1. Strict feasibility:
$$Ax^*(t) = b, \qquad f_i(x^*(t)) < 0, \quad i = 1, \ldots, m$$

2. Centrality condition: there exists ν̂ ∈ R^p such that
$$t \nabla f_0(x^*(t)) + \sum_{i=1}^{m} \frac{1}{-f_i(x^*(t))} \nabla f_i(x^*(t)) + A^T \hat\nu = 0$$

Every central point gives a dual feasible point. Let
$$\lambda_i^*(t) = -\frac{1}{t f_i(x^*(t))}, \quad i = 1, \ldots, m, \qquad \nu^*(t) = \frac{\hat\nu}{t}$$

Central Path

Dual function:
$$g(\lambda^*(t), \nu^*(t)) = f_0(x^*(t)) - m/t$$
which implies the duality gap is m/t. Therefore, the suboptimality gap satisfies f_0(x*(t)) − p* ≤ m/t.

Interpretation as modified KKT conditions: x is a central point x*(t) iff there exist λ, ν such that
$$Ax = b, \qquad f_i(x) \le 0, \quad i = 1, \ldots, m, \qquad \lambda \succeq 0$$
$$\nabla f_0(x) + \sum_{i=1}^{m} \lambda_i \nabla f_i(x) + A^T \nu = 0$$
$$-\lambda_i f_i(x) = 1/t, \quad i = 1, \ldots, m$$

Complementary slackness is relaxed from 0 to 1/t.

Example

Inequality form LP:

minimize    c^T x
subject to  Ax ⪯ b

Log barrier function:
$$\phi(x) = -\sum_{i=1}^{m} \log(b_i - a_i^T x)$$

with gradient and Hessian:
$$\nabla \phi(x) = \sum_{i=1}^{m} \frac{a_i}{b_i - a_i^T x} = A^T d, \qquad \nabla^2 \phi(x) = \sum_{i=1}^{m} \frac{a_i a_i^T}{(b_i - a_i^T x)^2} = A^T \mathrm{diag}(d)^2 A$$
where d_i = 1/(b_i − a_i^T x).

The centrality condition becomes: tc + A^T d = 0.

So c is parallel to ∇φ(x*(t)); therefore, the hyperplane c^T x = c^T x*(t) is tangent to the level set of φ through x*(t).

[Figure: level curves of φ for an LP, with the objective direction c; the hyperplane c^T x = c^T x*(t) is tangent to the level curve of φ at the central point x*(t).]

A numerical check of these formulas follows.
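A small numerical sketch (with illustrative random data) of the LP barrier quantities: φ(x) = −∑ log(b_i − a_i^T x), with gradient A^T d and Hessian A^T diag(d)² A, plus a finite-difference sanity check of the gradient formula.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 5
A = rng.standard_normal((m, n))
x = np.zeros(n)
b = A @ x + rng.uniform(1.0, 2.0, size=m)   # ensures b - Ax > 0 (x strictly feasible)

d = 1.0 / (b - A @ x)
grad_phi = A.T @ d                           # A^T d
hess_phi = A.T @ np.diag(d**2) @ A           # A^T diag(d)^2 A

# Finite-difference check of the gradient (a sanity test, not part of the algorithm)
phi = lambda x: -np.sum(np.log(b - A @ x))
eps = 1e-6
fd = np.array([(phi(x + eps * e) - phi(x - eps * e)) / (2 * eps) for e in np.eye(n)])
assert np.allclose(fd, grad_phi, atol=1e-4)
```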

Barrier Method

GIVEN a strictly feasible point x, t := t^(0) > 0, µ > 1, and tolerance ε > 0
REPEAT
1. Centering step: compute x*(t) by minimizing t f_0 + φ subject to Ax = b, starting at x
2. Update: x := x*(t)
3. Stopping criterion: QUIT if m/t ≤ ε
4. Increase t: t := µt

Other names: Sequential Unconstrained Minimization Technique (SUMT) or path-following method. Newton's method is usually used for the centering step; a sketch of the full loop is below.
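A sketch of the barrier method, specialized to the inequality form LP above (no equality constraints) and reusing the barrier gradient and Hessian formulas; this is my illustration, not the lecture's code, and it requires a strictly feasible starting x.

```python
import numpy as np

def barrier_method(c, A, b, x, t0=1.0, mu=20.0, eps=1e-6):
    m = A.shape[0]
    t = t0
    while m / t > eps:                     # duality gap bound is m/t
        # Centering step: Newton's method on t*c^T x + phi(x)
        for _ in range(50):
            d = 1.0 / (b - A @ x)
            g = t * c + A.T @ d            # gradient of t f0 + phi
            H = A.T @ (d[:, None]**2 * A)  # Hessian A^T diag(d)^2 A
            dx = np.linalg.solve(H, -g)
            lam2 = float(-g @ dx)          # squared Newton decrement
            if lam2 / 2 <= 1e-10:
                break
            s = 1.0                        # backtracking, staying strictly feasible
            while np.any(b - A @ (x + s * dx) <= 0):
                s *= 0.5
            while t * c @ (x + s * dx) - np.sum(np.log(b - A @ (x + s * dx))) \
                    > t * c @ x - np.sum(np.log(b - A @ x)) - 0.25 * s * lam2:
                s *= 0.5
            x = x + s * dx
        t *= mu                            # increase t by factor mu
    return x
```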

Remarks

• Each centering step does not need to be exact
• Choice of µ: tradeoff between the number of inner iterations and the number of outer iterations
• Choice of t^(0): tradeoff between the number of inner iterations within the first outer iteration and the number of outer iterations
• Number of centering steps required:
$$\frac{\log(m/(\epsilon t^{(0)}))}{\log \mu}$$
where m is the number of inequality constraints and ε is the desired accuracy

Progress of Barrier Method for an LP Example

[Figure: duality gap versus cumulative number of Newton iterations for a small LP; three staircase curves for µ = 2, 50, 150, each decreasing from about 10^2 to below 10^−6.]

Tradeoff of µ Parameter for a Small LP

[Figure: total number of Newton iterations versus the parameter µ (ranging up to 200) for a small LP, illustrating the tradeoff in the choice of µ.]

Progress of Barrier Method for a GP Example

[Figure: duality gap versus cumulative number of Newton iterations for a GP example; three curves for different values of µ, each decreasing from about 10^2 to below 10^−6 within roughly 120 iterations.]

Insensitive to Problem Size

[Figure, left: duality gap versus Newton iterations, with three curves for m = 50, 500, 1000 and n = 2m. Figure, right: number of Newton iterations versus m (from 10 to 10^3), growing only slowly, from roughly 15 to 35.]

Phase I Method

How do we compute a strictly feasible point to start the barrier method? Consider a phase I optimization problem in the variables x ∈ R^n, s ∈ R:

minimize    s
subject to  f_i(x) ≤ s,  i = 1, ..., m
            Ax = b

Strictly feasible starting point for the phase I problem: for any x^(0) (satisfying Ax^(0) = b), take any s > max_i f_i(x^(0)).

1. Apply the barrier method to solve the phase I problem (stop when s < 0)
2. Use the resulting strictly feasible point of the original problem to start the barrier method on the original problem

A sketch of the LP case follows.
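A sketch of phase I for the LP case (my illustration, assuming the phase I optimum is attained): find a strictly feasible x for Ax ⪯ b by minimizing s subject to Ax − s·1 ⪯ b in the variables y = (x, s). It uses scipy.optimize.linprog as the solver for brevity; any LP solver (e.g., the barrier method sketched above) works.

```python
import numpy as np
from scipy.optimize import linprog

def phase_one_lp(A, b):
    m, n = A.shape
    A_aug = np.hstack([A, -np.ones((m, 1))])      # constraints [A  -1] y <= b
    c_aug = np.concatenate([np.zeros(n), [1.0]])  # objective: s
    res = linprog(c_aug, A_ub=A_aug, b_ub=b, bounds=(None, None))
    x, s = res.x[:n], res.x[n]
    return x, s < 0      # s < 0  =>  x is strictly feasible for Ax <= b
```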

Not Covered

• Self-concordance analysis and complexity analysis (polynomial time)
• Numerical linear algebra (large scale implementation)
• Generalized inequalities (SDP)

Other (sometimes more efficient) algorithms:

• Primal-dual interior point methods
• Ellipsoid methods
• Analytic center cutting plane methods

Lecture Summary

• Convert equality constrained optimization into unconstrained optimization
• Solve a general convex optimization problem with interior point methods
• Turn an inequality constrained problem into a sequence of equality constrained problems that are increasingly accurate approximations of the original problem
• Polynomial time (in theory) and much faster (in practice): roughly the cost of 25-50 least-squares solves over a wide range of problem sizes

Readings: Sections 9.1-9.3, 9.5, 10.1-10.2, and 11.1-11.4 in Boyd and Vandenberghe.

Why Take This Course

• Learn the tools and mentality of optimization (surprisingly useful for other studies you may engage in later on)
• Learn classic and recent results on optimization of communication systems (over a surprisingly wide range of problems)
• Enhance the ability to do original research in academia or industry
• Have a fun and productive time on the final project

Where We Started

[Diagram: the layered network stack (Application, Presentation, Session, Transport, Network, Link, Physical) alongside the communication chain Source → Source Encoder → Channel Encoder → Modulator → Channel → Demodulator → Channel Decoder → Source Decoder → Destination, all viewed through the generic problem:]

minimize    f(x)
subject to  x ∈ C

Topics Covered: Methodologies

• Nonlinear optimization (linear, convex, nonconvex), Pareto optimization, dynamic programming, integer programming
• Convex optimization, convex sets and convex functions
• Lagrange duality and KKT optimality conditions
• LP, QP, QCQP, SOCP, SDP, GP, SOS
• Gradient method, Newton's method, interior point method
• Distributed algorithms and decomposition methods
• Nonconvex optimization and relaxations
• Stochastic network optimization and Lyapunov functions
• Robust LP and GP
• Network flow problems
• NUM and generalized NUM

Topics Covered: Applications

• Information theory problems: channel capacity and rate distortion for discrete memoryless models
• Detection and estimation problems: MAP and ML detectors, multi-user detection
• Physical layer: DSL spectrum management
• Wireless networks: resource allocation, power control, joint power and rate allocation
• Network algorithms and protocols: multi-commodity flow problems, max flow, shortest path routing, network rate allocation, TCP congestion control, IP routing
• Reverse engineering, heterogeneous protocols
• Congestion, collision, interference management
• Layering as optimization decomposition
• Eigenvalue and norm optimization problems

Some of the Topics Not Covered

• Circuit switching and optimal routing
• Packet switch architecture and algorithms
• Multi-user information theory problems
• More digital signal processing, such as blind equalization
• Large deviations theory and queuing theory applications
• System stability and control theoretic improvement of TCP
• Inverse optimization, robust optimization
• Self-concordance analysis of interior point method complexity
• Ellipsoid method, cutting plane method, simplex method ...

Final Projects: Timeline

March 30 - May 24:
• Talk to me as often as needed
• Start early

April 27: Interim project report due

Mid May: Project Presentation Day

May 24: Final project report due. Many students continue to extend the project.

Final Projects: Topics

• Wenjie Jiang: Content distribution networking
• Suhas Mathur: Wireless network security
• Hamed Mohsenian: Multi-channel MAC
• Martin Suchara: Stochastic simulation for TCP/IP
• Guanchun Wang: Google with feedback
• Yihong Wu: Practical stepsize
• Yiyue Wu: Coding
• Yongxin Xi: Image analysis
• Minlan Yu: Network embedding
• Dan Zhang: To be decided

The Optimization Mentality

• What can be varied and what cannot?
• What are the objectives?
• What are the constraints?
• What is the dual?

Build models, analyze models, prove theorems, compute solutions, design systems, verify with data, refine models ...

Warnings:

• Remember the restrictive assumptions for application needs
• Understand the limitations: discrete, nonconvex, complexity, robustness ... (still a lot can be done for these difficult issues)

Optimization of Communication Systems

Three ways to apply optimization theory to communication systems:

• Formulate the problem as an optimization problem
• Interpret a given solution as an optimizer/algorithm for an optimization problem
• Extend the underlying theory with optimization theoretic techniques

A remarkably powerful, versatile, widely applicable, and not yet fully recognized framework.

Applications in communication systems also stimulate new developments in optimization theory and algorithms

Optimization of Communication Systems

[Diagram: optimization problems, optimization solutions, and optimization theories, with arrows showing how each informs the others.]

Optimization Beyond Optimality

• Modeling (reverse engineering, fairness, etc.)
• Architecture
• Robustness
• Design for optimizability

This Is The Beginning

Quote from Thomas Kailath: "A course is not about what you've covered, but what you've uncovered."

To Everyone In ELE539A: You've Been Amazing