
A multilevel stochastic gradient algorithm for PDE-constrained optimal control problems under uncertainty

Fabio Nobile

CSQI – Institute of Mathematics, EPFL, Switzerland

Joint work with: M. Martin (Criteo, Grenoble), S. Krumscheid (RWTH Aachen), P. Tsilifis (EPFL)

RICAM Special Semester on Optimization

Workshop 3 “Optimization and Inversion under Uncertainty”

November 11-15, 2019, Linz, Austria

Outline

1 Problem setting – quadratic optimal control problem

2 Discretization by finite elements + Monte Carlo

3 Deterministic (CG) iterative solvers versus Stochastic Gradient

4 Multilevel stochastic gradient algorithms

5 Conclusions


Problem setting – quadratic optimal control problem

Problem setting

$(\Omega, \mathcal{F}, \mathbb{P})$: complete probability space.

$D \subset \mathbb{R}^d$: physical domain. Throughout, $\|\cdot\| = \|\cdot\|_{L^2(D)}$.

Forward problem

\[
\begin{cases}
-\operatorname{div}\big(a(x,\omega)\nabla y(x,\omega)\big) = g(x) + u(x), & \text{for a.e. } x \in D,\ \omega \in \Omega, \\
y(x,\omega) = 0, & \text{for a.e. } x \in \partial D,\ \omega \in \Omega,
\end{cases}
\tag{*}
\]

with $a(\cdot,\omega)$ a random field s.t. $0 < a_{\min} \le a(x,\omega) \le a_{\max}$ for all $(x,\omega) \in D \times \Omega$.

$\Longrightarrow$ random solution $\omega \mapsto y(\cdot,\omega) \in H^1_0(D)$; in particular $y \in L^2_{\mathbb{P}}(\Omega; H^1_0(D))$.

$u \in L^2(D)$: control function.

Optimal control problem

\[
\min_{\substack{u \in L^2(D) \\ y \in L^2_{\mathbb{P}}(\Omega;\, H^1_0(D))}} \tilde{J}(u,y) := \frac{1}{2}\,\mathbb{E}_\omega\big[\|y(\cdot,\omega) - y_{\mathrm{target}}\|^2\big] + \frac{\beta}{2}\|u\|^2, \quad \text{subject to (*)}
\]

Problem setting – quadratic optimal control problem

Reduced functional

(Stochastic) affine solution operator: $y_\omega : L^2(D) \to H^1_0(D)$; for all $\omega \in \Omega$, $u \mapsto y_\omega(u)$, solution of

\[
\begin{cases}
-\operatorname{div}\big(a(\cdot,\omega)\nabla y_\omega(u)\big) = g + u, & \text{in } D, \\
y_\omega(u) = 0, & \text{on } \partial D.
\end{cases}
\]

Reduced functional: $\min_{u \in L^2(D)} J(u)$ with

\[
J(u) = \mathbb{E}_\omega[f(u,\omega)], \qquad f(u,\omega) = \frac{1}{2}\|y_\omega(u) - y_{\mathrm{target}}\|^2 + \frac{\beta}{2}\|u\|^2.
\]

Adjoint-based gradient computation:

\[
\nabla_u f(u,\omega) = \beta u + p_\omega(u), \qquad \nabla_u J(u) = \beta u + \mathbb{E}_\omega[p_\omega(u)],
\]

where $p_\omega(u)$ solves the adjoint problem for all $\omega \in \Omega$:

\[
-\operatorname{div}\big(a(\cdot,\omega)\nabla p_\omega(u)\big) = y_\omega(u) - y_{\mathrm{target}} \ \text{in } D, \qquad p_\omega(u) = 0 \ \text{on } \partial D.
\]

Lipschitz and strong convexity properties of $\nabla_u f$: for all $u_1, u_2 \in L^2(D)$, $\omega \in \Omega$,

\[
\|\nabla_u f(u_1,\omega) - \nabla_u f(u_2,\omega)\| \le L\,\|u_1 - u_2\|, \qquad L = \beta + \frac{C_P^4}{a_{\min}^2},
\]
\[
\big\langle \nabla_u f(u_1,\omega) - \nabla_u f(u_2,\omega),\, u_1 - u_2 \big\rangle_{L^2(D)} \ge \frac{\ell}{2}\|u_1 - u_2\|^2, \qquad \ell = 2\beta.
\]
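The adjoint-based gradient formula above can be sketched on a one-dimensional finite-difference toy problem (not the speaker's code; the mesh, the toy random field, and all names are illustrative). Since the operator is self-adjoint, the state and adjoint solves share the same matrix, and one gradient sample is $\beta u + p_\omega(u)$:

```python
import numpy as np

def fd_matrix(a, h):
    # Tridiagonal finite-difference matrix for -(a y')' on (0,1) with
    # homogeneous Dirichlet BCs; a holds the coefficient at the n+1 cell midpoints.
    main = (a[:-1] + a[1:]) / h**2
    off = -a[1:-1] / h**2
    return np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

def grad_f_sample(u, a, g, y_target, beta, h):
    # One sample of the adjoint-based gradient: grad_u f(u, omega) = beta*u + p_omega(u).
    A = fd_matrix(a, h)
    y = np.linalg.solve(A, g + u)            # state:   -(a y')' = g + u
    p = np.linalg.solve(A.T, y - y_target)   # adjoint: -(a p')' = y - y_target
    return beta * u + p

n = 63
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
rng = np.random.default_rng(0)
a = 1.0 + rng.uniform(size=n + 1)            # one bounded random-field realization
grad = grad_f_sample(np.zeros(n), a, np.ones(n), np.sin(np.pi * x), 1e-4, h)
```

One such sample is exactly what the stochastic gradient iteration of Section 3 consumes at each step.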


Discretization by finite elements + Monte Carlo

Finite dimensional approximation

Finite element approximation of the PDE: for all $u \in L^2(D)$, $\omega \in \Omega$, $u \mapsto y_\omega^h(u)$ solves

\[
\int_D a(\cdot,\omega)\,\nabla y_\omega^h(u) \cdot \nabla v_h = \int_D (g + u)\, v_h, \qquad \forall v_h \in Y_h^r,
\]

where $Y_h^r$ is the space of continuous $\mathbb{P}_r$ finite element functions vanishing on the boundary.

Monte Carlo approximation of the expectation: $\omega_i \overset{\mathrm{iid}}{\sim} \mathbb{P}$, $i = 1, \ldots, N$,

\[
J(u) = \mathbb{E}_\omega[f(u,\omega)] \approx \frac{1}{N}\sum_{i=1}^{N} f(u,\omega_i).
\]

Discrete optimal control problem:

\[
\min_{u \in L^2(D)} J_{h,N}(u) := \frac{1}{N}\sum_{i=1}^{N} f_h(u,\omega_i) = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{2}\|y_{\omega_i}^h(u) - y_{\mathrm{target}}\|^2 + \frac{\beta}{2}\|u\|^2.
\]

Remark: the unique minimizer satisfies $u_{h,N}^\star \in Y_h^r$.
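The sample-average functional $J_{h,N}$ can be evaluated directly on the 1D finite-difference toy problem (a sketch under the same illustrative assumptions as before; the nodal quadrature for the $L^2$ norm is an assumption, not the talk's choice):

```python
import numpy as np

def fd_matrix(a, h):
    # Tridiagonal FD matrix for -(a y')' with homogeneous Dirichlet BCs.
    main = (a[:-1] + a[1:]) / h**2
    off = -a[1:-1] / h**2
    return np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

def J_hN(u, seeds, g, y_target, beta, h):
    # Sample-average approximation J_{h,N}(u) = (1/N) sum_i f_h(u, omega_i),
    # with the L2(0,1) norm approximated by h times the sum of squared nodal values.
    vals = []
    for s in seeds:
        a = 1.0 + np.random.default_rng(s).uniform(size=u.size + 1)
        y = np.linalg.solve(fd_matrix(a, h), g + u)
        vals.append(0.5 * h * np.sum((y - y_target) ** 2)
                    + 0.5 * beta * h * np.sum(u ** 2))
    return np.mean(vals)

n = 63
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
J = J_hN(np.zeros(n), seeds=range(100), g=np.ones(n),
         y_target=np.sin(np.pi * x), beta=1e-4, h=h)
```

Fixing the $N$ seeds up front is what turns $J_{h,N}$ into the deterministic objective that the CG approach of Section 3 minimizes.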

Discretization by finite elements + Monte Carlo

Optimality conditions

\[
\text{primal pbs:} \quad \int_D a(\cdot,\omega_i)\,\nabla y_{\omega_i}^h \cdot \nabla v_h = \int_D (g + u_{h,N})\, v_h, \qquad \forall v_h \in Y_h^r,\ i = 1,\ldots,N,
\]
\[
\text{adjoint pbs:} \quad \int_D a(\cdot,\omega_i)\,\nabla v_h \cdot \nabla p_{\omega_i}^h = \int_D (y_{\omega_i}^h - y_{\mathrm{tar}})\, v_h, \qquad \forall v_h \in Y_h^r,\ i = 1,\ldots,N,
\]
\[
\text{sensitivity:} \quad \int_D \Big(\beta u_{h,N} + \frac{1}{N}\sum_{i=1}^{N} p_{\omega_i}^h\Big) v_h = 0, \qquad \forall v_h \in Y_h^r.
\]

Algebraic system

\[
\begin{pmatrix}
A_1 & & & & & & -M \\
& \ddots & & & & & \vdots \\
& & A_N & & & & -M \\
-M & & & A_1^T & & & \\
& \ddots & & & \ddots & & \\
& & -M & & & A_N^T & \\
& & & M & \cdots & M & \beta N M
\end{pmatrix}
\begin{pmatrix}
y_1 \\ \vdots \\ y_N \\ p_1 \\ \vdots \\ p_N \\ u
\end{pmatrix}
=
\begin{pmatrix}
g \\ \vdots \\ g \\ -y_{\mathrm{tar}} \\ \vdots \\ -y_{\mathrm{tar}} \\ 0
\end{pmatrix}
\]
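The coupled saddle-point system can be assembled and solved directly for a tiny instance of the 1D toy problem (a sketch: the lumped mass matrix $M = h\,I$, the parameter values, and the dense assembly are illustrative choices, not the talk's implementation):

```python
import numpy as np

def fd_matrix(a, h):
    # Tridiagonal FD matrix for -(a y')' with homogeneous Dirichlet BCs.
    main = (a[:-1] + a[1:]) / h**2
    off = -a[1:-1] / h**2
    return np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

# Tiny instance: n interior nodes, N samples; M is a lumped mass matrix (assumption).
n, N, beta = 15, 3, 1e-2
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
g, ytar = np.ones(n), np.sin(np.pi * x)
M = h * np.eye(n)
rng = np.random.default_rng(1)
As = [h * fd_matrix(1.0 + rng.uniform(size=n + 1), h) for _ in range(N)]

# Assemble the (2N+1) x (2N+1) block system from the slide.
K = np.zeros(((2 * N + 1) * n, (2 * N + 1) * n))
rhs = np.zeros((2 * N + 1) * n)
for i, Ai in enumerate(As):
    yb, pb, ub = i * n, (N + i) * n, 2 * N * n      # block offsets
    K[yb:yb + n, yb:yb + n] = Ai                    # primal:  A_i y_i - M u = M g
    K[yb:yb + n, ub:ub + n] = -M
    rhs[yb:yb + n] = M @ g
    K[pb:pb + n, yb:yb + n] = -M                    # adjoint: -M y_i + A_i^T p_i = -M ytar
    K[pb:pb + n, pb:pb + n] = Ai.T
    rhs[pb:pb + n] = -M @ ytar
    K[ub:ub + n, pb:pb + n] = M                     # sensitivity: sum_i M p_i + beta*N*M u = 0
K[2 * N * n:, 2 * N * n:] = beta * N * M

sol = np.linalg.solve(K, rhs)
u_opt = sol[2 * N * n:]
```

For realistic $N$ this monolithic solve is impractical, which is exactly the motivation for the reduced system on the next slide.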


Deterministic (CG) iterative solvers versus Stochastic Gradient

Reduced algebraic system

Several approaches can be used to solve this coupled system
[Kouri-Heinkenschloss et al. 2013], [VanBarel-Vandewalle 2018], [Borzì-vonWinckel 2011].

Eliminating $(y_1, \ldots, y_N)$ and $(p_1, \ldots, p_N)$ and introducing the block matrices

\[
\mathbf{A} =
\begin{pmatrix}
A_1 & & \\ & \ddots & \\ & & A_N
\end{pmatrix}, \qquad
\mathbf{M} =
\begin{pmatrix}
M & & \\ & \ddots & \\ & & M
\end{pmatrix}, \qquad
\mathbb{1} =
\begin{pmatrix}
\mathrm{Id} \\ \vdots \\ \mathrm{Id}
\end{pmatrix}
\]

leads to a reduced system $G u = \xi$ with matrix

\[
G = \beta M + \frac{1}{N}\,\mathbb{1}^T \mathbf{M} \mathbf{A}^{-T} \mathbf{M} \mathbf{A}^{-1} \mathbf{M}\, \mathbb{1}.
\]

The matrix $G$ is spd and $\mathrm{Cond}(G) = O(\beta^{-1})$, independent of $h$ and $N$. The reduced system can be solved efficiently by e.g. conjugate gradient. Denoting by $u_{h,N}^j$ the $j$-th iterate,

\[
\|u_{h,N}^\star - u_{h,N}^j\| \le C \rho^j, \qquad \rho = \frac{\sqrt{\mathrm{Cond}(G)} - 1}{\sqrt{\mathrm{Cond}(G)} + 1}.
\]
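Since $G$ is only needed through matrix-vector products, CG can run matrix-free: each application of $G$ costs $N$ primal and $N$ adjoint solves. A sketch on the 1D toy problem (the lumped mass matrix and the textbook CG loop are illustrative choices):

```python
import numpy as np

def fd_matrix(a, h):
    # Tridiagonal FD matrix for -(a y')' with homogeneous Dirichlet BCs.
    main = (a[:-1] + a[1:]) / h**2
    off = -a[1:-1] / h**2
    return np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

n, N, beta = 31, 5, 1e-4
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
g, ytar = np.ones(n), np.sin(np.pi * x)
M = h * np.eye(n)
rng = np.random.default_rng(2)
As = [h * fd_matrix(1.0 + rng.uniform(size=n + 1), h) for _ in range(N)]

def G_mul(u):
    # Matrix-free action of G = beta*M + (1/N) sum_i M A_i^{-T} M A_i^{-1} M.
    out = beta * (M @ u)
    for Ai in As:
        out += (M @ np.linalg.solve(Ai.T, M @ np.linalg.solve(Ai, M @ u))) / N
    return out

xi = sum(M @ np.linalg.solve(Ai.T, M @ (ytar - np.linalg.solve(Ai, M @ g)))
         for Ai in As) / N

def cg(apply_G, b, tol=1e-10, maxit=500):
    # Textbook conjugate gradient on the spd reduced system G u = xi.
    u = np.zeros_like(b)
    r = b - apply_G(u)
    d = r.copy()
    rs = r @ r
    for _ in range(maxit):
        Gd = apply_G(d)
        alpha = rs / (d @ Gd)
        u += alpha * d
        r -= alpha * Gd
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs) * d
        rs = rs_new
    return u

u_star = cg(G_mul, xi)
```

The $O(\beta^{-1})$ condition number bound means the iteration count is insensitive to mesh refinement and sample size, only to the regularization weight.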

Deterministic (CG) iterative solvers versus Stochastic Gradient

Deterministic approach

Use a standard (deterministic) iterative method (e.g. CG) to solve the fully discrete system.

Error splitting, assuming smooth solutions $y_\omega(u^\star), p_\omega(u^\star) \in H^{r+1}(D)$:

\[
\mathbb{E}\big[\|u^\star - u_{h,N}^j\|^2\big] \le \underbrace{C_1 \rho^{2j}}_{\text{CG error}} + \underbrace{\frac{C_2}{N}}_{\text{MC error}} + \underbrace{C_3 h^{2r+2}}_{\text{FE error}}
\]

Cost to compute $u_{h,N}^j$: assume that the cost of solving one PDE is $O(h^{-\gamma d})$ (with $\gamma \in (1,3]$)

\[
\Longrightarrow \quad \text{Work to compute } u_{h,N}^j: \qquad \mathrm{Work} \sim j\, N\, h^{-\gamma d}.
\]

Complexity analysis. Balancing the error contributions, $\rho^j \sim N^{-1/2} \sim h^{r+1} \sim \mathrm{tol}$:

\[
\mathrm{Work}(\mathrm{tol}) \lesssim \underbrace{\mathrm{tol}^{-2}}_{\text{MC}}\ \underbrace{\mathrm{tol}^{-\frac{\gamma d}{r+1}}}_{\text{FE}}\ \underbrace{\log(\mathrm{tol}^{-1})}_{\text{solver}}
\]
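The balancing rule can be turned into a small parameter-selection helper (a sketch with all constants set to one; the contraction factor `rho` is a hypothetical input):

```python
import math

def balance(tol, r=1, d=2, gamma=1.0, rho=0.9):
    # Equilibrate the three error contributions: rho^j ~ N^{-1/2} ~ h^{r+1} ~ tol.
    N = math.ceil(tol ** -2)                      # Monte Carlo samples
    h = tol ** (1.0 / (r + 1))                    # mesh size
    j = math.ceil(math.log(tol) / math.log(rho))  # CG iterations
    work = j * N * h ** (-gamma * d)              # Work ~ j * N * h^{-gamma d}
    return N, h, j, work

N, h, j, work = balance(1e-2)
```

For `tol = 1e-2` with $r = 1$, $d = 2$, $\gamma = 1$, this gives $N \approx 10^4$ samples, $h = 0.1$, and the logarithmic factor shows up as the $j \approx 44$ CG iterations.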

Deterministic (CG) iterative solvers versus Stochastic Gradient

Stochastic gradient (Robbins-Monro)

Instead of introducing upfront the Monte Carlo approximation and then solving the discrete problem by a deterministic iterative solver, we could apply a stochastic gradient method to the continuous problem (non-discrete in probability):

\[
u_h^{j+1} = u_h^j - \tau_j \nabla_u f_h(u_h^j, \omega_j) = (1 - \tau_j \beta)\, u_h^j - \tau_j\, p_{\omega_j}^h(u_h^j), \qquad \omega_j \overset{\mathrm{iid}}{\sim} \mathbb{P}.
\]

Learning rate $\tau_j = \dfrac{\tau_0}{j + \alpha}$.

Proposition [Martin-Nobile-Krumscheid 2018]

Assuming $y_\omega(u^\star), p_\omega(u^\star) \in H^{r+1}(D)$, for any $\alpha \in \mathbb{R}_+$ and $\tau_0 > 1/\ell$ there exist $D_1, D_2 > 0$ independent of $j$ and $h$ s.t.

SG convergence: $\mathbb{E}\big[\|u_h^\star - u_h^j\|^2\big] \le D_1\, j^{-1}$

Error splitting: $\mathbb{E}\big[\|u^\star - u_h^j\|^2\big] \le 2 D_1\, j^{-1} + D_2\, h^{2r+2}$

Complexity: $\mathrm{Work}(\mathrm{tol}) \lesssim \mathrm{tol}^{-2}\, \mathrm{tol}^{-\frac{\gamma d}{r+1}}$ (no log terms!)
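The Robbins-Monro iteration is a few lines on the 1D toy problem: one fresh sample $\omega_j$, one state solve, one adjoint solve, one update. This is a sketch, not the talk's code; $\beta$, $\tau_0$, and $\alpha$ are chosen here so the toy iteration stays stable, not to match the talk's experiment:

```python
import numpy as np

def fd_matrix(a, h):
    # Tridiagonal FD matrix for -(a y')' with homogeneous Dirichlet BCs.
    main = (a[:-1] + a[1:]) / h**2
    off = -a[1:-1] / h**2
    return np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

n, beta = 63, 0.1
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
g, ytar = np.ones(n), np.sin(np.pi * x)
rng = np.random.default_rng(3)

tau0, alpha = 5.0, 1.0                         # tau_j = tau0 / (j + alpha), illustrative values
u = np.zeros(n)
for j in range(200):
    a = 1.0 + rng.uniform(size=n + 1)          # fresh sample omega_j at every iteration
    A = fd_matrix(a, h)
    y = np.linalg.solve(A, g + u)              # state solve for the current control
    p = np.linalg.solve(A.T, y - ytar)         # adjoint solve
    tau = tau0 / (j + alpha)
    u = (1.0 - tau * beta) * u - tau * p       # u_{j+1} = (1 - tau_j beta) u_j - tau_j p
```

Note the per-iteration cost: two PDE solves, versus $2N$ solves per CG iteration in the deterministic approach.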

Deterministic (CG) iterative solvers versus Stochastic Gradient

Numerical results

Optimal control problem:

\[
\min_{u \in L^2(D)} \frac{1}{2}\,\mathbb{E}_\omega\big[\|y_\omega(u) - y_{\mathrm{target}}\|^2\big] + \frac{\beta}{2}\|u\|^2
\quad \text{subject to} \quad
\begin{cases}
-\operatorname{div}\big(a(\cdot,\omega)\nabla y_\omega(u)\big) = g + u & \text{in } D, \\
y_\omega(u) = 0 & \text{on } \partial D.
\end{cases}
\]

Problem parameters:

\[
D = (0,1)^2, \qquad g = 1, \qquad y_{\mathrm{target}}(x_1,x_2) = \sin(\pi x_1)\sin(\pi x_2), \qquad \beta = 10^{-4},
\]
\[
a(x_1,x_2,\xi) = 1 + \exp\big\{\theta\big(\xi_1\cos(1.1\pi x_1) + \xi_2\cos(1.2\pi x_1) + \xi_3\sin(1.3\pi x_2) + \xi_4\sin(1.4\pi x_2)\big)\big\}.
\]

[Figure: three isovalue plots of realizations of the random field $a$.]

Deterministic (CG) iterative solvers versus Stochastic Gradient

Numerical results – SG convergence

[Figure: $\mathbb{E}[\|u_h^j - u_h^\star\|]$ versus the iteration counter $j$ (log-log), with the fit $\mathrm{error} = 10^{0.16547}\, j^{-0.48555}$ and the band $\mathbb{E}[\|u_h^j - u_h^\star\|] + \mathrm{std}(\|u_h^j - u_h^\star\|)$.]

Mean $L^2$ error as a function of the iteration counter, estimated by sample average over 100 independent realizations. Fixed mesh size $h = 2^{-4}$, $\mathbb{P}_1$ finite elements.

Deterministic (CG) iterative solvers versus Stochastic Gradient

Numerical results – complexity of CG versus SG

[Figure: error $\mathbb{E}[\|u - u^\star\|]$ versus work (work model with $\gamma = 1$), showing mean and mean + std curves for both SG and CG, with reference complexity $\mathrm{error} \sim W^{-1/3}$.]

Complexity plot for CG and SG (average over 20 repetitions).


Multilevel stochastic gradient algorithms

Multilevel stochastic gradient

Let $Y_{h_0}^r \subset Y_{h_1}^r \subset \ldots \subset Y_{h_L}^r$ be a sequence of finer and finer FE spaces.

Idea: in the stochastic gradient algorithm, replace the single evaluation $\nabla_u f_h(u_j, \omega_j)$ by a multilevel approximation of the expectation [Heinrich 1998], [Giles 2008]:

\[
E_{L,\vec{N}}^{\mathrm{MLMC}}[\nabla_u f(u_j, \cdot)] = \sum_{\ell=0}^{L} \frac{1}{N_\ell} \sum_{i=1}^{N_\ell} \Big[\nabla_u f_{h_\ell}\big(u_j, \omega_j^{(i,\ell)}\big) - \nabla_u f_{h_{\ell-1}}\big(u_j, \omega_j^{(i,\ell)}\big)\Big]
\]

(with the convention $\nabla_u f_{h_{-1}} = 0$), with $\omega_j^{(i,\ell)} \overset{\mathrm{iid}}{\sim} \mathbb{P}$ (drawn independently between levels and at each iteration).

$L$ controls the bias of the estimator (FE error on level $h_L$):

\[
\mathbb{E}\Big[E_{L,\vec{N}}^{\mathrm{MLMC}}[\nabla_u f(u_j,\cdot)] - \mathbb{E}[\nabla_u f(u_j,\cdot)]\Big] = \mathbb{E}\big[\nabla_u f_{h_L}(u_j,\cdot) - \nabla_u f(u_j,\cdot)\big].
\]

$\vec{N} = (N_0, \ldots, N_L)$ controls the variance of the estimator (MC error):

\[
\mathrm{Var}\Big[E_{L,\vec{N}}^{\mathrm{MLMC}}[\nabla_u f(u_j,\cdot)]\Big] = \sum_{\ell=0}^{L} \frac{\mathrm{Var}\big(\nabla_u f_{h_\ell}(u_j,\cdot) - \nabla_u f_{h_{\ell-1}}(u_j,\cdot)\big)}{N_\ell}.
\]
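The telescoping estimator can be sketched on the 1D toy problem with a nested mesh hierarchy. The key implementation points are visible in the code: each level correction uses the same random draw on levels $\ell$ and $\ell-1$, draws are independent across levels, and coarse-level gradients are prolongated to the finest grid by interpolation (the hierarchy, random field, and prolongation here are illustrative assumptions, not the talk's setup; the control is frozen at $u = 0$, where the gradient sample reduces to $p_\omega$):

```python
import numpy as np

def solve_level(ell, xi, rhs_fun, ytar_fun):
    # State + adjoint solve on level ell (n_ell = 2^(ell+3) - 1 interior nodes).
    # At u = 0 the gradient sample beta*u + p reduces to the adjoint p.
    n = 2 ** (ell + 3) - 1
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)
    mid = np.linspace(h / 2, 1.0 - h / 2, n + 1)       # cell midpoints
    a = 1.0 + np.exp(xi[0] * np.cos(1.1 * np.pi * mid) + xi[1] * np.sin(1.3 * np.pi * mid))
    main = (a[:-1] + a[1:]) / h**2
    A = np.diag(main) + np.diag(-a[1:-1] / h**2, 1) + np.diag(-a[1:-1] / h**2, -1)
    y = np.linalg.solve(A, rhs_fun(x))
    p = np.linalg.solve(A.T, y - ytar_fun(x))
    return x, p

def mlmc_gradient(L, Ns, rng, rhs_fun, ytar_fun):
    # MLMC estimator: telescoping sum of level corrections, each correction
    # using the SAME omega on levels ell and ell-1; result on the finest grid.
    nL = 2 ** (L + 3) - 1
    xf = np.linspace(1.0 / (nL + 1), 1.0 - 1.0 / (nL + 1), nL)
    est = np.zeros(nL)
    for ell in range(L + 1):
        corr = np.zeros(nL)
        for _ in range(Ns[ell]):
            xi = rng.standard_normal(2)
            x, gf = solve_level(ell, xi, rhs_fun, ytar_fun)
            corr += np.interp(xf, np.r_[0.0, x, 1.0], np.r_[0.0, gf, 0.0])
            if ell > 0:
                xc, gc = solve_level(ell - 1, xi, rhs_fun, ytar_fun)
                corr -= np.interp(xf, np.r_[0.0, xc, 1.0], np.r_[0.0, gc, 0.0])
        est += corr / Ns[ell]
    return est

rng = np.random.default_rng(4)
g_ml = mlmc_gradient(L=3, Ns=[64, 16, 4, 1], rng=rng,
                     rhs_fun=lambda x: np.ones_like(x),
                     ytar_fun=lambda x: np.sin(np.pi * x))
```

Most samples live on the coarse, cheap levels; only a handful of fine-level corrections are needed because their variance is small.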

Multilevel stochastic gradient algorithms

Multilevel stochastic gradient algorithm – first version

\[
u_{j+1} = u_j - \tau_j\, E_{L_j,\vec{N}_j}^{\mathrm{MLMC}}[\nabla_u f(u_j,\cdot)]
= (1 - \tau_j \beta)\, u_j - \tau_j \sum_{\ell=0}^{L_j} \frac{1}{N_{\ell,j}} \sum_{i=1}^{N_{\ell,j}} \Big[p_{h_\ell}\big(u_j, \omega_j^{(i,\ell)}\big) - p_{h_{\ell-1}}\big(u_j, \omega_j^{(i,\ell)}\big)\Big]
\]

Learning rate $\tau_j = \dfrac{\tau_0}{j+\alpha}$. Notice that $u_{j+1} \in Y_{h_{L_j}}^r$ for all $j$.

We allow $L$ and $\vec{N}$ to depend on the iteration counter $j$. How to choose them optimally?

To recover the optimal control $u^\star$ in the limit, we need $L_j \to \infty$ as $j \to \infty$. On the other hand, $\vec{N}$ does not need to go to $\infty$.

A similar approach was proposed in [Dereich-MuellerGronbach 2015] for an abstract optimization problem, with different working assumptions but similar results.
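Putting the pieces together, a full multilevel stochastic gradient loop on the 1D toy problem looks as follows. This is a sketch only: the level schedule $L_j$, the per-level sample counts $N_{\ell,j}$, the restriction of the control by interpolation, and all parameter values are illustrative choices, not the optimized schedules analyzed in the talk:

```python
import numpy as np

beta, tau0, alpha = 0.1, 5.0, 1.0

def grad_sample(n, xi, u_fun):
    # One gradient sample beta*u + p on a mesh with n interior nodes (1D FD toy).
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)
    mid = np.linspace(h / 2, 1.0 - h / 2, n + 1)
    a = 1.0 + np.exp(xi[0] * np.cos(1.1 * np.pi * mid) + xi[1] * np.sin(1.3 * np.pi * mid))
    A = (np.diag(a[:-1] + a[1:]) + np.diag(-a[1:-1], 1) + np.diag(-a[1:-1], -1)) / h**2
    u = u_fun(x)
    y = np.linalg.solve(A, 1.0 + u)                  # g = 1
    p = np.linalg.solve(A.T, y - np.sin(np.pi * x))  # y_target = sin(pi x)
    return x, beta * u + p

rng = np.random.default_rng(5)
nodes = [7, 15, 31, 63]                   # mesh hierarchy n_ell
xf = np.linspace(1.0 / 64, 63.0 / 64, 63)
u = np.zeros(63)                          # control stored on the finest grid

for j in range(50):
    u_fun = lambda x, u=u: np.interp(x, xf, u)        # restriction by interpolation
    Lj = min(j // 15 + 1, 3)                          # slowly growing level L_j
    Ns = [2 ** (Lj - ell) for ell in range(Lj + 1)]   # N_{ell,j} decreasing with level
    gml = np.zeros(63)
    for ell in range(Lj + 1):
        corr = np.zeros(63)
        for _ in range(Ns[ell]):
            xi = rng.standard_normal(2)
            x, gf = grad_sample(nodes[ell], xi, u_fun)
            corr += np.interp(xf, np.r_[0.0, x, 1.0], np.r_[0.0, gf, 0.0])
            if ell > 0:
                xc, gc = grad_sample(nodes[ell - 1], xi, u_fun)
                corr -= np.interp(xf, np.r_[0.0, xc, 1.0], np.r_[0.0, gc, 0.0])
        gml += corr / Ns[ell]
    tau = tau0 / (j + alpha)
    u = u - tau * gml                                 # u_{j+1} = u_j - tau_j * MLMC gradient
```

Growing $L_j$ drives the bias to zero as required, while the fixed-size $\vec{N}_j$ keeps the per-iteration cost dominated by the coarse levels.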
