Extrapolation techniques and multiparameter treatment for Tikhonov regularization
A review
Michela Redivo Zaglia, University of Padua, Italy
Joint work with
C. Brezinski (Lille-France)
G. Rodriguez, S. Seatzu (Cagliari-Italy)
➜ Tikhonov regularization
➜ Extrapolation techniques
➜ Multiparameter regularization
AN APPLICATION TO REGULARIZATION

When a $p \times p$ system $Ax = b$ is ill-conditioned, its solution cannot be computed accurately.
Tikhonov's regularization consists in computing the vector $x_\lambda$ which minimizes the quadratic functional
$$J(\lambda, x) = \|Ax - b\|^2 + \lambda \|Hx\|^2$$
over all vectors $x$, where $\lambda \ge 0$ is a parameter and $H$ is a given $q \times p$ ($q \le p$) matrix; $\|\cdot\|$ denotes the Euclidean norm.
This vector $x_\lambda$ is the solution of the system $(C + \lambda E) x_\lambda = A^T b$, where $C = A^T A$ and $E = H^T H$.
If $\lambda$ is close to zero, then $x_\lambda$ is badly computed, while, if $\lambda$ is far away from zero, $x_\lambda$ is well computed but the norm of the error $\|x - x_\lambda\|$ is quite large.
For decreasing values of $\lambda$, the norm of the error $\|x - x_\lambda\|$ first decreases, and then increases when $\lambda$ approaches 0.
Thus the norm of the error, which is the sum of the theoretical error and the error due to the computer's arithmetic, passes through a minimum corresponding to the optimal choice of the regularization parameter $\lambda$.
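This behaviour is easy to reproduce numerically. The following sketch uses an illustrative setup (a Hilbert matrix as the ill-conditioned $A$, $H = I$, exact solution of ones, and an arbitrary grid of $\lambda$ values, none of which come from the talk itself): it solves $(C + \lambda E) x_\lambda = A^T b$ and records the error norm for each $\lambda$.

```python
import numpy as np

# Illustrative sketch: Hilbert matrix as an ill-conditioned A, H = I,
# exact solution x = (1, ..., 1)^T; the grid of lambda values is arbitrary.
p = 12
A = np.array([[1.0 / (i + j + 1) for j in range(p)] for i in range(p)])
x_true = np.ones(p)
b = A @ x_true

C = A.T @ A          # C = A^T A
E = np.eye(p)        # E = H^T H with H = I

errors = {}
for lam in [1e-2, 1e-6, 1e-10, 1e-14]:
    x_lam = np.linalg.solve(C + lam * E, A.T @ b)   # (C + lambda E) x_lambda = A^T b
    errors[lam] = np.linalg.norm(x_true - x_lam)    # ||x - x_lambda||
```

On such a grid one typically observes that the largest $\lambda$ over-regularizes, and that the error reaches its minimum at some intermediate value.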
Several methods have been proposed to obtain an effective choice of $\lambda$ (e.g. the L-curve, Morozov's discrepancy principle, GCV, . . . ). But each of these methods can fail.
Idea: compute $x_\lambda$ for several values of $\lambda$, interpolate by a vector function of $\lambda$ which mimics the exact behaviour, and then extrapolate at $\lambda = 0$.
The main problem is the choice of a suitable vector function.
For that purpose, let us study the exact behaviour of $x_\lambda$ with respect to $\lambda$.
We assume that $H$ is a $p \times p$ nonsingular matrix, and we set $y = Hx$. Hence
$$J(\lambda, x) = \|AH^{-1}y - b\|^2 + \lambda \|y\|^2.$$
Using the SVD $AH^{-1} = U \Sigma V^T$, it can be proved that $x_\lambda = H^{-1} y_\lambda$ with
$$y_\lambda = \sum_{i=1}^{p} \frac{\sigma_i \gamma_i}{\sigma_i^2 + \lambda}\, v_i, \qquad \gamma_i = (u_i, b).$$
So, we will choose an interpolation function of the same form, but with the sum running only from $i = 1$ to $k$, with $k < p$.
Several extrapolation methods based on this idea exist. They depend on whether the vectors $v_i$ are known or not.
We assume, without loss of generality, that $H = I$ (i.e. $x_\lambda = y_\lambda$).
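The SVD expression for $y_\lambda$ can be checked directly against the regularized normal equations. A minimal numpy sketch (random data and an arbitrary $\lambda$, with $H = I$):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 8
A = rng.standard_normal((p, p))
b = rng.standard_normal(p)
lam = 1e-3

# Direct regularized solution (H = I): (A^T A + lambda I) y = A^T b
y_direct = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ b)

# SVD expression: y_lambda = sum_i sigma_i gamma_i / (sigma_i^2 + lambda) v_i
U, s, Vt = np.linalg.svd(A)
gamma = U.T @ b                              # gamma_i = (u_i, b)
y_svd = Vt.T @ (s * gamma / (s**2 + lam))
```

The two vectors agree to machine precision.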
EXTRAPOLATION TECHNIQUES

RESTRICTED CASE: If the vectors $v_1, \dots, v_k$ are known, we interpolate by the rational function
$$R_k(\lambda) = \sum_{i=1}^{k} \frac{a_i}{b_i + \lambda}\, v_i,$$
where the $a_i$'s and the $b_i$'s are $2k$ unknown scalars.
For determining the parameters $a_i$ and $b_i$, we impose the interpolation conditions ($x_n \equiv x_{\lambda_n}$)
$$x_n = \sum_{i=1}^{k} \frac{a_i}{b_i + \lambda_n}\, v_i, \qquad x_{n+1} = \sum_{i=1}^{k} \frac{a_i}{b_i + \lambda_{n+1}}\, v_i.$$
Then, extrapolating at $\lambda = 0$ will give
$$x \simeq R_k(0) = \sum_{j=1}^{k} \frac{a_j}{b_j}\, v_j = y_k^{(n)}.$$
We assume that vectors $w_1, \dots, w_k$ such that $(v_i, w_j) = \delta_{ij}$ are known, and we obtain
$$y_k^{(n)} = \sum_{j=1}^{k} \frac{(x_n, w_j)(x_{n+1}, w_j)(\lambda_{n+1} - \lambda_n)}{\lambda_{n+1}(x_{n+1}, w_j) - \lambda_n(x_n, w_j)}\, v_j.$$
Increasing the value of $k$, for $n$ fixed we also have
$$y_{k+1}^{(n)} = y_k^{(n)} + \frac{a_{k+1}}{b_{k+1}}\, v_{k+1}, \qquad k = 0, 1, \dots, p-1.$$
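A small numpy sketch of the restricted case, under illustrative assumptions not taken from the talk: $A$ random and square, $H = I$, $w_j = v_j$ from the SVD, and the full sum $k = p$, so that extrapolation at $\lambda = 0$ from just two regularized solutions $x_{\lambda_n}$, $x_{\lambda_{n+1}}$ should recover the exact solution.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 6
A = rng.standard_normal((p, p))
x_true = rng.standard_normal(p)
b = A @ x_true
C = A.T @ A

U, s, Vt = np.linalg.svd(A)
V = Vt.T                                 # columns are the vectors v_j (= w_j here)

lam = [1e-2, 1e-3]                       # lambda_n and lambda_{n+1}, arbitrary
xs = [np.linalg.solve(C + l * np.eye(p), A.T @ b) for l in lam]

# Closed-form y_k^{(n)}, here with all k = p terms
y = np.zeros(p)
for j in range(p):
    pj = xs[0] @ V[:, j]                 # (x_n, w_j)
    qj = xs[1] @ V[:, j]                 # (x_{n+1}, w_j)
    y += pj * qj * (lam[1] - lam[0]) / (lam[1] * qj - lam[0] * pj) * V[:, j]
```

With $k < p$ one simply keeps the first $k$ terms of the sum.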
If $(v_i, v_j) = \delta_{ij}$ and $w_j = v_j$, the process is equivalent to the Truncated SVD (TSVD), that is,
$$y_k^{(n)} = \sum_{i=1}^{k} \frac{\gamma_i}{\sigma_i}\, v_i, \qquad \gamma_i = (u_i, b).$$
In this case, we can drop the superscript $n$ since $y_k^{(n)}$ does not depend on $n$.
We set $e_k = x - y_k$ and $r_k = b - A y_k$. It holds
$$\|e_{k+1}\|^2 = \|e_k\|^2 - \gamma_{k+1}^2/\sigma_{k+1}^2, \qquad \|r_{k+1}\|^2 = \|r_k\|^2 - \gamma_{k+1}^2.$$
We also have the following identities, $\forall k$,
$$\|y_k\|^2 = \sum_{i=1}^{k} \frac{\gamma_i^2}{\sigma_i^2}, \qquad \|y_{k+1}\|^2 = \|y_k\|^2 + \frac{\gamma_{k+1}^2}{\sigma_{k+1}^2}, \qquad \|e_k\|^2 = \sum_{i=k+1}^{p} \frac{\gamma_i^2}{\sigma_i^2},$$
$$(y_k, e_k) = (e_{k-1} - e_k, e_k) = 0, \qquad \|x\|^2 = \|y_k\|^2 + \|e_k\|^2, \qquad \|r_k\|^2 = \sum_{i=k+1}^{p} \gamma_i^2.$$
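These identities are easy to verify numerically. A sketch with random consistent data ($b = Ax$, $A$ square, a choice made here only for the check):

```python
import numpy as np

rng = np.random.default_rng(2)
p, k = 6, 3
A = rng.standard_normal((p, p))
x = rng.standard_normal(p)
b = A @ x                                  # consistent data

U, s, Vt = np.linalg.svd(A)
gamma = U.T @ b                            # gamma_i = (u_i, b)

def tsvd(m):
    # y_m = sum_{i=1}^m gamma_i / sigma_i * v_i
    return Vt.T[:, :m] @ (gamma[:m] / s[:m])

y_k, y_k1 = tsvd(k), tsvd(k + 1)
e_k, e_k1 = x - y_k, x - y_k1              # errors e_k = x - y_k
r_k, r_k1 = b - A @ y_k, b - A @ y_k1      # residuals r_k = b - A y_k
```

The recursions for the error and residual norms, the orthogonality $(y_k, e_k) = 0$, and the Pythagorean relation $\|x\|^2 = \|y_k\|^2 + \|e_k\|^2$ all hold up to rounding.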
EXTRAPOLATION TECHNIQUES

FULL CASE: If the vectors $v_i$ are unknown, we base the extrapolation process on a rational function of the form
$$R_k(\lambda) = \sum_{i=1}^{k} \frac{1}{b_i + \lambda}\, w_i, \qquad k \le p,$$
where the $b_i$'s are unknown numbers and the $w_i$'s are unknown vectors.
They are determined, as before, by imposing that $x_n = R_k(\lambda_n)$ for some values of $n$. Then we extrapolate at the point $\lambda = 0$. It corresponds to computing
$$x \simeq y_k = R_k(0) = \sum_{i=1}^{k} \frac{1}{b_i}\, w_i.$$
$R_k$ can be written as $R_k(\lambda) = P_{k-1}(\lambda)/Q_k(\lambda)$ with
$$P_{k-1}(\lambda) = \alpha_0 + \cdots + \alpha_{k-1} \lambda^{k-1}, \quad \alpha_i \in \mathbb{R}^p,$$
$$Q_k(\lambda) = \prod_{i=1}^{k} (b_i + \lambda) = \beta_0 + \cdots + \beta_{k-1} \lambda^{k-1} + \lambda^k, \quad \beta_i \in \mathbb{R}.$$
We gave 6 different algorithms for determining the two unknowns needed, $\alpha_0$ and $\beta_0$ ($R_k(0) = \alpha_0/\beta_0$). Let us describe one of them (the most satisfactory one).
We have to solve the interpolation problem
$$Q_k(\lambda_i)\, x_i = P_{k-1}(\lambda_i), \qquad i = 0, \dots, k-1.$$
Since $Q_k$ and $P_{k-1}$ are polynomials, we have, by Lagrange's formula,
$$Q_k(\lambda) = \sum_{i=0}^{k} L_i(\lambda)\, Q_k(\lambda_i),$$
$$P_{k-1}(\lambda) = \sum_{i=0}^{k-1} \widehat{L}_i(\lambda)\, P_{k-1}(\lambda_i) = \sum_{i=0}^{k-1} \widehat{L}_i(\lambda)\, Q_k(\lambda_i)\, x_i,$$
with
$$L_i(\lambda) = \prod_{\substack{j=0 \\ j \ne i}}^{k} \frac{\lambda - \lambda_j}{\lambda_i - \lambda_j} \quad \text{and} \quad \widehat{L}_i(\lambda) = \prod_{\substack{j=0 \\ j \ne i}}^{k-1} \frac{\lambda - \lambda_j}{\lambda_i - \lambda_j}.$$
Let $\lambda_k \ne \lambda_j$, for $j = 0, \dots, k-1$. We have
$$\sum_{i=0}^{k-1} \widehat{L}_i(\lambda_k)\, Q_k(\lambda_i)\, x_i = Q_k(\lambda_k)\, x_k.$$
Let $s_1, \dots, s_p$ be linearly independent vectors. Setting $c_i = Q_k(\lambda_i)/Q_k(\lambda_k)$ and multiplying scalarly the preceding equation by $s_j$, for $j = 1, \dots, p$, leads to the following linear system
$$\sum_{i=0}^{k-1} \widehat{L}_i(\lambda_k)\, (x_i, s_j)\, c_i = (x_k, s_j), \qquad j = 1, \dots, p.$$
Solving this system in the least squares sense gives $c_0, \dots, c_{k-1}$.
Since the polynomial $Q_k(\lambda) = \sum_{i=0}^{k} L_i(\lambda)\, Q_k(\lambda_i)$ is monic and $c_k = 1$, we have a supplementary condition which gives the value $Q_k(\lambda_k)$. Thus $Q_k(\lambda_i) = c_i\, Q_k(\lambda_k)$, and, finally, $\beta_0 = Q_k(0)$ is given by
$$\beta_0 = \sum_{i=0}^{k} L_i(0)\, Q_k(\lambda_i).$$
From what precedes, we see that
$$\alpha_0 = P_{k-1}(0) = \sum_{i=0}^{k-1} \widehat{L}_i(0)\, Q_k(\lambda_i)\, x_i,$$
and it follows that
$$y_k = R_k(0) = \frac{P_{k-1}(0)}{Q_k(0)} = \frac{\alpha_0}{\beta_0}.$$
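The steps above can be sketched in numpy. The setup is synthetic: hypothetical poles $b_i$, random vectors $w_i$, and arbitrary interpolation nodes $\lambda_0, \dots, \lambda_k$; since the data $x_i = R_k(\lambda_i)$ are exactly of rational form, the algorithm should return $R_k(0)$. The canonical basis is used for the vectors $s_j$.

```python
import numpy as np

rng = np.random.default_rng(3)
p, k = 5, 3
bcoef = np.array([0.5, 1.0, 2.0])             # hypothetical poles b_i
W = rng.standard_normal((p, k))               # hypothetical vectors w_i
nodes = np.array([1e-1, 5e-2, 2e-2, 1e-2])    # lambda_0, ..., lambda_k, arbitrary

def R(t):
    return W @ (1.0 / (bcoef + t))            # R_k(lambda) = sum_i w_i / (b_i + lambda)

X = np.column_stack([R(t) for t in nodes])    # columns x_0, ..., x_k

def Lhat(i, t):
    # \hat{L}_i(t): Lagrange basis polynomial on the first k nodes
    return np.prod([(t - nodes[j]) / (nodes[i] - nodes[j])
                    for j in range(k) if j != i])

# Least-squares system for c_0, ..., c_{k-1} (s_j = canonical basis vectors)
M = np.column_stack([Lhat(i, nodes[k]) * X[:, i] for i in range(k)])
c = np.append(np.linalg.lstsq(M, X[:, k], rcond=None)[0], 1.0)   # c_k = 1

# Monic condition: sum_i Q_k(lambda_i) / prod_{j != i}(lambda_i - lambda_j) = 1
d = np.array([np.prod([nodes[i] - nodes[j] for j in range(k + 1) if j != i])
              for i in range(k + 1)])
Q = c / np.sum(c / d)                         # Q[i] = Q_k(lambda_i)

# beta_0 = Q_k(0) and alpha_0 = P_{k-1}(0)
L0 = np.array([np.prod([-nodes[j] / (nodes[i] - nodes[j])
                        for j in range(k + 1) if j != i]) for i in range(k + 1)])
beta0 = L0 @ Q
alpha0 = sum(Lhat(i, 0.0) * Q[i] * X[:, i] for i in range(k))
y_k = alpha0 / beta0                          # y_k = R_k(0) = alpha_0 / beta_0
```

On real regularized data the $x_i$ are only approximately of rational form and the least-squares step does real work; here it simply recovers the exact coefficients.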
NUMERICAL EXAMPLES:
A wide numerical experimentation has been performed. We used several kinds of matrices $A$:
➜ heat, ilaplace, shaw, spikes (Hansen's Matlab Regularization Toolbox)
➜ hilbert, lotkin, moler (Matlab gallery function)
different solutions $x_t$ ($t$ means true solution):
➜ given: defined as in the Regularization Toolbox
➜ ones: $x_i = 1$
➜ lin: $x_i = i$
➜ quad: $x_i = (i - p/2)^2$, $i = 1, \dots, p$
➜ sin: $x_i = \sin(2\pi(i-1)/p)$, $i = 1, \dots, p$
and various matrices $H$:
➜ $I$: identity matrix
➜ $D_1, D_2, D_3$: discrete approximations of the first, second and third derivative.
We also tried the case of a noisy data vector.
The tests show the effectiveness of the procedures, but also that the best approximation, denoted by $x_{\mathrm{opt}}$, depends on the values of $\lambda_n$ chosen for interpolating, and that the norm of the error $\|x_t - x_{\mathrm{opt}}\|$ can be strongly influenced by the choice of the regularizing matrix $H$.
MULTIPARAMETER REGULARIZATION

A good choice of the matrix $H$ often depends on the mathematical properties of the solution. Using several regularization terms avoids this difficult choice.
We are looking for the vector $x_\lambda$ which minimizes the quadratic functional
$$J(\lambda, x) = k \left( \|Ax - b\|^2 + \sum_{i=1}^{k} \lambda_i \|H_i x\|^2 \right), \qquad \lambda = (\lambda_1, \dots, \lambda_k)^T.$$
It is also the solution of the system
$$\left( C + \sum_{i=1}^{k} \lambda_i E_i \right) x_\lambda = A^T b,$$
with $C = A^T A$ and $E_i = H_i^T H_i$.
We have the following relation between $x_\lambda$ and $x$:
$$\left( I + \sum_{i=1}^{k} \lambda_i C^{-1} E_i \right) x_\lambda = x.$$
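For a consistent system ($b = Ax$, so that $A^T b = Cx$), this relation can be checked numerically. A sketch with random data and two hypothetical symmetric positive semidefinite matrices $E_i$ (chosen only for the check):

```python
import numpy as np

rng = np.random.default_rng(5)
p = 6
A = rng.standard_normal((p, p))
x = rng.standard_normal(p)
b = A @ x                                  # consistent: A^T b = C x
C = A.T @ A

E = [np.eye(p), np.diag(np.arange(1.0, p + 1))]   # hypothetical E_1, E_2
lam = [1e-2, 1e-3]                                # arbitrary lambda_1, lambda_2
S = lam[0] * E[0] + lam[1] * E[1]                 # sum_i lambda_i E_i

x_lam = np.linalg.solve(C + S, A.T @ b)

# (I + sum_i lambda_i C^{-1} E_i) x_lambda  =  x_lam + C^{-1} S x_lam
lhs = x_lam + np.linalg.solve(C, S @ x_lam)
```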
We note that $x_\lambda$ is also the vector minimizing
$$J(\lambda, x) = \sum_{i=1}^{k} \left( \|Ax - b\|^2 + k \lambda_i \|H_i x\|^2 \right).$$
Hence, we consider the $k$ vectors $x_{\lambda_i}$ solving the minimization problems
$$\min_x \left( \|Ax - b\|^2 + k \lambda_i \|H_i x\|^2 \right), \qquad i = 1, \dots, k.$$
These vectors satisfy the linear systems
$$(C + k \lambda_i E_i)\, x_{\lambda_i} = A^T b, \qquad i = 1, \dots, k.$$
Thus we compute $k$ one-parameter regularized solutions and, afterwards, we consider the approximation of $x_\lambda$ given by the linear combination
$$\widetilde{x}_\lambda(\alpha) = \sum_{i=1}^{k} \alpha_i\, x_{\lambda_i},$$
where $\alpha = (\alpha_1, \dots, \alpha_k)^T$ and $\sum_{i=1}^{k} \alpha_i = 1$.
How to choose the vector $\alpha$?
The following relation between $x_\lambda$ and $\widetilde{x}_\lambda(\alpha)$ holds:
$$x_\lambda = \widetilde{x}_\lambda(\alpha) + \left( C + \sum_{i=1}^{k} \lambda_i E_i \right)^{-1} \rho_\lambda(\alpha),$$
where
$$\rho_\lambda(\alpha) = \sum_{i=1}^{k} \alpha_i \left( k \lambda_i E_i - \sum_{j=1}^{k} \lambda_j E_j \right) x_{\lambda_i}.$$
The vector $\alpha$ is chosen to minimize $\|\rho_\lambda(\alpha)\|$. It is the solution of an overdetermined system which is solved in the least-squares sense.
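The construction can be sketched end-to-end in numpy. The setup is illustrative and not from the talk: a random matrix $A$, $k = 2$ regularization terms ($H_1 = I$ and a first-difference matrix for $H_2$), fixed $\lambda_i$, and the constraint $\alpha_1 + \alpha_2 = 1$ handled by the parametrization $\alpha = (t, 1-t)$, which gives the minimizer of $\|\rho_\lambda(\alpha)\|$ in closed form.

```python
import numpy as np

rng = np.random.default_rng(4)
p, k = 8, 2
A = rng.standard_normal((p, p))
b = A @ np.ones(p)
C = A.T @ A

H1 = np.eye(p)
H2 = np.diff(np.eye(p), axis=0)            # (p-1) x p first-difference matrix
E = [H1.T @ H1, H2.T @ H2]                 # E_i = H_i^T H_i
lam = [1e-3, 1e-2]                         # arbitrary lambda_1, lambda_2

# k one-parameter solutions: (C + k lambda_i E_i) x_{lambda_i} = A^T b
xs = [np.linalg.solve(C + k * lam[i] * E[i], A.T @ b) for i in range(k)]

S = lam[0] * E[0] + lam[1] * E[1]          # sum_j lambda_j E_j
m = [(k * lam[i] * E[i] - S) @ xs[i] for i in range(k)]

# alpha = (t, 1-t): minimize ||t m_0 + (1-t) m_1|| over t in closed form
diff = m[0] - m[1]
t = -(m[1] @ diff) / (diff @ diff)
alpha = np.array([t, 1.0 - t])

x_tilde = alpha[0] * xs[0] + alpha[1] * xs[1]
rho = alpha[0] * m[0] + alpha[1] * m[1]    # rho_lambda(alpha)

# The relation x_lambda = x_tilde + (C + S)^{-1} rho
x_lam = np.linalg.solve(C + S, A.T @ b)
x_rec = x_tilde + np.linalg.solve(C + S, rho)
```

Note that the relation above holds for any $\alpha$ with $\sum_i \alpha_i = 1$; minimizing $\|\rho_\lambda(\alpha)\|$ makes the correction term small, so that $\widetilde{x}_\lambda(\alpha)$ is close to $x_\lambda$.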
How to estimate the vector $\lambda$?
The parameters $\lambda_1, \dots, \lambda_k$ can be chosen according to a test based on a modification of the Generalized Cross Validation. It consists in minimizing the function
$$Z(\lambda) = \left( \sum_{i=1}^{k} V_i(\lambda_i) \right)^2 \quad \text{with} \quad V_i(\lambda) = \frac{N_i(\lambda)}{D_i(\lambda)}, \quad i = 1, \dots, k,$$
and
$$N_i(\lambda) = \left( \sum_{j=1}^{q_i} \left( \frac{k \lambda\, c_j^{(i)}}{(\gamma_j^{(i)})^2 + k \lambda} \right)^2 \right)^{1/2}, \qquad D_i(\lambda) = \sum_{j=1}^{q_i} \frac{k \lambda}{(\gamma_j^{(i)})^2 + k \lambda}.$$
The $c_j^{(i)}$'s and the $\gamma_j^{(i)}$'s are the parameters related to the SVD decompositions.
NUMERICAL EXAMPLES:
In many tests performed, regularizing with more than one parameter seems to increase the possibilities of computing a good approximation to the solution of an ill-conditioned linear system. Moreover, in general, the error is never worse than the worst one-parameter error. The algorithm seems stable enough, robust and easy to implement (it only requires the solution of $k$ one-parameter regularization problems and one additional linear system).
The following tables report the relative errors $\|x - \widetilde{x}\| / \|x\|$ corresponding to the solution obtained with the best $\lambda$.
σ = 0          I           D1          D2          MP
heat
  ones     8.7·10^-6   1.4·10^-14  2.1·10^-15  6.9·10^-16
  lin      3.7·10^-1   1.8·10^-2   7.7·10^-15  2.3·10^-15
  quad     2.8·10^-5   2.8·10^-5   9.7·10^-3   9.4·10^-3
  sin      9.6·10^-2   6.7·10^-6   5.0·10^-7   5.9·10^-10
ilaplace
  ones     6.1·10^-1   5.3·10^-15  1.7·10^-15  9.8·10^-16
  lin      8.4·10^-1   2.8·10^-1   1.1·10^-15  2.8·10^-15
  quad     7.9·10^-1   7.2·10^-1   6.7·10^-1   5.6·10^-1
  sin      7.1·10^-1   5.2·10^-1   2.1·10^-1   3.7·10^-1
lotkin
  lin      1.1·10^-6   1.1·10^-6   1.6·10^-13  1.7·10^-15
  quad     4.5·10^-6   4.5·10^-6   1.1·10^-3   2.4·10^-4
  sin      2.2·10^-4   2.2·10^-4   2.1·10^-3   5.5·10^-4
moler
  ones     1.6·10^-8   3.2·10^-14  2.1·10^-14  1.9·10^-14
  quad     5.9·10^-8   8.5·10^-4   1.3·10^-3   9.4·10^-11
  sin      5.7·10^-4   3.3·10^-4   2.4·10^-4   1.3·10^-7
σ = 10^-6      I           D1          D2          MP
heat
  ones     1.1·10^-4   1.1·10^-4   1.7·10^-6   5.3·10^-8
  lin      1.9·10^-4   1.9·10^-4   2.8·10^-6   2.8·10^-6
  quad     2.5·10^-4   2.5·10^-4   9.7·10^-3   9.4·10^-3
  sin      9.6·10^-2   8.7·10^-2   1.8·10^-2   1.6·10^-4
ilaplace
  ones     7.2·10^-1   1.7·10^-5   4.4·10^-7   5.7·10^-7
  lin      9.4·10^-1   4.2·10^-1   7.3·10^-7   7.3·10^-7
  quad     7.9·10^-1   6.0·10^-1   1.1         1.3·10^-1
  sin      7.1·10^-1   4.1·10^-1   5.1·10^-1   3.4·10^-1
lotkin
  lin      2.8·10^-3   2.8·10^-3   3.4·10^-7   3.4·10^-7
  quad     4.1·10^-2   4.1·10^-2   2.3·10^-2   2.3·10^-2
  sin      7.3·10^-2   7.3·10^-2   4.7·10^-2   4.7·10^-2
moler
  ones     6.4·10^-6   3.4·10^-7   2.0·10^-8   2.7·10^-10
  quad     3.5·10^-6   3.5·10^-6   8.9·10^-7   7.6·10^-7
  sin      6.6·10^-7   2.9·10^-6   1.1·10^-1   4.9·10^-7
FUTURE WORKS
➜ Choice of the regularization parameters (in the multiparameter regularization) by estimation of the errors, following the work of C. Brezinski, G. Rodriguez, S. Seatzu, Error estimates for linear systems with applications to regularization, Numer. Algorithms, 49 (2008) 85–104.
➜ Multiparameter regularization of least squares problems, following the work of C. Brezinski, G. Rodriguez, S. Seatzu, Error estimates for the regularization of least squares problems, Numer. Algorithms, 51 (2009) 61–76.
References
C. Brezinski, M. Redivo Zaglia, G. Rodriguez, S. Seatzu, Extrapolation techniques for ill-conditioned linear systems, Numer. Math., 81 (1998) 1–29.
C. Brezinski, M. Redivo Zaglia, G. Rodriguez, S. Seatzu, Multi-parameter regularization techniques for ill-conditioned linear systems, Numer. Math., 94 (2003) 203–228.