Extrapolation techniques and multiparameter treatment for Tikhonov regularization
A review
Michela Redivo Zaglia, University of Padua, Italy
Joint work with
C. Brezinski (Lille-France)
G. Rodriguez, S. Seatzu (Cagliari-Italy)
➜ Tikhonov regularization
➜ Extrapolation techniques
➜ Multiparameter regularization
AN APPLICATION TO REGULARIZATION

When a $p \times p$ system $Ax = b$ is ill-conditioned, its solution cannot be computed accurately.
Tikhonov's regularization consists in computing the vector $x_\lambda$ which minimizes the quadratic functional
$$J(\lambda, x) = \|Ax - b\|^2 + \lambda \|Hx\|^2$$
over all vectors $x$, where $\lambda \ge 0$ is a parameter and $H$ is a given $q \times p$ ($q \le p$) matrix; $\|\cdot\|$ denotes the Euclidean norm.
This vector $x_\lambda$ is the solution of the system $(C + \lambda E) x_\lambda = A^T b$, where $C = A^T A$ and $E = H^T H$.
If $\lambda$ is close to zero, then $x_\lambda$ is badly computed, while, if $\lambda$ is far away from zero, $x_\lambda$ is well computed but the norm of the error $\|x - x_\lambda\|$ is quite large.
For decreasing values of $\lambda$, the norm of the error $\|x - x_\lambda\|$ first decreases, and then increases when $\lambda$ approaches 0.
Thus the norm of the error, which is the sum of the theoretical error and the error due to the computer's arithmetic, passes through a minimum corresponding to the optimal choice of the regularization parameter $\lambda$.
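This behaviour is easy to reproduce numerically. The following sketch uses an illustrative setup (a Hilbert matrix as the ill-conditioned $A$, $H = I$, exact solution of ones, and an arbitrary grid of $\lambda$ values, none of which come from the talk itself): it solves $(C + \lambda E) x_\lambda = A^T b$ and records the error norm for each $\lambda$.

```python
import numpy as np

# Illustrative sketch: Hilbert matrix as an ill-conditioned A, H = I,
# exact solution x = (1, ..., 1)^T; the grid of lambda values is arbitrary.
p = 12
A = np.array([[1.0 / (i + j + 1) for j in range(p)] for i in range(p)])
x_true = np.ones(p)
b = A @ x_true

C = A.T @ A          # C = A^T A
E = np.eye(p)        # E = H^T H with H = I

errors = {}
for lam in [1e-2, 1e-6, 1e-10, 1e-14]:
    x_lam = np.linalg.solve(C + lam * E, A.T @ b)   # (C + lambda E) x_lambda = A^T b
    errors[lam] = np.linalg.norm(x_true - x_lam)    # ||x - x_lambda||
```

On such a grid one typically observes that the largest $\lambda$ over-regularizes, and that the error reaches its minimum at some intermediate value.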
Several methods have been proposed to obtain an effective choice of $\lambda$ (e.g. the L-curve, Morozov's discrepancy principle, GCV, . . . ). But each of these methods can fail.
Idea: compute $x_\lambda$ for several values of $\lambda$, interpolate by a vector function of $\lambda$ which mimics the exact behaviour, and then extrapolate at $\lambda = 0$.
The main problem is the choice of a suitable vector function.
For that purpose, let us study the exact behaviour of $x_\lambda$ with respect to $\lambda$.
We assume that $H$ is a $p \times p$ nonsingular matrix, and we set $y = Hx$. Hence
$$J(\lambda, x) = \|AH^{-1}y - b\|^2 + \lambda \|y\|^2.$$
Using the SVD $AH^{-1} = U \Sigma V^T$, it can be proved that $x_\lambda = H^{-1} y_\lambda$ with
$$y_\lambda = \sum_{i=1}^{p} \frac{\sigma_i \gamma_i}{\sigma_i^2 + \lambda}\, v_i, \qquad \gamma_i = (u_i, b).$$
So, we will choose an interpolation function of the same form, but with the sum running only from $i = 1$ to $k$, with $k < p$.
Several extrapolation methods based on this idea exist. They depend on whether the vectors $v_i$ are known or not.
We assume, without loss of generality, that $H = I$ (i.e. $x_\lambda = y_\lambda$).
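The SVD expression for $y_\lambda$ can be checked directly against the regularized normal equations. A minimal numpy sketch (random data and an arbitrary $\lambda$, with $H = I$):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 8
A = rng.standard_normal((p, p))
b = rng.standard_normal(p)
lam = 1e-3

# Direct regularized solution (H = I): (A^T A + lambda I) y = A^T b
y_direct = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ b)

# SVD expression: y_lambda = sum_i sigma_i gamma_i / (sigma_i^2 + lambda) v_i
U, s, Vt = np.linalg.svd(A)
gamma = U.T @ b                              # gamma_i = (u_i, b)
y_svd = Vt.T @ (s * gamma / (s**2 + lam))
```

The two vectors agree to machine precision.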
EXTRAPOLATION TECHNIQUES

RESTRICTED CASE: If the vectors $v_1, \dots, v_k$ are known, we interpolate by the rational function
$$R_k(\lambda) = \sum_{i=1}^{k} \frac{a_i}{b_i + \lambda}\, v_i,$$
where the $a_i$'s and the $b_i$'s are $2k$ unknown scalars.
For determining the parameters $a_i$ and $b_i$, we impose the interpolation conditions ($x_n \equiv x_{\lambda_n}$)
$$x_n = \sum_{i=1}^{k} \frac{a_i}{b_i + \lambda_n}\, v_i, \qquad x_{n+1} = \sum_{i=1}^{k} \frac{a_i}{b_i + \lambda_{n+1}}\, v_i.$$
Then, extrapolating at $\lambda = 0$ will give
$$x \simeq R_k(0) = \sum_{j=1}^{k} \frac{a_j}{b_j}\, v_j = y_k^{(n)}.$$
We assume that vectors $w_1, \dots, w_k$ such that $(v_i, w_j) = \delta_{ij}$ are known, and we obtain
$$y_k^{(n)} = \sum_{j=1}^{k} \frac{(x_n, w_j)(x_{n+1}, w_j)(\lambda_{n+1} - \lambda_n)}{\lambda_{n+1}(x_{n+1}, w_j) - \lambda_n(x_n, w_j)}\, v_j.$$
Increasing the value of $k$, for $n$ fixed we also have
$$y_{k+1}^{(n)} = y_k^{(n)} + \frac{a_{k+1}}{b_{k+1}}\, v_{k+1}, \qquad k = 0, 1, \dots, p-1.$$
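A small numpy sketch of the restricted case, under illustrative assumptions not taken from the talk: $A$ random and square, $H = I$, $w_j = v_j$ from the SVD, and the full sum $k = p$, so that extrapolation at $\lambda = 0$ from just two regularized solutions $x_{\lambda_n}$, $x_{\lambda_{n+1}}$ should recover the exact solution.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 6
A = rng.standard_normal((p, p))
x_true = rng.standard_normal(p)
b = A @ x_true
C = A.T @ A

U, s, Vt = np.linalg.svd(A)
V = Vt.T                                 # columns are the vectors v_j (= w_j here)

lam = [1e-2, 1e-3]                       # lambda_n and lambda_{n+1}, arbitrary
xs = [np.linalg.solve(C + l * np.eye(p), A.T @ b) for l in lam]

# Closed-form y_k^{(n)}, here with all k = p terms
y = np.zeros(p)
for j in range(p):
    pj = xs[0] @ V[:, j]                 # (x_n, w_j)
    qj = xs[1] @ V[:, j]                 # (x_{n+1}, w_j)
    y += pj * qj * (lam[1] - lam[0]) / (lam[1] * qj - lam[0] * pj) * V[:, j]
```

With $k < p$ one simply keeps the first $k$ terms of the sum.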
If $(v_i, v_j) = \delta_{ij}$ and $w_j = v_j$, the process is equivalent to the Truncated SVD (TSVD), that is,
$$y_k^{(n)} = \sum_{i=1}^{k} \frac{\gamma_i}{\sigma_i}\, v_i, \qquad \gamma_i = (u_i, b).$$
In this case, we can drop the superscript $n$ since $y_k^{(n)}$ does not depend on $n$.
We set $e_k = x - y_k$ and $r_k = b - A y_k$. It holds
$$\|e_{k+1}\|^2 = \|e_k\|^2 - \gamma_{k+1}^2/\sigma_{k+1}^2, \qquad \|r_{k+1}\|^2 = \|r_k\|^2 - \gamma_{k+1}^2.$$
We also have the following identities, $\forall k$,
$$\|y_k\|^2 = \sum_{i=1}^{k} \frac{\gamma_i^2}{\sigma_i^2}, \qquad \|y_{k+1}\|^2 = \|y_k\|^2 + \frac{\gamma_{k+1}^2}{\sigma_{k+1}^2}, \qquad \|e_k\|^2 = \sum_{i=k+1}^{p} \frac{\gamma_i^2}{\sigma_i^2},$$
$$(y_k, e_k) = (e_{k-1} - e_k, e_k) = 0, \qquad \|x\|^2 = \|y_k\|^2 + \|e_k\|^2, \qquad \|r_k\|^2 = \sum_{i=k+1}^{p} \gamma_i^2.$$
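These identities are easy to verify numerically. A sketch with random consistent data ($b = Ax$, $A$ square, a choice made here only for the check):

```python
import numpy as np

rng = np.random.default_rng(2)
p, k = 6, 3
A = rng.standard_normal((p, p))
x = rng.standard_normal(p)
b = A @ x                                  # consistent data

U, s, Vt = np.linalg.svd(A)
gamma = U.T @ b                            # gamma_i = (u_i, b)

def tsvd(m):
    # y_m = sum_{i=1}^m gamma_i / sigma_i * v_i
    return Vt.T[:, :m] @ (gamma[:m] / s[:m])

y_k, y_k1 = tsvd(k), tsvd(k + 1)
e_k, e_k1 = x - y_k, x - y_k1              # errors e_k = x - y_k
r_k, r_k1 = b - A @ y_k, b - A @ y_k1      # residuals r_k = b - A y_k
```

The recursions for the error and residual norms, the orthogonality $(y_k, e_k) = 0$, and the Pythagorean relation $\|x\|^2 = \|y_k\|^2 + \|e_k\|^2$ all hold up to rounding.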
EXTRAPOLATION TECHNIQUES

FULL CASE: If the vectors $v_i$ are unknown, we base the extrapolation process on a rational function of the form
$$R_k(\lambda) = \sum_{i=1}^{k} \frac{1}{b_i + \lambda}\, w_i, \qquad k \le p,$$
where the $b_i$'s are unknown numbers and the $w_i$'s are unknown vectors.
They are determined, as before, by imposing that $x_n = R_k(\lambda_n)$ for some values of $n$. Then we extrapolate at the point $\lambda = 0$. It corresponds to computing
$$x \simeq y_k = R_k(0) = \sum_{i=1}^{k} \frac{1}{b_i}\, w_i.$$
$R_k$ can be written as $R_k(\lambda) = P_{k-1}(\lambda)/Q_k(\lambda)$ with
$$P_{k-1}(\lambda) = \alpha_0 + \cdots + \alpha_{k-1} \lambda^{k-1}, \quad \alpha_i \in \mathbb{R}^p,$$
$$Q_k(\lambda) = \prod_{i=1}^{k} (b_i + \lambda) = \beta_0 + \cdots + \beta_{k-1} \lambda^{k-1} + \lambda^k, \quad \beta_i \in \mathbb{R}.$$
We gave 6 different algorithms for determining the two unknowns needed, $\alpha_0$ and $\beta_0$ ($R_k(0) = \alpha_0/\beta_0$). Let us describe one of them (the most satisfactory one).
We have to solve the interpolation problem
$$Q_k(\lambda_i)\, x_i = P_{k-1}(\lambda_i), \qquad i = 0, \dots, k-1.$$
Since $Q_k$ and $P_{k-1}$ are polynomials, we have, by Lagrange's formula,
$$Q_k(\lambda) = \sum_{i=0}^{k} L_i(\lambda)\, Q_k(\lambda_i),$$
$$P_{k-1}(\lambda) = \sum_{i=0}^{k-1} \widehat{L}_i(\lambda)\, P_{k-1}(\lambda_i) = \sum_{i=0}^{k-1} \widehat{L}_i(\lambda)\, Q_k(\lambda_i)\, x_i,$$
with
$$L_i(\lambda) = \prod_{\substack{j=0 \\ j \ne i}}^{k} \frac{\lambda - \lambda_j}{\lambda_i - \lambda_j} \quad \text{and} \quad \widehat{L}_i(\lambda) = \prod_{\substack{j=0 \\ j \ne i}}^{k-1} \frac{\lambda - \lambda_j}{\lambda_i - \lambda_j}.$$
Let $\lambda_k \ne \lambda_j$, for $j = 0, \dots, k-1$. We have
$$\sum_{i=0}^{k-1} \widehat{L}_i(\lambda_k)\, Q_k(\lambda_i)\, x_i = Q_k(\lambda_k)\, x_k.$$
Let $s_1, \dots, s_p$ be linearly independent vectors. Setting $c_i = Q_k(\lambda_i)/Q_k(\lambda_k)$ and multiplying scalarly the preceding equation by $s_j$, for $j = 1, \dots, p$, leads to the following linear system
$$\sum_{i=0}^{k-1} \widehat{L}_i(\lambda_k)\, (x_i, s_j)\, c_i = (x_k, s_j), \qquad j = 1, \dots, p.$$
Solving this system in the least squares sense gives $c_0, \dots, c_{k-1}$.
Since the polynomial $Q_k(\lambda) = \sum_{i=0}^{k} L_i(\lambda)\, Q_k(\lambda_i)$ is monic and $c_k = 1$, we have a supplementary condition which gives the value $Q_k(\lambda_k)$. Thus $Q_k(\lambda_i) = c_i\, Q_k(\lambda_k)$, and, finally, $\beta_0 = Q_k(0)$ is given by
$$\beta_0 = \sum_{i=0}^{k} L_i(0)\, Q_k(\lambda_i).$$
From what precedes, we see that
$$\alpha_0 = P_{k-1}(0) = \sum_{i=0}^{k-1} \widehat{L}_i(0)\, Q_k(\lambda_i)\, x_i,$$
and it follows that
$$y_k = R_k(0) = \frac{P_{k-1}(0)}{Q_k(0)} = \frac{\alpha_0}{\beta_0}.$$
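The steps above can be sketched in numpy. The setup is synthetic: hypothetical poles $b_i$, random vectors $w_i$, and arbitrary interpolation nodes $\lambda_0, \dots, \lambda_k$; since the data $x_i = R_k(\lambda_i)$ are exactly of rational form, the algorithm should return $R_k(0)$. The canonical basis is used for the vectors $s_j$.

```python
import numpy as np

rng = np.random.default_rng(3)
p, k = 5, 3
bcoef = np.array([0.5, 1.0, 2.0])             # hypothetical poles b_i
W = rng.standard_normal((p, k))               # hypothetical vectors w_i
nodes = np.array([1e-1, 5e-2, 2e-2, 1e-2])    # lambda_0, ..., lambda_k, arbitrary

def R(t):
    return W @ (1.0 / (bcoef + t))            # R_k(lambda) = sum_i w_i / (b_i + lambda)

X = np.column_stack([R(t) for t in nodes])    # columns x_0, ..., x_k

def Lhat(i, t):
    # \hat{L}_i(t): Lagrange basis polynomial on the first k nodes
    return np.prod([(t - nodes[j]) / (nodes[i] - nodes[j])
                    for j in range(k) if j != i])

# Least-squares system for c_0, ..., c_{k-1} (s_j = canonical basis vectors)
M = np.column_stack([Lhat(i, nodes[k]) * X[:, i] for i in range(k)])
c = np.append(np.linalg.lstsq(M, X[:, k], rcond=None)[0], 1.0)   # c_k = 1

# Monic condition: sum_i Q_k(lambda_i) / prod_{j != i}(lambda_i - lambda_j) = 1
d = np.array([np.prod([nodes[i] - nodes[j] for j in range(k + 1) if j != i])
              for i in range(k + 1)])
Q = c / np.sum(c / d)                         # Q[i] = Q_k(lambda_i)

# beta_0 = Q_k(0) and alpha_0 = P_{k-1}(0)
L0 = np.array([np.prod([-nodes[j] / (nodes[i] - nodes[j])
                        for j in range(k + 1) if j != i]) for i in range(k + 1)])
beta0 = L0 @ Q
alpha0 = sum(Lhat(i, 0.0) * Q[i] * X[:, i] for i in range(k))
y_k = alpha0 / beta0                          # y_k = R_k(0) = alpha_0 / beta_0
```

On real regularized data the $x_i$ are only approximately of rational form and the least-squares step does real work; here it simply recovers the exact coefficients.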
NUMERICAL EXAMPLES:
A wide numerical experimentation has been performed. We used several kinds of matrices $A$:
➜ heat, ilaplace, shaw, spikes (Hansen's Matlab Regularization Toolbox)
➜ hilbert, lotkin, moler (Matlab gallery function)
different solutions $x_t$ ($t$ means true solution):
➜ given: defined as in the Regularization Toolbox
➜ ones: $x_i = 1$
➜ lin: $x_i = i$
➜ quad: $x_i = (i - p/2)^2$, $i = 1, \dots, p$
➜ sin: $x_i = \sin(2\pi(i-1)/p)$, $i = 1, \dots, p$
and various matrices $H$:
➜ $I$: identity matrix
➜ $D_1, D_2, D_3$: discrete approximations of the first, second and third derivative.
We also tried the case of a noisy data vector.
The tests show the effectiveness of the procedures, but also that the best approximation, denoted by $x_{\mathrm{opt}}$, depends on the values of $\lambda_n$ chosen for interpolating, and that the norm of the error $\|x_t - x_{\mathrm{opt}}\|$ can be strongly influenced by the choice of the regularizing matrix $H$.
MULTIPARAMETER REGULARIZATION

A good choice of the matrix $H$ often depends on the mathematical properties of the solution. Using several regularization terms avoids this difficult choice.
We are looking for the vector $x_\lambda$ which minimizes the quadratic functional
$$J(\lambda, x) = k \left( \|Ax - b\|^2 + \sum_{i=1}^{k} \lambda_i \|H_i x\|^2 \right), \qquad \lambda = (\lambda_1, \dots, \lambda_k)^T.$$
It is also the solution of the system
$$\left( C + \sum_{i=1}^{k} \lambda_i E_i \right) x_\lambda = A^T b,$$
with $C = A^T A$ and $E_i = H_i^T H_i$.
We have the following relation between $x_\lambda$ and $x$:
$$\left( I + \sum_{i=1}^{k} \lambda_i C^{-1} E_i \right) x_\lambda = x.$$
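For a consistent system ($b = Ax$, so that $A^T b = Cx$), this relation can be checked numerically. A sketch with random data and two hypothetical symmetric positive semidefinite matrices $E_i$ (chosen only for the check):

```python
import numpy as np

rng = np.random.default_rng(5)
p = 6
A = rng.standard_normal((p, p))
x = rng.standard_normal(p)
b = A @ x                                  # consistent: A^T b = C x
C = A.T @ A

E = [np.eye(p), np.diag(np.arange(1.0, p + 1))]   # hypothetical E_1, E_2
lam = [1e-2, 1e-3]                                # arbitrary lambda_1, lambda_2
S = lam[0] * E[0] + lam[1] * E[1]                 # sum_i lambda_i E_i

x_lam = np.linalg.solve(C + S, A.T @ b)

# (I + sum_i lambda_i C^{-1} E_i) x_lambda  =  x_lam + C^{-1} S x_lam
lhs = x_lam + np.linalg.solve(C, S @ x_lam)
```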
We note that $x_\lambda$ is also the vector minimizing
$$J(\lambda, x) = \sum_{i=1}^{k} \left( \|Ax - b\|^2 + k \lambda_i \|H_i x\|^2 \right).$$
Hence, we consider the $k$ vectors $x_{\lambda_i}$ solving the minimization problems
$$\min_x \left( \|Ax - b\|^2 + k \lambda_i \|H_i x\|^2 \right), \qquad i = 1, \dots, k.$$
These vectors satisfy the linear systems
$$(C + k \lambda_i E_i)\, x_{\lambda_i} = A^T b, \qquad i = 1, \dots, k.$$
Thus we compute $k$ one-parameter regularized solutions and, afterwards, we consider the approximation of $x_\lambda$ given by the linear combination
$$\widetilde{x}_\lambda(\alpha) = \sum_{i=1}^{k} \alpha_i\, x_{\lambda_i},$$
where $\alpha = (\alpha_1, \dots, \alpha_k)^T$ and $\sum_{i=1}^{k} \alpha_i = 1$.
How to choose the vector $\alpha$?
The following relation between $x_\lambda$ and $\widetilde{x}_\lambda(\alpha)$ holds:
$$x_\lambda = \widetilde{x}_\lambda(\alpha) + \left( C + \sum_{i=1}^{k} \lambda_i E_i \right)^{-1} \rho_\lambda(\alpha),$$
where
$$\rho_\lambda(\alpha) = \sum_{i=1}^{k} \alpha_i \left( k \lambda_i E_i - \sum_{j=1}^{k} \lambda_j E_j \right) x_{\lambda_i}.$$
The vector $\alpha$ is chosen to minimize $\|\rho_\lambda(\alpha)\|$. It is the solution of an overdetermined system which is solved in the least-squares sense.
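The construction can be sketched end-to-end in numpy. The setup is illustrative and not from the talk: a random matrix $A$, $k = 2$ regularization terms ($H_1 = I$ and a first-difference matrix for $H_2$), fixed $\lambda_i$, and the constraint $\alpha_1 + \alpha_2 = 1$ handled by the parametrization $\alpha = (t, 1-t)$, which gives the minimizer of $\|\rho_\lambda(\alpha)\|$ in closed form.

```python
import numpy as np

rng = np.random.default_rng(4)
p, k = 8, 2
A = rng.standard_normal((p, p))
b = A @ np.ones(p)
C = A.T @ A

H1 = np.eye(p)
H2 = np.diff(np.eye(p), axis=0)            # (p-1) x p first-difference matrix
E = [H1.T @ H1, H2.T @ H2]                 # E_i = H_i^T H_i
lam = [1e-3, 1e-2]                         # arbitrary lambda_1, lambda_2

# k one-parameter solutions: (C + k lambda_i E_i) x_{lambda_i} = A^T b
xs = [np.linalg.solve(C + k * lam[i] * E[i], A.T @ b) for i in range(k)]

S = lam[0] * E[0] + lam[1] * E[1]          # sum_j lambda_j E_j
m = [(k * lam[i] * E[i] - S) @ xs[i] for i in range(k)]

# alpha = (t, 1-t): minimize ||t m_0 + (1-t) m_1|| over t in closed form
diff = m[0] - m[1]
t = -(m[1] @ diff) / (diff @ diff)
alpha = np.array([t, 1.0 - t])

x_tilde = alpha[0] * xs[0] + alpha[1] * xs[1]
rho = alpha[0] * m[0] + alpha[1] * m[1]    # rho_lambda(alpha)

# The relation x_lambda = x_tilde + (C + S)^{-1} rho
x_lam = np.linalg.solve(C + S, A.T @ b)
x_rec = x_tilde + np.linalg.solve(C + S, rho)
```

Note that the relation above holds for any $\alpha$ with $\sum_i \alpha_i = 1$; minimizing $\|\rho_\lambda(\alpha)\|$ makes the correction term small, so that $\widetilde{x}_\lambda(\alpha)$ is close to $x_\lambda$.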
How to estimate the vector $\lambda$?
The parameters $\lambda_1, \dots, \lambda_k$ can be chosen according to a test based on a modification of the Generalized Cross Validation. It consists in minimizing the function
$$Z(\lambda) = \left( \sum_{i=1}^{k} V_i(\lambda_i) \right)^2 \quad \text{with} \quad V_i(\lambda) = \frac{N_i(\lambda)}{D_i(\lambda)}, \quad i = 1, \dots, k,$$
and
$$N_i(\lambda) = \left( \sum_{j=1}^{q_i} \left( \frac{k \lambda\, c_j^{(i)}}{(\gamma_j^{(i)})^2 + k \lambda} \right)^2 \right)^{1/2}, \qquad D_i(\lambda) = \sum_{j=1}^{q_i} \frac{k \lambda}{(\gamma_j^{(i)})^2 + k \lambda}.$$
The $c_j^{(i)}$'s and the $\gamma_j^{(i)}$'s are the parameters related to the SVD decompositions.
NUMERICAL EXAMPLES:
In many tests performed, regularizing with more than one parameter seems to increase the possibilities of computing a good approximation to the solution of an ill-conditioned linear system. Moreover, in general, the error is never worse than the worst one-parameter error. The algorithm seems stable enough, robust and easy to implement (it only requires the solution of $k$ one-parameter regularization problems and one additional linear system).
The following tables report the relative errors $\|x - \widetilde{x}\| / \|x\|$ corresponding to the solution obtained with the best $\lambda$.
σ = 0          I           D1          D2          MP
heat
  ones     8.7·10^-6   1.4·10^-14  2.1·10^-15  6.9·10^-16
  lin      3.7·10^-1   1.8·10^-2   7.7·10^-15  2.3·10^-15
  quad     2.8·10^-5   2.8·10^-5   9.7·10^-3   9.4·10^-3
  sin      9.6·10^-2   6.7·10^-6   5.0·10^-7   5.9·10^-10
ilaplace
  ones     6.1·10^-1   5.3·10^-15  1.7·10^-15  9.8·10^-16
  lin      8.4·10^-1   2.8·10^-1   1.1·10^-15  2.8·10^-15
  quad     7.9·10^-1   7.2·10^-1   6.7·10^-1   5.6·10^-1
  sin      7.1·10^-1   5.2·10^-1   2.1·10^-1   3.7·10^-1
lotkin
  lin      1.1·10^-6   1.1·10^-6   1.6·10^-13  1.7·10^-15
  quad     4.5·10^-6   4.5·10^-6   1.1·10^-3   2.4·10^-4
  sin      2.2·10^-4   2.2·10^-4   2.1·10^-3   5.5·10^-4
moler
  ones     1.6·10^-8   3.2·10^-14  2.1·10^-14  1.9·10^-14
  quad     5.9·10^-8   8.5·10^-4   1.3·10^-3   9.4·10^-11
  sin      5.7·10^-4   3.3·10^-4   2.4·10^-4   1.3·10^-7
σ = 10^-6      I           D1          D2          MP
heat
  ones     1.1·10^-4   1.1·10^-4   1.7·10^-6   5.3·10^-8
  lin      1.9·10^-4   1.9·10^-4   2.8·10^-6   2.8·10^-6
  quad     2.5·10^-4   2.5·10^-4   9.7·10^-3   9.4·10^-3
  sin      9.6·10^-2   8.7·10^-2   1.8·10^-2   1.6·10^-4
ilaplace
  ones     7.2·10^-1   1.7·10^-5   4.4·10^-7   5.7·10^-7
  lin      9.4·10^-1   4.2·10^-1   7.3·10^-7   7.3·10^-7
  quad     7.9·10^-1   6.0·10^-1   1.1         1.3·10^-1
  sin      7.1·10^-1   4.1·10^-1   5.1·10^-1   3.4·10^-1
lotkin
  lin      2.8·10^-3   2.8·10^-3   3.4·10^-7   3.4·10^-7
  quad     4.1·10^-2   4.1·10^-2   2.3·10^-2   2.3·10^-2
  sin      7.3·10^-2   7.3·10^-2   4.7·10^-2   4.7·10^-2
moler
  ones     6.4·10^-6   3.4·10^-7   2.0·10^-8   2.7·10^-10
  quad     3.5·10^-6   3.5·10^-6   8.9·10^-7   7.6·10^-7
  sin      6.6·10^-7   2.9·10^-6   1.1·10^-1   4.9·10^-7
FUTURE WORKS
➜ Choice of the regularization parameters (in the multiparameter regularization) by estimation of the errors, following the work of C. Brezinski, G. Rodriguez, S. Seatzu, Error estimates for linear systems with applications to regularization, Numer. Algorithms, 49 (2008) 85–104.
➜ Multiparameter regularization of least squares problems, following the work of C. Brezinski, G. Rodriguez, S. Seatzu, Error estimates for the regularization of least squares problems, Numer. Algorithms, 51 (2009) 61–76.
References
C. Brezinski, M. Redivo Zaglia, G. Rodriguez, S. Seatzu, Extrapolation techniques for ill-conditioned linear systems, Numer. Math., 81 (1998) 1–29.
C. Brezinski, M. Redivo Zaglia, G. Rodriguez, S. Seatzu, Multi-parameter regularization techniques for ill-conditioned linear systems, Numer. Math., 94 (2003) 203–228.