Expokit

Understanding Expokit

Fundamentals

The solution of the problem

$$
\begin{cases}
\dfrac{dw(t)}{dt} = Aw(t) + u, & t \in [0, T] \\[4pt]
w(0) = v, & \text{initial condition}
\end{cases}
\tag{1}
$$

is known to be

$$
w(t) = e^{tA}v + t\,\varphi(tA)u
\tag{2}
$$

where $\varphi(x) = (e^x - 1)/x$.
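For a small dense example, formula (2) can be checked directly. The following NumPy/SciPy sketch (illustrative only, not part of Expokit) uses the identity $t\,\varphi(tA)u = A^{-1}(e^{tA} - I)u$, valid for nonsingular $A$, and compares the closed form against a direct ODE integration:

```python
import numpy as np
from scipy.linalg import expm, solve
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))   # assumed nonsingular here
u = rng.standard_normal(n)
v = rng.standard_normal(n)
t = 0.7

# Closed form (2): w(t) = e^{tA} v + t*phi(tA) u,
# with t*phi(tA) u = A^{-1} (e^{tA} - I) u for nonsingular A.
E = expm(t * A)
w_closed = E @ v + solve(A, (E - np.eye(n)) @ u)

# Reference: integrate dw/dt = A w + u from w(0) = v.
sol = solve_ivp(lambda s, w: A @ w + u, (0.0, t), v, rtol=1e-10, atol=1e-12)
print(np.linalg.norm(w_closed - sol.y[:, -1]))   # should be tiny
```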

When u = 0, the solution is simply w(t) = exp(tA)v. Expokit provides user-friendly routines (in Fortran 77 and Matlab) for either situation (the case u = 0 and the case u ≠ 0). Routines for computing small matrix exponentials in full are provided as well. The backbone of the sparse routines consists of Krylov subspace projection methods (the Arnoldi and Lanczos processes), which is why the toolkit can cope with sparse matrices of very large dimension. The package handles real and complex matrices and provides specific routines for symmetric and Hermitian matrices. When dealing with Markov chains, the computation of the matrix exponential is subject to probabilistic constraints; in addition to addressing general matrix exponentials, particular attention is therefore given to the computation of transient states of Markov chains.

The purpose of this cursory introduction is to let the reader catch a glimpse of the fundamental elements that make Expokit so fast and robust. The large sparse matrix exponential situation (case u = 0) is taken as the basis of the exposition.

To compute w(t) = exp(tA)v, the Krylov-based algorithm of Expokit purposely sets out to compute the matrix exponential times a vector rather than the matrix exponential in isolation.

Given an integer m, the Taylor expansion

$$
w(t) = e^{tA}v = v + \frac{tA}{1!}v + \frac{(tA)^2}{2!}v + \cdots
\tag{3}
$$
can be truncated at order m−1, thereby yielding a polynomial approximation of degree m−1,
$$
c_0 v + c_1 (tA)v + c_2 (tA)^2 v + \cdots + c_{m-1}(tA)^{m-1}v,
$$
with coefficients $c_i = 1/i!$, that approximates the vector w(t). However, these are not necessarily the best coefficients, and one can set out to look for a better linear combination (viz. polynomial). All polynomial approximations of degree at most m−1 (including therefore the truncated Taylor polynomial as well as the optimal polynomial approximation) are elements of the Krylov subspace of dimension m, defined as
$$
K_m(tA, v) = \mathrm{Span}\{v, (tA)v, \ldots, (tA)^{m-1}v\}.
\tag{4}
$$
Hence it is more general to state the problem as that of finding an element of Km(tA,v) that approximates w(t).

[V, H, beta] = arnoldi(A, v, m)
beta := ||v||_2
v_1 := v/beta;
for j := 1:m do
      p := A v_j
      for i := 1:j do
            h_{i,j} := v_i^T p
            p := p - h_{i,j} v_i
      end
      h_{j+1,j} := ||p||_2
      v_{j+1} := p/h_{j+1,j}
end
Elements of the Krylov subspace are better manipulated via their representation in an orthonormal basis. The Arnoldi procedure outlined above is a convenient way to construct such a basis. It is mathematically equivalent (but numerically far superior) to the Modified Gram-Schmidt procedure applied to the power sequence $\{v, (tA)v, \ldots, (tA)^{m-1}v\}$.
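For concreteness, here is a minimal NumPy transcription of the procedure above (an illustrative sketch, not Expokit's Fortran 77 or Matlab source; the name `arnoldi` and its `[V, H, beta]` interface simply mirror the pseudocode):

```python
import numpy as np

def arnoldi(A, v, m):
    """Build an orthonormal basis V of K_m(A, v) and the
    (m+1) x m Hessenberg matrix Hbar such that A V_m = V_{m+1} Hbar."""
    n = v.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):
        p = A @ V[:, j]
        for i in range(j + 1):               # modified Gram-Schmidt sweep
            H[i, j] = V[:, i] @ p            # h_{ij} = v_i^T p
            p = p - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(p)
        V[:, j + 1] = p / H[j + 1, j]        # breakdown if h_{j+1,j} = 0
    return V, H, beta
```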

From the starting vector v, if the Arnoldi procedure is applied simply with A (rather than tA), then upon completion it builds two matrices

$$
V_{m+1} = [v_1, v_2, \ldots, v_{m+1}] \in \mathbb{R}^{n \times (m+1)}
$$
and
$$
\bar{H} = [h_{ij}] \in \mathbb{R}^{(m+1) \times m}
$$
satisfying the fundamental relations:

$$
A V_m = V_{m+1} \bar{H} = V_m H_m + h_{m+1,m} v_{m+1} e_m^T
\tag{5}
$$
$$
V_m^T A V_m = H_m.
\tag{6}
$$

For j = 1, ..., m+1, the subset $V_j = [v_1, \ldots, v_j] \in \mathbb{R}^{n \times j}$ is an orthonormal basis of $K_j(A, v)$, i.e., $V_j^T V_j = I$. The first basis vector $v_1$ is the normalized $v$, so that in particular $v = \beta V_m e_1$, where $\beta = \|v\|_2$ and $e_i$ denotes the $i$-th unit basis vector, whose length is assumed throughout this exposition to be defined according to the context.
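Relations (5) and (6) are easy to verify numerically with the `arnoldi` sketch above (again, purely illustrative code):

```python
import numpy as np
# assumes the arnoldi() sketch above is in scope

rng = np.random.default_rng(1)
n, m = 200, 10
A = rng.standard_normal((n, n))
v = rng.standard_normal(n)

V, Hbar, beta = arnoldi(A, v, m)
Vm, Hm = V[:, :m], Hbar[:m, :]

print(np.linalg.norm(A @ Vm - V @ Hbar))        # (5), ~ machine precision
print(np.linalg.norm(Vm.T @ A @ Vm - Hm))       # (6), ~ machine precision
print(np.linalg.norm(Vm.T @ Vm - np.eye(m)))    # orthonormality of the basis
```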
The matrices $\bar{H}$ and $H_m$ have a special form:
$$
\bar{H} = \begin{bmatrix} H_m \\ 0 \;\cdots\; 0 \;\; h_{m+1,m} \end{bmatrix} \in \mathbb{R}^{(m+1) \times m};
\qquad
H_m = \begin{bmatrix}
h_{11} & h_{12} & h_{13} & \cdots & h_{1m} \\
h_{21} & h_{22} & h_{23} & \cdots & h_{2m} \\
       & \ddots & \ddots & \ddots & \vdots \\
       &        & \ddots & \ddots & \vdots \\
0      &        &        & h_{m,m-1} & h_{mm}
\end{bmatrix} \in \mathbb{R}^{m \times m}.
$$
$H_m = V_m^T A V_m$ is an upper Hessenberg matrix (i.e., upper triangular with an extra sub-diagonal) and represents the projection of the operator A onto the Krylov subspace with respect to the basis $V_m$.

For any nonzero scalar $\tau$, $K_m(\tau A, v) = K_m(A, v)$; in fact, multiplying (5) and (6) by $\tau$ shows that these relations extend straightforwardly to $\tau A$, using $\tau\bar{H}$ and $\tau H_m$ with $V_{m+1}$ unchanged. Hence there is no loss of generality in applying the Arnoldi procedure directly to A.

These key observations are vital to the overall Krylov solution technique.

Let $w_{\mathrm{opt}}$ be the optimal Krylov approximation, in the least squares sense, to $w(\tau) = \exp(\tau A)v$. Since $V_m$ is a basis of the Krylov subspace, $w_{\mathrm{opt}} = V_m y_{\mathrm{opt}}$ with $y_{\mathrm{opt}} \in \mathbb{R}^m$. Thus by definition we have

$$
\|w_{\mathrm{opt}} - w(\tau)\|_2
= \min_{x \in K_m(\tau A, v)} \|x - w(\tau)\|_2
= \min_{y \in \mathbb{R}^m} \|V_m y - w(\tau)\|_2.
$$
The linear least squares problem defined by the last expression has full rank, and it is well known that its solution $y_{\mathrm{opt}}$ can be stated in terms of $V_m^+$ (the Moore-Penrose inverse of $V_m$) as
$$
y_{\mathrm{opt}} = V_m^+ w(\tau) = (V_m^T V_m)^{-1} V_m^T w(\tau) = V_m^T \exp(\tau A) v.
$$
Using the fact that $v = \beta V_m e_1$, it follows that
$$
w_{\mathrm{opt}} = V_m y_{\mathrm{opt}} = \beta V_m \left( V_m^T \exp(\tau A) V_m \right) e_1.
$$
This relation characterizes the optimal approximation, but it is awkward since it requires $\exp(\tau A)$ itself. However, we may approximate $V_m^T \exp(\tau A) V_m$ by $\exp(\tau V_m^T A V_m)$. In other words, the projection of the exponential operator $\exp(\tau A)$ with respect to the basis $V_m$ is approximated by the exponential of the projection of the operator $\tau A$ with respect to the same basis. Recalling (6), we then end up with the approximation
$$
\exp(\tau A) v \approx \beta V_m \exp(\tau H_m) e_1.
\tag{7}
$$
Moreover, we can make further use of the ingredients computed by the Arnoldi procedure to define an improved approximation
$$
\exp(\tau A) v \approx \beta V_{m+1} \exp(\tau \bar{H}_{m+1}) e_1
\tag{8}
$$
where
$$
\bar{H}_{m+1} = [\,\bar{H} \mid 0\,] \in \mathbb{R}^{(m+1) \times (m+1)}.
$$
Thus the distinctive feature of these Krylov approximations is that the original large problem of size n is converted to a small problem of size m, which is much more tractable (usually m ≤ 50, whilst n can exceed many thousands). The reduced-size problem is computed with classical dense methods, such as the Padé method. The mathematical basis of these approximations has been documented [Gallopoulos and Saad, Lubich et al., Philippe and Sidje]. In particular, Saad has shown that these are indeed polynomial approximations. Specifically,
$$
\beta V_m \exp(\tau H_m) e_1 = p_{m-1}(\tau A) v
$$
where $p_{m-1}$ is in fact the Hermite interpolation polynomial (of degree m−1) that interpolates the exponential function at $\mathrm{Eig}(\tau H_m)$, the set of eigenvalues of $\tau H_m$, and
$$
\beta V_{m+1} \exp(\tau \bar{H}_{m+1}) e_1 = p_m(\tau A) v
$$
where here $p_m$ is the Hermite interpolation polynomial (of degree m) that interpolates the exponential function at $\mathrm{Eig}(\tau H_m) \cup \{0\}$.

It is known from the theory of matrix functions that $\exp(\tau A) v = p_{n-1}(\tau A) v$, where $p_{n-1}$ is the Hermite interpolation polynomial of the exponential function at $\mathrm{Eig}(\tau A)$. From the theory of Krylov subspaces, as m increases, $\mathrm{Eig}(\tau H_m)$ not only approximates a larger subset of $\mathrm{Eig}(\tau A)$ but also approximates it more accurately. These results further motivate the use of these approximations. In our implementation, we use the corrected approximation (8) exclusively.
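On top of the `arnoldi` sketch, the corrected approximation (8) takes only a few lines; the small dense exponential is delegated to `scipy.linalg.expm` (a Padé-based method), mirroring the reduced-problem strategy described above. This is an illustrative sketch, not Expokit's code:

```python
import numpy as np
from scipy.linalg import expm
# assumes the arnoldi() sketch above is in scope

def krylov_expv(A, v, tau=1.0, m=30):
    """Approximate exp(tau*A) v with the corrected Krylov formula (8)."""
    V, Hbar, beta = arnoldi(A, v, m)
    Hm1 = np.zeros((m + 1, m + 1))
    Hm1[:, :m] = Hbar                          # Hbar_{m+1} = [Hbar | 0]
    return beta * V @ expm(tau * Hm1)[:, 0]    # beta V_{m+1} exp(tau Hbar_{m+1}) e_1

# usage on a moderate dense matrix, checked against scipy's dense expm
rng = np.random.default_rng(2)
n = 500
A = rng.standard_normal((n, n)) / np.sqrt(n)   # keeps ||A|| moderate
v = rng.standard_normal(n)
print(np.linalg.norm(krylov_expv(A, v) - expm(A) @ v))   # small error
```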

Numerical example

[Figure: Spectrum]
[Figure: Errors]

For a random v and a random A of order 500, whose spectrum Eig(A) is shown in the first figure, the second figure shows three error curves corresponding to $\|w_{\mathrm{exact}} - w^{(m)}_{\mathrm{approx}}\|_2$, with $w_{\mathrm{exact}} = \exp(A)v$ and $w^{(m)}_{\mathrm{approx}}$ either the m-th degree optimal approximation, the m-th degree Krylov approximation in (8), or the m-th degree truncated Taylor approximation. As m gradually increases from 1 to 30, the optimal error and the Krylov error decrease significantly, from $10^{6}$ down to $10^{-6}$.

As illustrated graphically, this relatively cheap Krylov approximation is not the optimal one, but as m increases it quickly draws close to the optimal, and it is better than the m-fold Taylor expansion. This highlights that, for the same number of matrix-vector products, the Krylov approximation is better than the Taylor approximation. However, the Krylov approach needs more storage (especially to hold $V \in \mathbb{R}^{n \times (m+1)}$) and involves more local computation (notably the Arnoldi sweeps needed to construct $V$ and $\bar{H}$). Hochbruck and Lubich have shown that the error in the Krylov approximation behaves like $O\!\left(e^{-\tau\|A\|_2} \left(\frac{\tau\|A\|_2 e}{m}\right)^{m}\right)$ when $m \geq 2\tau\|A\|_2$.
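The qualitative comparison in the figures can be reproduced along the following lines (an illustrative sketch with its own random test matrix, not the exact data behind the figures):

```python
import numpy as np
from scipy.linalg import expm
# assumes arnoldi() and krylov_expv() from the sketches above

rng = np.random.default_rng(3)
n = 500
A = rng.standard_normal((n, n)) / np.sqrt(n)
v = rng.standard_normal(n)
w_exact = expm(A) @ v

for m in (5, 10, 20, 30):
    err_krylov = np.linalg.norm(krylov_expv(A, v, m=m) - w_exact)
    # truncated Taylor approximation of the same degree m
    w_taylor, term = np.zeros(n), v.copy()
    for i in range(m + 1):
        w_taylor = w_taylor + term
        term = A @ term / (i + 1)
    err_taylor = np.linalg.norm(w_taylor - w_exact)
    print(f"m={m:2d}  krylov={err_krylov:.2e}  taylor={err_taylor:.2e}")
```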

Krylov approximation
Computes w(t) = exp(tA)v

w := v;
t_k := 0;
while t_k < t do
      [V, H, beta] = arnoldi(A, w, m)
      repeat
            tau := step size;
            w := beta V exp(tau H) e_1;
            err_loc := local error estimate;
      until err_loc <= 1.2 tol;
      t_k := t_k + tau;
end

Step-by-Step Integration

Given that in reality t||A|| can be large, w(t) = exp(tA)v is not computed in one go. Instead, as shown below, a step-by-step integration similar to that of a standard ODE solver is used:

$$
\begin{cases}
w(0) = v \\
w(t_{k+1}) = w(t_k + \tau_k) = \exp((t_k + \tau_k)A)v = \exp(\tau_k A)\, w(t_k)
\end{cases}
\tag{9}
$$
where
$$
k = 0, 1, \ldots, s, \qquad \tau_k = t_{k+1} - t_k, \qquad 0 = t_0 < t_1 < \cdots < t_s < t_{s+1} = t.
$$

Consequently, in the course of the integration, one can output intermediate discrete observations (if they are needed) at no extra cost. The {tauk} are step-sizes that are selected automatically within the code to ensure stability and accuracy. This time-stepping selection is done in conjunction with error estimations. Nevertheless it remains clear from (9) that the crux of the problem at each stage is an operation of the form exp(tauA)v, albeit with different tau's and v's. The selection of a specific step-size tau is made so that exp(tauA)v is now effectively approximated by (8) using information of the current Arnoldi process for the stage. Following the procedures of ODEs solvers, an a posteriori error control is carried out to ensure that the intermediate approximation is acceptable with respect to expectations on the global error. Further information and references may be obtained from the Expokit documentation.