{{short description|Function that maps matrices to matrices}}
In [[mathematics]], every [[analytic function]] can be used for defining a '''matrix function''' that maps [[square matrix|square matrices]] with complex entries to square matrices of the same size.
 
This is used for defining the [[exponential of a matrix]], which is involved in the [[closed-form expression|closed-form]] solution of systems of [[linear differential equation]]s.
 
== Extending scalar function to matrix functions ==
 
=== Power series ===
If the [[analytic function]] {{mvar|f}} has the [[Taylor expansion]]
<math display="block">f(x) = c_0 + c_1 x + c_2 x^2 + \cdots</math>
then a matrix function <math>A\mapsto f(A)</math> can be defined by substituting {{mvar|x}} by a [[square matrix]]: the powers become [[Matrix multiplication#Powers of a matrix|matrix power]]s, the additions become matrix sums and the multiplications by coefficients become [[scalar multiplication]]s. If the series converges for <math>|x| < r</math>, then the corresponding matrix series converges for matrices {{mvar|A}} such that <math>\|A\| < r</math> for some [[matrix norm]] <math>\|\cdot\|</math> that satisfies <math>\|AB\|\leq \|A\|\cdot\|B\|</math>.
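
For example, the exponential series <math display="inline">e^x = \sum_{k \ge 0} x^k/k!</math> converges for every <math>x</math>, so truncating it at a matrix argument approximates the matrix exponential. A minimal Python/NumPy sketch of this idea (the matrix and the truncation order are chosen only for illustration):
<syntaxhighlight lang="python">
import numpy as np
from math import factorial
from scipy.linalg import expm

def series_eval(A, coeffs):
    """Evaluate sum_k coeffs[k] * A^k for a square matrix A."""
    result = np.zeros_like(A, dtype=float)
    power = np.eye(A.shape[0])
    for c in coeffs:
        result += c * power
        power = power @ A                     # next matrix power
    return result

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])                   # illustrative matrix
exp_coeffs = [1.0 / factorial(k) for k in range(20)]   # Taylor coefficients of e^x

print(series_eval(A, exp_coeffs))             # truncated matrix series
print(expm(A))                                # SciPy's matrix exponential, for comparison
</syntaxhighlight>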
 
===Diagonalizable matrices===
A square matrix {{mvar|A}} is [[diagonalizable matrix|diagonalizable]] if there is an [[invertible matrix]] {{mvar|P}} such that <math>D = P^{-1}\,A\,P</math> is a [[diagonal matrix]]. As <math>A = P\,D\,P^{-1},</math> it is natural to set
<math display="block">f(A) = P \begin{bmatrix}
f(d_1) & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & f(d_n)
\end{bmatrix} P^{-1},</math>
where <math>d_1, \dots, d_n</math> denote the diagonal entries of {{mvar|D}}.
 
It can be verified that the matrix {{math|''f''(''A'')}} does not depend on a particular choice of {{mvar|P}}.
 
For example, suppose one is seeking <math>\Gamma(A) = (A-1)!</math> for
<math display="block">A = \begin{bmatrix}
1&3\\
2&1
\end{bmatrix} ~. </math>

One has
<math display="block">A = P \begin{bmatrix}
1-\sqrt{6}& 0 \\
0 & 1+ \sqrt{6}
\end{bmatrix} P^{-1}~, </math>
for
<math display="block"> P= \begin{bmatrix}
1/2 & 1/2 \\
-\frac{1}{\sqrt{6}} &\frac{1}{\sqrt{6}}
\end{bmatrix} ~. </math>

Application of the formula then simply yields
<math display="block"> \Gamma(A) = \begin{bmatrix}
1/2 & 1/2 \\
-\frac{1}{\sqrt{6} } & \frac{1}{\sqrt{6} }
\end{bmatrix} \cdot
\begin{bmatrix}
\Gamma(1-\sqrt{6}) & 0\\
0&\Gamma(1+\sqrt{6})
\end{bmatrix} \cdot
\begin{bmatrix}
1 & -\sqrt{6}/2 \\
1 & \sqrt{6}/2
\end{bmatrix} \approx
\begin{bmatrix} 2.8114 & 0.4080\\
0.2720 & 2.8114
\end{bmatrix} ~.
</math>
 
Likewise,
<math display="block"> A^4 = \begin{bmatrix}
1/2 & 1/2 \\
-\frac{1}{\sqrt{6} } & \frac{1}{\sqrt{6} }
\end{bmatrix} \cdot
\begin{bmatrix}
(1-\sqrt{6})^4 & 0\\
0&(1+\sqrt{6})^4
\end{bmatrix} \cdot
\begin{bmatrix}
1 & -\sqrt{6}/2 \\
1 & \sqrt{6}/2
\end{bmatrix} =
\begin{bmatrix} 73 & 84\\
56 & 73
\end{bmatrix} ~.
</math>
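
These numbers can be reproduced by carrying out the diagonalization numerically, here with NumPy and SciPy's scalar gamma function (a sketch; the computed eigendecomposition plays the role of the explicit {{mvar|P}} used above):
<syntaxhighlight lang="python">
import numpy as np
from scipy.special import gamma

A = np.array([[1.0, 3.0],
              [2.0, 1.0]])

# Diagonalize A: d holds the eigenvalues 1 ± sqrt(6), the columns of P are eigenvectors.
d, P = np.linalg.eig(A)

# f(A) = P diag(f(d_1), ..., f(d_n)) P^{-1}, applied with f = Gamma and f = x^4.
print(P @ np.diag(gamma(d)) @ np.linalg.inv(P))   # ≈ [[2.8114, 0.4080], [0.2720, 2.8114]]
print(P @ np.diag(d**4) @ np.linalg.inv(P))       # [[73, 84], [56, 73]]
</syntaxhighlight>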
 
=== Jordan decomposition ===
{{main|Jordan normal form}}
All complex matrices, whether they are diagonalizable or not, have a [[Jordan normal form]] <math>A = P\,J\,P^{-1}</math>, where the matrix ''J'' consists of [[Jordan block]]s. Consider these blocks separately and apply the [[power series]] to a Jordan block:
<math display="block"> f \left( \begin{bmatrix}
\lambda & 1 & 0 & \cdots & 0 \\
0 & \lambda & 1 & \vdots & \vdots \\
0 & 0 & \ddots & \ddots & \vdots \\
\vdots & \cdots & \ddots & \lambda & 1 \\
0 & \cdots & \cdots & 0 & \lambda
\end{bmatrix} \right) =
\begin{bmatrix}
\frac{f(\lambda)}{0!} & \frac{f'(\lambda)}{1!} & \frac{f''(\lambda)}{2!} & \cdots & \frac{f^{(n-1)}(\lambda)}{(n-1)!} \\
0 & \frac{f(\lambda)}{0!} & \frac{f'(\lambda)}{1!} & \vdots & \frac{f^{(n-2)}(\lambda)}{(n-2)!} \\
0 & 0 & \ddots & \ddots & \vdots \\
\vdots & \cdots & \ddots & \frac{f(\lambda)}{0!} & \frac{f'(\lambda)}{1!} \\
0 & \cdots & \cdots & 0 & \frac{f(\lambda)}{0!}
\end{bmatrix}.
</math>
 
This definition can be used to extend the ___domain of the matrix function
beyond the set of matrices with [[spectral radius]] smaller than the radius of convergence of the power series.
Note that there is also a connection to [[Divided difference#Polynomials and power series|divided differences]].
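
For the exponential function, every derivative equals <math>e^\lambda</math>, so the block formula can be checked directly on a single Jordan block (a small NumPy/SciPy sketch with an illustrative eigenvalue and block size):
<syntaxhighlight lang="python">
import numpy as np
from math import factorial
from scipy.linalg import expm

lam, n = 2.0, 4                                        # illustrative eigenvalue and block size
J = lam * np.eye(n) + np.diag(np.ones(n - 1), k=1)     # single Jordan block

# Entry (i, i+k) of f(J) is f^(k)(lam) / k!; for f = exp, every derivative is exp(lam).
F = sum((np.exp(lam) / factorial(k)) * np.diag(np.ones(n - k), k=k) for k in range(n))

print(np.allclose(F, expm(J)))                         # True
</syntaxhighlight>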
 
==== Hermitian matrices ====
A [[Hermitian matrix]] has all real eigenvalues and can always be diagonalized by a [[unitary matrix]] P, according to the [[spectral theorem]].
In this case, the Jordan definition is natural. Moreover, this definition allows one to extend standard inequalities for
real functions:
 
If <math> f(a) \leq g(a)</math> for all eigenvalues of <math>A</math>, then <math>f(A) \preceq g(A)</math>.
(As a convention, <math>X \preceq Y \Leftrightarrow Y - X </math> is a [[positive-semidefinite matrix]].)
The proof follows directly from the definition.
 
=== Cauchy integral ===
[[Cauchy's integral formula]] from [[complex analysis]] can also be used to generalize scalar functions to matrix functions. Cauchy's integral formula states that for any [[analytic function]] {{mvar|f}} defined on a set {{math|''D'' ⊂ '''C'''}}, one has
<math display="block">f(x) = \frac{1}{2\pi i} \oint_C \frac{f(z)}{z-x}\, \mathrm{d}z ~,</math>
where {{mvar|C}} is a closed simple curve inside the ___domain {{mvar|D}} enclosing {{mvar|x}}.
 
Now, replace {{mvar|x}} by a matrix {{mvar|A}} and consider a path {{mvar|C}} inside {{mvar|D}} that encloses all [[eigenvalue]]s of {{mvar|A}}. One possibility to achieve this is to let {{mvar|C}} be a circle around the [[origin (mathematics)|origin]] with [[radius]] larger than {{math|{{norm|''A''}}}} for an arbitrary [[matrix norm]] {{math|{{norm|·}}}}. Then, {{math|''f''{{hair space}}(''A'')}} is definable by
<math display="block">f(A) = \frac{1}{2\pi i} \oint_C f(z)\left(z I - A\right)^{-1}\, \mathrm{d}z \,. </math>
 
This integral can readily be evaluated numerically using the [[trapezium rule]], which [[Convergent series|converges]] exponentially in this case. That means that the [[precision (arithmetic)|precision]] of the result doubles when the number of nodes is doubled. In routine cases, this is bypassed by [[Sylvester's formula]].
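
A minimal NumPy sketch of this procedure, using a circular contour, the trapezium rule, and the exponential as the scalar function (the radius and number of nodes are illustrative):
<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import expm

def f_contour(A, f, radius, nodes=64):
    """Approximate f(A) by the Cauchy integral over a circle enclosing the spectrum of A."""
    m = A.shape[0]
    theta = 2 * np.pi * np.arange(nodes) / nodes
    z = radius * np.exp(1j * theta)                  # quadrature points on the circle
    dz = 1j * z * (2 * np.pi / nodes)                # dz = i z dtheta, trapezium weights
    total = np.zeros((m, m), dtype=complex)
    for zk, dzk in zip(z, dz):
        total += f(zk) * np.linalg.inv(zk * np.eye(m) - A) * dzk
    return total / (2j * np.pi)

A = np.array([[1.0, 3.0],
              [2.0, 1.0]])
print(f_contour(A, np.exp, radius=6.0).real)         # a circle of radius 6 encloses both eigenvalues
print(expm(A))                                       # reference value
</syntaxhighlight>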
=== Matrix perturbations ===
 
The above Taylor power series allows the scalar <math>x</math> to be replaced by the matrix. This is not true in general when expanding in terms of <math>A(\eta) = A+\eta B</math> about <math>\eta = 0</math> unless <math>[A,B]=0</math>. A counterexample is <math>f(x) = x^{3}</math>, which has a finite length [[Taylor series]]. We compute this in two ways,
* Distributive law: <math display="block">f(A + \eta B) = (A+\eta B)^{3} = A^{3} + \eta(A^{2}B + ABA + BA^{2}) + \eta^{2}(AB^{2} + BAB + B^{2}A) + \eta^{3}B^{3}</math>
* Using scalar Taylor expansion for <math>f(a+\eta b)</math> and replacing scalars with matrices at the end: <math display="block">\begin{align}
f(a+\eta b) &= f(a) + f'(a)\frac{\eta b}{1!} + f''(a)\frac{(\eta b)^{2}}{2!} + f'''(a)\frac{(\eta b)^{3}}{3!} \\[.5em]
&= a^{3} + 3a^{2}(\eta b) + 3a(\eta b)^{2} + (\eta b)^{3} \\[.5em]
&\to A^{3} + 3A^{2}(\eta B) + 3A(\eta B)^{2} + (\eta B)^{3}
\end{align}</math>

The scalar expression assumes commutativity while the matrix expression does not, and thus they cannot be equated directly unless <math>[A,B]=0</math>. For some ''f''(''x'') this can be dealt with using the same method as scalar Taylor series. For example, <math display="inline">f(x) = \frac{1}{x}</math>. If <math>A^{-1}</math> exists, then <math>f(A+\eta B) = f(\mathbb{I} + \eta A^{-1}B)f(A)</math>. The expansion of the first term then follows the power series given above,
<math display="block">f(\mathbb{I} + \eta A^{-1}B) = \mathbb{I} - \eta A^{-1}B + (-\eta A^{-1}B)^{2} + \cdots = \sum_{n=0}^{\infty} (-\eta A^{-1}B)^{n} </math>
 
The convergence criteria of the power series then apply, requiring <math>\Vert \eta A^{-1}B \Vert</math> to be sufficiently small under the appropriate matrix norm. For more general problems, which cannot be rewritten in such a way that the two matrices commute, the ordering of matrix products produced by repeated application of the Leibniz rule must be tracked.
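
The difference between the two expansions is easy to exhibit numerically (a NumPy sketch with an illustrative non-commuting pair):
<syntaxhighlight lang="python">
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])       # A and B do not commute
eta = 0.1

exact = np.linalg.matrix_power(A + eta * B, 3)

# Naive substitution into the scalar expansion  a^3 + 3 a^2 (eta b) + 3 a (eta b)^2 + (eta b)^3
naive = (np.linalg.matrix_power(A, 3) + 3 * A @ A @ (eta * B)
         + 3 * A @ np.linalg.matrix_power(eta * B, 2) + np.linalg.matrix_power(eta * B, 3))

# Distributive expansion, keeping track of the ordering of the non-commuting products
ordered = (np.linalg.matrix_power(A, 3)
           + eta * (A @ A @ B + A @ B @ A + B @ A @ A)
           + eta**2 * (A @ B @ B + B @ A @ B + B @ B @ A)
           + eta**3 * np.linalg.matrix_power(B, 3))

print(np.allclose(naive, exact))    # False
print(np.allclose(ordered, exact))  # True
</syntaxhighlight>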
 
=== Arbitrary function of a 2×2 matrix ===
For an arbitrary function ''f'' of a 2×2 matrix ''A'', [[Sylvester's formula]] simplifies to
<math display="block">f(A) = \frac{f(\lambda_+) + f(\lambda_-)}{2} I + \frac{A - \left (\frac{tr(A)}{2}\right )I}{\sqrt{\left (\frac{tr(A)}{2}\right )^2 - |A|}} \frac{f(\lambda_+) - f(\lambda_-)}{2} ~,</math>
where <math>\lambda_\pm</math> are the eigenvalues of its characteristic equation, {{math|1={{!}}''A'' − ''λI''{{!}} = 0}}, and are given by
<math display="block">\lambda_\pm = \frac{tr(A)}{2} \pm \sqrt{\left (\frac{tr(A)}{2}\right )^2 - |A|} .</math>
However, if there is degeneracy, the following formula is used, where ''f''′ is the derivative of ''f'':
<math display="block">f(A) = f \left( \frac{tr(A)}{2} \right) I + \mathrm{adj} \left( \frac{tr(A)}{2}I - A \right ) f' \left( \frac{tr(A)}{2} \right) .</math>
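
A direct transcription of the non-degenerate formula (a Python sketch assuming distinct eigenvalues; complex arithmetic covers the case of a negative discriminant):
<syntaxhighlight lang="python">
import numpy as np

def f_2x2(A, f):
    """f(A) for a 2x2 matrix with distinct eigenvalues, via the simplified Sylvester formula."""
    half_tr = np.trace(A) / 2.0
    disc = np.sqrt(complex(half_tr ** 2 - np.linalg.det(A)))   # sqrt((tr A / 2)^2 - |A|)
    lam_p, lam_m = half_tr + disc, half_tr - disc               # the two eigenvalues
    I = np.eye(2)
    return (((f(lam_p) + f(lam_m)) / 2) * I
            + ((A - half_tr * I) / disc) * ((f(lam_p) - f(lam_m)) / 2))

A = np.array([[1.0, 3.0],
              [2.0, 1.0]])
print(f_2x2(A, lambda x: x ** 4).real)   # [[73, 84], [56, 73]], as in the example above
</syntaxhighlight>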
 
=== Examples ===
 
* [[Algebraic Riccati equation]]
* [[Matrix polynomial]]
* [[Square root of a matrix|Matrix root]]
* [[Matrix logarithm]]
* [[Matrix exponential]]
* [[Matrix sign function]]<ref>{{Cite web|last=Higham|first=Nick|date=2020-12-15|title=What Is the Matrix Sign Function?| url=https://nhigham.com/2020/12/15/what-is-the-matrix-sign-function/|access-date=2020-12-27|website=Nick Higham|language=en}}</ref>
 
== Classes of matrix functions ==
Using the [[Loewner order|semidefinite ordering]] (<math>X \preceq Y \Leftrightarrow Y - X</math> is [[positive-semidefinite matrix|positive-semidefinite]] and
<math> X \prec Y \Leftrightarrow Y - X </math> is [[positive-definite matrix|positive definite]]), some
of the classes of scalar functions can be extended to matrix functions of [[Hermitian matrix|Hermitian matrices]].<ref name="Bhatia">{{cite book | author=Bhatia, R. | title=Matrix Analysis |series=Graduate Texts in Mathematics | year = 1997 | publisher=Springer | volume=169}}</ref>
=== Operator monotone ===
{{Main|Operator monotone function}}
A function {{mvar|f}} is called operator monotone if and only if <math> 0 \prec A \preceq H \Rightarrow f(A) \preceq f(H) </math> for all self-adjoint matrices {{math|''A'',''H''}} with spectra in the ___domain of {{mvar|f}}. This is analogous to [[monotonic function|monotone function]] in the scalar case.
 
=== Operator concave/convex ===
A function {{mvar|f}} is called operator concave if and only if
<math display="block"> \tau f(A) + (1-\tau) f(H) \preceq f \left ( \tau A + (1-\tau)H \right ) </math>
for all self-adjoint matrices {{math|''A'',''H''}} with spectra in the ___domain of {{mvar|f}} and <math>\tau \in [0,1]</math>. This definition is analogous to a [[concave function|concave scalar function]]. An operator convex function can be defined by switching <math>\preceq</math> to <math>\succeq</math> in the definition above.
 
===Examples===
The matrix log is both operator monotone and operator concave. The matrix square is operator convex. The matrix exponential is none of these. '''Loewner's theorem''' states that a function on an ''open'' interval is operator monotone if and only if it has an analytic extension to the upper and lower complex half planes so that the upper half plane is mapped to itself.<ref name="Bhatia" />
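
These properties can be illustrated numerically: for positive definite matrices with <math>A \preceq H</math>, the logarithm preserves the semidefinite ordering while the square in general does not (a NumPy/SciPy sketch with one illustrative pair):
<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import logm

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
H = np.array([[3.0, 1.0],
              [1.0, 1.0]])       # H - A = [[1, 0], [0, 0]] is positive semidefinite

# log is operator monotone: log(H) - log(A) remains positive semidefinite ...
print(np.linalg.eigvalsh(logm(H) - logm(A)))   # both eigenvalues nonnegative (≈ 0.04 and 0.65)

# ... but squaring is not: H^2 - A^2 acquires a negative eigenvalue.
print(np.linalg.eigvalsh(H @ H - A @ A))       # ≈ -0.19 and 5.19
</syntaxhighlight>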
 
==See also==
* [[Algebraic Riccati equation]]
*[[Sylvester's formula]]
*[[Loewner order]]
*[[Matrix calculus]]
*[[Trace inequalities]]
 
==References==
* {{cite book|last1=Higham|first1=Nicholas J.|title=Functions of matrices theory and computation|date=2008|publisher=Society for Industrial and Applied Mathematics|___location=Philadelphia|author-link=Nicholas Higham|isbn=9780898717778}}
 
{{Authority control}}
 
[[Category:Matrix theory]]