Analytic function of a matrix

=== Power series ===
If the [[analytic function]] {{mvar|f}} has the [[Taylor expansion]]
:<math display="block">f(x) = c_0 + c_1 x + c_2 x^2 + \cdots</math>
then a matrix function <math>A\mapsto f(A)</math> can be defined by substituting the [[square matrix]] {{mvar|A}} for {{mvar|x}}: powers become [[Matrix multiplication#Powers of a matrix|matrix power]]s, additions become matrix sums, and multiplications by coefficients become [[scalar multiplication]]s. If the scalar series converges for <math>|x| < r</math>, then the corresponding matrix series converges for matrices {{mvar|A}} such that <math>\|A\| < r</math> for some [[matrix norm]] that satisfies <math>\|AB\|\leq \|A\|\,\|B\|</math>.
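
For illustration, this substitution can be carried out directly in floating point. The following is a minimal sketch (Python with NumPy/SciPy; the test matrix, the choice ''f''&nbsp;= exp, and the truncation order are arbitrary), comparing the truncated series against SciPy's <code>expm</code>:

<syntaxhighlight lang="python">
import math

import numpy as np
from scipy.linalg import expm

def f_power_series(A, coeffs):
    """Evaluate c0*I + c1*A + c2*A^2 + ... by direct substitution."""
    F = np.zeros_like(A)
    P = np.eye(A.shape[0])          # current power A^k, starting from A^0 = I
    for c in coeffs:
        F = F + c * P
        P = P @ A
    return F

A = np.array([[1.0, 3.0], [2.0, 1.0]])
coeffs = [1 / math.factorial(k) for k in range(30)]  # Taylor coefficients of exp
print(np.allclose(f_power_series(A, coeffs), expm(A)))  # True
</syntaxhighlight>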
 
===Diagonalizable matrices===
A square matrix {{mvar|A}} is [[diagonalizable matrix|diagonalizable]] if there is an [[invertible matrix]] {{mvar|P}} such that <math>D = P^{-1}\,A\,P</math> is a [[diagonal matrix]], that is, {{mvar|D}} has the shape
:<math display="block">D=\begin{bmatrix}
d_1 & \cdots & 0 \\
\vdots & \ddots & \vdots \\
Line 22:
 
As <math>A = P\,D\,P^{-1},</math> it is natural to set
:<math display="block">f(A)=P\, \begin{bmatrix}
f(d_1) & \cdots & 0 \\
\vdots & \ddots & \vdots \\
Line 30:
It can be verified that the matrix {{math|''f''(''A'')}} does not depend on a particular choice of {{mvar|P}}.
 
For example, suppose one is seeking <math>\Gamma(A) = (A-1)!</math> for
:<math display="block">A= = \begin{bmatrix}
1&3\\
2&1
\end{bmatrix} . </math>
 
One has
:<math display="block">A = P \begin{bmatrix}
1-\sqrt{6}& 0 \\
0 & 1+ \sqrt{6}
\end{bmatrix} P^{-1}~, </math>
for
:<math display="block"> P= \begin{bmatrix}
1/2 & 1/2 \\
-\frac{1}{\sqrt{6}} &\frac{1}{\sqrt{6}}
\end{bmatrix} ~. </math>
 
Application of the formula then simply yields
:<math display="block"> \Gamma(A) = \begin{bmatrix}
1/2 & 1/2 \\
-\frac{1}{\sqrt{6} } & \frac{1}{\sqrt{6} }
\end{bmatrix} \cdot
\begin{bmatrix}
\Gamma(1-\sqrt{6}) & 0\\
0&\Gamma(1+\sqrt{6})
\end{bmatrix} \cdot
\begin{bmatrix}
1 & -\sqrt{6}/2 \\
1 & \sqrt{6}/2
\end{bmatrix} \approx
\begin{bmatrix} 2.8114 & 0.4080\\
0.2720 & 2.8114
\end{bmatrix} ~.
</math>
 
Likewise,
:<math display="block"> A^4 = \begin{bmatrix}
1/2 & 1/2 \\
-\frac{1}{\sqrt{6} } & \frac{1}{\sqrt{6} }
\end{bmatrix} \cdot
\begin{bmatrix}
(1-\sqrt{6})^4 & 0\\
0&(1+\sqrt{6})^4
\end{bmatrix} \cdot
\begin{bmatrix}
1 & -\sqrt{6}/2 \\
1 & \sqrt{6}/2
\end{bmatrix} =
\begin{bmatrix} 73 & 84\\
56 & 73
\end{bmatrix} ~.
</math>
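
Both results can be reproduced numerically. The following minimal sketch (NumPy/SciPy) uses a library eigendecomposition rather than the hand-chosen {{mvar|P}} above, which changes nothing since {{math|''f''(''A'')}} does not depend on that choice:

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import gamma

A = np.array([[1.0, 3.0], [2.0, 1.0]])

# Diagonalize A: columns of P are eigenvectors, d holds the eigenvalues 1 ± √6
d, P = np.linalg.eig(A)

# f(A) = P diag(f(d_1), ..., f(d_n)) P^{-1}, here with f = Γ
Gamma_A = P @ np.diag(gamma(d)) @ np.linalg.inv(P)
print(Gamma_A)       # ≈ [[2.8114, 0.4080], [0.2720, 2.8114]]

# The same recipe with f(x) = x^4 recovers the exact integer matrix power
A4 = P @ np.diag(d**4) @ np.linalg.inv(P)
print(np.round(A4))  # [[73., 84.], [56., 73.]]
</syntaxhighlight>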
 
=== Jordan decomposition ===
{{main|Jordan normal form}}
All complex matrices, whether they are diagonalizable or not, have a [[Jordan normal form]] <math>A = P\,J\,P^{-1}</math>, where the matrix ''J'' consists of [[Jordan block]]s. Consider these blocks separately and apply the [[power series]] to a Jordan block:
:<math display="block"> f \left( \begin{bmatrix}
\lambda & 1 & 0 & \cdots & 0 \\
0 & \lambda & 1 & \vdots & \vdots \\
0 & 0 & \ddots & \ddots & \vdots \\
\vdots & \cdots & \ddots & \lambda & 1 \\
0 & \cdots & \cdots & 0 & \lambda
\end{bmatrix} \right)
=
\begin{bmatrix}
\frac{f(\lambda)}{0!} & \frac{f'(\lambda)}{1!} & \frac{f''(\lambda)}{2!} & \cdots & \frac{f^{(n-1)}(\lambda)}{(n-1)!} \\
0 & \frac{f(\lambda)}{0!} & \frac{f'(\lambda)}{1!} & \vdots & \frac{f^{(n-2)}(\lambda)}{(n-2)!} \\
0 & 0 & \ddots & \ddots & \vdots \\
\vdots & \cdots & \ddots & \frac{f(\lambda)}{0!} & \frac{f'(\lambda)}{1!} \\
0 & \cdots & \cdots & 0 & \frac{f(\lambda)}{0!}
\end{bmatrix}.</math>
 
This definition can be used to extend the ___domain of the matrix function
beyond the set of matrices with [[spectral radius]] smaller than the radius of convergence of the power series.
Note that there is also a connection to [[Divided difference#Polynomials and power series|divided differences]].
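
As an illustrative check (Python; the block size and the choice ''f''&nbsp;= exp are arbitrary), the upper-triangular formula above can be compared entry by entry with a general-purpose routine on a single Jordan block:

<syntaxhighlight lang="python">
import math

import numpy as np
from scipy.linalg import expm

lam, n = 0.5, 4                                   # eigenvalue and block size
J = lam * np.eye(n) + np.diag(np.ones(n - 1), 1)  # single n×n Jordan block

# For f = exp every derivative at lam equals exp(lam), so the formula above
# gives f(J)_{ij} = exp(lam) / (j - i)! on and above the diagonal.
F = np.zeros((n, n))
for i in range(n):
    for j in range(i, n):
        F[i, j] = math.exp(lam) / math.factorial(j - i)

print(np.allclose(F, expm(J)))  # True
</syntaxhighlight>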
 
==== Hermitian matrices ====
A [[Hermitian matrix]] has all real eigenvalues and can always be diagonalized by a [[unitary matrix]] P, according to the [[spectral theorem]].
In this case, the Jordan definition is natural. Moreover, this definition allows one to extend standard inequalities for
real functions:
 
If <math> f(a) \leq g(a)</math> for all eigenvalues <math>a</math> of <math>A</math>, then <math>f(A) \preceq g(A)</math>.
(As a convention, <math>X \preceq Y \Leftrightarrow Y - X </math> is a [[positive-semidefinite matrix]].)
The proof follows directly from the definition.
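
For illustration, the inequality can be spot-checked numerically; a minimal sketch (NumPy/SciPy) using the scalar bound {{math|''x'' ≤ e<sup>''x''</sup>}} on one random symmetric matrix:

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
A = (X + X.T) / 2                 # random real symmetric (Hermitian) matrix

# f(x) = x and g(x) = exp(x) satisfy f(a) <= g(a) for every real a,
# so exp(A) - A should be positive semidefinite.
print(np.linalg.eigvalsh(expm(A) - A).min() >= 0)  # True
</syntaxhighlight>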
 
=== Cauchy integral ===
[[Cauchy's integral formula]] from [[complex analysis]] can also be used to generalize scalar functions to matrix functions. Cauchy's integral formula states that for any [[analytic function]] {{mvar|f}} defined on a set {{math|''D'' ⊂ '''C'''}}, one has
:<math display="block">f(x) = \frac{1}{2\pi i} \oint_{C}\! {\frac{f(z)}{z-x}}\, \mathrm{d}z ~,</math>
where {{mvar|C}} is a closed simple curve inside the ___domain {{mvar|D}} enclosing {{mvar|x}}.
 
Now, replace {{mvar|x}} by a matrix {{mvar|A}} and consider a path {{mvar|C}} inside {{mvar|D}} that encloses all [[eigenvalue]]s of {{mvar|A}}. One possibility to achieve this is to let {{mvar|C}} be a circle around the [[origin (mathematics)|origin]] with [[radius]] larger than {{math|{{norm|''A''}}}} for an arbitrary [[matrix norm]] {{math|{{norm|·}}}}. Then, {{math|''f''{{hair space}}(''A'')}} is definable by
:<math display="block">f(A) = \frac{1}{2\pi i} \oint_C f(z)\left(zIz I - A\right)^{-1} \mathrm{d}z~ \,. </math>
 
This integral can readily be evaluated numerically using the [[trapezium rule]], which [[Convergent series|converges]] exponentially in this case. That means that the [[precision (arithmetic)|precision]] of the result doubles when the number of nodes is doubled. In routine cases, this is bypassed by [[Sylvester's formula]].
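
A minimal numerical sketch of this contour integral (Python/NumPy; the contour radius, the node count, and the choice ''f''&nbsp;= exp are arbitrary):

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 3.0], [2.0, 1.0]])
n = A.shape[0]

# Circle around the origin with radius larger than ||A||, so that the
# contour encloses every eigenvalue of A.
r = 1.5 * np.linalg.norm(A, 2)
N = 64                                          # trapezium-rule nodes
z = r * np.exp(2j * np.pi * np.arange(N) / N)   # nodes on the contour
dz = 2j * np.pi * z / N                         # dz = i z dθ with dθ = 2π/N

# f(A) = (1/2πi) ∮ f(z) (zI - A)^{-1} dz, here with f = exp
F = sum(np.exp(zk) * np.linalg.inv(zk * np.eye(n) - A) * dzk
        for zk, dzk in zip(z, dz)) / (2j * np.pi)

print(np.allclose(F.real, expm(A)))             # True
</syntaxhighlight>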
=== Matrix perturbations ===
 
The above Taylor power series allows the scalar <math>x</math> to be replaced by a matrix. This is not true in general when expanding in terms of <math>A(\eta) = A+\eta B</math> about <math>\eta = 0</math> unless <math>[A,B]=0</math>. A counterexample is <math>f(x) = x^{3}</math>, which has a finite-length [[Taylor series]]. We compute this in two ways:
 
* Distributive law: <math display="block">f(A + \eta B) = (A+\eta B)^{3} = A^{3} + \eta(A^{2}B + ABA + BA^{2}) + \eta^{2}(AB^{2} + BAB + B^{2}A) + \eta^{3}B^{3}</math>
* Using scalar Taylor expansion for <math>f(a+\eta b)</math> and replacing scalars with matrices at the end: <math display="block">\begin{align}
f(a+\eta b) &= f(a) + f'(a)\frac{\eta b}{1!} + f''(a)\frac{(\eta b)^2}{2!} + f'''(a)\frac{(\eta b)^3}{3!} \\[.5em]
&= a^3 + 3a^2(\eta b) + 3a(\eta b)^2 + (\eta b)^3 \\[.5em]
&\to A^3 + 3A^2(\eta B) + 3A(\eta B)^2 + (\eta B)^3
\end{align}</math>
The scalar expression assumes commutativity while the matrix expression does not, and thus they cannot be equated directly unless <math>[A,B]=0</math>. For some ''f''(''x'') this can be dealt with using the same method as scalar Taylor series. For example, <math display="inline">f(x) = \frac{1}{x}</math>. If <math>A^{-1}</math> exists then <math>f(A+\eta B) = f(\mathbb{I} + \eta A^{-1}B)f(A)</math>. The expansion of the first term then follows the power series given above,
 
:<math display="block">f(\mathbb{I} + \eta A^{-1}B) = \mathbb{I} - \eta A^{-1}B + (-\eta A^{-1}B)^2 + \cdots = \sum_{n=0}^\infty (-\eta A^{-1}B)^n </math>
 
The convergence criteria of the power series then apply, requiring <math>\Vert \eta A^{-1}B \Vert</math> to be sufficiently small under the appropriate matrix norm. For more general problems, which cannot be rewritten in such a way that the two matrices commute, the ordering of matrix products produced by repeated application of the Leibniz rule must be tracked.
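
A minimal numerical sketch of this expansion (NumPy; the matrices and the value of <math>\eta</math> are arbitrary, chosen small enough for convergence):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))   # invertible, close to I
B = rng.standard_normal((3, 3))
eta = 0.01

Ainv = np.linalg.inv(A)
M = -eta * Ainv @ B            # expansion parameter; needs ||M|| < 1

# Truncated series for (I + η A^{-1} B)^{-1}, then f(A+ηB) = f(I+ηA^{-1}B) f(A)
series = sum(np.linalg.matrix_power(M, k) for k in range(20))
approx = series @ Ainv

print(np.allclose(approx, np.linalg.inv(A + eta * B)))  # True
</syntaxhighlight>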
 
=== Arbitrary function of a 2×2 matrix ===
An arbitrary function ''f''(''A'') of a 2×2 matrix ''A'' has its [[Sylvester's formula]] simplify to
<math display="block">f(A) = \frac{f(\lambda_+) + f(\lambda_-)}{2} I + \frac{A - \left (\frac{\operatorname{tr}(A)}{2}\right )I}{\sqrt{\left (\frac{\operatorname{tr}(A)}{2}\right )^2 - |A|}} \frac{f(\lambda_+) - f(\lambda_-)}{2} ~,</math>
where <math>\lambda_\pm</math> are the eigenvalues of its characteristic equation, {{math|1={{!}}''A'' − ''λI''{{!}} = 0}}, and are given by
<math display="block">\lambda_\pm = \frac{\operatorname{tr}(A)}{2} \pm \sqrt{\left (\frac{\operatorname{tr}(A)}{2}\right )^2 - |A|} .</math>
However, if there is degeneracy, the following formula is used, where ''f''′ is the derivative of ''f'':
<math display="block">f(A) = f \left( \frac{\operatorname{tr}(A)}{2} \right) I + \operatorname{adj} \left( \frac{\operatorname{tr}(A)}{2}I - A \right ) f' \left( \frac{\operatorname{tr}(A)}{2} \right) .</math>
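
A minimal sketch of the non-degenerate formula (NumPy/SciPy; the test matrix and the choice ''f''&nbsp;= exp are arbitrary):

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import expm

def f_2x2(A, f):
    """Sylvester's formula for a 2×2 matrix with distinct eigenvalues."""
    half_tr = np.trace(A) / 2
    s = np.sqrt(half_tr**2 - np.linalg.det(A))  # assumes a real, nonzero root
    lp, lm = half_tr + s, half_tr - s           # the eigenvalues λ±
    return ((f(lp) + f(lm)) / 2) * np.eye(2) + (
        (A - half_tr * np.eye(2)) / s) * ((f(lp) - f(lm)) / 2)

A = np.array([[1.0, 3.0], [2.0, 1.0]])
print(np.allclose(f_2x2(A, np.exp), expm(A)))   # True
</syntaxhighlight>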
 
=== Examples ===
* [[Matrix logarithm]]
* [[Matrix exponential]]
* [[Matrix sign function]]<ref>{{Cite web|last=Higham|first=Nick|date=2020-12-15|title=What Is the Matrix Sign Function?| url=https://nhigham.com/2020/12/15/what-is-the-matrix-sign-function/|access-date=2020-12-27|website=Nick Higham|language=en}}</ref>
 
== Classes of matrix functions ==
Using the [[Loewner order|semidefinite ordering]] (<math>X \preceq Y \Leftrightarrow Y - X</math> is [[positive-semidefinite matrix|positive-semidefinite]] and
<math> X \prec Y \Leftrightarrow Y - X </math> is [[positive-definite matrix|positive definite]]), some
of the classes of scalar functions can be extended to matrix functions of [[Hermitian matrix|Hermitian matrices]].<ref name="Bhatia">{{cite book | author=Bhatia, R. | title=Matrix Analysis |series=Graduate Texts in Mathematics | year = 1997 | publisher=Springer | volume=169}}</ref>

=== Operator monotone ===
A function {{mvar|f}} is called operator monotone if and only if
:<math display="block"> A \preceq H \Rightarrow f(A) \preceq f(H) </math>
for all self-adjoint matrices {{math|''A'',''H''}} with spectra in the ___domain of {{mvar|f}}.

=== Operator concave/convex ===
A function {{mvar|f}} is called operator concave if and only if
:<math display="block"> \tau f(A) + (1-\tau) f(H) \preceq f \left ( \tau A + (1-\tau)H \right ) </math>
for all self-adjoint matrices {{math|''A'',''H''}} with spectra in the ___domain of {{mvar|f}} and <math>\tau \in [0,1]</math>. This definition is analogous to a [[concave function|concave scalar function]]. An operator convex function can be defined by switching <math>\preceq</math> to <math>\succeq</math> in the definition above.
 
===Examples===
The matrix log is both operator monotone and operator concave. The matrix square is operator convex. The matrix exponential is none of these. '''Loewner's theorem''' states that a function on an ''open'' interval is operator monotone if and only if it has an analytic extension to the upper and lower complex half planes so that the upper half plane is mapped to itself.<ref name="Bhatia" />
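
These properties can be spot-checked numerically; a minimal sketch (NumPy/SciPy; one random sample pair, not a proof) of the operator monotonicity of the logarithm:

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(2)

def random_spd(n):
    """Random symmetric positive-definite matrix."""
    X = rng.standard_normal((n, n))
    return X @ X.T + n * np.eye(n)

A = random_spd(4)
H = A + random_spd(4)            # H - A is positive definite, so A ≺ H

# Operator monotonicity of log predicts logm(H) - logm(A) ⪰ 0
diff = (logm(H) - logm(A)).real  # real for symmetric positive-definite input
print(np.linalg.eigvalsh((diff + diff.T) / 2).min() > 0)   # True
</syntaxhighlight>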
 
==See also==
 
==References==
* {{cite book|last1=Higham|first1=Nicholas J.|title=Functions of Matrices: Theory and Computation|date=2008|publisher=Society for Industrial and Applied Mathematics|___location=Philadelphia|author-link=Nicholas Higham|isbn=9780898717778}}
 
{{Authority control}}
 
[[Category:Matrix theory]]