{{short description|Function that maps matrices to matrices}}
In [[mathematics]], every [[analytic function]] can be used for defining a '''matrix function''' that maps [[square matrix|square matrices]] with [[complex number|complex]] entries to square matrices of the same size.

This is used for defining the [[exponential of a matrix]], which is involved in the [[closed-form expression|closed-form]] solution of systems of [[linear differential equation]]s.
== Extending scalar function to matrix functions ==
=== Power series ===
If the analytic function {{mvar|f}} has the [[Taylor series|Taylor expansion]]
<math display="block">f(x) = c_0 + c_1 x + c_2 x^2 + \cdots ~,</math>
then a matrix function <math>A\mapsto f(A)</math> can be defined by substituting {{mvar|x}} by a square matrix: powers become [[matrix multiplication|matrix powers]], additions become matrix sums, and multiplications by coefficients become [[scalar multiplication]]s. If the scalar series converges for <math>|x| < r</math>, then the corresponding matrix series converges for matrices {{mvar|A}} such that <math>\|A\| < r</math> for some [[matrix norm]] satisfying <math>\|AB\|\leq\|A\|\,\|B\|</math>.
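As a numerical illustration of the power-series definition (a minimal sketch assuming NumPy and SciPy; the helper name <code>exp_series</code> is chosen here, not a library routine), the matrix exponential can be built by substituting a matrix into the Taylor series of <math>e^x</math> and compared against SciPy's reference implementation:

```python
import numpy as np
from scipy.linalg import expm

def exp_series(A, n_terms=30):
    """Matrix exponential from its power series: substitute the matrix
    into sum_k x^k / k!, taking powers as matrix powers."""
    result = np.zeros_like(A)
    term = np.eye(A.shape[0])
    for k in range(n_terms):
        result += term                # add the current A^k / k! term
        term = term @ A / (k + 1)     # next term: multiply by A / (k+1)
    return result

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
```

For the exponential the series converges for every matrix, since the scalar series has infinite radius of convergence; for a function with finite radius {{mvar|r}}, convergence requires <math>\|A\| < r</math> as stated above.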
===Diagonalizable matrices===
A square matrix {{mvar|A}} is [[diagonalizable matrix|diagonalizable]] if there is an [[invertible matrix]] {{mvar|P}} such that <math>D = P^{-1}\,A\,P</math> is a [[diagonal matrix]], that is, {{mvar|D}} has the shape
<math display="block">D = \begin{bmatrix}
d_1 & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & d_n
\end{bmatrix}~,</math>
such that <math>A = P~ D~ P^{-1}</math>. It is then natural to set
<math display="block">f(A) = P \begin{bmatrix}
f(d_1) & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & f(d_n)
\end{bmatrix} P^{-1}~.</math>
It can be verified that the matrix {{math|''f''(''A'')}} does not depend on a particular choice of {{mvar|P}}.
For example, suppose one is seeking the [[gamma function]] <math>\Gamma(A)</math> for
<math display="block">A = \begin{bmatrix}
1&3\\
2&1
\end{bmatrix}~.</math>
One has
<math display="block">A = P \begin{bmatrix}
1-\sqrt{6} & 0 \\
0 & 1+ \sqrt{6}
\end{bmatrix} P^{-1}~, </math>
for
<math display="block">P = \begin{bmatrix}
1/2 & 1/2 \\
-\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}}
\end{bmatrix} ~. </math>
Application of the formula then simply yields
<math display="block">\Gamma(A) =
\begin{bmatrix}
1/2 & 1/2 \\
-\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}}
\end{bmatrix} \cdot
\begin{bmatrix}
\Gamma(1-\sqrt{6}) & 0 \\
0&\Gamma(1+\sqrt{6})
\end{bmatrix} \cdot
\begin{bmatrix}
1 & -\frac{\sqrt{6}}{2} \\
1 & \frac{\sqrt{6}}{2}
\end{bmatrix} =
\begin{bmatrix}
\frac{\Gamma(1-\sqrt{6})+\Gamma(1+\sqrt{6})}{2} & \frac{\sqrt{6}\left(\Gamma(1+\sqrt{6})-\Gamma(1-\sqrt{6})\right)}{4} \\
\frac{\Gamma(1+\sqrt{6})-\Gamma(1-\sqrt{6})}{\sqrt{6}} & \frac{\Gamma(1-\sqrt{6})+\Gamma(1+\sqrt{6})}{2}
\end{bmatrix} ~.
</math>
Likewise,
<math display="block">A^4 =
\begin{bmatrix}
1/2 & 1/2 \\
-\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}}
\end{bmatrix} \cdot
\begin{bmatrix}
(1-\sqrt{6})^4 & 0 \\
0&(1+\sqrt{6})^4
\end{bmatrix} \cdot
\begin{bmatrix}
1 & -\frac{\sqrt{6}}{2} \\
1 & \frac{\sqrt{6}}{2}
\end{bmatrix} =
\begin{bmatrix} 73 & 84\\
56 & 73
\end{bmatrix} ~.
</math>
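The diagonalization recipe above can be checked numerically. The sketch below (assuming NumPy and SciPy are available; the helper name <code>matrix_function</code> is chosen here for illustration) applies a scalar function to a diagonalizable matrix through its eigendecomposition:

```python
import numpy as np
from scipy.special import gamma

def matrix_function(A, f):
    """Apply the scalar function f to a diagonalizable matrix A
    through A = P D P^{-1}, giving f(A) = P f(D) P^{-1}."""
    d, P = np.linalg.eig(A)
    return P @ np.diag(f(d)) @ np.linalg.inv(P)

A = np.array([[1.0, 3.0],
              [2.0, 1.0]])

# A^4 computed through the eigenvalues (1 +/- sqrt(6))^4
# agrees with direct matrix powering.
A4 = matrix_function(A, lambda x: x**4)

# Gamma(A) applies the gamma function to the eigenvalues 1 +/- sqrt(6).
GA = matrix_function(A, gamma)
```

The result does not depend on which eigenvector matrix <code>P</code> the solver returns, in line with the remark above that {{math|''f''(''A'')}} is independent of the choice of {{mvar|P}}.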
=== Jordan decomposition ===
{{main|Jordan normal form}}
All complex matrices, whether they are diagonalizable or not, have a [[Jordan normal form]] <math>A = P\,J\,P^{-1}</math>, where the matrix ''J'' consists of [[Jordan block]]s. Consider these blocks separately and apply the [[power series]] to a Jordan block:
<math display="block"> f \left( \begin{bmatrix}
\lambda & 1 & 0 & \cdots & 0 \\
0 & \lambda & 1 & \vdots & \vdots \\
0 & 0 & \ddots & \ddots & \vdots \\
\vdots & \vdots & \cdots & \lambda & 1 \\
0 & 0 & \cdots & 0 & \lambda
\end{bmatrix} \right)
=
\begin{bmatrix}
\frac{f(\lambda)}{0!} & \frac{f'(\lambda)}{1!} & \frac{f''(\lambda)}{2!} & \cdots & \frac{f^{(n-1)}(\lambda)}{(n-1)!} \\
0 & \frac{f(\lambda)}{0!} & \frac{f'(\lambda)}{1!} & \vdots & \frac{f^{(n-2)}(\lambda)}{(n-2)!} \\
0 & 0 & \ddots & \ddots & \vdots \\
\vdots & \cdots & \ddots & \frac{f(\lambda)}{0!} & \frac{f'(\lambda)}{1!} \\
0 & \cdots & \cdots & 0 & \frac{f(\lambda)}{0!}
\end{bmatrix}.</math>
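For a concrete check of the block formula, take {{math|1=''f'' = exp}} on a single 3×3 Jordan block: every derivative of the exponential at {{mvar|λ}} equals <math>e^\lambda</math>, so the formula predicts <math>e^\lambda</math> times an upper-triangular matrix of reciprocal factorials. A minimal sketch using SciPy's <code>expm</code> as the reference:

```python
import numpy as np
from scipy.linalg import expm

lam = 0.5
# A 3x3 Jordan block: lam on the diagonal, ones on the superdiagonal.
J = np.array([[lam, 1.0, 0.0],
              [0.0, lam, 1.0],
              [0.0, 0.0, lam]])

# For f = exp, f^{(k)}(lam) = exp(lam), so the block formula gives
# exp(lam) * [[1/0!, 1/1!, 1/2!], [0, 1/0!, 1/1!], [0, 0, 1/0!]].
f_J = np.exp(lam) * np.array([[1.0, 1.0, 0.5],
                              [0.0, 1.0, 1.0],
                              [0.0, 0.0, 1.0]])
```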
This definition can be used to extend the ___domain of the matrix function
beyond the set of matrices with [[spectral radius]] smaller than the radius of convergence of the power series.
Note that there is also a connection to [[Divided difference#Polynomials and power series|divided differences]].
==== Hermitian matrices ====
A [[Hermitian matrix]] has all real eigenvalues and can always be diagonalized by a [[unitary matrix]] P, according to the [[spectral theorem]].
In this case, the Jordan definition is natural. Moreover, this definition allows standard inequalities for real functions to be extended:
If <math> f(a) \leq g(a)</math> for all eigenvalues of <math>A</math>, then <math>f(A) \preceq g(A)</math>.
(As a convention, <math>X \preceq Y \Leftrightarrow Y - X </math> is a [[positive-semidefinite matrix]].)
The proof follows directly from the definition.
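A quick numerical illustration of this ordering (a sketch assuming NumPy and SciPy): since <math>e^x \geq x</math> for every real {{mvar|x}}, taking {{math|1=''f''(''x'') = ''x''}} and {{math|1=''g''(''x'') = ''e''{{sup|''x''}}}} makes <math>e^A - A</math> positive semidefinite for any Hermitian {{mvar|A}}:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2            # a real symmetric (hence Hermitian) matrix

# e^x >= x for all real x, so exp(A) - A is positive semidefinite:
# exp(A) and A share eigenvectors (both are functions of A), and the
# eigenvalues of the difference are exp(a_i) - a_i > 0.
diff = expm(A) - A
eigs = np.linalg.eigvalsh(diff)
```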
=== Cauchy integral ===
[[Cauchy's integral formula]] from [[complex analysis]] can also be used to generalize scalar functions to matrix functions. Cauchy's integral formula states that for any [[analytic function]] {{mvar|f}} defined on a set {{math|''D'' ⊂ ℂ}}, one has
<math display="block">f(x) = \frac{1}{2\pi i} \oint_C \frac{f(z)}{z-x} \,\mathrm{d}z ~,</math>
where {{mvar|C}} is a closed simple curve inside the ___domain {{mvar|D}} enclosing {{mvar|x}}.

Now, replace {{mvar|x}} by a matrix {{mvar|A}} and consider a path {{mvar|C}} inside {{mvar|D}} that encloses all [[eigenvalue]]s of {{mvar|A}}. One possibility to achieve this is to let {{mvar|C}} be a circle around the [[origin (mathematics)|origin]] with [[radius]] larger than {{math|‖''A''‖}} for an arbitrary [[matrix norm]] {{math|‖·‖}}. Then, {{math|''f''(''A'')}} is definable by
<math display="block">f(A) = \frac{1}{2\pi i} \oint_C f(z) \left(z I - A\right)^{-1} \mathrm{d}z ~.</math>
This integral can readily be evaluated numerically using the [[trapezium rule]], which [[Convergent series|converges]] exponentially in this case. That means that the [[precision (arithmetic)|precision]] of the result doubles when the number of nodes is doubled. In routine cases, this is bypassed by [[Sylvester's formula]].
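The contour integral and its trapezium-rule discretization can be sketched as follows (NumPy assumed; the helper name <code>matrix_function_cauchy</code> and the node count are choices made here for illustration). On a circle <math>z = re^{i\theta}</math> one has <math>\mathrm{d}z = iz\,\mathrm{d}\theta</math>, so each quadrature node contributes <math>f(z)\,z\,(zI-A)^{-1}</math>:

```python
import numpy as np

def matrix_function_cauchy(A, f, radius, n_nodes=64):
    """Approximate f(A) = (1/2 pi i) \oint f(z)(zI - A)^{-1} dz on a
    circle enclosing all eigenvalues of A, via the trapezium rule."""
    n = A.shape[0]
    total = np.zeros((n, n), dtype=complex)
    for k in range(n_nodes):
        theta = 2 * np.pi * k / n_nodes
        z = radius * np.exp(1j * theta)
        # dz = i z dtheta, so the node contributes f(z) z (zI - A)^{-1}
        total += f(z) * z * np.linalg.inv(z * np.eye(n) - A)
    return total / n_nodes

A = np.array([[1.0, 3.0],
              [2.0, 1.0]])
# Spectral radius is 1 + sqrt(6) ~ 3.45, so radius 5 encloses all eigenvalues.
F = matrix_function_cauchy(A, lambda z: z**4, radius=5.0)
```

For this analytic integrand the periodic trapezium rule converges exponentially in the number of nodes, as noted above; 64 nodes already reproduce <math>A^4</math> to machine-level accuracy.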
=== Matrix perturbations ===
The above Taylor power series allows the scalar <math>x</math> to be replaced by the matrix. This is not true in general when expanding in terms of <math>A(\eta) = A+\eta B</math> about <math>\eta = 0</math> unless <math>[A,B]=0</math>. A counterexample is <math>f(x) = x^{3}</math>, which has a finite-length Taylor series. We compute this in two ways:
* Distributive law: <math display="block">f(A + \eta B) = (A+\eta B)^{3} = A^{3} + \eta(A^{2}B + ABA + BA^{2}) + \eta^{2}(AB^{2} + BAB + B^{2}A) + \eta^{3}B^{3}</math>
* Using scalar Taylor expansion for <math>f(a+\eta b)</math> and replacing scalars with matrices at the end:
<math display="block">\begin{align}
f(a+\eta b) &= f(a) + f'(a)\frac{\eta b}{1!} + f''(a)\frac{(\eta b)^{2}}{2!} + f'''(a)\frac{(\eta b)^{3}}{3!} \\[.5em]
&= a^{3} + 3a^{2}(\eta b) + 3a(\eta b)^{2} + (\eta b)^{3} \\[.5em]
&\to A^{3} + 3A^{2}(\eta B) + 3A(\eta B)^{2} + (\eta B)^{3}
\end{align}</math>
The scalar expression assumes commutativity while the matrix expression does not, and thus they cannot be equated directly unless <math>[A,B]=0</math>. For some ''f''(''x'') this can be dealt with using the same method as scalar Taylor series. For example, <math display="inline">f(x) = \frac{1}{x}</math>. If <math>A^{-1}</math> exists then <math>f(A+\eta B) = f(\mathbb{I} + \eta A^{-1}B)f(A)</math>. The expansion of the first term then follows the power series given above,
<math display="block">f(\mathbb{I} + \eta A^{-1}B) = \mathbb{I} - \eta A^{-1}B + \left(-\eta A^{-1}B\right)^{2} + \cdots = \sum_{n=0}^{\infty} \left(-\eta A^{-1}B\right)^{n} ~.</math>
The convergence criteria of the power series then apply, requiring <math>\Vert \eta A^{-1}B \Vert</math> to be sufficiently small under the appropriate matrix norm. For more general problems, which cannot be rewritten in such a way that the two matrices commute, the ordering of matrix products produced by repeated application of the Leibniz rule must be tracked.
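A numerical sketch of this rearrangement for <math display="inline">f(x) = \frac{1}{x}</math> (NumPy assumed; the matrices, perturbation size, and truncation length are arbitrary choices made here): summing the Neumann series for <math>(\mathbb{I} + \eta A^{-1}B)^{-1}</math> and multiplying by <math>A^{-1}</math> reproduces the exact inverse when <math>\Vert \eta A^{-1}B \Vert < 1</math>:

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))   # well-conditioned A
B = rng.standard_normal((3, 3))
eta = 0.01

# (A + eta B)^{-1} = (I + eta A^{-1} B)^{-1} A^{-1};
# expand the first factor as the Neumann series sum (-eta A^{-1} B)^n.
Ainv = np.linalg.inv(A)
C = eta * Ainv @ B
series = np.zeros_like(A)
term = np.eye(3)
for _ in range(30):          # sum (-C)^n for n = 0..29
    series += term
    term = term @ (-C)

approx = series @ Ainv
exact = np.linalg.inv(A + eta * B)
```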
=== Arbitrary function of a 2×2 matrix ===
An arbitrary function ''f''(''A'') of a 2×2 matrix ''A'' has its [[Sylvester's formula]] simplify to
<math display="block">f(A) = \frac{f(\lambda_+) + f(\lambda_-)}{2} I + \frac{A - \left (\frac{tr(A)}{2}\right )I}{\sqrt{\left (\frac{tr(A)}{2}\right )^2 - |A|}} \frac{f(\lambda_+) - f(\lambda_-)}{2} ~,</math>
where <math>\lambda_\pm</math> are the eigenvalues of its characteristic equation, {{math|1={{!}}''A'' − ''λI''{{!}} = 0}}, and are given by
<math display="block">\lambda_\pm = \frac{tr(A)}{2} \pm \sqrt{\left (\frac{tr(A)}{2}\right )^2 - |A|} .</math>
However, if there is degeneracy, the following formula is used, where ''f''&prime; is the derivative of ''f'':
<math display="block">f(A) = f \left( \frac{tr(A)}{2} \right) I + \mathrm{adj} \left( \frac{tr(A)}{2}I - A \right ) f' \left( \frac{tr(A)}{2} \right) .</math>
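The non-degenerate formula can be sketched numerically (NumPy/SciPy assumed; <code>f_2x2</code> is a name chosen here for illustration). It reproduces both the matrix exponential and the <math>A^4</math> value from the diagonalization example:

```python
import numpy as np
from scipy.linalg import expm

def f_2x2(A, f):
    """Sylvester's formula for a 2x2 matrix with distinct eigenvalues.
    (The degenerate case needs the derivative formula instead.)"""
    half_tr = np.trace(A) / 2
    # s = sqrt((tr A / 2)^2 - det A); +0j allows complex eigenvalues
    s = np.sqrt(half_tr**2 - np.linalg.det(A) + 0j)
    lam_p, lam_m = half_tr + s, half_tr - s
    I = np.eye(2)
    return ((f(lam_p) + f(lam_m)) / 2 * I
            + (A - half_tr * I) / s * (f(lam_p) - f(lam_m)) / 2)

A = np.array([[1.0, 3.0],
              [2.0, 1.0]])
```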
=== Examples ===
* [[Matrix logarithm]]
* [[Matrix exponential]]
* [[Matrix sign function]]<ref>{{Cite web|last=Higham|first=Nick|date=2020-12-15|title=What Is the Matrix Sign Function?| url=https://nhigham.com/2020/12/15/what-is-the-matrix-sign-function/|access-date=2020-12-27|website=Nick Higham|language=en}}</ref>
== Classes of matrix functions ==
Using the [[Loewner order|semidefinite ordering]] (<math>X \preceq Y \Leftrightarrow Y - X</math> is [[positive-semidefinite matrix|positive-semidefinite]] and
<math> X \prec Y \Leftrightarrow Y - X </math> is [[positive-definite matrix|positive definite]]), some
of the classes of scalar functions can be extended to matrix functions of [[Hermitian matrix|Hermitian matrices]].<ref name="Bhatia">{{cite book | author=Bhatia, R. | title=Matrix Analysis |series=Graduate Texts in Mathematics | year = 1997 | publisher=Springer | volume=169}}</ref>
=== Operator concave/convex ===
A function {{mvar|f}} is called operator concave if and only if
<math display="block">\tau f(A) + (1-\tau) f(H) \preceq f \left( \tau A + (1-\tau)H \right)</math>
for all self-adjoint matrices {{math|''A'',''H''}} with spectra in the ___domain of {{mvar|f}} and <math>\tau \in [0,1]</math>. This definition is analogous to a [[concave function|concave scalar function]]. An operator convex function can be defined by switching <math>\preceq</math> to <math>\succeq</math> in the definition above.
===Examples===
The matrix logarithm is both operator monotone and operator concave.
==See also==
*[[Algebraic Riccati equation]]
*[[Sylvester's formula]]
*[[Loewner order]]
*[[Matrix calculus]]
*[[Trace inequalities]]
==References==
* {{cite book|last1=Higham|first1=Nicholas J.|title=Functions of Matrices: Theory and Computation|date=2008|publisher=Society for Industrial and Applied Mathematics|___location=Philadelphia|author-link=Nicholas Higham|isbn=9780898717778}}
{{Authority control}}
[[Category:Matrix theory]]