{{Short description|Tensor related to gradients}}
{{Use American English|date = March 2019}}
{{FeatureDetectionCompVisNavbox}}
In mathematics, the '''structure [[tensor]]''', also referred to as the '''second-moment matrix''', is a [[matrix (mathematics)|matrix]] derived from the [[gradient]] of a [[function (mathematics)|function]]. It summarizes the predominant directions of the gradient in a specified neighborhood of a point, and the degree to which those directions are coherent. The structure tensor is often used in [[image processing]] and [[computer vision]].<ref name=bigun86>
J. Bigun and G. Granlund (1986), ''Optimal Orientation Detection of Linear Symmetry''. Tech. Report LiTH-ISY-I-0828, Computer Vision Laboratory, Linkoping University, Sweden 1986; Thesis Report, Linkoping studies in science and technology No. 85, 1986.
</ref><ref name=bigun87>
J. Bigun and G. Granlund (1987), "Optimal Orientation Detection of Linear Symmetry", ''First International Conference on Computer Vision, ICCV'', London, pages 433–438, IEEE Computer Society Press.
</ref><ref name=knutsson89>
H. Knutsson (1989), "Representing local structure using tensors", ''Proceedings 6th Scandinavian Conference on Image Analysis'', Oulu, Finland, pages 244–251.
</ref>
==The 2D structure tensor==
===Continuous version===
For a function <math>I</math> of two variables {{nowrap|1=''p'' = (''x'', ''y'')}}, the structure tensor is the 2×2 matrix
<math display="block">
S_w(p) =
\begin{bmatrix}
\int w(r) (I_x(p-r))^2\,dr & \int w(r) I_x(p-r)I_y(p-r)\,dr \\[10pt]
\int w(r) I_x(p-r)I_y(p-r)\,dr & \int w(r) (I_y(p-r))^2\,dr
\end{bmatrix}
</math>
where <math>I_x</math> and <math>I_y</math> are the [[partial derivative]]s of <math>I</math> with respect to ''x'' and ''y''; the integrals range over the plane <math>\mathbb{R}^2</math>; and ''w'' is some fixed "window function" (such as a [[Gaussian blur]]), a [[distribution (mathematics)|distribution]] on two variables.
The formula above can also be written as <math display="inline">S_w(p) = \int w(r) S_0(p-r) \, dr</math>, where <math>S_0</math> is the matrix-valued function defined by
<math display="block">
S_0(p)=
\begin{bmatrix}
(I_x(p))^2 & I_x(p)I_y(p) \\[10pt]
I_x(p)I_y(p) & (I_y(p))^2
\end{bmatrix}
</math>
If the [[gradient]] <math>\nabla I = (I_x,I_y)^\text{T}</math> of <math>I</math> is viewed as a 2×1 (single-column) matrix, where <math>(\cdot)^\text{T}</math> denotes the [[transpose]] operator, the matrix <math>S_0</math> can be written as the [[matrix product]] <math>(\nabla I)(\nabla I)^\text{T}</math>, also known as the [[outer product]] or tensor product.
===Discrete version===
In image processing and other similar applications, the function <math>I</math> is usually given as a discrete [[array data structure|array]] of samples <math>I[p]</math>, where ''p'' is a pair of integer indices. The 2D structure tensor within a given [[window function|window]] is then the 2×2 matrix
<math display="block">
S_w[p] =
\begin{bmatrix}
\sum_r w[r] (I_x[p-r])^2 & \sum_r w[r] I_x[p-r]I_y[p-r] \\[10pt]
\sum_r w[r] I_x[p-r]I_y[p-r] & \sum_r w[r] (I_y[p-r])^2
\end{bmatrix}
</math>
Here the summation index ''r'' ranges over a finite set of index pairs (the "window", typically <math>\{-m,\ldots,+m\} \times \{-m,\ldots,+m\}</math> for some ''m''), and ''w''[''r''] is a fixed "window weight" that depends on ''r'', such that the sum of all weights is 1.
The formula of the structure tensor can also be written as <math display="inline">S_w[p]=\sum_r w[r] S_0[p-r]</math>, where <math>S_0</math> is the matrix-valued array such that
<math display="block">
S_0[p] =
\begin{bmatrix}
(I_x[p])^2 & I_x[p]I_y[p] \\[10pt]
I_x[p]I_y[p] & (I_y[p])^2
\end{bmatrix}
</math>
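For illustration, the following is a minimal NumPy/SciPy sketch of this computation with a Gaussian window; the function name and the default scale are illustrative choices, not part of any standard API:

<syntaxhighlight lang="python">
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor_2d(image, sigma=1.5):
    """Return the distinct components (S_xx, S_xy, S_yy) of S_w."""
    I = image.astype(float)
    Iy, Ix = np.gradient(I)   # derivatives along y (axis 0) and x (axis 1)
    # Components of S_0 = (grad I)(grad I)^T, averaged with the Gaussian
    # window w, i.e. the convolution S_w = w * S_0.
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    return Sxx, Sxy, Syy
</syntaxhighlight>

Since the tensor is symmetric, only the three distinct components need to be stored.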
===Interpretation===
The importance of the 2D structure tensor <math>S_w</math> stems from the fact that its [[eigenvalue]]s <math>\lambda_1,\lambda_2</math> (which can be ordered so that <math>\lambda_1 \geq \lambda_2\geq 0</math>) and the corresponding [[eigenvector]]s <math>e_1,e_2</math> summarize the distribution of the [[gradient]] <math>\nabla I = (I_x,I_y)</math> of <math>I</math> within the window defined by <math>w</math> centered at <math>p</math>.<ref name=bigun86/><ref name=bigun87/><ref name=knutsson89/>
Namely, if <math>\lambda_1 > \lambda_2</math>, then <math>e_1</math> (or <math>-e_1</math>) is the direction that is maximally aligned with the gradient within the window.
In particular, if <math>\lambda_1 > 0, \lambda_2 = 0</math> then the gradient is always a multiple of <math>e_1</math> (positive, negative or zero); this is the case if and only if <math>I</math> within the window varies along the direction <math>e_1</math> but is constant along <math>e_2</math>. This condition of eigenvalues is also called the linear symmetry condition, because then the iso-curves of <math>I</math> consist of parallel lines, i.e. there exists a one-dimensional function <math>g</math> that can generate the two-dimensional function <math>I</math> as <math>I(x,y)=g(d^\text{T}p)</math> for some constant vector <math>d=(d_x,d_y)^\text{T}</math> and the coordinates <math>p=(x,y)^\text{T}</math>.
If <math>\lambda_1 = \lambda_2</math>, on the other hand, the gradient in the window has no predominant direction, which happens, for instance, when the image has [[rotational symmetry]] within that window. This condition of eigenvalues is also called the balanced body, or directional equilibrium, condition because it holds when all gradient directions in the window are equally frequent/probable.
Furthermore, the condition <math>\lambda_1 = \lambda_2 = 0</math> happens if and only if the function <math>I</math> is constant (<math>\nabla I = (0,0)</math>) within <math>W</math>.
More generally, the value of <math>\lambda_k </math>, for ''k''=1 or ''k''=2, is the <math>w</math>-weighted average, in the neighborhood of ''p'', of the square of the [[directional derivative]] of <math>I</math> along <math>e_k</math>. The relative discrepancy between the two eigenvalues of <math>S_w</math> is therefore an indicator of the degree of [[anisotropy]] of the gradient in the window, namely how strongly it is biased towards a particular direction (and its opposite).<ref name="Jahne1993">B. Jähne (1993), ''Spatio-Temporal Image Processing: Theory and Scientific Applications'', Lecture Notes in Computer Science, Vol. 751, Springer-Verlag, Berlin.</ref><ref name=MedioniEA>G. Medioni, M. Lee and C. Tang (2000), ''A Computational Framework for Feature Extraction and Segmentation'', Elsevier Science.</ref>
This attribute can be quantified by the '''coherence''', defined as
<math display="block">c_w=\left(\frac{\lambda_1-\lambda_2}{\lambda_1+\lambda_2}\right)^2</math>
if <math>\lambda_2>0</math>. This quantity is 1 when the gradient is totally aligned, and 0 when it has no preferred direction. The formula is undefined, even in the limit, when the image is constant in the window (<math>\lambda_1=\lambda_2=0</math>); some authors define it as 0 in that case.
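The eigenvalue analysis above can be carried out in closed form for the symmetric 2×2 tensor. A minimal sketch (the helper name is illustrative; the coherence follows the squared-ratio definition above, set to 0 where the tensor vanishes):

<syntaxhighlight lang="python">
import numpy as np

def eigen_analysis_2d(Sxx, Sxy, Syy):
    """Eigenvalues, dominant orientation and coherence of the 2D tensor."""
    trace = Sxx + Syy
    delta = np.sqrt((Sxx - Syy) ** 2 + 4.0 * Sxy ** 2)   # lam1 - lam2
    lam1 = 0.5 * (trace + delta)                         # larger eigenvalue
    lam2 = 0.5 * (trace - delta)                         # smaller eigenvalue
    phi = 0.5 * np.arctan2(2.0 * Sxy, Sxx - Syy)         # angle of e_1
    # Coherence c_w; the nested where avoids division by zero.
    coherence = np.where(trace > 0.0,
                         (delta / np.where(trace > 0.0, trace, 1.0)) ** 2,
                         0.0)
    return lam1, lam2, phi, coherence
</syntaxhighlight>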
Note that the average of the gradient <math>\nabla I</math> inside the window is '''not''' a good indicator of anisotropy. Aligned but oppositely oriented gradient vectors would cancel out in this average, whereas in the structure tensor they are properly added together. This is why <math>(\nabla I)(\nabla I)^\text{T}</math> is used in the averaging of the structure tensor to optimize the direction estimate, instead of <math>\nabla I</math>.
By expanding the effective radius of the window function <math>w</math> (that is, increasing its variance), one can make the structure tensor more robust in the face of noise, at the cost of diminished spatial resolution.<ref name=MedioniEA /><ref name=lin94book>T. Lindeberg (1994), ''Scale-Space Theory in Computer Vision'', Kluwer Academic Publishers, Dordrecht.</ref>
===Complex version===
The interpretation and implementation of the 2D structure tensor becomes particularly accessible using [[complex number]]s.<ref name=bigun87/> The structure tensor consists of 3 real numbers
<math display="block">
S_w(p) =
\begin{bmatrix}
\mu_{20} & \mu_{11} \\[10pt]
\mu_{11} & \mu_{02}
\end{bmatrix}
</math>
where <math display="inline">\mu_{20} =\int w(r) (I_x(p-r))^2\,dr</math>, <math display="inline">\mu_{02} =\int w(r) (I_y(p-r))^2\,dr</math>, and <math display="inline">\mu_{11} =\int w(r)\, I_x(p-r)I_y(p-r)\,dr</math>, in which the integrals can be replaced by summations for the discrete representation. Using [[Parseval's identity]] it is clear that the three real numbers are the second order moments of the power spectrum of <math>I</math>. The following second order complex moment of the power spectrum of <math>I</math> can then be written as
<math display="block">
\kappa_{20} = \mu_{20}-\mu_{02}+i2\mu_{11} = \int w(r) \left(I_x(p-r)+iI_y(p-r)\right)^2 \,dr = (\lambda_1-\lambda_2)\exp(i2\phi)
</math>
where <math>i=\sqrt{-1}</math> and <math>\phi</math> is the direction angle of the most significant eigenvector of the structure tensor, <math>\phi=\angle{e_1}</math>, whereas <math>\lambda_1</math> and <math>\lambda_2</math> are the most and the least significant eigenvalues. From this it follows that <math>\kappa_{20}</math> contains both a certainty <math>|\kappa_{20}|=\lambda_1-\lambda_2</math> and the optimal direction in double-angle representation, since it is a complex number consisting of two real numbers.
Likewise the following second order complex moment of the power spectrum of <math>I</math>, which happens to be always real because <math>I</math> is real,
<math display="block">
\kappa_{11} = \mu_{20}+\mu_{02} = \int w(r) \left|I_x(p-r)+iI_y(p-r)\right|^2 \,dr = \lambda_1+\lambda_2
</math>
can be obtained, with <math>\lambda_1</math> and <math>\lambda_2</math> being the eigenvalues as before. Notice that this time the magnitude of the complex gradient is squared (which is always real).
However, decomposing the structure tensor into its eigenvectors yields its tensor components as
<math display="block">
S_w(p) =
\lambda_1 e_1e_1^\text{T}+
\lambda_2 e_2e_2^\text{T} =(\lambda_1 -\lambda_2)e_1e_1^\text{T}+
\lambda_2(e_1e_1^\text{T}+e_2e_2^\text{T}) =(\lambda_1 -\lambda_2)e_1e_1^\text{T}+
\lambda_2 E
</math>
where <math>E</math> is the identity matrix in 2D, because the two eigenvectors are always orthogonal (and sum to unity). The first term in the last expression of the decomposition, <math>(\lambda_1 -\lambda_2)e_1e_1^\text{T}</math>, represents the linear symmetry component of the structure tensor containing all directional information (as a rank-1 matrix), whereas the second term represents the balanced body component of the tensor, which lacks directional information (containing the identity matrix <math>E</math>).
Evidently, <math>\kappa_{20}</math> is the complex equivalent of the first term in the tensor decomposition, whereas <math display="inline">\tfrac{1}{2}(\kappa_{11}-|\kappa_{20}|)=\lambda_2</math> is the equivalent of the second term. Thus the two scalars, comprising three real numbers,
<math display="block">
\begin{align}
\kappa_{20} &= (\lambda_1-\lambda_2)\exp(i2\phi) = w*(h*I)^2 \\
\kappa_{11} &= \lambda_1+\lambda_2 = w*|h*I|^2
\end{align}
</math>
where <math>h(x,y)=(x+iy)\exp(-(x^2+y^2)/(2\sigma^2)) </math> is the (complex) gradient filter, and <math>*</math> is convolution, constitute a complex representation of the 2D structure tensor. As discussed here and elsewhere, <math>w</math> defines the local window, which is usually a Gaussian (with a certain variance defining the outer scale), and <math>\sigma </math> is the (inner scale) parameter determining the effective frequency range in which the orientation <math>2\phi</math> is to be estimated.
The elegance of the complex representation stems from the fact that the two components of the structure tensor can be obtained as averages and independently. In turn, this means that <math>\kappa_{20}</math> and <math>\kappa_{11}</math> can be used in a [[scale space]] representation to describe the evidence for the presence of a unique orientation and the evidence for the alternative hypothesis, the presence of multiple balanced orientations, without computing the eigenvectors and eigenvalues.
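As a sketch of this approach, assuming Gaussian derivative filters for the complex gradient and illustrative inner/outer scale values:

<syntaxhighlight lang="python">
import numpy as np
from scipy.ndimage import gaussian_filter

def complex_structure_tensor(image, sigma_inner=1.0, sigma_outer=3.0):
    """kappa_20 = w*(h*I)^2 and kappa_11 = w*|h*I|^2 via the complex gradient."""
    I = image.astype(float)
    # Complex gradient f = h*I, realized with Gaussian derivatives at the
    # inner scale (order=(0, 1) differentiates along x, order=(1, 0) along y).
    fx = gaussian_filter(I, sigma_inner, order=(0, 1))
    fy = gaussian_filter(I, sigma_inner, order=(1, 0))
    f = fx + 1j * fy
    # Averaging with the outer-scale window w; the real and imaginary parts
    # of f^2 are filtered separately.
    kappa20 = (gaussian_filter(f.real ** 2 - f.imag ** 2, sigma_outer)
               + 1j * gaussian_filter(2.0 * f.real * f.imag, sigma_outer))
    kappa11 = gaussian_filter(np.abs(f) ** 2, sigma_outer)
    return kappa20, kappa11
</syntaxhighlight>

Here <math>\angle\kappa_{20}/2</math> estimates the orientation <math>\phi</math>, and <math>|\kappa_{20}|/\kappa_{11}</math> gives the (unsquared) coherence <math>(\lambda_1-\lambda_2)/(\lambda_1+\lambda_2)</math>, without any eigendecomposition.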
The complex representation of the structure tensor is frequently used in fingerprint analysis to obtain direction maps containing certainties, which in turn are used to enhance the fingerprint images, to find the locations of the global (cores and deltas) and local (minutia) singularities, and to evaluate the quality of the fingerprints automatically.
==The 3D structure tensor==
===Definition===
The structure tensor can also be defined for a function <math>I</math> of three variables ''p''=(''x'',''y'',''z'') in an entirely analogous way. Namely, in the continuous version we have <math display="inline">S_w(p) = \int w(r)\, S_0(p-r)\, dr</math>, where
<math display="block">
S_0(p) =
\begin{bmatrix}
(I_x(p))^2 & I_x(p)I_y(p) & I_x(p)I_z(p) \\[10pt]
I_x(p)I_y(p) & (I_y(p))^2 & I_y(p)I_z(p) \\[10pt]
I_x(p)I_z(p) & I_y(p)I_z(p) & (I_z(p))^2
\end{bmatrix}
</math>
where <math>I_x,I_y,I_z</math> are the three partial derivatives of <math>I</math>, and the integral ranges over <math>\mathbb{R}^3</math>.
In the discrete version, <math display="inline">S_w[p] = \sum_r w[r]\, S_0[p-r]</math>, where
<math display="block">
S_0[p] =
\begin{bmatrix}
(I_x[p])^2 & I_x[p]I_y[p] & I_x[p]I_z[p] \\[10pt]
I_x[p]I_y[p] & (I_y[p])^2 & I_y[p]I_z[p] \\[10pt]
I_x[p]I_z[p] & I_y[p]I_z[p] & (I_z[p])^2
\end{bmatrix}
</math>
and the sum ranges over a finite set of 3D indices, usually <math>\{-m,\ldots,+m\} \times \{-m,\ldots,+m\} \times \{-m,\ldots,+m\}</math> for some ''m''.
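A minimal sketch of the discrete 3D computation, analogous to the 2D case (the function name and scale are illustrative):

<syntaxhighlight lang="python">
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor_3d(volume, sigma=1.5):
    """Return S_w as an array of shape volume.shape + (3, 3)."""
    V = volume.astype(float)
    grads = np.gradient(V)               # derivatives along the three axes
    S = np.empty(V.shape + (3, 3))
    for i in range(3):
        for j in range(3):
            # w-weighted average of the outer-product component I_i I_j
            S[..., i, j] = gaussian_filter(grads[i] * grads[j], sigma)
    return S
</syntaxhighlight>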
===Interpretation===
As in the two-dimensional case, the eigenvalues <math>\lambda_1,\lambda_2,\lambda_3</math> of <math>S_w[p]</math>, and the corresponding eigenvectors <math>e_1,e_2,e_3</math>, summarize the distribution of gradient directions within the neighborhood of ''p'' defined by the window <math>w</math>. This information can be visualized as an [[ellipsoid]] whose semi-axes are equal to the eigenvalues and directed along their corresponding eigenvectors.
[[File:STgeneric.png|thumb|center|240px|Ellipsoidal representation of the 3D structure tensor.]]
In particular, if the ellipsoid is stretched along one axis only, like a cigar (that is, if <math>\lambda_1</math> is much larger than both <math>\lambda_2</math> and <math>\lambda_3</math>), it means that the gradient in the window is predominantly aligned with the direction <math>e_1</math>, so that the [[isosurface]]s of <math>I</math> tend to be flat and perpendicular to that vector.
{| cellborder=0px border=0px style="margin: 1em auto;"
|- valign=top
| [[File:STsurfel.png|thumb|180px|The structure tensor ellipsoid of a surface-like neighborhood ("[[surfel]]"), where <math>\lambda_1 >\!> \lambda_2 \approx \lambda_3</math>.]]||[[File:StepPlane3D.png|thumb|180px|A 3D window straddling a smooth boundary surface between two uniform regions of a 3D image.]]||[[File:StepPlane3DST.png|thumb|180px|The corresponding structure tensor ellipsoid.]]
|}
If the ellipsoid is flattened in one direction only, like a pancake (that is, if <math>\lambda_3</math> is much smaller than both <math>\lambda_1</math> and <math>\lambda_2</math>), it means that the gradient directions are spread out but perpendicular to <math>e_3</math>; so that the isosurfaces tend to be like tubes parallel to that vector.
{| cellborder=0px border=0px style="margin: 1em auto;"
|- valign=top
| [[File:STcurvel.png|thumb|180px|The structure tensor of a line-like neighborhood ("curvel"), where <math>\lambda_1 \approx \lambda_2 >\!> \lambda_3</math>.]]||[[File:curve3D.png|thumb|180px|A 3D window straddling a line-like feature of a 3D image.]]||[[File:curve3DST.png|thumb|180px|The corresponding structure tensor ellipsoid.]]
|}
Finally, if the ellipsoid is roughly spherical (that is, if <math>\lambda_1\approx\lambda_2\approx\lambda_3</math>), it means that the gradient directions in the window are more or less evenly distributed, with no marked preference; so that the function <math>I</math> is mostly isotropic in that neighborhood.
{| cellborder=0px border=0px style="margin: 1em auto;"
|- valign=top
| [[File:STball.png|thumb|180px|The structure tensor in an isotropic neighborhood, where <math>\lambda_1\approx\lambda_2\approx\lambda_3</math>.]]||[[File:Sphere3D.png|thumb|180px|A 3D window containing a spherical feature of a 3D image.]]||[[File:Sphere3DST.png|thumb|180px|The corresponding structure tensor ellipsoid.]]
|}
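The three cases above can be distinguished programmatically from the sorted eigenvalues. In this sketch the threshold ratio is an illustrative heuristic, not a standard value:

<syntaxhighlight lang="python">
import numpy as np

def classify_voxels(S, ratio=5.0):
    """Heuristic surfel/curvel/isotropic labels from the eigenvalues of S."""
    lam = np.linalg.eigvalsh(S)[..., ::-1]   # descending: lam1 >= lam2 >= lam3
    l1, l2, l3 = lam[..., 0], lam[..., 1], lam[..., 2]
    tiny = np.finfo(float).tiny              # guard against division by zero
    surfel = l1 > ratio * np.maximum(l2, tiny)               # cigar-like ellipsoid
    curvel = ~surfel & (l2 > ratio * np.maximum(l3, tiny))   # pancake-like ellipsoid
    isotropic = ~surfel & ~curvel                            # roughly spherical
    return surfel, curvel, isotropic
</syntaxhighlight>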
==The multi-scale structure tensor==
The structure tensor is an important tool in [[scale space]] analysis.
One scale parameter, referred to as ''local scale'' <math>t</math>, is needed for determining the amount of pre-smoothing when computing the image gradient <math>(\nabla I)(x; t)</math>. Another scale parameter, referred to as ''integration scale'' <math>s</math>, is needed for specifying the spatial extent of the window function <math>w(\xi; s)</math> that determines the weights for the region in space over which the components of the outer product of the gradient by itself <math>(\nabla I)(\nabla I)^\text{T}</math> are accumulated.
More precisely, suppose that <math>I</math> is a real-valued signal defined over <math>\mathbb{R}^k</math>. For any local scale <math>t > 0</math>, let a multi-scale representation <math>I(x; t)</math> of this signal be given by <math>I(x; t) = h(x; t)*I(x)</math> where <math>h(x; t)</math> represents a pre-smoothing kernel. Furthermore, let <math>(\nabla I)(x; t)</math> denote the gradient of the [[scale space representation]].
Then, the ''multi-scale structure tensor/second-moment matrix'' is defined by<ref name=lin94book/><ref name=lingar97>{{cite journal
| name-list-style=amp
| author1=T. Lindeberg
| author2=J. Garding
| title=Shape-adapted smoothing in estimation of 3-D depth cues from affine distortions of local 2-D structure
| journal=Image and Vision Computing
| year=1997
| volume=15
| issue=6
| pages=415–434
| url=http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A472972&dswid=7790
| doi=10.1016/S0262-8856(97)01144-X
}}</ref><ref name=garlin96>J. Garding and T. Lindeberg (1996), "Direct computation of shape cues using scale-adapted spatial derivative operators", ''International Journal of Computer Vision'', volume 17, issue 2, pages 163–191.</ref>
<math display="block">
\mu(x; t, s) =
\int_{\xi \in \mathbb{R}^k}
(\nabla I)(x-\xi; t) \, (\nabla I)^\text{T}(x-\xi; t) \,
w(\xi; s) \, d\xi
</math>
Conceptually, one may ask whether it would be sufficient to use any self-similar families of smoothing functions <math>h(x; t)</math> and <math>w(\xi; s)</math>. If, however, one were to naively apply, for example, a box filter, then undesirable artifacts could easily occur. If one wants the multi-scale structure tensor to be well-behaved over both increasing local scales <math>t</math> and increasing integration scales <math>s</math>, then it can be shown that both the smoothing function and the window function ''have to'' be Gaussian.<ref name=lin94book/> The conditions that specify this uniqueness are similar to the [[scale-space axioms]] that are used for deriving the uniqueness of the Gaussian kernel for a regular Gaussian [[scale space]] of image intensities.
A conceptually similar construction can be performed for discrete signals, with the convolution integral replaced by a convolution sum and with the continuous Gaussian kernel <math> g(x; t)</math> replaced by the [[discrete Gaussian kernel]] <math>T(n; t)</math>:
<math display="block">
\mu(x; t, s) =
\sum_{n \in \mathbb{Z}^k}
(\nabla I)(x-n; t) \, (\nabla I)^\text{T}(x-n; t) \,
w(n; s)
</math>
When quantizing the scale parameters <math>t</math> and <math>s</math> in an actual implementation, a finite geometric progression <math>\alpha^i</math> is usually used, with ''i'' ranging from 0 to some maximum scale index ''m''.
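A sketch of such a multi-scale computation follows; the coupling <math>s = \gamma t</math> between integration scale and local scale is one common heuristic rather than a prescription from the references, and since the scale parameters denote variances, the Gaussian standard deviations are their square roots:

<syntaxhighlight lang="python">
import numpy as np
from scipy.ndimage import gaussian_filter

def multiscale_structure_tensor(image, t0=1.0, alpha=2.0, m=4, gamma=4.0):
    """mu(x; t, s) for local scales t = t0 * alpha**i, i = 0..m, with s = gamma*t."""
    I = image.astype(float)
    results = []
    for i in range(m + 1):
        t = t0 * alpha ** i          # local (differentiation) scale, a variance
        s = gamma * t                # integration scale, a variance
        # Gradient of the scale-space representation at local scale t
        Ix = gaussian_filter(I, np.sqrt(t), order=(0, 1))
        Iy = gaussian_filter(I, np.sqrt(t), order=(1, 0))
        # Components of mu, accumulated with the window w at scale s
        mu = (gaussian_filter(Ix * Ix, np.sqrt(s)),
              gaussian_filter(Ix * Iy, np.sqrt(s)),
              gaussian_filter(Iy * Iy, np.sqrt(s)))
        results.append((t, s, mu))
    return results
</syntaxhighlight>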
==Applications==
The eigenvalues of the structure tensor play a significant role in many image processing algorithms, for problems like [[corner detection]], [[interest point detection]], and [[feature tracking]],<ref name="Medioni">G. Medioni, M. Lee and C. Tang (2000), ''A Computational Framework for Feature Extraction and Segmentation'', Elsevier Science.</ref><ref>
{{cite journal|author=W. Förstner|title=A Feature Based Correspondence Algorithm for Image Processing|journal=International Archives of Photogrammetry and Remote Sensing|volume=26|pages=150–166|year=1986}}
</ref><ref>
{{cite conference|author1=C. Kenney |author2=M. Zuliani |author3=B. Manjunath |title=An axiomatic approach to corner detection|conference=IEEE Conference on Computer Vision and Pattern Recognition|pages=191–197|year=2005}}
</ref> for [[diffusion-based image processing]],<ref>[http://www.mia.uni-saarland.de/weickert/book.html J. Weickert (1998), Anisotropic diffusion in image processing, Teubner Verlag, Stuttgart.]</ref> and for several other image processing problems. The structure tensor can also be applied in [[geology]] to filter [[Seismology|seismic]] data.<ref>{{Cite journal| last1=Yang|first1=Shuai| last2=Chen|first2=Anqing| last3=Chen|first3=Hongde| date=2017-05-25|title=Seismic data filtering using non-local means algorithm based on structure tensor| journal=Open Geosciences| volume=9|issue=1|pages=151–160|doi=10.1515/geo-2017-0013| issn=2391-5447| bibcode=2017OGeo....9...13Y| s2cid=134392619|doi-access=free}}</ref>
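As one concrete illustration, a Harris-style corner response can be computed from the determinant and trace of the 2D structure tensor; the scales and the constant <math>k</math> below are typical illustrative choices:

<syntaxhighlight lang="python">
import numpy as np
from scipy.ndimage import gaussian_filter

def corner_response(image, sigma_d=1.0, sigma_w=2.0, k=0.05):
    """Harris measure det(S_w) - k * trace(S_w)^2; large at corners."""
    I = image.astype(float)
    Ix = gaussian_filter(I, sigma_d, order=(0, 1))   # derivative along x
    Iy = gaussian_filter(I, sigma_d, order=(1, 0))   # derivative along y
    Sxx = gaussian_filter(Ix * Ix, sigma_w)
    Sxy = gaussian_filter(Ix * Iy, sigma_w)
    Syy = gaussian_filter(Iy * Iy, sigma_w)
    det = Sxx * Syy - Sxy ** 2                       # lam1 * lam2
    trace = Sxx + Syy                                # lam1 + lam2
    return det - k * trace ** 2                      # large when both eigenvalues are large
</syntaxhighlight>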
===Processing spatio-temporal video data with the structure tensor===
The three-dimensional structure tensor has been used to analyze three-dimensional video data (viewed as a function of ''x'', ''y'', and time ''t'').<ref name="Jahne1993" />
If one in this context aims at image descriptors that are ''invariant'' under [[Galilean transformation]]s, to make it possible to compare image measurements that have been obtained under variations of a priori unknown image velocities <math>v = (v_x, v_y)^\text{T}</math>,
<math display="block">
\begin{bmatrix} x' \\ y' \\ t' \end{bmatrix}
= G \begin{bmatrix} x \\ y \\ t \end{bmatrix}
= \begin{bmatrix} x - v_x t \\ y - v_y t \\ t \end{bmatrix},
</math>
it is, however, from a computational viewpoint preferable to parameterize the components in the structure tensor/second-moment matrix <math>S</math> using the notion of ''Galilean diagonalization''<ref name=lin04icpr>T. Lindeberg, A. Akbarzadeh and I. Laptev (2004), "Galilean-diagonalized spatio-temporal interest operators", ''Proc. 17th International Conference on Pattern Recognition'', Cambridge, UK, pages I:57–62.</ref>
<math display="block">
S' = R_\text{space}^{-\text{T}}\, G^{-\text{T}}\, S\, G^{-1}\, R_\text{space}^{-1}
</math>
where <math>G</math> denotes a Galilean transformation of spacetime and <math>R_\text{space}</math> a two-dimensional rotation over the spatial ___domain,
compared to the abovementioned use of eigenvalues of a 3-D structure tensor, which corresponds to an eigenvalue decomposition and a (non-physical) three-dimensional rotation of spacetime
<math display="block">
S'' = R_\text{spacetime}^{-\text{T}}\, S\, R_\text{spacetime}^{-1}.
</math>
To obtain true Galilean invariance, however, the shape of the spatio-temporal window function also needs to be adapted,<ref name=lin04icpr/> corresponding to the transfer of [[affine shape adaptation]]<ref name=lingar97/> from spatial to spatio-temporal image data.
In combination with local spatio-temporal histogram descriptors, these concepts together allow for Galilean invariant recognition of spatio-temporal events.
==See also==