{{Short description|Tensor related to gradients}}
{{FeatureDetectionCompVisNavbox}}
{{Use American English|date = March 2019}}
In mathematics, the '''structure [[tensor]]''', also referred to as the '''second-moment matrix''', is a [[matrix (mathematics)|matrix]] derived from the [[gradient]] of a [[function (mathematics)|function]]. It describes the distribution of the gradient in a specified neighborhood around a point and makes the information invariant to the observing coordinates<!-- Example: if you have a 2D image with two components storing the gradient direction and a Gaussian blur is performed separately on each component, the result will be ill-formed (specially for the directions were vector orientations flip). On the other hand if the blur is performed component-wise on a 2x2 structure tensor the main eigenvector (scaled by its eigenvalue) will properly represent the gradient. -->. The structure tensor is often used in [[image processing]] and [[computer vision]].<ref name=bigun86>
J. Bigun and G. Granlund (1986), ''Optimal Orientation Detection of Linear Symmetry''. Tech. Report LiTH-ISY-I-0828, Computer Vision Laboratory, Linkoping University, Sweden 1986; Thesis Report, Linkoping studies in science and technology No. 85, 1986.
</ref><ref name=bigun87>
{{cite conference|author1=J. Bigun |author2=G. Granlund |name-list-style=amp |chapter=Optimal Orientation Detection of Linear Symmetry|___location=Piscataway|title=First int. Conf. on Computer Vision, ICCV, (London) |publisher=IEEE Computer Society Press, Piscataway|pages=433–438|year=1987 }}
</ref><ref name=knutsson89>
{{cite conference|author=H. Knutsson|chapter=Representing local structure using tensors|___location=Oulu|title=Proceedings 6th Scandinavian Conf. on Image Analysis|publisher=Oulu University|pages=244–251|year=1989}}
</ref>
 
==The 2D structure tensor==
 
===Continuous version===
For a function <math>I</math> of two variables {{math|1=''p'' = (''x'', ''y'')}}, the structure tensor is the 2×2 matrix
 
<math display="block">
S_w(p) =
\begin{bmatrix}
\int w(r) (I_x(p-r))^2\,dr & \int w(r) I_x(p-r)I_y(p-r)\,dr \\[10pt]
\int w(r) I_x(p-r)I_y(p-r)\,dr & \int w(r) (I_y(p-r))^2\,dr
\end{bmatrix}
</math>
where <math>I_x</math> and <math>I_y</math> are the [[partial derivative]]s of <math>I</math> with respect to ''x'' and ''y''; the integrals range over the plane <math>\mathbb{R}^2</math>; and ''w'' is some fixed "window function" (such as a [[Gaussian blur]]), a [[distribution (mathematics)|distribution]] on two variables. Note that the matrix <math>S_w</math> is itself a function of {{math|1=''p'' = (''x'', ''y'')}}.
 
The formula above can be written also as <math display="inline">S_w(p) = \int w(r) S_0(p-r) \, dr</math>, where <math>S_0</math> is the matrix-valued function defined by
<math display="block">
S_0(p)=
\begin{bmatrix}
(I_x(p))^2 & I_x(p)I_y(p) \\[10pt]
I_x(p)I_y(p) & (I_y(p))^2
\end{bmatrix}
</math>
 
If the [[gradient]] <math>\nabla I = (I_x,I_y)^\text{T}</math> of <math>I</math> is viewed as a 2×1 (single-column) matrix, where <math>(\cdot)^\text{T}</math> denotes [[transpose]] operation, turning a row vector to a column vector, the matrix <math>S_0</math> can be written as the [[matrix product]] <math>(\nabla I)(\nabla I)^\text{T}</math> or [[tensor product|tensor or outer product]] <math>\nabla I \otimes \nabla I</math>. Note however that the structure tensor <math>S_w(p)</math> cannot be factored in this way in general except if <math>w</math> is a [[Dirac delta function]].
 
===Discrete version===
In image processing and other similar applications, the function <math>I</math> is usually given as a discrete [[array data structure|array]] of samples <math>I[p]</math>, where ''p'' is a pair of integer indices. The 2D structure tensor at a given [[pixel]] is usually taken to be the discrete sum
 
<math display="block">
S_w[p] =
\begin{bmatrix}
\sum_r w[r] (I_x[p-r])^2 & \sum_r w[r] I_x[p-r]I_y[p-r] \\[10pt]
\sum_r w[r] I_x[p-r]I_y[p-r] & \sum_r w[r] (I_y[p-r])^2
\end{bmatrix}
</math>
 
Here the summation index ''r'' ranges over a finite set of index pairs (the "window", typically <math>\{-m \ldots +m\}\times\{-m \ldots +m\}</math> for some ''m''), and ''w''[''r''] is a fixed "window weight" that depends on ''r'', such that the sum of all weights is 1. The values <math>I_x[p],I_y[p]</math> are the partial derivatives sampled at pixel ''p'', which, for instance, may be estimated from <math>I</math> by [[finite difference]] formulas.
 
The formula of the structure tensor can be written also as <math display="inline">S_w[p]=\sum_r w[r] S_0[p-r]</math>, where <math>S_0</math> is the matrix-valued array such that
<math display="block">
S_0[p] =
\begin{bmatrix}
(I_x[p])^2 & I_x[p]I_y[p] \\[10pt]
I_x[p]I_y[p] & (I_y[p])^2
\end{bmatrix}
</math>
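
For illustration, this discrete construction with a Gaussian window <math>w</math> can be sketched in a few lines of Python/NumPy (the function name and the scale value are illustrative choices, not from the cited literature):

<syntaxhighlight lang="python">
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor_2d(image, sigma=1.5):
    """Return the components S_xx, S_xy, S_yy of S_w at every pixel,
    using a Gaussian window w with standard deviation sigma."""
    image = image.astype(float)
    # Partial derivatives I_x, I_y by central finite differences;
    # np.gradient differentiates along rows (y) first, then columns (x).
    Iy, Ix = np.gradient(image)
    # Components of S_0 = (grad I)(grad I)^T, averaged with the window:
    # Gaussian smoothing of each component realizes sum_r w[r] S_0[p-r].
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    return Sxx, Sxy, Syy
</syntaxhighlight>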
 
===Interpretation===
The importance of the 2D structure tensor <math>S_w</math> stems from the fact that its [[eigenvalue]]s <math>\lambda_1,\lambda_2</math> (which can be ordered so that <math>\lambda_1 \geq \lambda_2\geq 0</math>) and the corresponding [[eigenvector]]s <math>e_1,e_2</math> summarize the distribution of the [[gradient]] <math>\nabla I = (I_x,I_y)</math> of <math>I</math> within the window defined by <math>w</math> centered at <math>p</math>.<ref name=bigun86/><ref name=bigun87/><ref name=knutsson89/>
 
Namely, if <math>\lambda_1 > \lambda_2</math>, then <math>e_1</math> (or <math>-e_1</math>) is the direction that is maximally aligned with the gradient within the window.
 
In particular, if <math>\lambda_1 > 0, \lambda_2 = 0</math> then the gradient is always a multiple of <math>e_1</math> (positive, negative or zero); this is the case if and only if <math>I</math> within the window varies along the direction <math>e_1</math> but is constant along <math>e_2</math>. This condition of eigenvalues is also called the linear symmetry condition, because then the iso-curves of <math>I</math> consist of parallel lines, i.e. there exists a one-dimensional function <math>g</math> which can generate the two-dimensional function <math>I</math> as <math>I(x,y)=g(d^\text{T} p)</math> for some constant vector <math>d=(d_x,d_y)^\text{T}</math> and the coordinates <math>p=(x,y)^\text{T}</math>.
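
For example, if <math>I(x,y)=g(x)</math>, so that <math>d=(1,0)^\text{T}</math>, then
<math display="block">
\nabla I = (g'(x),0)^\text{T},
\qquad
S_0(p)=
\begin{bmatrix}
(g'(x))^2 & 0 \\
0 & 0
\end{bmatrix},
</math>
and every <math>S_w(p)</math> is a nonnegative multiple of <math>e_1e_1^\text{T}</math> with <math>e_1=(1,0)^\text{T}</math> and <math>\lambda_2=0</math>, regardless of the window <math>w</math>.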
 
If <math>\lambda_1 = \lambda_2</math>, on the other hand, the gradient in the window has no predominant direction; which happens, for instance, when the image has [[rotational symmetry]] within that window. This condition of eigenvalues is also called balanced body, or directional equilibrium condition because it holds when all gradient directions in the window are equally frequent/probable.
 
Furthermore, the condition <math>\lambda_1 = \lambda_2 = 0</math> happens if and only if the function <math>I</math> is constant (<math>\nabla I = (0,0)</math>) within the window.
 
More generally, the value of <math>\lambda_k </math>, for ''k''=1 or ''k''=2, is the <math>w</math>-weighted average, in the neighborhood of ''p'', of the square of the [[directional derivative]] of <math>I</math> along <math>e_k</math>. The relative discrepancy between the two eigenvalues of <math>S_w</math> is an indicator of the degree of [[isotropy|anisotropy]] of the gradient in the window, namely how strongly it is biased towards a particular direction (and its opposite).<ref name="Jahne1993">
{{cite book|author=B. Jahne|title=Spatio-Temporal Image Processing: Theory and Scientific Applications| ___location=Berlin| publisher=Springer-Verlag| volume=751|year=1993}}
</ref><ref name=MedioniEA>
{{cite book|author1=G. Medioni, M. Lee |author2=C. Tang |name-list-style=amp |title=A Computational Framework for Feature Extraction and Segmentation|publisher=Elsevier Science|date=March 2000}}
</ref> This attribute can be quantified by the '''coherence''', defined as
"structure tensor." It also makes use of the <math>I_x</math> and <math>I_y</math> values, however,
in a matrix form. The term [[tensor]] refers to a representation of an array
of data. The "rank" of a tensor denotes the number of indices needed to
describe it. For example, a scalar value <math>x</math> is a single value, hence
requires no indices. This means that a scalar is a tensor of rank zero.
A vector <math>\vec v</math>, often represented as <math>v_i = \{v_1,v_2,..v_n\}</math> uses a single index, <math>i</math>.
This implies that a vector is a tensor of rank one. A two-dimensional
matrix <math>M_{ij}</math> is a tensor of rank two and so and so forth.
 
<math display="block">c_w=\left(\frac{\lambda_1-\lambda_2}{\lambda_1+\lambda_2}\right)^2</math>
 
if <math>\lambda_2>0</math>. This quantity is 1 when the gradient is totally aligned, and 0 when it has no preferred direction. The formula is undefined, even in the [[limit of a function|limit]], when the image is constant in the window (<math>\lambda_1=\lambda_2=0</math>). Some authors define it as 0 in that case.
 
Note that the average of the gradient <math>\nabla I</math> inside the window is '''not''' a good indicator of anisotropy. Aligned but oppositely oriented gradient vectors would cancel out in this average, whereas in the structure tensor they are properly added together.<ref>
{{cite tech report|author1=T. Brox |author2= J. Weickert |author3=B. Burgeth |author4=P. Mrazek |name-list-style=amp |title=Nonlinear Structure Tensors|institution=Universität des Saarlandes|number=113|year=2004}}
</ref> This is one reason why <math>(\nabla I)(\nabla I)^\text{T}</math> is used in the averaging of the structure tensor to optimize the direction, instead of <math>\nabla I</math>.
If a different method is used to obtain the partial derivative information, for example using Horn's approach where the average absolute pixel difference is used, the coherence for the isotropic region is one while for the uniform case it is zero, the opposite of the values obtained above. What is key is that, no matter which partial derivative method is used, the coherence, as a feature of the structure tensor, is able to distinguish between the two cases.
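
Continuing the sketch above, both eigenvalues and the coherence can be obtained in closed form from the three tensor components (an illustrative sketch; defining the coherence as 0 for a constant window follows one of the conventions mentioned above):

<syntaxhighlight lang="python">
import numpy as np

def eigenvalues_and_coherence(Sxx, Sxy, Syy, eps=1e-12):
    """Closed-form eigenvalues of [[Sxx, Sxy], [Sxy, Syy]] and the
    coherence ((l1 - l2)/(l1 + l2))**2, set to 0 where l1 = l2 = 0."""
    trace = Sxx + Syy                             # lambda_1 + lambda_2
    diff = np.sqrt((Sxx - Syy)**2 + 4*Sxy**2)     # lambda_1 - lambda_2 >= 0
    lam1 = 0.5 * (trace + diff)
    lam2 = 0.5 * (trace - diff)
    coherence = np.where(trace > eps,
                         (diff / np.maximum(trace, eps))**2, 0.0)
    return lam1, lam2, coherence
</syntaxhighlight>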
 
By expanding the effective radius of the window function <math>w</math> (that is, increasing its variance), one can make the structure tensor more robust in the face of noise, at the cost of diminished spatial resolution.<ref name=MedioniEA /><ref name=lin94book>T. Lindeberg (1993), ''[http://www.csc.kth.se/~tony/book.html Scale-Space Theory in Computer Vision]''. Kluwer Academic Publishers, (see sections 14.4.1 and 14.2.3 on pages 359–360 and 355–356 for detailed statements about how the multi-scale second-moment matrix/structure tensor defines a true and uniquely determined multi-scale representation of directional data).
</ref> The formal basis for this property is described in more detail below, where it is shown that a multi-scale formulation of the structure tensor, referred to as the [[Structure tensor#The multi-scale structure tensor|multi-scale structure tensor]], constitutes a ''true multi-scale representation of directional data under variations of the spatial extent of the window function''.
 
===Complex version===
The interpretation and implementation of the 2D structure tensor become particularly accessible using [[complex number]]s.<ref name="bigun87" /> The structure tensor consists of 3 real numbers
 
<math display="block">
S_w(p) =
\begin{bmatrix}
\mu_{20} & \mu_{11} \\[10pt]
\mu_{11} & \mu_{02}
\end{bmatrix}
</math>
 
where <math display="inline">
\mu_{20} =\int w(r) (I_x(p-r))^2\,dr
</math>, <math display="inline">
\mu_{02} =\int w(r) (I_y(p-r))^2\,dr
</math> and <math display="inline">
\mu_{11} =\int w(r) I_x(p-r)I_y(p-r)\,dr
</math> in which integrals can be replaced by summations for discrete representation. Using [[Parseval's identity]] it is clear that the three real numbers are the second order moments of the power spectrum of <math>I</math>. The following second order complex moment of the power spectrum of <math>I</math> can then be written as
 
<math display="block">
\kappa_{20} = \mu_{20}-\mu_{02}+i2\mu_{11}=\int w(r) (I_x(p-r)+i I_y(p-r))^2 \, dr =(\lambda_1-\lambda_2)\exp(i2\phi)
</math>
 
where <math>i=\sqrt{-1}</math> and <math>\phi</math> is the direction angle of the most significant eigenvector of the structure tensor, <math>\phi=\angle{e_1}</math>, whereas <math>\lambda_1</math> and <math>\lambda_2</math> are the most and the least significant eigenvalues. From this it follows that <math>\kappa_{20}</math> contains both a certainty <math>|\kappa_{20}|=\lambda_1-\lambda_2</math> and the optimal direction in double angle representation, since it is a complex number consisting of two real numbers. It follows also that if the gradient is represented as a complex number and is remapped by squaring (i.e. the argument angle of the complex gradient is doubled), then averaging acts as an optimizer in the mapped ___domain, since it directly delivers both the optimal direction (in double angle representation) and the associated certainty. The complex number thus represents how much linear structure (linear symmetry) there is in image <math>I</math>, and it is obtained directly by averaging the gradient in its (complex) double angle representation without computing the eigenvalues and the eigenvectors explicitly.
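
A sketch of this averaging in the double angle representation follows (Python/NumPy, taking the window to be a Gaussian; names and scale are illustrative):

<syntaxhighlight lang="python">
import numpy as np
from scipy.ndimage import gaussian_filter

def double_angle_orientation(image, sigma=1.5):
    """Estimate kappa_20 by averaging the squared complex gradient;
    no eigenvalues or eigenvectors are computed explicitly."""
    Iy, Ix = np.gradient(image.astype(float))
    g = Ix + 1j * Iy              # gradient as a complex number
    g2 = g * g                    # squaring doubles the argument angle
    # Average the double-angle representation with the window w.
    k20 = gaussian_filter(g2.real, sigma) + 1j * gaussian_filter(g2.imag, sigma)
    phi = 0.5 * np.angle(k20)     # optimal orientation, defined modulo pi
    certainty = np.abs(k20)       # lambda_1 - lambda_2
    return phi, certainty
</syntaxhighlight>
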
Likewise the following second order complex moment of the power spectrum of <math>I</math>, which happens to be always real because <math>I</math> is real,

<math display="block">
\kappa_{11} = \mu_{20}+\mu_{02} = \int w(r) |I_x(p-r)+i I_y(p-r)|^2 \, dr = \lambda_1+\lambda_2
</math>

can be obtained, with <math>\lambda_1</math> and <math>\lambda_2</math> being the eigenvalues as before. Notice that this time the magnitude of the complex gradient is squared (which is always real).

However, decomposing the structure tensor into its eigenvectors yields its tensor components as

<math display="block">
S_w(p) =
\lambda_1 e_1e_1^\text{T}+
\lambda_2 e_2e_2^\text{T} =(\lambda_1 -\lambda_2)e_1e_1^\text{T}+
\lambda_2( e_1e_1^\text{T}+e_2e_2^\text{T})= (\lambda_1 -\lambda_2)e_1e_1^\text{T}+
\lambda_2 E
</math>
 
where <math>E</math> is the identity matrix in 2D, because the two eigenvectors are always orthogonal and their outer products sum to the identity. The first term in the last expression of the decomposition, <math>(\lambda_1 -\lambda_2)e_1e_1^\text{T}</math>, represents the linear symmetry component of the structure tensor containing all directional information (as a rank-1 matrix), whereas the second term represents the balanced body component of the tensor, which lacks any directional information (containing an identity matrix <math>E</math>). To know how much directional information there is in <math>I</math> is then the same as checking how large <math>\lambda_1-\lambda_2 </math> is compared to <math>\lambda_2</math>.
 
Evidently, <math>\kappa_{20}</math> is the complex equivalent of the first term in the tensor decomposition, whereas <math display="block">\tfrac 1 2 (\kappa_{11}-|\kappa_{20}|) = \lambda_2</math> is the equivalent of the second term. Thus the two scalars, comprising three real numbers,
 
<math display="block">
\begin{align}
\kappa_{20} &=& (\lambda_1-\lambda_2)\exp(i2\phi) &= w*(h*I)^2 \\
\kappa_{11} &=& \lambda_1+\lambda_2&=w*|h*I|^2 \\
\end{align}
</math>
where <math>h(x,y)=(x+iy)\exp(-(x^2+y^2)/(2\sigma^2)) </math> is the (complex) gradient filter, and <math>*</math> is convolution, constitute a complex representation of the 2D structure tensor. As discussed here and elsewhere, <math>w</math> defines the local window, which is usually a Gaussian (with a certain variance defining the outer scale), and <math>\sigma</math> is the (inner scale) parameter determining the effective frequency range in which the orientation <math>2\phi</math> is to be estimated.
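
Both components can thus be computed by filtering alone, as in the two formulas above (an illustrative sketch; the truncation radius of <math>h</math> and the two scale values are arbitrary choices):

<syntaxhighlight lang="python">
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.signal import fftconvolve

def complex_structure_tensor(image, inner_sigma=1.0, outer_sigma=3.0):
    """kappa_20 = w*(h*I)**2 and kappa_11 = w*|h*I|**2 with the filter
    h(x, y) = (x + iy) exp(-(x**2 + y**2)/(2 sigma**2))."""
    r = int(np.ceil(4 * inner_sigma))       # truncation radius of h
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    h = (x + 1j * y) * np.exp(-(x**2 + y**2) / (2.0 * inner_sigma**2))
    hI = fftconvolve(image.astype(float), h, mode="same")   # h * I
    hI2 = hI * hI
    k20 = gaussian_filter(hI2.real, outer_sigma) \
        + 1j * gaussian_filter(hI2.imag, outer_sigma)       # w * (h*I)^2
    k11 = gaussian_filter(np.abs(hI)**2, outer_sigma)       # w * |h*I|^2
    return k20, k11
</syntaxhighlight>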
 
The elegance of the complex representation stems from the fact that the two components of the structure tensor can be obtained as averages and independently. In turn, this means that <math>\kappa_{20}</math> and <math>\kappa_{11}</math> can be used in a scale space representation to describe the evidence for presence of unique orientation and the evidence for the alternative hypothesis, the presence of multiple balanced orientations, without computing the eigenvectors and eigenvalues. A functional, such as squaring the complex numbers, has to date not been shown to exist for structure tensors with dimensions higher than two. In Bigun 91, it has been put forward with due argument that this is because complex numbers constitute a commutative algebra, whereas quaternions, the possible candidates for constructing such a functional, constitute a non-commutative algebra.<ref name=bigun91>
{{cite journal|author1=J. Bigun |author2=G. Granlund |author3=J. Wiklund |name-list-style=amp |title=Multidimensional Orientation Estimation with Applications to Texture Analysis and Optical Flow| journal= IEEE Transactions on Pattern Analysis and Machine Intelligence|volume=13|number=8|pages=775–790|year=1991 |doi=10.1109/34.85668}}</ref>
 
The complex representation of the structure tensor is frequently used in fingerprint analysis to obtain direction maps containing certainties, which in turn are used to enhance the fingerprints, to find the locations of the global (cores and deltas) and local (minutiae) singularities, and to automatically evaluate the quality of the fingerprints.
 
==The 3D structure tensor==
 
===Definition===
The structure tensor can be defined also for a function <math>I</math> of three variables ''p''=(''x'',''y'',''z'') in an entirely analogous way. Namely, in the continuous version we have <math display="inline">S_w(p) = \int w(r) S_0(p-r) \, dr</math>, where
<math display="block">
S_0(p) =
\begin{bmatrix}
(I_x(p))^2 & I_x(p)I_y(p) & I_x(p)I_z(p) \\[10pt]
I_x(p)I_y(p) & (I_y(p))^2 & I_y(p)I_z(p) \\[10pt]
I_x(p)I_z(p) & I_y(p)I_z(p) & (I_z(p))^2
\end{bmatrix}
</math>
where <math>I_x,I_y,I_z</math> are the three partial derivatives of <math>I</math>, and the integral ranges over <math>\mathbb{R}^3</math>.
 
In the discrete version, <math display="inline">S_w[p] = \sum_r w[r] S_0[p-r]</math>, where
<math display="block">
S_0[p] =
\begin{bmatrix}
(I_x[p])^2 & I_x[p]I_y[p] & I_x[p]I_z[p] \\[10pt]
I_x[p]I_y[p] & (I_y[p])^2 & I_y[p]I_z[p]\\[10pt]
I_x[p]I_z[p] & I_y[p]I_z[p] & (I_z[p])^2
\end{bmatrix}
</math>
and the sum ranges over a finite set of 3D indices, usually <math>\{-m \ldots +m\} \times \{-m \ldots +m\} \times \{-m \ldots +m\}</math> for some {{mvar|m}}.
 
===Interpretation===
As in the two-dimensional case, the eigenvalues <math>\lambda_1,\lambda_2,\lambda_3</math> of <math>S_w[p]</math>, and the corresponding eigenvectors <math>\hat{e}_1,\hat{e}_2,\hat{e}_3</math>, summarize the distribution of gradient directions within the neighborhood of ''p'' defined by the window <math>w</math>. This information can be visualized as an [[ellipsoid]] whose semi-axes are equal to the eigenvalues and directed along their corresponding eigenvectors.<ref name="Medioni"/><ref>{{Cite journal | last1=Westin|first1=C.-F. | last2=Maier|first2=S.E. | last3=Mamata|first3=H. | last4=Nabavi|first4=A. | last5=Jolesz|first5=F.A. | last6=Kikinis|first6=R. | date=June 2002 | title=Processing and visualization for diffusion tensor MRI | url=https://linkinghub.elsevier.com/retrieve/pii/S1361841502000531 | journal = Medical Image Analysis | language=en | volume=6 | issue=2 | pages=93–108 | doi=10.1016/S1361-8415(02)00053-1 | pmid=12044998| url-access=subscription }}</ref>
 
[[File:STgeneric.png|thumb|center|240px|Ellipsoidal representation of the 3D structure tensor.]]
 
In particular, if the ellipsoid is stretched along one axis only, like a cigar (that is, if <math>\lambda_1</math> is much larger than both <math>\lambda_2</math> and <math>\lambda_3</math>), it means that the gradient in the window is predominantly aligned with the direction <math>e_1</math>, so that the [[isosurface]]s of <math>I</math> tend to be flat and perpendicular to that vector. This situation occurs, for instance, when ''p'' lies on a thin plate-like feature, or on the smooth boundary between two regions with contrasting values.
 
{| cellborder=0px border=0px style="margin: 1em auto;"
|- valign=top
| [[File:STsurfel.png|thumb|180px|The structure tensor ellipsoid of a surface-like neighborhood ("[[surfel]]"), where <math>\lambda_1 >\!> \lambda_2 \approx \lambda_3</math>.]]||[[File:StepPlane3D.png|thumb|180px|A 3D window straddling a smooth boundary surface between two uniform regions of a 3D image.]]||[[File:StepPlane3DST.png|thumb|180px|The corresponding structure tensor ellipsoid.]]
|}
 
If the ellipsoid is flattened in one direction only, like a pancake (that is, if <math>\lambda_3</math> is much smaller than both <math>\lambda_1</math> and <math>\lambda_2</math>), it means that the gradient directions are spread out but perpendicular to <math>e_3</math>; so that the isosurfaces tend to be like tubes parallel to that vector. This situation occurs, for instance, when ''p'' lies on a thin line-like feature, or on a sharp corner of the boundary between two regions with contrasting values.
 
{| cellborder=0px border=0px style="margin: 1em auto;"
|- valign=top
| [[File:STcurvel.png|thumb|180px|The structure tensor of a line-like neighborhood ("curvel"), where <math>\lambda_1 \approx \lambda_2 >\!> \lambda_3</math>.]]||[[File:curve3D.png|thumb|180px|A 3D window straddling a line-like feature of a 3D image.]]||[[File:curve3DST.png|thumb|180px|The corresponding structure tensor ellipsoid.]]
|}
 
Finally, if the ellipsoid is roughly spherical (that is, if <math>\lambda_1\approx\lambda_2\approx\lambda_3</math>), it means that the gradient directions in the window are more or less evenly distributed, with no marked preference; so that the function <math>I</math> is mostly isotropic in that neighborhood. This happens, for instance, when the function has [[spherical symmetry]] in the neighborhood of ''p''. In particular, if the ellipsoid degenerates to a point (that is, if the three eigenvalues are zero), it means that <math>I</math> is constant (has zero gradient) within the window.
 
{| cellborder=0px border=0px style="margin: 1em auto;"
|- valign=top
| [[File:STball.png|thumb|180px|The structure tensor in an isotropic neighborhood, where <math>\lambda_1\approx\lambda_2\approx\lambda_3</math>.]]||[[File:Sphere3D.png|thumb|180px|A 3D window containing a spherical feature of a 3D image.]]||[[File:Sphere3DST.png|thumb|180px|The corresponding structure tensor ellipsoid.]]
|}
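
The three cases above can be told apart directly from the sorted eigenvalues; a sketch (the eigenvalue-ratio threshold is an arbitrary illustrative choice):

<syntaxhighlight lang="python">
import numpy as np

def classify_structure_3d(S, tol=0.2, eps=1e-12):
    """Classify a 3x3 structure tensor as surface-like ('surfel'),
    line-like ('curvel'), isotropic, or constant."""
    w, v = np.linalg.eigh(S)       # eigenvalues in ascending order
    l3, l2, l1 = w                 # relabel so that l1 >= l2 >= l3 >= 0
    if l1 < eps:
        return "constant", None    # ellipsoid degenerates to a point
    if l2 < tol * l1:
        return "surfel", v[:, 2]   # cigar: e1 is the surface normal
    if l3 < tol * l2:
        return "curvel", v[:, 0]   # pancake: e3 is tangent to the line
    return "isotropic", None       # roughly spherical ellipsoid
</syntaxhighlight>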
 
==The multi-scale structure tensor==
The structure tensor is an important tool in [[scale space]] analysis. The '''multi-scale structure tensor''' (or '''multi-scale second moment matrix''') of a function <math>I</math> is, in contrast to other one-parameter scale-space features, an image descriptor that is defined over ''two'' scale parameters.
One scale parameter, referred to as ''local scale'' <math>t</math>, is needed for determining the amount of pre-smoothing when computing the image gradient <math>(\nabla I)(x; t)</math>. Another scale parameter, referred to as ''integration scale'' <math>s</math>, is needed for specifying the spatial extent of the window function <math>w(\xi; s)</math> that determines the weights for the region in space over which the components of the outer product of the gradient by itself <math>(\nabla I)(\nabla I)^\text{T}</math> are accumulated.
 
More precisely, suppose that <math>I</math> is a real-valued signal defined over <math>\mathbb{R}^k</math>. For any local scale <math>t > 0</math>, let a multi-scale representation <math>I(x; t)</math> of this signal be given by <math>I(x; t) = h(x; t)*I(x)</math> where <math>h(x; t)</math> represents a pre-smoothing kernel. Furthermore, let <math>(\nabla I)(x; t)</math> denote the gradient of the [[scale space representation]].
Then, the ''multi-scale structure tensor/second-moment matrix'' is defined by<ref name=lin94book/><ref name=lingar97>{{cite journal
| author1=T. Lindeberg | author2=J. Garding
| name-list-style=amp
| title=Shape-adapted smoothing in estimation of 3-D depth cues from affine distortions of local 2-D structure
| journal=Image and Vision Computing
| year=1997
| volume=15
| pages=415–434
| url=http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A472972&dswid=7790
| doi=10.1016/S0262-8856(97)01144-X
| issue=6
}}</ref><ref name=garlin96>
J. Garding and T. Lindeberg (1996). ''[http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A473351&dswid=-5409 "Direct computation of shape cues using scale-adapted spatial derivative operators]'', International Journal of Computer Vision, volume 17, issue 2, pages 163–191.
</ref>
<math display="block">
\mu(x; t, s) =
\int_{\xi \in \mathbb{R}^k}
(\nabla I)(x-\xi; t) \, (\nabla I)^\text{T}(x-\xi; t) \,
w(\xi; s) \, d\xi
</math>
Conceptually, one may ask if it would be sufficient to use any self-similar families of smoothing functions <math>h(x; t)</math> and <math>w(\xi; s)</math>. If one were to naively apply, for example, a box filter, however, then non-desirable artifacts could easily occur. If one wants the multi-scale structure tensor to be well-behaved over both increasing local scales <math>t</math> and increasing integration scales <math>s</math>, then it can be shown that both the smoothing function and the window function ''have to'' be Gaussian.<ref name=lin94book/> The conditions that specify this uniqueness are similar to the [[scale-space axioms]] that are used for deriving the uniqueness of the Gaussian kernel for a regular Gaussian [[scale space]] of image intensities.
 
There are different ways of handling the two-parameter scale variations in this family of image descriptors. If we keep the local scale parameter <math>t</math> fixed and apply increasingly broadened versions of the window function by increasing the integration scale parameter <math>s</math> only, then we obtain a ''true formal [[scale space representation]] of the directional data computed at the given local scale'' <math>t</math>.<ref name=lin94book/> If we couple the local scale and integration scale by a ''relative integration scale'' <math>r \geq 1</math>, such that <math>s = r t</math> then for any fixed value of <math>r</math>, we obtain a reduced self-similar one-parameter variation, which is frequently used to simplify computational algorithms, for example in [[corner detection]], [[interest point detection]], [[texture analysis]] and [[image registration|image matching]].
By varying the relative integration scale <math>r \geq 1</math> in such a self-similar scale variation, we obtain another alternative way of parameterizing the multi-scale nature of directional data obtained by increasing the integration scale.
 
A conceptually similar construction can be performed for discrete signals, with the convolution integral replaced by a convolution sum and with the continuous Gaussian kernel <math> g(x; t)</math> replaced by the [[discrete Gaussian kernel]] <math>T(n; t)</math>:
<math display="block">
\mu(x; t, s) =
\sum_{n \in \mathbb{Z}^k}
(\nabla I)(x-n; t) \, (\nabla I)^\text{T}(x-n; t) \,
w(n; s)
</math>
When quantizing the scale parameters <math>t</math> and <math>s</math> in an actual implementation, a finite geometric progression <math>\alpha^i</math> is usually used, with ''i'' ranging from 0 to some maximum scale index ''m''. Thus, the discrete scale levels will bear certain similarities to an [[pyramid (image processing)|image pyramid]], although spatial subsampling may not necessarily be used in order to preserve more accurate data for subsequent processing stages.
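
A sketch of the two-parameter construction and of the quantized scale levels (the values of <math>\alpha</math> and of the relative integration scale <math>r</math> are illustrative):

<syntaxhighlight lang="python">
import numpy as np
from scipy.ndimage import gaussian_filter

def multiscale_structure_tensor(image, t, s):
    """mu(x; t, s): gradient at local scale t (Gaussian pre-smoothing),
    accumulated with a Gaussian window at integration scale s.
    Here t and s are variances; gaussian_filter expects standard deviations."""
    L = gaussian_filter(image.astype(float), np.sqrt(t))  # scale-space rep.
    Ly, Lx = np.gradient(L)
    mu_xx = gaussian_filter(Lx * Lx, np.sqrt(s))
    mu_xy = gaussian_filter(Lx * Ly, np.sqrt(s))
    mu_yy = gaussian_filter(Ly * Ly, np.sqrt(s))
    return mu_xx, mu_xy, mu_yy

# Scale levels from a finite geometric progression alpha**i, with the
# integration scale coupled to the local scale by s = r * t, r >= 1.
alpha, r = 2.0, 4.0
scale_pairs = [(alpha**i, r * alpha**i) for i in range(5)]
</syntaxhighlight>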
 
==Applications==
The eigenvalues of the structure tensor play a significant role in many image processing algorithms, for problems like [[corner detection]], [[interest point detection]], and [[feature tracking]].<ref name="Medioni">
{{cite conference|author1=M. Nicolescu |author2=G. Medioni |name-list-style=amp |chapter=Motion Segmentation with Accurate Boundaries – A Tensor Voting Approach|title=Proc. IEEE Computer Vision and Pattern Recognition| volume=1| pages=382–389| year=2003}}
</ref><ref>
{{cite journal
|author=W. Förstner|title=A Feature Based Correspondence Algorithm for Image Processing
|journal=International Archives of Photogrammetry and Remote Sensing|volume=26|pages=150–166|year=1986}}
</ref><ref>
{{cite conference|author1=C. Harris |author2=M. Stephens |name-list-style=amp |chapter=A Combined Corner and Edge Detector
|title=Proc. of the 4th ALVEY Vision Conference|pages=147–151|year=1988}}
</ref><ref>
{{cite journal|author=K. Rohr|title=On 3D Differential Operators for Detecting Point Landmarks
|journal=Image and Vision Computing|volume=15|issue=3|pages=219–233|year=1997 | doi=10.1016/S0262-8856(96)01127-4}}
</ref><ref>
{{cite conference|author1=I. Laptev |author2=T. Lindeberg |name-list-style=amp |chapter=Space–time interest points
|title=International Conference on Computer Vision ICCV'03|chapter-url=http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A442088&dswid=2038 |doi=10.1109/ICCV.2003.1238378|pages=432–439|volume=I|year=2003}}
</ref><ref>
{{cite conference|author=B. Triggs|chapter=Detecting Keypoints with Stable Position, Orientation, and Scale under Illumination Changes
|title=Proc. European Conference on Computer Vision|volume=4|pages=100–113|year=2004}}
</ref><ref>
{{cite conference|author1=C. Kenney, M. Zuliani |author2=B. Manjunath |name-list-style=amp |chapter=An Axiomatic Approach to Corner Detection|title=Proc. IEEE Computer Vision and Pattern Recognition|pages=191–197|year=2005}}
</ref> The structure tensor also plays a central role in the [[Lucas–Kanade Optical Flow Method|Lucas-Kanade optical flow algorithm]], and in its extensions to estimate [[affine shape adaptation]];<ref name=lingar97/> where the magnitude of <math>\lambda_2</math> is an indicator of the reliability of the computed result. The tensor has been used for [[scale space]] analysis,<ref name=lin94book/> estimation of local surface orientation from monocular or binocular cues,<ref name=garlin96/> non-linear [[fingerprint enhancement]],<ref>
A. Almansa and T. Lindeberg (2000), ''[http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A338874&dswid=-9161 Enhancement of fingerprint images using shape-adaptated scale-space operators]''. IEEE Transactions on Image Processing, volume 9, number 12, pages 2027–2042.
</ref> [[diffusion-based image processing]],<ref>[http://www.mia.uni-saarland.de/weickert/book.html J. Weickert (1998), Anisotropic diffusion in image processing, Teuber Verlag, Stuttgart.]</ref><ref>
{{cite journal|author1=D. Tschumperle | author2= R. Deriche| name-list-style=amp | title=Diffusion PDEs on Vector-Valued Images|journal=IEEE Signal Processing Magazine|pages=16–25|volume=19|issue=5|doi=10.1109/MSP.2002.1028349 | date=September 2002| bibcode= 2002ISPM...19...16T}}
</ref><ref>
{{cite conference|author1=S. Arseneau |author2=J. Cooperstock |name-list-style=amp |chapter=An Asymmetrical Diffusion Framework for Junction Analysis|title=British Machine Vision Conference|volume=2|pages=689–698|date=September 2006}}
</ref><ref>
{{cite conference|author1=S. Arseneau |author2=J. Cooperstock |name-list-style=amp |chapter=An Improved Representation of Junctions through Asymmetric Tensor Diffusion|title=International Symposium on Visual Computing|date=November 2006}}
</ref> and several other image processing problems. The structure tensor can be also applied in [[geology]] to filter [[Seismology|seismic]] data.<ref>{{Cite journal| last1=Yang|first1=Shuai| last2=Chen|first2=Anqing| last3=Chen|first3=Hongde| date=2017-05-25|title=Seismic data filtering using non-local means algorithm based on structure tensor| journal=Open Geosciences| volume=9|issue=1|pages=151–160|doi=10.1515/geo-2017-0013| issn=2391-5447| bibcode=2017OGeo....9...13Y| s2cid=134392619|doi-access=free}}</ref>
 
===Processing spatio-temporal video data with the structure tensor===
 
The three-dimensional structure tensor has been used to analyze three-dimensional video data (viewed as a function of ''x'', ''y'', and time ''t'').<ref name="Jahne1993" />
If one in this context aims at image descriptors that are ''invariant'' under [[Galilean transformation]]s, to make it possible to compare image measurements that have been obtained under variations of a priori unknown image velocities <math>v = (v_x, v_y)^\text{T}</math>
<math display="block"> \begin{bmatrix} x' \\ y' \\ t' \end{bmatrix} = G \begin{bmatrix} x \\ y \\ t \end{bmatrix} = \begin{bmatrix} x - v_x \, t \\ y - v_y \, t \\ t \end{bmatrix} ,</math>
it is, however, from a computational viewpoint preferable to parameterize the components in the structure tensor/second-moment matrix <math>S</math> using the notion of ''Galilean diagonalization''<ref name=lin04icpr>
{{cite conference|author1=T. Lindeberg |author2=A. Akbarzadeh |author3=I. Laptev |name-list-style=amp |chapter=Galilean-corrected spatio-temporal interest operators|title=International Conference on Pattern Recognition ICPR'04|chapter-url=http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A441205&dswid=-545 |doi=10.1109/ICPR.2004.1334004|date=August 2004|volume=I| pages=57–62}}
</ref>
<math display="block"> S' = R_\text{space}^{-\text{T}} \, G^{-\text{T}} \, S \, G^{-1} \, R_\text{space}^{-1} = \begin{bmatrix} \nu_1 & \, & \, \\ \, & \nu_2 & \, \\ \, & \, & \nu_3 \end{bmatrix} </math>
where <math>G</math> denotes a Galilean transformation of spacetime and <math>R_\text{space}</math> a two-dimensional rotation over the spatial ___domain,
compared to the abovementioned use of eigenvalues of a 3-D structure tensor, which corresponds to an eigenvalue decomposition and a (non-physical) three-dimensional rotation of spacetime
<math display="block"> S'' = R_\text{spacetime}^{-\text{T}} \, S \, R_\text{spacetime}^{-1} = \begin{bmatrix} \lambda_1 & & \\ & \lambda_2 & \\ & & \lambda_3 \end{bmatrix} .</math>
To obtain true Galilean invariance, however, also the shape of the spatio-temporal window function needs to be adapted,<ref name=lin04icpr/><ref>
{{cite conference|author1=I. Laptev |author2=T. Lindeberg |name-list-style=amp |title=Velocity adaptation of space–time interest points |conference=International Conference on Pattern Recognition ICPR'04 |url=http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A451230&dswid=-7763|doi=10.1109/ICPR.2004.971|date=August 2004|volume=I| pages=52–56}}
</ref> corresponding to the transfer of [[affine shape adaptation]]<ref name=lingar97/> from spatial to spatio-temporal image data.
In combination with local spatio-temporal histogram descriptors,<ref>
{{cite conference|author1=I. Laptev |author2=T. Lindeberg |name-list-style=amp |title=Local descriptors for spatio-temporal recognition|conference=ECCV'04 Workshop on Spatial Coherence for Visual Motion Analysis (Prague, Czech Republic) Springer Lecture Notes in Computer Science|url=http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A445261&dswid=-1233| doi=10.1007/11676959|date=May 2004|volume=3667| pages=91–103|url-access=subscription}}
</ref>
these concepts together allow for Galilean invariant recognition of spatio-temporal events.<ref>
{{cite conference|author1=I. Laptev |author2=B. Caputo |author3=C. Schuldt |author4=T. Lindeberg |name-list-style=amp |chapter=Local velocity-adapted motion events for spatio-temporal recognition |title=Computer Vision and Image Understanding|chapter-url=http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A335153&dswid=7950 | doi=10.1016/j.cviu.2006.11.023|year=2007|volume=108| pages= 207–229}}</ref>
 
==See also==
*[[Tensor]]
*[[Tensor operator]]
*[[Directional derivative]]
*[[Gaussian]]
*[[Corner detection]]
*[[Edge detection]]
*[[Lucas Kanade method|Lucas-Kanade method]]
*[[Affine shape adaptation]]
*[[Generalized structure tensor]]
 
==References==
<references/>
 
==Resources==
*[http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=12362&objectType=FILE Download MATLAB Source]
*[https://www.cs.cmu.edu/~sarsen/structureTensorTutorial/ Structure Tensor Tutorial (Original)]
 
{{DEFAULTSORT:Structure Tensor}}
[[Category:Tensors]]
[[Category:Feature detection (computer vision)]]