Generalized structure tensor

{{FeatureDetectionCompVisNavbox}}
 
Let the term ''image'' represent a function <math>f(\xi(x,y),\eta(x,y))</math>, where <math>\xi</math>, <math>\eta</math>, <math>x</math>, <math>y</math>, and <math>f</math> are real valued.
 
The Generalized Structure Tensor (GST) is an extension of the Cartesian [[structure tensor]] to [[curvilinear coordinates]] <math>\xi,\eta</math>. It is mainly used to detect and represent the "direction" parameters of curves, just as the Cartesian structure tensor detects and represents direction in Cartesian coordinates. The best-studied case is the detection of curve families generated by pairs of locally orthogonal functions.
 
To be precise, GST represents the direction along which the image <math>f</math> can undergo an infinitesimal translation with minimal error, along the "lines" fulfilling the following conditions:<ref name=bigun04pami3>{{cite journal|
author = J. Bigun and T. Bigun and K. Nilsson|
title = Recognition by symmetry derivatives and the generalized structure tensor|
journal = IEEE Transactions on Pattern Analysis and Machine Intelligence|
pages = 1590–1605|
volume = 26|
year = 2004|
}}</ref>
 
1. The "lines" are ordinary lines in the curvilinear coordinate basis,

:<math>\cos(\theta)\,\xi(x,y)+\sin(\theta)\,\eta(x,y)=\text{constant},</math>

which, as the equation shows, are curves in Cartesian coordinates. The error is measured in the <math>L^2</math> sense, so the minimality of the error refers to the [[L2 norm]].
 
2. The functions <math>\xi(x,y)</math> and <math>\eta(x,y)</math> constitute a harmonic pair, i.e. they fulfill the [[Cauchy–Riemann equations|Cauchy–Riemann conditions]]. Consequently, the curvilinear coordinates of a GST built on a harmonic pair are locally orthogonal.
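The Cauchy–Riemann conditions and the local orthogonality they imply can be checked numerically. The Python sketch below is an illustration, not part of the cited method: it uses the harmonic pair <math>\xi=\log\sqrt{x^2+y^2}</math>, <math>\eta=\tan^{-1}(x,y)</math> (the polar angle, computed with <code>math.atan2(y, x)</code>), and the helper names and the finite-difference step are hypothetical choices. Points on the negative <math>x</math>-axis are avoided because <code>atan2</code> jumps by <math>2\pi</math> there.

```python
import math


def xi(x, y):
    """Log-radius: xi = log(sqrt(x^2 + y^2))."""
    return math.log(math.hypot(x, y))


def eta(x, y):
    """Polar angle of (x, y); note that math.atan2 takes (y, x)."""
    return math.atan2(y, x)


def grad(fn, x, y, h=1e-6):
    """Central-difference gradient (df/dx, df/dy) of a scalar function."""
    gx = (fn(x + h, y) - fn(x - h, y)) / (2 * h)
    gy = (fn(x, y + h) - fn(x, y - h)) / (2 * h)
    return gx, gy


def is_harmonic_pair_at(x, y, tol=1e-5):
    """Check xi_x == eta_y and xi_y == -eta_x (Cauchy-Riemann) at (x, y),
    together with the implied orthogonality grad(xi) . grad(eta) == 0."""
    xi_x, xi_y = grad(xi, x, y)
    eta_x, eta_y = grad(eta, x, y)
    cauchy_riemann = abs(xi_x - eta_y) < tol and abs(xi_y + eta_x) < tol
    orthogonal = abs(xi_x * eta_x + xi_y * eta_y) < tol
    return cauchy_riemann and orthogonal


if __name__ == "__main__":
    # Sample points away from the origin and from the atan2 branch cut.
    for point in [(1.0, 2.0), (0.5, -1.5), (-2.0, 1.0)]:
        print(point, is_harmonic_pair_at(*point))
```

Orthogonality follows from the Cauchy–Riemann conditions, since <math>\xi_x\eta_x+\xi_y\eta_y=\eta_y\eta_x-\eta_x\eta_y=0</math>; the sketch tests both only as a sanity check.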
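The minimal-error criterion behind condition 1 can be made explicit with the standard structure-tensor argument, sketched here rather than quoted from the cited papers; the symbols <math>\tilde\nabla f</math>, <math>\mathbf{t}</math>, <math>S</math>, <math>I_{20}</math> and <math>I_{11}</math> are notation introduced for this illustration. Write <math>\tilde\nabla f=(\partial f/\partial\xi,\ \partial f/\partial\eta)^T</math> for the gradient in the curvilinear coordinates and let <math>\mathbf{t}</math> be a unit vector in the <math>(\xi,\eta)</math> plane. An infinitesimal translation by <math>\varepsilon\mathbf{t}</math> changes <math>f</math> by approximately <math>\varepsilon\,\mathbf{t}^T\tilde\nabla f</math>, so the squared <math>L^2</math> error per unit <math>\varepsilon^2</math> is

:<math>e(\mathbf{t})=\iint\left(\mathbf{t}^T\tilde\nabla f\right)^2 d\xi\,d\eta=\mathbf{t}^T S\,\mathbf{t},\qquad S=\iint\tilde\nabla f\,\tilde\nabla f^{T}\,d\xi\,d\eta.</math>

The error is smallest when <math>\mathbf{t}</math> is the eigenvector of <math>S</math> belonging to its smaller eigenvalue <math>\lambda_2</math>; the line normal <math>(\cos\theta,\sin\theta)^T</math> is then the eigenvector of the larger eigenvalue <math>\lambda_1</math>. Equivalently, in complex form,

:<math>I_{20}=\iint\left(\frac{\partial f}{\partial\xi}+i\,\frac{\partial f}{\partial\eta}\right)^2 d\xi\,d\eta=(\lambda_1-\lambda_2)\,e^{2i\theta},\qquad I_{11}=\iint\left|\frac{\partial f}{\partial\xi}+i\,\frac{\partial f}{\partial\eta}\right|^2 d\xi\,d\eta=\lambda_1+\lambda_2,</math>

so <math>\theta</math> is half the argument of <math>I_{20}</math>. For an ideal pattern <math>f=g(\cos(\theta)\xi+\sin(\theta)\eta)</math>, <math>\lambda_2=0</math> and <math>|I_{20}|=I_{11}</math>.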
 
Efficient detection of <math>\theta</math> in images is possible by image processing if the pair <math>\xi</math>, <math>\eta</math> is given. Logarithmic spirals, including circles, can for instance be detected by (complex) convolutions.<ref name=bigun04pami3 /> The spirals can be iso-curves in a gray-valued image, i.e. the image need not be binary, nor do its edges need to be marked.
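The Python sketch below illustrates the idea for logarithmic spirals under stated assumptions; it is not the filter bank of the cited paper. It uses <math>\xi=\log\sqrt{x^2+y^2}</math> and <math>\eta=\tan^{-1}(x,y)</math>, for which the chain rule gives <math>\partial f/\partial\xi+i\,\partial f/\partial\eta=(x-iy)(\partial f/\partial x+i\,\partial f/\partial y)</math>. For <math>f=g(\cos(\theta)\xi+\sin(\theta)\eta)</math> the square of this quantity has argument <math>2\theta</math> everywhere, so summing the squares over the image and halving the argument of the sum estimates <math>\theta</math> (doubling the angle makes <math>\theta</math> and <math>\theta+\pi</math>, which describe the same curve family, coincide). The sum can be read as one output value of a complex convolution of the squared complex gradient image with the filter <math>(x-iy)^2</math>. The test pattern, the grid, the exclusion radius <code>r_min</code> around the singular origin, the uniform pixel weights (instead of <math>d\xi\,d\eta</math>) and all function names are hypothetical choices.

```python
import cmath
import math


def log_spiral(x, y, a=4.0, m=3):
    """Test pattern cos(a*xi + m*eta), xi = log r, eta = polar angle.
    a*xi + m*eta = sqrt(a^2 + m^2) * (cos(t)*xi + sin(t)*eta) with
    t = atan2(m, a); an integer m keeps the image single-valued."""
    r = math.hypot(x, y) or 1e-12  # guard the singular origin
    return math.cos(a * math.log(r) + m * math.atan2(y, x))


def sample_image(fn, n=101, extent=1.0):
    """Sample fn on an n x n grid covering [-extent, extent]^2.
    Row index i corresponds to y, column index j to x. Returns (img, h)."""
    h = 2.0 * extent / (n - 1)
    img = [[fn(-extent + j * h, -extent + i * h) for j in range(n)]
           for i in range(n)]
    return img, h


def spiral_angle(img, h, extent=1.0, r_min=0.3):
    """Estimate theta of f = g(cos(theta)*xi + sin(theta)*eta) for the
    log-polar harmonic pair. Returns (theta, certainty), theta in (-pi/2, pi/2]."""
    n = len(img)
    i20 = 0j   # sum of (f_xi + i f_eta)^2
    i11 = 0.0  # sum of |f_xi + i f_eta|^2
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            x, y = -extent + j * h, -extent + i * h
            if math.hypot(x, y) < r_min:
                continue  # skip the neighbourhood of the origin, where log r is singular
            fx = (img[i][j + 1] - img[i][j - 1]) / (2 * h)  # central differences
            fy = (img[i + 1][j] - img[i - 1][j]) / (2 * h)
            d = complex(x, -y) * complex(fx, fy)  # chain rule: f_xi + i f_eta
            i20 += d * d
            i11 += abs(d) ** 2
    theta = cmath.phase(i20) / 2.0  # halve the double angle
    certainty = abs(i20) / i11 if i11 > 0 else 0.0
    return theta, certainty


if __name__ == "__main__":
    img, h = sample_image(log_spiral)
    theta, certainty = spiral_angle(img, h)
    print(theta, math.atan2(3, 4), certainty)
```

With <code>m=0</code> the pattern consists of circles and the estimate is close to <math>0</math>; with <code>a=0</code> it consists of rays and the estimate is close to <math>\pm\pi/2</math>.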
 
The curvilinear coordinates of GST can explain physical processes applied to images. Two of the best-known such processes are rotation and zooming. The first is related to the transformation <math>\xi=\log(\sqrt{x^2+y^2})</math>. If the iso-curves of an image <math>f</math> can be explained by this transformation, i.e. they are circles and <math>f(\xi,\eta)=g(\xi)</math> for some real-valued function <math>g</math> of one variable, then the image is invariant to rotations around the origin.
 
Likewise, the second process, zooming (including zooming out), is explained by <math>f(\xi,\eta)=g(\eta)</math> with <math>\eta=\tan^{-1}(x,y)</math>, the polar angle of the point <math>(x,y)</math>. Such a function <math>f</math> is invariant to scaling, i.e. zooming in or out with respect to the origin.
<ref name=bigun86>
{{cite conference|
author=J. Bigun|
title= Pattern recognition by detection of local symmetries|
book-title = Pattern Recognition and Artificial Intelligence|
editor = E.S. Gelsema and L.N. Kanal|
publisher = North-Holland|
pages = 75–90|
year=1988
}}</ref>
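Both invariances can be checked numerically. In the Python sketch below, <code>f_circles</code> and <code>f_rays</code> are hypothetical names for <math>g(\xi)</math> and <math>g(\eta)</math>, and <math>g=\cos</math> is chosen only for illustration; rotation and zooming are applied to the sample point about the origin.

```python
import math


def rotate(x, y, alpha):
    """Rotate the point (x, y) by alpha radians around the origin."""
    c, s = math.cos(alpha), math.sin(alpha)
    return c * x - s * y, s * x + c * y


def scale(x, y, s):
    """Zoom the point (x, y) by the factor s > 0 with respect to the origin."""
    return s * x, s * y


def f_circles(x, y, g=math.cos):
    """f(xi, eta) = g(xi) with xi = log r: iso-curves are circles."""
    return g(math.log(math.hypot(x, y)))


def f_rays(x, y, g=math.cos):
    """f(xi, eta) = g(eta) with eta = polar angle: iso-curves are rays."""
    return g(math.atan2(y, x))


if __name__ == "__main__":
    p = (0.8, -1.3)
    print(f_circles(*rotate(*p, 1.1)) - f_circles(*p))  # ~0: rotation invariant
    print(f_rays(*scale(*p, 2.5)) - f_rays(*p))         # ~0: scale invariant
    print(f_circles(*scale(*p, 2.5)) - f_circles(*p))   # nonzero: circles change under zoom
```

The last line shows that the two invariances are distinct: a pattern of circles is not invariant to zooming, and a pattern of rays is not invariant to rotation.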
 
In combination,<ref name=bigun04pami3 /> the function

:<math>f(\xi,\eta)=g\left(\cos(\theta)\log(\sqrt{x^2+y^2})+\sin(\theta)\tan^{-1}(x,y)\right)</math>

is invariant to a certain amount of rotation combined with scaling, where the amount is specified by the parameter <math>\theta</math>.
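The combined invariance can be made concrete. A rotation by <math>\alpha</math> adds <math>\alpha</math> to <math>\eta</math>, and a zoom by a factor <math>s</math> adds <math>\log s</math> to <math>\xi</math>, so the argument of <math>g</math> is unchanged exactly when <math>\cos(\theta)\log s+\sin(\theta)\,\alpha=0</math>. The Python sketch below assumes the special form <math>f=\cos(a\xi+m\eta)</math> with <math>a>0</math> and an integer <math>m</math> (a spiral with <math>m</math> arms and <math>\theta=\tan^{-1}(m/a)</math>), so that the <math>2\pi</math> jump of the computed angle does not change <math>f</math>; the function names and the values of <code>a</code> and <code>m</code> are hypothetical.

```python
import math


def spiral(x, y, a=4.0, m=3):
    """f = cos(a*xi + m*eta), xi = log r, eta = polar angle. This equals
    g(cos(theta)*xi + sin(theta)*eta) with theta = atan2(m, a) and
    g(u) = cos(sqrt(a^2 + m^2) * u). An integer m makes f single-valued."""
    return math.cos(a * math.log(math.hypot(x, y)) + m * math.atan2(y, x))


def rotate_and_zoom(x, y, alpha, s):
    """Rotate (x, y) by alpha radians, then zoom by s > 0, about the origin."""
    c, sn = math.cos(alpha), math.sin(alpha)
    return s * (c * x - sn * y), s * (sn * x + c * y)


def compensating_zoom(alpha, a=4.0, m=3):
    """Zoom factor s with a*log(s) + m*alpha = 0, i.e.
    cos(theta)*log(s) + sin(theta)*alpha = 0 (requires a != 0)."""
    return math.exp(-m * alpha / a)


if __name__ == "__main__":
    alpha = 0.7
    s = compensating_zoom(alpha)
    # (-1.0, 0.1) crosses the atan2 branch cut after rotation; f is still unchanged.
    for p in [(0.9, 0.2), (-0.5, 1.4), (-1.2, -0.3), (-1.0, 0.1)]:
        print(spiral(*p), spiral(*rotate_and_zoom(*p, alpha, s)))  # pairs agree
```

A rotation alone (<code>s = 1</code>) does change the pattern; only the rotation paired with its compensating zoom leaves it invariant.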
 
Analogously, the Cartesian structure tensor is a representation of a translation too. Here the physical process is an ordinary translation by a certain amount along <math>x</math> combined with a translation along <math>y</math>, i.e. <math>\xi</math> and <math>\eta</math> are the trivial identity transformations <math>\xi=x</math>, <math>\eta=y</math>, and the "lines" are

:<math>\cos(\theta)\,x+\sin(\theta)\,y=\text{constant},</math>

where the ratio of the two translation amounts is specified by the parameter <math>\theta</math>. Here <math>\theta</math> directly determines the direction of the line, whose normal is <math>(\cos\theta,\sin\theta)</math>.
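In the Cartesian case, with <math>\xi=x</math> and <math>\eta=y</math>, the quantity <math>\partial f/\partial\xi+i\,\partial f/\partial\eta</math> is simply <math>\partial f/\partial x+i\,\partial f/\partial y</math>, so <math>\theta</math> can be estimated as half the argument of the summed squared complex gradient. The Python sketch below assumes a sampled test pattern <math>\cos(k(\cos(\theta)x+\sin(\theta)y))</math>, central differences and uniform weights over the whole image; the names <code>line_pattern</code> and <code>cartesian_orientation</code> and the values of <code>k</code>, <code>n</code> and <code>h</code> are hypothetical.

```python
import cmath
import math


def line_pattern(theta, k=10.0, n=64, h=0.02):
    """Sample f = cos(k*(cos(theta)*x + sin(theta)*y)) on an n x n grid with
    spacing h; row index i corresponds to y = i*h, column index j to x = j*h."""
    c, s = math.cos(theta), math.sin(theta)
    return [[math.cos(k * (c * j * h + s * i * h)) for j in range(n)]
            for i in range(n)]


def cartesian_orientation(img, h):
    """Cartesian case xi = x, eta = y: return (theta, certainty), where theta in
    (-pi/2, pi/2] is the direction of the line normal (cos(theta), sin(theta))."""
    i20 = 0j   # sum of (f_x + i f_y)^2
    i11 = 0.0  # sum of |f_x + i f_y|^2
    for i in range(1, len(img) - 1):
        for j in range(1, len(img[0]) - 1):
            fx = (img[i][j + 1] - img[i][j - 1]) / (2 * h)  # central differences
            fy = (img[i + 1][j] - img[i - 1][j]) / (2 * h)
            d = complex(fx, fy)
            i20 += d * d
            i11 += abs(d) ** 2
    theta = cmath.phase(i20) / 2.0  # halve the double angle
    certainty = abs(i20) / i11 if i11 > 0 else 0.0
    return theta, certainty


if __name__ == "__main__":
    for true_theta in (0.4, -0.7, 0.4 + math.pi):
        print(true_theta, cartesian_orientation(line_pattern(true_theta), 0.02))
```

Because only <math>2\theta</math> is measured, <math>\theta</math> and <math>\theta+\pi</math>, which describe the same family of lines, give the same estimate.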
 
In the context of the GST, ''image'' means both an ordinary image and a neighborhood within it (a local image), depending on the context. For example, a photograph as well as any neighborhood of it are images.
 
The Generalized Structure Tensor can be used as an alternative to the [[Hough transform]] in [[image processing]] and [[computer vision]], for example to detect circles or junction points. The main differences are:
*Negative as well as complex voting is allowed.
*Multiple patterns belonging to the same family can be detected with a single template, because not only negative but also complex voting is allowed.
 
== See also ==
*[[Corner detection]]
*[[Edge detection]]
*[[Lucas Kanade method|Lucas-Kanade method]]
*[[Affine shape adaptation]]