Generalized structure tensor

In image analysis, the generalized structure tensor (GST) is an extension of the Cartesian structure tensor to curvilinear coordinates.^[1] It is mainly used to detect and to represent the "direction" parameters of curves, just as the Cartesian structure tensor detects and represents the direction in Cartesian coordinates. Curve families generated by pairs of locally orthogonal functions have been the best studied.

It is a widely known method in applications of image and video processing, including computer vision, such as biometric identification by fingerprints,^[2] and studies of human tissue sections.^[3]^[4]

GST in 2D and locally orthogonal bases

Let the term image represent a function $f(\xi (x,y),\eta (x,y))$, where $x,y$ are real variables and $\xi$, $\eta$ and $f$ are real-valued functions. The GST represents the direction along which the image $f$ can undergo an infinitesimal translation with minimal (total least squares) error, along the "lines" fulfilling the following conditions:

1. The "lines" are ordinary lines in the curvilinear coordinate basis $\xi ,\eta$,

$\cos(\theta )\xi (x,y)+\sin(\theta )\eta (x,y)={\text{constant}}$

which are curves in Cartesian coordinates, as the equation above shows. The error is measured in the $L^{2}$ sense, so the minimality of the error refers to the $L^{2}$ norm.

2. The functions $\xi (x,y),\eta (x,y)$ constitute a harmonic pair, i.e. they fulfill the Cauchy–Riemann equations,

${\begin{aligned}&{\frac {\partial \xi }{\partial x}}={\frac {\partial \eta }{\partial y}},\\[4pt]&{\frac {\partial \xi }{\partial y}}=-{\frac {\partial \eta }{\partial x}}.\end{aligned}}$

Accordingly, such curvilinear coordinates $\xi ,\eta$ are locally orthogonal.

The GST is then given by

$GST=(\lambda _{max}-\lambda _{min})\left[{\begin{array}{c}\cos(\theta )\\\sin(\theta )\\\end{array}}\right][\cos(\theta ),\sin(\theta )]+\lambda _{min}I$

where $0\leq \lambda _{min}\leq \lambda _{max}$ are the (infinitesimal) errors of translation in the best direction (designated by the angle $\theta$ ) and the worst direction (designated by $\theta +\pi /2$ ). The matrix $I$ is the identity matrix.
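The decomposition above can be illustrated with a minimal NumPy sketch. The function name `gst_matrix` and the numeric values are hypothetical, chosen only for illustration. The sketch assembles the matrix from $\lambda _{max}$, $\lambda _{min}$ and $\theta$, then recovers them by an eigen-decomposition.

```python
import numpy as np

def gst_matrix(lam_max, lam_min, theta):
    """Assemble (lam_max - lam_min) * u u^T + lam_min * I, u = (cos(theta), sin(theta))."""
    u = np.array([np.cos(theta), np.sin(theta)])
    return (lam_max - lam_min) * np.outer(u, u) + lam_min * np.eye(2)

# Hypothetical values, for illustration only.
G = gst_matrix(lam_max=3.0, lam_min=1.0, theta=np.pi / 6)

eigvals, eigvecs = np.linalg.eigh(G)      # eigenvalues in ascending order
u_max = eigvecs[:, 1]                     # eigenvector of the largest eigenvalue
# The eigenvector sign is arbitrary, so the angle is only defined modulo pi.
theta_rec = np.arctan2(u_max[1], u_max[0]) % np.pi
```

In this sketch the eigenvector belonging to $\lambda _{max}$ points along $(\cos(\theta ),\sin(\theta ))$, and $\theta$ is recovered only modulo $\pi$, which is one reason the double angle $2\theta$ is the natural quantity to estimate.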

Thus the Cartesian structure tensor is a special case of the GST where $\xi =x$ and $\eta =y$.

Basic concept for its use in image processing and computer vision

Efficient detection of $\theta$ in images is possible by image processing for a given pair $\xi$, $\eta$. Complex convolutions (or the corresponding matrix operations) and point-wise non-linear mappings are the basic computational elements of GST implementations. A total least squares estimate of $2\theta$ is obtained along with the two errors, $\lambda _{max}$ and $\lambda _{min}$, in analogy with the Cartesian structure tensor. The estimated $2\theta$ can be used as a shape feature, whereas $\lambda _{max}-\lambda _{min}$, alone or in combination with $\lambda _{max}+\lambda _{min}$, can be used as a quality (confidence, certainty) measure for the estimate.
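As a minimal sketch of these elements for the Cartesian special case ($\xi =x$, $\eta =y$), the NumPy code below squares the complex gradient point-wise and averages it. The names `I20` and `I11`, the synthetic test image and its parameters are assumptions made for illustration. The averaging is a plain sum over the whole array (a convolution evaluated at a single point), not a windowed filter as a practical implementation would likely use.

```python
import numpy as np

# Synthetic image f = g(cos(theta) x + sin(theta) y) with a hypothetical theta.
theta_true = 0.6
y, x = np.mgrid[-32:33, -32:33].astype(float)
f = np.cos(0.2 * (np.cos(theta_true) * x + np.sin(theta_true) * y))

fy, fx = np.gradient(f)              # axis 0 is y (rows), axis 1 is x (columns)
z = fx + 1j * fy                     # complex gradient image

# Point-wise non-linear mappings followed by averaging over the neighborhood.
I20 = np.sum(z ** 2)                 # complex: carries the double angle 2*theta
I11 = np.sum(np.abs(z) ** 2)         # real and non-negative

theta_est = 0.5 * np.angle(I20)      # estimate of theta, modulo pi
lam_max = 0.5 * (I11 + np.abs(I20))  # since |I20| = lam_max - lam_min
lam_min = 0.5 * (I11 - np.abs(I20))  # and   I11  = lam_max + lam_min
certainty = np.abs(I20) / I11        # in [0, 1]; near 1 for one clear direction
```

Here `lam_max` and `lam_min` coincide with the eigenvalues of the 2×2 matrix of summed gradient products, so the complex pair (`I20`, `I11`) carries the same information as the tensor in matrix form.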

Logarithmic spirals, including circles, can for instance be detected by (complex) convolutions and non-linear mappings.^[1] The spirals can be in gray-valued images or in binary images, i.e. the locations of edge elements of the patterns concerned, such as contours of circles or spirals, need not be known or otherwise marked.
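The following NumPy sketch illustrates the idea on a synthetic gray-valued spiral. It is not the filter design of the cited work. It assumes the log-polar pair $\xi =\log r$, $\eta ={\text{atan2}}(y,x)$, for which the chain rule gives $(\partial _{\xi }f+i\partial _{\eta }f)^{2}=(x-iy)^{2}(\partial _{x}f+i\partial _{y}f)^{2}$, so weighting the squared complex gradient by $e^{-2i\varphi }$ (with $\varphi$ the polar angle) leaves a phase of $2\theta$. All names (`I20`, `I11`, `ring`) and parameter values are hypothetical.

```python
import numpy as np

# Two-armed spiral f = g(cos(theta) xi + sin(theta) eta), xi = log r,
# eta = atan2(y, x) (assumed convention). k is chosen so that
# k * sin(theta) = 2, which keeps f continuous across eta = +-pi.
theta_true = 0.9                                 # hypothetical spiral parameter
k = 2.0 / np.sin(theta_true)
ax = np.linspace(-1.0, 1.0, 200)                 # even count: no sample at the origin
x, y = np.meshgrid(ax, ax)                       # rows vary in y, columns in x
r = np.hypot(x, y)
phi = np.arctan2(y, x)
f = np.cos(k * (np.cos(theta_true) * np.log(r) + np.sin(theta_true) * phi))

h = ax[1] - ax[0]
fy, fx = np.gradient(f, h)                       # gray values only, no edge marking
z2 = (fx + 1j * fy) ** 2                         # point-wise non-linear mapping

# Complex weights exp(-2i*phi) on an annulus play the role of the filter;
# the sum is the convolution evaluated at the spiral centre only.
ring = (r > 0.2) & (r < 1.0)
I20 = np.sum(np.exp(-2j * phi[ring]) * z2[ring])
I11 = np.sum(np.abs(z2[ring]))

theta_est = 0.5 * np.angle(I20)                  # modulo pi
certainty = np.abs(I20) / I11
```

With $\theta =0$ the same sketch responds to concentric circles, and with $\theta =\pi /2$ to spoke patterns, so a single set of complex weights covers the whole family.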

The generalized structure tensor can be used as an alternative to the Hough transform in image processing and computer vision to detect patterns whose local orientations can be modelled, for example junction points. The main differences are:

Negative as well as complex votes are allowed;
With one template, multiple patterns belonging to the same family can be detected;
Image binarization is not required.
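One way to read the first item, offered only as an illustration, is through double-angle complex votes: each local orientation $\theta$ contributes $e^{i2\theta }$, so conflicting evidence can cancel instead of merely accumulating as counts. The helper `vote` below is a hypothetical name.

```python
import cmath

def vote(theta):
    """Double-angle complex vote for an orientation theta (defined modulo pi)."""
    return cmath.exp(2j * theta)

# Same orientation stated twice (theta and theta + pi): votes reinforce.
agree = vote(0.3) + vote(0.3 + cmath.pi)
# Orthogonal orientations: votes cancel.
conflict = vote(0.3) + vote(0.3 + cmath.pi / 2)
```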

Physical and mathematical interpretation

The curvilinear coordinates of the GST can explain physical processes applied to images. A well-known pair of processes consists of rotation and zooming. These are related to the coordinate transformation $\xi =\log({\sqrt {x^{2}+y^{2}}})$ and $\eta =\tan ^{-1}(x,y)$, where $\tan ^{-1}(x,y)$ denotes the two-argument arctangent, i.e. the polar angle of the point $(x,y)$.
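The local orthogonality required of a harmonic pair can be checked numerically for this transformation. The sketch below uses only the Python standard library and reads $\tan ^{-1}(x,y)$ as `math.atan2(y, x)`, which is an assumption about the notation. The helper names `xi`, `eta` and `grad` and the test point are hypothetical.

```python
import math

def xi(x, y):
    return math.log(math.hypot(x, y))

def eta(x, y):
    # Assumption: tan^{-1}(x, y) is read as the polar angle atan2(y, x).
    return math.atan2(y, x)

def grad(func, x, y, h=1e-6):
    """Central-difference gradient of func at (x, y)."""
    return ((func(x + h, y) - func(x - h, y)) / (2 * h),
            (func(x, y + h) - func(x, y - h)) / (2 * h))

x0, y0 = 0.7, -0.4                       # arbitrary point away from the origin
gxi, geta = grad(xi, x0, y0), grad(eta, x0, y0)

dot = gxi[0] * geta[0] + gxi[1] * geta[1]               # ~0: gradients orthogonal
norm_xi, norm_eta = math.hypot(*gxi), math.hypot(*geta) # equal lengths, both ~1/r
```

Orthogonal gradients of equal length are what makes the coordinate grid locally a rotated and scaled copy of the Cartesian grid.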

If the iso-curves of an image $f$ can be explained by $\xi$ alone, i.e. they are circles, $f(\xi ,\eta )=g(\xi )$, where $g$ is any real-valued differentiable function of one variable, then the image is invariant to rotations (around the origin).

Zooming (including zooming out) is modeled similarly. If the image has iso-curves that look like a "star" or bicycle spokes, i.e. $f(\xi ,\eta )=g(\eta )$ for some differentiable 1D function $g$, then the image $f$ is invariant to scaling (with respect to the origin).

In combination,

$f(\xi ,\eta )=g(\cos(\theta )\log({\sqrt {x^{2}+y^{2}}})+\sin(\theta )\tan ^{-1}(x,y))$

is invariant to a certain amount of rotation combined with scaling, where the amount is specified by the parameter $\theta$.
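This combined invariance can be checked at a single point with a small standard-library sketch. It assumes $\eta$ is the polar angle (`cmath.phase`). The values of `theta`, `t` and `z` and the choice $g=\cos$ are hypothetical. Scaling by $e^{-t\sin(\theta )}$ together with rotation by $t\cos(\theta )$ leaves the argument of $g$ unchanged, as long as the rotation does not cross the branch cut of the angle.

```python
import cmath
import math

def pattern(z, theta, g=math.cos):
    """f = g(cos(theta) * log|z| + sin(theta) * arg(z)) at the point z = x + iy."""
    return g(math.cos(theta) * math.log(abs(z)) + math.sin(theta) * cmath.phase(z))

theta = 0.9                  # hypothetical family parameter
t = 0.3                      # hypothetical step along the invariance direction
z = complex(0.5, 0.2)

# Scale by exp(-t*sin(theta)) and rotate by t*cos(theta) with one complex factor.
factor = cmath.exp(-t * math.sin(theta) + 1j * t * math.cos(theta))
before = pattern(z, theta)
after = pattern(z * factor, theta)                     # matched rotation and scaling
rotated_only = pattern(z * cmath.exp(0.3j), theta)     # rotation without scaling
```

In this sketch `before` and `after` agree to rounding error, whereas `rotated_only` differs, since a pure rotation is not matched to this value of $\theta$.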

Analogously, the Cartesian structure tensor is also a representation of a translation. Here the physical process consists of an ordinary translation of a certain amount along $x$ combined with a translation along $y$,

$\cos(\theta )x+\sin(\theta )y={\text{constant}}$

where the amount is specified by the parameter $\theta$ . Evidently $\theta$ here represents the direction of the line.

Generally, the estimated $\theta$ represents the direction (in $\xi ,\eta$ coordinates) along which infinitesimal translations leave the image invariant, or in practice least variant. With every curvilinear coordinate basis pair there is thus a pair of infinitesimal translators, a linear combination of which is a differential operator. These operators are related to Lie algebras.
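As a short worked check (a sketch that follows from the chain rule, not a quotation from the cited work), take $f=g(u)$ with $u=\cos(\theta )\xi +\sin(\theta )\eta$. Then

$\left(-\sin(\theta ){\frac {\partial }{\partial \xi }}+\cos(\theta ){\frac {\partial }{\partial \eta }}\right)f=g'(u)\left(-\sin(\theta )\cos(\theta )+\cos(\theta )\sin(\theta )\right)=0$

so this linear combination of the two translators annihilates every image of the family. For $\xi =x$, $\eta =y$ the translators are ${\frac {\partial }{\partial x}}$ and ${\frac {\partial }{\partial y}}$. For the log-polar pair, assuming $\eta$ is the polar angle, they become $x{\frac {\partial }{\partial x}}+y{\frac {\partial }{\partial y}}$ (scaling) and $x{\frac {\partial }{\partial y}}-y{\frac {\partial }{\partial x}}$ (rotation).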

Miscellaneous

In the context of the GST, "image" means both an ordinary image and an image neighborhood within it (a local image), depending on the context. For example, a photograph as well as any neighborhood of it are images.

References

1. J. Bigun, T. Bigun and K. Nilsson (2004). "Recognition by symmetry derivatives and the generalized structure tensor". IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 26. pp. 1590–1605.
2. H. Fronthaler, K. Kollreider and J. Bigun (2008). "Local features for enhancement and minutiae extraction in fingerprints". IEEE Transactions on Image Processing. Vol. 17, no. 3. pp. 354–363. ISSN 1057-7149.
3. O. Schmitt and H. Birkholz (2010). "Improvement in cytoarchitectonic mapping by combining electrodynamic modeling with local orientation in high-resolution images of the cerebral cortex". Microsc. Res. Tech. Vol. 74. pp. 225–243.
4. O. Schmitt, M. Pakura, T. Aach, L. Homke, M. Bohme, S. Bock and S. Preusse (2004). "Analysis of nerve fibers and their distribution in histologic sections of the human brain". Microsc. Res. Tech. Vol. 63. pp. 220–243.
