Pyramid (image processing): Difference between revisions

Revision as of 13:34, 19 May 2011 edit Papadim.G (talk \| contribs) 282 edits No edit summary ← Previous edit		Latest revision as of 05:09, 17 April 2025 edit undo Citation bot (talk \| contribs) Bots 5,872,438 edits Add: bibcode, journal. \| Use this bot. Report bugs. \| Suggested by Dominic3203 \| Linked from User:LinguisticMystic/cs/outline \| #UCB_webform_linked 1676/2277
(93 intermediate revisions by 54 users not shown)
{{Short description\|Type of multi-scale signal representation}}
[[File:image pyramid.svg\|thumb\|upright=1.2\|Visual representation of an image pyramid with 5 levels]]
{{FeatureDetectionCompVisNavbox}}
'''Pyramid''', or '''pyramid representation''', is a type of [[Scale model\|multi-scale]] [[Signal (information theory)\|signal]] [[Knowledge representation\|representation]] developed by the [[computer vision]], [[image processing]] and [[signal processing]] communities, in which a signal or an image is subject to repeated [[smoothing]] and [[Downsampling\|subsampling]]. Pyramid representation is a predecessor to [[scale space\|scale-space representation]] and [[multiresolution analysis]].

==Pyramid generation==
There are two main types of pyramids: lowpass and bandpass.

A lowpass pyramid is made by smoothing the image with an appropriate smoothing filter and then subsampling the smoothed image, usually by a factor of 2 along each coordinate direction.
The resulting image is then subjected to the same procedure, and the cycle is repeated multiple times. Each cycle of this process results in a smaller image with increased smoothing, but with decreased spatial sampling density (that is, decreased image resolution). If illustrated graphically, the entire multi-scale representation will look like a pyramid, with the original image on the bottom and each cycle's resulting smaller image stacked one atop the other.

A bandpass pyramid is made by forming the difference between images at adjacent levels in the pyramid, with image interpolation performed between adjacent levels of resolution to enable computation of pixelwise differences.<ref>E.H. Adelson and C.H. Anderson and J.R. Bergen and P.J. Burt and J.M. Ogden. [http://persci.mit.edu/pub_pdfs/RCA84.pdf "Pyramid methods in image processing"]. 1984.</ref>

==Pyramid generation kernels==
A variety of different smoothing [[Kernel (image processing)\|kernels]] have been proposed for generating pyramids.<ref>{{Cite journal \| last1 = Burt \| first1 = P. J. \| doi = 10.1016/0146-664X(81)90092-7 \| title = Fast filter transform for image processing \| journal = Computer Graphics and Image Processing \| volume = 16 \| pages = 20–51 \|date=May 1981 }}</ref><ref name=Crowley1981>{{Cite journal \|last=Crowley \|first=James L. \|title=A representation for visual information \|journal=Interim Report Carnegie-Mellon Univ \|publisher=Carnegie-Mellon University, Robotics Institute \|date=November 1981 \|bibcode=1981cmu..reptR....C \|id=tech. report CMU-RI-TR-82-07 \|url=http://www.ri.cmu.edu/publication_view.html?pub_id=37}}</ref><ref>{{cite journal \| last1 = Burt \| first1 = Peter \| last2 = Adelson \| first2 = Ted \| year = 1983 \| title = The Laplacian Pyramid as a Compact Image Code \| url = http://persci.mit.edu/pub_pdfs/pyramid83.pdf \| journal = IEEE Transactions on Communications \| volume = 31 \| issue = 4 \| pages = 532–540 \| doi = 10.1109/TCOM.1983.1095851 \| citeseerx = 10.1.1.54.299 \| s2cid = 8018433 }}</ref><ref>{{Cite journal \| last1 = Crowley \| first1 = J. L. \| last2 = Parker \| first2 = A. C. \| author2-link = Alice C. Parker \| title = A representation for shape based on peaks and ridges in the difference of low-pass transform \| journal = IEEE Transactions on Pattern Analysis and Machine Intelligence \| volume = 6 \| issue = 2 \| pages = 156–170 \|date=March 1984 \| pmid = 21869180 \| doi = 10.1109/TPAMI.1984.4767500 \| citeseerx = 10.1.1.161.3102 \| s2cid = 14348919 }}</ref><ref>{{cite journal \| last1 = Crowley \| first1 = J. L. \| last2 = Sanderson \| first2 = A. C. \| year = 1987 \| title = Multiple resolution representation and probabilistic matching of 2-D gray-scale shape \| url = http://www-prima.inrialpes.fr/Prima/Homepages/jlc/papers/Crowley-Sanderson-PAMI87.pdf \| journal = IEEE Transactions on Pattern Analysis and Machine Intelligence \| volume = 9 \| issue = 1 \| pages = 113–121 \| doi = 10.1109/tpami.1987.4767876 \| pmid = 21869381 \| citeseerx = 10.1.1.1015.9294 \| s2cid = 14999508 }}</ref><ref>{{cite journal \| last1 = Meer \| first1 = P. \| last2 = Baugher \| first2 = E. S. \| last3 = Rosenfeld \| first3 = A. \| year = 1987 \| title = Frequency domain analysis and synthesis of image generating kernels \| doi = 10.1109/tpami.1987.4767939 \| journal = IEEE Transactions on Pattern Analysis and Machine Intelligence \| volume = 9 \| issue = 4 \| pages = 512–522 \| pmid = 21869409 \| s2cid = 5978760 }}</ref>

Among the suggestions that have been given, the ''binomial kernels'' arising from the [[binomial coefficient]]s stand out as a particularly useful and theoretically well-founded class.<ref name=Crowley1981/><ref>Lindeberg, Tony, "[http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A472968&dswid=77 Scale-space for discrete signals]," PAMI(12), No. 3, March 1990, pp. 234-254.</ref><ref>{{cite journal \| last1 = Haddad \| first1 = R. A. \| last2 = Akansu \| first2 = A. N. \| date = March 1991 \| title = A Class of Fast Gaussian Binomial Filters for Speech and Image Processing \| url = https://web.njit.edu/~akansu/PAPERS/Haddad-AkansuFastGaussianBinomialFiltersIEEE-TSP-March1991.pdf \| journal = IEEE Transactions on Signal Processing \| volume = 39 \| issue = 3 \| pages = 723–727 \| doi = 10.1109/78.80892 \| bibcode = 1991ITSP...39..723H }}</ref><ref>Lindeberg, Tony. [http://www.csc.kth.se/~tony/book.html Scale-Space Theory in Computer Vision], Kluwer Academic Publishers, 1994, {{ISBN\|0-7923-9418-6}} (see specifically Chapter 2 for an overview of Gaussian and Laplacian image pyramids and Chapter 3 for theory about generalized binomial kernels and discrete Gaussian kernels)</ref><ref name=LinBre03-ScSp/><ref>See the article on [[multi-scale approaches]] for a very brief theoretical statement</ref> Thus, given a two-dimensional image, we may apply the (normalized) binomial filter (1/4, 1/2, 1/4) typically twice or more along each spatial dimension and then subsample the image by a factor of two.
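
As a rough illustration of the smoothing and subsampling step just described, the following Python sketch builds a lowpass pyramid with the (1/4, 1/2, 1/4) binomial filter. The function names (`smooth_1d`, `smooth_2d`, `subsample`, `lowpass_pyramid`), the edge-replicating border rule, and the default of two smoothing passes are assumptions made for this example; the cited sources may handle borders and pass counts differently.

```python
def smooth_1d(values):
    """Apply the normalized binomial filter (1/4, 1/2, 1/4) to a 1-D sequence.

    Borders replicate the edge sample (an assumption of this sketch).
    """
    n = len(values)
    out = []
    for i in range(n):
        left = values[max(i - 1, 0)]
        right = values[min(i + 1, n - 1)]
        out.append(0.25 * left + 0.5 * values[i] + 0.25 * right)
    return out


def smooth_2d(image, passes=2):
    """Smooth separably: every row, then every column, repeated `passes` times."""
    for _ in range(passes):
        rows = [smooth_1d(r) for r in image]              # along x
        cols = [smooth_1d(list(c)) for c in zip(*rows)]   # along y
        image = [list(r) for r in zip(*cols)]             # transpose back
    return image


def subsample(image):
    """Keep every second sample along each coordinate direction."""
    return [row[::2] for row in image[::2]]


def lowpass_pyramid(image, levels):
    """Return a list of `levels` images; index 0 is the original image."""
    pyramid = [[[float(v) for v in row] for row in image]]
    for _ in range(levels - 1):
        pyramid.append(subsample(smooth_2d(pyramid[-1])))
    return pyramid


if __name__ == "__main__":
    # Hypothetical 8x8 checkerboard test image.
    img = [[float((x + y) % 2) for x in range(8)] for y in range(8)]
    for level in lowpass_pyramid(img, 3):
        print(len(level), "x", len(level[0]))
```

Leaving out the `subsample` call at selected levels would give intermediate scale levels at unchanged resolution.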
This operation may then proceed as many times as desired, leading to a compact and efficient multi-scale representation. If motivated by specific requirements, intermediate scale levels may also be generated where the subsampling stage is sometimes left out, leading to an ''oversampled'' or ''hybrid pyramid''.<ref name=LinBre03-ScSp/> With the increasing computational efficiency of [[CPU]]s available today, it is in some situations also feasible to use [[Gaussian filter]]s with wider support as smoothing kernels in the pyramid generation steps.

===Gaussian pyramid===
In a Gaussian pyramid, each subsequent image is smoothed using a Gaussian average ([[Gaussian blur]]) and scaled down. Each pixel contains a local average that corresponds to a pixel neighborhood on a lower level of the pyramid. This technique is used especially in [[texture synthesis]].

===Laplacian pyramid===
A Laplacian pyramid is very similar to a Gaussian pyramid but saves the difference image of the blurred versions between each pair of adjacent levels. Only the smallest level is not a difference image; it is kept so that the high-resolution image can be reconstructed from the difference images at the higher-resolution levels. This technique can be used in [[image compression]].<ref>{{cite journal \| last1 = Burt \| first1 = Peter J. \| last2 = Adelson \| first2 = Edward H. \| year = 1983 \| title = The Laplacian Pyramid as a Compact Image Code \| url = http://persci.mit.edu/pub_pdfs/pyramid83.pdf \| journal = IEEE Transactions on Communications \| volume = 31 \| issue = 4 \| pages = 532–540 \| doi = 10.1109/TCOM.1983.1095851 \| citeseerx = 10.1.1.54.299 \| s2cid = 8018433 }}</ref>

===Steerable pyramid===
A steerable pyramid, developed by [[Eero Simoncelli\|Simoncelli]] and others, is an implementation of a multi-scale, multi-orientation [[band-pass filter]] bank used for applications including [[image compression]], [[texture synthesis]], and [[Outline of object recognition\|object recognition]].
It can be thought of as an orientation-selective version of a Laplacian pyramid, in which a bank of [[steerable filter]]s is used at each level of the pyramid instead of a single Laplacian or [[Gaussian filter]].<ref>{{Cite web \|first=Eero \|last=Simoncelli \|url=http://www.cns.nyu.edu/~eero/STEERPYR/ \|title=The Steerable Pyramid \|publisher=cns.nyu.edu }}</ref><ref>{{Cite web \|first1=Roberto \|last1=Manduchi \|first2=Pietro \|last2=Perona \|first3=Doug \|last3=Shy \|title=Efficient Deformable Filter Banks \|url=http://www.vision.caltech.edu/publications/ManduchiPeronaShy_efficient_deformable.pdf \|publisher=[[California Institute of Technology]]/[[University of Padua]] \|year=1997 }} <br />Also in {{Cite journal \|journal= IEEE Transactions on Signal Processing\|title=Efficient Deformable Filter Banks \|volume=46 \|issue=4 \|pages=1168–1173 \|year=1998 \|doi=10.1109/78.668570\|last1=Manduchi \|first1=R. \|last2=Perona \|first2=P. \|last3=Shy \|first3=D. \|bibcode=1998ITSP...46.1168M \|citeseerx=10.1.1.5.3102 }}</ref><ref>{{cite book \| doi=10.1117/12.274510 \| chapter=Seven models of masking \| title=Human Vision and Electronic Imaging II \| date=1997 \| editor-last1=Rogowitz \| editor-first1=Bernice E. \| last1=Klein \| first1=Stanley A. \| last2=Carney \| first2=Thom \| last3=Barghout-Stein \| first3=Lauren \| last4=Tyler \| first4=Christopher W. \| volume=3016 \| pages=13–24 \| s2cid=8366504 \| editor-first2=Thrasyvoulos N. \| editor-last2=Pappas }}</ref>

==Applications of pyramids==
===Alternative representation===
In the early days of computer vision, pyramids were used as the main type of multi-scale representation for computing multi-scale image [[feature detection (computer vision)\|features]] from real-world image data.
More recent techniques include [[scale space\|scale-space representation]], which has been popular among some researchers due to its theoretical foundation, the ability to decouple the subsampling stage from the multi-scale representation, the more powerful tools for theoretical analysis, and the ability to compute a representation at ''any'' desired scale, thus avoiding the algorithmic problems of relating image representations at different resolutions. Nevertheless, pyramids are still frequently used for expressing computationally efficient approximations to [[scale space\|scale-space representation]].<ref name=LinBre03-ScSp>Lindeberg, T. and Bretzner, L. [http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A440700&dswid=-2509 Real-time scale selection in hybrid multi-scale representations], Proc. Scale-Space'03, Isle of Skye, Scotland, Springer Lecture Notes in Computer Science, volume 2695, pages 148-163, 2003.</ref><ref>Crowley, J, Riff O. [http://www-prima.inrialpes.fr/Prima/Homepages/jlc/papers/Crowley-ScaleSpace03.pdf Fast computation of scale normalised Gaussian receptive fields], Proc. Scale-Space'03, Isle of Skye, Scotland, Springer [[Lecture Notes in Computer Science]], volume 2695, 2003.</ref><ref>{{cite journal \| last1 = Lowe \| first1 = D. G. \| year = 2004 \| title = Distinctive image features from scale-invariant keypoints \| url = http://citeseer.ist.psu.edu/lowe04distinctive.html \| journal = International Journal of Computer Vision \| volume = 60 \| issue = 2 \| pages = 91–110 \| doi=10.1023/B:VISI.0000029664.99615.94 \| citeseerx = 10.1.1.73.2924 \| s2cid = 221242327 }}</ref>

===Detail manipulation===
Levels of a Laplacian pyramid can be added to or removed from the original image to amplify or reduce detail at different scales. However, detail manipulation of this form is known to produce halo artifacts in many cases, leading to the development of alternatives such as the [[bilateral filter]].

Some [[image compression]] file formats use the [[Adam7 algorithm]] or some other [[Interlacing (bitmaps)\|interlacing]] technique. These can be seen as a kind of image pyramid. Because those file formats store the "large-scale" features first, and fine-grain details later in the file, a viewer displaying a small "thumbnail", or displaying on a small screen, can quickly download just enough of the image to fill the available pixels. One file can therefore support many viewer resolutions, rather than a different file having to be stored or generated for each resolution.
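
The Laplacian-pyramid detail manipulation described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of any cited paper: the helper names (`blur`, `reduce_level`, `expand_level`, `laplacian_pyramid`, `reconstruct`), the (1/4, 1/2, 1/4) smoothing kernel, the nearest-neighbour-plus-blur interpolation, and the per-level `gains` parameter are all hypothetical choices made for the example. It does nothing to suppress the halo artifacts mentioned above.

```python
def blur(image):
    """Separable (1/4, 1/2, 1/4) smoothing with replicated borders (assumed)."""
    def blur_1d(v):
        n = len(v)
        return [0.25 * v[max(i - 1, 0)] + 0.5 * v[i] + 0.25 * v[min(i + 1, n - 1)]
                for i in range(n)]
    rows = [blur_1d(r) for r in image]
    cols = [blur_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def reduce_level(image):
    """Smooth, then keep every second sample in each direction."""
    return [row[::2] for row in blur(image)[::2]]


def expand_level(image, height, width):
    """Interpolate up to (height, width): pixel replication followed by a blur."""
    up = [[image[y // 2][x // 2] for x in range(width)] for y in range(height)]
    return blur(up)


def laplacian_pyramid(image, levels):
    """Return `levels - 1` difference images, then the coarsest lowpass image."""
    pyramid = []
    current = [[float(v) for v in row] for row in image]
    for _ in range(levels - 1):
        smaller = reduce_level(current)
        predicted = expand_level(smaller, len(current), len(current[0]))
        pyramid.append([[c - p for c, p in zip(c_row, p_row)]
                        for c_row, p_row in zip(current, predicted)])
        current = smaller
    pyramid.append(current)  # the only level that is not a difference image
    return pyramid


def reconstruct(pyramid, gains=None):
    """Collapse the pyramid; gains[i] scales difference level i (1.0 = unchanged)."""
    if gains is None:
        gains = [1.0] * (len(pyramid) - 1)
    current = pyramid[-1]
    for diff, gain in zip(reversed(pyramid[:-1]), reversed(gains)):
        predicted = expand_level(current, len(diff), len(diff[0]))
        current = [[p + gain * d for p, d in zip(p_row, d_row)]
                   for p_row, d_row in zip(predicted, diff)]
    return current
```

With all gains equal to 1.0, `reconstruct` recovers the input up to floating-point rounding; a gain above 1.0 amplifies detail at that level's scale, and a gain below 1.0 reduces it.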
==See also==
* [[Mipmap]]
* [[Scale space implementation]]
* [[Level of detail (computer graphics)\|Level of detail]]
* [[JPEG 2000#Multiple resolution representation]]

==References==

==External links==
* [http://fourier.eng.hmc.edu/e161/lectures/canny/node3.html Gaussian-Laplacian Pyramid Image Coding] - illustrates methods of [[Downsampling]], [[Upsampling]], and [[Gaussian function\|Gaussian]] [[convolution]]
* [http://www.cns.nyu.edu/~eero/steerpyr/ The Steerable Pyramid] by Eero Simoncelli
* [http://www.cse.yorku.ca/~kosta/CompVis_Notes/gaussian_pyramid.pdf The Gaussian Pyramid] - provides a brief introduction for the procedure and cites several sources
* [http://www.prip.tuwien.ac.at/twist/irregulargraphpyramid.php Laplacian Irregular Graph Pyramid] - Figure 1 on this page illustrates an example of the Gaussian Pyramid
* [https://archive.today/20130117073108/http://aspdf.com/ebook/the-laplacian-pyramid-as-a-compact-image-code-pdf.html The Laplacian Pyramid as a Compact Image Code] on [http://aspdf.com/ eBook Submission]

[[Category:Image processing]]
[[Category:Computer vision]]