Discrete cosine transform: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 08:25, 14 April 2025 edit 174.45.100.155 (talk) No edit summary Tags: Reverted references removed Mobile edit Mobile web edit ← Previous edit		Latest revision as of 19:07, 9 August 2025 edit undo Bender the Bot (talk \| contribs) Bots 1,064,377 edits m →External links: HTTP to HTTPS for SourceForge Tag: AWB
(11 intermediate revisions by 7 users not shown)
Line 1: {{Short description\|Technique used in signal processing and data compression}} ~~{{Jagged 85 cleanup\|date=July 2022}}~~ A '''discrete cosine transform''' ('''DCT''') expresses a finite sequence of [[data points]] in terms of a sum of [[cosine]] functions oscillating at different [[frequency\|frequencies]]. The DCT, first proposed by [[Nasir Ahmed (engineer)\|Nasir Ahmed]] in 1972, is a widely used transformation technique in [[signal processing]] and [[data compression]]. It is used in most [[digital media]], including [[digital images]] (such as [[JPEG]] and [[HEIF]]), [[digital video]] (such as [[MPEG]] and {{nowrap\|[[H.26x]]}}), [[digital audio]] (such as [[Dolby Digital]], [[MP3]] and [[Advanced Audio Coding\|AAC]]), [[digital television]] (such as [[SDTV]], [[HDTV]] and [[Video on demand\|VOD]]), [[digital radio]] (such as [[AAC+]] and [[DAB+]]), and [[speech coding]] (such as [[AAC-LD]], [[Siren (codec)\|Siren]] and [[Opus (audio format)\|Opus]]). DCTs are also important to numerous other applications in [[science and engineering]], such as [[digital signal processing]], [[telecommunication]] devices, reducing [[network bandwidth]] usage, and [[spectral method]]s for the numerical solution of [[partial differential equations]]. Line 7 ⟶ 6: There are eight standard DCT variants, of which four are common. The most common variant of discrete cosine transform is the type-II DCT, which is often called simply ''the DCT''. This was the original DCT as first proposed by Ahmed. Its inverse, the type-III DCT, is correspondingly often called simply ''the inverse DCT'' or ''the IDCT''. Two related transforms are the [[discrete sine transform]] (DST), which is equivalent to a DFT of real and [[odd function]]s, and the [[modified discrete cosine transform]] (MDCT), which is based on a DCT of overlapping data. Multidimensional DCTs (MD DCTs) are developed to extend the concept of DCT to multidimensional signals. A variety of fast algorithms have been developed to reduce the computational complexity of implementing DCT. One of these is the integer DCT (IntDCT),<ref name="Stankovic"/> an [[integer]] approximation of the standard DCT,<ref name="Britanak2010" />{{rp\|pages= [https://books.google.com/books?id=iRlQHcK-r_kC&pg=PA141 ix, xiii, 1, 141–304]}} used in several [[ISO/IEC]] and [[ITU-T]] international standards.<ref name="Stankovic"/><ref name="Britanak2010"/> DCT compression, also known as block compression, compresses data in sets of discrete DCT blocks.<ref name="Alikhani"/> DCT blocks sizes including 8x8 [[pixels]] for the standard DCT, and varied integer DCT sizes between 4x4 and 32x32 pixels.<ref name="Stankovic"/><ref name="apple"/> The DCT has a strong ''energy compaction'' property,<ref name="pubDCT"/><ref name="pubRaoYip"/> capable of achieving high quality at high [[data compression ratio]]s.<ref name="Barbero"/><ref name="Lea">{{cite journal\|last1=Lea\|first1=William\|date=1994\|title=Video on demand: Research Paper 94/68\|url=https://researchbriefings.parliament.uk/ResearchBriefing/Summary/RP94-68\|journal=[[House of Commons Library]]\|access-date=20 September 2019}}</ref> However, blocky [[compression artifacts]] can appear when heavy DCT compression is applied. == History == The DCT was first conceived by [[Nasir Ahmed (engineer)\|Nasir Ahmed~~]], T. Natarajan and [[K. R. Rao~~]] while working at [[Kansas State University]]. The concept was proposed to the [[National Science Foundation]] in 1972. The DCT was originally intended for [[image compression]].<ref name="Ahmed" /><ref name="Stankovic"/> Ahmed developed a practical DCT algorithm with his PhD students T. Raj Natarajan~~, Wills Dietrich,~~ and ~~Jeremy Fries, and his friend Dr.~~ [[K. R. Rao]] at the [[University of Texas at Arlington]] in 1973.<ref name="Ahmed"/> They presented their results in a January 1974 paper, titled ''Discrete Cosine Transform''.<ref name="pubDCT" /><ref name="pubRaoYip" /><ref name="t81">{{cite web\|date=September 1992\|title=T.81 – Digital compression and coding of continuous-tone still images – Requirements and guidelines\|url=https://www.w3.org/Graphics/JPEG/itu-t81.pdf\|access-date=12 July 2019\|publisher=[[CCITT]]}}</ref> It described what is now called the type-II DCT (DCT-II),<ref name="Britanak2010" />{{rp\|page = [https://books.google.com/books?id=iRlQHcK-r_kC&pg=PA51 51]}} as well as the type-III inverse DCT (IDCT).<ref name="pubDCT"/> Since its introduction in 1974, there has been significant research on the DCT.<ref name="t81"/> In 1977, Wen-Hsiung Chen published a paper with C. Harrison Smith and Stanley C. Fralick presenting a fast DCT algorithm.<ref name="A Fast Computational Algorithm for">{{cite journal \|last1=Chen \|first1=Wen-Hsiung \|last2=Smith \|first2=C. H. \|last3=Fralick \|first3=S. C. \|title=A Fast Computational Algorithm for the Discrete Cosine Transform \|journal=[[IEEE Transactions on Communications]] \|date=September 1977 \|volume=25 \|issue=9 \|pages=1004–1009 \|doi=10.1109/TCOM.1977.1093941}}</ref><ref name="t81"/> Further developments include a 1978 paper by M. J. Narasimha and A. M. Peterson, and a 1984 paper by B. G. Lee.<ref name="t81"/> These research papers, along with the original 1974 Ahmed paper and the 1977 Chen paper, were cited by the [[Joint Photographic Experts Group]] as the basis for [[JPEG]]'s lossy image compression algorithm in 1992.<ref name="t81"/><ref name="chen">{{cite journal \|last1=Smith \|first1=C. \|last2=Fralick \|first2=S. \|title=A Fast Computational Algorithm for the Discrete Cosine Transform \|journal=IEEE Transactions on Communications \|date=1977 \|volume=25 \|issue=9 \|pages=1004–1009 \|doi=10.1109/TCOM.1977.1093941 \|issn=0090-6778}}</ref> Line 83 ⟶ 84: ===Visual media standards=== The DCT-II is an important image compression technique. It is used in image compression standards such as [[JPEG]], and [[video compression]] standards such as {{nowrap\|[[H.26x]]}}, [[MJPEG]], [[MPEG]], [[DV (video format)\|DV]], [[Theora]] and [[Daala]]. There, the two-dimensional DCT-II of <math>N \times N</math> blocks are computed and the results are [[Quantization (signal processing)\|quantized]] and [[Entropy encoding\|entropy coded]]. In this case, <math>N</math> is typically 8 and the DCT-II formula is applied to each row and column of the block. The result is an 8 × 8 transform coefficient array in which the <math>(0,0)</math> element (top-left) is the DC (zero-frequency) component and entries with increasing vertical and horizontal index values represent higher vertical and horizontal spatial frequencies. The integer DCT, an integer approximation of the DCT,<ref name="Britanak2010"/><ref name="Stankovic"/> is used in [[Advanced Video Coding]] (AVC),<ref name="Wang">{{cite journal \|last1=Wang \|first1=Hanli \|last2=Kwong \|first2=S. \|last3=Kok \|first3=C. \|title=Efficient prediction algorithm of integer DCT coefficients for {{nowrap\|H.264}}/AVC optimization \|journal=IEEE Transactions on Circuits and Systems for Video Technology \|date=2006 \|volume=16 \|issue=4 \|pages=547–552 \|doi=10.1109/TCSVT.2006.871390\|s2cid=2060937 }}</ref><ref name="Stankovic"/> introduced in 2003, and [[High Efficiency Video Coding]] (HEVC),<ref name="apple"/><ref name="Stankovic"/> introduced in 2013. The integer DCT is also used in the [[High Efficiency Image Format]] (HEIF), which uses a subset of the [[HEVC]] video coding format for coding still images.<ref name="apple"/> AVC uses 4 x 4 and 8 x 8 blocks. HEVC and HEIF use varied block sizes between 4 x 4 and 32 x 32 [[pixels]].<ref name="apple"/><ref name="Stankovic"/> {{As of\|2019}}, AVC is by far the most commonly used format for the recording, compression and distribution of video content, used by 91% of video developers, followed by HEVC which is used by 43% of developers.<ref name="Bitmovin">{{cite web \|url=https://cdn2.hubspot.net/hubfs/3411032/Bitmovin%20Magazine/Video%20Developer%20Report%202019/bitmovin-video-developer-report-2019.pdf \|title=Video Developer Report 2019 \|website=[[Bitmovin]] \|year=2019 \|access-date=5 November 2019}}</ref> ====Image formats==== Line 108 ⟶ 111: ! [[Video coding standard]] !! Year !! Common applications \|- \| {{nowrap\|[[H.261]]}}<ref name="video-standards">{{cite web\|first=Yao\|last=Wang\|archive-url=https://web.archive.org/web/20130123211453/http://eeweb.poly.edu/~yao/EL6123/coding_standards_pt1.pdf\|archive-date=2013-01-23\|url=http://eeweb.poly.edu/~yao/EL6123/coding_standards_pt1.pdf\|title=Video Coding Standards: Part I\|year=2006}}</ref><ref>{{cite web\|first=Yao\|last=Wang\|archive-url=https://web.archive.org/web/20130123211453/http://eeweb.poly.edu/~yao/EL6123/coding_standards_pt2.pdf\|archive-date=2013-01-23\|url=http://eeweb.poly.edu/~yao/EL6123/coding_standards_pt2.pdf\|title=Video Coding Standards: Part II\|year=2006}}</ref> \|\|1988\|\| First of a family of [[video coding standards]]. Used primarily in older [[video conferencing]] and [[video telephone]] products. ~~\| {{~~ \|- \| [[Motion JPEG]] (MJPEG)<ref>{{cite book \|last1=Hoffman \|first1=Roy \|title=Data Compression in Digital Systems \|date=2012 \|publisher=[[Springer Science & Business Media]] \|isbn=9781461560319 \|page=255 \|url=https://books.google.com/books?id=FOfTBwAAQBAJ}}</ref> \|\|1992\|\| [[QuickTime]], [[video editing]], [[non-linear editing]], [[digital cameras]] \|- \| [[MPEG-1]] Video<ref name="Rao">{{cite book \| last1 = Rao \| first1 = K.R. \| author-link1 = K. R. Rao \| last2 = Hwang \| first2 = J. J. \| date = 1996-07-18 \| title = Techniques and Standards for Image, Video, and Audio Coding \| language = en \| publisher = Prentice Hall \| at = JPEG: Chapter 8; {{nowrap\|H.261}}: Chapter 9; MPEG-1: Chapter 10; MPEG-2: Chapter 11 \| isbn = 978-0133099072 \| lccn = 96015550 \| oclc = 34617596 \| ol = OL978319M \| s2cid = 56983045 \| df = dmy-all}}</ref> \|\|1993\|\| [[Digital video]] distribution on [[CD]] or [[Internet video]] \|- \| [[MPEG-2 Video]] ({{nowrap\|H.262}})<ref name="Rao"/> \|\|1995\|\| Storage and handling of digital images in broadcast applications, [[digital television]], [[HDTV]], cable, satellite, high-speed [[Internet]], [[DVD]] video distribution \|- \| [[DV (video format)\|DV]] \|\|1995\|\| [[Camcorders]], [[digital cassettes]] \|- \| [[H.263]] ([[MPEG-4 Part 2]])<ref name="video-standards"/> \|\|1996\|\| [[Video telephony]] over [[public switched telephone network]] (PSTN), {{nowrap\|[[H.320]]}}, [[Integrated Services Digital Network]] (ISDN)<ref>{{cite news \|last1=Davis \|first1=Andrew \|title=The H.320 Recommendation Overview \|url=https://www.eetimes.com/document.asp?doc_id=1275886 \|access-date=7 November 2019 \|work=[[EE Times]] \|date=13 June 1997}}</ref><ref>{{cite book \|title=IEEE WESCANEX 97: communications, power, and computing : conference proceedings \|date=May 22–23, 1997 \|publisher=[[Institute of Electrical and Electronics Engineers]] \|___location=University of Manitoba, Winnipeg, Manitoba, Canada \|isbn=9780780341470 \|page=30 \|url=https://books.google.com/books?id=8vhEAQAAIAAJ \|quote={{~~now~~nowrap\|H.263}} is similar to, but more complex than {{nowrap\|H.261}}. It is currently the most widely used international video compression standard for video telephony on ISDN (Integrated Services Digital Network) telephone lines.}}</ref> \|- \| [[Advanced Video Coding]] (AVC, {{nowrap\|H.264}}, [[MPEG-4]])<ref name="Stankovic"/><ref name="Wang"/> \|\|2003\|\| Popular [[HD video]] recording, compression and distribution format, [[Internet video]], [[YouTube]], [[Blu-ray Discs]], [[HDTV]] broadcasts, [[web browsers]], [[streaming television]], [[mobile devices]], consumer devices, [[Netflix]],<ref name="Encodes">{{cite news \|author=Netflix Technology Blog \|title=More Efficient Mobile Encodes for Netflix Downloads \|url=https://medium.com/netflix-techblog/more-efficient-mobile-encodes-for-netflix-downloads-625d7b082909 \|access-date=20 October 2019 \|work=[[Medium.com]] \|publisher=[[Netflix]] \|date=19 April 2017}}</ref> [[video telephony]], [[FaceTime]]<ref name="AppleInsider standards 1">{{cite web\|url=http://www.appleinsider.com/articles/10/06/08/inside_iphone_4_facetime_video_calling.html\|date=June 8, 2010\|access-date=June 9, 2010\|title=Inside iPhone 4: FaceTime video calling\|publisher=[[Apple community#AppleInsider\|AppleInsider]]\|author=Daniel Eran Dilger}}</ref> \|- \| [[Theora]] \|\|2004\|\| Internet video, web browsers Line 274 ⟶ 286: \qquad \text{ for } ~ k = 0,\ \ldots\ N-1 ~.</math> Some authors further multiply the <math>x_0 </math> and <math> x_{N-1} </math> terms by <math> \sqrt{2\,}\, ,</math> and correspondingly multiply the <math> X_0 </math> and <math> X_{N-1}</math> terms by <math> 1/\sqrt{2\,} \,,</math> which, if one further multiplies by an overall scale factor of <math> display="inline">\sqrt{\tfrac{2}{N-1\,}\,} ,</math>, makes the DCT-I matrix [[orthogonal matrix\|orthogonal]] but breaks the direct correspondence with a real-even [[Discrete Fourier transform\|DFT]]. The DCT-I is exactly equivalent (up to an overall scale factor of 2), to a DFT of <math> 2(N-1) </math> real numbers with even symmetry. For example, a DCT-I of <math>N = 5 </math> real numbers <math> a\ b\ c\ d\ e </math> is exactly equivalent to a DFT of eight real numbers {{not a typo\|<math> a\ b\ c\ d\ e\ d\ c\ b </math>}} (even symmetry), divided by two. (In contrast, DCT types II-IV involve a half-sample shift in the equivalent DFT.) Note, however, that the DCT-I is not defined for <math> N </math> less than 2, while all other DCT types are defined for any positive <math> N .</math>. Thus, the DCT-I corresponds to the boundary conditions: <math> x_n </math> is even around <math> n = 0 </math> and even around <math> n = N - 1 </math>; similarly for <math> X_k .</math>~~<!--[[User:Kvng/RTH]]-->~~. === DCT-II === Line 287 ⟶ 299: \qquad \text{ for } ~ k = 0,\ \dots\ N-1 ~.</math> The DCT-II is probably the most commonly used form, and is often simply referred to as "the ''DCT"''.<ref name="pubDCT"/><ref name="pubRaoYip"/> This transform is exactly equivalent (up to an overall scale factor of 2) to a DFT of <math>4N</math> real inputs of even symmetry, where the even-indexed elements are zero. That is, it is half of the DFT of the <math>4N</math> inputs <math> y_n ,</math> where <math> y_{2n} = 0 ,</math>, <math> y_{2n+1} = x_n </math> for <math> 0 \leq n < N ,</math>, <math> y_{2N} = 0 ,</math>, and <math> y_{4N-n} = y_n </math> for <math> 0 < n < 2N .</math>. DCT-II transformation is also possible using ~~2{{mvar\|N}}~~<math>2N</math> signal followed by a multiplication by half shift. This is demonstrated by [[John Makhoul\|Makhoul]].{{cn\|date=April 2025}} Some authors further multiply the <math> X_0 </math> term by <math> 1/\sqrt{N\,} \, </math> and multiply the rest of the matrix by an overall scale factor of <math display="inline">\sqrt{{2}/{N}}</math> (see below for the corresponding change in DCT-III). This makes the DCT-II matrix [[orthogonal matrix\|orthogonal]], but breaks the direct correspondence with a real-even DFT of half-shifted input. This is the normalization used by [[Matlab]]~~, for example, see~~.<ref>{{cite web \|url=https://www.mathworks.com/help/signal/ref/dct.html \|title=Discrete cosine transform - MATLAB dct \|website=www.mathworks.com \|access-date=2019-07-11}}</ref> In many applications, such as [[JPEG]], the scaling is arbitrary because scale factors can be combined with a subsequent computational step (e.g. the [[Quantization (signal processing)\|quantization]] step in JPEG<ref>{{cite book \|isbn=9780442012724 \|title=JPEG: Still Image Data Compression Standard \|last1=Pennebaker \|first1=William B. \|last2=Mitchell \|first2=Joan L. \|date=31 December 1992\|publisher=Springer }}</ref>), and a scaling can be chosen that allows the DCT to be computed with fewer multiplications.<ref>{{cite journal \|url=https://search.ieice.org/bin/summary.php?id=e71-e_11_1095 \|first1=Y. \|last1=Arai \|first2=T. \|last2=Agui \|first3=M. \|last3=Nakajima \|title=A fast DCT-SQ scheme for images \|journal=IEICE Transactions \|volume=71 \|issue=11 \|pages= 1095–1097 \|year=1988}}</ref><ref>{{cite journal \|doi=10.1016/j.sigpro.2008.01.004 \|title=Type-II/III DCT/DST algorithms with reduced number of arithmetic operations \|year=2008 \|last1=Shao \|first1=Xuancheng \|last2=Johnson \|first2=Steven G. \|journal=Signal Processing \|volume=88 \|issue=6 \|pages=1553–1564 \|arxiv=cs/0703150 \|bibcode=2008SigPr..88.1553S \|s2cid=986733}}</ref> The DCT-II implies the boundary conditions: <math> x_n </math> is even around <math> n = -1/2 </math> and even around <math> n = N - 1/2 \,;</math>; <math> X_k </math> is even around <math> k = 0 </math> and odd around <math> k = N .</math>. === DCT-III === Line 301 ⟶ 313: \qquad \text{ for } ~ k = 0,\ \ldots\ N-1 ~.</math> Because it is the inverse of DCT-II up to a scale factor (see below), this form is sometimes simply referred to as "the inverse DCT" ("IDCT").<ref name="pubRaoYip"/> Some authors divide the <math>x_0</math> term by <math>\sqrt{2}</math> instead of by 2 (resulting in an overall <math>x_0/\sqrt{2}</math> term) and multiply the resulting matrix by an overall scale factor of <math display="inline"> \sqrt{2/N}</math> (see above for the corresponding change in DCT-II), so that the DCT-II and DCT-III are transposes of one another. This makes the DCT-III matrix [[orthogonal matrix\|orthogonal]], but breaks the direct correspondence with a real-even DFT of half-shifted output. Line 316 ⟶ 328: A variant of the DCT-IV, where data from different transforms are ''overlapped'', is called the [[modified discrete cosine transform]] (MDCT).<ref>{{harvnb\|Malvar\|1992}}</ref> The DCT-IV implies the boundary conditions: <math> x_n </math> is even around <math>n = -1/2</math> and odd around <math>n = N - 1/2;</math>; similarly for <math>X_k.</math>.<!--[[User:Kvng/RTH]]--> === DCT V-VIII === Line 494 ⟶ 506: Specialized DCT algorithms, on the other hand, see widespread use for transforms of small, fixed sizes such as the {{nobr\| 8 × 8 }} DCT-II used in [[JPEG]] compression, or the small DCTs (or MDCTs) typically used in audio compression. (Reduced code size may also be a reason to use a specialized DCT for embedded-device applications.) In fact, even the DCT algorithms using an ordinary FFT are sometimes equivalent to pruning the redundant operations from a larger FFT of real-symmetric data, and they can even be optimal from the perspective of arithmetic counts. For example, a type-II DCT is equivalent to a DFT of size <math>~ 4N ~</math> with real-even symmetry whose even-indexed elements are zero. One of the most common methods for computing this via an FFT (e.g. the method used in [[FFTPACK]] and [[Fastest Fourier Transform in the West\|FFTW]]) was described by {{harvtxt\|Narasimha\|Peterson\|1978}} and {{harvtxt\|Makhoul\|1980}}, and this method in hindsight can be seen as one step of a radix-4 decimation-in-time Cooley–Tukey algorithm applied to the "logical" real-even DFT corresponding to the DCT-II.{{efn\| The radix-4 step reduces the size <math>~ 4N ~</math> DFT to four size <math>~ N ~</math> DFTs of real data, two of which are zero, and two of which are equal to one another by the even symmetry. Hence giving a single size <math>~ N ~</math> FFT of real data plus <math>~ \mathcal{O}(N) ~</math> [[butterfly (FFT algorithm)\|butterflies]], once the trivial and / or duplicate parts are eliminated and / or merged. }} Line 505 ⟶ 517: ==Example of IDCT== [[File:DCT filter comparison.png\|thumb\|right\|An example showing eight different filters applied to a test image (top left) by multiplying its DCT spectrum (top right) with each filter.]] Consider this {{resx\|8x8}} [[grayscale image]] of capital letter A. [[File:letter-a-8x8.png\|frame\|center\|Original size, scaled 10x (nearest neighbor), scaled 10x (bilinear).]] Line 621 ⟶ 633: * Takuya Ooura: General Purpose FFT Package, [http://www.kurims.kyoto-u.ac.jp/~ooura/fft.html FFT Package 1-dim / 2-dim]. Free C & FORTRAN libraries for computing fast DCTs (types II–III) in one, two or three dimensions, power of 2 sizes. * Tim Kientzle: Fast algorithms for computing the 8-point DCT and IDCT, [http://drdobbs.com/parallel/184410889 Algorithm Alley]. * [~~http~~https://ltfat.sourceforge.net/ LTFAT] is a free Matlab/Octave toolbox with interfaces to the FFTW implementation of the DCTs and DSTs of type I-IV. {{Compression Methods\|state=expanded}} Line 634 ⟶ 646: [[Category:Data compression]] [[Category:Image compression]] ~~[[Category:Indian inventions]]~~ [[Category:H.26x]] [[Category:JPEG]]