Discrete cosine transform: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 14:41, 14 April 2025 edit Jessicapierce (talk \| contribs) Extended confirmed users, Rollbackers 131,548 edits m Reverted edits by 174.45.100.155 (talk) to last version by Kvng Tag: Rollback ← Previous edit		Latest revision as of 19:07, 9 August 2025 edit undo Bender the Bot (talk \| contribs) Bots 1,064,377 edits m →External links: HTTP to HTTPS for SourceForge Tag: AWB
(10 intermediate revisions by 6 users not shown)
Line 1: {{Short description\|Technique used in signal processing and data compression}} ~~{{Jagged 85 cleanup\|date=July 2022}}~~ A '''discrete cosine transform''' ('''DCT''') expresses a finite sequence of [[data points]] in terms of a sum of [[cosine]] functions oscillating at different [[frequency\|frequencies]]. The DCT, first proposed by [[Nasir Ahmed (engineer)\|Nasir Ahmed]] in 1972, is a widely used transformation technique in [[signal processing]] and [[data compression]]. It is used in most [[digital media]], including [[digital images]] (such as [[JPEG]] and [[HEIF]]), [[digital video]] (such as [[MPEG]] and {{nowrap\|[[H.26x]]}}), [[digital audio]] (such as [[Dolby Digital]], [[MP3]] and [[Advanced Audio Coding\|AAC]]), [[digital television]] (such as [[SDTV]], [[HDTV]] and [[Video on demand\|VOD]]), [[digital radio]] (such as [[AAC+]] and [[DAB+]]), and [[speech coding]] (such as [[AAC-LD]], [[Siren (codec)\|Siren]] and [[Opus (audio format)\|Opus]]). DCTs are also important to numerous other applications in [[science and engineering]], such as [[digital signal processing]], [[telecommunication]] devices, reducing [[network bandwidth]] usage, and [[spectral method]]s for the numerical solution of [[partial differential equations]]. Line 12 ⟶ 11: == History == The DCT was first conceived by [[Nasir Ahmed (engineer)\|Nasir Ahmed~~]], T. Natarajan and [[K. R. Rao~~]] while working at [[Kansas State University]]. The concept was proposed to the [[National Science Foundation]] in 1972. The DCT was originally intended for [[image compression]].<ref name="Ahmed" /><ref name="Stankovic"/> Ahmed developed a practical DCT algorithm with his PhD students T. Raj Natarajan~~, Wills Dietrich,~~ and ~~Jeremy Fries, and his friend Dr.~~ [[K. R. Rao]] at the [[University of Texas at Arlington]] in 1973.<ref name="Ahmed"/> They presented their results in a January 1974 paper, titled ''Discrete Cosine Transform''.<ref name="pubDCT" /><ref name="pubRaoYip" /><ref name="t81">{{cite web\|date=September 1992\|title=T.81 – Digital compression and coding of continuous-tone still images – Requirements and guidelines\|url=https://www.w3.org/Graphics/JPEG/itu-t81.pdf\|access-date=12 July 2019\|publisher=[[CCITT]]}}</ref> It described what is now called the type-II DCT (DCT-II),<ref name="Britanak2010" />{{rp\|page = [https://books.google.com/books?id=iRlQHcK-r_kC&pg=PA51 51]}} as well as the type-III inverse DCT (IDCT).<ref name="pubDCT"/> Since its introduction in 1974, there has been significant research on the DCT.<ref name="t81"/> In 1977, Wen-Hsiung Chen published a paper with C. Harrison Smith and Stanley C. Fralick presenting a fast DCT algorithm.<ref name="A Fast Computational Algorithm for">{{cite journal \|last1=Chen \|first1=Wen-Hsiung \|last2=Smith \|first2=C. H. \|last3=Fralick \|first3=S. C. \|title=A Fast Computational Algorithm for the Discrete Cosine Transform \|journal=[[IEEE Transactions on Communications]] \|date=September 1977 \|volume=25 \|issue=9 \|pages=1004–1009 \|doi=10.1109/TCOM.1977.1093941}}</ref><ref name="t81"/> Further developments include a 1978 paper by M. J. Narasimha and A. M. Peterson, and a 1984 paper by B. G. Lee.<ref name="t81"/> These research papers, along with the original 1974 Ahmed paper and the 1977 Chen paper, were cited by the [[Joint Photographic Experts Group]] as the basis for [[JPEG]]'s lossy image compression algorithm in 1992.<ref name="t81"/><ref name="chen">{{cite journal \|last1=Smith \|first1=C. \|last2=Fralick \|first2=S. \|title=A Fast Computational Algorithm for the Discrete Cosine Transform \|journal=IEEE Transactions on Communications \|date=1977 \|volume=25 \|issue=9 \|pages=1004–1009 \|doi=10.1109/TCOM.1977.1093941 \|issn=0090-6778}}</ref> Line 287 ⟶ 286: \qquad \text{ for } ~ k = 0,\ \ldots\ N-1 ~.</math> Some authors further multiply the <math>x_0 </math> and <math> x_{N-1} </math> terms by <math> \sqrt{2\,}\, ,</math> and correspondingly multiply the <math> X_0 </math> and <math> X_{N-1}</math> terms by <math> 1/\sqrt{2\,} \,,</math> which, if one further multiplies by an overall scale factor of <math> display="inline">\sqrt{\tfrac{2}{N-1\,}\,} ,</math>, makes the DCT-I matrix [[orthogonal matrix\|orthogonal]] but breaks the direct correspondence with a real-even [[Discrete Fourier transform\|DFT]]. The DCT-I is exactly equivalent (up to an overall scale factor of 2), to a DFT of <math> 2(N-1) </math> real numbers with even symmetry. For example, a DCT-I of <math>N = 5 </math> real numbers <math> a\ b\ c\ d\ e </math> is exactly equivalent to a DFT of eight real numbers {{not a typo\|<math> a\ b\ c\ d\ e\ d\ c\ b </math>}} (even symmetry), divided by two. (In contrast, DCT types II-IV involve a half-sample shift in the equivalent DFT.) Note, however, that the DCT-I is not defined for <math> N </math> less than 2, while all other DCT types are defined for any positive <math> N .</math>. Thus, the DCT-I corresponds to the boundary conditions: <math> x_n </math> is even around <math> n = 0 </math> and even around <math> n = N - 1 </math>; similarly for <math> X_k .</math>~~<!--[[User:Kvng/RTH]]-->~~. === DCT-II === Line 300 ⟶ 299: \qquad \text{ for } ~ k = 0,\ \dots\ N-1 ~.</math> The DCT-II is probably the most commonly used form, and is often simply referred to as "the ''DCT"''.<ref name="pubDCT"/><ref name="pubRaoYip"/> This transform is exactly equivalent (up to an overall scale factor of 2) to a DFT of <math>4N</math> real inputs of even symmetry, where the even-indexed elements are zero. That is, it is half of the DFT of the <math>4N</math> inputs <math> y_n ,</math> where <math> y_{2n} = 0 ,</math>, <math> y_{2n+1} = x_n </math> for <math> 0 \leq n < N ,</math>, <math> y_{2N} = 0 ,</math>, and <math> y_{4N-n} = y_n </math> for <math> 0 < n < 2N .</math>. DCT-II transformation is also possible using ~~2{{mvar\|N}}~~<math>2N</math> signal followed by a multiplication by half shift. This is demonstrated by [[John Makhoul\|Makhoul]].{{cn\|date=April 2025}} Some authors further multiply the <math> X_0 </math> term by <math> 1/\sqrt{N\,} \, </math> and multiply the rest of the matrix by an overall scale factor of <math display="inline">\sqrt{{2}/{N}}</math> (see below for the corresponding change in DCT-III). This makes the DCT-II matrix [[orthogonal matrix\|orthogonal]], but breaks the direct correspondence with a real-even DFT of half-shifted input. This is the normalization used by [[Matlab]]~~, for example, see~~.<ref>{{cite web \|url=https://www.mathworks.com/help/signal/ref/dct.html \|title=Discrete cosine transform - MATLAB dct \|website=www.mathworks.com \|access-date=2019-07-11}}</ref> In many applications, such as [[JPEG]], the scaling is arbitrary because scale factors can be combined with a subsequent computational step (e.g. the [[Quantization (signal processing)\|quantization]] step in JPEG<ref>{{cite book \|isbn=9780442012724 \|title=JPEG: Still Image Data Compression Standard \|last1=Pennebaker \|first1=William B. \|last2=Mitchell \|first2=Joan L. \|date=31 December 1992\|publisher=Springer }}</ref>), and a scaling can be chosen that allows the DCT to be computed with fewer multiplications.<ref>{{cite journal \|url=https://search.ieice.org/bin/summary.php?id=e71-e_11_1095 \|first1=Y. \|last1=Arai \|first2=T. \|last2=Agui \|first3=M. \|last3=Nakajima \|title=A fast DCT-SQ scheme for images \|journal=IEICE Transactions \|volume=71 \|issue=11 \|pages= 1095–1097 \|year=1988}}</ref><ref>{{cite journal \|doi=10.1016/j.sigpro.2008.01.004 \|title=Type-II/III DCT/DST algorithms with reduced number of arithmetic operations \|year=2008 \|last1=Shao \|first1=Xuancheng \|last2=Johnson \|first2=Steven G. \|journal=Signal Processing \|volume=88 \|issue=6 \|pages=1553–1564 \|arxiv=cs/0703150 \|bibcode=2008SigPr..88.1553S \|s2cid=986733}}</ref> The DCT-II implies the boundary conditions: <math> x_n </math> is even around <math> n = -1/2 </math> and even around <math> n = N - 1/2 \,;</math>; <math> X_k </math> is even around <math> k = 0 </math> and odd around <math> k = N .</math>. === DCT-III === Line 314 ⟶ 313: \qquad \text{ for } ~ k = 0,\ \ldots\ N-1 ~.</math> Because it is the inverse of DCT-II up to a scale factor (see below), this form is sometimes simply referred to as "the inverse DCT" ("IDCT").<ref name="pubRaoYip"/> Some authors divide the <math>x_0</math> term by <math>\sqrt{2}</math> instead of by 2 (resulting in an overall <math>x_0/\sqrt{2}</math> term) and multiply the resulting matrix by an overall scale factor of <math display="inline"> \sqrt{2/N}</math> (see above for the corresponding change in DCT-II), so that the DCT-II and DCT-III are transposes of one another. This makes the DCT-III matrix [[orthogonal matrix\|orthogonal]], but breaks the direct correspondence with a real-even DFT of half-shifted output. Line 329 ⟶ 328: A variant of the DCT-IV, where data from different transforms are ''overlapped'', is called the [[modified discrete cosine transform]] (MDCT).<ref>{{harvnb\|Malvar\|1992}}</ref> The DCT-IV implies the boundary conditions: <math> x_n </math> is even around <math>n = -1/2</math> and odd around <math>n = N - 1/2;</math>; similarly for <math>X_k.</math>.<!--[[User:Kvng/RTH]]--> === DCT V-VIII === Line 507 ⟶ 506: Specialized DCT algorithms, on the other hand, see widespread use for transforms of small, fixed sizes such as the {{nobr\| 8 × 8 }} DCT-II used in [[JPEG]] compression, or the small DCTs (or MDCTs) typically used in audio compression. (Reduced code size may also be a reason to use a specialized DCT for embedded-device applications.) In fact, even the DCT algorithms using an ordinary FFT are sometimes equivalent to pruning the redundant operations from a larger FFT of real-symmetric data, and they can even be optimal from the perspective of arithmetic counts. For example, a type-II DCT is equivalent to a DFT of size <math>~ 4N ~</math> with real-even symmetry whose even-indexed elements are zero. One of the most common methods for computing this via an FFT (e.g. the method used in [[FFTPACK]] and [[Fastest Fourier Transform in the West\|FFTW]]) was described by {{harvtxt\|Narasimha\|Peterson\|1978}} and {{harvtxt\|Makhoul\|1980}}, and this method in hindsight can be seen as one step of a radix-4 decimation-in-time Cooley–Tukey algorithm applied to the "logical" real-even DFT corresponding to the DCT-II.{{efn\| The radix-4 step reduces the size <math>~ 4N ~</math> DFT to four size <math>~ N ~</math> DFTs of real data, two of which are zero, and two of which are equal to one another by the even symmetry. Hence giving a single size <math>~ N ~</math> FFT of real data plus <math>~ \mathcal{O}(N) ~</math> [[butterfly (FFT algorithm)\|butterflies]], once the trivial and / or duplicate parts are eliminated and / or merged. }} Line 518 ⟶ 517: ==Example of IDCT== [[File:DCT filter comparison.png\|thumb\|right\|An example showing eight different filters applied to a test image (top left) by multiplying its DCT spectrum (top right) with each filter.]] Consider this {{resx\|8x8}} [[grayscale image]] of capital letter A. [[File:letter-a-8x8.png\|frame\|center\|Original size, scaled 10x (nearest neighbor), scaled 10x (bilinear).]] Line 634 ⟶ 633: * Takuya Ooura: General Purpose FFT Package, [http://www.kurims.kyoto-u.ac.jp/~ooura/fft.html FFT Package 1-dim / 2-dim]. Free C & FORTRAN libraries for computing fast DCTs (types II–III) in one, two or three dimensions, power of 2 sizes. * Tim Kientzle: Fast algorithms for computing the 8-point DCT and IDCT, [http://drdobbs.com/parallel/184410889 Algorithm Alley]. * [~~http~~https://ltfat.sourceforge.net/ LTFAT] is a free Matlab/Octave toolbox with interfaces to the FFTW implementation of the DCTs and DSTs of type I-IV. {{Compression Methods\|state=expanded}} Line 647 ⟶ 646: [[Category:Data compression]] [[Category:Image compression]] ~~[[Category:Indian inventions]]~~ [[Category:H.26x]] [[Category:JPEG]]