Discrete cosine transform: Difference between revisions

Content deleted Content added
No edit summary
Tags: Reverted references removed Mobile edit Mobile web edit
Bender the Bot (talk | contribs)
m External links: HTTP to HTTPS for SourceForge
 
(11 intermediate revisions by 7 users not shown)
Line 1:
{{Short description|Technique used in signal processing and data compression}}
{{Jagged 85 cleanup|date=July 2022}}
 
A '''discrete cosine transform''' ('''DCT''') expresses a finite sequence of [[data points]] in terms of a sum of [[cosine]] functions oscillating at different [[frequency|frequencies]]. The DCT, first proposed by [[Nasir Ahmed (engineer)|Nasir Ahmed]] in 1972, is a widely used transformation technique in [[signal processing]] and [[data compression]]. It is used in most [[digital media]], including [[digital images]] (such as [[JPEG]] and [[HEIF]]), [[digital video]] (such as [[MPEG]] and {{nowrap|[[H.26x]]}}), [[digital audio]] (such as [[Dolby Digital]], [[MP3]] and [[Advanced Audio Coding|AAC]]), [[digital television]] (such as [[SDTV]], [[HDTV]] and [[Video on demand|VOD]]), [[digital radio]] (such as [[AAC+]] and [[DAB+]]), and [[speech coding]] (such as [[AAC-LD]], [[Siren (codec)|Siren]] and [[Opus (audio format)|Opus]]). DCTs are also important to numerous other applications in [[science and engineering]], such as [[digital signal processing]], [[telecommunication]] devices, reducing [[network bandwidth]] usage, and [[spectral method]]s for the numerical solution of [[partial differential equations]].
Line 7 ⟶ 6:
 
There are eight standard DCT variants, of which four are common.
The most common variant of discrete cosine transform is the type-II DCT, which is often called simply ''the DCT''. This was the original DCT as first proposed by Ahmed. Its inverse, the type-III DCT, is correspondingly often called simply ''the inverse DCT'' or ''the IDCT''. Two related transforms are the [[discrete sine transform]] (DST), which is equivalent to a DFT of real and [[odd function]]s, and the [[modified discrete cosine transform]] (MDCT), which is based on a DCT of overlapping data. Multidimensional DCTs (MD DCTs) are developed to extend the concept of DCT to multidimensional signals. A variety of fast algorithms have been developed to reduce the computational complexity of implementing DCT. One of these is the integer DCT (IntDCT),<ref name="Stankovic"/> an [[integer]] approximation of the standard DCT,<ref name="Britanak2010" />{{rp|pages= [https://books.google.com/books?id=iRlQHcK-r_kC&pg=PA141 ix, xiii, 1, 141–304]}} used in several [[ISO/IEC]] and [[ITU-T]] international standards.<ref name="Stankovic"/><ref name="Britanak2010"/>
 
DCT compression, also known as block compression, compresses data in sets of discrete DCT blocks.<ref name="Alikhani"/> DCT blocks sizes including 8x8 [[pixels]] for the standard DCT, and varied integer DCT sizes between 4x4 and 32x32 pixels.<ref name="Stankovic"/><ref name="apple"/> The DCT has a strong ''energy compaction'' property,<ref name="pubDCT"/><ref name="pubRaoYip"/> capable of achieving high quality at high [[data compression ratio]]s.<ref name="Barbero"/><ref name="Lea">{{cite journal|last1=Lea|first1=William|date=1994|title=Video on demand: Research Paper 94/68|url=https://researchbriefings.parliament.uk/ResearchBriefing/Summary/RP94-68|journal=[[House of Commons Library]]|access-date=20 September 2019}}</ref> However, blocky [[compression artifacts]] can appear when heavy DCT compression is applied.
 
== History ==
The DCT was first conceived by [[Nasir Ahmed (engineer)|Nasir Ahmed]], T. Natarajan and [[K. R. Rao]] while working at [[Kansas State University]]. The concept was proposed to the [[National Science Foundation]] in 1972. The DCT was originally intended for [[image compression]].<ref name="Ahmed" /><ref name="Stankovic"/> Ahmed developed a practical DCT algorithm with his PhD students T. Raj Natarajan, Wills Dietrich, and Jeremy Fries, and his friend Dr. [[K. R. Rao]] at the [[University of Texas at Arlington]] in 1973.<ref name="Ahmed"/> They presented their results in a January 1974 paper, titled ''Discrete Cosine Transform''.<ref name="pubDCT" /><ref name="pubRaoYip" /><ref name="t81">{{cite web|date=September 1992|title=T.81 – Digital compression and coding of continuous-tone still images – Requirements and guidelines|url=https://www.w3.org/Graphics/JPEG/itu-t81.pdf|access-date=12 July 2019|publisher=[[CCITT]]}}</ref> It described what is now called the type-II DCT (DCT-II),<ref name="Britanak2010" />{{rp|page = [https://books.google.com/books?id=iRlQHcK-r_kC&pg=PA51 51]}} as well as the type-III inverse DCT (IDCT).<ref name="pubDCT"/>
 
Since its introduction in 1974, there has been significant research on the DCT.<ref name="t81"/> In 1977, Wen-Hsiung Chen published a paper with C. Harrison Smith and Stanley C. Fralick presenting a fast DCT algorithm.<ref name="A Fast Computational Algorithm for">{{cite journal |last1=Chen |first1=Wen-Hsiung |last2=Smith |first2=C. H. |last3=Fralick |first3=S. C. |title=A Fast Computational Algorithm for the Discrete Cosine Transform |journal=[[IEEE Transactions on Communications]] |date=September 1977 |volume=25 |issue=9 |pages=1004–1009 |doi=10.1109/TCOM.1977.1093941}}</ref><ref name="t81"/> Further developments include a 1978 paper by M. J. Narasimha and A. M. Peterson, and a 1984 paper by B. G. Lee.<ref name="t81"/> These research papers, along with the original 1974 Ahmed paper and the 1977 Chen paper, were cited by the [[Joint Photographic Experts Group]] as the basis for [[JPEG]]'s lossy image compression algorithm in 1992.<ref name="t81"/><ref name="chen">{{cite journal |last1=Smith |first1=C. |last2=Fralick |first2=S. |title=A Fast Computational Algorithm for the Discrete Cosine Transform |journal=IEEE Transactions on Communications |date=1977 |volume=25 |issue=9 |pages=1004–1009 |doi=10.1109/TCOM.1977.1093941 |issn=0090-6778}}</ref>
Line 83 ⟶ 84:
 
===Visual media standards===
The DCT-II is an important image compression technique. It is used in image compression standards such as [[JPEG]], and [[video compression]] standards such as {{nowrap|[[H.26x]]}}, [[MJPEG]], [[MPEG]], [[DV (video format)|DV]], [[Theora]] and [[Daala]]. There, the two-dimensional DCT-II of <math>N \times N</math> blocks are computed and the results are [[Quantization (signal processing)|quantized]] and [[Entropy encoding|entropy coded]]. In this case, <math>N</math> is typically 8 and the DCT-II formula is applied to each row and column of the block. The result is an 8&nbsp;× 8 transform coefficient array in which the <math>(0,0)</math> element (top-left) is the DC (zero-frequency) component and entries with increasing vertical and horizontal index values represent higher vertical and horizontal spatial frequencies.

The integer DCT, an integer approximation of the DCT,<ref name="Britanak2010"/><ref name="Stankovic"/> is used in [[Advanced Video Coding]] (AVC),<ref name="Wang">{{cite journal |last1=Wang |first1=Hanli |last2=Kwong |first2=S. |last3=Kok |first3=C. |title=Efficient prediction algorithm of integer DCT coefficients for {{nowrap|H.264}}/AVC optimization |journal=IEEE Transactions on Circuits and Systems for Video Technology |date=2006 |volume=16 |issue=4 |pages=547–552 |doi=10.1109/TCSVT.2006.871390|s2cid=2060937 }}</ref><ref name="Stankovic"/> introduced in 2003, and [[High Efficiency Video Coding]] (HEVC),<ref name="apple"/><ref name="Stankovic"/> introduced in 2013. The integer DCT is also used in the [[High Efficiency Image Format]] (HEIF), which uses a subset of the [[HEVC]] video coding format for coding still images.<ref name="apple"/> AVC uses 4&nbsp;x 4 and 8&nbsp;x 8 blocks. HEVC and HEIF use varied block sizes between 4&nbsp;x 4 and 32&nbsp;x 32 [[pixels]].<ref name="apple"/><ref name="Stankovic"/> {{As of|2019}}, AVC is by far the most commonly used format for the recording, compression and distribution of video content, used by 91% of video developers, followed by HEVC which is used by 43% of developers.<ref name="Bitmovin">{{cite web |url=https://cdn2.hubspot.net/hubfs/3411032/Bitmovin%20Magazine/Video%20Developer%20Report%202019/bitmovin-video-developer-report-2019.pdf |title=Video Developer Report 2019 |website=[[Bitmovin]] |year=2019 |access-date=5 November 2019}}</ref>
 
====Image formats====
Line 108 ⟶ 111:
! [[Video coding standard]] !! Year !! Common applications
|-
| {{nowrap|[[H.261]]}}<ref name="video-standards">{{cite web|first=Yao|last=Wang|archive-url=https://web.archive.org/web/20130123211453/http://eeweb.poly.edu/~yao/EL6123/coding_standards_pt1.pdf|archive-date=2013-01-23|url=http://eeweb.poly.edu/~yao/EL6123/coding_standards_pt1.pdf|title=Video Coding Standards: Part I|year=2006}}</ref><ref>{{cite web|first=Yao|last=Wang|archive-url=https://web.archive.org/web/20130123211453/http://eeweb.poly.edu/~yao/EL6123/coding_standards_pt2.pdf|archive-date=2013-01-23|url=http://eeweb.poly.edu/~yao/EL6123/coding_standards_pt2.pdf|title=Video Coding Standards: Part II|year=2006}}</ref> ||1988|| First of a family of [[video coding standards]]. Used primarily in older [[video conferencing]] and [[video telephone]] products.
| {{
|-
| [[Motion JPEG]] (MJPEG)<ref>{{cite book |last1=Hoffman |first1=Roy |title=Data Compression in Digital Systems |date=2012 |publisher=[[Springer Science & Business Media]] |isbn=9781461560319 |page=255 |url=https://books.google.com/books?id=FOfTBwAAQBAJ}}</ref> ||1992|| [[QuickTime]], [[video editing]], [[non-linear editing]], [[digital cameras]]
|-
| [[MPEG-1]] Video<ref name="Rao">{{cite book | last1 = Rao | first1 = K.R. | author-link1 = K. R. Rao | last2 = Hwang | first2 = J. J. | date = 1996-07-18 | title = Techniques and Standards for Image, Video, and Audio Coding | language = en | publisher = Prentice Hall | at = JPEG: Chapter 8; {{nowrap|H.261}}: Chapter 9; MPEG-1: Chapter 10; MPEG-2: Chapter 11 | isbn = 978-0133099072 | lccn = 96015550 | oclc = 34617596 | ol = OL978319M | s2cid = 56983045 | df = dmy-all}}</ref> ||1993|| [[Digital video]] distribution on [[CD]] or [[Internet video]]
|-
| [[MPEG-2 Video]] ({{nowrap|H.262}})<ref name="Rao"/> ||1995|| Storage and handling of digital images in broadcast applications, [[digital television]], [[HDTV]], cable, satellite, high-speed [[Internet]], [[DVD]] video distribution
|-
| [[DV (video format)|DV]] ||1995|| [[Camcorders]], [[digital cassettes]]
|-
| [[H.263]] ([[MPEG-4 Part 2]])<ref name="video-standards"/> ||1996|| [[Video telephony]] over [[public switched telephone network]] (PSTN), {{nowrap|[[H.320]]}}, [[Integrated Services Digital Network]] (ISDN)<ref>{{cite news |last1=Davis |first1=Andrew |title=The H.320 Recommendation Overview |url=https://www.eetimes.com/document.asp?doc_id=1275886 |access-date=7 November 2019 |work=[[EE Times]] |date=13 June 1997}}</ref><ref>{{cite book |title=IEEE WESCANEX 97: communications, power, and computing : conference proceedings |date=May 22–23, 1997 |publisher=[[Institute of Electrical and Electronics Engineers]] |___location=University of Manitoba, Winnipeg, Manitoba, Canada |isbn=9780780341470 |page=30 |url=https://books.google.com/books?id=8vhEAQAAIAAJ |quote={{nownowrap|H.263}} is similar to, but more complex than {{nowrap|H.261}}. It is currently the most widely used international video compression standard for video telephony on ISDN (Integrated Services Digital Network) telephone lines.}}</ref>
|-
| [[Advanced Video Coding]] (AVC, {{nowrap|H.264}}, [[MPEG-4]])<ref name="Stankovic"/><ref name="Wang"/> ||2003|| Popular [[HD video]] recording, compression and distribution format, [[Internet video]], [[YouTube]], [[Blu-ray Discs]], [[HDTV]] broadcasts, [[web browsers]], [[streaming television]], [[mobile devices]], consumer devices, [[Netflix]],<ref name="Encodes">{{cite news |author=Netflix Technology Blog |title=More Efficient Mobile Encodes for Netflix Downloads |url=https://medium.com/netflix-techblog/more-efficient-mobile-encodes-for-netflix-downloads-625d7b082909 |access-date=20 October 2019 |work=[[Medium.com]] |publisher=[[Netflix]] |date=19 April 2017}}</ref> [[video telephony]], [[FaceTime]]<ref name="AppleInsider standards 1">{{cite web|url=http://www.appleinsider.com/articles/10/06/08/inside_iphone_4_facetime_video_calling.html|date=June 8, 2010|access-date=June 9, 2010|title=Inside iPhone 4: FaceTime video calling|publisher=[[Apple community#AppleInsider|AppleInsider]]|author=Daniel Eran Dilger}}</ref>
|-
| [[Theora]] ||2004|| Internet video, web browsers
Line 274 ⟶ 286:
\qquad \text{ for } ~ k = 0,\ \ldots\ N-1 ~.</math>
 
Some authors further multiply the <math>x_0 </math> and <math> x_{N-1} </math> terms by <math> \sqrt{2\,}\, ,</math> and correspondingly multiply the <math> X_0 </math> and <math> X_{N-1}</math> terms by <math> 1/\sqrt{2\,} \,,</math> which, if one further multiplies by an overall scale factor of <math> display="inline">\sqrt{\tfrac{2}{N-1\,}\,} ,</math>, makes the DCT-I matrix [[orthogonal matrix|orthogonal]] but breaks the direct correspondence with a real-even [[Discrete Fourier transform|DFT]].
 
The DCT-I is exactly equivalent (up to an overall scale factor of 2), to a DFT of <math> 2(N-1) </math> real numbers with even symmetry. For example, a DCT-I of <math>N = 5 </math> real numbers <math> a\ b\ c\ d\ e </math> is exactly equivalent to a DFT of eight real numbers {{not a typo|<math> a\ b\ c\ d\ e\ d\ c\ b </math>}} (even symmetry), divided by two. (In contrast, DCT types II-IV involve a half-sample shift in the equivalent DFT.)
 
Note, however, that the DCT-I is not defined for <math> N </math> less than 2, while all other DCT types are defined for any positive <math> N .</math>.
 
Thus, the DCT-I corresponds to the boundary conditions: <math> x_n </math> is even around <math> n = 0 </math> and even around <math> n = N - 1 </math>; similarly for <math> X_k .</math><!--[[User:Kvng/RTH]]-->.
 
=== DCT-II ===
Line 287 ⟶ 299:
\qquad \text{ for } ~ k = 0,\ \dots\ N-1 ~.</math>
 
The DCT-II is probably the most commonly used form, and is often simply referred to as "the ''DCT"''.<ref name="pubDCT"/><ref name="pubRaoYip"/>
 
This transform is exactly equivalent (up to an overall scale factor of 2) to a DFT of <math>4N</math> real inputs of even symmetry, where the even-indexed elements are zero. That is, it is half of the DFT of the <math>4N</math> inputs <math> y_n ,</math> where <math> y_{2n} = 0 ,</math>, <math> y_{2n+1} = x_n </math> for <math> 0 \leq n < N ,</math>, <math> y_{2N} = 0 ,</math>, and <math> y_{4N-n} = y_n </math> for <math> 0 < n < 2N .</math>. DCT-II transformation is also possible using 2{{mvar|N}}<math>2N</math> signal followed by a multiplication by half shift. This is demonstrated by [[John Makhoul|Makhoul]].{{cn|date=April 2025}}
 
Some authors further multiply the <math> X_0 </math> term by <math> 1/\sqrt{N\,} \, </math> and multiply the rest of the matrix by an overall scale factor of <math display="inline">\sqrt{{2}/{N}}</math> (see below for the corresponding change in DCT-III). This makes the DCT-II matrix [[orthogonal matrix|orthogonal]], but breaks the direct correspondence with a real-even DFT of half-shifted input. This is the normalization used by [[Matlab]], for example, see.<ref>{{cite web |url=https://www.mathworks.com/help/signal/ref/dct.html |title=Discrete cosine transform - MATLAB dct |website=www.mathworks.com |access-date=2019-07-11}}</ref> In many applications, such as [[JPEG]], the scaling is arbitrary because scale factors can be combined with a subsequent computational step (e.g. the [[Quantization (signal processing)|quantization]] step in JPEG<ref>{{cite book |isbn=9780442012724 |title=JPEG: Still Image Data Compression Standard |last1=Pennebaker |first1=William B. |last2=Mitchell |first2=Joan L. |date=31 December 1992|publisher=Springer }}</ref>), and a scaling can be chosen that allows the DCT to be computed with fewer multiplications.<ref>{{cite journal |url=https://search.ieice.org/bin/summary.php?id=e71-e_11_1095 |first1=Y. |last1=Arai |first2=T. |last2=Agui |first3=M. |last3=Nakajima |title=A fast DCT-SQ scheme for images |journal=IEICE Transactions |volume=71 |issue=11 |pages= 1095–1097 |year=1988}}</ref><ref>{{cite journal |doi=10.1016/j.sigpro.2008.01.004 |title=Type-II/III DCT/DST algorithms with reduced number of arithmetic operations |year=2008 |last1=Shao |first1=Xuancheng |last2=Johnson |first2=Steven G. |journal=Signal Processing |volume=88 |issue=6 |pages=1553–1564 |arxiv=cs/0703150 |bibcode=2008SigPr..88.1553S |s2cid=986733}}</ref>
 
The DCT-II implies the boundary conditions: <math> x_n </math> is even around <math> n = -1/2 </math> and even around <math> n = N - 1/2 \,;</math>; <math> X_k </math> is even around <math> k = 0 </math> and odd around <math> k = N .</math>.
 
=== DCT-III ===
Line 301 ⟶ 313:
\qquad \text{ for } ~ k = 0,\ \ldots\ N-1 ~.</math>
 
Because it is the inverse of DCT-II up to a scale factor (see below), this form is sometimes simply referred to as "the inverse DCT" ("IDCT").<ref name="pubRaoYip"/>
 
Some authors divide the <math>x_0</math> term by <math>\sqrt{2}</math> instead of by 2 (resulting in an overall <math>x_0/\sqrt{2}</math> term) and multiply the resulting matrix by an overall scale factor of <math display="inline"> \sqrt{2/N}</math> (see above for the corresponding change in DCT-II), so that the DCT-II and DCT-III are transposes of one another. This makes the DCT-III matrix [[orthogonal matrix|orthogonal]], but breaks the direct correspondence with a real-even DFT of half-shifted output.
Line 316 ⟶ 328:
A variant of the DCT-IV, where data from different transforms are ''overlapped'', is called the [[modified discrete cosine transform]] (MDCT).<ref>{{harvnb|Malvar|1992}}</ref>
 
The DCT-IV implies the boundary conditions: <math> x_n </math> is even around <math>n = -1/2</math> and odd around <math>n = N - 1/2;</math>; similarly for <math>X_k.</math>.<!--[[User:Kvng/RTH]]-->
 
=== DCT V-VIII ===
Line 494 ⟶ 506:
Specialized DCT algorithms, on the other hand, see widespread use for transforms of small, fixed sizes such as the {{nobr| 8 × 8 }} DCT-II used in [[JPEG]] compression, or the small DCTs (or MDCTs) typically used in audio compression. (Reduced code size may also be a reason to use a specialized DCT for embedded-device applications.)
 
In fact, even the DCT algorithms using an ordinary FFT are sometimes equivalent to pruning the redundant operations from a larger FFT of real-symmetric data, and they can even be optimal from the perspective of arithmetic counts. For example, a type-II DCT is equivalent to a DFT of size <math>~ 4N ~</math> with real-even symmetry whose even-indexed elements are zero. One of the most common methods for computing this via an FFT (e.g. the method used in [[FFTPACK]] and [[Fastest Fourier Transform in the West|FFTW]]) was described by {{harvtxt|Narasimha|Peterson|1978}} and {{harvtxt|Makhoul|1980}}, and this method in hindsight can be seen as one step of a radix-4 decimation-in-time Cooley–Tukey algorithm applied to the "logical" real-even DFT corresponding to the DCT-II.{{efn|
The radix-4 step reduces the size <math>~ 4N ~</math> DFT to four size <math>~ N ~</math> DFTs of real data, two of which are zero, and two of which are equal to one another by the even symmetry. Hence giving a single size <math>~ N ~</math> FFT of real data plus <math>~ \mathcal{O}(N) ~</math> [[butterfly (FFT algorithm)|butterflies]], once the trivial and / or duplicate parts are eliminated and / or merged.
}}
Line 505 ⟶ 517:
==Example of IDCT==
[[File:DCT filter comparison.png|thumb|right|An example showing eight different filters applied to a test image (top left) by multiplying its DCT spectrum (top right) with each filter.]]
Consider this {{resx|8x8}} [[grayscale image]] of capital letter A.
[[File:letter-a-8x8.png|frame|center|Original size, scaled 10x (nearest neighbor), scaled 10x (bilinear).]]
 
Line 621 ⟶ 633:
* Takuya Ooura: General Purpose FFT Package, [http://www.kurims.kyoto-u.ac.jp/~ooura/fft.html FFT Package 1-dim / 2-dim]. Free C & FORTRAN libraries for computing fast DCTs (types II–III) in one, two or three dimensions, power of 2 sizes.
* Tim Kientzle: Fast algorithms for computing the 8-point DCT and IDCT, [http://drdobbs.com/parallel/184410889 Algorithm Alley].
* [httphttps://ltfat.sourceforge.net/ LTFAT] is a free Matlab/Octave toolbox with interfaces to the FFTW implementation of the DCTs and DSTs of type I-IV.
 
{{Compression Methods|state=expanded}}
Line 634 ⟶ 646:
[[Category:Data compression]]
[[Category:Image compression]]
[[Category:Indian inventions]]
[[Category:H.26x]]
[[Category:JPEG]]