Video coding format

Some video coding formats are documented by a detailed [[technical specification]] document known as a '''video coding specification'''. Some such specifications are written and approved by [[standardization organization]]s as [[technical standard]]s, and are thus known as '''video coding standards'''. There are [[de facto standard|''de facto'' standards]] and formal standards.
 
Video content encoded using a particular video coding format is normally bundled with an audio stream (encoded using an [[audio coding format]]) inside a [[container format (digital)#Multimedia container formats|multimedia container format]] such as [[Audio Video Interleave|AVI]], [[MP4 file format|MP4]], [[Flash Video|FLV]], [[RealMedia]], or [[Matroska]]. As such, the user normally does not have an [[H.264/MPEG-4 AVC|H.264]] file, but instead has a [[video file format|video file]], which is an MP4 container of H.264-encoded video, normally alongside [[Advanced Audio Coding|AAC]]-encoded audio. Multimedia container formats can contain any of several different video coding formats; for example, the MP4 container format can contain video coding formats such as [[MPEG-2 Part 2]] or H.264. Another example is the initial specification for the file type [[WebM]], which specified not only the container format (Matroska) but also exactly which video ([[VP8]]) and audio ([[Vorbis]]) compression formats were to be used inside the Matroska container, even though Matroska itself is capable of holding other formats such as [[VP9]] video; support for VP9 video and [[Opus (audio format)|Opus]] audio was later added to the WebM specification.
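
The separation between container format and video coding format can be seen directly with stream-inspection tools. The following minimal sketch (assuming [[FFmpeg]]'s <code>ffprobe</code> tool is installed; the file name <code>example.mp4</code> is a placeholder) lists the coding format of each stream inside a container:

<syntaxhighlight lang="python">
# List the coding format of every stream in a multimedia container.
# Illustrates that the container (e.g., MP4) and the video/audio coding
# formats (e.g., H.264, AAC) are separate layers.
import json
import subprocess

result = subprocess.run(
    ["ffprobe", "-v", "error",
     "-show_entries", "stream=codec_type,codec_name",
     "-of", "json", "example.mp4"],  # "example.mp4" is a placeholder name
    capture_output=True, text=True, check=True,
)

for stream in json.loads(result.stdout)["streams"]:
    print(f"{stream['codec_type']}: {stream['codec_name']}")
</syntaxhighlight>

For a typical MP4 file this prints something like <code>video: h264</code> followed by <code>audio: aac</code>, making the layering described above directly visible.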
 
==Distinction between ''format'' and ''codec''==
A ''format'' is the layout plan for data produced or consumed by a ''codec''.
 
Although video coding formats such as H.264 are sometimes referred to as ''codecs'', there is a clear conceptual difference between a specification and its implementations. Video coding formats are described in specifications, while the software, [[firmware]], or hardware that encodes or decodes data in a given video coding format from or to uncompressed video is an implementation of those specifications. As an analogy, the video coding format [[H.264]] (specification) is to the [[codec]] [[OpenH264]] (specific implementation) what the [[C (programming language)|C programming language]] (specification) is to the compiler [[GNU Compiler Collection|GCC]] (specific implementation). For each specification (e.g., [[H.264]]), there can be many codecs implementing it (e.g., [[x264]], OpenH264, and the many others listed at [[H.264/MPEG-4 AVC products and implementations]]).
 
This distinction is not applied consistently in the terminology of the literature. The H.264 specification calls [[H.261]], [[H.262]], [[H.263]], and [[H.264]] ''video coding standards'' and does not contain the word ''codec''.<ref name="h264" /> The [[Alliance for Open Media]] clearly distinguishes between the [[AV1]] video coding format and the accompanying codec it is developing, but calls the video coding format itself a ''[[video codec]] specification''.<ref>{{cite web|url=http://aomedia.org/|publisher=Alliance for Open Media|title=Front Page|access-date=May 23, 2016}}</ref> The [[VP9]] specification calls the video coding format VP9 itself a ''codec''.<ref>{{cite web|url=https://storage.googleapis.com/downloads.webmproject.org/docs/vp9/vp9-bitstream-specification-v0.6-20160331-draft.pdf|title=VP9 Bitstream & Decoding Process Specification|author1=Adrian Grange |author2=Peter de Rivaz |author3=Jonathan Hunt |name-list-style=amp }}</ref>
 
As an example of conflation, Chromium's<ref>{{cite web|url=https://www.chromium.org/audio-video|title=Audio/Video|publisher=The Chromium Projects|access-date=May 23, 2016}}</ref> and Mozilla's<ref>{{cite web|url=https://developer.mozilla.org/en-US/docs/Web/HTML/Supported_media_formats|title=Media formats supported by the HTML audio and video elements|publisher=Mozilla|access-date=May 23, 2016}}</ref> pages listing their supported video formats both call video coding formats such as H.264 ''codecs''. As another example, in Cisco's announcement of a free-as-in-beer video codec, the press release refers to the H.264 video coding format as a ''codec'' ("choice of a common video codec"), but calls Cisco's implementation of an H.264 encoder/decoder a ''codec'' shortly thereafter ("open-source our H.264 codec").<ref>{{cite web|url=https://blogs.cisco.com/collaboration/open-source-h-264-removes-barriers-webrtc|title=Open-Sourced H.264 Removes Barriers to WebRTC|publisher=Cisco|access-date=May 23, 2016|date=October 30, 2013|author=Rowan Trollope|archive-date=May 14, 2019|archive-url=https://web.archive.org/web/20190514053018/https://blogs.cisco.com/collaboration/open-source-h-264-removes-barriers-webrtc|url-status=dead}}</ref>
 
A video coding format does not dictate all [[algorithm]]s used by a [[codec]] implementing the format. For example, a large part of how video compression typically works is by finding [[Video compression picture types|similarities between video frames]] (block-matching), and then achieving compression by copying previously coded similar subimages (such as [[macroblock]]s) and adding small differences when necessary. Finding optimal combinations of such predictors and differences is an [[NP-hard]] problem,<ref>{{cite web|url=http://shodhganga.inflibnet.ac.in/bitstream/10603/8175/8/08_chapter%203.pdf|title=Chapter 3 : Modified A* Prune Algorithm for finding K-MCSP in video compression|publisher=Shodhganga.inflibnet.ac.in|access-date=January 6, 2015}}</ref> meaning that finding an optimal solution is computationally infeasible in practice. Although the video coding format must support such compression across frames in the bitstream format, by not needlessly mandating specific algorithms for finding block-matches and performing other encoding steps, the codecs implementing the video coding specification retain some freedom to optimize and innovate in their choice of algorithms. For example, section 0.5 of the H.264 specification says that encoding algorithms are not part of the specification.<ref name="h264">{{cite web|url=http://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-H.264-200305-S!!PDF-E&type=items|title=SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS : Infrastructure of audiovisual services – Coding of moving video : Advanced video coding for generic audiovisual services|publisher=Itu.int|access-date=January 6, 2015}}</ref> Free choice of algorithm also allows different [[Analysis of algorithms|space–time complexity]] trade-offs for the same video coding format, so a live feed can use a fast but space-inefficient algorithm, while a one-time [[DVD]] encoding for later mass production can trade long encoding time for space-efficient encoding.
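
As a concrete illustration of an encoder-side step that the specifications leave open, the following sketch implements exhaustive block-matching with a sum-of-absolute-differences (SAD) cost. All function names and parameters here are illustrative, not drawn from any standard:

<syntaxhighlight lang="python">
# Minimal sketch of exhaustive block-matching motion estimation.
import numpy as np

def best_match(prev_frame, cur_frame, y, x, block=16, search=8):
    """Find a motion vector for the block at (y, x) of cur_frame by
    exhaustively comparing it against shifted blocks of prev_frame,
    scored by the sum of absolute differences (SAD)."""
    target = cur_frame[y:y + block, x:x + block].astype(np.int32)
    best_vec, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            py, px = y + dy, x + dx
            if (py < 0 or px < 0 or py + block > prev_frame.shape[0]
                    or px + block > prev_frame.shape[1]):
                continue  # candidate block would fall outside the frame
            candidate = prev_frame[py:py + block, px:px + block].astype(np.int32)
            cost = int(np.abs(target - candidate).sum())
            if cost < best_cost:
                best_vec, best_cost = (dy, dx), cost
    return best_vec, best_cost

# Two synthetic 64x64 grayscale frames; the second is the first shifted
# down by 2 pixels and right by 3 pixels, simulating motion.
rng = np.random.default_rng(0)
prev = rng.integers(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(prev, shift=(2, 3), axis=(0, 1))
print(best_match(prev, cur, 16, 16))  # ((-2, -3), 0): block's source offset
</syntaxhighlight>

An exhaustive search like this is simple but slow; production encoders substitute faster heuristic searches, which is exactly the freedom the specifications grant.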
 
==History==
The concept of [[analog video]] compression dates back to 1929, when R.D. Kell in [[United Kingdom|Britain]] proposed transmitting only the portions of the scene that changed from frame to frame. The concept of [[digital video]] compression dates back to 1952, when [[Bell Labs]] researchers B.M. Oliver and C.W. Harrison proposed the use of [[differential pulse-code modulation]] (DPCM) in video coding. In 1959, [[NHK]] researchers Y. Taki, M. Hatori, and S. Tanaka proposed the concept of [[inter-frame]] [[motion compensation]]: predictive inter-frame video coding in the [[temporal dimension]].<ref name="ITU">{{cite web |title=History of Video Compression |url=https://www.itu.int/wftp3/av-arch/jvt-site/2002_07_Klagenfurt/JVT-D068.doc |website=[[ITU-T]] |publisher=Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6) |date=July 2002 |pages=11, 24–9, 33, 40–1, 53–6 |access-date=November 3, 2019}}</ref> In 1967, [[University of London]] researchers A.H. Robinson and C. Cherry proposed [[run-length encoding]] (RLE), a [[lossless compression]] scheme, to reduce the transmission bandwidth of [[analog television]] signals.<ref name="robinson">{{cite journal |author1-last=Robinson |author1-first=A. H. |author2-last=Cherry |author2-first=C. |title=Results of a prototype television bandwidth compression scheme |journal=[[Proceedings of the IEEE]] |publisher=[[IEEE]] |volume=55 |number=3 |date=1967 |pages=356–364 |doi=10.1109/PROC.1967.5493}}</ref>
 
The earliest digital video coding algorithms were either for [[uncompressed video]] or used [[lossless compression]], both of which were too inefficient and impractical for digital video coding.<ref name="Ghanbari">{{cite book |last1=Ghanbari |first1=Mohammed |title=Standard Codecs: Image Compression to Advanced Video Coding |date=2003 |publisher=[[Institution of Engineering and Technology]] |isbn=9780852967102 |pages=1–2 |url=https://books.google.com/books?id=7XuU8T3ooOAC&pg=PA1}}</ref><ref name="Lea">{{cite book |last1=Lea |first1=William |title=Video on demand: Research Paper 94/68 |date=1994 |publisher=[[House of Commons Library]] |url=https://researchbriefings.parliament.uk/ResearchBriefing/Summary/RP94-68 |access-date=September 20, 2019}}</ref> Digital video was introduced in the 1970s,<ref name="Ghanbari"/> initially using uncompressed [[pulse-code modulation]] (PCM), which required high [[bitrate]]s of around 45{{ndash}}200 [[Mbit/s]] for [[standard-definition]] (SD) video,<ref name="Ghanbari"/><ref name="Lea"/> up to 2,000 times greater than the [[telecommunication]] [[Bandwidth (computing)|bandwidth]] (up to 100{{nbsp}}[[kilobits per second|kbit/s]]) available until the 1990s.<ref name="Lea"/> Similarly, uncompressed [[high-definition video|high-definition]] (HD) [[1080p]] video requires bitrates exceeding 1{{nbsp}}[[Gbit/s]], significantly greater than the bandwidth available in the 2000s.<ref>{{cite book |last1=Lee |first1=Jack |title=Scalable Continuous Media Streaming Systems: Architecture, Design, Analysis and Implementation |date=2005 |publisher=[[John Wiley & Sons]] |isbn=9780470857649 |page=25 |url=https://books.google.com/books?id=7fuvu52cyNEC&pg=PA25}}</ref>
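
These figures follow directly from the raw sampling parameters. As a worked example (assuming 24 bits per pixel, i.e. 8 bits per color channel with no chroma subsampling, and 30 frames per second), uncompressed 1080p video requires about

:<math>1920 \times 1080 \times 24 \times 30 \approx 1.49 \times 10^{9}</math> bits per second,

i.e. roughly 1.5 Gbit/s, consistent with the figure above.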
 
===Motion-compensated DCT===
The other key development was motion-compensated hybrid coding.<ref name="ITU"/> In 1974, Ali Habibi at the [[University of Southern California]] introduced hybrid coding,<ref name="Habibi">{{cite journal |last1=Habibi |first1=Ali |title=Hybrid Coding of Pictorial Data |journal=IEEE Transactions on Communications |date=1974 |volume=22 |issue=5 |pages=614–624 |doi=10.1109/TCOM.1974.1092258}}</ref><ref>{{cite journal |last1=Chen |first1=Z. |last2=He |first2=T. |last3=Jin |first3=X. |last4=Wu |first4=F. |title=Learning for Video Compression |journal=IEEE Transactions on Circuits and Systems for Video Technology |date=2019 |volume=30 |issue=2 |pages=566–576 |doi=10.1109/TCSVT.2019.2892608 |arxiv=1804.09869 |s2cid=13743007 }}</ref><ref>{{cite book |last1=Pratt |first1=William K. |title=Advances in Electronics and Electron Physics: Supplement |date=1984 |publisher=[[Academic Press]] |isbn=9780120145720 |page=158 |url=https://books.google.com/books?id=OX00AAAAIAAJ |quote=A significant advance in image coding methodology occurred with the introduction of the concept of hybrid transform/DPCM coding (Habibi, 1974).}}</ref> which combines predictive coding with transform coding.<ref name="ITU"/><ref>{{cite book |last1=Ohm |first1=Jens-Rainer |title=Multimedia Signal Coding and Transmission |date=2015 |publisher=Springer |isbn=9783662466919 |pages=364 |url=https://books.google.com/books?id=e7xnBwAAQBAJ&pg=PA364}}</ref> He examined several transform coding techniques, including the DCT, [[Hadamard transform]], [[Fourier transform]], slant transform, and [[Karhunen-Loeve transform|Karhunen–Loève transform]].<ref name="Habibi"/> However, his algorithm was initially limited to [[intra-frame]] coding in the spatial dimension. In 1975, John A. Roese and Guner S. Robinson extended Habibi's hybrid coding algorithm to the temporal dimension, using transform coding in the spatial dimension and predictive coding in the temporal dimension, developing [[inter-frame]] motion-compensated hybrid coding.<ref name="ITU"/><ref name="Roese">{{cite journal |last1=Roese |first1=John A. |last2=Robinson |first2=Guner S. |editor-first1=Andrew G. |editor-last1=Tescher |title=Combined Spatial And Temporal Coding Of Digital Image Sequences |journal=Efficient Transmission of Pictorial Information |date=October 30, 1975 |volume=0066 |pages=172–181 |doi=10.1117/12.965361 |publisher=International Society for Optics and Photonics|bibcode=1975SPIE...66..172R |s2cid=62725808 }}</ref> For the spatial transform coding, they experimented with different transforms, including the DCT and the [[fast Fourier transform]] (FFT), developing inter-frame hybrid coders for them, and found the DCT to be the most efficient due to its reduced complexity, capable of compressing image data down to 0.25 [[bit]] per [[pixel]] for a [[videotelephone]] scene with image quality comparable to that of a typical intra-frame coder requiring 2 bits per pixel.<ref>{{cite book |last1=Huang |first1=T. S. |title=Image Sequence Analysis |date=1981 |publisher=[[Springer Science & Business Media]] |isbn=9783642870378 |page=29 |url=https://books.google.com/books?id=bAirCAAAQBAJ&pg=PA29}}</ref><ref name="Roese"/>
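
The hybrid structure described above can be sketched in a few lines: predict a block from the previous frame, transform the prediction residual with a DCT, and quantize the transform coefficients. This is a minimal illustration (assuming [[SciPy]] for the DCT); real coders add motion search, entropy coding, and many other refinements:

<syntaxhighlight lang="python">
# Minimal sketch of hybrid (predictive + transform) coding of one
# 8x8 block: DCT of the prediction residual, coarse quantization,
# then reconstruction. Illustrative only.
import numpy as np
from scipy.fft import dctn, idctn

def code_block(predicted, actual, q=16):
    """Encode `actual` relative to `predicted`: transform the residual,
    quantize the coefficients (the lossy step), and reconstruct."""
    residual = actual.astype(np.float64) - predicted
    coeffs = np.round(dctn(residual, norm="ortho") / q)
    recon = predicted + idctn(coeffs * q, norm="ortho")
    return np.clip(np.round(recon), 0, 255).astype(np.uint8), coeffs

rng = np.random.default_rng(1)
prev = rng.integers(0, 256, (8, 8)).astype(np.float64)  # prediction
cur = np.clip(prev + rng.normal(0, 4, (8, 8)), 0, 255)  # similar block
recon, coeffs = code_block(prev, cur)
# Most quantized coefficients are zero, which is what makes the
# residual cheap to store or transmit.
print(np.count_nonzero(coeffs), "nonzero coefficients of", coeffs.size)
</syntaxhighlight>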
 
The DCT was applied to video encoding by Wen-Hsiung Chen,<ref name="Stankovic">{{cite journal |last1=Stanković |first1=Radomir S. |last2=Astola |first2=Jaakko T. |title=Reminiscences of the Early Work in DCT: Interview with K.R. Rao |journal=Reprints from the Early Days of Information Sciences |date=2012 |volume=60 |url=http://ticsp.cs.tut.fi/reports/ticsp-report-60-reprint-rao-corrected.pdf |access-date=October 13, 2019}}</ref> who developed a fast DCT algorithm with C.H. Smith and S.C. Fralick in 1977,<ref>{{cite journal |last1=Chen |first1=Wen-Hsiung |last2=Smith |first2=C. H. |last3=Fralick |first3=S. C. |title=A Fast Computational Algorithm for the Discrete Cosine Transform |journal=[[IEEE Transactions on Communications]] |date=September 1977 |volume=25 |issue=9 |pages=1004–1009 |doi=10.1109/TCOM.1977.1093941}}</ref><ref name="t81">{{cite web |title=T.81 – Digital compression and coding of continuous-tone still images – Requirements and guidelines |url=https://www.w3.org/Graphics/JPEG/itu-t81.pdf |publisher=[[CCITT]] |date=September 1992 |access-date=July 12, 2019}}</ref> and founded [[Compression Labs, Inc.|Compression Labs]] to commercialize DCT technology.<ref name="Stankovic"/> In 1979, [[Anil K. Jain (electrical engineer, born 1946)|Anil K. Jain]] and Jaswant R. Jain further developed motion-compensated DCT video compression.<ref>{{cite book |last1=Cianci |first1=Philip J. |title=High Definition Television: The Creation, Development and Implementation of HDTV Technology |date=2014 |publisher=McFarland |isbn=9780786487974 |page=63 |url=https://books.google.com/books?id=0mbsfr38GTgC&pg=PA63}}</ref><ref name="ITU"/> This led to Chen developing a practical video compression algorithm, called motion-compensated DCT, or adaptive scene coding, in 1981.<ref name="ITU"/> Motion-compensated DCT later became the standard coding technique for video compression from the late 1980s onwards.<ref name="Ghanbari"/><ref name="Li">{{cite book |last1=Li |first1=Jian Ping |title=Proceedings of the International Computer Conference 2006 on Wavelet Active Media Technology and Information Processing: Chongqing, China, 29-31 August 2006 |date=2006 |publisher=[[World Scientific]] |isbn=9789812709998 |page=847 |url=https://books.google.com/books?id=FZiK3zXdK7sC&pg=PA847}}</ref>
 
===Video coding standards===