{{HTTP}}
'''HTTP compression''' is a capability that can be built into [[web server]]s and [[web client]]s to improve transfer speed and bandwidth utilization.<ref>{{cite web|url=http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/d52ff289-94d3-4085-bc4e-24eb4f312e0e.mspx?mfr=true|title=Using HTTP Compression (IIS 6.0)|access-date=9 February 2010|publisher=Microsoft Corporation}}</ref>
 
HTTP data is [[Data compression|compressed]] before it is sent from the server: compliant browsers announce which compression methods they support before downloading, so the server can send data in a format the browser understands; browsers that do not support any compatible compression method download the data uncompressed. The most common compression schemes are [[gzip]] and [[Deflate]]; a full list of available schemes is maintained by the [[Internet Assigned Numbers Authority|IANA]].<ref>RFC 2616, Section 3.5: "The Internet Assigned Numbers Authority (IANA) acts as a registry for content-coding value tokens."</ref> Additionally, third parties develop new methods and include them in their products, such as Google's [[Shared Dictionary Compression for HTTP]] (SDCH) scheme, implemented in the [[Google Chrome]] browser and used on Google servers.
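The negotiation described above can be sketched from the server's side. This is a simplified illustration, not a real server: the function name is hypothetical, and actual servers also honor quality values (q-values) and additional codings.

```python
import gzip

def negotiate_and_encode(accept_encoding, payload):
    """Pick a content coding from the client's Accept-Encoding header
    and compress the payload accordingly (simplified sketch)."""
    # Split the header into tokens, dropping any ";q=..." parameters.
    offered = [t.strip().split(";")[0] for t in accept_encoding.split(",")]
    if "gzip" in offered:
        return {"Content-Encoding": "gzip"}, gzip.compress(payload)
    # No supported coding requested: fall back to identity (uncompressed).
    return {}, payload

headers, body = negotiate_and_encode("gzip, deflate", b"<html>example</html>" * 100)
# A compliant client sees "Content-Encoding: gzip" and decompresses the body.
```

A browser performing the mirror-image step would read the response's Content-Encoding header and apply the matching decompressor before rendering.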
 
==Content-Encoding tokens==
The official list of tokens available to servers and clients is maintained by IANA,<ref>{{cite web|url=https://www.iana.org/assignments/http-parameters/http-parameters.xhtml#content-coding|title=Hypertext Transfer Protocol Parameters - HTTP Content Coding Registry|publisher=IANA|access-date=18 April 2014}}</ref> and it includes:
 
*br – [[Brotli]], a compression algorithm specifically designed for HTTP content encoding, defined in RFC 7932 and implemented in Mozilla Firefox release 44 and Chromium release 50
*deflate – compression based on the [[DEFLATE|deflate]] algorithm (described in RFC 1951), a combination of the [[LZ77_and_LZ78#LZ77|LZ77]] algorithm and Huffman coding, wrapped inside the [[zlib]] data format (RFC 1950);
*exi – W3C [[Efficient XML Interchange]]
*[[gzip]] – GNU zip format (described in RFC 1952). Uses the [[DEFLATE|deflate]] algorithm for compression, but the data format and the checksum algorithm differ from the "deflate" content-encoding. This method is the most broadly supported as of March 2011.<ref>{{cite web|url=http://www.vervestudios.co/projects/compression-tests/results|title=Compression Tests: Results|publisher=Verve Studios, Co|archive-url=https://web.archive.org/web/20120321182910/http://www.vervestudios.co/projects/compression-tests/results|archive-date=21 March 2012|access-date=19 July 2012}}</ref>
*[[Identity function|identity]] – No transformation is used. This is the default value for content coding.
*[[Pack200|pack200-gzip]] – Network Transfer Format for Java Archives<ref>{{cite web|url=https://jcp.org/en/jsr/detail?id=200|title=JSR 200: Network Transfer Format for Java Archives|publisher=The Java Community Process Program}}</ref>
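The distinction between the "deflate" and "gzip" tokens lies only in the container format, not in the compression algorithm; both use DEFLATE internally. A short Python sketch makes the difference visible:

```python
import gzip
import zlib

data = b"hello world" * 50

# "deflate" token: a DEFLATE stream wrapped in the zlib format (RFC 1950),
# with a 2-byte header and an Adler-32 checksum trailer.
deflate_body = zlib.compress(data)

# "gzip" token: the same DEFLATE algorithm in the gzip container (RFC 1952),
# with a larger header and a CRC-32 checksum plus length trailer.
gzip_body = gzip.compress(data)

# The two containers are distinguishable by their leading magic bytes.
assert deflate_body[0] == 0x78          # zlib CMF byte
assert gzip_body[:2] == b"\x1f\x8b"     # gzip magic number
```

The extra header and trailer make the gzip container slightly larger than the zlib one for the same payload, but also easier to identify and validate.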
In addition to these, a number of unofficial or non-standardized tokens are used in the wild by either servers or clients:
 
*[[bzip2]] – compression based on the free bzip2 format, supported by [[lighttpd]]<ref>{{cite web|url=http://redmine.lighttpd.net/projects/1/wiki/Docs_ModCompress|title=ModCompress - Lighttpd|publisher=lighty labs|access-date=18 April 2014}}</ref>
*[[Lempel–Ziv–Markov_chain_algorithm|lzma]] – compression based on (raw) LZMA, available in Opera 20 and in elinks via a compile-time option<ref>[http://elinks.or.cz/documentation/html/manual.html-chunked/ch01s07.html#CONFIG-LZMA elinks LZMA decompression]</ref>
*peerdist<ref>{{cite web|url=http://msdn.microsoft.com/en-us/library/dd304322%28v=PROT.10%29.aspx|title=[MS-PCCRTP]: Peer Content Caching and Retrieval: Hypertext Transfer Protocol (HTTP) Extensions|publisher=Microsoft|access-date=19 April 2014}}</ref> – Microsoft Peer Content Caching and Retrieval
*[[rsync]]<ref>{{cite web |title=rproxy: Protocol Definition for HTTP rsync Encoding |url=https://rproxy.samba.org/doc/protocol/protocol.html |website=rproxy.samba.org}}</ref> – [[Delta_encoding#Delta_encoding_in_HTTP|delta encoding in HTTP]], implemented by a pair of ''rproxy'' proxies.
*[[sdch]]<ref>{{cite web|url=http://lists.w3.org/Archives/Public/ietf-http-wg/2008JulSep/att-0441/Shared_Dictionary_Compression_over_HTTP.pdf|title=A Proposal for Shared Dictionary Compression Over HTTP|publisher=Google|last1=Butler|first1=Jon|author2=Wei-Hsin Lee|last3=McQuade|first3=Bryan|last4=Mixter|first4=Kenneth}}</ref><ref>{{cite web|url=https://groups.google.com/forum/#!forum/SDCH|title=SDCH Mailing List|publisher=Google Groups}}</ref> – Google Shared Dictionary Compression for HTTP, based on [[VCDIFF]] (RFC 3284)
*xpress – Microsoft compression protocol used by Windows&nbsp;8 and later for Windows Store application updates. [[LZ77_and_LZ78#LZ77|LZ77]]-based compression optionally using a Huffman encoding.<ref>{{cite web|url=https://msdn.microsoft.com/en-us/library/Hh554002.aspx|title=[MS-XCA]: Xpress Compression Algorithm|access-date=29 August 2015}}</ref>
*[[XZ Utils|xz]] – LZMA2-based content compression, supported by a non-official Firefox patch;<ref>{{cite web|url=https://wiki.mozilla.org/LZMA2_Compression|title=LZMA2 Compression - MozillaWiki|access-date=18 April 2014}}</ref> and fully implemented in mget since 2013-12-31.<ref>{{cite web|url=https://github.com/rockdaboot/mget|title=mget GitHub project page|access-date=6 January 2017}}</ref>
 
==Servers that support HTTP compression==
 
==Problems preventing the use of HTTP compression==
A 2009 article by Google engineers Arvind Jain and Jason Glasgow states that more than 99 person-years are wasted<ref name="google-use-compression">{{cite web|url=https://developers.google.com/speed/articles/use-compression|title=Use compression to make the web faster|access-date=22 May 2013|publisher=Google Developers}}</ref> daily due to increased page load time when users do not receive compressed content. This occurs when anti-virus software interferes with connections to force them to be uncompressed, where proxies are used (with overcautious web browsers), where servers are misconfigured, and where browser bugs stop compression from being used. Internet Explorer 6, which drops to HTTP 1.0 (without features like compression or pipelining) when behind a proxy&nbsp;– a common configuration in corporate environments&nbsp;– was the mainstream browser most prone to falling back to uncompressed HTTP.<ref name="google-use-compression" />
 
Another problem found while deploying HTTP compression on a large scale is due to the '''deflate''' encoding definition: while HTTP 1.1 defines the '''deflate''' encoding as data compressed with deflate (RFC 1951) inside a [[zlib]]-formatted stream (RFC 1950), Microsoft server and client products historically implemented it as a "raw" deflated stream,<ref>{{cite web|url=https://stackoverflow.com/questions/9170338/why-are-major-web-sites-using-gzip/9186091#9186091|title=deflate - Why are major web sites using gzip?|publisher=Stack Overflow|access-date=18 April 2014}}</ref> making its deployment unreliable.<ref>{{cite web|url=http://www.vervestudios.co/projects/compression-tests/|title=Compression Tests: About|publisher=Verve Studios|archive-url=https://web.archive.org/web/20150102111552/http://www.vervestudios.co/projects/compression-tests/|archive-date=2 January 2015|access-date=18 April 2014}}</ref><ref>{{cite web|url=http://zoompf.com/blog/2012/02/lose-the-wait-http-compression|title=Lose the wait: HTTP Compression|publisher=Zoompf Web Performance|access-date=18 April 2014}}</ref> For this reason, some software, including the Apache HTTP Server, implements only '''gzip''' encoding.
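Clients that must interoperate with both interpretations of "deflate" typically try the spec-compliant zlib-wrapped form first and fall back to decoding a raw DEFLATE stream. A minimal sketch of that workaround in Python (the function name is illustrative):

```python
import zlib

def decode_deflate(body):
    """Decode an HTTP 'deflate' body whether it is zlib-wrapped
    (RFC 1950, as the spec requires) or a raw DEFLATE stream
    (RFC 1951, as some servers historically sent)."""
    try:
        # Spec-compliant case: zlib header + DEFLATE + Adler-32 trailer.
        return zlib.decompress(body)
    except zlib.error:
        # Fallback: negative wbits tells zlib to expect a raw stream.
        return zlib.decompress(body, -zlib.MAX_WBITS)

# Both server behaviors round-trip through the same decoder:
wrapped = zlib.compress(b"x" * 100)                      # zlib-wrapped
raw_obj = zlib.compressobj(wbits=-zlib.MAX_WBITS)
raw = raw_obj.compress(b"x" * 100) + raw_obj.flush()     # raw DEFLATE
```

Browsers apply a similar heuristic, which is why the ambiguity mostly burdens servers and intermediaries rather than end users.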
 
==Security implications==
In 2012, a general attack against the use of data compression, called [[CRIME]], was announced. While the CRIME attack could work effectively against a large number of protocols, including TLS and application-layer protocols such as SPDY or HTTP, only exploits against TLS and SPDY were demonstrated, and these were largely mitigated in browsers and servers. The CRIME exploit against HTTP compression has not been mitigated at all, even though the authors of CRIME have warned that this vulnerability might be even more widespread than the SPDY and TLS compression vulnerabilities combined.
 
In 2013, a new instance of the CRIME attack against HTTP compression, dubbed BREACH, was published. A BREACH attack can extract login tokens, email addresses or other sensitive information from TLS encrypted web traffic in as little as 30 seconds (depending on the number of bytes to be extracted), provided the attacker tricks the victim into visiting a malicious web link.<ref name=Gooin20130801>{{cite web|last=Goodin|first=Dan|title=Gone in 30 seconds: New attack plucks secrets from HTTPS-protected pages |url=https://arstechnica.com/security/2013/08/gone-in-30-seconds-new-attack-plucks-secrets-from-https-protected-pages/|work=Ars Technica|publisher=Condé Nast|access-date=2 August 2013|date=1 August 2013}}</ref> All versions of TLS and SSL are at risk from BREACH regardless of the encryption algorithm or cipher used.<ref>{{cite web|last=Leyden|first=John|title=Step into the BREACH: New attack developed to read encrypted web data |url=https://www.theregister.co.uk/2013/08/02/breach_crypto_attack/|work=The Register|access-date=2 August 2013|date=2 August 2013}}</ref> Unlike previous instances of [[CRIME (security exploit)|CRIME]], which can be successfully defended against by turning off TLS compression or SPDY header compression, BREACH exploits HTTP compression which cannot realistically be turned off, as virtually all web servers rely upon it to improve data transmission speeds for users.<ref name=Gooin20130801/>
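The core mechanism of such compression-oracle attacks can be illustrated in a few lines. The following is a toy model with a hypothetical secret: a real BREACH attack works against live TLS traffic, guesses one character at a time, and needs many statistical refinements.

```python
import zlib

SECRET = b"sessionid=8f14e45fceea"  # hypothetical secret reflected in every response

def observable_length(attacker_input):
    # Attacker-controlled input and the secret share one compressed body.
    # TLS encrypts the bytes but cannot hide the compressed length.
    body = b"<html>" + attacker_input + b"<p>" + SECRET + b"</html>"
    return len(zlib.compress(body))

# A correct guess is redundant with the secret, so DEFLATE replaces it with
# a back-reference and the observable response is shorter than for a wrong
# guess of identical length.
right = observable_length(b"sessionid=8f14e45fceea")
wrong = observable_length(b"sessionid=zqxwvutkjhgf")
# right < wrong: the length difference alone reveals which guess matched.
```

Because the leak comes from compressing secret and attacker data in the same context, disabling compression (or isolating secrets from reflected input) is the defense, not changing the cipher.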
 
As of 2016, the TIME attack and the HEIST attack are now public knowledge.<ref>{{cite web|last=Sullivan|first=Nick|title=CRIME, TIME, BREACH and HEIST: A brief history of compression oracle attacks on HTTPS |url=https://www.helpnetsecurity.com/2016/08/11/compression-oracle-attacks-https/|access-date=16 August 2016|date=11 August 2016}}</ref><ref>{{cite web|last=Goodin|first=Dan|title= HEIST exploit — New attack steals SSNs, e-mail addresses, and more from HTTPS pages|url=https://arstechnica.com/security/2016/08/new-attack-steals-ssns-e-mail-addresses-and-more-from-https-pages/|access-date=16 August 2016|date=3 August 2016}}</ref><ref>{{cite web|last=Be'ery|first=Tal|title=A Perfect Crime? TIME will tell.|url=https://www.owasp.org/images/e/eb/A_Perfect_CRIME_TIME_Will_Tell_-_Tal_Beery.pdf}}</ref><ref>{{cite web|last=Vanhoef|first=Mathy|title=HEIST: HTTP Encrypted Information can be Stolen through TCP-windows|url=https://www.blackhat.com/docs/us-16/materials/us-16-VanGoethem-HEIST-HTTP-Encrypted-Information-Can-Be-Stolen-Through-TCP-Windows-wp.pdf}}</ref>
 
==References==