Data URI scheme: Difference between revisions

{{Redirect|data:|the [[WP:IW|interwiki]] shortcut to Wikidata|d:}}
{{Short description|Web page in-line data scheme}}
{{Lowercase title}}The '''data URI scheme''' is a [[Uniform resource identifier|uniform resource identifier (URI)]] scheme that provides a way to include data in-line in [[Web page]]s as if they were external resources. It is a form of file literal or [[here document]]. This technique allows normally separate elements such as images and style sheets to be fetched in a single [[Hypertext Transfer Protocol|Hypertext Transfer Protocol (HTTP)]] request, which may be more efficient than multiple HTTP requests,<ref>{{cite web|url=http://blog.teamtreehouse.com/using-data-uris-speed-website|title=Using Data URIs to Speed Up Your Website|date=27 March 2014|publisher=Treehouse Blog}}</ref> and is used by several browser extensions to package images as well as other multimedia content in a single HTML file for page saving.<ref>{{cite web|url=https://chrome.google.com/webstore/detail/singlefile/mpiodijhokgodhhofbcjdecpffjipkle|title=SingleFile - Chrome Web Store|website=Chrome Web Store|access-date=25 August 2018}}</ref><ref>{{cite web|url=https://addons.mozilla.org/en-US/firefox/addon/single-file/|title=SingleFile – Add-ons for Firefox|website=Firefox Add-ons|access-date=25 August 2018}}</ref> {{As of|2024}}, data URIs are fully supported by all major browsers.<ref>{{cite web|url=http://caniuse.com/#feat=datauri|title=Can I use...|first=Alexis|last=Deveria|date=July 2015|access-date=31 August 2015}}</ref>
 
==Syntax==
'''data: URL'''s, [[IETF]] standard (RFC 2397), are a kind of [[Uniform Resource Locator|URL]] (even though they do not ''locate'' a resource) that allows inclusion of small data items inline, as if they had been included externally. They tend to be far simpler than alternative inclusion methods, such as [[MIME]] with cid: or mid:.
The syntax of data URIs is defined in [[Request for Comments|Request for Comments (RFC)]] 2397, published in August 1998,<ref>{{cite web|url=http://tools.ietf.org/html/rfc2397|title=RFC 2397 - The "data" URL scheme|author=Masinter, L|publisher=[[Internet Engineering Task Force]]|date=August 1998|access-date=2008-08-12}}</ref> and follows the [[Uniform resource identifier#Generic syntax|URI scheme syntax]]. A data URI consists of:
 
* The '''scheme''', <code>data</code>. It is followed by a colon (<code>:</code>).
* An optional '''media type'''. The media type part may include one or more parameters, in the format <code>attribute=value</code>, separated by semicolons (<code>;</code>). A common media type parameter is <code>charset</code>, specifying the character set of the media type, where the value is from the IANA list of [[character set]] names.<ref>{{cite web|url=https://www.iana.org/assignments/character-sets/character-sets.xhtml|title=Character Sets|editor1-first=Ned|editor1-last=Freed|editor2-first=Martin|editor2-last=Dürst|publisher=[[Internet Assigned Numbers Authority]]|date=20 December 2013|access-date=31 August 2015}}</ref> If one is not specified, the [[media type]] of the data URI is assumed to be <code>text/plain;charset=US-ASCII</code>.
* An optional '''base64 extension''' <code>base64</code>, separated from the preceding part by a semicolon. When present, this indicates that the data content of the URI is [[binary data]], encoded in [[ASCII]] format using the [[Base64]] scheme for [[binary-to-text encoding]]. The base64 extension is distinguished from any media type parameters by virtue of not having a <code>=value</code> component and by coming after any media type parameters. Since Base64 encoded data is approximately 33% larger than original data, it is recommended to use Base64 data URIs only if the server supports [[HTTP compression]] or embedded files are smaller than 1 KB.
* The '''data''', separated from the preceding part by a comma (<code>,</code>). The data is a sequence of zero or more [[octet (computing)|octets]] represented as characters. The comma is required in a data URI, even when the data part has zero length. The characters permitted within the data part include ASCII upper and lowercase letters, digits, and many ASCII punctuation and special characters. Note that this may include characters, such as colon, semicolon, and comma, which are delimiters in the URI components preceding the data part. Other octets must be [[percent-encoding|percent-encoded]]. If the data is Base64-encoded, then the data part may contain only valid Base64 characters.<ref name="rfc3986">{{cite web|url=http://tools.ietf.org/html/rfc3986|title=Uniform Resource Identifiers (URI): Generic Syntax|author1-first=Tim|author1-last=Berners-Lee|author1-link=Tim Berners-Lee|author2-first=Roy|author2-last=Fielding|author2-link=Roy Fielding|author3-first=Larry|author3-last=Masinter|publisher=[[Internet Engineering Task Force]]|date=January 2005|access-date=31 August 2015}}</ref> Note that Base64-encoded <code>data:</code> URIs use the standard Base64 character set (with '<code>+</code>' and '<code>/</code>' as characters 62 and 63) rather than the so-called "[[Base64#URL_applications|URL-safe Base64]]" character set.

Examples of data URIs showing most of the features are:

:<pre>data:text/vnd-example+xyz;foo=bar;base64,R0lGODdh</pre>
:<pre>data:text/plain;charset=UTF-8;page=21,the%20data:1234,5678</pre> (outputs: "the data:1234,5678")
:<pre>data:image/svg+xml;utf8,<svg width='10'... </svg></pre>

The minimal data URI is <code>data:,</code>, consisting of the scheme, no media-type, and zero-length data.

Thus, within the overall URI syntax, a data URI consists of a '''scheme''' and a '''path''', with no '''authority''' part, '''query string''', or '''fragment'''. The optional '''media type''', the optional '''base64''' indicator, and the data are all parts of the URI path.

Early implementations included [[Mozilla Application Suite|Mozilla]] (and its derivatives like [[Mozilla Firefox|Firefox]]), [[Opera (web browser)|Opera]], [[Safari (web browser)|Safari]] and [[Konqueror]]. [[Microsoft]]'s [[Internet Explorer]], as of version 6, did not support data: URLs.

==Format==
<pre>data:[<mediatype>][;base64],<data></pre>

The <mediatype> is an internet media type specification (with optional parameters). The appearance of ";base64" means that the data is encoded as [[base64]]. Without ";base64", the data (as a sequence of octets) is represented using [[ASCII]] encoding for [[octet]]s inside the range of safe URL characters and using the standard %xx hex encoding of URLs for octets outside that range. If <mediatype> is omitted, it defaults to text/plain;charset=US-ASCII. As a shorthand, "text/plain" can be omitted but the charset parameter supplied. For example, a Base64-encoded data: URL starts with a prefix of the form <code>data:content/type;base64,</code> followed by the encoded data.

The "data" URL scheme has no relative URL forms.

==Advantages==
* HTTP headers are not required for embedded data, so data: URLs can use fewer network resources when the overhead of encoding the inline content as a data: URL is smaller than the HTTP headers that would otherwise be required.
* Web browsers typically limit concurrent connections to a server to a small number (such as four), so inline data frees up a download connection for other content.
* Browsers manage fewer cache entries for a file that contains data: URLs.
* Environments with limited or restricted access to external resources may embed content when it is disallowed or impractical to reference externally. For example, an advanced HTML editing field could accept a pasted or inserted image and convert it to a data: URL to hide the complexity of external resources from the user.

==Disadvantages==
* Embedded content must be extracted and decoded before changes may be made, then re-encoded and re-embedded afterward.
* Base64-encoded data: URLs are roughly 33% larger in size than their binary equivalent.
* URL-encoded data: URLs can be up to three times larger (in extreme cases) than the original text content.
* Information that is embedded more than once is redownloaded as part of the containing file, and does not benefit from the browser's cache.
* Browser limits to URL length provide a maximum data size. For example, URLs in Opera are limited to around 4100 characters.

==Examples of use==

===HTML===
An [[HTML]] fragment embedding a '''base64''' encoded '''PNG''' picture of a small red dot: [[File:Red-dot-5px.png]]
 
<syntaxhighlight lang="html">
<img alt="" src="data:image/png;base64,iVBORw0KGgoAAA
ANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4
//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU
5ErkJggg==" style="width:36pt;height:36pt" />
</syntaxhighlight>

In this example, the lines are broken for formatting purposes. In actual URIs, including data URIs, control characters (ASCII 0 to 31, and 127) and spaces (ASCII 32) are "excluded characters". This means that [[whitespace character]]s are not permitted in data URIs. However, in the context of HTML 4 and HTML 5, linefeeds within an element attribute value (such as the "src" above) are ignored{{Citation needed|reason=linefeeds are significant in the title attribute, so not ignored in HTML attributes|date=August 2017}}. So the data URI above would be processed ignoring the linefeeds, giving the correct result. But note that this is an HTML feature, not a data URI feature, and in other contexts, it is not possible to rely on whitespace within the URI being ignored.

An [[HTML]] fragment embedding a '''utf8''' encoded '''SVG''' picture of a small red dot: [[File:Red-dot.svg]]

<syntaxhighlight lang="html">
<img alt="Red dot" src="data:image/svg+xml;utf8,
<svg width='10' height='10' xmlns='http://www.w3.org/2000/svg'>
<circle style='fill:red' cx='5' cy='5' r='5'/>
</svg>"/>
</syntaxhighlight>

In this example, the image data is encoded with utf8 and hence the image data can be broken into multiple lines for easy reading. Single quotes have to be used in the SVG data because double quotes are used for encapsulating the image source.

A [[favicon]] can also be made with utf8 encoding and SVG data; the element has to appear in the 'head' section of the HTML:

<syntaxhighlight lang="html">
<link rel="icon" href='data:image/svg+xml;utf8,
<svg width="10" height="10" xmlns="http://www.w3.org/2000/svg">
<circle style="fill:red" cx="5" cy="5" r="5"/>
</svg>'/>
</syntaxhighlight>

===XHTML===
An [[XHTML]] fragment embedding a small image ([[newline]]s added for clarity):

<syntaxhighlight lang="html">
<img
src="data:image/gif;base64,R0lGODdhMAAwAPAAAAAAAP///ywAAAAAMAAw
AAAC8IyPqcvt3wCcDkiLc7C0qwyGHhSWpjQu5yqmCYsapyuvUUlvONmOZtfzgFz
ByTB10QgxOR0TqBQejhRNzOfkVJ+5YiUqrXF5Y5lKh/DeuNcP5yLWGsEbtLiOSp
a/TPg7JpJHxyendzWTBfX0cxOnKPjgBzi4diinWGdkF8kjdfnycQZXZeYGejmJl
ZeGl9i2icVqaNVailT6F5iJ90m6mvuTS4OK05M0vDk0Q4XUtwvKOzrcd3iq9uis
F81M1OIcR7lEewwcLp7tuNNkM3uNna3F2JQFo97Vriy/Xl4/f1cf5VWzXyym7PH
hhx4dbgYKAAA7"
alt="Larry" />
</syntaxhighlight>

A compatible browser should display this image:

[[Image:DataUrlLarry.gif]]

Note that as a URL, the data: URL should be formattable with [[whitespace]]s, but there are practical issues with how that relates to base64 encoding [http://bugzilla.mozilla.org/show_bug.cgi?id=73026#c12]. Authors should avoid using whitespace in base64-encoded data: URLs.
 
===CSS===
A [[Cascading Style Sheets|Cascading Style Sheets (CSS)]] rule that includes a background image:
<syntaxhighlight lang="css">
ul.checklist li.complete {
padding-left: 20px;
background: white url('data:image/png;base64,iVB\
ORw0KGgoAAAANSUhEUgAAABAAAAAQAQMAAAAlPW0iAAAABlBMVEU\
AAAD///+l2Z/dAAAAM0lEQVR4nGP4/5/h/1+G/58ZDrAz3D/McH8\
yw83NDDeNGe4Ug9C9zwz3gVLMDA/A6P9/AFGGFyjOXZtQAAAAAEl\
FTkSuQmCC') no-repeat scroll left top;
}
</syntaxhighlight>
 
In this example, the <code>\ + <linefeed></code> line terminators are a feature of CSS, indicating continuation on the next line. These would be removed by the CSS stylesheet processor, and the data URI would be reconstituted without whitespace, making it correct, since whitespace is not allowed within the data component of a data: URI.

A variant of the [[CSS]] rule using a child selector and an unquoted <code>url()</code> (again, [[newline]]s added for clarity):

<syntaxhighlight lang="css">
ul.checklist > li.complete { margin-left: 20px; background:
url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQAQMAAAAlPW0iAAA
ABlBMVEUAAAD///+l2Z/dAAAAM0lEQVR4nGP4/5/h/1+G/58ZDrAz3D/McH8yw83NDDeN
Ge4Ug9C9zwz3gVLMDA/A6P9/AFGGFyjOXZtQAAAAAElFTkSuQmCC) top left no-repeat; }
</syntaxhighlight>
 
===JavaScript===
 
A [[JavaScript]] statement that opens an embedded subwindow, as for a footnote link:
 
<syntaxhighlight lang="javascript">
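// Build the data URI from readable markup; encodeURIComponent
// percent-encodes the characters that may not appear literally in a URI.
window.open('data:text/html;charset=utf-8,' +
    encodeURIComponent( // Escape for URL formatting
        '<!DOCTYPE html>' +
        '<html lang="en">' +
        '<head><title>Embedded Window</title></head>' +
        '<body><h1>42</h1></body>' +
        '</html>'
    )
);

// Older form of the same call: the markup (here with an HTML 4.0 doctype)
// is percent-encoded by hand, and the window name and size are passed explicitly.
window.open('data:text/html;charset=utf-8,%3C!DOCTYPE%20HTML%20PUBLIC%20%22-' +
    '%2F%2FW3C%2F%2FDTD%20HTML%204.0%2F%2FEN%22%3E%0D%0A%3Chtml%20lang%3D%22en' +
    '%22%3E%0D%0A%3Chead%3E%3Ctitle%3EEmbedded%20Window%3C%2Ftitle%3E%3C%2Fhea' +
    'd%3E%0D%0A%3Cbody%3E%3Ch1%3E42%3C%2Fh1%3E%3C%2Fbody%3E%0D%0A%3C%2Fhtml%3E' +
    '%0D%0A', '_blank', 'height=300,width=400');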
</syntaxhighlight>
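
Going the other way, a data URI can be split back into its parts by following the syntax rules: everything before the first comma is the optional media type and <code>base64</code> marker, and everything after it is the data. The following is a minimal sketch rather than a complete implementation. The function names <code>parseDataUri</code> and <code>percentDecodeToBytes</code> are hypothetical, the global <code>atob</code> and <code>TextDecoder</code> of browsers and recent Node.js versions are assumed, and it is assumed that no media type parameter contains a quoted comma.

```javascript
// Hypothetical helper: turns percent-encoded text into raw bytes.
function percentDecodeToBytes(text) {
  const bytes = [];
  for (let i = 0; i < text.length; i++) {
    const hex = text.slice(i + 1, i + 3);
    if (text[i] === '%' && /^[0-9a-fA-F]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16)); // %xx stands for one octet
      i += 2;
    } else {
      bytes.push(text.charCodeAt(i) & 0xff); // literal ASCII character
    }
  }
  return Uint8Array.from(bytes);
}

// Hypothetical helper: splits a data URI into media type, encoding flag and data.
function parseDataUri(uri) {
  const comma = uri.indexOf(',');
  if (!/^data:/i.test(uri) || comma === -1) {
    throw new Error('Not a data URI: the scheme and the comma are both required');
  }
  let header = uri.slice('data:'.length, comma);
  const payload = uri.slice(comma + 1);

  // The base64 marker, when present, comes last and has no "=value" part.
  const isBase64 = /;base64$/i.test(header);
  if (isBase64) {
    header = header.slice(0, -';base64'.length);
  }

  // Apply the defaults: no media type at all, or only parameters such as ";charset=...".
  let mediaType = header;
  if (mediaType === '') {
    mediaType = 'text/plain;charset=US-ASCII';
  } else if (mediaType.startsWith(';')) {
    mediaType = 'text/plain' + mediaType;
  }

  const bytes = isBase64
    ? Uint8Array.from(atob(payload), (ch) => ch.charCodeAt(0))
    : percentDecodeToBytes(payload);
  return { mediaType, isBase64, bytes };
}

const parsed = parseDataUri('data:text/plain;charset=UTF-8;page=21,the%20data:1234,5678');
console.log(parsed.mediaType);                       // text/plain;charset=UTF-8;page=21
console.log(new TextDecoder().decode(parsed.bytes)); // the data:1234,5678
```

Returning the data as bytes instead of a string keeps the sketch usable for both text and binary content; the caller decides how to interpret the octets based on the media type.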
 
===SVG===
[[File:35_mm_angle_of_view_vs_focal_length.svg|thumb|link={{filepath:35_mm_angle_of_view_vs_focal_length.svg}}|Example of an SVG image with embedded JPEG images]]
A [[Scalable Vector Graphic]] image containing an embedded JPEG image encoded in Base64:
 
<syntaxhighlight lang="xml">
<svg>
<image width="64" height="24" href="data:image/jpeg;base64,
/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDADIiJSwlHzIsKSw4NTI7S31RS0VFS5ltc1p9tZ++u7Kf
r6zI4f/zyNT/16yv+v/9////////wfD/////////////2wBDATU4OEtCS5NRUZP/zq/O////////
////////////////////////////////////////////////////////////wAARCAAYAEADAREA
AhEBAxEB/8QAGQAAAgMBAAAAAAAAAAAAAAAAAQMAAgQF/8QAJRABAAIBBAEEAgMAAAAAAAAAAQIR
AAMSITEEEyJBgTORUWFx/8QAFAEBAAAAAAAAAAAAAAAAAAAAAP/EABQRAQAAAAAAAAAAAAAAAAAA
AAD/2gAMAwEAAhEDEQA/AOgM52xQDrjvAV5Xv0vfKUALlTQfeBm0HThMNHXkL0Lw/swN5qgA8yT4
MCS1OEOJV8mBz9Z05yfW8iSx7p4j+jA1aD6Wj7ZMzstsfvAas4UyRHvjrAkC9KhpLMClQntlqFc2
X1gUj4viwVObKrddH9YDoHvuujAEuNV+bLwFS8XxdSr+Cq3Vf+4F5RgQl6ZR2p1eAzU/HX80YBYy
JLCuexwJCO2O1bwCRidAfWBSctswbI12GAJT3yiwFR7+MBjGK2g/WAJR3FdF84E2rK5VR0YH/9k="/>
</svg>
</syntaxhighlight>
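
The Base64 payload of an embedded image such as the one above is derived from the raw bytes of the file. As a rough sketch, a data URI could be assembled as shown below. The helper name <code>toBase64DataUri</code> is hypothetical, the global <code>btoa</code> function of browsers and recent Node.js versions is assumed, and the eight-byte PNG file signature serves only as sample input.

```javascript
// Hypothetical helper (not part of any standard API): builds a Base64 data URI
// from a media type string and a Uint8Array holding the raw bytes.
function toBase64DataUri(mediaType, bytes) {
  // btoa expects a "binary string" in which every character holds one byte.
  let binary = '';
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return 'data:' + mediaType + ';base64,' + btoa(binary);
}

// Sample input: the eight-byte signature that starts every PNG file.
const pngSignature = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
const uri = toBase64DataUri('image/png', pngSignature);
console.log(uri); // data:image/png;base64,iVBORw0KGgo=
```

The eight input bytes become twelve Base64 characters here because of padding; for longer input the size increase approaches one third.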
 
==Malware and phishing==
The data URI can be utilized to construct attack pages that attempt to obtain usernames and passwords from unsuspecting web users. It can also be used to get around [[cross-site scripting]] (XSS) restrictions, embedding the attack payload fully inside the address bar, and hosted via URL shortening services rather than needing a full website that is controlled by a third party.<ref>Phishing without a webpage – researcher reveals how a link itself can be malicious, Naked Security by Sophos, 31 AUG 2012 https://nakedsecurity.sophos.com/2012/08/31/phishing-without-a-webpage-researcher-reveals-how-a-link-itself-can-be-malicious/ {{Webarchive|url=https://web.archive.org/web/20160416153147/https://nakedsecurity.sophos.com/2012/08/31/phishing-without-a-webpage-researcher-reveals-how-a-link-itself-can-be-malicious/ |date=2016-04-16 }}</ref> As a result, some browsers now block webpages from navigating to data URIs.<ref>{{cite web|title=Data URLs - HTTP &#124; MDN|url=https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URIs#Common_problems|website=MDN Web Docs|publisher=Mozilla|access-date=11 May 2018}}</ref>
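
Applications that accept links from untrusted sources can apply a similar policy themselves. The sketch below is an illustration, not a complete defence. The function name <code>isDataUri</code> is hypothetical, and the normalisation it performs (ignoring leading whitespace and control characters, embedded tabs and newlines, and letter case in the scheme) is an assumption meant to mirror how browsers' URL parsers treat such input.

```javascript
// Hypothetical check: does this link text resolve to a data URI?
function isDataUri(href) {
  const normalized = href
    .replace(/[\t\n\r]/g, '')          // tabs and newlines anywhere are dropped
    .replace(/^[\u0000-\u0020]+/, ''); // leading control characters and spaces are dropped
  return /^data:/i.test(normalized);   // the scheme name is case-insensitive
}

// Hypothetical usage: keep only the links that are not data URIs.
const links = [
  'https://example.org/page',
  ' DATA:text/html,<h1>Sign in</h1>',
  '/data:report',
];
const allowed = links.filter((href) => !isDataUri(href));
console.log(allowed);
```

With this input only the first and third links are kept; the third is a relative path that merely contains the text <code>data:</code>.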
 
==See also==
*[[URL]]

==References==
{{reflist}}

==External links==
*[http://www.mozilla.org/quality/networking/docs/aboutdata.html About data: URLs and the mozilla implementation]
*[http://www.mozilla.org/quality/networking/testing/datatests.html data: URL tests]
*[http://software.hixie.ch/utilities/cgi/data/data The data: URI kitchen]
*[http://www.scalora.org/projects/uriencoder/ Convert files to data: URI HTML or JavaScript source code]

{{URI scheme}}
{{Web browsers}}
 
{{DEFAULTSORT:Data Uri Scheme}}
[[Category:URI schemes]]
[[Category:Internet Standards]]