Query string: Difference between revisions

m Tracking: an -> a
 
(584 intermediate revisions by more than 100 users not shown)
Line 1:
{{Short description|Part of a URL that assigns values to specified parameters}}
In the [[World Wide Web]], a '''query string''' is the part of a [[Uniform Resource Locator|URL]] that contains data to be passed to [[Common Gateway Interface|CGI]] programs.
 
A '''query string''' is a part of a uniform resource locator ([[URL]]) that assigns values to specified parameters. A query string commonly includes fields added to a base URL by a Web browser or other client application, for example as part of an HTML document, choosing the appearance of a page, or jumping to positions in multimedia content.
[[Image:Url.png|frame|The [[Mozilla]] URL location bar showing a URL with the query string <code>title=Main_page&action=raw</code>]]
 
[[File:Query string.png|frame|center|An [[address bar]] on [[Google Chrome]] showing a URL (Uniform Resource Locator) with the query string <code>?title=Query_string&action=edit</code>]]
When a [[web page]] is requested via the [[HyperText Transfer Protocol]], the server locates a file in its [[file system]] based on the requested [[Uniform Resource Locator|URL]]. This file may be a regular file or a program. In the second case, the server may (depending on its configuration) run the program, sending its output as the required page. The query string is a part of the URL which is passed to the program. This way, the URL can encode some data that is accessible to the program generating the web page.
 
A web server can handle a [[Hypertext Transfer Protocol]] (HTTP) request either by reading a file from its [[file system]] based on the [[URL]] path or by handling the request using logic that is specific to the type of resource. In cases where special logic is invoked, the query string will be available to that logic for use in its processing, along with the path component of the URL.
==Structure==
The URLs of documents to be generated by programs may contain a query string that is passed to the program. A typical URL of this kind is as follows:
 
== Structure ==
:<code><nowiki>http://server/path/program?query_string</nowiki></code>
A typical URL containing a query string is as follows:{{quote|1=<code><nowiki>https://example.com/over/there?name=ferret</nowiki></code>}}
 
When a server receives a request for such a page, it may run a program, passing the query string, which in this case is <code>name=ferret</code>, unchanged to the program. The question mark is used as a separator, and is not part of the query string.<ref>{{cite web
| url = http://tools.ietf.org/html/rfc3986#section-3
| title = RFC 3986
| author = T. Berners-Lee
| author2 = R. Fielding
| author3 = L. Masinter
| date = January 2005
| at = "Syntax Components" (section 3)}}</ref><ref>{{cite web
| url = http://tools.ietf.org/html/rfc3986#section-3.4
| title = RFC 3986
| author = T. Berners-Lee
| author2 = R. Fielding
| author3 = L. Masinter
| date = January 2005
| at = "Query" (section 3.4)}}</ref>
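
As a rough illustration of the split described above, the following Python sketch uses the standard library's <code>urllib.parse.urlsplit</code> to separate the path from the query component. The URL is the example used in this section; the variable names are illustrative only.

```python
from urllib.parse import urlsplit

# Example URL taken from the text above.
url = "https://example.com/over/there?name=ferret"

parts = urlsplit(url)

# The "?" separator itself is not part of the query component.
print(parts.path)   # /over/there
print(parts.query)  # name=ferret
```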
 
[[Web Framework|Web frameworks]] may provide methods for parsing multiple parameters in the query string, separated by some delimiter.<ref name="w3c-recom" /> In the example URL below, multiple query parameters are separated by the [[ampersand]], "<code>&amp;</code>":
A link in a Web page may have a URL that contains a query string. However, query strings were introduced with the aim of passing the content of a [[Web form]] to a program. In particular, when a form containing the fields <code>field<sub>1</sub></code>, <code>field<sub>2</sub></code>, <code>field<sub>3</sub></code> is submitted, the content of the fields is encoded as a query string as follows:
 
{{quote|1=<code><nowiki>https://example.com/path/to/page?name=ferret&amp;color=purple</nowiki></code>}}
:<code>field<sub>1</sub>=value<sub>1</sub>&field<sub>2</sub>=value<sub>2</sub>&field<sub>3</sub>=value<sub>3</sub>...</code>
 
The exact structure of the query string is not standardized. Methods used to parse the query string may differ between websites.
* The query string is composed of a series of field=value pairs.
* Within each pair, the field and the value are separated by an [[equal sign]].
* The series of pairs is separated by the [[ampersand]], '&' (also by ';' in the newer [[W3C]] recommendations [http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2])
 
A link in a web page may have a URL that contains a query string. [[HTML]] defines three ways that a user agent can generate the query string:
* an [[form (HTML)|HTML form]] via the {{tag|form}} element
* a [[Image map#Server-side|server-side image map]] via the {{code|ismap}} attribute on the {{tag|img|open}} element with an {{tag|img|open|params=ismap}} construction
* an indexed search via the now deprecated {{tag|isindex|open}} element
 
=== Web forms ===
Technically, the form content is encoded as a query string when the form submission method is GET. The same encoding is used by default when the submission method is POST, but the result is not sent as a query string, that is, is not added to the action URL of the form. Rather, the string is sent as the body of the request.
One of the original uses was to contain the content of an [[form (HTML)|HTML form]], also known as a web form. In particular, when a form containing the fields <code>field1</code>, <code>field2</code>, <code>field3</code> is submitted, the content of the fields is encoded as a query string as follows:
{{quote|1=<code>field1=value1&amp;field2=value2&amp;field3=value3...</code>}}
* The query string is composed of a series of field-value pairs.
* Within each pair, the field name and value are separated by an [[equals sign]], "<code>=</code>".
* The series of pairs is separated by the [[ampersand]], "<code>&amp;</code>" ([[semicolons]] "<code>;</code>" are not recommended by the [[W3C]] anymore, see below).
 
While there is no definitive standard, most [[web framework]]s allow multiple values to be associated with a single field (e.g. <code>field1=value1&field1=value2&field2=value3</code>).<ref>{{cite web|url=https://docs.oracle.com/javaee/6/api/javax/servlet/ServletRequest.html#getParameterValues(java.lang.String) |title=ServletRequest (Java EE 6 )|website=docs.oracle.com |date=2011-02-10|access-date=2013-09-08}}</ref><ref>{{cite web|url=https://stackoverflow.com/questions/1746507/authoritative-position-of-duplicate-http-get-query-keys |title=uri – Authoritative position of duplicate HTTP GET query keys|website=Stack Overflow |date=2013-06-09|access-date=2013-09-08}}</ref>
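
One possible way to see how a framework might expose repeated fields is Python's standard <code>urllib.parse</code> module, sketched below. This is only one parser's behaviour, not a standard; other frameworks may keep just the first or the last value.

```python
from urllib.parse import parse_qs, parse_qsl

query = "field1=value1&field1=value2&field2=value3"

# parse_qs groups repeated field names into lists of values.
grouped = parse_qs(query)
print(grouped)  # {'field1': ['value1', 'value2'], 'field2': ['value3']}

# parse_qsl keeps the pairs as a flat list in their original order.
pairs = parse_qsl(query)
print(pairs)  # [('field1', 'value1'), ('field1', 'value2'), ('field2', 'value3')]
```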
==URL encoding==
Some characters cannot be part of a URL (for example, the space) and some other characters have a special meaning in a URL: for example, the character <code>#</code> is used to locate a point within a page; the character <code>=</code> is used to separate a name from a value. A query string may need to be converted to satisfy these constraints. This can be done using a scheme known as [[URL encoding]].
 
For each [[Field (computer science)|field]] of the form, the query string contains a pair <code><var>field</var>=<var>value</var></code>. Web forms may include fields that are not visible to the user; these fields are included in the query string when the form is submitted.
In particular, [[Request for Comments|RFC]] 1738 specifies that &ldquo;only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL&rdquo;. All characters in a query string can be replaced by their hexadecimal value preceded by the symbol <code>%</code>. For example, the equal sign can be replaced by <code>%3D</code>. For the characters that are forbidden in a query string, this replacement is not only possible but necessary.
 
This convention is a [[W3C]] recommendation.<ref name="w3c-recom">[https://www.w3.org/TR/REC-html40/interact/forms.html#form-content-type Forms in HTML documents]. W3.org. Retrieved on 2013-09-08.</ref> In the recommendations of 1999, W3C recommended that all web servers support [[semicolon]] separators in addition to [[ampersand]] separators<ref>[http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2 Performance, Implementation, and Design Notes]. W3.org. Retrieved on 2013-09-08.</ref> to allow [[application/x-www-form-urlencoded]] query strings in URLs within HTML documents without having to entity escape ampersands. Since 2014, the W3C has recommended using only the [[ampersand]] as the query separator.<ref name="w3c-recom-2014">{{cite web | url=https://www.w3.org/TR/2014/REC-html5-20141028/forms.html#url-encoded-form-data | title=4.10 Forms — HTML5 }}</ref>
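
The practical effect of the separator choice can be sketched with Python's <code>urllib.parse.parse_qsl</code>. The sketch assumes a Python version in which this function accepts a <code>separator</code> argument and splits only on the ampersand by default; the sample string <code>a=1;b=2</code> is made up for illustration.

```python
from urllib.parse import parse_qsl

sample = "a=1;b=2"  # hypothetical query string using ";" between pairs

# With the default "&" separator, the ";" is treated as part of the value.
default_pairs = parse_qsl(sample)
print(default_pairs)    # [('a', '1;b=2')]

# Opting in to the older semicolon convention explicitly.
semicolon_pairs = parse_qsl(sample, separator=";")
print(semicolon_pairs)  # [('a', '1'), ('b', '2')]
```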
The space character can also be represented by <code>+</code>.
 
The form content is only encoded in the URL's query string when the form submission method is [[GET (HTTP)|GET]]. The same encoding is used by default when the submission method is [[POST (HTTP)|POST]], but the result is submitted as the [[HTTP request]] body rather than being included in a modified URL.<ref name="html5" />
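
The difference between the two submission methods can be sketched in Python by building (but not sending) two request objects with the standard library. The URL and field values reuse the examples from this article; nothing here contacts a server.

```python
from urllib.parse import urlencode
from urllib.request import Request

fields = {"name": "ferret", "color": "purple"}
encoded = urlencode(fields)  # name=ferret&color=purple

# GET: the encoded fields become the URL's query string.
get_req = Request("https://example.com/path/to/page?" + encoded)

# POST: the same encoding is carried in the request body instead.
post_req = Request("https://example.com/path/to/page", data=encoded.encode("ascii"))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```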
==RFC==
As defined in RFC 1738, a URL with the scheme <code>http</code> can contain a ''searchpart'' following the rest of the URL and separated from it by a <code>?</code> character. RFC 3986 specifies that the ''query component'' of a [[Uniform Resource Identifier|URI]] is the part between the <code>?</code> and the end of the URI or the character <code>#</code>. The term ''query string'' is commonly used to refer to this part in the case of HTTP URLs.
 
=== Indexed search ===
==Example==
Before [[form (HTML)|forms]] were added to HTML, browsers rendered the {{tag|isindex|open}} element as a single-line text-input control. The text entered into this control was sent to the server as a query string addition to a [[GET (HTTP)|GET]] request for the base URL or another URL specified by the {{code|action}} attribute.<ref>{{cite web |title=&lt;isindex&gt; |url=https://developer.mozilla.org/en-US/docs/Web/HTML/Element/isindex |website=HTML (HyperText Markup Language) |access-date=2015-11-21 |archive-date=2017-10-19 |archive-url=https://web.archive.org/web/20171019030835/https://developer.mozilla.org/en-US/docs/Web/HTML/Element/isindex |url-status=dead }}</ref> This was intended to allow web servers to use the provided text as query criteria so they could return a list of matching pages.<ref>{{cite web |title=HTML/Elements/isindex |url=https://www.w3.org/wiki/HTML/Elements/isindex |website=W3C Wiki |access-date=2020-03-20 |archive-date=2021-06-22 |archive-url=https://web.archive.org/web/20210622024419/https://www.w3.org/wiki/HTML/Elements/isindex |url-status=dead }}</ref>
If a form is embedded in an [[HTML]] page as follows:
<form action=cgi-bin/test.cgi method=get>
<input type=text name=first>
<input type=text name=second>
<input type=submit>
</form>
 
When the text input into the indexed search control is submitted, it is encoded as a query string as follows:
and the user inserts the strings &ldquo;this is a field&rdquo; and &ldquo;was it clear (already)?&rdquo; in the two [[textfield]]s and presses the submit button, the program <code>test.cgi</code> will receive the following query string:
{{quote|1=<code>argument1+argument2+argument3...</code>}}
first=this+is+a+field&second=was+it+clear+%28already%29%3F
* The query string is composed of a series of arguments by parsing the text into words at the spaces.
* The series is separated by the [[plus sign]], '<code>+</code>'.
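
The word-splitting rule in the list above can be sketched in Python as follows. The helper name <code>isindex_query</code> is hypothetical, and percent-encoding each word is an assumption made for the sketch; the text does not say how historical browsers escaped individual words.

```python
from urllib.parse import quote

def isindex_query(text: str) -> str:
    """Build an indexed-search style query string (hypothetical helper)."""
    # Split the input at whitespace, percent-encode each word (an assumption),
    # then join the resulting arguments with "+".
    return "+".join(quote(word, safe="") for word in text.split())

print(isindex_query("black footed ferret"))  # black+footed+ferret
```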
 
Though the {{tag|isindex|open}} element is deprecated and most browsers no longer support or render it, there are still some vestiges of indexed search in existence. For example, this is the source of the special handling of [[plus sign]], '<code>+</code>' within browser URL percent encoding (which today, with the deprecation of indexed search, is all but redundant with <code>%20</code>). Also some web servers supporting [[Common Gateway Interface|CGI]] (e.g., [[Apache HTTP Server|Apache]]) will process the query string into command line arguments if it does not contain an [[equals sign]], '<code>=</code>' (as per section 4.4 of CGI 1.1). Some CGI scripts still depend on and use this historic behavior for URLs embedded in HTML.
In [[UNIX]]-based [[web server]]s, the program receives the query string as an [[environment variable]] named <code>QUERY_STRING</code>.
 
== URL encoding ==
{{Main|Percent-encoding}}
A program receiving a query string can ignore part or all of it. If the requested URL corresponds to a file and not to a program, the whole query string is ignored. However, regardless of whether the query string is used or not, the whole URL including it is stored in the server [[log file]]s.
 
Some [[Character (computing)|characters]] cannot be part of a URL (for example, the space) and some other characters have a special meaning in a URL: for example, the character <code>#</code> can be used to further specify a subsection (or [[Fragment identifier|fragment]]) of a document. In HTML forms, the character <code>=</code> is used to separate a name from a value. The URI generic syntax uses [[Percent-encoding#Percent-encoding reserved characters|URL encoding]] to deal with this problem, while HTML forms make some additional substitutions rather than applying percent encoding for all such characters. SPACE is encoded as '<code>+</code>' or "<code>%20</code>".<ref name="w3schools" />
These facts allow query strings to be used to track users in a manner similar to that provided by [[HTTP cookie]]s. For this to work, every time the user downloads a page, a unique identifier is chosen and added as a query string to the URLs of all links the page contains. As soon as the user follows one of these links, the corresponding URL is requested from the server. This way, the download of this page is linked with the previous one.
 
[[HTML 5]] specifies the following transformation for submitting HTML forms with the "GET" method to a web server. The following is a brief summary of the algorithm:
* Characters that cannot be converted to the correct charset are replaced with HTML [[numeric character reference]]s<ref name="html5 urlencoded" />
* SPACE is encoded as '<code>+</code>' or '<code>%20</code>'
* Letters (<code>A</code>–<code>Z</code> and <code>a</code>–<code>z</code>), numbers (<code>0</code>–<code>9</code>) and the characters '<code>~</code>','<code>-</code>','<code>.</code>' and '<code>_</code>' are left as-is
* <code>+</code> is encoded as <code>%2B</code>
* All other characters are encoded as a <code>%HH</code> [[hexadecimal]] representation with any non-ASCII characters first encoded as UTF-8 (or other specified encoding)
 
The octet corresponding to the tilde ("<code>~</code>") is permitted in query strings by RFC 3986 but required to be percent-encoded in HTML forms to "<code>%7E</code>".

The encoding of SPACE as '<code>+</code>' and the selection of "as-is" characters distinguish this encoding from RFC 3986.
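
A minimal sketch of these substitutions, using Python's <code>urllib.parse</code> functions as a stand-in for a browser's form encoder. This is an approximation rather than the HTML algorithm itself: Python leaves the tilde unencoded, following RFC 3986, so it does not reproduce the <code>%7E</code> rule for forms. The sample strings are made up.

```python
from urllib.parse import quote, quote_plus, unquote_plus

print(quote_plus("a b"))        # a+b        form style: SPACE becomes "+"
print(quote("a b", safe=""))    # a%20b      generic percent-encoding
print(quote_plus("1+1=2"))      # 1%2B1%3D2  "+" and "=" are escaped as %HH
print(quote_plus("café"))       # caf%C3%A9  UTF-8 bytes, then %HH
print(unquote_plus("a+b%2Bc"))  # a b+c      decoding reverses both steps
```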
 
== Example ==
If a [[Form (web)|form]] is embedded in an [[HTML]] page as follows:
<syntaxhighlight lang="html">
<form action="/cgi-bin/test.cgi" method="get">
<input type="text" name="first" />
<input type="text" name="second" />
<input type="submit" />
</form>
</syntaxhighlight>
 
and the user inserts the strings "this is a field" and "was it clear (already)?" in the two [[Text box|text fields]] and presses the submit button, the program <code>test.cgi</code> (the program specified by the <code>action</code> [[HTML attribute|attribute]] of the <code>form</code> [[HTML element|element]] in the above example) will receive the following query string:
<code>first=this+is+a+field&amp;second=was+it+clear+%28already%29%3F</code>.
 
If the form is processed on the [[web server|server]] by a [[Common Gateway Interface|CGI]] [[Scripting language|script]], the script may typically receive the query string as an [[environment variable]] named <code>QUERY_STRING</code>.
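
The example above can be reproduced with a short Python sketch. It builds the query string with <code>urllib.parse.urlencode</code> and then reads it back the way a CGI script might; the <code>QUERY_STRING</code> variable is set by hand here to simulate the server, which is an assumption of the sketch.

```python
import os
from urllib.parse import parse_qs, urlencode

# What a browser would generate for the form above.
query = urlencode({"first": "this is a field", "second": "was it clear (already)?"})
print(query)  # first=this+is+a+field&second=was+it+clear+%28already%29%3F

# A CGI program would normally find this value in QUERY_STRING;
# here the variable is set manually to stand in for the web server.
os.environ["QUERY_STRING"] = query
fields = parse_qs(os.environ.get("QUERY_STRING", ""))
print(fields["first"])   # ['this is a field']
print(fields["second"])  # ['was it clear (already)?']
```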
 
== Tracking ==
A program receiving a query string can ignore part or all of it. If the requested URL corresponds to a file and not to a program, the whole query string is ignored. However, regardless of whether the query string is used or not, the whole URL including it is stored in the server [[computer data logging|log files]].
 
These facts allow query strings to be used to track users in a manner similar to that provided by [[HTTP cookie]]s. For this to work, every time the user downloads a page, a unique identifier must be chosen and added as a query string to the URLs of all links the page contains. As soon as the user follows one of these links, the corresponding URL is requested from the server. This way, the download of this page is linked with the previous one.
 
For example, when a web page containing the following is requested:
<syntaxhighlight lang="html">
<a href="foo.html">see my page!</a>
<a href="bar.html">mine is better</a>
</syntaxhighlight>
 
a unique string, such as <code>e0a72cb2a2c7</code>, is chosen, and the page is modified as follows:
<syntaxhighlight lang="html">
<a href="foo.html?e0a72cb2a2c7">see my page!</a>
<a href="bar.html?e0a72cb2a2c7">mine is better</a>
</syntaxhighlight>
 
The addition of the query string does not change the way the page is shown to the user. When the user follows, for example, the first link, the browser requests the page <code>foo.html?e0a72cb2a2c7</code> from the server, which ignores what follows <code>?</code> and sends the page <code>foo.html</code> as expected, adding the query string to its links as well.

This way, any subsequent page request from this user will carry the same query string <code>e0a72cb2a2c7</code>, making it possible to establish that all these pages have been viewed by the same user. Query strings are often used in association with [[web beacon]]s.
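
The link rewriting described above could be sketched in Python roughly as follows. The helper name <code>tag_links</code>, the regular expression, and the sample markup are illustrative assumptions; a real server would more likely use an HTML parser than a regular expression.

```python
import re

def tag_links(html: str, token: str) -> str:
    """Append token as a query string to every href that has none (hypothetical helper)."""
    # Match href values that do not yet contain a "?" or a "#".
    pattern = re.compile(r'href="([^"?#]+)"')
    return pattern.sub(lambda m: f'href="{m.group(1)}?{token}"', html)

page = '<a href="foo.html">see my page!</a>\n<a href="bar.html">mine is better</a>'
tagged = tag_links(page, "e0a72cb2a2c7")
print(tagged)
```

Every link in the returned markup carries the same token, so a later request for any of them can be associated with the page that contained it.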
 
The main differences between query strings used for tracking and HTTP cookies are that:
# Query strings form part of the URL, and are therefore included if the user saves or sends the URL to another user; cookies can be maintained across browsing sessions, but are not saved or sent with the URL.
# If the user arrives at the same web server by two (or more) independent paths, the user will be assigned two different query strings, while the stored cookies are the same.
# The user can disable cookies, in which case using cookies for tracking does not work. However, using query strings for tracking should work in all situations.
# Different query strings passed by different visits to the page will mean that the pages are never served from the browser (or proxy, if present) cache, thereby increasing the load on the web server and slowing down the user experience.
 
== Compatibility issues ==
==See also==
According to the [[Hypertext Transfer Protocol|HTTP]] specification:
<blockquote>Various ad hoc limitations on request-line length are found in practice. It is RECOMMENDED that all HTTP senders and recipients support, at a minimum, request-line lengths of 8000 octets.<ref>[https://tools.ietf.org/html/rfc7230#section-3.1.1 HTTP/1.1 Message Syntax and Routing]. ietf.org. Retrieved on 2014-07-31.</ref></blockquote>
* [[Common Gateway Interface]]
 
If the URL is too long, the web server fails with the [[List of HTTP status codes#414|414 Request-URI Too Long]] HTTP status code.
 
The common workaround for these problems is to use [[POST (HTTP)|POST]] instead of [[GET (HTTP)|GET]] and store the parameters in the request body. The length limits on request bodies are typically much higher than those on URL length. For example, the limit on POST size, by default, is 2&nbsp;MB on IIS 4.0 and 128&nbsp;KB on IIS 5.0. The limit is configurable on Apache2 using the <code>LimitRequestBody</code> directive, which specifies the number of bytes from 0 (meaning unlimited) to 2147483647 (2&nbsp;GB) that are allowed in a request body.<ref>[https://httpd.apache.org/docs/2.2/mod/core.html#limitrequestbody core – Apache HTTP Server]. Httpd.apache.org. Retrieved on 2013-09-08.</ref>
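
A client-side version of this workaround might look like the following Python sketch. The helper name <code>choose_method</code> is hypothetical, and treating the 8000-octet recommendation as a cut-off is a conservative assumption; actual server limits vary and may be higher or lower.

```python
from urllib.parse import urlencode

RECOMMENDED_MIN = 8000  # octets; the minimum request-line length quoted above

def choose_method(path: str, fields: dict) -> str:
    """Return "GET" if the request line fits in RECOMMENDED_MIN octets, else "POST"."""
    # urlencode output is plain ASCII, so each character is one octet.
    target = path + "?" + urlencode(fields)
    request_line = f"GET {target} HTTP/1.1"
    return "GET" if len(request_line.encode("ascii")) <= RECOMMENDED_MIN else "POST"

print(choose_method("/search", {"q": "ferret"}))     # GET
print(choose_method("/search", {"q": "x" * 10000}))  # POST
```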
 
== See also ==
{{div-col}}
* [[Clean URL]]
* [[DoubleClick Click Identifier|Click identifier]]
* [[Common Gateway Interface]] (CGI)
* [[HTTP cookie]]
* [[HyperText Transfer Protocol]] (HTTP)
* [[Semantic URL]]s
* [[URI fragment]]
* [[URI normalization]]
* [[URI scheme]]
* [[UTM parameters]]
* [[Web beacon]]
{{div-col-end}}
 
== References ==
{{Reflist | refs=
* RFC 1738
<ref name="html5">[https://www.w3.org/TR/html52/sec-forms.html#form-submission-algorithm], HTML5.2, W3C recommendation, 14 December 2017</ref>
* RFC 3986
<ref name="w3schools">{{cite web|title=HTML URL Encoding Reference|url=https://www.w3schools.com/tags/ref_urlencode.asp|publisher=W3Schools|access-date=May 1, 2013}}</ref>
 
<ref name="html5 urlencoded">The [https://www.w3.org/TR/html52/sec-forms.html#application-x-www-form-urlencoded-encoding-algorithm ''application/x-www-form-urlencoded'' encoding algorithm], HTML5.2, W3C recommendation, 14 December 2017</ref>
[[Category:World Wide Web]]
}}
 
[[Category:URL]]
[[es:Query string]]
[[Category:String (computer science)]]
[[nl:Querystring]]