Fixed table. →Character and entity references
m Replace em-dash with en-dash.
(33 intermediate revisions by 19 users not shown)
Line 1:
{{Short description|
{{Redirect2|.htm|.html||HTM (disambiguation){{!}}HTM}}
{{pp-vandalism|small=yes}}
{{Infobox file format
| name = HTML
| icon =
| icon_size =
| _noextcode = on
| extension = {{unbulleted list|<code>.html</code>|<code>.htm</code>}}
Line 27:
}}
{{HTML}}
'''Hypertext Markup Language''' ('''HTML''') is the standard [[markup language]]{{efn|Even though HTML can be run in a browser, it is not viewed as a [[programming language]] in programming language discourse.<ref>{{Cite book |author-link=Felienne Hermans|last1=Hermans |first1=Felienne |last2=Schlesinger |first2=Ari |title=Proceedings of the 2024 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software |chapter=A Case for Feminism in Programming Language Design |date=2024-10-17 |language=en |publisher=ACM |pages=205–222 |doi=10.1145/3689492.3689809 |isbn=979-8-4007-1215-9}}</ref>}} for documents designed to be displayed in a [[web browser]]. It defines the content and structure of [[web content]]. It is often assisted by technologies such as [[Cascading Style Sheets]] (CSS) and [[scripting language]]s such as [[JavaScript]].
[[Web browser]]s receive HTML documents from a [[web server]] or from local storage and [[browser engine|render]] the documents into multimedia web pages. HTML describes the structure of a [[web page]] [[Semantic Web|semantically]] and originally included cues for its appearance.
[[HTML element]]s are the building blocks of HTML pages. With HTML constructs, [[HTML element#Images and objects|images]] and other objects such as [[Fieldset|interactive forms]] may be embedded into the rendered page. HTML provides a means to create [[structured document]]s by denoting structural [[semantics]] for text such as headings, paragraphs, lists, [[Hyperlink|links]], quotes, and other items. HTML elements are delineated by ''tags'', written using [[Bracket#Angle brackets|angle brackets]]. Tags such as {{code|lang=html|code=<img>}} and {{code|lang=html|<input>}} directly introduce content into the page. Other tags such as {{code|lang=html|code=<p>}} and {{code|lang=html|code=</p>}} surround and provide information about document text and may include sub-element tags. [[Web browser|Browsers]] do not display the HTML tags, but use them to interpret the content of the page.
HTML can embed programs written in a [[scripting language]] such as [[JavaScript]], which affect the behavior and content of web pages. The inclusion of CSS defines the look and layout of content. The [[World Wide Web Consortium]] (W3C), former maintainer of the HTML standard and current maintainer of the CSS standards, has encouraged the use of [[CSS]] over explicit presentational HTML {{as of|1997|lc=y|since=y|post=.}}<ref name="deprecated">{{cite web|title=HTML 4.0 Specification — W3C Recommendation — Conformance: requirements and recommendations |url=https://www.w3.org/TR/REC-html40-971218/conform.html#deprecated|date=December 18, 1997|publisher=World Wide Web Consortium|url-status=live|archive-url=https://web.archive.org/web/20150705040855/http://www.w3.org/TR/REC-html40-971218/conform.html|archive-date=July 5, 2015|access-date=July 6, 2015}}</ref> A form of HTML, known as [[HTML5]], is used to display video and audio, primarily using the {{code|lang=html|<video>}} and {{code|lang=html|<audio>}} elements, and to draw scripted graphics using the {{code|lang=html|<canvas>}} element together with JavaScript.
Line 42:
In 1980, [[physicist]] [[Tim Berners-Lee]], a contractor at [[CERN]], proposed and prototyped [[ENQUIRE]], a system for CERN researchers to use and share documents. In 1989, Berners-Lee wrote a memo proposing an [[Internet]]-based [[hypertext]] system.<ref>Tim Berners-Lee, "[https://www.w3.org/History/1989/proposal.html Information Management: A Proposal]". CERN (March 1989, May 1990). W3C.</ref> Berners-Lee specified HTML and wrote the browser and server software in late 1990. That year, Berners-Lee and CERN [[data system]]s engineer [[Robert Cailliau]] collaborated on a joint request for funding, but the project was not formally adopted by CERN. In his personal notes of 1990, Berners-Lee listed "some of the many areas in which hypertext is used"; an [[encyclopedia]] is the first entry.<ref>{{cite web |url=https://www.w3.org/DesignIssues/Uses.html |title=Intended Uses |first1=Tim |last1=Berners-Lee |website=W3C}}</ref>
The first publicly available description of HTML was a document called "HTML Tags",<ref>{{cite web |title=Tags used in HTML |url=http://info.cern.ch/hypertext/WWW/MarkUp/Tags.html |website=info.cern.ch |access-date=2 March 2023 |date=October 1991}}</ref> first mentioned on the Internet by Tim Berners-Lee in late 1991.<ref name="tagshtml" /><ref>{{cite web|title=Re: status. Re: X11 BROWSER for WWW |url=http://lists.w3.org/Archives/Public/www-talk/1991SepOct/0003.html|last=Berners-Lee|first=Tim|date=October 29, 1991|publisher=World Wide Web Consortium|url-status=live|archive-url=https://web.archive.org/web/20070524045009/http://lists.w3.org:80/Archives/Public/www-talk/1991SepOct/0003.html|archive-date=May 24, 2007|access-date=April 8, 2007}}</ref> It describes 18 elements comprising the initial, relatively simple design of HTML. Except for the hyperlink tag, these were strongly influenced by [[
HTML is a [[markup language]] that [[web browser]]s use to interpret and [[Typesetting|compose]] text, images, and other material into visible or audible web pages. Default characteristics for every item of HTML markup are defined in the browser, and these characteristics can be altered or enhanced by the web page designer's additional use of [[CSS]]. Many of the text elements are mentioned in the 1988 ISO technical report TR 9537 ''Techniques for using SGML'', which describes the features of early text formatting languages such as that used by the [[TYPSET and RUNOFF|RUNOFF command]] developed in the early 1960s for the [[Compatible Time-Sharing System|CTSS]] (Compatible Time-Sharing System) operating system. These formatting commands were derived from the commands used by typesetters to manually format documents. However, the SGML concept of generalized markup is based on elements (nested annotated ranges with attributes) rather than merely print effects, with separate structure and markup. HTML has been progressively moved in this direction with CSS.
Line 65:
==== HTML 4 ====
:; December 18, 1997
:;*
:;* Transitional, in which deprecated elements are allowed
:;* Frameset, in which mostly only [[Framing (World Wide Web)|frame]]-related elements are allowed.
:
:;April 24, 1998
: HTML 4.0<ref>{{cite web |url=https://www.w3.org/TR/1998/REC-html40-19980424/|title=HTML 4.0 Specification|publisher=World Wide Web Consortium|date=April 24, 1998|access-date=November 16, 2008}}</ref> was reissued with minor edits without incrementing the version number.
:; December 24, 1999
:; May 2000
:;
==== HTML 5 ====
Line 94 ⟶ 95:
; : Although its syntax closely resembles that of [[SGML]], [[HTML5]] has abandoned any attempt to be an SGML application and has explicitly defined its own "html" serialization, in addition to an alternative XML-based XHTML5 serialization.<ref>{{cite web|url=https://www.w3.org/blog/2008/01/html5-is-html-and-xml/|title=HTML5, one vocabulary, two serializations|date=15 January 2008 |access-date=February 25, 2009}}</ref>
; 2011 HTML5 – Last Call :
; : On 14 February 2011, the W3C extended the charter of its HTML Working Group with clear milestones for HTML5. In May 2011, the working group advanced HTML5 to "Last Call", an invitation to communities inside and outside W3C to confirm the technical soundness of the specification. The W3C developed a comprehensive test suite to achieve broad interoperability for the full specification by 2014, which was the target date for recommendation.<ref name="w3c2014">{{cite web|url=https://www.w3.org/2011/02/htmlwg-pr.html|title=W3C Confirms May 2011 for HTML5 Last Call, Targets 2014 for HTML5 Standard|publisher=[[World Wide Web Consortium]]|access-date=18 February 2011|date=14 February 2011}}</ref> In January 2011, the WHATWG renamed its "HTML5" living standard to "HTML". The W3C nevertheless
; 2012 HTML5 – Candidate Recommendation :
; : In July 2012, WHATWG and [[W3C]] decided on a degree of separation. W3C will continue the HTML5 specification work, focusing on a single definitive standard, which is considered a "snapshot" by WHATWG. The WHATWG organization will continue its work with HTML5 as a "Living Standard". The concept of a living standard is that it is never complete and is always being updated and improved. New features can be added but functionality will not be removed.<ref>{{cite web|url=http://www.netmagazine.com/news/html5-gets-splits-122102|title=HTML5 gets the splits|publisher=Net magazine |first1=Craig |last1=Grannell |date=July 23, 2012 |access-date=23 July 2012 |archive-url=https://web.archive.org/web/20120725214739/http://www.netmagazine.com/news/html5-gets-splits-122102 |url-status=dead |archive-date=Jul 25, 2012 }}</ref>
Line 146 ⟶ 147:
{{Main|HTML element}}
[[File:HTML element content categories.svg|thumb|HTML element content categories]]
HTML documents imply a structure of nested [[HTML element]]s. These are indicated in the document by HTML ''tags'', enclosed in angle brackets.
In the simple, general case, the extent of an element is indicated by a pair of tags: a "start tag" {{code|lang=html|code=<p>}} and "end tag" {{code|lang=html|code=</p>}}. The text content of the element, if any, is placed between these tags.
Line 154 ⟶ 155:
The start tag may also include the element's ''attributes''. These carry other information, such as identifiers for sections within the document, identifiers used to bind style information to the presentation of the document, and, for some tags such as the {{code|lang=html|code=<img>}} used to embed images, a reference to the image resource, in a form like this: {{code|lang=html|code=<img src="example.com/example.jpg">}}.
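As a minimal sketch of attributes in use, consider the fragment below. The <code>id</code> and <code>class</code> values, the heading text, and the image path are hypothetical placeholders, not taken from any real page.

<syntaxhighlight lang="html">
<!-- id names this heading so that links can target it;
     class lets style rules be bound to it -->
<h2 id="history" class="section-title">History</h2>

<!-- src refers to the image resource; alt supplies a text alternative -->
<img src="images/example.jpg" alt="An example image">
</syntaxhighlight>

A link such as {{code|lang=html|code=<a href="#history">}} elsewhere in the same document would then point at this heading.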
Some elements, such as the [[line breaking character|line break]] {{code|lang=html|code=<br
Many tags, particularly the end tag for the very commonly used paragraph element {{code|lang=html|code=<p>}}, are optional. An HTML browser or other agent can infer where an element ends from the context and the structural rules defined by the HTML standard. These rules are complex and not widely understood by HTML authors.
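To illustrate tag omission, the two fragments below use hypothetical content and should parse to the same element structure, differences in whitespace aside. In the first, each paragraph's end tag is omitted and is inferred from the next start tag or from the end of the parent element.

<syntaxhighlight lang="html">
<div>
  <p>First paragraph
  <p>Second paragraph
</div>
</syntaxhighlight>

The second fragment writes the optional end tags out explicitly:

<syntaxhighlight lang="html">
<div>
  <p>First paragraph</p>
  <p>Second paragraph</p>
</div>
</syntaxhighlight>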
The general form of an HTML element is therefore: {{code|lang=html|code=<tag attribute1="value1" attribute2="value2">''content''</tag>}}. Some HTML elements are defined as ''empty elements'' and take the form {{code|lang=html|code=<tag attribute1="value1" attribute2="value2">}}. Empty elements may enclose no content, for instance, the {{code|lang=html|code=<br
The name of an HTML element is the name used in the tags.
The end tag's name is preceded by a slash character.
==== Element examples ====
Line 199:
===== Line breaks =====
{{code|lang=html|code=<br
<syntaxhighlight lang="html"><p>This <br> is a paragraph <br> with <br> line breaks</p></syntaxhighlight>
Line 208:
===== Inputs =====
There are many possible ways a user can give
<input type="text"> <!-- This is for text input -->
<input type="file"> <!-- This is for uploading files -->
Line 249:
Escaping also allows for characters that are not easily typed, or that are not available in the document's [[character encoding]], to be represented within the element and attribute content. For example, the acute-accented <code>e</code> (<code>é</code>), a character typically found only on Western European and South American keyboards, can be written in any HTML document as the entity reference <code>&eacute;</code> or as the numeric references <code>&#xE9;</code> or <code>&#233;</code>, using characters that are available on all keyboards and are supported in all character encodings. [[Unicode]] character encodings such as [[UTF-8]] are compatible with all modern browsers and allow direct access to almost all the characters of the world's writing systems.<ref>{{cite web|title=''The Unicode Standard'': A Technical Introduction |publisher=Unicode |url=https://www.unicode.org/standard/principles.html|access-date=2010-03-16}}</ref>
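For illustration, the three references mentioned above can be placed side by side with the literal character. In a conforming browser, each paragraph below should render as the same word (the word itself is only an example).

<syntaxhighlight lang="html">
<p>caf&eacute;</p>  <!-- named entity reference -->
<p>caf&#xE9;</p>    <!-- hexadecimal numeric reference -->
<p>caf&#233;</p>    <!-- decimal numeric reference -->
<p>café</p>         <!-- literal character; needs an encoding that contains it, such as UTF-8 -->
</syntaxhighlight>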
{| class="wikitable"
|+HTML escape sequence examples
!Named
!Decimal
Line 295:
|{{Code|code= }}
|{{Code|code= }}
|
|[[Non-breaking space|Non-Breaking Space]]
|
Line 320:
|
|-
|{{Code|code=‡
|{{Code|code=‡
|{{Code|code=‡
|{{Code|code=‡}}
| [[Dagger (mark)|Double dagger]]
Line 339:
=== Document type declaration ===
HTML documents are required to start with a [[
The original purpose of the doctype was to enable the parsing and validation of HTML documents by SGML tools based on the [[
[[HTML5]] does not define a DTD; therefore, in HTML5 the doctype declaration is simpler and shorter:<ref>{{cite web |url=https://www.w3.org/TR/html/syntax.html#doctype-syntax |access-date=2013-08-19 |title=The HTML syntax |work=HTML Standard }}</ref>
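For illustration, the HTML5 doctype is written {{code|lang=html|code=<!DOCTYPE html>}}. A minimal sketch of a complete document that uses it, with placeholder title and body text, might look like this:

<syntaxhighlight lang="html">
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Example page</title>
  </head>
  <body>
    <p>Hello, world!</p>
  </body>
</html>
</syntaxhighlight>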
Line 363:
Semantic HTML is a way of writing HTML that emphasizes the meaning of the encoded information over its presentation (look). HTML has included semantic markup from its inception,<ref>{{cite book|last1=Berners-Lee|first1=Tim|last2=Fischetti|first2=Mark|title=Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its Inventor|url=https://archive.org/details/weavingweborigin00bern_0|url-access=registration|isbn=978-0-06-251587-2|publisher=Harper|location=San Francisco|year=2000}}</ref> but has also included presentational markup, such as {{code|lang=html|code=<font>}}, {{code|lang=html|code=<i>}} and {{code|lang=html|code=<center>}} tags. There are also the semantically neutral [[div and span]] tags. Since the late 1990s, when [[Cascading Style Sheets]] were beginning to work in most browsers, web authors have been encouraged to avoid the use of presentational HTML markup with a view to the [[separation of content and presentation]].<ref>{{cite web|url=https://www.w3.org/MarkUp/Guide/Style.html|title=Adding a touch of style|last=Raggett|first=Dave|year=2002|publisher=W3C|access-date=October 2, 2009}} This article notes that presentational HTML markup may be useful when targeting browsers "before Netscape 4.0 and Internet Explorer 4.0". See the [[list of web browsers]] to confirm that these were both released in 1997.</ref>
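As an illustrative contrast, the fragment below marks up the same hypothetical content twice: first with presentational elements, then with elements that describe its meaning and leave appearance to a style sheet. The text and the class name are assumptions made for the example.

<syntaxhighlight lang="html">
<!-- Presentational: states how the text should look -->
<center><font size="5">Opening hours</font></center>
<i>Closed on public holidays</i>

<!-- Semantic: states what the text is; CSS decides how it looks -->
<h2 class="notice-heading">Opening hours</h2>
<p><em>Closed on public holidays</em></p>
</syntaxhighlight>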
In a 2001 discussion of the [[Semantic Web]], [[Tim Berners-Lee]] and others gave examples of ways in which intelligent software "agents" may one day automatically crawl the web and find, filter, and correlate previously unrelated, published facts for the benefit of human users.<ref>{{cite magazine |author=Berners-Lee |first1=Tim |last2=Hendler |first2=James |last3=Lassila |first3=Ora |date=May 1, 2001 |title=The Semantic Web |url=http://www.scientificamerican.com/article.cfm?id=the-semantic-web |magazine=Scientific American |access-date=October 2, 2009}}</ref> Such agents are not commonplace even now, but some of the ideas of [[Web 2.0]], [[Mashup (web application hybrid)|mashups]] and [[Price comparison service|price comparison websites]] may be coming close{{citation needed|date=February 2025}}. The main difference between these web application hybrids and Berners-Lee's semantic agents lies in the fact that the current [[Feed aggregator|aggregation]] and hybridization of information is usually designed by [[web developer]]s, who already know the web locations and the [[Application programming interface|API semantics]] of the specific data they wish to mash, compare and combine.
An important type of web agent that does crawl and read web pages automatically, without prior knowledge of what it might find, is the [[web crawler]] or search-engine spider. These software agents depend on the semantic clarity of the web pages they find, as they use various techniques and [[algorithm]]s to read and index millions of web pages a day. They provide web users with [[Web search engine|search facilities]], without which the World Wide Web's usefulness would be greatly reduced.
Line 405:
Like HTML 4.01, XHTML 1.0 has three sub-specifications: strict, transitional, and frameset.
Aside from the different opening declarations for a document, the differences between an HTML 4.01 and XHTML 1.0 document—in each of the corresponding DTDs—are largely syntactic. The underlying syntax of HTML allows many shortcuts that XHTML does not, such as elements with optional opening or closing tags, and even empty elements which must not have an end tag. By contrast, XHTML requires all elements to have an opening tag and a closing tag. XHTML, however, also introduces a new shortcut: an XHTML tag may be opened and closed within the same tag, by including a slash before the end of the tag like this: {{code|lang=html|code=<br
To understand the subtle differences between HTML and XHTML, consider the transformation of a valid and well-formed XHTML 1.0 document that adheres to Appendix C (see below) into a valid HTML 4.01 document. Making this translation requires the following steps:
Line 413:
# If present, '''remove the XML declaration.''' (Typically this is: {{code|lang=xml|code=<?xml version="1.0" encoding="utf-8"?>}}).
# '''Ensure that the document's MIME type is set to <code>text/html</code>.''' For both HTML and XHTML, this comes from the HTTP <code>Content-Type</code> header sent by the server.
# '''Change the XML empty-element syntax to an HTML style empty element''' ({{code|lang=html|code=<br
Those are the main changes necessary to translate a document from XHTML 1.0 to HTML 4.01. Translating from HTML to XHTML would also require adding any omitted opening or closing tags. Whether coding in HTML or XHTML, it may be simplest to always include the optional tags rather than remembering which ones can be omitted.
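The steps listed above can be sketched on a small hypothetical fragment. The sketch is abridged: the doctype and the enclosing <code>html</code>, <code>head</code> and <code>body</code> elements are left out. The XHTML 1.0 input might contain:

<syntaxhighlight lang="html">
<?xml version="1.0" encoding="utf-8"?>
<p>First line<br />second line</p>
</syntaxhighlight>

After removing the XML declaration and rewriting the empty-element syntax, the HTML 4.01 counterpart would read:

<syntaxhighlight lang="html">
<p>First line<br>second line</p>
</syntaxhighlight>

The MIME type is set in the server's <code>Content-Type</code> header rather than in the markup, so that step does not appear in the fragment.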
Line 422:
* Include both <code>xml:lang</code> and <code>lang</code> attributes on any elements assigning language.
* Use the empty-element syntax only for elements specified as empty in HTML.
*
* Include explicit close tags for elements that permit content but are left empty (for example, {{code|lang=html|code=<div></div>}}, not {{code|lang=html|code=<div />}}).
* Omit the XML declaration.
Line 435:
** Inline elements and plain text are allowed directly in: <code>body</code>, <code>blockquote</code>, <code>form</code>, <code>noscript</code> and <code>noframes</code>
* '''Presentation related elements'''
** underline (<code>u</code>) (Deprecated. Visitors can mistake underlined text for a hyperlink.)
** strike-through (<code>s</code>)
** <code>center</code> (Deprecated. Use CSS instead.)
Line 476:
== WHATWG HTML versus HTML5 ==
{{Main|#Transition of HTML
The HTML Living Standard, which is developed by WHATWG, is the official version, while W3C HTML5 is no longer separate from WHATWG.
Line 483:
There are some [[WYSIWYG]] editors (''what you see is what you get''), in which the user lays out everything as it is to appear in the HTML document using a [[graphical user interface]] (GUI), often similar to [[word processor]]s. The editor renders the document rather than showing the code, so authors do not require extensive knowledge of HTML.
The WYSIWYG editing model has been criticized,<ref>Sauer, C.: WYSIWIKI – Questioning WYSIWYG in the Internet Age. In: Wikimania (2006)</ref><ref>Spiesser, J., Kitchen, L.: Optimization of HTML automatically generated by WYSIWYG programs. In: 13th International Conference on World Wide Web, pp.
WYSIWYG editors remain a controversial topic because of their perceived flaws such as:
Line 498:
* [[Comparison of HTML parsers]]
* [[Dynamic web page]]
* [[HTML Application]]
* [[HTML character references]]
* [[List of document markup languages]]
Line 507 ⟶ 508:
* [[W3C Markup Validation Service|W3C (X)HTML Validator]]
* [[Web colors]]
{{div col end}}
== Notes ==
{{Notelist}}
== References ==