HTML sanitization: Difference between revisions

Latest revision as of 10:05, 7 December 2023
{{Short description|Process of removing undesirable parts of an HTML document}}
{{More citations needed|date=December 2009}}
In [[data sanitization]], '''HTML sanitization''' is the process of examining an [[HTML]] document and producing a new HTML document that preserves only the tags and attributes designated "safe" and desired. HTML sanitization can be used to protect against attacks such as [[cross-site scripting]] (XSS) by sanitizing any HTML code submitted by a user.

== Details ==
Basic tags for changing fonts are often allowed, such as <code>&lt;b&gt;</code>, <code>&lt;i&gt;</code>, <code>&lt;u&gt;</code>, <code>&lt;em&gt;</code>, and <code>&lt;strong&gt;</code>, while more advanced tags such as <code>&lt;script&gt;</code>, <code>&lt;object&gt;</code>, <code>&lt;embed&gt;</code>, and <code>&lt;link&gt;</code> are removed by the sanitization process. Potentially dangerous [[HTML attribute|attributes]] such as <code>onclick</code> are also removed in order to prevent malicious code from being injected.

Sanitization is typically performed by using either a [[whitelist]] or a [[Blacklist (computing)|blacklist]] approach. Leaving a safe HTML element off a whitelist is not so serious; it simply means that the feature will not be included after sanitization. On the other hand, if an unsafe element is left off a blacklist, then the vulnerability will not be sanitized out of the HTML output. An out-of-date blacklist can therefore be dangerous if new, unsafe features have been introduced to the HTML Standard.

Further sanitization can be performed based on rules which specify what operation is to be performed on the subject tags. Typical operations include removing the tag itself while preserving its content, preserving only the textual content of a tag, or forcing certain values on attributes.<ref name="HtmlRuleSanitizer">{{Cite web|url=https://github.com/Vereyon/HtmlRuleSanitizer|title=HtmlRuleSanitizer|website=[[GitHub]]|date=13 August 2021}}</ref>

== Implementations ==
In [[PHP]], HTML sanitization can be performed using the <code>strip_tags()</code> function, at the risk of removing all textual content following an unclosed less-than symbol or angle bracket.<ref>{{cite web|url=http://us3.php.net/manual/en/function.strip-tags.php|title=strip_tags|publisher=PHP.NET}}</ref> The HTML Purifier library is another popular option for PHP applications.<ref>{{Cite web|url=http://htmlpurifier.org/|title=HTML Purifier - Filter your HTML the standards-compliant way!|website=htmlpurifier.org}}</ref>

In [[Java (programming language)|Java]] (and [[.NET Framework|.NET]]), sanitization can be achieved by using the [[OWASP]] Java HTML Sanitizer Project.<ref>{{Cite web|url=https://www.owasp.org/index.php/OWASP_Java_HTML_Sanitizer_Project|title=OWASP Java HTML Sanitizer}}</ref>

In [[.NET Framework|.NET]], a number of sanitizers use the Html Agility Pack, an HTML parser.<ref>{{Cite web|url=http://htmlagilitypack.codeplex.com/|title=HTML Agility Pack - Home|access-date=2013-01-04|archive-date=2013-01-01|archive-url=https://web.archive.org/web/20130101170916/http://htmlagilitypack.codeplex.com/|url-status=dead}}</ref><ref>{{Cite web|url=http://eksith.wordpress.com/2011/06/14/whitelist-santize-htmlagilitypack/|title=Whitelist santize with HtmlAgilityPack|date=14 June 2011}}</ref><ref name="HtmlRuleSanitizer" /> Another library is HtmlSanitizer.<ref>{{cite web|last1=Ganss|first1=Michael|title=HtmlSanitizer|url=https://github.com/mganss/HtmlSanitizer/|access-date=7 December 2023|date=5 December 2023}}</ref>

In [[JavaScript]] there are "JS-only" sanitizers for the [[front and back ends|back end]], and browser-based<ref>{{Cite web|url=https://github.com/jitbit/HtmlSanitizer|title=JS HTML Sanitizer|website=[[GitHub]]|date=14 October 2021}}</ref> implementations that use the browser's own [[Document Object Model]] (DOM) parser to parse the HTML (for better performance).

== See also ==
* [[Data sanitization]]

== References ==

[[Category:HTML]]
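
As an illustration of the whitelist approach described in this article, the following is a minimal sketch in Python that uses only the standard library's <code>html.parser</code> module. The tag sets and the names <code>ALLOWED_TAGS</code>, <code>DROP_CONTENT_TAGS</code>, <code>WhitelistSanitizer</code> and <code>sanitize</code> are hypothetical choices made for this example; they are not taken from any of the libraries named above. The sketch keeps a handful of formatting tags, discards every attribute, removes all other tags while preserving their text, and drops the content of <code>script</code> and <code>style</code> elements entirely.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration; real policies are usually larger.
ALLOWED_TAGS = {"b", "i", "u", "em", "strong"}
# Elements whose text content is dropped together with the tag itself.
DROP_CONTENT_TAGS = {"script", "style"}


class WhitelistSanitizer(HTMLParser):
    """Rebuilds a document from whitelisted tags and escaped text only."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and self.skip_depth == 0:
            # Every attribute (onclick, style, ...) is discarded.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Re-escape text so it cannot be reinterpreted as markup.
            self.out.append(escape(data))


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize('<b onclick="evil()">bold</b><script>alert(1)</script> text'))
```

Because the output is rebuilt from the whitelist, a tag that the policy does not know about is removed by default, which is the property that makes a whitelist more forgiving of omissions than a blacklist. The sketch does not balance unclosed tags or filter URLs, so an established, maintained sanitizer is likely to be a better fit for production use.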