{{Orphan|date=December 2009}}
{{Refimprove|date=December 2009}}
'''HTML sanitization''' is the process of examining an HTML document and producing a new HTML document that preserves only whatever tags are designated "safe". HTML sanitization can be used to protect against [[cross-site scripting|cross-site scripting (XSS)]] attacks by sanitizing any HTML code submitted by a user.
 
Basic tags for changing fonts are often allowed, such as <nowiki><b></nowiki>, <nowiki><i></nowiki>, <nowiki><u></nowiki>, <nowiki><em></nowiki>, and <nowiki><strong></nowiki>, while more advanced tags such as <nowiki><script></nowiki>, <nowiki><object></nowiki>, <nowiki><embed></nowiki>, and <nowiki><link></nowiki> are removed by the sanitization process.
 
Sanitization is typically performed using either a [[whitelist]] or a [[Blacklist (computing)|blacklist]] approach. If a safe item is left off a whitelist, the sanitizer simply produces HTML that lacks that safe element. If an unsafe item is left off a blacklist, a vulnerability will be present in the sanitized HTML output. New unsafe HTML features introduced after a blacklist has been defined cause the blacklist to become out of date.
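
The whitelist approach can be sketched in a few lines. The following is a minimal illustration only, not a production sanitizer: it uses Python's standard <code>html.parser</code> module, and the names <code>ALLOWED_TAGS</code>, <code>WhitelistSanitizer</code> and <code>sanitize</code> are hypothetical, chosen for this example. It assumes a policy in which whitelisted tags are kept without their attributes, all other tags are dropped, and the contents of <nowiki><script></nowiki> and <nowiki><style></nowiki> elements are discarded.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration; a real policy is chosen per application.
ALLOWED_TAGS = {"b", "i", "u", "em", "strong"}


class WhitelistSanitizer(HTMLParser):
    """Keeps whitelisted tags (without attributes) and escaped text; drops the rest."""

    # Elements whose text content is dropped together with the tag itself.
    DROP_CONTENT = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in self.DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and self.skip_depth == 0:
            # Attributes such as onclick are deliberately discarded.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in self.DROP_CONTENT:
            if self.skip_depth > 0:
                self.skip_depth -= 1
        elif tag in ALLOWED_TAGS and self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Text is re-escaped so it cannot be interpreted as markup.
            self.out.append(escape(data))


def sanitize(html_text):
    """Return html_text reduced to whitelisted tags and escaped text."""
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(sanitize('<b onclick="steal()">Hi</b><script>alert(1)</script>'))
# prints: <b>Hi</b>
```

Because anything not listed in <code>ALLOWED_TAGS</code> is dropped, a tag missing from the list only makes the output plainer, which mirrors the failure mode of a whitelist described above. This sketch does not balance unclosed tags or validate attributes, both of which a real sanitizer would need to handle.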
 
In [[PHP]], HTML sanitization can be performed using the <code>strip_tags()</code> or <code>htmlspecialchars()</code> functions.<ref>http://www.php.net/strip_tags</ref><ref>http://php.net/manual/en/function.htmlspecialchars.php</ref> The HTML Purifier library is another popular option for PHP applications.<ref>http://www.htmlpurifier.org</ref>
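
Removing tags and escaping them are two different strategies, and the difference can be illustrated outside PHP. The sketch below uses Python's standard library as a rough stand-in; <code>strip_all_tags</code> and <code>escape_all</code> are hypothetical helper names, and their behaviour is only an approximation of the general idea, not a reimplementation of the PHP functions.

```python
from html import escape
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects only the text between tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_all_tags(html_text):
    """Stripping: remove every tag and keep the text between them."""
    parser = _TextOnly()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


def escape_all(html_text):
    """Escaping: keep every character, but neutralise the markup."""
    return escape(html_text, quote=True)


print(strip_all_tags("<b>bold</b> text"))  # prints: bold text
print(escape_all("<b>bold</b>"))           # prints: &lt;b&gt;bold&lt;/b&gt;
```

Stripping discards the formatting entirely, while escaping preserves the original input so that it is displayed literally. Note that the text returned by <code>strip_all_tags</code> in this sketch is unescaped and would still need escaping before being placed back into an HTML page.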
 
In [[Java (programming language)|Java]] (and [[.NET Framework|.NET]]), sanitization can be achieved by using the [[OWASP]] Java HTML Sanitizer Project.<ref>https://www.owasp.org/index.php/OWASP_Java_HTML_Sanitizer_Project</ref>
 
In [[.NET Framework|.NET]], a number of sanitizers use the Html Agility Pack, an HTML parser.<ref>http://htmlagilitypack.codeplex.com/</ref><ref>http://eksith.wordpress.com/2011/06/14/whitelist-santize-htmlagilitypack/</ref>
 
== See also ==
* [[Data sanitization]]
 
== References ==