Revision as of 19:18, 22 August 2017 by 128.187.112.6 (Clarifying risks of a blacklist over a whitelist); revision as of 07:28, 23 December 2017 by 203.109.83.130 (no edit summary)
{{Refimprove|date=December 2009}}
'''HTML sanitization''' is the process of examining an HTML document and producing a new HTML document that preserves only the tags designated "safe" and desired. HTML sanitization can be used to protect against [[cross-site scripting|cross-site scripting (XSS)]] attacks by sanitizing any HTML code submitted by a user. Basic text-formatting tags such as <code>&lt;b&gt;</code>, <code>&lt;i&gt;</code>, <code>&lt;u&gt;</code>, <code>&lt;em&gt;</code>, and <code>&lt;strong&gt;</code> are often allowed, while more powerful tags such as <code>&lt;script&gt;</code>, <code>&lt;object&gt;</code>, <code>&lt;embed&gt;</code>, and <code>&lt;link&gt;</code> are removed by the sanitization process. Potentially dangerous attributes, such as the <code>onclick</code> attribute, are also removed in order to prevent malicious code from being injected.

Sanitization is typically performed using either a [[whitelist]] or a [[Blacklist (computing)|blacklist]] approach. Leaving a safe HTML element off a whitelist is not serious; it simply means that the feature will not be available after sanitization. If an unsafe element is left off a blacklist, however, the vulnerability will remain in the sanitized HTML output. An out-of-date blacklist can therefore be dangerous if new, unsafe features have been introduced to the HTML Standard.
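The whitelist approach can be illustrated with a minimal sketch in Python, built on the standard library's <code>html.parser</code> module. The names <code>ALLOWED_TAGS</code>, <code>WhitelistSanitizer</code>, and <code>sanitize</code> are hypothetical and chosen only for this example, and the set of allowed tags is an assumption taken from the formatting tags listed above. The sketch drops every attribute, discards the contents of <code>&lt;script&gt;</code> and <code>&lt;style&gt;</code> elements, and escapes all text; it does not balance unclosed tags and is not a substitute for a maintained sanitization library.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration; a real deployment would choose its own.
ALLOWED_TAGS = {"b", "i", "u", "em", "strong"}


class WhitelistSanitizer(HTMLParser):
    """Rebuilds a document, keeping only whitelisted tags and no attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            # Attributes (for example onclick) are deliberately not copied.
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text outside script/style is kept, re-escaped so it cannot form markup.
        if not self.skip_depth:
            self.parts.append(escape(data))


def sanitize(html_text):
    """Return html_text reduced to whitelisted tags and escaped text."""
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)
```

For example, <code>sanitize('&lt;b onclick="evil()"&gt;hi&lt;/b&gt;&lt;script&gt;alert(1)&lt;/script&gt;')</code> returns <code>&lt;b&gt;hi&lt;/b&gt;</code>. Because any tag not named in <code>ALLOWED_TAGS</code> is dropped by default, an element added to HTML after the list was written is removed rather than passed through, which is the property that distinguishes a whitelist from a blacklist.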

HTML sanitization: Difference between revisions