Content deleted Content added
Split article in general description and implementations, elaborated on the process. |
Yamaguchi先生 (talk | contribs) Clean up, typo(s) fixed: Futher → Further using AWB |
||
Line 2:
'''HTML sanitization''' is the process of examining an HTML document and producing a new HTML document that preserves only whatever tags are designated "safe" and desired. HTML sanitization can be used to protect against [[cross-site scripting|cross-site scripting (XSS)]] attacks by sanitizing any HTML code submitted by a user.
Basic tags for changing fonts are often allowed, such as <code><b></code>, <code><i></code>, <code><u></code>, <code><em></code>, and <code><strong></code> while more advanced tags such as <code><script></code>, <code><object></code>, <code><embed></code>, and <code><link></code> are removed by the sanitization process. Also potentially dangerous attributes such as the <code>onclick</code> attribute are removed in order to prevent malicious code from being injected.
Sanitization is typically performed by using either a [[whitelist]] or a [[Blacklist (computing)|blacklist]] approach. An item left off a whitelist, makes the sanitization produce HTML code that lacks safe elements. If an item is left off a blacklist, a vulnerability will be present in the sanitized HTML output. New unsafe HTML features, introduced after a blacklist has been defined, causes the blacklist to become out of date.
== Implementations ==
|