HTML sanitization: Difference between revisions

'''HTML sanitization''' is the process of examining an HTML document and producing a new HTML document that preserves only whatever tags are designated "safe". HTML sanitization can be used to protect against [[cross-site scripting|cross-site scripting (XSS)]] attacks by sanitizing any HTML code submitted by a user.
 
Basic tags for changing fonts are often allowed, such as <code>&lt;b&gt;</code>, <code>&lt;i&gt;</code>, <code>&lt;u&gt;</code>, <code>&lt;em&gt;</code>, and <code>&lt;strong&gt;</code>, while more advanced tags such as <code>&lt;script&gt;</code>, <code>&lt;object&gt;</code>, <code>&lt;embed&gt;</code>, and <code>&lt;link&gt;</code> are removed by the sanitization process.
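
The tag filtering described above can be sketched with Python's standard <code>html.parser</code> module. This is a minimal illustration, not a production sanitizer: the names <code>ALLOWED_TAGS</code>, <code>DROP_CONTENT_TAGS</code>, <code>AllowlistSanitizer</code>, and <code>sanitize</code> are hypothetical, and the choices to strip all attributes and to discard the text inside <code>&lt;script&gt;</code> and <code>&lt;style&gt;</code> are assumptions made for the example.

```python
from html import escape
from html.parser import HTMLParser

# Assumed set of "safe" tags, taken from the formatting tags named above.
ALLOWED_TAGS = {"b", "i", "u", "em", "strong"}
# Assumption: for these elements the enclosed text is dropped as well.
DROP_CONTENT_TAGS = {"script", "style"}


class AllowlistSanitizer(HTMLParser):
    """Rebuilds a document, keeping only allowed tags and escaped text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a dropped-content element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and self.skip_depth == 0:
            # Attributes are discarded, so handlers like onclick never survive.
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            if self.skip_depth > 0:
                self.skip_depth -= 1
        elif tag in ALLOWED_TAGS and self.skip_depth == 0:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Re-escape text so stray "<" or "&" cannot form new markup.
            self.parts.append(escape(data))


def sanitize(html_text):
    """Return a new HTML string containing only allowed tags and text."""
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    print(sanitize('<b onclick="x()">Hi</b><script>alert(1)</script> <i>there</i>'))
```

In this sketch, tags that are neither allowed nor in <code>DROP_CONTENT_TAGS</code> (for example <code>&lt;object&gt;</code>) are removed but their text content is kept; a real sanitizer may need a different policy for each element.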
 
Sanitization is typically performed by using either a [[whitelist]] or a [[Blacklist (computing)|blacklist]] approach. If a safe item is left off a whitelist, the sanitized HTML output simply lacks that element. If an unsafe item is left off a blacklist, a vulnerability will be present in the sanitized HTML output. New unsafe HTML features introduced after a blacklist has been defined cause the blacklist to become out of date.
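
The difference between the two failure modes can be illustrated with a short Python sketch. The tag sets below are assumptions chosen for the example, the helper names are hypothetical, and <code>&lt;iframe&gt;</code> stands in for an unsafe tag that the blacklist author did not anticipate.

```python
from html.parser import HTMLParser

# Assumed example lists; neither is meant to be complete.
WHITELIST = {"b", "i", "u", "em", "strong"}
BLACKLIST = {"script", "object", "embed", "link"}


class TagCollector(HTMLParser):
    """Records the name of every start tag seen in the input."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tags_in(html_text):
    collector = TagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.tags


def surviving_tags_whitelist(html_text):
    # Whitelist: a tag survives only if it is explicitly listed.
    return [t for t in tags_in(html_text) if t in WHITELIST]


def surviving_tags_blacklist(html_text):
    # Blacklist: a tag survives unless it is explicitly listed.
    return [t for t in tags_in(html_text) if t not in BLACKLIST]


if __name__ == "__main__":
    sample = '<b>bold</b><code>x</code><iframe src="https://example.invalid/"></iframe>'
    print(surviving_tags_whitelist(sample))
    print(surviving_tags_blacklist(sample))
```

With this sample input, the whitelist drops the harmless <code>&lt;code&gt;</code> tag because it was left off the list, while the blacklist lets <code>&lt;iframe&gt;</code> through because it was left off the list.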