HTML sanitization

Basic text-formatting tags such as <code>&lt;b&gt;</code>, <code>&lt;i&gt;</code>, <code>&lt;u&gt;</code>, <code>&lt;em&gt;</code>, and <code>&lt;strong&gt;</code> are often allowed, while more advanced tags such as <code>&lt;script&gt;</code>, <code>&lt;object&gt;</code>, <code>&lt;embed&gt;</code>, and <code>&lt;link&gt;</code> are removed by the sanitization process. Potentially dangerous attributes, such as the <code>onclick</code> event-handler attribute, are also removed to prevent malicious code from being injected.
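The whitelist behaviour described above can be sketched in Python with the standard library's <code>html.parser</code> module. This is a minimal illustration, not a production sanitizer. The allowed tags mirror the examples in this section. The following are assumptions made for the example: the attribute list (only <code>title</code>), the choice to discard the contents of <code>script</code> and <code>style</code> elements, and the names <code>WhitelistSanitizer</code> and <code>sanitize</code>. Any attribute not on the list, including event handlers such as <code>onclick</code>, is discarded, and all text is re-escaped on output.

```python
from html import escape
from html.parser import HTMLParser

# Whitelisted formatting tags, taken from the examples in the text.
ALLOWED_TAGS = {"b", "i", "u", "em", "strong"}
# Assumption: "title" is the only permitted attribute. Event handlers such as
# onclick are never listed, so they are always stripped.
ALLOWED_ATTRS = {"title"}
# Assumption: for these elements the content is discarded too, not just the tags.
DROP_CONTENT = {"script", "style"}


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        # convert_charrefs=True hands us decoded text, which is re-escaped on output.
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # tag not whitelisted: drop the tag but keep its text
        parts = []
        for name, value in attrs:
            if name in ALLOWED_ATTRS:
                safe = escape(value or "", quote=True)
                parts.append(f' {name}="{safe}"')
        self.out.append(f"<{tag}{''.join(parts)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

Tags outside the whitelist, such as <code>object</code>, lose their markup but keep their text, and comments are dropped. The sketch does not close tags left open in the input, so a maintained, well-tested sanitizer library is generally preferable in practice.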
 
Sanitization is typically performed by using either a [[whitelist]] or a [[Blacklist (computing)|blacklist]] approach. Leaving a safe HTML element off a whitelist is not so serious; it simply means that the element will be stripped during sanitization. On the other hand, if an unsafe element is left off a blacklist, the resulting vulnerability will not be sanitized out of the HTML output. An out-of-date blacklist can therefore be dangerous if new, unsafe features have been introduced to the HTML Standard.
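The asymmetry between the two approaches can be shown with a small Python sketch. Both tag lists below are hypothetical and deliberately incomplete. <code>mark</code> is a harmless element that the whitelist happens to omit, while <code>iframe</code>, which can embed another document in the page, is missing from the blacklist. The whitelist's omission only costs a feature; the blacklist's omission lets the element through.

```python
# Hypothetical, deliberately incomplete lists for illustration only.
BLACKLIST = {"script", "object", "embed", "link"}  # "known bad" tags
WHITELIST = {"b", "i", "u", "em", "strong"}        # "known good" tags


def kept_by_blacklist(tag: str) -> bool:
    """A blacklist keeps every tag it does not explicitly forbid."""
    return tag.lower() not in BLACKLIST


def kept_by_whitelist(tag: str) -> bool:
    """A whitelist keeps only the tags it explicitly permits."""
    return tag.lower() in WHITELIST


if __name__ == "__main__":
    # "mark" is harmless but unlisted in the whitelist (fails safe);
    # "iframe" is unlisted in the blacklist (fails open).
    for tag in ["b", "script", "mark", "iframe"]:
        print(tag, "blacklist keeps:", kept_by_blacklist(tag),
              "whitelist keeps:", kept_by_whitelist(tag))
```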
 
Further sanitization can be performed based on rules that specify which operation to perform on each tag. Typical operations include removing the tag itself while preserving its content, preserving only the textual content of a tag, and forcing certain values on attributes.<ref>https://github.com/Vereyon/HtmlRuleSanitizer</ref>
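The rule-based operations listed above can be illustrated with a per-tag rule table. This Python sketch does not reproduce the API of the library cited above. The following are all assumptions chosen for illustration: the operation names (<code>keep</code>, <code>flatten</code>, <code>text_only</code>, <code>drop</code>), the rule table, the restriction of <code>href</code> to <code>http://</code> and <code>https://</code> URLs, and the forced <code>rel="nofollow"</code> value. Tags without a rule are flattened.

```python
from dataclasses import dataclass, field
from html import escape
from html.parser import HTMLParser


@dataclass
class Rule:
    # Hypothetical operations:
    #   "keep"      keep the tag, filter its attributes, apply forced values
    #   "flatten"   remove the tag itself but keep (and sanitize) its content
    #   "text_only" remove the tag and any nested markup, keep only its text
    #   "drop"      remove the tag together with everything inside it
    op: str
    allowed_attrs: frozenset = frozenset()
    forced_attrs: dict = field(default_factory=dict)


# Hypothetical rule table; tags not listed fall back to DEFAULT_RULE.
RULES = {
    "p": Rule("keep"),
    "b": Rule("keep"),
    "span": Rule("flatten"),
    "h1": Rule("text_only"),
    "a": Rule("keep", frozenset({"href"}), {"rel": "nofollow"}),
    "script": Rule("drop"),
    "style": Rule("drop"),
}
DEFAULT_RULE = Rule("flatten")
SAFE_URL_PREFIXES = ("http://", "https://")  # assumption: only web links allowed


class RuleSanitizer(HTMLParser):
    def __init__(self, rules):
        super().__init__(convert_charrefs=True)
        self.rules = rules
        self.out = []
        self.mode = None      # None, "drop" or "text_only"
        self.mode_tag = None  # tag that switched the mode on
        self.depth = 0        # nesting depth of mode_tag while the mode is active

    def handle_starttag(self, tag, attrs):
        if self.mode:
            if tag == self.mode_tag:
                self.depth += 1
            return  # inside drop/text_only: ignore all nested markup
        rule = self.rules.get(tag, DEFAULT_RULE)
        if rule.op in ("drop", "text_only"):
            self.mode, self.mode_tag, self.depth = rule.op, tag, 1
        elif rule.op == "keep":
            kept = {}
            for name, value in attrs:
                value = value or ""
                if name not in rule.allowed_attrs:
                    continue
                if name == "href" and not value.lower().startswith(SAFE_URL_PREFIXES):
                    continue  # e.g. javascript: URLs are rejected
                kept[name] = value
            kept.update(rule.forced_attrs)  # forced values override the input
            attr_text = "".join(f' {n}="{escape(v, quote=True)}"' for n, v in kept.items())
            self.out.append(f"<{tag}{attr_text}>")
        # "flatten": emit nothing for the tag; its children are still processed

    def handle_endtag(self, tag):
        if self.mode:
            if tag == self.mode_tag:
                self.depth -= 1
                if self.depth == 0:
                    self.mode = self.mode_tag = None
            return
        if self.rules.get(tag, DEFAULT_RULE).op == "keep":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.mode != "drop":
            self.out.append(escape(data, quote=False))


def sanitize(html, rules=RULES):
    parser = RuleSanitizer(rules)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, <code>sanitize('&lt;a href="javascript:alert(1)"&gt;bad&lt;/a&gt;')</code> returns <code>&lt;a rel="nofollow"&gt;bad&lt;/a&gt;</code>: the unsafe URL is removed and the forced value is added. Unclosed or mismatched tags are not repaired, so a <code>text_only</code> element that is never closed turns the rest of the input into plain text.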