HTML sanitization: Difference between revisions

Content deleted Content added
{{Dead link}} tagged 1 bare URL ref to 1 dead website: http://htmlagilitypack.codeplex.com
m linking
Line 4:
 
== Details ==
Basic tags for changing fonts are often allowed, such as <code>&lt;b&gt;</code>, <code>&lt;i&gt;</code>, <code>&lt;u&gt;</code>, <code>&lt;em&gt;</code>, and <code>&lt;strong&gt;</code> while more advanced tags such as <code>&lt;script&gt;</code>, <code>&lt;object&gt;</code>, <code>&lt;embed&gt;</code>, and <code>&lt;link&gt;</code> are removed by the sanitization process. Also potentially dangerous [[HTML attribute|attributes]] such as the <code>onclick</code> attribute are removed in order to prevent malicious code from being injected.
 
Sanitization is typically performed by using either a [[whitelist]] or a [[Blacklist (computing)|blacklist]] approach. Leaving a safe HTML element off a whitelist is not so serious; it simply means that that feature will not be included post-sanitation. On the other hand, if an unsafe element is left off a blacklist, then the vulnerability will not be sanitized out of the HTML output. An out-of-date blacklist can therefore be dangerous if new, unsafe features have been introduced to the HTML Standard.