HTML sanitization: Difference between revisions

Citation bot (talk | contribs)
Add: date, title. Changed bare reference to CS1/2. | Use this bot. Report bugs. | Suggested by BrownHairedGirl | Linked from User:BrownHairedGirl/Articles_with_bare_links | #UCB_webform_linked 594/2197
Citation bot (talk | contribs)
Add: website. | Use this bot. Report bugs. | Suggested by BrownHairedGirl | Linked from User:BrownHairedGirl/Articles_with_bare_links | #UCB_webform_linked 778/2189
Line 6:
Sanitization is typically performed using either a [[whitelist]] or a [[Blacklist (computing)|blacklist]] approach. Leaving a safe HTML element off a whitelist is not serious: it simply means that the feature will not be available after sanitization. If an unsafe element is left off a blacklist, however, the vulnerability will remain in the HTML output. An out-of-date blacklist can therefore be dangerous if new, unsafe features have been introduced to the HTML Standard.
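
The difference between the two approaches can be sketched in a few lines of Python using the standard library's <code>html.parser</code>. This is a minimal illustration under stated assumptions, not a production sanitizer: the names <code>TagFilter</code>, <code>ALLOWED</code> and <code>BLOCKED</code> are hypothetical, all attributes are dropped to keep the sketch short, and <code>newunsafe</code> is a made-up element standing in for a feature added to the standard after the blacklist was written.

```python
from html import escape
from html.parser import HTMLParser


class TagFilter(HTMLParser):
    """Re-emit HTML, keeping only the tags for which keep(tag) is true."""

    def __init__(self, keep):
        super().__init__(convert_charrefs=True)
        self.keep = keep
        self.out = []

    def handle_starttag(self, tag, attrs):
        if self.keep(tag):
            # Attributes are dropped entirely to keep the sketch short.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if self.keep(tag):
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always kept, but re-escaped so it cannot form new markup.
        self.out.append(escape(data))


def sanitize(html, keep):
    parser = TagFilter(keep)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


ALLOWED = {"p", "b", "i"}   # hypothetical whitelist
BLOCKED = {"script"}        # hypothetical, out-of-date blacklist


def whitelist_sanitize(html):
    return sanitize(html, lambda tag: tag in ALLOWED)


def blacklist_sanitize(html):
    return sanitize(html, lambda tag: tag not in BLOCKED)


# "newunsafe" is a made-up element that neither list knows about.
sample = "<p>Hi <em>there</em></p><newunsafe>x</newunsafe>"
print(whitelist_sanitize(sample))  # <p>Hi there</p>x
print(blacklist_sanitize(sample))  # <p>Hi <em>there</em></p><newunsafe>x</newunsafe>
```

The whitelist loses the harmless <code>em</code> element, which is only a missing feature, while the blacklist lets the unknown element through unchanged.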
 
Further sanitization can be performed based on rules that specify which operation is to be performed on the subject tags. Typical operations include removing the tag itself while preserving its content, preserving only the textual content of a tag, or forcing certain values on attributes.<ref name="HtmlRuleSanitizer">{{Cite web|url=https://github.com/Vereyon/HtmlRuleSanitizer|title = HtmlRuleSanitizer|website = [[GitHub]]|date = 13 August 2021}}</ref>
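
The three operations named above might be sketched as follows. This does not mirror the API of HtmlRuleSanitizer or any other library: the rule table, the operation names (<code>keep</code>, <code>flatten</code>, <code>text_only</code>) and the class <code>RuleSanitizer</code> are assumptions made for illustration. The sketch also assumes that "preserving only the textual content" means keeping the tag while stripping any markup nested inside it, and that tags without a rule are treated like <code>flatten</code>.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical rule table; operation names are illustrative only.
RULES = {
    "p": {"op": "keep"},
    # Keep <a>, allow only href, and force rel="nofollow".
    "a": {"op": "keep", "allow": {"href"}, "force": {"rel": "nofollow"}},
    # Remove the tag itself but preserve its content.
    "span": {"op": "flatten"},
    # Keep the tag, but reduce everything inside it to plain text.
    "h1": {"op": "text_only"},
}


class RuleSanitizer(HTMLParser):
    def __init__(self, rules):
        super().__init__(convert_charrefs=True)
        self.rules = rules
        self.out = []
        self.text_only_tag = None  # set while inside a "text_only" element
        self.depth = 0             # nesting depth of that same tag name

    def handle_starttag(self, tag, attrs):
        if self.text_only_tag:
            # Nested markup is suppressed; only same-named nesting is counted.
            if tag == self.text_only_tag:
                self.depth += 1
            return
        rule = self.rules.get(tag)
        if rule is None or rule["op"] == "flatten":
            return  # tag dropped; its text still arrives via handle_data
        if rule["op"] == "text_only":
            self.text_only_tag, self.depth = tag, 1
        kept = {k: v or "" for k, v in attrs if k in rule.get("allow", ())}
        kept.update(rule.get("force", {}))  # forced values override input
        rendered = "".join(f' {k}="{escape(v)}"' for k, v in kept.items())
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        if self.text_only_tag:
            if tag != self.text_only_tag:
                return
            self.depth -= 1
            if self.depth == 0:
                self.text_only_tag = None
                self.out.append(f"</{tag}>")
            return
        rule = self.rules.get(tag)
        if rule and rule["op"] != "flatten":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))


def apply_rules(html, rules=RULES):
    parser = RuleSanitizer(rules)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(apply_rules('<p><span class="x">text</span> '
                  '<a href="https://example.org" onclick="x()">link</a></p>'))
```

URL scheme checks on <code>href</code> (for example rejecting <code>javascript:</code> URLs) are omitted here for brevity; a real sanitizer would need them.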
 
== Implementations ==
Line 16:
In [[.NET Framework|.NET]], a number of sanitizers use the Html Agility Pack, an HTML parser.<ref>http://htmlagilitypack.codeplex.com/</ref><ref>{{Cite web|url=http://eksith.wordpress.com/2011/06/14/whitelist-santize-htmlagilitypack/|title = Whitelist santize with HtmlAgilityPack|date = 14 June 2011}}</ref><ref name="HtmlRuleSanitizer" />
 
In [[JavaScript]] there are "JS-only" sanitizers for the [[Front_and_back_ends|back end]], and browser-based<ref>{{Cite web|url=https://github.com/jitbit/HtmlSanitizer|title=JS HTML Sanitizer|website=[[GitHub]]|date=14 October 2021}}</ref> implementations that use the browser's own DOM parser to parse the HTML (for better performance).
 
== See also ==