Explainable artificial intelligence

For images, [[Saliency map|saliency maps]] highlight the parts of an image that most influenced the result.<ref>{{Cite web |last=Sharma |first=Abhishek |date=2018-07-11 |title=What Are Saliency Maps In Deep Learning? |url=https://analyticsindiamag.com/what-are-saliency-maps-in-deep-learning/ |access-date=2024-07-10 |website=Analytics India Magazine |language=en-US}}</ref>
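
The following is a minimal sketch of the gradient-based ("vanilla gradient") approach to saliency maps, assuming PyTorch; the toy convolutional model and random input below are placeholders rather than any specific system described in this article.
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

# Placeholder classifier; any differentiable image model could be substituted.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)
model.eval()

# Placeholder input image (batch of 1, RGB, 224x224), tracked for gradients.
image = torch.rand(1, 3, 224, 224, requires_grad=True)

scores = model(image)
top_class = scores.argmax(dim=1).item()

# Gradient of the top-class score with respect to the input pixels.
scores[0, top_class].backward()

# Saliency map: per-pixel gradient magnitude, maximised over colour channels.
saliency = image.grad.abs().max(dim=1).values  # shape (1, 224, 224)
</syntaxhighlight>
Pixels with large gradient magnitude are those whose small changes would most affect the predicted score, which is what the saliency map visualises.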
 
However, these techniques are not very suitable for [[Language model|language models]] like [[Generative pre-trained transformer|generative pretrained transformers]]. Since these models generate language, they can provide an explanation themselves, but that explanation may not be reliable. Other techniques include attention analysis (examining how the model focuses on different parts of the input), probing methods (testing what information is captured in the model's representations), causal tracing (tracing the flow of information through the model) and circuit discovery (identifying specific subnetworks responsible for certain behaviors). Explainability research in this area overlaps significantly with interpretability and [[AI alignment|alignment]] research.<ref>{{cite arXiv |last1=Luo |first1=Haoyan |last2=Specia |first2=Lucia |title=From Understanding to Utilization: A Survey on Explainability for Large Language Models |date=2024-02-21 |eprint=2401.12874 |class=cs.CL}}</ref>
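
A minimal sketch of a probing classifier, assuming scikit-learn and NumPy: a simple linear model is trained to predict some property from a model's internal representations. The hidden states and labels here are random placeholders standing in for activations extracted from a language model layer and an annotated property (such as part of speech).
<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(1000, 768))   # placeholder layer activations
labels = rng.integers(0, 2, size=1000)         # placeholder property labels

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, test_size=0.2, random_state=0
)

# The "probe": a linear classifier fitted on the representations.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# High held-out accuracy would suggest the property is linearly decodable
# from the representations; with random placeholder data it stays near chance.
print("probe accuracy:", probe.score(X_test, y_test))
</syntaxhighlight>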
 
=== Interpretability ===