Explainable artificial intelligence

For images, [[Saliency map|saliency maps]] highlight the parts of an image that most influenced the result.<ref>{{Cite web |last=Sharma |first=Abhishek |date=2018-07-11 |title=What Are Saliency Maps In Deep Learning? |url=https://analyticsindiamag.com/what-are-saliency-maps-in-deep-learning/ |access-date=2024-07-10 |website=Analytics India Magazine |language=en-US}}</ref>
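
The following is a minimal sketch of the gradient-based ("vanilla gradient") approach to saliency maps, assuming PyTorch; the toy convolutional model and random input below are placeholders rather than any specific system described in this article.
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

# Placeholder classifier; any differentiable image model could be substituted.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)
model.eval()

# Placeholder input image (batch of 1, RGB, 224x224), tracked for gradients.
image = torch.rand(1, 3, 224, 224, requires_grad=True)

scores = model(image)
top_class = scores.argmax(dim=1).item()

# Gradient of the top-class score with respect to the input pixels.
scores[0, top_class].backward()

# Saliency map: per-pixel gradient magnitude, maximised over colour channels.
saliency = image.grad.abs().max(dim=1).values  # shape (1, 224, 224)
</syntaxhighlight>
Pixels with large gradient magnitude are those whose small changes would most affect the predicted score, which is what the saliency map visualises.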
 
However, these techniques are not very suitable for [[Language model|language models]] like [[Generative pre-trained transformer|generative pretrained transformers]]. Since these models generate language, they can provide an explanation themselves, but that explanation may not be reliable. Other techniques include attention analysis (examining how the model focuses on different parts of the input), probing methods (testing what information is captured in the model's representations), causal tracing (tracing the flow of information through the model) and circuit discovery (identifying specific subnetworks responsible for certain behaviors). Explainability research in this area overlaps significantly with interpretability and [[AI alignment|alignment]] research.<ref>{{cite arXiv |last1=Luo |first1=Haoyan |last2=Specia |first2=Lucia |title=From Understanding to Utilization: A Survey on Explainability for Large Language Models |date=2024-02-21 |eprint=2401.12874 |class=cs.CL}}</ref>
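
A minimal sketch of a probing classifier, assuming scikit-learn and NumPy: a simple linear model is trained to predict some property from a model's internal representations. The hidden states and labels here are random placeholders standing in for activations extracted from a language model layer and an annotated property (such as part of speech).
<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(1000, 768))   # placeholder layer activations
labels = rng.integers(0, 2, size=1000)         # placeholder property labels

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, test_size=0.2, random_state=0
)

# The "probe": a linear classifier fitted on the representations.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# High held-out accuracy would suggest the property is linearly decodable
# from the representations; with random placeholder data it stays near chance.
print("probe accuracy:", probe.score(X_test, y_test))
</syntaxhighlight>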
 
=== Interpretability ===