Some techniques allow visualisation of the inputs to which individual [[Neuron (software)|software neurons]] respond most strongly. Several groups found that neurons can be aggregated into circuits that perform human-comprehensible functions, some of which reliably arise across different networks trained independently.<ref name="Circuits">{{cite journal |last1=Olah |first1=Chris |last2=Cammarata |first2=Nick |last3=Schubert |first3=Ludwig |last4=Goh |first4=Gabriel |last5=Petrov |first5=Michael |last6=Carter |first6=Shan |title=Zoom In: An Introduction to Circuits |journal=Distill |date=10 March 2020 |volume=5 |issue=3 |pages=e00024.001 |doi=10.23915/distill.00024.001 |url=https://distill.pub/2020/circuits/zoom-in/ |language=en |issn=2476-0757|doi-access=free }}</ref><ref>{{cite journal |last1=Li |first1=Yixuan |last2=Yosinski |first2=Jason |last3=Clune |first3=Jeff |last4=Lipson |first4=Hod |last5=Hopcroft |first5=John |title=Convergent Learning: Do different neural networks learn the same representations? |journal=Feature Extraction: Modern Questions and Challenges |date=8 December 2015 |pages=196–212 |url=http://proceedings.mlr.press/v44/li15convergent.html |publisher=PMLR |language=en}}</ref>
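As a rough illustration of such feature visualisation, the sketch below uses activation maximisation: starting from noise, an input is optimised so that one chosen unit responds as strongly as possible. The network (a pretrained VGG16), the layer index and the unit index are arbitrary assumptions for illustration, not details taken from the cited studies.

<syntaxhighlight lang="python">
# Sketch of activation maximisation: synthesise an input to which one chosen
# unit responds most strongly. Model, layer index and unit index are
# illustrative assumptions, not taken from the cited works.
import torch
from torchvision import models

model = models.vgg16(weights="IMAGENET1K_V1").eval()
for p in model.parameters():
    p.requires_grad_(False)

activations = {}

def hook(_module, _inputs, output):
    activations["value"] = output

# Hook an intermediate convolutional layer (index 10 chosen arbitrarily).
model.features[10].register_forward_hook(hook)

image = torch.randn(1, 3, 224, 224, requires_grad=True)  # start from noise
optimizer = torch.optim.Adam([image], lr=0.05)

for _ in range(200):
    optimizer.zero_grad()
    model(image)                                  # fills activations["value"]
    loss = -activations["value"][0, 5].mean()     # unit 5, chosen arbitrarily
    loss.backward()
    optimizer.step()

# `image` now approximates a stimulus that drives the chosen unit strongly.
</syntaxhighlight>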
There are various techniques to extract compressed representations of the features of given inputs, which can then be analysed by standard [[Cluster analysis|clustering techniques]]. Alternatively, networks can be trained to output linguistic explanations of their behaviour, which are then directly human-interpretable.<ref>{{cite book |last1=Hendricks |first1=Lisa Anne |last2=Akata |first2=Zeynep |last3=Rohrbach |first3=Marcus |last4=Donahue |first4=Jeff |last5=Schiele |first5=Bernt |last6=Darrell |first6=Trevor |title=Computer Vision – ECCV 2016 |chapter=Generating Visual Explanations |series=Lecture Notes in Computer Science |date=2016 |volume=9908 |pages=3–19 |doi=10.1007/978-3-319-46493-0_1 |chapter-url=https://link.springer.com/chapter/10.1007/978-3-319-46493-0_1 |publisher=Springer International Publishing |language=en|arxiv=1603.08507 |isbn=978-3-319-46492-3 |s2cid=12030503 }}</ref> Model behaviour can also be explained with reference to training data—for example, by evaluating which training inputs influenced a given behaviour the most,<ref>{{cite journal |last1=Koh |first1=Pang Wei |last2=Liang |first2=Percy |title=Understanding Black-box Predictions via Influence Functions |journal=International Conference on Machine Learning |date=17 July 2017 |pages=1885–1894 |url=http://proceedings.mlr.press/v70/koh17a.html |publisher=PMLR |arxiv=1703.04730 |language=en}}</ref> or by auditing which training exemplars are most similar to a given input.
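A minimal sketch of the representation-clustering idea follows, under the assumption of a small, hypothetical trained network whose penultimate layer provides the compressed representation; all names, shapes and parameters are illustrative rather than drawn from the cited works.

<syntaxhighlight lang="python">
# Sketch: extract compressed representations from an intermediate layer of a
# hypothetical trained network and group them with a standard clustering method.
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

model = nn.Sequential(
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 8), nn.ReLU(),   # 8-dimensional compressed representation
    nn.Linear(8, 2),               # task head, e.g. binary classification
)

inputs = torch.randn(500, 64)      # stand-in for real input data

with torch.no_grad():
    features = model[:4](inputs)   # activations of the compressed layer

# Cluster the representations; each cluster can then be inspected by a human.
clusters = KMeans(n_clusters=5, n_init=10).fit_predict(features.numpy())
print(np.bincount(clusters))       # number of inputs assigned to each cluster
</syntaxhighlight>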
Explainable artificial intelligence (XAI) has also been applied in pain research, in particular to understand the role of electrodermal activity in [[Automated Pain Recognition|automated pain recognition]]. A comparison of hand-crafted features and deep learning models showed that simple hand-crafted features can achieve performance comparable to deep learning models, and that both traditional feature engineering and deep feature learning approaches rely on simple characteristics of the input time-series data.<ref>{{Cite journal |last1=Gouverneur |first1=Philip |last2=Li |first2=Frédéric |last3=Shirahama |first3=Kimiaki |last4=Luebke |first4=Luisa |last5=Adamczyk |first5=Wacław M. |last6=Szikszay |first6=Tibor M. |last7=Luedtke |first7=Kerstin |last8=Grzegorzek |first8=Marcin |date=2023-02-09 |title=Explainable Artificial Intelligence (XAI) in Pain Research: Understanding the Role of Electrodermal Activity for Automated Pain Recognition |journal=Sensors |language=en |volume=23 |issue=4 |pages=1959 |doi=10.3390/s23041959 |issn=1424-8220 |pmc=9960387 |pmid=36850556 |bibcode=2023Senso..23.1959G |doi-access=free }}</ref>