Some techniques allow visualisation of the inputs to which individual [[Neuron (software)|software neurons]] respond most strongly. Several groups found that neurons can be aggregated into circuits that perform human-comprehensible functions, some of which reliably arise across different networks trained independently.<ref name="Circuits">{{cite journal |last1=Olah |first1=Chris |last2=Cammarata |first2=Nick |last3=Schubert |first3=Ludwig |last4=Goh |first4=Gabriel |last5=Petrov |first5=Michael |last6=Carter |first6=Shan |title=Zoom In: An Introduction to Circuits |journal=Distill |date=10 March 2020 |volume=5 |issue=3 |pages=e00024.001 |doi=10.23915/distill.00024.001 |url=https://distill.pub/2020/circuits/zoom-in/ |language=en |issn=2476-0757|doi-access=free }}</ref><ref>{{cite journal |last1=Li |first1=Yixuan |last2=Yosinski |first2=Jason |last3=Clune |first3=Jeff |last4=Lipson |first4=Hod |last5=Hopcroft |first5=John |title=Convergent Learning: Do different neural networks learn the same representations? |journal=Feature Extraction: Modern Questions and Challenges |date=8 December 2015 |pages=196–212 |url=http://proceedings.mlr.press/v44/li15convergent.html |publisher=PMLR |language=en}}</ref>
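As a rough illustration of such feature visualisation, the sketch below uses activation maximisation: starting from noise, an input is optimised so that one chosen unit responds as strongly as possible. The network (a pretrained VGG16), the layer index and the unit index are arbitrary assumptions for illustration, not details taken from the cited studies.

<syntaxhighlight lang="python">
# Sketch of activation maximisation: synthesise an input to which one chosen
# unit responds most strongly. Model, layer index and unit index are
# illustrative assumptions, not taken from the cited works.
import torch
from torchvision import models

model = models.vgg16(weights="IMAGENET1K_V1").eval()
for p in model.parameters():
    p.requires_grad_(False)

activations = {}

def hook(_module, _inputs, output):
    activations["value"] = output

# Hook an intermediate convolutional layer (index 10 chosen arbitrarily).
model.features[10].register_forward_hook(hook)

image = torch.randn(1, 3, 224, 224, requires_grad=True)  # start from noise
optimizer = torch.optim.Adam([image], lr=0.05)

for _ in range(200):
    optimizer.zero_grad()
    model(image)                                  # fills activations["value"]
    loss = -activations["value"][0, 5].mean()     # unit 5, chosen arbitrarily
    loss.backward()
    optimizer.step()

# `image` now approximates a stimulus that drives the chosen unit strongly.
</syntaxhighlight>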
There are various techniques to extract compressed representations of the features of given inputs, which can then be analysed by standard [[Cluster analysis|clustering techniques]]. Alternatively, networks can be trained to output linguistic explanations of their behaviour, which are then directly human-interpretable.<ref>{{cite book |last1=Hendricks |first1=Lisa Anne |last2=Akata |first2=Zeynep |last3=Rohrbach |first3=Marcus |last4=Donahue |first4=Jeff |last5=Schiele |first5=Bernt |last6=Darrell |first6=Trevor |title=Computer Vision – ECCV 2016 |chapter=Generating Visual Explanations |series=Lecture Notes in Computer Science |date=2016 |volume=9908 |pages=3–19 |doi=10.1007/978-3-319-46493-0_1 |chapter-url=https://link.springer.com/chapter/10.1007/978-3-319-46493-0_1 |publisher=Springer International Publishing |language=en|arxiv=1603.08507 |isbn=978-3-319-46492-3 |s2cid=12030503 }}</ref> Model behaviour can also be explained with reference to training data—for example, by evaluating which training inputs influenced a given behaviour the most,<ref>{{cite journal |last1=Koh |first1=Pang Wei |last2=Liang |first2=Percy |title=Understanding Black-box Predictions via Influence Functions |journal=International Conference on Machine Learning |date=17 July 2017 |pages=1885–1894 |url=http://proceedings.mlr.press/v70/koh17a.html |publisher=PMLR |arxiv=1703.04730 |language=en}}</ref> or by auditing which training exemplars are most similar to a given input.
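A minimal sketch of the representation-clustering idea follows, under the assumption of a small, hypothetical trained network whose penultimate layer provides the compressed representation; all names, shapes and parameters are illustrative rather than drawn from the cited works.

<syntaxhighlight lang="python">
# Sketch: extract compressed representations from an intermediate layer of a
# hypothetical trained network and group them with a standard clustering method.
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

model = nn.Sequential(
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 8), nn.ReLU(),   # 8-dimensional compressed representation
    nn.Linear(8, 2),               # task head, e.g. binary classification
)

inputs = torch.randn(500, 64)      # stand-in for real input data

with torch.no_grad():
    features = model[:4](inputs)   # activations of the compressed layer

# Cluster the representations; each cluster can then be inspected by a human.
clusters = KMeans(n_clusters=5, n_init=10).fit_predict(features.numpy())
print(np.bincount(clusters))       # number of inputs assigned to each cluster
</syntaxhighlight>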
Explainable artificial intelligence (XAI) has also been applied in pain research, in particular to understand the role of electrodermal activity in [[Automated Pain Recognition|automated pain recognition]]. A comparison of hand-crafted features and deep learning models showed that simple hand-crafted features can achieve performance comparable to deep learning models, and that both traditional feature engineering and deep feature learning approaches rely on simple characteristics of the input time-series data.<ref>{{Cite journal |last1=Gouverneur |first1=Philip |last2=Li |first2=Frédéric |last3=Shirahama |first3=Kimiaki |last4=Luebke |first4=Luisa |last5=Adamczyk |first5=Wacław M. |last6=Szikszay |first6=Tibor M. |last7=Luedtke |first7=Kerstin |last8=Grzegorzek |first8=Marcin |date=2023-02-09 |title=Explainable Artificial Intelligence (XAI) in Pain Research: Understanding the Role of Electrodermal Activity for Automated Pain Recognition |journal=Sensors |language=en |volume=23 |issue=4 |pages=1959 |doi=10.3390/s23041959 |issn=1424-8220 |pmc=9960387 |pmid=36850556 |bibcode=2023Senso..23.1959G |doi-access=free }}</ref>