Revision as of 09:50, 17 July 2024 (Citation bot) and revision as of 09:52, 17 July 2024 (Headbomb)
AI systems sometimes learn undesirable tricks that optimally satisfy explicit pre-programmed goals on the training data but do not reflect the more nuanced implicit desires of the human system designers or the full complexity of the domain data. For example, a 2017 system tasked with [[image recognition]] learned to "cheat" by looking for a copyright tag that happened to be associated with horse pictures rather than learning how to tell if a horse was actually pictured.<ref name=guardian/> In another 2017 system, an AI trained with [[reinforcement learning from human feedback]] to grasp items in a virtual world learned to cheat by placing its manipulator between the object and the viewer, so that it falsely appeared to be grasping the object.<ref>{{cite news\|title=DeepMind Has Simple Tests That Might Prevent Elon Musk's AI Apocalypse\|url=https://www.bloomberg.com/news/articles/2017-12-11/deepmind-has-simple-tests-that-might-prevent-elon-musk-s-ai-apocalypse\|access-date=30 January 2018\|work=Bloomberg.com\|date=11 December 2017\|language=en}}</ref><ref>{{cite news\|title=Learning from Human Preferences\|url=https://blog.openai.com/deep-reinforcement-learning-from-human-preferences/\|access-date=30 January 2018\|work=OpenAI Blog\|date=13 June 2017}}</ref> One transparency project, the [[DARPA]] XAI program, aims to produce "[[glass box]]" models that are explainable to a "[[human-in-the-loop]]" without greatly sacrificing AI performance. 
Human users of such a system would be able to understand the AI's cognition (both in real time and after the fact) and determine whether to trust the AI.<ref>{{cite web\|title=Explainable Artificial Intelligence (XAI)\|url=https://www.darpa.mil/program/explainable-artificial-intelligence\|website=DARPA\|access-date=17 July 2017}}</ref> Other applications of XAI include [[knowledge extraction]] from black-box models and model comparison.<ref>{{cite journal\|last=Biecek\|first=Przemyslaw\|title= DALEX: explainers for complex predictive models\|journal=Journal of Machine Learning Research\|volume=19\|pages=1–5\|arxiv=1806.08915\|date=23 June 2018}}</ref> In the context of monitoring systems for ethical and socio-legal compliance, the term "glass box" is commonly used to refer to tools that track the inputs and outputs of the system in question and provide value-based explanations for its behavior. These tools aim to ensure that the system operates in accordance with ethical and legal standards and that its decision-making processes are transparent and accountable. The term "glass box" is often used in contrast to "black box" systems, which lack transparency and can be more difficult to monitor and regulate.<ref>Rai, Arun. "Explainable AI: From black box to glass box." 
Journal of the Academy of Marketing Science 48 (2020): 137–141.</ref> The term is also used to name a voice assistant that produces counterfactual statements as explanations.<ref name="SokolFlach2018">{{cite book\|last1=Sokol\|first1=Kacper\|last2=Flach\|first2=Peter\|title=Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence\|chapter=Glass-Box: Explaining AI Decisions With Counterfactual Statements Through Conversation With a Voice-enabled Virtual Assistant\|year=2018\|pages=5868–5870\|doi=10.24963/ijcai.2018/865\|isbn=9780999241127\|s2cid=51608978}}</ref>

For images, [[Saliency map\|saliency maps]] highlight the parts of an image that most influenced the result.<ref>{{Cite web \|last=Sharma \|first=Abhishek \|date=2018-07-11 \|title=What Are Saliency Maps In Deep Learning? \|url=https://analyticsindiamag.com/what-are-saliency-maps-in-deep-learning/ \|access-date=2024-07-10 \|website=Analytics India Magazine \|language=en-US}}</ref> However, these techniques are poorly suited to [[Language model\|language models]] such as [[Generative pre-trained transformer\|generative pretrained transformers]]. Because these models generate language, they can provide an explanation of their own output, but that explanation may not be reliable. Other techniques used with such models include attention analysis (examining how the model focuses on different parts of the input), probing methods (testing what information is captured in the model's representations), causal tracing (tracing the flow of information through the model), and circuit discovery (identifying specific subnetworks responsible for certain behaviors). 
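A minimal sketch of the simplest, gradient-based form of the image saliency maps described above follows. It assumes a model that returns a single class score for an image stored as a 2D list of pixel intensities. The gradient is approximated with central finite differences rather than backpropagation. The helper `saliency_map`, the `toy_score` model and its weights are hypothetical choices made for illustration.

```python
import math
from typing import Callable, List

Image = List[List[float]]


def saliency_map(score: Callable[[Image], float], image: Image, h: float = 1e-4) -> Image:
    """Return |d score / d pixel| for every pixel (vanilla-gradient saliency).

    The gradient is approximated with central finite differences, so the
    model is only queried, never differentiated symbolically.
    """
    rows, cols = len(image), len(image[0])
    saliency = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Copy the image twice and nudge one pixel up and down by h.
            plus = [row[:] for row in image]
            minus = [row[:] for row in image]
            plus[r][c] += h
            minus[r][c] -= h
            gradient = (score(plus) - score(minus)) / (2 * h)
            saliency[r][c] = abs(gradient)
    return saliency


# Hypothetical toy model: a logistic score dominated by the centre pixel.
WEIGHTS = [
    [0.0, 0.1, 0.0],
    [0.1, 2.0, 0.1],
    [0.0, 0.1, 0.0],
]


def toy_score(image: Image) -> float:
    z = sum(w * p for w_row, p_row in zip(WEIGHTS, image) for w, p in zip(w_row, p_row))
    return 1.0 / (1.0 + math.exp(-z))


image = [[0.5] * 3 for _ in range(3)]
saliency = saliency_map(toy_score, image)
for row in saliency:
    print(" ".join(f"{value:.4f}" for value in row))
```

Because the toy score is dominated by the centre pixel, that pixel receives the largest saliency value, and pixels with zero weight receive none. Finite differences need two model evaluations per pixel, so for real networks the gradient is usually obtained by automatic differentiation instead.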
Explainability research in this area overlaps significantly with interpretability and [[AI alignment\|alignment]] research.<ref>{{cite arXiv \|last1=Luo \|first1=Haoyan \|last2=Specia \|first2=Lucia \|title=From Understanding to Utilization: A Survey on Explainability for Large Language Models \|date=2024-02-21 \|eprint=2401.12874}}</ref>

=== Interpretability ===
[[Marvin Minsky]] et al. raised the issue that AI can function as a form of surveillance, carrying the biases inherent in surveillance, and suggested HI (Humanistic Intelligence) as a way to create a fairer and more balanced "human-in-the-loop" AI.<ref>Minsky et al., "The Society of Intelligent Veillance", IEEE ISTAS 2013, pages 13–17.</ref> Modern complex AI techniques, such as [[deep learning]] and [[genetic algorithm]]s, are inherently opaque.<ref>{{cite magazine\|last1=Mukherjee\|first1=Siddhartha\|title=A.I. Versus M.D.\|url=https://www.newyorker.com/magazine/2017/04/03/ai-versus-md\|access-date=30 January 2018\|magazine=The New Yorker\|date=27 March 2017}}</ref> To address this issue, methods have been developed to make new models more explainable and interpretable.<ref>{{Cite journal\|date=2020-07-08\|title=Interpretable neural networks based on continuous-valued logic and multicriteria decision operators\|journal=Knowledge-Based Systems\|language=en\|volume=199\|pages=105972\|doi=10.1016/j.knosys.2020.105972 \|arxiv=1910.02486 \|issn=0950-7051\|doi-access=free\|last1=Csiszár\|first1=Orsolya\|last2=Csiszár\|first2=Gábor\|last3=Dombi\|first3=József}}</ref><ref name="Lipton 31–57"/><ref name="Interpretable machine learning: def"/><ref>{{cite arXiv\|last1=Doshi-Velez\|first1=Finale\|last2=Kim\|first2=Been\|date=2017-02-27\|title=Towards A Rigorous Science of Interpretable Machine Learning\|eprint=1702.08608\|class=stat.ML}}</ref><ref>{{Cite arXiv \|last1=Abdollahi \|first1=Behnoush \|last2=Nasraoui \|first2=Olfa \|title=Explainable Restricted 
Boltzmann Machines for Collaborative Filtering\|eprint=1606.07129\|class=stat.ML\|year=2016}}</ref><ref>{{Cite book\|last1=Dombi\|first1=József\|last2=Csiszár\|first2=Orsolya\|series=Studies in Fuzziness and Soft Computing \|date=2021\|title=Explainable Neural Networks Based on Fuzzy Logic and Multi-criteria Decision Tools\|url=https://link.springer.com/book/10.1007/978-3-030-72280-7\|volume=408\|language=en-gb\|doi=10.1007/978-3-030-72280-7\|isbn=978-3-030-72279-1\|s2cid=233486978\|issn=1434-9922}}</ref> These methods include layer-wise relevance propagation (LRP), a technique for determining which features in a particular input vector contribute most strongly to a neural network's output.<ref name="Bach Binder Montavon Klauschen p=e0130140">{{cite journal\|last1=Bach\|first1=Sebastian\|last2=Binder\|first2=Alexander\|last3=Montavon\|first3=Grégoire\|last4=Klauschen\|first4=Frederick\|last5=Müller\|first5=Klaus-Robert\|author-link5=Klaus-Robert Müller\|last6=Samek\|first6=Wojciech\|date=2015-07-10\|editor-last=Suarez\|editor-first=Oscar Deniz\|title=On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation\|journal=PLOS ONE\|volume=10\|issue=7\|page=e0130140\|bibcode=2015PLoSO..1030140B\|doi=10.1371/journal.pone.0130140\|issn=1932-6203\|pmc=4498753\|pmid=26161953\|doi-access=free}}</ref><ref>{{cite news\|url=https://www.theguardian.com/science/2017/nov/05/computer-says-no-why-making-ais-fair-accountable-and-transparent-is-crucial\|title=Computer says no: why making AIs fair, accountable and transparent is crucial\|last1=Sample\|first1=Ian\|date=5 November 2017\|work=The Guardian\|access-date=5 August 2018\|language=en}}</ref> Other techniques explain an individual prediction made by a (nonlinear) black-box model, a goal referred to as "local interpretability".<ref>{{Cite journal\|last1=Martens\|first1=David\|last2=Provost\|first2=Foster\|title=Explaining data-driven document 
classifications\|url=http://pages.stern.nyu.edu/~fprovost/Papers/MartensProvost_Explaining.pdf\|journal=MIS Quarterly\|year=2014\|volume=38\|pages=73–99\|doi=10.25300/MISQ/2014/38.1.04\|s2cid=14238842}}</ref><ref>{{Cite journal\|title="Why Should I Trust You?" {{!}} Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining\|language=EN\|doi=10.1145/2939672.2939778\|s2cid=13029170}}</ref><ref>{{Citation\|last1=Lundberg\|first1=Scott M\|title=A Unified Approach to Interpreting Model Predictions\|date=2017\|url=http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf\|work=Advances in Neural Information Processing Systems 30\|pages=4765–4774\|editor-last=Guyon\|editor-first=I.\|publisher=Curran Associates, Inc.\|access-date=2020-03-13\|last2=Lee\|first2=Su-In\|editor2-last=Luxburg\|editor2-first=U. V.\|editor3-last=Bengio\|editor3-first=S.\|editor4-last=Wallach\|editor4-first=H.\|arxiv=1705.07874}}</ref><ref>{{Cite journal\|last1=Carter\|first1=Brandon\|last2=Mueller\|first2=Jonas\|last3=Jain\|first3=Siddhartha\|last4=Gifford\|first4=David\|date=2019-04-11\|title=What made you do this? 
Understanding black-box decisions with sufficient input subsets\|url=http://proceedings.mlr.press/v89/carter19a.html\|journal=The 22nd International Conference on Artificial Intelligence and Statistics\|language=en\|pages=567–576}}</ref><ref>{{Cite journal\|last1=Shrikumar\|first1=Avanti\|last2=Greenside\|first2=Peyton\|last3=Kundaje\|first3=Anshul\|date=2017-07-17\|title=Learning Important Features Through Propagating Activation Differences\|url=http://proceedings.mlr.press/v70/shrikumar17a.html\|journal=International Conference on Machine Learning\|language=en\|pages=3145–3153}}</ref><ref>{{Cite journal\|url=https://dl.acm.org/doi/abs/10.5555/3305890.3306024\|title=Axiomatic attribution for deep networks {{!}} Proceedings of the 34th International Conference on Machine Learning - Volume 70\|website=dl.acm.org\|series=Icml'17\|date=6 August 2017\|pages=3319–3328\|language=EN\|access-date=2020-03-13}}</ref> Applying local interpretability techniques in a remote context, where the black-box model is run by a third party, has been questioned: the party operating the model can return explanations that do not reflect how its decisions are actually made, and users who can only query the model remotely have limited means to detect this.<ref>{{Cite journal\|last1=Aivodji\|first1=Ulrich\|last2=Arai\|first2=Hiromi\|last3=Fortineau\|first3=Olivier\|last4=Gambs\|first4=Sébastien\|last5=Hara\|first5=Satoshi\|last6=Tapp\|first6=Alain\|date=2019-05-24\|title=Fairwashing: the risk of rationalization\|url=http://proceedings.mlr.press/v97/aivodji19a.html\|journal=International Conference on Machine Learning\|language=en\|publisher=PMLR\|pages=161–170\|arxiv=1901.09749}}</ref><ref>{{Cite journal\|last1=Le Merrer\|first1=Erwan\|last2=Trédan\|first2=Gilles\|date=September 2020\|title=Remote explainability faces the bouncer problem\|url=https://www.nature.com/articles/s42256-020-0216-z\|journal=Nature Machine Intelligence\|language=en\|volume=2\|issue=9\|pages=529–539\|doi=10.1038/s42256-020-0216-z\|issn=2522-5839\|arxiv=1910.01432\|s2cid=225207140}}</ref> There has been work on 
making glass-box models which are more transparent to inspection.<ref name=":6"/><ref>{{cite journal \|last1=Singh \|first1=Chandan \|last2=Nasseri \|first2=Keyan \|last3=Tan \|first3=Yan Shuo \|last4=Tang \|first4=Tiffany \|last5=Yu \|first5=Bin \|title=imodels: a python package for fitting interpretable models \|journal=Journal of Open Source Software \|date=4 May 2021 \|volume=6 \|issue=61 \|pages=3192 \|doi=10.21105/joss.03192 \|bibcode=2021JOSS....6.3192S \|s2cid=235529515 \|url=https://joss.theoj.org/papers/10.21105/joss.03192 \|language=en \|issn=2475-9066}}</ref> This includes [[decision tree]]s,<ref>{{Cite journal\|last1=Vidal\|first1=Thibaut\|last2=Schiffer\|first2=Maximilian\|date=2020\|title=Born-Again Tree Ensembles\|url=http://proceedings.mlr.press/v119/vidal20a.html\|journal=International Conference on Machine Learning\|language=en\|publisher=PMLR\|volume=119\|pages=9743–9753\|arxiv=2003.11132}}</ref> [[Bayesian network]]s, sparse [[linear model]]s,<ref>{{cite journal \|last1=Ustun \|first1=Berk \|last2=Rudin \|first2=Cynthia \|title=Supersparse linear integer models for optimized medical scoring systems \|journal=Machine Learning \|date=1 March 2016 \|volume=102 \|issue=3 \|pages=349–391 \|doi=10.1007/s10994-015-5528-6 \|s2cid=207211836 \|url=https://link.springer.com/article/10.1007/s10994-015-5528-6 \|language=en \|issn=1573-0565}}</ref> and more.<ref>Bostrom, N., & Yudkowsky, E. (2014). [https://intelligence.org/files/EthicsofAI.pdf The ethics of artificial intelligence]. 
''The Cambridge Handbook of Artificial Intelligence'', 316–334.</ref> The [[ACM Conference on Fairness, Accountability, and Transparency\|Association for Computing Machinery Conference on Fairness, Accountability, and Transparency (ACM FAccT)]] was established in 2018 to study transparency and explainability in the context of socio-technical systems, many of which include artificial intelligence.<ref name="FAT* conference">{{cite web \| url=https://fatconference.org/ \| title=FAT* Conference }}</ref><ref>{{cite news \|title=Computer programs recognise white men better than black women \|url=https://www.economist.com/science-and-technology/2018/02/15/computer-programs-recognise-white-men-better-than-black-women \|access-date=5 August 2018 \|newspaper=The Economist \|date=2018 \|language=en}}</ref>
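As an illustration of the layer-wise relevance propagation technique described above, the sketch below implements the ε variant of the LRP redistribution rule for a small fully connected network with ReLU hidden layers and a linear output, in plain Python. LRP defines several propagation rules and this is only one of them. The function names, the weight layout `w[j][k]` and the toy network are hypothetical choices made for the example.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # w[j][k] connects input j to output k
Layer = Tuple[Matrix, Vector]  # (weights, biases)


def dense(a: Vector, w: Matrix, b: Vector) -> Vector:
    """Pre-activations z_k = sum_j a_j * w[j][k] + b_k."""
    return [sum(a[j] * w[j][k] for j in range(len(a))) + b[k] for k in range(len(b))]


def relu(z: Vector) -> Vector:
    return [max(0.0, v) for v in z]


def lrp_epsilon(a: Vector, w: Matrix, b: Vector, relevance_out: Vector, eps: float = 1e-6) -> Vector:
    """Redistribute one layer's output relevance onto its inputs (LRP-epsilon rule).

    R_j = sum_k a_j * w[j][k] / (z_k + eps * sign(z_k)) * R_k
    """
    z = dense(a, w, b)
    # The stabiliser eps pushes each denominator away from zero, keeping its sign.
    scaled = [r / (zk + (eps if zk >= 0 else -eps)) for r, zk in zip(relevance_out, z)]
    return [a[j] * sum(w[j][k] * scaled[k] for k in range(len(z))) for j in range(len(a))]


def explain(x: Vector, layers: List[Layer], target: int) -> Vector:
    """Forward pass (ReLU hidden layers, linear output), then propagate relevance back."""
    activations = [x]
    for i, (w, b) in enumerate(layers):
        z = dense(activations[-1], w, b)
        activations.append(z if i == len(layers) - 1 else relu(z))
    output = activations[-1]
    # The target output's score is the initial relevance; other outputs start at zero.
    relevance = [output[k] if k == target else 0.0 for k in range(len(output))]
    # Walk the layers backwards, pairing each layer with the activations that fed it.
    for (w, b), a in zip(reversed(layers), reversed(activations[:-1])):
        relevance = lrp_epsilon(a, w, b, relevance)
    return relevance


# Hypothetical toy network: 2 inputs -> 2 ReLU units -> 1 linear output, zero biases.
LAYERS: List[Layer] = [
    ([[1.0, -1.0], [0.5, 1.0]], [0.0, 0.0]),
    ([[1.0], [2.0]], [0.0]),
]
x = [1.0, 2.0]
relevance = explain(x, LAYERS, target=0)
print(relevance, sum(relevance))
```

With zero biases and a small ε, the input relevances add up (approximately) to the output score of 4, the conservation property LRP is built around. Here the first input ends up with a relevance of roughly -1 and the second roughly 5, showing that relevance can be negative.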

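Local interpretability methods attribute a single prediction to the input features; the Lundberg and Lee paper cited above ("A Unified Approach to Interpreting Model Predictions") builds on Shapley values for this. The sketch below computes exact Shapley values by brute-force enumeration of feature coalitions. It assumes that a feature "absent" from a coalition is replaced by a fixed baseline value. That baseline convention, the function `shapley_values` and the model `black_box` are hypothetical choices for illustration, not the cited paper's algorithm.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence

Model = Callable[[List[float]], float]


def shapley_values(f: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley attribution of f(x) - f(baseline) to each input feature."""
    n = len(x)

    def value(coalition: FrozenSet[int]) -> float:
        # Features in the coalition keep their real values; the others take baseline values.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! * (n - |S| - 1)! / n! for every coalition S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when it joins coalition S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical black-box model with an interaction between features 1 and 2.
def black_box(z: List[float]) -> float:
    return 3.0 * z[0] + 2.0 * z[1] * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(black_box, x, baseline)
print(phi, sum(phi))
```

Feature 0 receives 3, the interaction term is split evenly between features 1 and 2 (1 each), and the three values sum to f(x) - f(baseline) = 5, the efficiency property of Shapley values. The loop evaluates the model on 2^(n-1) coalitions per feature, so exact enumeration is only practical for a handful of features; practical tools rely on sampling or model-specific approximations.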
Explainable artificial intelligence: Difference between revisions