=== Explainability ===
Explainability is useful for ensuring that AI models are not making decisions based on irrelevant or otherwise unfair criteria. Several popular techniques exist for explaining model predictions:
* ''Partial dependence plots'' show the marginal effect of an input feature on the predicted outcome.
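The idea behind a partial dependence plot can be sketched in a few lines: sweep one feature over a grid of values, clamp every sample to each grid value, and average the model's predictions. The sketch below uses NumPy with a hypothetical toy model; the function name and data are illustrative, not part of any particular library.

```python
import numpy as np

def partial_dependence(model_fn, X, feature_idx, grid):
    """Average prediction as feature `feature_idx` sweeps over `grid`,
    marginalizing over the remaining features in X."""
    pd_values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = v  # clamp the feature to the grid value
        pd_values.append(model_fn(X_mod).mean())
    return np.array(pd_values)

# Toy model: linear in feature 0, quadratic in feature 1.
model = lambda X: 2.0 * X[:, 0] + X[:, 1] ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
grid = np.linspace(-2, 2, 5)
pdp = partial_dependence(model, X, feature_idx=0, grid=grid)
# The curve for feature 0 is a line of slope 2: the quadratic term from
# feature 1 is averaged out into a constant offset.
```

Plotting `pdp` against `grid` gives the partial dependence curve; the quadratic contribution of the other feature appears only as a vertical shift, which is exactly the "marginal effect" the plot isolates.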
Scholars sometimes use the term "mechanistic interpretability" to refer to the process of [[Reverse engineering|reverse-engineering]] [[artificial neural networks]] to understand their internal decision-making mechanisms and components, similar to how one might analyze a complex machine or computer program.<ref>{{Cite web |last=Olah |first=Chris |date=June 27, 2022 |title=Mechanistic Interpretability, Variables, and the Importance of Interpretable Bases |url=https://www.transformer-circuits.pub/2022/mech-interp-essay |access-date=2024-07-10 |website=www.transformer-circuits.pub}}</ref>
Interpretability research often focuses on generative pretrained transformers. It is particularly relevant for [[AI safety]] and [[AI alignment|alignment]], as understanding a model's internal mechanisms may help identify and correct undesired behaviors.
Studying the interpretability of the most advanced [[Foundation model|foundation models]] often involves searching for an automated way to identify "features" in generative pretrained transformers. In a [[Neural network (machine learning)|neural network]], a feature is a pattern of neuron activations that corresponds to a concept. A compute-intensive technique called "[[dictionary learning]]" makes it possible to identify features to some degree. Enhancing the ability to identify and edit features is expected to significantly improve the [[AI safety|safety]] of [[Frontier model|frontier AI models]].<ref>{{Cite web |last=Ropek |first=Lucas |date=2024-05-21 |title=New Anthropic Research Sheds Light on AI's 'Black Box' |url=https://gizmodo.com/new-anthropic-research-sheds-light-on-ais-black-box-1851491333 |access-date=2024-05-23 |website=Gizmodo |language=en}}</ref><ref>{{Cite magazine |last=Perrigo |first=Billy |date=2024-05-21 |title=Artificial Intelligence Is a 'Black Box.' Maybe Not For Long |url=https://time.com/6980210/anthropic-interpretability-ai-safety-research/ |access-date=2024-05-24 |magazine=Time |language=en}}</ref>
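One common instantiation of dictionary learning for feature identification is a sparse autoencoder trained on a layer's activations: an overcomplete set of candidate "features" is learned so that each activation vector is reconstructed from a sparse combination of them. The following is a minimal NumPy sketch of that idea under toy assumptions (tiny dimensions, synthetic activations, plain gradient descent); real systems operate at vastly larger scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "neuron activations": 256 samples of a 16-dim hidden layer,
# secretly generated from 4 sparse ground-truth features.
true_features = rng.normal(size=(4, 16))
codes = rng.random(size=(256, 4)) * (rng.random(size=(256, 4)) < 0.3)
activations = codes @ true_features
initial_error = np.mean(activations ** 2)  # error of the all-zero reconstruction

# Overcomplete dictionary: 32 candidate features for a 16-dim space.
n_features, dim = 32, 16
W_enc = rng.normal(scale=0.1, size=(dim, n_features))
W_dec = rng.normal(scale=0.1, size=(n_features, dim))
l1_penalty, lr = 1e-3, 0.05
n = len(activations)

for step in range(500):
    f = np.maximum(activations @ W_enc, 0.0)  # sparse feature activations (ReLU)
    recon = f @ W_dec                         # reconstruct the layer from features
    err = recon - activations
    # Gradients of 0.5*||err||^2 + l1_penalty*||f||_1 w.r.t. both weight matrices.
    grad_f = (err @ W_dec.T + l1_penalty) * (f > 0)  # ReLU subgradient mask
    W_dec -= lr * f.T @ err / n
    W_enc -= lr * activations.T @ grad_f / n

f = np.maximum(activations @ W_enc, 0.0)
reconstruction_error = np.mean((f @ W_dec - activations) ** 2)
```

After training, rows of `W_dec` that fire together with high feature activations are candidate interpretable features; the L1 penalty is what pushes each activation vector to be explained by only a few of them.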