{{Short description|Version of a transformer large language model}}
A '''1.58-bit Large Language Model''' ('''1.58-bit LLM''', also '''ternary LLM''') is a version of a [[Transformer (deep learning architecture)|transformer]] [[large language model]] whose weights take only three values: −1, 0, and +1. This restriction theoretically allows the model to replace costly multiplications with additions and to reduce its memory footprint. Since the end-task performance and [[Perplexity (LLM)|perplexity]] of 1.58-bit LLMs, at least at smaller model sizes (up to 3–4B parameters), are close to those of their "full-precision" (16-bit [[FP16]] or [[BF16]]) counterparts, this design can reach the same [[artificial intelligence]] goals with much lower hardware requirements, latency, and training effort.{{sfn|Ma|Wang|Ma|Wang|2024|p=1}}{{sfn|Friha|Amine Ferrag|Kantarci|Cakmak|2024|p=5822}}{{sfn|Hutson|2024}}
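
The replacement of multiplications by additions can be illustrated with a minimal sketch. The following Python function is an illustrative example only, not taken from any published implementation: it computes a matrix–vector product with ternary weights using nothing but additions, subtractions, and skipped terms.

<syntaxhighlight lang="python">
# Illustrative sketch (not a cited implementation): a matrix-vector
# product where every weight is -1, 0, or +1, so each output element
# is formed purely by additions and subtractions of activations.
from typing import List

def ternary_matvec(weights: List[List[int]], x: List[float]) -> List[float]:
    """Multiply a ternary weight matrix (entries in {-1, 0, +1}) by x
    without performing any weight multiplications."""
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:        # +1: add the activation
                acc += xi
            elif w == -1:     # -1: subtract the activation
                acc -= xi
            # 0: the term is skipped entirely (built-in sparsity)
        out.append(acc)
    return out

# Example: a 2x3 ternary weight matrix applied to a 3-vector.
W = [[ 1, 0, -1],
     [-1, 1,  1]]
print(ternary_matvec(W, [0.5, -2.0, 1.5]))  # [-1.0, -1.0]
</syntaxhighlight>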