1.58-bit large language model

A '''1.58-bit Large Language Model''' ('''1.58-bit LLM''', also '''ternary LLM''') is a version of a [[Transformer (deep learning architecture)|transformer]] [[large language model]] whose weights take only three values: -1, 0, and +1. In principle, this restriction allows the model to replace costly multiplications with additions and reduces the memory needed to store the weights. Since the end-task performance and [[Perplexity (LLM)|perplexity]] of 1.58-bit LLMs, at least for smaller model sizes (up to 3-4B parameters), are close to those of their "full-precision" (16-bit [[FP16]] or [[BF16]]) counterparts, this design allows reaching the same [[artificial intelligence]] goals with much lower hardware requirements, latency, and training effort.{{sfn|Ma|Wang|Ma|Wang|2024|p=1}}{{sfn|Friha|Amine Ferrag|Kantarci|Cakmak|2024|p=5822}}{{sfn|Hutson|2024}}
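As an illustration of why ternary weights remove the need for multiplications, a matrix–vector product with weights restricted to {-1, 0, +1} reduces to adding and subtracting activations. The following NumPy sketch is illustrative only and is not taken from the cited papers:

<syntaxhighlight lang="python">
import numpy as np

def ternary_matvec(w_ternary, x):
    """Apply a ternary weight matrix (entries in {-1, 0, +1}) to a vector x.

    Because every weight is -1, 0 or +1, each output element is simply a sum
    of (possibly negated) activations, so no multiplications are required.
    """
    out = np.zeros(w_ternary.shape[0], dtype=x.dtype)
    for i, row in enumerate(w_ternary):
        # add activations where the weight is +1, subtract where it is -1
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

# Toy example: a 3x4 ternary weight matrix applied to a length-4 activation vector
w = np.array([[ 1, 0, -1,  1],
              [ 0, 1,  1, -1],
              [-1, 0,  0,  1]])
x = np.array([0.5, -1.2, 2.0, 0.3])
print(ternary_matvec(w, x))  # identical to w @ x, computed without multiplications
</syntaxhighlight>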
 
The name comes from the fact that a single [[Ternary numeral system|trit]], the [[ternary arithmetic]] equivalent of a bit that can take the values {-1, 0, +1}, carries <math>\log_2 3 \approx 1.58</math> [[bits of information]]. 1.58-bit LLMs are also called '''1-bit LLMs'''{{sfn|Ma|Wang|Ma|Wang|2024|p=1}}{{sfn|Morales|2025}} (true 1-bit models, with binary weights, also exist).
 
== BitNet ==
In 2024, Ma et al., researchers at [[Microsoft]], declared that their 1.58-bit model '''''BitNet''' b1.58'' is comparable in performance to the 16-bit [[Llama 2]] and opens the era of 1-bit LLMs.{{sfn|Huyen|2024|p=330}} The BitNet creators did not use post-training quantization of weights; instead, they relied on a new ''BitLinear'' transform that replaces the ''nn.Linear'' layer of the traditional transformer design.{{sfn|Wang|Ma|Dong|Huang|2023|p=1}}
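The following PyTorch sketch illustrates, in simplified form, how a BitLinear-style layer can ternarize its weights on the forward pass using the per-tensor "absmean" scaling described by the BitNet authors. It is a minimal illustration rather than the official implementation: it omits the activation quantization, normalization, and optimized kernels of the actual BitNet code, and the class name is invented.

<syntaxhighlight lang="python">
import torch
from torch import nn

class BitLinearSketch(nn.Module):
    """Minimal illustrative stand-in for a BitLinear layer (not the official code).

    Full-precision weights are kept for training; on each forward pass they are
    ternarized to {-1, 0, +1} using a per-tensor "absmean" scale, and a
    straight-through estimator lets gradients reach the full-precision weights.
    """

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight)

    def forward(self, x):
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-5)        # absmean scaling factor
        w_ternary = (w / scale).round().clamp(-1, 1)  # entries become -1, 0 or +1
        # Straight-through estimator: use the ternarized weights in the forward
        # pass while gradients flow to the underlying full-precision weights.
        w_q = w + (w_ternary * scale - w).detach()
        return nn.functional.linear(x, w_q)

# Used as a drop-in replacement for nn.Linear inside a transformer block:
layer = BitLinearSketch(16, 8)
print(layer(torch.randn(2, 16)).shape)  # torch.Size([2, 8])
</syntaxhighlight>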
 
In 2025, Microsoft researchers released ''BitNet b1.58 2B4T'', a model with [[open-weights|open weights]] and open inference code, demonstrating performance competitive with full-precision models at 2 billion parameters and 4 trillion training tokens.{{sfn|Ma|Wang|Huang|Zhang|2025|p=}}
==Sources==
* {{cite arXiv |last=Ma |first=Shuming |last2=Wang |first2=Hongyu |last3=Ma |first3=Lingxiao |last4=Wang |first4=Lei |last5=Wang |first5=Wenhui |last6=Huang |first6=Shaohan |last7=Dong |first7=Li |last8=Wang |first8=Ruiping |last9=Xue |first9=Jilong |last10=Wei |first10=Furu |title=The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits |arxiv=2402.17764 |date=2024-02-27 }}
* {{citation |last=Ma |first=Shuming |last2=Wang |first2=Hongyu |last3=Huang |first3=Shaohan |last4=Zhang |first4=Xingxing |last5=Hu |first5=Ying |last6=Song |first6=Ting |last7=Xia |first7=Yan |last8=Wei |first8=Furu |title=BitNet b1.58 2B4T Technical Report |date=2025 |doi=10.48550/ARXIV.2504.12285 |url=https://arxiv.org/abs/2504.12285 |access-date=2025-04-22}}
* {{cite journal |last=Friha |first=Othmane |last2=Amine Ferrag |first2=Mohamed |last3=Kantarci |first3=Burak |last4=Cakmak |first4=Burak |last5=Ozgun |first5=Arda |last6=Ghoualmi-Zine |first6=Nassira |title=LLM-Based Edge Intelligence: A Comprehensive Survey on Architectures, Applications, Security and Trustworthiness |journal=IEEE Open Journal of the Communications Society |volume=5 |date=2024 |issn=2644-125X |doi=10.1109/OJCOMS.2024.3456549 |doi-access=free |pages=5799–5856}}
* {{cite journal |title=1-bit LLMs Could Solve AI's Energy Demands |journal=IEEE Spectrum |date=2024-05-30 |url=https://spectrum.ieee.org/1-bit-llm |first=Matthew |last=Hutson |access-date=2025-04-22}}
* {{cite book |last=Huyen |first=Chip |title=AI Engineering |publisher=O'Reilly Media |date=2024-12-04 |isbn=978-1-0981-6627-4 |url=https://www.google.com/books/edition/AI_Engineering/S7M1EQAAQBAJ?hl=en&gbpv=1&pg=PA330 |access-date=2025-04-22}}
* {{citation |last=Kumar |first=Tanishq |last2=Ankner |first2=Zachary |last3=Spector |first3=Benjamin F. |last4=Bordelon |first4=Blake |last5=Muennighoff |first5=Niklas |last6=Paul |first6=Mansheej |last7=Pehlevan |first7=Cengiz |last8=Ré |first8=Christopher |last9=Raghunathan |first9=Aditi |title=Scaling Laws for Precision |date=2024 |doi=10.48550/ARXIV.2411.04330 |doi-access=free |url=http://arxiv.org/pdf/2411.04330 |access-date=2025-04-22}}
* {{cite web |last=Morales |first=Jowi |title=Microsoft researchers build 1-bit AI LLM with 2B parameters |website=Tom's Hardware |date=2025-04-17 |url=https://www.tomshardware.com/tech-industry/artificial-intelligence/microsoft-researchers-build-1-bit-ai-llm-with-2b-parameters-model-small-enough-to-run-on-some-cpus |access-date=2025-04-21}}
* {{citation |last=Ouyang |first=Xu |last2=Ge |first2=Tao |last3=Hartvigsen |first3=Thomas |last4=Zhang |first4=Zhisong |last5=Mi |first5=Haitao |last6=Yu |first6=Dong |title=Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens |date=2024 |doi=10.48550/ARXIV.2411.17691 |doi-access=free |url=http://arxiv.org/pdf/2411.17691 |access-date=2025-04-22}}
* {{citation |last=Wang |first=Hongyu |last2=Ma |first2=Shuming |last3=Dong |first3=Li |last4=Huang |first4=Shaohan |last5=Wang |first5=Huaijie |last6=Ma |first6=Lingxiao |last7=Yang |first7=Fan |last8=Wang |first8=Ruiping |last9=Wu |first9=Yi |last10=Wei |first10=Furu |title=BitNet: Scaling 1-bit Transformers for Large Language Models |date=2023 |doi=10.48550/ARXIV.2310.11453 |doi-access=free |url=https://arxiv.org/abs/2310.11453 |access-date=2025-04-23}}
 
[[Category:Large language models]]
 
 
{{ai-stub}}