A '''1.58-bit large language model''' ('''1.58-bit LLM''') is a version of a [[Transformer (deep learning architecture)|transformer]] [[large language model]] whose weights are restricted to three values: -1, 0, and +1. In principle, this restriction lets the model replace costly multiplications with additions and reduces the memory needed to store the weights. Since the end-task performance and [[Perplexity (LLM)|perplexity]] of 1.58-bit LLMs, at least for smaller model sizes (up to about 3-4 billion parameters), are close to those of their "full precision" (16-bit [[FP16]] or [[BF16]]) counterparts, this design allows the same [[artificial intelligence]] tasks to be performed with much lower hardware requirements, latency, and training effort.{{sfn|Ma|Wang|Ma|Wang|2024|p=1}}{{sfn|Friha|Amine Ferrag|Kantarci|Cakmak|2024|p=5822}}
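
The arithmetic simplification can be shown with a minimal sketch (the function name and example values here are illustrative, not taken from the cited papers): when every weight is -1, 0, or +1, a matrix-vector product reduces to additions and subtractions of the input activations.

<syntaxhighlight lang="python">
def ternary_matvec(weights, x):
    """Illustrative matrix-vector product with ternary weights.

    Because every weight is -1, 0 or +1, each output element is a plain
    sum and difference of input activations; no weight multiplications
    are performed.
    """
    result = []
    for row in weights:               # one row of ternary weights per output element
        acc = 0.0
        for w, value in zip(row, x):
            if w == 1:
                acc += value          # +1: add the activation
            elif w == -1:
                acc -= value          # -1: subtract the activation
            # 0: the activation is skipped entirely
        result.append(acc)
    return result

# Example: a 2x3 ternary weight matrix applied to a length-3 input
print(ternary_matvec([[1, 0, -1], [-1, 1, 1]], [0.5, -2.0, 3.0]))
# [0.5 - 3.0, -0.5 + (-2.0) + 3.0] = [-2.5, 0.5]
</syntaxhighlight>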
 
The name comes from the fact that a single [[Ternary numeral system|trit]], the [[ternary arithmetic]] equivalent of a bit that can take the values -1, 0, and +1, carries <math>\log_2 3 \approx 1.58</math> [[bits of information]]. 1.58-bit LLMs are also called '''1-bit LLMs'''.{{sfn|Ma|Wang|Ma|Wang|2024|p=1}}{{sfn|Morales|2025}}
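
One way to see the storage implication (a simple sketch; the packing scheme below is illustrative and not drawn from the cited papers) is that five ternary weights fit in a single byte, since <math>3^5 = 243 \le 256</math>, which amounts to 8/5 = 1.6 bits per weight.

<syntaxhighlight lang="python">
def pack_trits(weights):
    """Pack ternary weights {-1, 0, +1} five per byte (3**5 = 243 <= 256)."""
    packed = []
    for i in range(0, len(weights), 5):
        byte = 0
        for w in reversed(weights[i:i + 5]):
            byte = byte * 3 + (w + 1)     # map {-1, 0, +1} to base-3 digits {0, 1, 2}
        packed.append(byte)
    return bytes(packed)

def unpack_trits(packed, count):
    """Recover `count` ternary weights from the packed bytes."""
    weights = []
    for byte in packed:
        for _ in range(5):
            weights.append(byte % 3 - 1)  # base-3 digit back to {-1, 0, +1}
            byte //= 3
    return weights[:count]

ws = [1, -1, 0, 0, 1, -1, 1]
assert unpack_trits(pack_trits(ws), len(ws)) == ws   # 7 weights stored in 2 bytes
</syntaxhighlight>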
 
In 2025, Microsoft researchers released ''BitNet b1.58 2B4T'', an [[open-weights]] model with 2 billion parameters trained on 4 trillion tokens, demonstrating performance competitive with full-precision models of comparable size.{{sfn|Ma|Wang|Huang|Zhang|2025|p=}}
 
==References==