1.58-bit large language model

The name comes from the fact that a single [[Ternary numeral system|trit]], the [[ternary arithmetic]] equivalent of a bit that can take the values {−1, 0, 1}, carries <math>\log_2 3 \approx 1.58</math> [[bits of information]]. 1.58-bit LLMs are also called '''1-bit LLMs'''{{sfn|Ma|Wang|Ma|Wang|2024|p=1}}{{sfn|Morales|2025}}, although true 1-bit models also exist.
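As an illustration of this figure, five trits can be packed into eight bits, since <math>3^5 = 243 \le 2^8 = 256</math>; such a packing costs 1.6 bits per trit, slightly above the theoretical minimum of <math>\log_2 3 \approx 1.585</math> bits.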
 
== BitNet ==
The BitNet creators did not use post-training quantization of weights; instead, they relied on a new BitLinear transform that replaces the nn.Linear layer of the traditional transformer design.{{sfn|Wang|Ma|Dong|Huang|2023|p=1}}
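The idea can be illustrated with a simplified PyTorch layer: full-precision ("latent") weights are kept for training and are mapped on the fly to the ternary values {−1, 0, +1} with an absolute-mean scale in each forward pass. The class below is a minimal, hypothetical sketch (the name ''BitLinearSketch'' and the exact scaling details are illustrative assumptions, not the authors' published code); activation quantization and normalization are omitted.

<syntaxhighlight lang="python">
# Illustrative sketch of a BitLinear-style layer (hypothetical, simplified).
# Latent weights stay in full precision; they are mapped to {-1, 0, +1} on the fly,
# so no post-training quantization step is required.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinearSketch(nn.Linear):
    """Drop-in replacement for nn.Linear with ternary (1.58-bit) weights."""

    def _ternarize(self, w: torch.Tensor) -> torch.Tensor:
        # Scale by the mean absolute weight, then round each entry to -1, 0 or +1.
        gamma = w.abs().mean().clamp(min=1e-5)
        w_q = (w / gamma).round().clamp(-1, 1) * gamma
        # Straight-through estimator: forward pass uses w_q, gradients flow to the latent w.
        return w + (w_q - w).detach()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self._ternarize(self.weight), self.bias)

# Example: replace nn.Linear(in_features, out_features) with the sketch layer.
layer = BitLinearSketch(4, 2, bias=False)
print(layer(torch.randn(3, 4)).shape)  # torch.Size([3, 2])
</syntaxhighlight>

In such a scheme, the straight-through estimator lets gradients bypass the non-differentiable rounding step, which is what allows quantization to be part of training rather than a post-training procedure.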
 
In 2025, Microsoft researchers released an [[open-weights]] model, ''BitNet b1.58 2B4T'', with 2 billion parameters trained on 4 trillion tokens, demonstrating performance competitive with full-precision models.{{sfn|Ma|Wang|Huang|Zhang|2025|p=}}