1.58-bit large language model

 
In 2025, Microsoft researchers released an [[open-weights]] model, ''BitNet b1.58 2B4T'', demonstrating performance competitive with full-precision models at 2 billion parameters and 4 trillion training tokens.{{sfn|Ma|Wang|Huang|Zhang|2025|p=}}
 
== Critique ==
Some researchers{{sfn|Ouyang|Ge|Hartvigsen|Zhang|2024|p=}} point out that the scaling laws of large language models favor low-bit weights only in the case of undertrained models: as the number of training tokens increases, the deficiencies of low-bit quantization surface.
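
The following is a minimal, illustrative sketch of the ternary ("absmean") weight quantization generally associated with 1.58-bit models such as BitNet b1.58. The helper name <code>absmean_ternary_quantize</code>, the NumPy setup, and the per-tensor scaling are assumptions made for illustration rather than details taken from the cited sources. Each weight is mapped to one of the three values {−1, 0, +1}, i.e. about log<sub>2</sub> 3 ≈ 1.58 bits of information per weight, which is the "low-bit quantization" the critique refers to.

<syntaxhighlight lang="python">
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize a weight matrix to the ternary set {-1, 0, +1}.

    Illustrative "absmean" scheme: weights are scaled by their mean
    absolute value, then rounded and clipped to {-1, 0, +1}.  The scale
    is returned so outputs can be rescaled at matmul time.  Details are
    assumptions for illustration, not taken from the cited sources.
    """
    scale = np.abs(w).mean() + eps               # per-tensor absmean scale
    w_ternary = np.clip(np.round(w / scale), -1, 1)
    return w_ternary.astype(np.int8), scale

# Example: quantize a small weight matrix and measure reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
w_q, scale = absmean_ternary_quantize(w)
print(w_q)                                       # entries are -1, 0 or +1 (~log2(3) = 1.58 bits each)
print(np.abs(w - w_q * scale).mean())            # mean absolute quantization error
</syntaxhighlight>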
 
== References ==