1.58-bit large language model

 
In 2025, Microsoft researchers released an [[open-weights]] model, ''BitNet b1.58 2B4T'', demonstrating performance competitive with full-precision models at 2 billion parameters and 4 trillion training tokens.{{sfn|Ma|Wang|Huang|Zhang|2025|p=}}
 
== Critique ==
Some researchers{{sfn|Ouyang|Ge|Hartvigsen|Zhang|2024|p=}} point out that the scaling laws of large language models favor low-bit weights only in the case of undertrained models: as the number of training tokens increases, the deficiencies of low-bit quantization surface.
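
The following is a minimal, illustrative sketch of the ternary ("absmean") weight quantization generally associated with 1.58-bit models such as BitNet b1.58. The helper name <code>absmean_ternary_quantize</code>, the NumPy setup, and the per-tensor scaling are assumptions made for illustration rather than details taken from the cited sources. Each weight is mapped to one of the three values {−1, 0, +1}, i.e. about log<sub>2</sub> 3 ≈ 1.58 bits of information per weight, which is the "low-bit quantization" the critique refers to.

<syntaxhighlight lang="python">
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize a weight matrix to the ternary set {-1, 0, +1}.

    Illustrative "absmean" scheme: weights are scaled by their mean
    absolute value, then rounded and clipped to {-1, 0, +1}.  The scale
    is returned so outputs can be rescaled at matmul time.  Details are
    assumptions for illustration, not taken from the cited sources.
    """
    scale = np.abs(w).mean() + eps               # per-tensor absmean scale
    w_ternary = np.clip(np.round(w / scale), -1, 1)
    return w_ternary.astype(np.int8), scale

# Example: quantize a small weight matrix and measure reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
w_q, scale = absmean_ternary_quantize(w)
print(w_q)                                       # entries are -1, 0 or +1 (~log2(3) = 1.58 bits each)
print(np.abs(w - w_q * scale).mean())            # mean absolute quantization error
</syntaxhighlight>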
 
== References ==