== BitNet ==
BitNet's creators did not use post-training quantization of weights; instead, they relied on a new ''BitLinear'' transform that replaces the ''nn.Linear'' layer of the traditional transformer design.{{sfn|Wang|Ma|Dong|Huang|2023|p=1}}
In 2025, Microsoft researchers released ''BitNet b1.58 2B4T'', a model with [[open-weights|open weights]] and open inference code, demonstrating performance competitive with full-precision models at 2B parameters and 4T training tokens.{{sfn|Ma|Wang|Huang|Zhang|2025|p=}}
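The "1.58-bit" naming refers to ternary weights in {−1, 0, +1} (log₂ 3 ≈ 1.58 bits per weight), produced by absmean quantization: weights are divided by the mean absolute weight value and rounded to the nearest ternary value. The following is a minimal illustrative sketch of that quantization step in pure Python, assuming dense list-of-lists matrices; the function names are hypothetical and not taken from any official implementation.

```python
def absmean_ternary(weights, eps=1e-8):
    """Quantize a weight matrix (list of rows) to {-1, 0, +1} plus a scale.

    The scale gamma is the mean absolute value of all weights; each weight
    is divided by gamma and rounded, then clamped to the ternary set.
    (Illustrative sketch only, not the official BitNet code.)
    """
    flat = [w for row in weights for w in row]
    gamma = sum(abs(w) for w in flat) / max(len(flat), 1) + eps
    quantized = [[max(-1, min(1, round(w / gamma))) for w in row]
                 for row in weights]
    return quantized, gamma

def bitlinear_matvec(weights, x):
    """y = (Q x) * gamma: matrix-vector product with the ternary matrix Q,
    rescaled by gamma to approximate the full-precision product."""
    q, gamma = absmean_ternary(weights)
    return [gamma * sum(qi * xi for qi, xi in zip(row, x)) for row in q]
```

Because the quantized matrix contains only −1, 0, and +1, the inner products reduce to additions and subtractions, which is the source of BitNet's claimed inference efficiency.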