1.58-bit large language model

* {{cite arXiv |eprint=2411.17691 |last1=Ouyang |first1=Xu |last2=Ge |first2=Tao |last3=Hartvigsen |first3=Thomas |last4=Zhang |first4=Zhisong |last5=Mi |first5=Haitao |last6=Yu |first6=Dong |title=Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens |date=2024 |class=cs.LG }}
* {{cite arXiv |eprint=2310.11453 |last1=Wang |first1=Hongyu |last2=Ma |first2=Shuming |last3=Dong |first3=Li |last4=Huang |first4=Shaohan |last5=Wang |first5=Huaijie |last6=Ma |first6=Lingxiao |last7=Yang |first7=Fan |last8=Wang |first8=Ruiping |last9=Wu |first9=Yi |last10=Wei |first10=Furu |title=BitNet: Scaling 1-bit Transformers for Large Language Models |date=2023 |class=cs.CL }}
 
{{Generative AI}}
 
[[Category:Large language models]]