1.58-bit large language model

* {{cite arXiv |eprint=2411.17691 |last1=Ouyang |first1=Xu |last2=Ge |first2=Tao |last3=Hartvigsen |first3=Thomas |last4=Zhang |first4=Zhisong |last5=Mi |first5=Haitao |last6=Yu |first6=Dong |title=Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens |date=2024 |class=cs.LG }}
* {{cite arXiv |eprint=2310.11453 |last1=Wang |first1=Hongyu |last2=Ma |first2=Shuming |last3=Dong |first3=Li |last4=Huang |first4=Shaohan |last5=Wang |first5=Huaijie |last6=Ma |first6=Lingxiao |last7=Yang |first7=Fan |last8=Wang |first8=Ruiping |last9=Wu |first9=Yi |last10=Wei |first10=Furu |title=BitNet: Scaling 1-bit Transformers for Large Language Models |date=2023 |class=cs.CL }}
 
{{Generative AI}}
 
[[Category:Large language models]]