==Sources==
* {{cite arXiv |eprint=2504.12285 |last1=Ma |first1=Shuming |last2=Wang |first2=Hongyu |last3=Huang |first3=Shaohan |last4=Zhang |first4=Xingxing |last5=Hu |first5=Ying |last6=Song |first6=Ting |last7=Xia |first7=Yan |last8=Wei |first8=Furu |title=BitNet b1.58 2B4T Technical Report |date=2025 |class=cs.CL }}
* {{cite journal |title=1-bit LLMs Could Solve AI's Energy Demands |journal=IEEE Spectrum |date=2024-05-30 |url=https://spectrum.ieee.org/1-bit-llm |first=Matthew |last=Hutson |access-date=2025-04-22}}
* {{cite book |last=Huyen |first=Chip |title=AI Engineering |publisher=O'Reilly Media |date=2024-12-04 |isbn=978-1-0981-6627-4}}
* {{cite arXiv |eprint=2411.04330 |last1=Kumar |first1=Tanishq |last2=Ankner |first2=Zachary |last3=Spector |first3=Benjamin F. |last4=Bordelon |first4=Blake |last5=Muennighoff |first5=Niklas |last6=Paul |first6=Mansheej |last7=Pehlevan |first7=Cengiz |last8=Ré |first8=Christopher |last9=Raghunathan |first9=Aditi |title=Scaling Laws for Precision |date=2024 |class=cs.LG }}
* {{cite web |last=Morales |first=Jowi |title=Microsoft researchers build 1-bit AI LLM with 2B parameters |website=Tom's Hardware |date=2025-04-17 |url=https://www.tomshardware.com/tech-industry/artificial-intelligence/microsoft-researchers-build-1-bit-ai-llm-with-2b-parameters-model-small-enough-to-run-on-some-cpus |access-date=2025-04-21}}
* {{cite arXiv |eprint=2411.17691 |last1=Ouyang |first1=Xu |last2=Ge |first2=Tao |last3=Hartvigsen |first3=Thomas |last4=Zhang |first4=Zhisong |last5=Mi |first5=Haitao |last6=Yu |first6=Dong |title=Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens |date=2024 |class=cs.LG }}
* {{cite arXiv |eprint=2310.11453 |last1=Wang |first1=Hongyu |last2=Ma |first2=Shuming |last3=Dong |first3=Li |last4=Huang |first4=Shaohan |last5=Wang |first5=Huaijie |last6=Ma |first6=Lingxiao |last7=Yang |first7=Fan |last8=Wang |first8=Ruiping |last9=Wu |first9=Yi |last10=Wei |first10=Furu |title=BitNet: Scaling 1-bit Transformers for Large Language Models |date=2023 |class=cs.CL }}
[[Category:Large language models]]