{{Short description|Large language model with ternary weights}}
A '''1.58-bit large language model''' (also known as a '''ternary LLM''') is a type of [[large language model]] (LLM) designed to be computationally efficient. It achieves this by using [[neural network#in machine learning|weights]] that are restricted to only three values: −1, 0, and +1. This restriction significantly reduces the model's memory footprint and allows for faster processing, as multiplication operations can be replaced with simpler additions and subtractions. This contrasts with traditional models, which store their weights as 16-bit floating-point numbers ([[FP16]] or [[BF16]]).
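As an illustration, the following minimal sketch shows how a dot product over ternary weights can be evaluated using only additions and subtractions; it is not drawn from any particular published implementation, and the function name <code>ternary_dot</code> is hypothetical.

<syntaxhighlight lang="python">
def ternary_dot(weights, activations):
    """Dot product where every weight is -1, 0, or +1.

    Because the weights are ternary, no multiplications are needed:
    each activation is added, subtracted, or skipped.
    """
    total = 0.0
    for w, x in zip(weights, activations):
        if w == 1:
            total += x   # weight +1: add the activation
        elif w == -1:
            total -= x   # weight -1: subtract the activation
        # weight 0: the activation contributes nothing
    return total

# Example: weights [-1, 0, 1] applied to activations [0.5, 2.0, 3.0]
# give -0.5 + 0.0 + 3.0 = 2.5
print(ternary_dot([-1, 0, 1], [0.5, 2.0, 3.0]))  # 2.5
</syntaxhighlight>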
Studies have shown that for models up to several billion parameters, the performance of 1.58-bit LLMs on various tasks is comparable to their full-precision counterparts.{{sfn|Ma|Wang|Ma|Wang|2024|p=1}}{{sfn|Hutson|2024}} This approach could enable powerful AI to run on less specialized and lower-power hardware.{{sfn|Friha|Amine Ferrag|Kantarci|Cakmak|2024|p=5822}}
The name "1.58-bit" comes from the fact that a system with three states contains <math>\log_2 3 \approx 1.58</math> [[bit|bits]] of [[information theory|information]]. These models are sometimes also referred to as '''1-bit LLMs''' in research papers, although this term can also refer to true binary models (with weights of -1 and +1).{{sfn|Ma|Wang|Ma|Wang|2024|p=1}}{{sfn|Morales|2025}}
== BitNet ==
{{redirect|BitNet|a computer network|BITNET}}