Large language model: Difference between revisions

To determine which tokens are relevant to each other within the scope of the context window, the attention mechanism calculates "soft" weights for each token, more precisely for its embedding, using multiple attention heads, each of which computes its own "relevance" scores and thus its own soft weights. For example, the small (117M-parameter) [[GPT-2]] model has twelve attention heads and a context window of only 1k tokens.<ref name="Jay_Allamar_GPT2">{{Cite web | last=Alammar | first=Jay | title=The Illustrated GPT-2 (Visualizing Transformer Language Models) |url=https://jalammar.github.io/illustrated-gpt2/ |access-date=2023-08-01 |language=en}}</ref> The medium version has 345M parameters and contains 24 layers, each with 12 attention heads. For training with gradient descent, a batch size of 512 was used.<ref name="2022Book_"/>
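The per-head soft-weight computation described above can be sketched as follows (a toy illustration only: random matrices stand in for learned query/key projections, and the sequence length and embedding size are arbitrary):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(embeddings, n_heads, rng):
    """Compute per-head "soft" attention weights over a token sequence.

    embeddings: (seq_len, d_model) array of token embeddings.
    Returns an (n_heads, seq_len, seq_len) array where entry [h, i, j]
    says how strongly token i attends to token j under head h.
    """
    seq_len, d_model = embeddings.shape
    d_head = d_model // n_heads
    weights = []
    for _ in range(n_heads):
        # Each head has its own projections (learned in a real model;
        # random here purely for illustration).
        W_q = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        W_k = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        Q = embeddings @ W_q
        K = embeddings @ W_k
        scores = Q @ K.T / np.sqrt(d_head)        # pairwise relevance scores
        weights.append(softmax(scores, axis=-1))  # each row sums to 1: "soft" weights
    return np.stack(weights)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((5, 48))  # 5 tokens in the context window, toy embedding size 48
w = attention_weights(tokens, n_heads=12, rng=rng)
print(w.shape)  # (12, 5, 5): one seq_len x seq_len weight matrix per head
```

Because each row of a head's weight matrix is a softmax over all tokens in the window, every token receives a nonzero ("soft") share of attention rather than a hard yes/no selection.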
 
The largest models, such as Google's [[Gemini (language model)|Gemini 1.5]], presented in February 2024, can have a context window of up to 1 million tokens (a context window of 10 million was also "successfully tested").<ref>{{cite web |title=Our next-generation model: Gemini 1.5 |url=https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#context-window |website=Google |access-date=18 February 2024 |language=en-us |date=15 February 2024}}</ref> Other models with large context windows include Anthropic's Claude 2.1, with a context window of up to 200k tokens.<ref>{{cite web |url=https://www.anthropic.com/news/claude-2-1-prompting |title=Long context prompting for Claude 2.1 |date=December 6, 2023 |access-date=January 20, 2024}}</ref> Note that this maximum refers to the number of input tokens and that the maximum number of output tokens differs from the input and is often smaller. For example, the GPT-4 Turbo model has a maximum output of 4096 tokens.<ref>{{cite web |url=https://platform.openai.com/docs/guides/rate-limits |title=Rate limits |author=<!--Not stated--> |website=openai.com |access-date=January 20, 2024}}</ref>
 
The length of a conversation that the model can take into account when generating its next answer is likewise limited by the size of the context window. If a conversation, for example with [[ChatGPT]], is longer than the context window, only the parts inside the window are taken into account when generating the next answer, unless the model applies some algorithm to summarize the more distant parts of the conversation.
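The simplest such policy, dropping the oldest messages until the conversation fits, can be sketched as below. This is an illustration, not any particular chat system's implementation: real systems count subword tokens with the model's tokenizer, whereas here whitespace-separated words stand in for tokens.

```python
def fit_to_context(messages, max_tokens, count_tokens=lambda m: len(m.split())):
    """Keep only the most recent messages that fit in the context window.

    Simplified sketch: walks the conversation from newest to oldest and
    stops once the token budget is exhausted, so the oldest messages are
    the ones that fall outside the window.
    """
    kept, used = [], 0
    for msg in reversed(messages):  # newest first
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break                   # this and all older messages are dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))     # restore chronological order

history = [
    "hello there",                      # 2 "tokens" (oldest)
    "how can I help you today",         # 6 "tokens"
    "summarize this article please",    # 4 "tokens" (newest)
]
print(fit_to_context(history, max_tokens=10))
# ['how can I help you today', 'summarize this article please']
```

With a budget of 10, the two newest messages (6 + 4 tokens) fit exactly, and the oldest is discarded; a summarization-based approach would instead compress that older part rather than drop it.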