{{Short description|Language models designed for reasoning tasks}}
{{Merge to|Reflection (artificial intelligence)|date=April 2025}}
{{unreliable sources|date=January 2025}}
'''Reasoning language models''' ('''RLMs''') are [[large language model]]s that have been further trained to solve multi‑step reasoning tasks.<ref>{{cite arXiv |title=Reasoning Language Models: A Blueprint |last=Besta |first=Maciej |date=2025-01-23 |eprint=2501.11223 |class=cs.CL}}</ref> Compared with conventional LLMs, these models tend to perform better on logical, mathematical, and programming tasks, can [[Backtracking|backtrack]] and revise intermediate reasoning steps before committing to a final answer, and use test-time compute as an additional [[Neural scaling law|scaling axis]] alongside [[Training, validation, and test data sets|training data]], parameter count, and train-time compute.
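A simple illustration of test-time scaling is self-consistency sampling: the model generates several independent reasoning chains for the same question and the most common final answer is returned, so spending more compute at inference time can improve accuracy. The following minimal Python sketch assumes a hypothetical <code>generate_answer</code> function that queries a model once and returns its final answer as a string; it is illustrative only, not a reference implementation.

<syntaxhighlight lang="python">
import collections

def self_consistency(generate_answer, question, n_samples=8):
    """Sample several reasoning chains and return the majority answer.

    ``generate_answer`` is a hypothetical callable that queries a
    language model once and returns its final answer as a string.
    Increasing ``n_samples`` spends more test-time compute in
    exchange for (typically) higher accuracy.
    """
    answers = [generate_answer(question) for _ in range(n_samples)]
    # Majority vote over the sampled final answers.
    return collections.Counter(answers).most_common(1)[0][0]
</syntaxhighlight>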
== Supervised finetuning ==