Reasoning language model: Difference between revisions

Hplotter (talk | contribs)
Feedback has nothing to do with this. This article is essentially about RLVR, a new training step on top of LLMs
{{Copy edit|for=jargon|date=May 2025}}
}}
{{Merge to|Feedback neural network|date=April 2025}}
'''Reasoning language models''' ('''RLMs''') are [[large language model]]s that have been further trained to solve multi-step [[reasoning]] tasks.<ref>{{cite arXiv |title=Reasoning Language Models: A Blueprint |last=Besta |first=Maciej |date=2025-01-23 |eprint=2501.11223 |class=cs.CL}}</ref> These models perform better on logical, mathematical, and programming tasks than traditional autoregressive LLMs, can [[Backtracking|backtrack]], and use test-time compute as an additional [[Neural scaling law|scaling axis]] beyond [[Training, validation, and test data sets|training examples]], parameter count, and train-time compute.
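One common way to spend test-time compute is to draw several independent reasoning samples and take a majority vote over their final answers (often called self-consistency). The sketch below is a minimal illustration, not any particular model's implementation: <code>sample_answer</code> is a hypothetical stub standing in for one stochastic chain-of-thought sample, biased toward a correct answer of "42".

```python
import random
from collections import Counter

def sample_answer(rng):
    # Hypothetical stub for one stochastic reasoning sample from an RLM.
    # The correct answer "42" is drawn with probability 3/5.
    return rng.choice(["42", "42", "42", "41", "43"])

def majority_vote(n_samples, seed=0):
    # Spending more test-time compute (a larger n_samples) makes the
    # majority answer more reliable -- the extra scaling axis beyond
    # parameters and train-time compute.
    rng = random.Random(seed)
    votes = Counter(sample_answer(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(majority_vote(1))    # a single sample may be wrong
print(majority_vote(101))  # with many samples, the majority answer dominates
```

A single sample succeeds only as often as the underlying per-sample accuracy; aggregating many samples trades inference-time compute for reliability, which is the scaling behavior described above.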