Reasoning language model

=== 2025 ===
In January 2025, [[DeepSeek]] released [[DeepSeek (chatbot)|R1]], a model with comparable performance to o1 at lower cost. The release demonstrated the effectiveness of [[Group Relative Policy Optimization]] (GRPO).<ref>{{cite news |last1=Orland |first1=Kyle |date=2025-01-28 |title=How does DeepSeek R1 really fare against OpenAI's best reasoning models? |url=https://arstechnica.com/ai/2025/01/how-does-deepseek-r1-really-fare-against-openais-best-reasoning-models/ |access-date=2025-02-06 |work=Ars Technica}}</ref><ref name=":9">{{cite arXiv |last1=DeepSeek-AI |last2=Guo |first2=Daya |last3=Yang |first3=Dejian |last4=Zhang |first4=Haowei |last5=Song |first5=Junxiao |last6=Zhang |first6=Ruoyu |last7=Xu |first7=Runxin |last8=Zhu |first8=Qihao |last9=Ma |first9=Shirong |title=DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning |date=2025-01-22 |eprint=2501.12948 |class=cs.CL}}</ref> On January 25, 2025, [[DeepSeek]] added a feature to DeepSeek R1 that lets the model search the web while it reasons, making it easier to combine retrieval with reasoning.<ref>{{cite news |script-title=zh:DeepSeek 支持"深度思考+联网检索"能力 |trans-title=DeepSeek adds a search feature supporting simultaneous deep thinking and web search |work=People's Daily Online |date=2025-01-29 |url=http://tech.people.com.cn/n1/2025/0129/c1007-40386565.html |language=zh |access-date=2025-07-26}}</ref> OpenAI subsequently released o3-mini, followed by [[ChatGPT Deep Research|Deep Research]] based on [[OpenAI o3|o3]].<ref>{{cite news |last1=Milmo |first1=Dan |date=2025-02-03 |title=OpenAI launches 'deep research' tool that it says can match research analyst |url=https://www.theguardian.com/technology/2025/feb/03/openai-deep-research-agent-chatgpt-deepseek |access-date=2025-03-16 |work=The Guardian |language=en-GB |issn=0261-3077}}</ref> The effectiveness of distillation for reasoning models was shown in works such as s1-32B, which achieved strong performance through budget forcing and scaling methods.<ref name=":10">{{cite arXiv |last1=Muennighoff |first1=Niklas |last2=Yang |first2=Zitong |last3=Shi |first3=Weijia |last4=Li |first4=Xiang Lisa |last5=Fei-Fei |first5=Li |last6=Hajishirzi |first6=Hannaneh |last7=Zettlemoyer |first7=Luke |last8=Liang |first8=Percy |last9=Candès |first9=Emmanuel |title=s1: Simple test-time scaling |date=2025-02-03 |eprint=2501.19393 |class=cs.CL}}</ref><ref name=":6"/>
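The core idea of GRPO can be sketched as a group-normalized advantage: for each prompt, a group of completions is sampled and scored, and each completion's advantage is its reward normalized by the group's mean and standard deviation. The following is an illustrative sketch only, not DeepSeek's implementation; the function name and example rewards are assumptions.

```python
# Illustrative sketch of the group-relative advantage used in GRPO
# (assumed interface; not DeepSeek's code).
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: (r_i - mean) / std over one sampled group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero when all rewards are equal
    return [(r - mean) / std for r in rewards]

# Example: four completions of one prompt scored by a rule-based verifier (1 = correct).
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # → [1.0, -1.0, -1.0, 1.0]
```

Correct completions get positive advantages and incorrect ones negative, without training a separate value model.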
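Budget forcing, as described for s1, controls test-time compute by suppressing the model's end-of-thinking delimiter and appending a token such as "Wait" until a thinking budget is spent, then forcing the delimiter. A minimal sketch under assumed interfaces (`step_fn`, the delimiter string, and the word-level token count are illustrative assumptions, not the s1 code):

```python
# Illustrative sketch of s1-style budget forcing (assumed interfaces, not the s1 code).
def budget_forced_generate(step_fn, budget, end_token="</think>", wait_token="Wait"):
    """step_fn() yields successive chunks of the model's reasoning (stubbed for illustration)."""
    trace = []
    used = 0
    for chunk in step_fn():
        if chunk == end_token and used < budget:
            trace.append(wait_token)   # suppress early stop; force more thinking
            used += 1
            continue
        trace.append(chunk)
        used += len(chunk.split())     # crude word-level "token" count for the sketch
        if used >= budget:
            if trace[-1] != end_token:
                trace.append(end_token)  # budget spent: force end of thinking
            break
    return trace

# Stub "model" that tries to stop early after one short reasoning step.
def stub_model():
    yield "step one"
    yield "</think>"
    yield "step two tokens here"
    yield "</think>"

print(budget_forced_generate(stub_model, 5))
# → ['step one', 'Wait', 'step two tokens here', '</think>']
```

With a larger budget the early stop is overridden and reasoning continues; with a small budget the delimiter is inserted as soon as the budget is exhausted.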
 
On February 2, 2025, OpenAI released [[ChatGPT Deep Research|Deep Research]] based on their [[OpenAI o3|o3]] model,<ref name=":5">{{cite web |date=2025-02-02 |title=Introducing deep research |url=https://openai.com/index/introducing-deep-research/ |access-date=2025-02-05 |website=OpenAI |language=en-US}}</ref> a tool that integrates reasoning and web search in one workflow, allowing users to initiate complex research tasks and generate comprehensive reports which incorporate various sources from the web.<ref name=":5" />
 
== Supervised finetuning ==
Reasoning models generally score higher than non-reasoning models on many benchmarks, especially on tasks requiring multi-step reasoning.
 
Some benchmarks exclude reasoning models because their responses take longer and cost more.<ref>{{cite book |last1=Huang |first1=Yuting |last2=Zois |first2=Christos |last3=Wang |first3=Yue |last4=Zhang |first4=Yue |last5=Mavromatis |first5=Christos |last6=Zeng |first6=Jiachen |last7=Yin |first7=Shihao |last8=Voulkidis |first8=Antonios |last9=Shepard |first9=Daniel |chapter=Toward Foundation Models for Online Complex Event Detection in CPS-IoT: A Case Study |title=Proceedings of the 2nd International Workshop on Foundation Models for Cyber-Physical Systems & Internet of Things |publisher=ACM |date=2025 |pages=1–6 |doi=10.1145/3722565.3727198 |arxiv=2503.12282 |isbn=979-8-4007-1608-9 |quote=Although we did not evaluate o1 and o3 models … their high cost and inference time make them impractical for online CED, which requires frequent, low-latency API requests.}}</ref><ref>{{cite arXiv |last1=Hu |first1=Zihao |last2=Wang |first2=Yuqing |last3=Sun |first3=Rui |last4=Lu |first4=Haoran |last5=Gong |first5=Qian |last6=Wang |first6=Jinshuai |last7=Gong |first7=Yunlong |last8=Huang |first8=Yiming |last9=He |first9=Peng |title=Inference-Time Compute: More Faithful? A Research Note |date=2025-02-13 |eprint=2502.09673 |class=cs.CL |quote=we were unable to evaluate O1 and R1 …}}</ref><ref>{{cite arXiv |last1=Chen |first1=Guoliang |last2=Zhu |first2=Zhiyao |last3=Meng |first3=Qinxiang |last4=Liang |first4=Weilin |last5=Ji |first5=Zijie |last6=Liu |first6=Jiangning |last7=Zeng |first7=Jie |title=RealBench: Evaluating LLMs as Verilog Engineers |date=2025-03-07 |eprint=2503.04914 |class=cs.AI |quote=For O1-preview, we sample only once due to high cost.}}</ref><ref>{{cite arXiv |last1=Gupta |first1=Arpit |last2=Schapira |first2=Michael |last3=Gill |first3=Phillipa |last4=Seetharaman |first4=Srinivasan |title=On the Feasibility of Using LLMs to Execute Multistage Network Attacks |date=2025-01-30 |eprint=2501.16466 |class=cs.CR |quote=We were unable to evaluate o1 … the public API has a safeguard that prevents o1 from executing attacks.}}</ref>
 
=== Humanity's Last Exam ===
 
=== Generation time ===
Reasoning increases response time, with current models taking from a few seconds to several minutes to answer. As depth of reasoning grows, future models may need even longer.
 
== Models ==