Reasoning language model: Difference between revisions

Citation bot (talk | contribs)
Alter: title, template type. Added chapter. Removed parameters. | Use this bot. Report bugs. | Suggested by Headbomb | #UCB_toolbar
Generation time: Update to be more factually correct and concise.
 
(2 intermediate revisions by 2 users not shown)
Line 18:
 
=== 2025 ===
In January 2025, [[DeepSeek]] released [[DeepSeek (chatbot)|R1]], a model with comparable performance to o1 at lower cost. The release demonstrated the effectiveness of [[Group Relative Policy Optimization]] (GRPO).<ref>{{cite news |last1=Orland |first1=Kyle |date=2025-01-28 |title=How does DeepSeek R1 really fare against OpenAI's best reasoning models? |url=https://arstechnica.com/ai/2025/01/how-does-deepseek-r1-really-fare-against-openais-best-reasoning-models/ |access-date=2025-02-06 |work=Ars Technica}}</ref><ref name=":9">{{cite arXiv |last1=DeepSeek-AI |last2=Guo |first2=Daya |last3=Yang |first3=Dejian |last4=Zhang |first4=Haowei |last5=Song |first5=Junxiao |last6=Zhang |first6=Ruoyu |last7=Xu |first7=Runxin |last8=Zhu |first8=Qihao |last9=Ma |first9=Shirong |title=DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning |date=2025-01-22 |eprint=2501.12948 |class=cs.CL}}</ref> On January 25, 2025, [[DeepSeek]] added a feature to DeepSeek R1 that lets the model search the web while it reasons, making it easier to combine retrieval with reasoning.<ref>{{cite news |script-title=zh:DeepSeek 支持"深度思考+联网检索"能力 |trans-title=DeepSeek adds a search feature supporting simultaneous deep thinking and web search |work=People's Daily Online |date=2025-01-29 |url=http://tech.people.com.cn/n1/2025/0129/c1007-40386565.html |language=zh |access-date=2025-07-26}}</ref> OpenAI subsequently released o3-mini, followed by [[ChatGPT Deep Research|Deep Research]] based on [[OpenAI o3|o3]].<ref>{{cite news |last1=Milmo |first1=Dan |date=2025-02-03 |title=OpenAI launches 'deep research' tool that it says can match research analyst |url=https://www.theguardian.com/technology/2025/feb/03/openai-deep-research-agent-chatgpt-deepseek |access-date=2025-03-16 |work=The Guardian |language=en-GB |issn=0261-3077}}</ref> The effectiveness of distillation for reasoning models was shown in works such as s1-32B, which achieved strong performance through budget forcing and scaling methods.<ref name=":10">{{cite arXiv |last1=Muennighoff |first1=Niklas |last2=Yang |first2=Zitong |last3=Shi |first3=Weijia |last4=Li |first4=Xiang Lisa |last5=Fei-Fei |first5=Li |last6=Hajishirzi |first6=Hannaneh |last7=Zettlemoyer |first7=Luke |last8=Liang |first8=Percy |last9=Candès |first9=Emmanuel |title=s1: Simple test-time scaling |date=2025-02-03 |eprint=2501.19393 |class=cs.CL}}</ref><ref name=":6"/>
 
On February 2, 2025, OpenAI released [[ChatGPT Deep Research|Deep Research]] based on their [[OpenAI o3|o3]] model,<ref name=":5">{{cite web |date=2025-02-02 |title=Introducing deep research |url=https://openai.com/index/introducing-deep-research/ |access-date=2025-02-05 |website=OpenAI |language=en-US}}</ref> a tool that integrates reasoning and web search in one workflow, allowing users to initiate complex research tasks and generate comprehensive reports that incorporate various sources from the web.<ref name=":5" />
 
== Supervised finetuning ==
Line 97:
 
=== Generation time ===
Because reasoning language models tend to produce verbose outputs, they take considerably longer to generate a response than a standard [[large language model]]. Current models take from a few seconds to several minutes to answer, and response times may grow further as the depth of reasoning increases.
 
== Models ==