Reasoning language model: Difference between revisions

Revision as of 09:52, 14 August 2025 by Not-cheesewhisk3rs (extended confirmed user, 5,496 edits): m MOS:CURLY (Tag: JWB)
Latest revision as of 10:21, 20 August 2025 by SimonAytes (13 edits): →Generation time: Update to be more factually correct and concise. (Tag: Visual edit)
(One intermediate revision by the same user not shown)
Line 18:
=== 2025 ===
In January 2025, [[DeepSeek]] released [[DeepSeek (chatbot)\|R1]], a model with comparable performance to o1 at lower cost. The release demonstrated the effectiveness of [[Group Relative Policy Optimization]] (GRPO).<ref>{{cite news \|last1=Orland \|first1=Kyle \|date=2025-01-28 \|title=How does DeepSeek R1 really fare against OpenAI's best reasoning models? \|url=https://arstechnica.com/ai/2025/01/how-does-deepseek-r1-really-fare-against-openais-best-reasoning-models/ \|access-date=2025-02-06 \|work=Ars Technica}}</ref><ref name=":9">{{cite arXiv \|last1=DeepSeek-AI \|last2=Guo \|first2=Daya \|last3=Yang \|first3=Dejian \|last4=Zhang \|first4=Haowei \|last5=Song \|first5=Junxiao \|last6=Zhang \|first6=Ruoyu \|last7=Xu \|first7=Runxin \|last8=Zhu \|first8=Qihao \|last9=Ma \|first9=Shirong \|title=DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning \|date=2025-01-22 \|eprint=2501.12948 \|class=cs.CL}}</ref> On January 25, 2025, [[DeepSeek]] added a feature to DeepSeek R1 that lets the model search the web while it reasons, making it easier to combine retrieval with reasoning.<ref>{{cite news \|script-title=zh:DeepSeek 支持"深度思考+联网检索"能力 \|trans-title=DeepSeek adds a search feature supporting simultaneous deep thinking and web search \|work=People's Daily Online \|date=2025-01-29 \|url=http://tech.people.com.cn/n1/2025/0129/c1007-40386565.html \|language=zh \|access-date=2025-07-26}}</ref> OpenAI subsequently released o3-mini, followed by [[ChatGPT Deep Research\|Deep Research]] based on [[OpenAI o3\|o3]].<ref>{{cite news \|last1=Milmo \|first1=Dan \|date=2025-02-03 \|title=OpenAI launches 'deep research' tool that it says can match research analyst \|url=https://www.theguardian.com/technology/2025/feb/03/openai-deep-research-agent-chatgpt-deepseek \|access-date=2025-03-16 \|work=The Guardian \|language=en-GB \|issn=0261-3077}}</ref> ~~The effectiveness of distillation was shown again by s1-32B, which reached strong performance with~~ The effectiveness of distillation for reasoning models was shown in works such as s1-32B, which achieved strong performance through budget forcing and scaling methods.<ref name=":10">{{cite arXiv \|last1=Muennighoff \|first1=Niklas \|last2=Yang \|first2=Zitong \|last3=Shi \|first3=Weijia \|last4=Li \|first4=Xiang Lisa \|last5=Fei-Fei \|first5=Li \|last6=Hajishirzi \|first6=Hannaneh \|last7=Zettlemoyer \|first7=Luke \|last8=Liang \|first8=Percy \|last9=Candès \|first9=Emmanuel \|title=s1: Simple test-time scaling \|date=2025-02-03 \|eprint=2501.19393 \|class=cs.CL}}</ref><ref name=":6"/> On February 2, 2025, OpenAI released [[ChatGPT Deep Research\|Deep Research]] based on their [[OpenAI o3\|o3]] model,<ref name=":5">{{cite web \|date=2025-02-02 \|title=Introducing deep research \|url=https://openai.com/index/introducing-deep-research/ \|access-date=2025-02-05 \|website=OpenAI \|language=en-US}}</ref> ~~a tool that integrates reasoning and web search in one workflow so users can run complex research that needs several steps and sources. It is based on [[OpenAI o3\|o3]] and can take from 5 to 30 minutes to generate comprehensive reports~~ allowing users to initiate complex research tasks and generate comprehensive reports which incorporate various sources from the web.<ref name=":5" />

== Supervised finetuning ==

Line 97:
=== Generation time ===
Due to the tendency of reasoning language models to produce verbose outputs, the time it takes to generate an output increases greatly when compared to a standard [[large language model]]. ~~Reasoning increases response time, with current models taking from a few seconds to several minutes to answer. As depth of reasoning grows, future models may need even longer.~~

== Models ==