In January 2025, DeepSeek released R1, a model competitive with o1 at lower cost, highlighting the effectiveness of [[Group Relative Policy Optimization]] (GRPO).<ref>{{Cite web |last=Orland |first=Kyle |date=2025-01-28 |title=How does DeepSeek R1 really fare against OpenAI's best reasoning models? |url=https://arstechnica.com/ai/2025/01/how-does-deepseek-r1-really-fare-against-openais-best-reasoning-models/ |access-date=2025-02-06 |website=Ars Technica |language=en-US}}</ref> On January 25, 2025, [[DeepSeek]] added a feature to DeepSeek R1 that lets the model use search and reasoning simultaneously, integrating data retrieval more directly into its reasoning process. OpenAI subsequently released o3-mini, followed by [[ChatGPT Deep Research|Deep Research]], which is based on [[OpenAI o3|o3]].<ref>{{Cite news |last=Milmo |first=Dan |date=2025-02-03 |title=OpenAI launches 'deep research' tool that it says can match research analyst |url=https://www.theguardian.com/technology/2025/feb/03/openai-deep-research-agent-chatgpt-deepseek |access-date=2025-03-16 |work=The Guardian |language=en-GB |issn=0261-3077}}</ref> The power of distillation was further demonstrated by s1-32B, which achieved strong performance through budget forcing, a simple test-time scaling technique that lengthens or truncates the model's reasoning trace to control how much computation it spends before answering.<ref>{{Citation |last1=Muennighoff |first1=Niklas |title=s1: Simple test-time scaling |date=2025-02-03 |arxiv=2501.19393 |last2=Yang |first2=Zitong |last3=Shi |first3=Weijia |last4=Li |first4=Xiang Lisa |last5=Fei-Fei |first5=Li |last6=Hajishirzi |first6=Hannaneh |last7=Zettlemoyer |first7=Luke |last8=Liang |first8=Percy |last9=Candès |first9=Emmanuel}}</ref>
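In outline, GRPO, introduced in DeepSeek's earlier DeepSeekMath work, is a variant of [[Proximal policy optimization|proximal policy optimization]] (PPO) that dispenses with the separately trained value function normally used as a baseline. For each prompt, the policy samples a group of <math>G</math> responses with rewards <math>r_1, \ldots, r_G</math>, and the advantage of each response is its reward normalized against the group:

<math display="block">\hat{A}_i = \frac{r_i - \operatorname{mean}(r_1, \ldots, r_G)}{\operatorname{std}(r_1, \ldots, r_G)}</math>

This group-relative advantage then takes the place of the critic-based advantage estimate in a PPO-style clipped surrogate objective, which reduces memory and compute costs during reinforcement learning.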
== Supervised fine-tuning ==