In January 2025, DeepSeek released R1, a model competitive with o1 at lower cost, highlighting the effectiveness of [[Group Relative Policy Optimization]] (GRPO).<ref>{{Cite web |last=Orland |first=Kyle |date=2025-01-28 |title=How does DeepSeek R1 really fare against OpenAI's best reasoning models? |url=https://arstechnica.com/ai/2025/01/how-does-deepseek-r1-really-fare-against-openais-best-reasoning-models/ |access-date=2025-02-06 |website=Ars Technica |language=en-US}}</ref> On January 25, 2025, [[DeepSeek]] added a feature to DeepSeek R1 that lets the model use search and reasoning simultaneously, integrating data retrieval more directly into its reasoning process. OpenAI subsequently released o3-mini, followed by [[ChatGPT Deep Research|Deep Research]], which is based on [[OpenAI o3|o3]].<ref>{{Cite news |last=Milmo |first=Dan |date=2025-02-03 |title=OpenAI launches 'deep research' tool that it says can match research analyst |url=https://www.theguardian.com/technology/2025/feb/03/openai-deep-research-agent-chatgpt-deepseek |access-date=2025-03-16 |work=The Guardian |language=en-GB |issn=0261-3077}}</ref> The power of distillation was further demonstrated by s1-32B, which achieved strong performance through budget forcing, a simple test-time scaling technique that lengthens or truncates the model's reasoning trace to control how much computation it spends before answering.<ref>{{Citation |last1=Muennighoff |first1=Niklas |title=s1: Simple test-time scaling |date=2025-02-03 |arxiv=2501.19393 |last2=Yang |first2=Zitong |last3=Shi |first3=Weijia |last4=Li |first4=Xiang Lisa |last5=Fei-Fei |first5=Li |last6=Hajishirzi |first6=Hannaneh |last7=Zettlemoyer |first7=Luke |last8=Liang |first8=Percy |last9=Candès |first9=Emmanuel}}</ref>
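In outline, GRPO, introduced in DeepSeek's earlier DeepSeekMath work, is a variant of [[Proximal policy optimization|proximal policy optimization]] (PPO) that dispenses with the separately trained value function normally used as a baseline. For each prompt, the policy samples a group of <math>G</math> responses with rewards <math>r_1, \ldots, r_G</math>, and the advantage of each response is its reward normalized against the group:

<math display="block">\hat{A}_i = \frac{r_i - \operatorname{mean}(r_1, \ldots, r_G)}{\operatorname{std}(r_1, \ldots, r_G)}</math>

This group-relative advantage then takes the place of the critic-based advantage estimate in a PPO-style clipped surrogate objective, which reduces memory and compute costs during reinforcement learning.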
== Supervised fine-tuning ==