=== 2025 ===
In January 2025, [[DeepSeek]] released [[DeepSeek (chatbot)|R1]], a model with performance comparable to o1 at lower cost. The release demonstrated the effectiveness of [[Group Relative Policy Optimization]] (GRPO).<ref>{{cite news |last1=Orland |first1=Kyle |date=2025-01-28 |title=How does DeepSeek R1 really fare against OpenAI's best reasoning models? |url=https://arstechnica.com/ai/2025/01/how-does-deepseek-r1-really-fare-against-openais-best-reasoning-models/ |access-date=2025-02-06 |work=Ars Technica}}</ref><ref name=":9">{{cite arXiv |last1=DeepSeek-AI |last2=Guo |first2=Daya |last3=Yang |first3=Dejian |last4=Zhang |first4=Haowei |last5=Song |first5=Junxiao |last6=Zhang |first6=Ruoyu |last7=Xu |first7=Runxin |last8=Zhu |first8=Qihao |last9=Ma |first9=Shirong |title=DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning |date=2025-01-22 |eprint=2501.12948 |class=cs.CL}}</ref> On January 25, 2025, DeepSeek added a feature to R1 that lets the model search the web while it reasons, making it easier to combine retrieval with reasoning.
On February 2, 2025, OpenAI released [[ChatGPT Deep Research|Deep Research]], based on their [[OpenAI o3|o3]] model.<ref name=":5">{{cite web |date=2025-02-02 |title=Introducing deep research |url=https://openai.com/index/introducing-deep-research/ |access-date=2025-02-05 |website=OpenAI |language=en-US}}</ref>
== Supervised finetuning ==
== Benchmarks ==
Reasoning models generally score higher than non-reasoning models on many benchmarks, especially on tasks requiring multi-step reasoning.<ref>{{Citation |last=Wei |first=Jason |title=Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |date=2023-01-10 |url=http://arxiv.org/abs/2201.11903 |access-date=2025-08-30 |publisher=arXiv |doi=10.48550/arXiv.2201.11903 |id=arXiv:2201.11903 |last2=Wang |first2=Xuezhi |last3=Schuurmans |first3=Dale |last4=Bosma |first4=Maarten |last5=Ichter |first5=Brian |last6=Xia |first6=Fei |last7=Chi |first7=Ed |last8=Le |first8=Quoc |last9=Zhou |first9=Denny}}</ref><ref>{{Citation |last=Wang |first=Xuezhi |title=Self-Consistency Improves Chain of Thought Reasoning in Language Models |date=2023-03-07 |url=http://arxiv.org/abs/2203.11171 |access-date=2025-08-30 |publisher=arXiv |doi=10.48550/arXiv.2203.11171 |id=arXiv:2203.11171 |last2=Wei |first2=Jason |last3=Schuurmans |first3=Dale |last4=Le |first4=Quoc |last5=Chi |first5=Ed |last6=Narang |first6=Sharan |last7=Chowdhery |first7=Aakanksha |last8=Zhou |first8=Denny}}</ref><ref>{{Citation |last=Yao |first=Shunyu |title=Tree of Thoughts: Deliberate Problem Solving with Large Language Models |date=2023-12-03 |url=http://arxiv.org/abs/2305.10601 |access-date=2025-08-30 |publisher=arXiv |doi=10.48550/arXiv.2305.10601 |id=arXiv:2305.10601 |last2=Yu |first2=Dian |last3=Zhao |first3=Jeffrey |last4=Shafran |first4=Izhak |last5=Griffiths |first5=Thomas L. 
|last6=Cao |first6=Yuan |last7=Narasimhan |first7=Karthik}}</ref><ref>{{Cite journal |last=Cui |first=Dong-Xu |last2=Long |first2=Shi-Yu |last3=Tang |first3=Yi-Xuan |last4=Zhao |first4=Yue |last5=Li |first5=Qiao |date=2025-08-25 |title=Can Reasoning Power Significantly Improve the Knowledge of Large Language Models for Chemistry?─Based on Conversations with LLMs |url=https://doi.org/10.1021/acs.jcim.5c01265 |journal=Journal of Chemical Information and Modeling |doi=10.1021/acs.jcim.5c01265 |issn=1549-9596}}</ref><ref>{{Citation |last=Qwen |title=Qwen2.5 Technical Report |date=2024 |url=https://arxiv.org/abs/2412.15115 |access-date=2025-08-30 |publisher=arXiv |doi=10.48550/ARXIV.2412.15115 |last2=Yang |first2=An |last3=Yang |first3=Baosong |last4=Zhang |first4=Beichen |last5=Hui |first5=Binyuan |last6=Zheng |first6=Bo |last7=Yu |first7=Bowen |last8=Li |first8=Chengyuan |last9=Liu |first9=Dayiheng}}</ref><ref>{{Citation |last=Comanici |first=Gheorghe |title=Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities |date=2025-07-22 |url=http://arxiv.org/abs/2507.06261 |access-date=2025-08-30 |publisher=arXiv |doi=10.48550/arXiv.2507.06261 |id=arXiv:2507.06261 |last2=Bieber |first2=Eric |last3=Schaekermann |first3=Mike |last4=Pasupat |first4=Ice |last5=Sachdeva |first5=Noveen |last6=Dhillon |first6=Inderjit |last7=Blistein |first7=Marcel |last8=Ram |first8=Ori |last9=Zhang |first9=Dan}}</ref><ref>{{Cite journal |last=Mirza |first=Adrian |last2=Alampara |first2=Nawaf |last3=Kunchapu |first3=Sreekanth |last4=Ríos-García |first4=Martiño |last5=Emoekabu |first5=Benedict |last6=Krishnan |first6=Aswanth |last7=Gupta |first7=Tanya |last8=Schilling-Wilhelmi |first8=Mara |last9=Okereke |first9=Macjonathan |last10=Aneesh |first10=Anagha |last11=Asgari |first11=Mehrdad |last12=Eberhardt |first12=Juliane |last13=Elahi |first13=Amir Mohammad |last14=Elbeheiry |first14=Hani M. 
|last15=Gil |first15=María Victoria |date=July 2025 |title=A framework for evaluating the chemical knowledge and reasoning abilities of large language models against the expertise of chemists |url=https://www.nature.com/articles/s41557-025-01815-x |journal=Nature Chemistry |language=en |volume=17 |issue=7 |pages=1027–1034 |doi=10.1038/s41557-025-01815-x |issn=1755-4349}}</ref>
Some benchmarks exclude reasoning models because their responses take longer to generate and cost more.
=== Humanity's Last Exam ===
=== Generation time ===
Because reasoning language models tend to produce verbose outputs, generating a response takes substantially longer than it does for a standard [[large language model]].
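The effect can be illustrated with a back-of-the-envelope model: autoregressive decoding time grows roughly linearly with the number of output tokens, so a long reasoning trace inflates latency in proportion to its length. The function and all numbers below are hypothetical, not taken from any benchmark.

```python
def generation_time(output_tokens: int, tokens_per_second: float = 50.0) -> float:
    """Approximate wall-clock decode time in seconds, assuming a fixed
    decoding throughput (a simplification of real serving systems)."""
    return output_tokens / tokens_per_second

# A short direct answer versus the same answer preceded by a long
# chain-of-thought trace (illustrative token counts):
direct = generation_time(300)      # 6.0 seconds
with_reasoning = generation_time(3000)  # 60.0 seconds
```

Under this simplification, a tenfold increase in output tokens yields a tenfold increase in generation time, which is why benchmarks sensitive to latency or cost sometimes exclude reasoning models.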
== Models ==