Reasoning language model
== History ==
=== 2024 ===
In September 2024, [[OpenAI]] released [[OpenAI o1#release|o1-preview]], an LLM with enhanced reasoning.<ref>{{Cite web |last=Edwards |first=Benj |date=2024-09-12 |title=OpenAI's new "reasoning" AI models are here: o1-preview and o1-mini |url=https://arstechnica.com/information-technology/2024/09/openais-new-reasoning-ai-models-are-here-o1-preview-and-o1-mini/ |access-date=2025-02-06 |website=Ars Technica |language=en-US}}</ref> The full version, [[OpenAI o1|o1]], followed in December 2024. OpenAI also began sharing results on its successor, [[OpenAI o3|o3]].<ref>{{Cite web |date=2024-12-20 |title=OpenAI confirms new frontier models o3 and o3-mini |url=https://venturebeat.com/ai/openai-confirms-new-frontier-models-o3-and-o3-mini/ |access-date=2025-02-06 |website=VentureBeat |language=en-US}}</ref>
 
The development of reasoning LLMs has illustrated what [[Richard S. Sutton|Rich Sutton]] termed the "bitter lesson": that general methods leveraging computation often outperform those relying on specific human insights.<ref>{{Cite web |last=Sutton |first=Richard S. |title=The Bitter Lesson |url=http://www.incompleteideas.net/IncIdeas/BitterLesson.html |access-date=2025-02-27 |website=Incomplete Ideas}}</ref> For instance, some research groups, such as the Generative AI Research Lab (GAIR), initially explored complex techniques like tree search and reinforcement learning in attempts to replicate o1's capabilities. However, as documented in their "o1 Replication Journey" papers, they found that [[knowledge distillation]] (training a smaller model to mimic o1's outputs) was surprisingly effective, highlighting the power of distillation in this context.