Reasoning language model
== History ==
=== 2024 ===
In September 2024, [[OpenAI]] released [[OpenAI o1#release|o1-preview]], an LLM with enhanced reasoning.<ref>{{Cite web |last=Edwards |first=Benj |date=2024-09-12 |title=OpenAI's new "reasoning" AI models are here: o1-preview and o1-mini |url=https://arstechnica.com/information-technology/2024/09/openais-new-reasoning-ai-models-are-here-o1-preview-and-o1-mini/ |access-date=2025-02-06 |website=Ars Technica |language=en-US}}</ref> The full version, [[OpenAI o1|o1]], followed in December 2024. OpenAI also began sharing results on its successor, [[OpenAI o3|o3]].<ref>{{Cite web |date=2024-12-20 |title=OpenAI confirms new frontier models o3 and o3-mini |url=https://venturebeat.com/ai/openai-confirms-new-frontier-models-o3-and-o3-mini/ |access-date=2025-02-06 |website=VentureBeat |language=en-US}}</ref>
 
The development of reasoning LLMs has illustrated what [[Richard S. Sutton|Rich Sutton]] termed the "bitter lesson": that general methods leveraging computation often outperform those relying on specific human insights.<ref>{{Cite web |last=Sutton |first=Richard S. |title=The Bitter Lesson |url=http://www.incompleteideas.net/IncIdeas/BitterLesson.html |access-date=2025-02-27 |website=Incomplete Ideas}}</ref> For instance, some research groups, such as the Generative AI Research Lab (GAIR), initially explored complex techniques like tree search and reinforcement learning in attempts to replicate o1's capabilities. However, as documented in their "o1 Replication Journey" papers, they found that [[knowledge distillation]] (training a smaller model to mimic o1's outputs) was surprisingly effective, highlighting the power of distillation in this context.