Reasoning language model

=== AIME ===
The [[American Invitational Mathematics Examination]] (AIME) benchmark, based on a challenging mathematics competition, shows significant performance differences between model types. Non-reasoning models typically solve fewer than 30% of AIME problems, whereas models employing reasoning techniques score between 50% and 80%.<ref name=":1">{{Cite web |date=2025-02-10 |title=MathArena |url=https://matharena.ai/ |access-date=2025-02-10 |archive-url=https://web.archive.org/web/20250210032556/https://matharena.ai/ |archive-date=2025-02-10 }}</ref> While [[OpenAI o1|OpenAI's o1]] maintained or slightly improved its accuracy from its reported 2024{{Source?|date=July 2022}} metrics to its 2025 AIME results, o3-mini (high) achieved higher accuracy (80%) at approximately one-twelfth the cost.
 
=== o3-mini performance ===