=== AIME ===
The [[American Invitational Mathematics Examination]] (AIME) benchmark, a challenging mathematics competition, demonstrates significant performance differences between model types. Non-reasoning models typically solve fewer than 30% of AIME problems, whereas models employing reasoning techniques score between 50% and 80%.
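Scoring on AIME reduces to a per-problem exact-match check: each exam has 15 problems, each with a single integer answer from 0 to 999, and a model's score is the fraction of problems it answers correctly. The sketch below illustrates this calculation; the answer and prediction lists are hypothetical placeholders, not real exam data.

<syntaxhighlight lang="python">
def aime_solve_rate(predicted, answers):
    """Fraction of AIME problems answered correctly (answers are integers 0-999)."""
    correct = sum(p == a for p, a in zip(predicted, answers))
    return correct / len(answers)

# Hypothetical run: 15 problems, the model answers 4 correctly.
# 4/15 is about 26.7%, in line with the sub-30% range reported
# for non-reasoning models above.
answers   = [204, 25, 809, 116, 104, 294, 540, 33, 80, 55, 601, 23, 211, 315, 468]
predicted = [204, 25, 809, 116, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(f"solve rate: {aime_solve_rate(predicted, answers):.1%}")  # 26.7%
</syntaxhighlight>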
=== o3-mini performance ===