Line 83:
{{Main|Benchmark (computing)|List of language model benchmarks}}
The reasoning ability of language models is usually tested on problems such as:
* GSM8K (Grade School Math): 8.5K linguistically diverse [[Primary school|elementary school]] [[Word problem (mathematics education)|math word problems]] that require 2 to 8 basic arithmetic operations to solve.<ref name=":2" />
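
Benchmarks of this kind are commonly scored by exact match on the final numeric answer. The following is a minimal sketch of such a grader, not the benchmark's official evaluation code. It assumes the GSM8K convention that a reference solution ends with a line of the form <code>#### 18</code>, and it assumes that the last number in a model's output is its final answer. The function names <code>extract_final_answer</code> and <code>exact_match_accuracy</code> are hypothetical.

```python
import re
from typing import List, Optional

# Matches integers and decimals, optionally negative, with thousands separators.
NUMBER_PATTERN = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_answer(text: str) -> Optional[float]:
    """Return the final numeric answer in `text`, or None if there is none.

    If the GSM8K-style '####' marker is present, only the text after the
    last marker is searched. Otherwise the last number anywhere in the
    text is used (an assumption about how model outputs are formatted).
    """
    marker = text.rfind("####")
    segment = text[marker + 4:] if marker != -1 else text
    numbers = NUMBER_PATTERN.findall(segment)
    if not numbers:
        return None
    # Strip thousands separators before converting, e.g. "1,000" -> 1000.0
    return float(numbers[-1].replace(",", ""))


def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions whose final answer equals the reference's."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        pred_answer = extract_final_answer(pred)
        # A prediction with no number at all is counted as wrong.
        if pred_answer is not None and pred_answer == extract_final_answer(ref):
            correct += 1
    return correct / len(references)
```

Comparing parsed numbers rather than raw strings means that outputs such as <code>18</code> and <code>18.0</code> are treated as the same answer. A stricter grader might instead require a particular output format.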