Reasoning language model

 
Self-consistency can be combined with an ORM. The model generates multiple answers, and the answers are clustered so that each cluster contains the same final answer. The ORM scores each answer, the scores within each cluster are summed, and the answer corresponding to the cluster with the highest summed score is output.<ref name=":3" />
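The selection step above can be sketched in a few lines of Python. This is a minimal illustration, not any particular system's implementation: the function name and the toy answers and reward values are invented for the example, and real pipelines would extract final answers from sampled chains of thought and score each one with a trained ORM.

```python
from collections import defaultdict

def reward_weighted_self_consistency(answers, rewards):
    """Pick an answer by clustering identical final answers and
    summing the ORM reward within each cluster.

    answers: final answer extracted from each of N sampled completions
    rewards: ORM score for each completion, in the same order
    """
    cluster_reward = defaultdict(float)
    for answer, reward in zip(answers, rewards):
        # Each cluster is the set of completions sharing a final answer
        cluster_reward[answer] += reward
    # Output the answer whose cluster has the highest summed reward
    return max(cluster_reward, key=cluster_reward.get)

# Toy example with five samples and hypothetical ORM scores:
# "42" appears three times (0.9 + 0.5 + 0.4 = 1.8), beating "7" (0.95)
answers = ["42", "41", "42", "42", "7"]
rewards = [0.9, 0.8, 0.5, 0.4, 0.95]
print(reward_weighted_self_consistency(answers, rewards))  # prints 42
```

Note that this combines frequency and quality: a single high-reward outlier ("7") loses to a cluster of moderately scored, mutually consistent answers.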
 
== Models ==
 
=== [[OpenAI]] ===
* [[OpenAI o4-mini|o4-mini]]
* [[OpenAI o3|o3 and o3-mini]]
* [[OpenAI o1|o1-preview and o1]]
 
=== [[Gemini (chatbot)|Gemini]] ===
* 2.5 Pro (2.5 Flash can also be used this way)
* 2.0 Flash Thinking Experimental
 
=== [[DeepSeek]] ===
* R1 (based on V3)
* R1-Lite-Preview (test version based on V2.5)
 
=== [[Qwen]] ===
* QvQ-72B-Preview — an experimental visual reasoning model launched on December 24, 2024, which integrates image understanding with verbal chain-of-thought reasoning.
* QwQ-32B-Preview — an experimental text-based reasoning model released in late November 2024 that emphasizes complex, step-by-step analysis.
 
=== [[Anthropic]] ===
* [[Claude (language model)#Claude 3.7|Claude 3.7 Sonnet]] has an adjustable number of 'thinking' tokens.
 
=== [[XAI (company)|xAI]] ===
* [[Grok (chatbot)|Grok]] 3
 
=== [[Hugging Face]] ===
 
* OlympicCoder-7B and 32B, released as part of Open R1, an effort to openly reproduce the R1 training pipeline.<ref>{{Cite web |date=2025-03-12 |title=@lewtun on Hugging Face: "Introducing OlympicCoder: a series of open reasoning models that can solve…" |url=https://huggingface.co/posts/lewtun/886287473065721 |access-date=2025-04-04 |website=huggingface.co}}</ref>
 
== See also ==