Knowledge
SimpleQA Leaderboard
SimpleQA measures factual accuracy on short, fact-seeking questions with a single correct answer — directly probing how often a model is right versus confidently wrong (hallucination) on simple facts.
Source: epoch · 4 open models ranked + 47 proprietary · Data through May 2026
Open models ranked on SimpleQA
The "#" column shows each model's rank among open models, then its rank overall (including proprietary models), separated by a slash.
| # | Model | Score |
|---|---|---|
| 1 / 14 | Qwen3 235B A22B Thinking 2507 · 235.1B | 50.1% |
| 2 / 27 | GLM 5.1 · 753.9B | 37.3% |
| 3 / 33 | GLM 4.7 · 358.3B | 31.5% |
| 4 / 37 | DeepSeek R1 0528 · 684.5B | 27.4% |
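
For readers who want to work with these rows programmatically, here is a minimal Python sketch that splits the two-part rank cell into separate numbers. The `Row` type, `parse_row`, and `proprietary_above` are hypothetical names introduced for illustration. Reading the figure after the "·" as a parameter count in billions is an assumption, as is the absence of ties in the ranking.

```python
from typing import NamedTuple

# The four rows of the table above, copied verbatim.
TABLE = """\
| 1 / 14 | Qwen3 235B A22B Thinking 2507 · 235.1B | 50.1% |
| 2 / 27 | GLM 5.1 · 753.9B | 37.3% |
| 3 / 33 | GLM 4.7 · 358.3B | 31.5% |
| 4 / 37 | DeepSeek R1 0528 · 684.5B | 27.4% |"""


class Row(NamedTuple):
    open_rank: int      # rank among open models only
    overall_rank: int   # rank among all models, proprietary included
    model: str
    params_b: float     # assumed: parameter count in billions (text after "·")
    score_pct: float    # SimpleQA score in percent


def parse_row(line: str) -> Row:
    """Parse one markdown table row into a Row."""
    rank_cell, model_cell, score_cell = [
        cell.strip() for cell in line.strip().strip("|").split("|")
    ]
    open_rank, overall_rank = (int(part) for part in rank_cell.split("/"))
    model, params = (part.strip() for part in model_cell.rsplit("·", 1))
    return Row(
        open_rank,
        overall_rank,
        model,
        float(params.rstrip("B")),
        float(score_cell.rstrip("%")),
    )


def proprietary_above(row: Row) -> int:
    """Number of proprietary models ranked above this open model.

    Assumes no ties: every model ahead of it overall that is not one of
    the open models ahead of it must be proprietary.
    """
    return row.overall_rank - row.open_rank


rows = [parse_row(line) for line in TABLE.splitlines()]
for row in rows:
    print(f"{row.model}: {proprietary_above(row)} proprietary models ranked higher")
```

Under the no-ties assumption, subtracting the open rank from the overall rank gives the number of proprietary models that outscore each open model.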
SimpleQA: frequently asked questions
- What is the best open LLM on SimpleQA?
- Qwen3 235B A22B Thinking 2507 is the top open model on SimpleQA, scoring 50.1%. Among all models tested — including proprietary ones — it ranks #14.
- Can open models match proprietary models on SimpleQA?
- Not yet. The strongest proprietary model (gemini-3.1-pro-preview) scores 77.3%, which is 27.2 percentage points ahead of the best open model (Qwen3 235B A22B Thinking 2507) at 50.1%. The open model, however, is one you can run yourself.
Scores are aggregated from epoch. llmrun does not run this benchmark itself; see the source for methodology, or the "about benchmarks" page for what it measures.