Knowledge

SimpleQA Leaderboard

SimpleQA measures factual accuracy on short, fact-seeking questions with a single correct answer — directly probing how often a model is right versus confidently wrong (hallucination) on simple facts.
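To make the "right versus confidently wrong" distinction concrete, here is a minimal sketch of how per-question grades could be rolled up into summary numbers. It assumes a three-way grading scheme (correct, incorrect, not attempted). The label strings, the `summarize` function, and the metric names are illustrative assumptions, not the source's published scoring code, and the leaderboard percentages below may be computed differently.

```python
from collections import Counter

def summarize(grades):
    """Aggregate per-question grades into summary rates.

    `grades` is a list of strings; the three labels used here are
    hypothetical names chosen for this sketch.
    """
    counts = Counter(grades)
    total = len(grades)
    attempted = counts["correct"] + counts["incorrect"]
    return {
        # Share of all questions answered correctly.
        "accuracy": counts["correct"] / total if total else 0.0,
        # Share of all questions answered wrongly (confident errors).
        "hallucination_rate": counts["incorrect"] / total if total else 0.0,
        # Accuracy restricted to questions the model chose to answer.
        "accuracy_given_attempted": counts["correct"] / attempted if attempted else 0.0,
    }

# Toy input: four graded answers, invented purely for illustration.
grades = ["correct", "incorrect", "not_attempted", "correct"]
summary = summarize(grades)
print(summary)
```

Separating "incorrect" from "not attempted" is what lets a benchmark of this kind distinguish a model that declines to answer from one that answers wrongly.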

Source: epoch · 4 open models ranked + 47 proprietary · Data through May 2026

All models ranked on SimpleQA

Proprietary (closed) models are labeled "proprietary". You can't run them locally, but they show where the open field stands. Open models are listed with their parameter count.

# | Model | Score
1 | gemini-3.1-pro-preview · proprietary | 77.3%
2 | gemini-3-pro-preview · proprietary | 72.9%
3 | gemini-3.5-flash_high · proprietary | 68.4%
4 | qwen3-max-2025-09-23 · proprietary | 67.5%
5 | gemini-3-flash-preview · proprietary | 67.4%
6 | muse-spark · proprietary | 66.3%
7 | gpt-5.5-pro-pre-release_xhigh · proprietary | 64.5%
8 | gpt-5.5-pre-release_xhigh · proprietary | 63.1%
9 | qwen3.6-max-preview · proprietary | 56.9%
10 | gemini-2.5-pro · proprietary | 56.0%
11 | o3-2025-04-16_high · proprietary | 53.0%
12 | claude-opus-4-7_xhigh · proprietary | 50.6%
13 | gpt-5-2025-08-07_high · proprietary | 50.6%
14 | Qwen3 235B A22B Thinking 2507 · 235.1B | 50.1%
15 | qwen3.6-plus · proprietary | 49.1%
16 | gpt-5.1-2025-11-13_high · proprietary | 48.9%
17 | grok-4-0709 · proprietary | 47.9%
18 | gpt-5.4-pro-2026-03-05_xhigh · proprietary | 47.8%
19 | claude-opus-4-6_32K · proprietary | 46.5%
20 | gpt-5.4-2026-03-05_xhigh · proprietary | 44.8%
21 | claude-opus-4-6 · proprietary | 43.1%
22 | claude-opus-4-5-20251101_32K · proprietary | 41.8%
23 | claude-opus-4-6_max · proprietary | 41.0%
24 | gpt-5.2-2025-12-11_xhigh · proprietary | 38.9%
25 | kimi-k2.6 · proprietary | 38.7%
26 | gpt-5.2-2025-12-11_high · proprietary | 38.2%
27 | GLM 5.1 · 753.9B | 37.3%
28 | gpt-5.2-2025-12-11_medium · proprietary | 35.4%
29 | claude-opus-4-1-20250805_27K · proprietary | 34.8%
30 | gpt-5.2-2025-12-11_low · proprietary | 34.7%
31 | fireworks/kimi-k2p5 · proprietary | 33.9%
32 | kimi-k2-thinking-turbo · proprietary | 31.6%
33 | GLM 4.7 · 358.3B | 31.5%
34 | claude-sonnet-4-6_32K · proprietary | 29.0%
35 | gpt-5.4-mini-2026-03-17_high · proprietary | 28.6%
36 | deepseek-reasoner · proprietary | 27.5%
37 | DeepSeek R1 0528 · 684.5B | 27.4%
38 | qwen3.5-plus · proprietary | 26.0%
39 | o4-mini-2025-04-16_high · proprietary | 23.9%
40 | claude-sonnet-4-5-20250929_59K · proprietary | 23.6%
41 | qwen3.6-flash · proprietary | 21.2%
42 | grok-3-mini-beta_high · proprietary | 21.1%
43 | gpt-5-mini-2025-08-07_high · proprietary | 21.0%
44 | qwen3.5-flash · proprietary | 19.8%
45 | openai/gpt-oss-120b_high · proprietary | 13.9%
46 | claude-sonnet-4-5-20250929 · proprietary | 13.0%
47 | gpt-5-nano-2025-08-07_high · proprietary | 12.2%
48 | gpt-5.4-nano-2026-03-17_high · proprietary | 12.0%
49 | gemma-4-31b-it · proprietary | 9.6%
50 | claude-3-5-haiku-20241022 · proprietary | 6.7%
51 | claude-haiku-4-5-20251001_32K · proprietary | 5.9%

SimpleQA: frequently asked questions

What is the best open LLM on SimpleQA?
Qwen3 235B A22B Thinking 2507 is the top open model on SimpleQA, scoring 50.1%. Among all models tested — including proprietary ones — it ranks #14.
Can open models match proprietary models on SimpleQA?
Not yet. The strongest proprietary model (gemini-3.1-pro-preview) scores 77.3%, which is 27.2 percentage points ahead of the best open model (Qwen3 235B A22B Thinking 2507) at 50.1%. The open model, however, is one you can run yourself.
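The comparison in the answer above is simple arithmetic over the leaderboard rows. The sketch below hard-codes a handful of rows copied from the table (the two highest-scoring proprietary entries and the four open entries) and computes the gap between the best model in each group. The tuple layout and variable names are assumptions made for illustration, not an llmrun data format.

```python
# (model name, SimpleQA score in percent, is_open) -- a subset of the table.
rows = [
    ("gemini-3.1-pro-preview", 77.3, False),
    ("gemini-3-pro-preview", 72.9, False),
    ("Qwen3 235B A22B Thinking 2507", 50.1, True),
    ("GLM 5.1", 37.3, True),
    ("GLM 4.7", 31.5, True),
    ("DeepSeek R1 0528", 27.4, True),
]

# Highest-scoring entry within each group.
best_open = max((r for r in rows if r[2]), key=lambda r: r[1])
best_closed = max((r for r in rows if not r[2]), key=lambda r: r[1])

# Gap in percentage points, rounded to avoid floating-point noise.
gap = round(best_closed[1] - best_open[1], 1)

print(f"Best open: {best_open[0]} at {best_open[1]}%")
print(f"Best proprietary: {best_closed[0]} at {best_closed[1]}%")
print(f"Gap: {gap} percentage points")
```

The gap is reported in percentage points rather than as a relative percentage, since both scores are already shares of the same question set.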

Scores are aggregated from epoch. llmrun does not run this benchmark itself; see the source for methodology, or the About benchmarks page for what it measures.