What is the best open LLM on SimpleQA?

DeepSeek V4 Pro is the top open model on SimpleQA, scoring 57.0%. Among all models tested — including proprietary ones — it ranks #12. The top model overall is Gemini 3.1 Pro Preview (Google DeepMind) at 77.3%.

What's the best SimpleQA model you can run on a 24 GB GPU?

Gemma 4 31B IT is the highest-scoring open model that fits in 24 GB at 4-bit quantization (about 18 GB), scoring 9.6% on SimpleQA.
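The "about 18 GB" figure can be roughly reproduced from the parameter count. The sketch below is an illustration, not the page's own method: it assumes 4 bits per weight plus a 10% overhead for quantization metadata (the overhead value is a guess chosen for illustration), and it ignores KV cache and activation memory. The function names are hypothetical; the model sizes and scores are copied from the leaderboard table.

```python
# (model name, parameters in billions, SimpleQA score in %) from the leaderboard table
LEADERBOARD = [
    ("DeepSeek V4 Pro", 861.6, 57.0),
    ("Qwen3 235B A22B Thinking 2507", 235.1, 50.1),
    ("Kimi K2.7 Code", 1058.6, 39.2),
    ("Kimi K2.6", 1058.6, 38.7),
    ("GLM 5.1", 753.9, 37.3),
    ("Kimi K2.5", 1058.6, 33.9),
    ("Kimi K2 Thinking", 1058.1, 31.6),
    ("GLM 4.7", 358.3, 31.5),
    ("DeepSeek R1 0528", 684.5, 27.4),
    ("GPT OSS 120B", 120.4, 13.9),
    ("Gemma 4 31B IT", 32.7, 9.6),
]


def quantized_size_gb(params_billion, bits_per_weight=4.0, overhead=0.10):
    """Approximate weight footprint in GB (1 GB = 1e9 bytes).

    ASSUMPTION: `overhead` (10%) stands in for quantization scales and
    similar metadata; real formats vary.
    """
    weight_gb = params_billion * bits_per_weight / 8  # billions of params * bytes per param
    return weight_gb * (1.0 + overhead)


def best_model_for_budget(models, budget_gb):
    """Return the highest-scoring entry whose estimated size fits, or None."""
    fitting = [m for m in models if quantized_size_gb(m[1]) <= budget_gb]
    return max(fitting, key=lambda m: m[2]) if fitting else None


best_24gb = best_model_for_budget(LEADERBOARD, 24.0)
print(best_24gb)
print(round(quantized_size_gb(32.7), 1), "GB estimated for Gemma 4 31B IT")
```

Under these assumptions Gemma 4 31B IT is the only listed model whose weights fit in 24 GB. The next smallest, GPT OSS 120B, needs roughly 60 GB for 4-bit weights alone.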

Can open models match proprietary models on SimpleQA?

Not yet: the strongest proprietary model (Gemini 3.1 Pro Preview) scores 77.3%, which is 20.3 points ahead of the best open model (DeepSeek V4 Pro) at 57.0%. The open model, however, is one you can run yourself.


SimpleQA Leaderboard

Name: SimpleQA — open LLM scores
Creator: epoch

SimpleQA measures factual accuracy on short, fact-seeking questions with a single correct answer — directly probing how often a model is right versus confidently wrong (hallucination) on simple facts.

Source: epoch · 11 open models ranked + 54 proprietary · Data through Jul 2026


Open models ranked on SimpleQA

# shows rank among open models / rank overall (including proprietary).

#	Model	Score
1 / 12	DeepSeek V4 Pro · 861.6B	57.0%
2 / 19	Qwen3 235B A22B Thinking 2507 · 235.1B	50.1%
3 / 33	Kimi K2.7 Code · 1058.6B	39.2%
4 / 35	Kimi K2.6 · 1058.6B	38.7%
5 / 39	GLM 5.1 · 753.9B	37.3%
6 / 44	Kimi K2.5 · 1058.6B	33.9%
7 / 45	Kimi K2 Thinking · 1058.1B	31.6%
8 / 46	GLM 4.7 · 358.3B	31.5%
9 / 50	DeepSeek R1 0528 · 684.5B	27.4%
10 / 59	GPT OSS 120B · 120.4B	13.9%
11 / 63	Gemma 4 31B IT · 32.7B	9.6%

Score vs model size

Which models give the most quality for their size — the ones worth running locally.

Each dot is a model. Up = higher score, left = smaller (easier to run locally). The dashed line marks the efficiency frontier — the best score you can get at each size or smaller.
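The frontier described above can be recomputed from the table. The following is a minimal sketch, assuming the frontier is the set of models whose score strictly exceeds that of every model of equal or smaller size. The page does not state its exact tie-breaking, so that rule and the function name are assumptions; sizes and scores are copied from the leaderboard table.

```python
# (model name, parameters in billions, SimpleQA score in %) from the leaderboard table
MODELS = [
    ("DeepSeek V4 Pro", 861.6, 57.0),
    ("Qwen3 235B A22B Thinking 2507", 235.1, 50.1),
    ("Kimi K2.7 Code", 1058.6, 39.2),
    ("Kimi K2.6", 1058.6, 38.7),
    ("GLM 5.1", 753.9, 37.3),
    ("Kimi K2.5", 1058.6, 33.9),
    ("Kimi K2 Thinking", 1058.1, 31.6),
    ("GLM 4.7", 358.3, 31.5),
    ("DeepSeek R1 0528", 684.5, 27.4),
    ("GPT OSS 120B", 120.4, 13.9),
    ("Gemma 4 31B IT", 32.7, 9.6),
]


def efficiency_frontier(models):
    """Return the models that beat every model of equal or smaller size.

    ASSUMPTION: strict improvement is required, and among models of the
    same size only the highest-scoring one can qualify.
    """
    # Smallest first; for equal sizes, put the higher score first.
    ordered = sorted(models, key=lambda m: (m[1], -m[2]))
    frontier = []
    best_so_far = float("-inf")
    for name, size, score in ordered:
        if score > best_so_far:  # a new best score at this size or smaller
            frontier.append((name, size, score))
            best_so_far = score
    return frontier


frontier = efficiency_frontier(MODELS)
for name, size, score in frontier:
    print(f"{name}: {size}B parameters, {score}%")
```

With the table's numbers, this yields four frontier models: Gemma 4 31B IT, GPT OSS 120B, Qwen3 235B A22B Thinking 2507, and DeepSeek V4 Pro. Every other listed model is outscored by something smaller.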


Scores are aggregated from epoch. llmrun does not run this benchmark itself; see the source for methodology, or the "about benchmarks" page for what it measures.