What is the best open LLM on SimpleBench?

DeepSeek V4 Pro is the top open model on SimpleBench, scoring 61.2%. Among all models tested — including proprietary ones — it ranks #22. The top model overall is Claude Fable 5 Max (Anthropic) at 81.9%.

Can open models match proprietary models on SimpleBench?

Not quite: the strongest proprietary model (Claude Fable 5 Max) scores 81.9%, which is 20.7 percentage points ahead of the best open model (DeepSeek V4 Pro) at 61.2%. The open model, however, is one you can run yourself.

Reasoning

SimpleBench Leaderboard

Name: SimpleBench — open LLM scores
Creator: epoch

SimpleBench is a set of everyday, common-sense and trick questions that humans answer easily but language models often get wrong. It probes basic reasoning and robustness rather than specialist knowledge.

Source: epoch · 19 open models ranked + 71 proprietary · Data through Jul 2026


Open models ranked on SimpleBench

The # column shows rank among open models / rank overall (including proprietary models). The figure after each model name is its size in billions of parameters.

# (open / overall)	Model · Size	Score
1 / 22	DeepSeek V4 Pro · 861.6B	61.2%
2 / 30	Kimi K2.7 Code · 1058.6B	57.9%
3 / 35	GLM 5.1 · 753.9B	55.1%
4 / 38	GLM 5 · 753.9B	53.2%
5 / 41	DeepSeek V3.2 Speciale · 685.4B	52.6%
6 / 45	GLM 4.7 · 358.3B	47.7%
7 / 47	Kimi K2.5 · 1058.6B	46.8%
8 / 52	MiniMax M3 · 427.0B	45.8%
9 / 59	DeepSeek R1 0528 · 684.5B	40.8%
10 / 61	DeepSeek V3.1 · 684.5B	40.0%
11 / 68	Qwen3 235B A22B · 235.1B	31.0%
12 / 69	DeepSeek R1 · 684.5B	30.9%
13 / 71	Llama 4 Maverick 17B 128E Instruct · 401.6B	27.7%
14 / 73	DeepSeek v3 0324 · 684.5B	27.2%
15 / 76	Kimi K2 Instruct · 1026.5B	26.3%
16 / 79	Llama 3.1 405B Instruct · 405.9B	23.0%
17 / 83	GPT OSS 120B · 120.4B	22.1%
18 / 84	Llama 3.3 70B Instruct · 70.6B	19.9%
19 / 85	DeepSeek v3 · 684.5B	18.9%
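
For readers who want to work with these rows programmatically, here is a minimal parsing sketch. It assumes the layout seen in the table: three tab-separated fields, ranks written as "open / overall", and the model name separated from its size by "·" with a trailing "B". The `Row` type and `parse_row` function are hypothetical names introduced for illustration, not part of any llmrun or epoch tooling.

```python
from typing import NamedTuple


class Row(NamedTuple):
    open_rank: int       # rank among open models
    overall_rank: int    # rank among all models, including proprietary
    name: str
    size_b: float        # size in billions of parameters
    score_pct: float     # SimpleBench score in percent


def parse_row(line: str) -> Row:
    """Parse one leaderboard row (assumed format: rank<TAB>model<TAB>score)."""
    rank_field, model_field, score_field = line.split("\t")
    # "1 / 22" -> open rank 1, overall rank 22
    open_rank, overall_rank = (int(part) for part in rank_field.split("/"))
    # "DeepSeek V4 Pro · 861.6B" -> name and size; split on the last "·"
    name, size = model_field.rsplit("·", 1)
    return Row(
        open_rank,
        overall_rank,
        name.strip(),
        float(size.strip().rstrip("B")),
        float(score_field.strip().rstrip("%")),
    )


if __name__ == "__main__":
    print(parse_row("1 / 22\tDeepSeek V4 Pro · 861.6B\t61.2%"))
```

Splitting on the last "·" keeps the sketch working if a model name ever contains the separator itself.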

Score vs model size

Which models give the most quality for their size — the ones worth running locally.

Each dot is a model. Up = higher score, left = smaller (easier to run locally). The dashed line marks the efficiency frontier — the best score you can get at each size or smaller.
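
The frontier described above can be sketched as a running maximum over models sorted by size. This is one plausible reading of "the best score you can get at each size or smaller", not necessarily how the chart on the page is computed. The sizes and scores are copied from the table, and the function name `efficiency_frontier` is a hypothetical one chosen for illustration.

```python
# (name, size in billions of parameters, score in %) copied from the table
MODELS = [
    ("DeepSeek V4 Pro", 861.6, 61.2),
    ("Kimi K2.7 Code", 1058.6, 57.9),
    ("GLM 5.1", 753.9, 55.1),
    ("GLM 5", 753.9, 53.2),
    ("DeepSeek V3.2 Speciale", 685.4, 52.6),
    ("GLM 4.7", 358.3, 47.7),
    ("Kimi K2.5", 1058.6, 46.8),
    ("MiniMax M3", 427.0, 45.8),
    ("DeepSeek R1 0528", 684.5, 40.8),
    ("DeepSeek V3.1", 684.5, 40.0),
    ("Qwen3 235B A22B", 235.1, 31.0),
    ("DeepSeek R1", 684.5, 30.9),
    ("Llama 4 Maverick 17B 128E Instruct", 401.6, 27.7),
    ("DeepSeek v3 0324", 684.5, 27.2),
    ("Kimi K2 Instruct", 1026.5, 26.3),
    ("Llama 3.1 405B Instruct", 405.9, 23.0),
    ("GPT OSS 120B", 120.4, 22.1),
    ("Llama 3.3 70B Instruct", 70.6, 19.9),
    ("DeepSeek v3", 684.5, 18.9),
]


def efficiency_frontier(models):
    """Return the models whose score beats every model of equal or smaller size."""
    frontier = []
    best = float("-inf")
    # Sort by size ascending; within one size, highest score first,
    # so only the best model at a given size can join the frontier.
    for name, size, score in sorted(models, key=lambda m: (m[1], -m[2])):
        if score > best:
            frontier.append((name, size, score))
            best = score
    return frontier


if __name__ == "__main__":
    for name, size, score in efficiency_frontier(MODELS):
        print(f"{size:7.1f}B  {score:5.1f}%  {name}")
```

Under this reading, seven of the 19 open models sit on the frontier: Llama 3.3 70B Instruct, GPT OSS 120B, Qwen3 235B A22B, GLM 4.7, DeepSeek V3.2 Speciale, GLM 5.1, and DeepSeek V4 Pro. Every other model is matched or beaten by something no larger than itself.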


Scores aggregated from epoch. llmrun does not run this benchmark; see the source for methodology, or the "about benchmarks" page for what it measures.