What is the best open LLM on LiveBench Math?

DeepSeek V4 Pro is the top open model on LiveBench Math, scoring 90.7. Among all models tested — including proprietary ones — it ranks #15. The top model overall is GPT 5.6 Sol Max (OpenAI) at 96.2.

What's the best LiveBench Math model you can run on a 24 GB GPU?

Qwen3.6 27B (27.8B parameters) is the highest-scoring open model that fits on a 24 GB GPU. At 4-bit quantization it needs about 15 GB, and it scores 79.9 on LiveBench Math.
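The 24 GB figure comes down to simple arithmetic. At 4 bits per weight, each parameter takes half a byte, so 27.8B parameters need roughly 13.9 GB for the weights alone. The sketch below applies that estimate to the open models in the leaderboard table. It is a rough model under stated assumptions, not how any particular runtime allocates memory. `weight_memory_gb` and `best_fit` are hypothetical helpers, and the flat 1 GB `overhead_gb` allowance for KV cache and buffers is an assumption. Real 4-bit formats also store per-group scale factors, so actual files run somewhat larger.

```python
# Open models from the LiveBench Math table: (name, parameters in billions, score).
OPEN_MODELS = [
    ("DeepSeek V4 Pro", 861.6, 90.7),
    ("GLM 5.2", 753.3, 89.8),
    ("Inkling", 952.4, 88.4),
    ("Kimi K2.6", 1058.6, 84.3),
    ("Qwen3.6 27B", 27.8, 79.9),
    ("DeepSeek V4 Flash", 158.1, 79.7),
    ("Kimi K2.7 Code", 1058.6, 79.6),
    ("MiniMax M3", 427.0, 77.0),
]


def weight_memory_gb(params_b, bits):
    """Weights only: params_b * 1e9 params * bits / 8 bytes, expressed in GB (1e9 bytes)."""
    return params_b * bits / 8


def best_fit(vram_gb, bits=4, overhead_gb=1.0):
    """Highest-scoring open model whose estimated footprint fits in vram_gb.

    overhead_gb is an ASSUMED flat allowance for KV cache, activations and
    runtime buffers; real usage grows with context length and varies by engine.
    """
    fits = [
        (score, name)
        for name, params_b, score in OPEN_MODELS
        if weight_memory_gb(params_b, bits) + overhead_gb <= vram_gb
    ]
    return max(fits)[1] if fits else None


print(weight_memory_gb(27.8, 4))  # about 13.9 GB of weights for Qwen3.6 27B
print(best_fit(24))  # Qwen3.6 27B
```

The next-smallest open model, DeepSeek V4 Flash (158.1B), already needs about 79 GB for its 4-bit weights under this estimate. So Qwen3.6 27B is the only open model in the table that fits a single 24 GB card.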

Can open models match proprietary models on LiveBench Math?

Not yet on LiveBench Math. The strongest proprietary model (GPT 5.6 Sol Max) scores 96.2, 5.5 points ahead of the best open model (DeepSeek V4 Pro) at 90.7, and 14 proprietary models rank above it. The open model's advantage is that you can run it yourself.


LiveBench Math Leaderboard

Name: LiveBench Math — open LLM scores
Creator: livebench

LiveBench Math measures mathematical problem-solving on contamination-free, regularly-refreshed questions, including competition-style problems.

Source: livebench · 8 open models ranked + 29 proprietary · Data through Jun 2026


All models ranked on LiveBench Math

Proprietary (closed) models are marked "proprietary". You can't run them locally, but they show where the open field stands. Open models list their parameter count (B = billions).

#	Model	Score
1	GPT 5.6 Sol Max · proprietary	96.2
2	Claude Fable 5 Max · proprietary	96.0
3	GPT 5.5 · proprietary	95.9
4	Claude Fable 5 · proprietary	95.7
5	GPT 5.6 Sol · proprietary	95.5
6	Claude Opus 4.8 · proprietary	95.3
7	GPT 5.6 Terra Max · proprietary	94.9
8	GPT 5.4 · proprietary	94.2
9	GPT 5.2 · proprietary	93.2
10	Claude Sonnet 5 · proprietary	92.9
11	Claude Opus 4.7 · proprietary	92.8
12	Gemini 3.1 Pro · proprietary	91.0
13	GPT 5.4 Nano · proprietary	91.0
14	Grok 4.5 · proprietary	90.8
15	DeepSeek V4 Pro · 861.6B	90.7
16	Claude Opus 4.5 · proprietary	90.4
17	GLM 5.2 · 753.3B	89.8
18	GPT 5.6 Terra · proprietary	89.5
19	Claude Opus 4.6 · proprietary	89.3
20	GPT 5.2 Codex · proprietary	88.8
21	Inkling · 952.4B	88.4
22	Gemini 3.5 Flash · proprietary	88.2
23	GPT 5.6 Luna Max · proprietary	87.2
24	Muse Spark 1.1 · proprietary	87.1
25	Claude Sonnet 4.6 · proprietary	87.0
26	GPT 5.6 Luna · proprietary	86.3
27	Qwen3.7 Max · proprietary	85.3
28	Kimi K3 · proprietary	84.4
29	Grok 4.3 · proprietary	84.3
30	Kimi K2.6 · 1058.6B	84.3
31	Qwen3.6 Plus · proprietary	83.7
32	Qwen3.6 27B · 27.8B	79.9
33	DeepSeek V4 Flash · 158.1B	79.7
34	Kimi K2.7 Code · 1058.6B	79.6
35	GPT 5.4 Mini · proprietary	78.5
36	Grok Build 0.1 · proprietary	78.4
37	MiniMax M3 · 427.0B	77.0
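The headline answers at the top of this page can be derived directly from the table. The following is a minimal sketch, assuming each row is recorded as a hypothetical `Entry` whose parameter count is `None` for proprietary models. It picks the best open and best proprietary model, computes the open model's overall rank, and reports the gap between the two scores.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    name: str
    params_b: Optional[float]  # parameters in billions; None means proprietary
    score: float


# Rows copied from the LiveBench Math table above.
ROWS = [
    Entry("GPT 5.6 Sol Max", None, 96.2), Entry("Claude Fable 5 Max", None, 96.0),
    Entry("GPT 5.5", None, 95.9), Entry("Claude Fable 5", None, 95.7),
    Entry("GPT 5.6 Sol", None, 95.5), Entry("Claude Opus 4.8", None, 95.3),
    Entry("GPT 5.6 Terra Max", None, 94.9), Entry("GPT 5.4", None, 94.2),
    Entry("GPT 5.2", None, 93.2), Entry("Claude Sonnet 5", None, 92.9),
    Entry("Claude Opus 4.7", None, 92.8), Entry("Gemini 3.1 Pro", None, 91.0),
    Entry("GPT 5.4 Nano", None, 91.0), Entry("Grok 4.5", None, 90.8),
    Entry("DeepSeek V4 Pro", 861.6, 90.7), Entry("Claude Opus 4.5", None, 90.4),
    Entry("GLM 5.2", 753.3, 89.8), Entry("GPT 5.6 Terra", None, 89.5),
    Entry("Claude Opus 4.6", None, 89.3), Entry("GPT 5.2 Codex", None, 88.8),
    Entry("Inkling", 952.4, 88.4), Entry("Gemini 3.5 Flash", None, 88.2),
    Entry("GPT 5.6 Luna Max", None, 87.2), Entry("Muse Spark 1.1", None, 87.1),
    Entry("Claude Sonnet 4.6", None, 87.0), Entry("GPT 5.6 Luna", None, 86.3),
    Entry("Qwen3.7 Max", None, 85.3), Entry("Kimi K3", None, 84.4),
    Entry("Grok 4.3", None, 84.3), Entry("Kimi K2.6", 1058.6, 84.3),
    Entry("Qwen3.6 Plus", None, 83.7), Entry("Qwen3.6 27B", 27.8, 79.9),
    Entry("DeepSeek V4 Flash", 158.1, 79.7), Entry("Kimi K2.7 Code", 1058.6, 79.6),
    Entry("GPT 5.4 Mini", None, 78.5), Entry("Grok Build 0.1", None, 78.4),
    Entry("MiniMax M3", 427.0, 77.0),
]


def summarize(rows):
    """Best open model, its overall rank, best proprietary model, and the score gap."""
    best_open = max((e for e in rows if e.params_b is not None), key=lambda e: e.score)
    best_closed = max((e for e in rows if e.params_b is None), key=lambda e: e.score)
    # Rank = 1 + number of models with a strictly higher score (ties share a rank).
    rank = 1 + sum(e.score > best_open.score for e in rows)
    gap = round(best_closed.score - best_open.score, 1)
    return best_open.name, rank, best_closed.name, gap


print(summarize(ROWS))  # ('DeepSeek V4 Pro', 15, 'GPT 5.6 Sol Max', 5.5)
```

Rank here is one plus the number of strictly higher scores, so tied models would share a rank. The table instead numbers tied rows (such as the two 91.0 entries) consecutively. That difference does not affect DeepSeek V4 Pro, whose score is not tied.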

Score vs model size

Which models deliver the most quality for their size, and so are the most worth running locally.

[Figure: LiveBench Math score (vertical axis) vs. model size (horizontal axis), one dot per model, with smaller models further left. A dashed line marks the efficiency frontier: the best score available at each size or smaller.]
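The efficiency frontier is a running maximum. Walk the models from smallest to largest and keep each one that beats every model of equal or smaller size. The sketch below runs over the open models in the table, since they are the only ones with a published size. `efficiency_frontier` is a hypothetical helper, and ties in size are broken by keeping the higher score.

```python
# Open models from the LiveBench Math table: (name, parameters in billions, score).
OPEN_MODELS = [
    ("DeepSeek V4 Pro", 861.6, 90.7),
    ("GLM 5.2", 753.3, 89.8),
    ("Inkling", 952.4, 88.4),
    ("Kimi K2.6", 1058.6, 84.3),
    ("Qwen3.6 27B", 27.8, 79.9),
    ("DeepSeek V4 Flash", 158.1, 79.7),
    ("Kimi K2.7 Code", 1058.6, 79.6),
    ("MiniMax M3", 427.0, 77.0),
]


def efficiency_frontier(models):
    """Models whose score beats every model of equal or smaller size.

    Walking from smallest to largest, a model joins the frontier only if it
    raises the best score seen so far; that running maximum is the dashed line.
    """
    frontier = []
    best = float("-inf")
    # Sort by size; for equal sizes, visit the higher score first.
    for name, params_b, score in sorted(models, key=lambda m: (m[1], -m[2])):
        if score > best:
            frontier.append((name, params_b, score))
            best = score
    return frontier


for name, params_b, score in efficiency_frontier(OPEN_MODELS):
    print(f"{name:16} {params_b:>7.1f}B  {score}")
```

With the current data the frontier is Qwen3.6 27B, GLM 5.2, and DeepSeek V4 Pro. Every other open model is outscored by a smaller one.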


Scores are aggregated from livebench; llmrun does not run this benchmark itself. See the source for methodology, or the About benchmarks page for what it measures.