What is the best open LLM on LiveBench Reasoning?

Kimi K2.7 Code is the top open model on LiveBench Reasoning, scoring 82.8. Among all models tested — including proprietary ones — it ranks #22. The top model overall is GPT 5.6 Sol Max (OpenAI) at 91.7.

What's the best LiveBench Reasoning model you can run on a 24 GB GPU?

Qwen3.6 27B is the highest-scoring open model that fits in 24 GB at 4-bit quantization (about 15 GB), scoring 70.3 on LiveBench Reasoning.
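
The 24 GB claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is not how the page derives its figure: the function names and the `overhead_gb` allowance are hypothetical, and it assumes 1 GB = 10^9 bytes with every weight stored at the stated bit width. At exactly 4 bits per weight, 27.8B parameters come to about 13.9 GB; the gap up to the quoted "about 15 GB" is presumably quantization metadata and runtime overhead, which is an assumption here.

```python
def quantized_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (10^9 bytes) for a quantized model."""
    bytes_per_weight = bits_per_weight / 8
    # billions of parameters * bytes per parameter = gigabytes
    return params_billion * bytes_per_weight


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """Rough fit check. overhead_gb is a hypothetical allowance for the
    KV cache, activations, and quantization scales, not a measured value."""
    return quantized_weight_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    # Parameter counts taken from the leaderboard table.
    for name, size_b in [("Qwen3.6 27B", 27.8), ("DeepSeek V4 Flash", 158.1)]:
        weights = quantized_weight_gb(size_b, 4)
        print(f"{name}: ~{weights:.1f} GB of weights, "
              f"fits in 24 GB: {fits_in_vram(size_b, 4, 24.0)}")
```

By the same estimate, the next-smallest open model on the list (DeepSeek V4 Flash, 158.1B) needs roughly 79 GB of weights at 4 bits, which is why it does not qualify for a 24 GB card.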

Can open models match proprietary models on LiveBench Reasoning?

Not quite: the strongest proprietary model (GPT 5.6 Sol Max) scores 91.7, 8.9 points ahead of the best open model (Kimi K2.7 Code) at 82.8. The open model, however, is one you can run yourself.

LiveBench Reasoning Leaderboard

Name: LiveBench Reasoning — open LLM scores
Creator: livebench

LiveBench Reasoning measures logical, multi-step reasoning using contamination-free questions that are refreshed regularly, so models cannot have trained on the test set.

Source: livebench · 8 open models ranked + 29 proprietary · Data through Jun 2026

Open models ranked on LiveBench Reasoning

# shows rank among open models / rank overall (including proprietary).

#	Model	Parameters	Score
1 / 22	Kimi K2.7 Code	1058.6B	82.8
2 / 23	DeepSeek V4 Pro	861.6B	82.7
3 / 27	Kimi K2.6	1058.6B	79.4
4 / 28	GLM 5.2	753.3B	78.6
5 / 29	Inkling	952.4B	78.3
6 / 33	MiniMax M3	427.0B	74.5
7 / 36	DeepSeek V4 Flash	158.1B	70.6
8 / 37	Qwen3.6 27B	27.8B	70.3

Score vs model size

Which models give the most quality for their size — the ones worth running locally.

Each dot is a model. Up = higher score, left = smaller (easier to run locally). The dashed line marks the efficiency frontier — the best score you can get at each size or smaller.
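
The frontier definition above (best score at each size or smaller) amounts to a running maximum over models sorted by size. The sketch below applies it to the eight open models in the table only; the chart on the source page may be drawn from a different set, and the function name `efficiency_frontier` is hypothetical.

```python
# (name, parameters in billions, LiveBench Reasoning score) from the table.
MODELS = [
    ("Kimi K2.7 Code", 1058.6, 82.8),
    ("DeepSeek V4 Pro", 861.6, 82.7),
    ("Kimi K2.6", 1058.6, 79.4),
    ("GLM 5.2", 753.3, 78.6),
    ("Inkling", 952.4, 78.3),
    ("MiniMax M3", 427.0, 74.5),
    ("DeepSeek V4 Flash", 158.1, 70.6),
    ("Qwen3.6 27B", 27.8, 70.3),
]


def efficiency_frontier(models):
    """Return models that beat every model of equal or smaller size."""
    # Sort by size ascending; on a size tie, put the higher score first.
    ordered = sorted(models, key=lambda m: (m[1], -m[2]))
    frontier, best_score = [], float("-inf")
    for name, size_b, score in ordered:
        if score > best_score:  # strictly better than anything this small
            frontier.append((name, size_b, score))
            best_score = score
    return frontier


if __name__ == "__main__":
    for name, size_b, score in efficiency_frontier(MODELS):
        print(f"{size_b:>7.1f}B  {score:5.1f}  {name}")
```

On this data, Inkling and Kimi K2.6 fall off the frontier: DeepSeek V4 Pro is smaller than both and scores higher.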

Scores aggregated from livebench. llmrun does not run this benchmark; see the source for methodology, or the about benchmarks page for what it measures.