What is the best open LLM on LiveBench Coding?

GLM 5.2 is the top open model on LiveBench Coding, scoring 79.7. Among all models tested, including proprietary ones, it ranks #11, tied with Claude Opus 4.5 at 79.7 (the table lists it in row 12). The top model overall is Claude Fable 5 Max (Anthropic) at 86.0.

What's the best LiveBench Coding model you can run on a 24 GB GPU?

Qwen3.6 27B is the highest-scoring open model that fits in 24 GB at 4-bit quantization (about 15 GB), scoring 71.8 on LiveBench Coding.
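
The "about 15 GB" figure can be sanity-checked with simple arithmetic. The sketch below is an assumption about how such an estimate is typically formed, not the page's documented method: weights take roughly parameters × bits / 8 bytes, and the OVERHEAD factor is a hypothetical allowance for quantization scales and layers kept at higher precision.

```python
def quantized_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# Qwen3.6 27B is listed in the table at 27.8B parameters.
weights_gb = quantized_weight_gb(27.8, 4)

# Assumed overhead (hypothetical value): quantization scales, unquantized layers.
OVERHEAD = 1.08
estimate_gb = weights_gb * OVERHEAD

# True if the estimate leaves room on a 24 GB card.
fits_24gb = estimate_gb < 24

print(f"weights ~{weights_gb:.1f} GB, with overhead ~{estimate_gb:.1f} GB, fits: {fits_24gb}")
```

The estimate covers weights only. The KV cache and activations also use GPU memory at run time, so the headroom left below 24 GB determines how much context is practical.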

Can open models match proprietary models on LiveBench Coding?

Not quite: the strongest proprietary model (Claude Fable 5 Max) scores 86.0, which is 6.3 points ahead of the best open model (GLM 5.2) at 79.7. The open model, however, is one you can run yourself.

LiveBench Coding Leaderboard

Name: LiveBench Coding — open LLM scores
Creator: livebench

LiveBench Coding evaluates code generation and completion on fresh, contamination-free programming tasks that are updated regularly.

Source: livebench · 8 open models ranked + 29 proprietary · Data through Jun 2026

All models ranked on LiveBench Coding

Proprietary (closed) models are labeled "proprietary" in place of a model size. You can't run them locally, but they show where the open field stands.

#	Model	Score
1	Claude Fable 5 Max · proprietary	86.0
2	GPT 5.6 Sol Max · proprietary	83.9
3	GPT 5.2 Codex · proprietary	83.6
4	GPT 5.6 Luna Max · proprietary	82.9
5	Claude Fable 5 · proprietary	82.5
6	GPT 5.5 · proprietary	82.2
7	Claude Opus 4.7 · proprietary	82.1
8	GPT 5.6 Sol · proprietary	81.8
9	Kimi K3 · proprietary	81.5
10	Claude Sonnet 5 · proprietary	80.7
11	Claude Opus 4.5 · proprietary	79.7
12	GLM 5.2 · 753.3B	79.7
13	Claude Opus 4.8 · proprietary	79.3
14	Claude Sonnet 4.6 · proprietary	79.3
15	Kimi K2.6 · 1058.6B	78.6
16	GPT 5.6 Terra Max · proprietary	78.3
17	Claude Opus 4.6 · proprietary	78.2
18	Gemini 3.5 Flash · proprietary	78.2
19	Qwen3.6 Plus · proprietary	78.2
20	GPT 5.4 · proprietary	77.5
21	Muse Spark 1.1 · proprietary	77.2
22	GPT 5.6 Luna · proprietary	76.7
23	Gemini 3.1 Pro · proprietary	76.5
24	GPT 5.2 · proprietary	76.1
25	GPT 5.6 Terra · proprietary	75.4
26	Qwen3.7 Max · proprietary	74.2
27	Kimi K2.7 Code · 1058.6B	74.0
28	Qwen3.6 27B · 27.8B	71.8
29	GPT 5.4 Mini · proprietary	71.6
30	Inkling · 952.4B	71.0
31	GPT 5.4 Nano · proprietary	70.8
32	DeepSeek V4 Pro · 861.6B	70.0
33	Grok 4.3 · proprietary	69.9
34	DeepSeek V4 Flash · 158.1B	69.2
35	Grok 4.5 · proprietary	68.6
36	MiniMax M3 · 427.0B	68.2
37	Grok Build 0.1 · proprietary	65.4
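
GLM 5.2 and Claude Opus 4.5 share a score of 79.7 but sit in rows 11 and 12. A minimal sketch of standard competition ranking, where tied scores share a rank, shows how GLM 5.2 comes out at #11. The function name and the choice of ranking scheme are assumptions for illustration; the scores are the top 14 rows of the table.

```python
def competition_rank(scores: dict[str, float], name: str) -> int:
    """Rank = 1 + number of models with a strictly higher score, so ties share a rank."""
    target = scores[name]
    return 1 + sum(1 for s in scores.values() if s > target)

# Top 14 rows of the table above.
top_rows = {
    "Claude Fable 5 Max": 86.0,
    "GPT 5.6 Sol Max": 83.9,
    "GPT 5.2 Codex": 83.6,
    "GPT 5.6 Luna Max": 82.9,
    "Claude Fable 5": 82.5,
    "GPT 5.5": 82.2,
    "Claude Opus 4.7": 82.1,
    "GPT 5.6 Sol": 81.8,
    "Kimi K3": 81.5,
    "Claude Sonnet 5": 80.7,
    "Claude Opus 4.5": 79.7,
    "GLM 5.2": 79.7,
    "Claude Opus 4.8": 79.3,
    "Claude Sonnet 4.6": 79.3,
}

for model in ("Claude Opus 4.5", "GLM 5.2", "Claude Opus 4.8"):
    print(model, competition_rank(top_rows, model))
```

Under this scheme both 79.7 models rank 11th and the next distinct score ranks 13th, so the row number in the table is a display position rather than a tie-aware rank.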

Score vs model size

Which models give the most quality for their size — the ones worth running locally.

Each dot is a model. Up = higher score, left = smaller (easier to run locally). The dashed line marks the efficiency frontier — the best score you can get at each size or smaller.
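
The frontier described above can be sketched from the table's open models, the only ones with a listed size. This is one reading of "the best score you can get at each size or smaller", not necessarily how the page draws its dashed line; the function name and tuple layout are illustrative.

```python
def efficiency_frontier(models):
    """Return names of models that outscore every model of equal or smaller size."""
    frontier = []
    best = float("-inf")
    # Walk from smallest to largest; at equal size, consider the higher score first.
    for name, size_b, score in sorted(models, key=lambda m: (m[1], -m[2])):
        if score > best:
            frontier.append(name)
            best = score
    return frontier

# (name, size in billions of parameters, LiveBench Coding score) from the table.
open_models = [
    ("GLM 5.2", 753.3, 79.7),
    ("Kimi K2.6", 1058.6, 78.6),
    ("Kimi K2.7 Code", 1058.6, 74.0),
    ("Qwen3.6 27B", 27.8, 71.8),
    ("Inkling", 952.4, 71.0),
    ("DeepSeek V4 Pro", 861.6, 70.0),
    ("DeepSeek V4 Flash", 158.1, 69.2),
    ("MiniMax M3", 427.0, 68.2),
]

print(efficiency_frontier(open_models))
```

On this data only Qwen3.6 27B and GLM 5.2 land on the frontier: every other open model is larger than Qwen3.6 27B while scoring below it, or larger than GLM 5.2 while scoring below it.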

Scores aggregated from livebench. llmrun does not run this benchmark; see the source for methodology, or the "about benchmarks" page for what it measures.