What is the best open LLM on Aider Polyglot?

DeepSeek V3.2 Exp is the top open model on Aider Polyglot, scoring 74.2%. Among all models tested, including proprietary ones, it ranks #14. The top model overall is GPT 5 (Aug 07, 2025, high) from OpenAI at 88.0%.

What's the best Aider Polyglot model you can run on a 24 GB GPU?

Qwen3 32B is the highest-scoring open model that fits in 24 GB at 4-bit quantization (about 18 GB), scoring 40.0% on Aider Polyglot.
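
The "about 18 GB" figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is only an estimate: the 32.8B parameter count comes from the leaderboard table, while the 10% overhead factor (for quantization scales and similar metadata) is an assumption chosen for illustration, not a value published by the source. Memory for the KV cache and activations is extra and not modeled here.

```python
def quantized_weight_gb(params_billions, bits_per_weight=4, overhead=1.10):
    """Rough size of quantized weights in GB (10^9 bytes).

    overhead is an assumed fudge factor for quantization metadata.
    """
    bytes_per_weight = bits_per_weight / 8
    return params_billions * bytes_per_weight * overhead


# Qwen3 32B is listed at 32.8B parameters in the table.
size = quantized_weight_gb(32.8)
print(f"{size:.1f} GB, fits in 24 GB: {size <= 24}")
```

Under these assumptions the weights leave a few GB of headroom on a 24 GB card, which is what context length and batch size then compete for.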

Can open models match proprietary models on Aider Polyglot?

Not quite. The strongest proprietary model, GPT 5 (Aug 07, 2025, high), scores 88.0%, which is 13.8 points ahead of the best open model, DeepSeek V3.2 Exp, at 74.2%. The open model, however, is one you can run yourself.

Coding

Aider Polyglot Leaderboard

Name: Aider Polyglot — open LLM scores
Creator: epoch

The Aider Polyglot benchmark measures real-world coding across several programming languages: the model edits code to solve Exercism exercises, and is scored on whether the final solution actually runs and passes the tests.
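
As a rough illustration of the scoring idea described above, a percentage score can be read as the share of exercises whose final solution passes its tests. This is a minimal sketch with hypothetical exercise names and outcomes. The actual protocol (number of attempts, edit format, and so on) is defined by the benchmark source and is not reproduced here.

```python
def pass_rate(results):
    """Percentage of exercises whose final solution passed its tests."""
    if not results:
        raise ValueError("no results to score")
    # True counts as 1 and False as 0 when summed.
    return 100.0 * sum(results.values()) / len(results)


# Hypothetical outcomes: True means the final solution passed all tests.
results = {
    "exercise-a": True,
    "exercise-b": False,
    "exercise-c": True,
    "exercise-d": True,
}
print(f"{pass_rate(results):.1f}%")
```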

Source: epoch · 18 open models ranked + 51 proprietary · Data through Dec 2025


Open models ranked on Aider Polyglot

The # column shows rank among open models / rank overall (including proprietary models). The figure after each model name is its parameter count.

#	Model	Score
1 / 14	DeepSeek V3.2 Exp · 685.4B	74.2%
2 / 19	DeepSeek R1 0528 · 684.5B	71.4%
3 / 27	Qwen3 235B A22B · 235.1B	59.6%
4 / 28	Qwen3 235B A22B Instruct 2507 · 235.1B	59.6%
5 / 29	Kimi K2 Instruct · 1026.5B	59.1%
6 / 30	Kimi K2 Instruct 0905 · 1026.5B	59.1%
7 / 31	DeepSeek R1 · 684.5B	56.9%
8 / 33	DeepSeek v3 0324 · 684.5B	55.1%
9 / 40	DeepSeek v3 · 684.5B	48.4%
10 / 45	GPT OSS 120B · 120.4B	41.8%
11 / 46	Qwen3 32B · 32.8B	40.0%
12 / 57	QwQ 32B · 32.8B	20.9%
13 / 60	DeepSeek V2.5 · 235.7B	17.8%
14 / 61	Qwen2.5 Coder 32B Instruct · 32.8B	16.4%
15 / 62	Llama 4 Maverick 17B 128E Instruct · 401.6B	15.6%
16 / 64	C4ai Command A 03 2025 · 111.1B	12.0%
17 / 66	Openhands Lm 32B v0.1 · 32.8B	10.2%
18 / 68	Gemma 3 27B IT · 27.4B	4.9%

Score vs model size

This chart shows which models give the most quality for their size, that is, the ones worth running locally.

Each dot is a model. Higher means a better score, and further left means a smaller model (easier to run locally). The dashed line marks the efficiency frontier: the best score you can get at each size or smaller.
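
The efficiency frontier can be recomputed from the table. The sketch below assumes the frontier is a running maximum of score over models sorted by parameter count, which matches the description "the best score you can get at each size or smaller". Whether the chart uses exactly this rule, including how it breaks ties, is an assumption.

```python
# (model, parameters in billions, score in %) copied from the table.
MODELS = [
    ("DeepSeek V3.2 Exp", 685.4, 74.2),
    ("DeepSeek R1 0528", 684.5, 71.4),
    ("Qwen3 235B A22B", 235.1, 59.6),
    ("Qwen3 235B A22B Instruct 2507", 235.1, 59.6),
    ("Kimi K2 Instruct", 1026.5, 59.1),
    ("Kimi K2 Instruct 0905", 1026.5, 59.1),
    ("DeepSeek R1", 684.5, 56.9),
    ("DeepSeek v3 0324", 684.5, 55.1),
    ("DeepSeek v3", 684.5, 48.4),
    ("GPT OSS 120B", 120.4, 41.8),
    ("Qwen3 32B", 32.8, 40.0),
    ("QwQ 32B", 32.8, 20.9),
    ("DeepSeek V2.5", 235.7, 17.8),
    ("Qwen2.5 Coder 32B Instruct", 32.8, 16.4),
    ("Llama 4 Maverick 17B 128E Instruct", 401.6, 15.6),
    ("C4ai Command A 03 2025", 111.1, 12.0),
    ("Openhands Lm 32B v0.1", 32.8, 10.2),
    ("Gemma 3 27B IT", 27.4, 4.9),
]


def efficiency_frontier(models):
    """Return models that strictly beat every model seen so far when
    walking from smallest to largest (ties keep the first one listed)."""
    frontier = []
    best = float("-inf")
    # Smallest first; within one size, highest score first.
    for name, size, score in sorted(models, key=lambda m: (m[1], -m[2])):
        if score > best:
            frontier.append((name, size, score))
            best = score
    return frontier


frontier = efficiency_frontier(MODELS)
for name, size, score in frontier:
    print(f"{size:7.1f}B  {score:5.1f}%  {name}")
```

With the table's data, this rule places six models on the frontier, from Gemma 3 27B IT at the small end up to DeepSeek V3.2 Exp. Larger models that score lower than a smaller one, such as Kimi K2 Instruct, fall below the line.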


Scores are aggregated from epoch. llmrun does not run this benchmark itself; see the source for methodology, or the "about benchmarks" page for what it measures.