What is the best open LLM on Aider Polyglot?

DeepSeek V3.2 Exp is the top open model on Aider Polyglot, scoring 74.2%. Counting proprietary models as well, it is tied for #13 with DeepSeek Reasoner, which also scores 74.2%. The top model overall is OpenAI's GPT 5 (Aug 07, 2025, high) at 88.0%.
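The #13 rank comes from a tie: DeepSeek Reasoner and DeepSeek V3.2 Exp both score 74.2% and are listed as rows 13 and 14. The following is a minimal sketch that assumes the page uses standard competition ranking, where tied models share the higher rank. `competition_rank` is a hypothetical helper, not part of any published tool.

```python
# ASSUMPTION: standard competition ranking ("1224"): a model's rank is 1 plus
# the number of models with a strictly higher score, so ties share a rank.
def competition_rank(scores: list[float], target: float) -> int:
    return 1 + sum(1 for s in scores if s > target)


# Scores of rows 1-14 of the leaderboard on this page.
TOP_SCORES = [88.0, 86.7, 84.9, 83.1, 81.3, 81.3, 79.6, 79.6, 79.1,
              76.9, 76.9, 76.9, 74.2, 74.2]

print(competition_rank(TOP_SCORES, 74.2))  # DeepSeek V3.2 Exp -> 13
```

Twelve models score strictly above 74.2%, so both 74.2% models share rank 13 under this convention.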

What's the best Aider Polyglot model you can run on a 24 GB GPU?

Qwen3 32B is the highest-scoring open model that fits in 24 GB at 4-bit quantization (about 18 GB), scoring 40.0% on Aider Polyglot.
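The "about 18 GB" figure can be roughly reproduced with arithmetic. This is a sketch, not the page's actual method. It assumes 4-bit weights take params × 4 / 8 bytes and adds a hypothetical 10% overhead for the KV cache and runtime buffers. Real usage grows with context length. The names `estimate_vram_gb` and `best_fitting` are illustrative.

```python
# Rough VRAM estimate for quantized weights.
# ASSUMPTION: total ≈ weights (params × bits / 8) plus ~10% overhead for the
# KV cache and runtime buffers; real usage depends on context length.
def estimate_vram_gb(params_billion: float, bits: int = 4, overhead: float = 0.10) -> float:
    weight_gb = params_billion * bits / 8  # 1e9 params × bits/8 bytes = decimal GB
    return weight_gb * (1 + overhead)


# Open models (total params in billions, score %) from the leaderboard on this page.
OPEN_MODELS = {
    "GPT OSS 120B": (120.4, 41.8),
    "C4ai Command A 03 2025": (111.1, 12.0),
    "Qwen3 32B": (32.8, 40.0),
    "QwQ 32B": (32.8, 20.9),
    "Qwen2.5 Coder 32B Instruct": (32.8, 16.4),
    "Openhands Lm 32B v0.1": (32.8, 10.2),
    "Gemma 3 27B IT": (27.4, 4.9),
}


def best_fitting(models: dict, vram_gb: float = 24.0, bits: int = 4) -> tuple:
    """Return (name, score) of the highest-scoring model whose estimate fits."""
    fitting = {name: score for name, (size, score) in models.items()
               if estimate_vram_gb(size, bits) <= vram_gb}
    return max(fitting.items(), key=lambda kv: kv[1])


name, score = best_fitting(OPEN_MODELS)
print(f"{name}: {estimate_vram_gb(32.8):.1f} GB, {score}%")  # Qwen3 32B: 18.0 GB, 40.0%
```

Under these assumptions, Qwen3 32B needs about 16.4 GB for weights and about 18 GB in total. GPT OSS 120B needs about 66 GB, which is far beyond a 24 GB card.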

Can open models match proprietary models on Aider Polyglot?

Not yet. The strongest proprietary model, GPT 5 (Aug 07, 2025, high), scores 88.0%, which is 13.8 percentage points ahead of the best open model, DeepSeek V3.2 Exp, at 74.2%. The open model, however, is one you can run yourself.


Aider Polyglot Leaderboard

Name: Aider Polyglot — open LLM scores
Creator: epoch

The Aider Polyglot benchmark measures real-world coding ability across six programming languages (C++, Go, Java, JavaScript, Python, and Rust). The model edits code to solve Exercism exercises and is scored on whether its final solution runs and passes the exercise's unit tests.
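The following is a minimal sketch of the scoring rule. It assumes the setup of Aider's own harness: each exercise gets up to two attempts, and the second attempt sees the failing test output. The `pass_rate` helper and the exercise names are hypothetical. epoch's published scores may aggregate runs differently.

```python
from typing import Optional


# ASSUMPTION: up to two attempts per exercise, as in Aider's own harness.
# results maps exercise name -> attempt on which its tests first passed
# (None means the tests never passed).
def pass_rate(results: dict[str, Optional[int]], max_attempts: int = 2) -> float:
    solved = sum(1 for attempt in results.values()
                 if attempt is not None and attempt <= max_attempts)
    return 100.0 * solved / len(results)


# Hypothetical results for four exercises (names are illustrative only).
demo = {"ex-a": 1, "ex-b": 2, "ex-c": None, "ex-d": 1}
print(pass_rate(demo))                  # 75.0
print(pass_rate(demo, max_attempts=1))  # 50.0
```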

Source: epoch · 18 open models ranked, plus 51 proprietary · Data through Dec 2025


All models ranked on Aider Polyglot

Proprietary (closed) models are marked "proprietary". You can't run them locally, but they show where the open field stands.

#	Model · total parameters (B = billions) or proprietary	Score
1	GPT 5 (Aug 07, 2025, high) · proprietary	88.0%
2	GPT 5 (Aug 07, 2025, medium) · proprietary	86.7%
3	O3 Pro (Jun 10, 2025, high) · proprietary	84.9%
4	Gemini 2.5 Pro Preview (Jun 05, 32K) · proprietary	83.1%
5	GPT 5 (Aug 07, 2025, low) · proprietary	81.3%
6	O3 (Apr 16, 2025, high) · proprietary	81.3%
7	Grok 4 (Jul 09, high) · proprietary	79.6%
8	Grok 4 (Jul 09) · proprietary	79.6%
9	Gemini 2.5 Pro Preview (Jun 05) · proprietary	79.1%
10	Gemini 2.5 Pro Preview (May 06) · proprietary	76.9%
11	O3 (Apr 16, 2025, medium) · proprietary	76.9%
12	O3 (Apr 16, 2025, unspecified) · proprietary	76.9%
13	DeepSeek Reasoner · proprietary	74.2%
14	DeepSeek V3.2 Exp · 685.4B	74.2%
15	Gemini 2.5 Pro Exp (Mar 25) · proprietary	72.9%
16	Gemini 2.5 Pro Preview (Mar 25) · proprietary	72.9%
17	Claude Opus 4 (May 14, 2025, 32K) · proprietary	72.0%
18	O4 Mini (Apr 16, 2025, high) · proprietary	72.0%
19	DeepSeek R1 0528 · 684.5B	71.4%
20	Claude Opus 4 (May 14, 2025) · proprietary	70.7%
21	DeepSeek Chat · proprietary	70.2%
22	Claude 3.7 Sonnet (Feb 19, 2025, 32K) · proprietary	64.9%
23	O1 (Dec 17, 2024, high) · proprietary	61.7%
24	Claude Sonnet 4 (May 14, 2025, 32K) · proprietary	61.3%
25	Claude 3.7 Sonnet (Feb 19, 2025) · proprietary	60.4%
26	O3 Mini (Jan 31, 2025, high) · proprietary	60.4%
27	Qwen3 235B A22B · 235.1B	59.6%
28	Qwen3 235B A22B Instruct 2507 · 235.1B	59.6%
29	Kimi K2 Instruct · 1026.5B	59.1%
30	Kimi K2 Instruct 0905 · 1026.5B	59.1%
31	DeepSeek R1 · 684.5B	56.9%
32	Claude Sonnet 4 (May 14, 2025) · proprietary	56.4%
33	DeepSeek v3 0324 · 684.5B	55.1%
34	Gemini 2.5 Flash Preview (May 20, 23K) · proprietary	55.1%
35	O3 Mini (Jan 31, 2025, medium) · proprietary	53.8%
36	Grok 3 Beta · proprietary	53.3%
37	GPT 4.1 (Apr 14, 2025) · proprietary	52.4%
38	Claude 3.5 Sonnet (Oct 22, 2024) · proprietary	51.6%
39	Grok 3 Mini Beta (high) · proprietary	49.3%
40	DeepSeek v3 · 684.5B	48.4%
41	Gemini 2.5 Flash Preview (Apr 17) · proprietary	47.1%
42	ChatGPT 4o (Mar 27, 2025) · proprietary	45.3%
43	GPT 4.5 Preview (Feb 27, 2025) · proprietary	44.9%
44	Gemini 2.5 Flash Preview (May 20) · proprietary	44.0%
45	GPT OSS 120B · 120.4B	41.8%
46	Qwen3 32B · 32.8B	40.0%
47	Gemini Exp (Dec 06) · proprietary	38.2%
48	Gemini 2.0 Pro Exp (Feb 05) · proprietary	35.6%
49	Grok 3 Mini Beta (low) · proprietary	34.7%
50	O1 Mini (Sep 12, 2024, unspecified) · proprietary	32.9%
51	GPT 4.1 Mini (Apr 14, 2025) · proprietary	32.4%
52	Claude 3.5 Haiku (Oct 22, 2024) · proprietary	28.0%
53	ChatGPT 4o (Jan 29, 2025) · proprietary	27.1%
54	GPT 4o (Aug 06, 2024) · proprietary	23.1%
55	Gemini 2.0 Flash Exp · proprietary	22.2%
56	Qwen Max (Jan 25, 2025) · proprietary	21.8%
57	QwQ 32B · 32.8B	20.9%
58	Gemini 2.0 Flash Thinking Exp (Jan 21) · proprietary	18.2%
59	GPT 4o (Nov 20, 2024) · proprietary	18.2%
60	DeepSeek V2.5 · 235.7B	17.8%
61	Qwen2.5 Coder 32B Instruct · 32.8B	16.4%
62	Llama 4 Maverick 17B 128E Instruct · 401.6B	15.6%
63	Yi Lightning · proprietary	12.9%
64	C4ai Command A 03 2025 · 111.1B	12.0%
65	Codestral 2501 · proprietary	11.1%
66	Openhands Lm 32B v0.1 · 32.8B	10.2%
67	GPT 4.1 Nano (Apr 14, 2025) · proprietary	8.9%
68	Gemma 3 27B IT · 27.4B	4.9%
69	GPT 4o Mini (Jul 18, 2024) · proprietary	3.6%
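To reuse the rankings programmatically, split each row into its rank, model name, size-or-proprietary tag, and score. The regex and the `parse_row` helper below are a hypothetical sketch. They assume rows keep exactly the format shown in the table: whitespace-separated columns, with ` · ` before either the size or the word "proprietary".

```python
import re

# Hypothetical pattern for one leaderboard row, e.g.
# "14<TAB>DeepSeek V3.2 Exp · 685.4B<TAB>74.2%". Columns may be tabs or spaces.
ROW = re.compile(r"^(\d+)\s+(.+?) · (proprietary|[\d.]+B)\s+([\d.]+)%$")


def parse_row(line: str) -> dict:
    """Split a row into rank, model, openness, size in billions, and score."""
    m = ROW.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized row: {line!r}")
    rank, model, tag, score = m.groups()
    is_open = tag != "proprietary"
    return {
        "rank": int(rank),
        "model": model,
        "open": is_open,
        # Size is only listed for open models; strip the trailing "B".
        "params_b": float(tag[:-1]) if is_open else None,
        "score": float(score),
    }


print(parse_row("14\tDeepSeek V3.2 Exp · 685.4B\t74.2%"))
print(parse_row("1\tGPT 5 (Aug 07, 2025, high) · proprietary\t88.0%"))
```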

Score vs model size

This chart shows which models deliver the highest score for their size. Those models are the ones most worth running locally.

[Figure: Aider Polyglot score vs. model size. Each dot is a model. Higher means a better score, and further left means smaller and easier to run locally. The dashed line marks the efficiency frontier: the best score available at each size or smaller.]
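The efficiency frontier can be computed directly from the open-model rows of the leaderboard. Sort the models by size and keep each one that scores higher than every model of the same or smaller size. This sketch follows that definition. The chart's own implementation is not published here, and `efficiency_frontier` is a hypothetical name.

```python
# Open models from the leaderboard: (name, total params in billions, score %).
OPEN_MODELS = [
    ("DeepSeek V3.2 Exp", 685.4, 74.2),
    ("DeepSeek R1 0528", 684.5, 71.4),
    ("Qwen3 235B A22B", 235.1, 59.6),
    ("Qwen3 235B A22B Instruct 2507", 235.1, 59.6),
    ("Kimi K2 Instruct", 1026.5, 59.1),
    ("Kimi K2 Instruct 0905", 1026.5, 59.1),
    ("DeepSeek R1", 684.5, 56.9),
    ("DeepSeek v3 0324", 684.5, 55.1),
    ("DeepSeek v3", 684.5, 48.4),
    ("GPT OSS 120B", 120.4, 41.8),
    ("Qwen3 32B", 32.8, 40.0),
    ("QwQ 32B", 32.8, 20.9),
    ("DeepSeek V2.5", 235.7, 17.8),
    ("Qwen2.5 Coder 32B Instruct", 32.8, 16.4),
    ("Llama 4 Maverick 17B 128E Instruct", 401.6, 15.6),
    ("C4ai Command A 03 2025", 111.1, 12.0),
    ("Openhands Lm 32B v0.1", 32.8, 10.2),
    ("Gemma 3 27B IT", 27.4, 4.9),
]


def efficiency_frontier(models):
    """Models that score strictly higher than every model of equal or smaller size."""
    frontier, best = [], float("-inf")
    # Size ascending; at equal size, highest score first, so only the strongest
    # model of a given size can join the frontier (stable sort keeps row order on ties).
    for name, size, score in sorted(models, key=lambda m: (m[1], -m[2])):
        if score > best:
            frontier.append((name, size, score))
            best = score
    return frontier


for name, size, score in efficiency_frontier(OPEN_MODELS):
    print(f"{size:>7.1f}B  {score:5.1f}%  {name}")
```

With the current data, the frontier runs through Gemma 3 27B IT, Qwen3 32B, GPT OSS 120B, Qwen3 235B A22B, DeepSeek R1 0528, and DeepSeek V3.2 Exp. Qwen3 235B A22B Instruct 2507 ties its sibling on both size and score, so only the first-listed one is kept.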


Scores are aggregated from epoch. llmrun does not run this benchmark itself. See the source for methodology, or the About Benchmarks page for what it measures.