What is the best open LLM on SWE-bench Multilingual?

GLM 5 is the top open model on SWE-bench Multilingual, scoring 69.7%. Among all models tested — including proprietary ones — it ranks #4. The top model overall is Gemini 3 Flash (Google) at 72.7%.

Can open models match proprietary models on SWE-bench Multilingual?

Not quite: the strongest proprietary model, Gemini 3 Flash, scores 72.7%, 3.0 points ahead of the best open model, GLM 5, at 69.7%. Unlike Gemini 3 Flash, though, GLM 5 can be run on your own hardware.


SWE-bench Multilingual Leaderboard

Name: SWE-bench Multilingual — open LLM scores
Creator: swebench

SWE-bench Multilingual extends SWE-bench beyond Python to real GitHub issues across many programming languages, measuring whether a model can fix bugs in codebases written in Java, Go, Rust, TypeScript and more.

Source: swebench · 4 open models ranked + 10 proprietary · Data through Feb 2026


All models ranked on SWE-bench Multilingual

Proprietary / closed models are marked "proprietary" below. You can't run them locally, but they show where the open field stands.

#	Model (parameter count for open models)	Score (% resolved)
1	Gemini 3 Flash · proprietary	72.7%
2	Claude 4.6 Opus · proprietary	72.0%
3	Claude 4.5 Opus · proprietary	70.7%
4	GLM 5 · 753.9B	69.7%
5	Gemini 3 Pro · proprietary	68.7%
6	MiniMax M2.5 · 228.7B	68.3%
7	Kimi K2.5 · 1058.6B	67.3%
8	Claude 4.5 Sonnet · proprietary	67.0%
9	GPT-5.2 (high reasoning) · proprietary	66.7%
10	GPT 5.2 Codex · proprietary	66.3%
11	GPT-5-2 Codex · proprietary	66.3%
12	Claude 4.5 Haiku · proprietary	64.7%
13	DeepSeek V3.2 · 685.4B	59.0%
14	GPT-5 mini · proprietary	39.7%
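As a rough check on the headline figures on this page (best open model, its overall rank, the gap to the leader, and the 4 open / 10 proprietary split), the Python sketch below recomputes them from the table. The scores are transcribed by hand from this page; `ROWS` and `summarize` are illustrative names, not part of any swebench or llmrun tooling. Positions come from sorted order, so tied scores (the two GPT 5.2 Codex rows) get consecutive positions rather than a shared rank.

```python
# Scores transcribed by hand from the leaderboard table:
# (model name, parameter count for open models or None if proprietary, score in %).
ROWS = [
    ("Gemini 3 Flash", None, 72.7),
    ("Claude 4.6 Opus", None, 72.0),
    ("Claude 4.5 Opus", None, 70.7),
    ("GLM 5", "753.9B", 69.7),
    ("Gemini 3 Pro", None, 68.7),
    ("MiniMax M2.5", "228.7B", 68.3),
    ("Kimi K2.5", "1058.6B", 67.3),
    ("Claude 4.5 Sonnet", None, 67.0),
    ("GPT-5.2 (high reasoning)", None, 66.7),
    ("GPT 5.2 Codex", None, 66.3),
    ("GPT-5-2 Codex", None, 66.3),
    ("Claude 4.5 Haiku", None, 64.7),
    ("DeepSeek V3.2", "685.4B", 59.0),
    ("GPT-5 mini", None, 39.7),
]


def summarize(rows):
    """Return the best open model, its overall position, and its gap to the top proprietary model."""
    # Sort by score, highest first; Python's sort is stable, so ties keep table order.
    ranked = sorted(rows, key=lambda r: r[2], reverse=True)
    open_rows = [r for r in ranked if r[1] is not None]
    closed_rows = [r for r in ranked if r[1] is None]
    best_open = open_rows[0]
    best_closed = closed_rows[0]
    return {
        "best_open": best_open[0],
        # 1-based position in the sorted list (no shared ranks for ties).
        "best_open_rank": ranked.index(best_open) + 1,
        "best_closed": best_closed[0],
        # Round to one decimal to hide floating-point noise in the subtraction.
        "gap_points": round(best_closed[2] - best_open[2], 1),
        "n_open": len(open_rows),
        "n_closed": len(closed_rows),
    }


if __name__ == "__main__":
    print(summarize(ROWS))
```

Running it reports GLM 5 as the best open model at position 4, 3.0 points behind Gemini 3 Flash, with 4 open and 10 proprietary entries.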


Scores aggregated from swebench. llmrun does not run this benchmark itself; see the source for methodology, or the About benchmarks page for what it measures.