SWE-bench Multilingual Leaderboard
SWE-bench Multilingual extends SWE-bench beyond Python to real GitHub issues across many programming languages, measuring whether a model can fix bugs in codebases written in Java, Go, Rust, TypeScript and more.
Source: swebench · 4 open models ranked + 10 proprietary · Data through Feb 2026
All models ranked on SWE-bench Multilingual
Proprietary (closed) models are labeled "proprietary" in the table. You can't run them locally, but they show where the open field stands.
| # | Model | Score |
|---|---|---|
| 1 | Gemini 3 Flash · proprietary | 72.7% |
| 2 | Claude 4.6 Opus · proprietary | 72.0% |
| 3 | Claude 4.5 Opus · proprietary | 70.7% |
| 4 | GLM 5 · 753.9B | 69.7% |
| 5 | Gemini 3 Pro · proprietary | 68.7% |
| 6 | MiniMax M2.5 · 228.7B | 68.3% |
| 7 | Kimi K2.5 · 1058.6B | 67.3% |
| 8 | Claude 4.5 Sonnet · proprietary | 67.0% |
| 9 | GPT-5.2 (high reasoning) · proprietary | 66.7% |
| 10 | GPT 5.2 Codex · proprietary | 66.3% |
| 11 | GPT-5-2 Codex · proprietary | 66.3% |
| 12 | Claude 4.5 Haiku · proprietary | 64.7% |
| 13 | DeepSeek V3.2 · 685.4B | 59.0% |
| 14 | GPT-5 mini · proprietary | 39.7% |
SWE-bench Multilingual: frequently asked questions
- What is the best open LLM on SWE-bench Multilingual?
- GLM 5 is the top open model on SWE-bench Multilingual, scoring 69.7%. Among all models tested — including proprietary ones — it ranks #4. The top model overall is Gemini 3 Flash (Google) at 72.7%.
- Can open models match proprietary models on SWE-bench Multilingual?
- Not quite: the strongest proprietary model (Gemini 3 Flash) scores 72.7%, 3.0 percentage points ahead of the best open model (GLM 5) at 69.7%. The open model, however, is one you can run yourself.
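The rank and gap quoted in the answers above can be re-derived from the table. The following is a minimal sketch, not part of the source: the names `ROWS` and `summarize` are hypothetical, the rows are copied by hand from the leaderboard, and a model is treated as open whenever its label is anything other than "proprietary".

```python
# Rows copied from the leaderboard table: (model, label, score in percent).
# The label is either "proprietary" or the size string shown in the table.
ROWS = [
    ("Gemini 3 Flash", "proprietary", 72.7),
    ("Claude 4.6 Opus", "proprietary", 72.0),
    ("Claude 4.5 Opus", "proprietary", 70.7),
    ("GLM 5", "753.9B", 69.7),
    ("Gemini 3 Pro", "proprietary", 68.7),
    ("MiniMax M2.5", "228.7B", 68.3),
    ("Kimi K2.5", "1058.6B", 67.3),
    ("Claude 4.5 Sonnet", "proprietary", 67.0),
    ("GPT-5.2 (high reasoning)", "proprietary", 66.7),
    ("GPT 5.2 Codex", "proprietary", 66.3),
    ("GPT-5-2 Codex", "proprietary", 66.3),
    ("Claude 4.5 Haiku", "proprietary", 64.7),
    ("DeepSeek V3.2", "685.4B", 59.0),
    ("GPT-5 mini", "proprietary", 39.7),
]


def summarize(rows):
    """Return the top model, the best open model, its rank, and the score gap."""
    # sorted() is stable, so tied scores keep their table order.
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)
    top = ranked[0]
    # Assumption: any label other than "proprietary" marks an open model.
    rank, best_open = next(
        (i, row) for i, row in enumerate(ranked, start=1) if row[1] != "proprietary"
    )
    return {
        "top_model": top[0],
        "best_open": best_open[0],
        "best_open_rank": rank,
        # Round to one decimal to avoid floating-point noise in the difference.
        "gap_points": round(top[2] - best_open[2], 1),
    }


if __name__ == "__main__":
    print(summarize(ROWS))
```

For the rows above, this reports GLM 5 as the best open model at rank 4, 3.0 percentage points behind Gemini 3 Flash, which agrees with the FAQ answers.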
Scores aggregated from swebench. llmrun does not run this benchmark; see the source for methodology, or the About benchmarks page for what it measures.