
SWE-bench Verified Leaderboard

SWE-bench Verified tests whether a model can resolve real GitHub issues from popular open-source Python projects. It is a human-validated subset focused on realistic software-engineering tasks.

Source: epoch · 2 open models ranked · +28 proprietary · Data through May 2026

Open models ranked on SWE-bench Verified

The # column shows rank among open models / rank overall (including proprietary).

#        Model              Score
1 / 13   GLM 5.1 · 753.9B   74.2%
2 / 19   GLM 5 · 753.9B     72.1%

SWE-bench Verified: frequently asked questions

What is the best open LLM on SWE-bench Verified?
GLM 5.1 is the top open model on SWE-bench Verified, scoring 74.2%. Among all models tested, including proprietary ones, it ranks #13.
Can open models match proprietary models on SWE-bench Verified?
Not quite. The strongest proprietary model (claude-opus-4-7_max) scores 83.5%, ahead of the best open model (GLM 5.1) at 74.2%. The open model, however, is one you can run yourself.
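
To make the size of that gap concrete, here is a minimal Python sketch that computes it from the three scores quoted on this page. The dictionary layout and the helper name `best` are illustrative assumptions, not part of the source data or of any leaderboard API.

```python
# Scores copied from this page (percent of issues resolved).
# The "open" / "proprietary" labels follow the page's own grouping.
scores = {
    "claude-opus-4-7_max": (83.5, "proprietary"),
    "GLM 5.1": (74.2, "open"),
    "GLM 5": (72.1, "open"),
}


def best(kind):
    """Return (name, score) for the top-scoring model of the given kind."""
    candidates = [(name, s) for name, (s, k) in scores.items() if k == kind]
    return max(candidates, key=lambda pair: pair[1])


best_open = best("open")
best_prop = best("proprietary")

# Round to one decimal place to avoid floating-point noise in the difference.
gap = round(best_prop[1] - best_open[1], 1)

print(f"{best_prop[0]} leads {best_open[0]} by {gap} percentage points")
```

With the figures above, the best proprietary model leads the best open model by 9.3 percentage points.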

Scores are aggregated from epoch. llmrun does not run this benchmark; see the source for methodology, or the About benchmarks page for what it measures.