Coding
SWE-bench Verified Leaderboard
SWE-bench Verified tests whether a model can resolve real GitHub issues from popular open-source Python projects. It is a human-validated subset of the full SWE-bench dataset, focused on realistic software-engineering tasks.
Source: epoch · 2 open models ranked, plus 28 proprietary · Data through May 2026
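What "resolve an issue" means can be made concrete with a simplified sketch of the SWE-bench resolution rule. After the model's patch is applied, the issue's target tests (FAIL_TO_PASS) must now pass and the previously passing tests (PASS_TO_PASS) must still pass. The score is the share of issues resolved. This is not the official evaluation harness, and it is not necessarily how epoch computes the figures on this page; `TaskResult`, `is_resolved`, `resolve_rate`, and the test names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical per-issue record: test name -> passed after applying the model's patch?
    fail_to_pass: dict[str, bool]  # tests the reference fix is expected to make pass
    pass_to_pass: dict[str, bool]  # tests that passed before and must keep passing


def is_resolved(r: TaskResult) -> bool:
    # Resolved only if every target test now passes AND nothing that used to pass regressed.
    return all(r.fail_to_pass.values()) and all(r.pass_to_pass.values())


def resolve_rate(results: list[TaskResult]) -> float:
    # Score as a percentage of issues resolved.
    if not results:
        return 0.0
    return 100.0 * sum(is_resolved(r) for r in results) / len(results)


results = [
    TaskResult({"test_fix": True}, {"test_old": True}),   # resolved
    TaskResult({"test_fix": True}, {"test_old": False}),  # regression, not resolved
    TaskResult({"test_fix": False}, {"test_old": True}),  # incomplete fix, not resolved
    TaskResult({"test_fix": True}, {}),                   # resolved
]
print(resolve_rate(results))  # 50.0
```

The rule is all-or-nothing per issue. A patch that fixes the bug but breaks an unrelated test earns no credit.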
Open models ranked on SWE-bench Verified
The # column shows each model's rank among open models / its rank overall (including proprietary models).
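The "open rank / overall rank" label can be illustrated with a minimal sketch. It assumes that all models are sorted by score in a single list and that ties keep their input order. The model names and scores below are hypothetical placeholders, not leaderboard data, and `Entry` and `rank_labels` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    score: float   # SWE-bench Verified score, in percent
    is_open: bool  # True for open models


def rank_labels(entries: list[Entry]) -> dict[str, str]:
    # Sort once by score, highest first; Python's sort is stable, so ties keep input order.
    ordered = sorted(entries, key=lambda e: e.score, reverse=True)
    labels: dict[str, str] = {}
    open_rank = 0
    for overall_rank, e in enumerate(ordered, start=1):
        if e.is_open:
            open_rank += 1
            # "rank among open models / rank overall"
            labels[e.name] = f"{open_rank} / {overall_rank}"
    return labels


entries = [
    Entry("proprietary-a", 80.0, False),
    Entry("open-a", 75.0, True),
    Entry("proprietary-b", 78.0, False),
    Entry("open-b", 65.0, True),
]
print(rank_labels(entries))  # {'open-a': '1 / 3', 'open-b': '2 / 4'}
```

Under this scheme, the top open model on this page (GLM 5.1, #13 overall) would be labeled 1 / 13.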
SWE-bench Verified: frequently asked questions
- What is the best open LLM on SWE-bench Verified?
- GLM 5.1 is the top open model on SWE-bench Verified, scoring 74.2%. Among all models tested, including proprietary ones, it ranks #13.
- Can open models match proprietary models on SWE-bench Verified?
- Not quite on SWE-bench Verified. The strongest proprietary model (claude-opus-4-7_max) scores 83.5%, 9.3 percentage points ahead of the best open model (GLM 5.1) at 74.2%. The open model, however, is one you can run yourself.
Scores are aggregated from epoch; llmrun does not run this benchmark itself. See the source for methodology, or the About Benchmarks page for what the benchmark measures.