SWE-bench Lite Leaderboard

SWE-bench Lite is a smaller, lower-cost subset of SWE-bench focused on self-contained bug fixes. It is the quickest of the SWE-bench boards to run and a common entry point for comparing coding agents.

Source: swebench · 3 open models ranked + 77 proprietary · Data through Sep 2025

Open models ranked on SWE-bench Lite

The # column shows each model's rank among open models, then its rank overall (including proprietary models).

#        Model                                    Score
1 / 8    Qwen3 Coder 30B A3B Instruct · 30.5B     49.7%
2 / 34   DeepSeek v3 · 684.5B                     36.7%
3 / 46   DeepSeek V3.2 · 685.4B                   30.7%

SWE-bench Lite: frequently asked questions

What is the best open LLM on SWE-bench Lite?
Qwen3 Coder 30B A3B Instruct is the top open model on SWE-bench Lite, scoring 49.7%. Among all models tested — including proprietary ones — it ranks #8. The top model overall is ExpeRepair-v1.0 + Claude 4 Sonnet at 60.3%.
What's the best SWE-bench Lite model you can run on a 24 GB GPU?
Qwen3 Coder 30B A3B Instruct is the highest-scoring open model that fits in 24 GB at 4-bit quantization (about 17 GB), scoring 49.7% on SWE-bench Lite.
Can open models match proprietary models on SWE-bench Lite?
Not yet. The strongest proprietary entry (ExpeRepair-v1.0 + Claude 4 Sonnet) scores 60.3%, which is 10.6 percentage points ahead of the best open model (Qwen3 Coder 30B A3B Instruct) at 49.7%. The open model, however, is one you can run yourself.
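
The 24 GB answer above rests on simple arithmetic: parameter count times bits per weight. The sketch below is a rough back-of-the-envelope estimate, not the method this page uses. The function names and the `overhead_gb` allowance are hypothetical. The raw 4-bit weight size it reports (about 15 GB for 30.5B parameters) comes in a little under the page's "about 17 GB" figure, which presumably also counts quantization metadata or runtime buffers; the page does not say how that figure was derived.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    # billions of parameters * bits per parameter / 8 bits per byte = GB
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """Rough check of whether a model fits on a GPU.

    overhead_gb is an assumed allowance for quantization scales, the KV
    cache, and runtime buffers; real usage depends on the runtime and
    on context length.
    """
    return weight_memory_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


# Parameter counts are taken from the table on this page.
models = [
    ("Qwen3 Coder 30B A3B Instruct", 30.5),
    ("DeepSeek V3.2", 685.4),
]

for name, params in models:
    size = weight_memory_gb(params, bits_per_weight=4)
    ok = fits_in_vram(params, bits_per_weight=4, vram_gb=24)
    print(f"{name}: ~{size:.1f} GB of 4-bit weights, fits in 24 GB: {ok}")
```

Under these assumptions only the 30.5B model clears a 24 GB card. The DeepSeek entries need several hundred GB even at 4 bits per weight.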

Scores are aggregated from swebench. llmrun does not run this benchmark itself; see the source for methodology, or the about-benchmarks page for what it measures.