SWE-bench Lite Leaderboard
SWE-bench Lite is a smaller, lower-cost subset of SWE-bench focused on self-contained bug fixes. It is the quickest of the SWE-bench boards to run and a common entry point for comparing coding agents.
Source: swebench · 3 open models ranked + 77 proprietary · Data through Sep 2025
Open models ranked on SWE-bench Lite
The # column shows rank among open models / rank overall (including proprietary).
| # | Model | Score |
|---|---|---|
| 1 / 8 | Qwen3 Coder 30B A3B Instruct · 30.5B | 49.7% |
| 2 / 34 | DeepSeek V3 · 684.5B | 36.7% |
| 3 / 46 | DeepSeek V3.2 · 685.4B | 30.7% |
SWE-bench Lite: frequently asked questions
- What is the best open LLM on SWE-bench Lite?
- Qwen3 Coder 30B A3B Instruct is the top open model on SWE-bench Lite, scoring 49.7%. Among all models tested, including proprietary ones, it ranks #8. The top model overall is ExpeRepair-v1.0 + Claude 4 Sonnet at 60.3%.
- What's the best SWE-bench Lite model you can run on a 24 GB GPU?
- Qwen3 Coder 30B A3B Instruct is the highest-scoring open model that fits in 24 GB at 4-bit quantization (about 17 GB), scoring 49.7% on SWE-bench Lite.
- Can open models match proprietary models on SWE-bench Lite?
- Not quite. The strongest proprietary entry (ExpeRepair-v1.0 + Claude 4 Sonnet) scores 60.3%, 10.6 percentage points ahead of the best open model (Qwen3 Coder 30B A3B Instruct) at 49.7%. The open model, however, is one you can run yourself.
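The "about 17 GB" figure in the 24 GB answer can be sanity-checked with simple arithmetic. The sketch below is a rough estimate only: the function name and the 10% overhead for quantization metadata and higher-precision layers are illustrative assumptions, not how the page computes its number, and it ignores KV cache and activation memory, which also need to fit on the GPU.

```python
def estimate_weight_memory_gb(
    params_billions: float,
    bits_per_weight: float,
    overhead_fraction: float = 0.10,  # assumed allowance, not a measured value
) -> float:
    """Rough size of quantized model weights in GB (1 GB = 1e9 bytes)."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead_fraction) / 1e9


# Qwen3 Coder 30B A3B Instruct: 30.5B parameters, 4-bit quantization.
raw = estimate_weight_memory_gb(30.5, 4, overhead_fraction=0.0)
with_overhead = estimate_weight_memory_gb(30.5, 4)

print(f"weights only:  {raw:.2f} GB")
print(f"with overhead: {with_overhead:.2f} GB")
print(f"headroom on a 24 GB card: {24 - with_overhead:.2f} GB")
```

Under these assumptions the weights come to roughly 15 to 17 GB, which is consistent with the figure quoted above and leaves several GB on a 24 GB card for context and runtime buffers. Actual usage depends on the quantization format and context length.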
Scores aggregated from swebench. llmrun does not run this benchmark itself; see the source for methodology, or the About benchmarks page for what it measures.