SWE-bench Lite Leaderboard
SWE-bench Lite is a smaller, lower-cost subset of SWE-bench focused on self-contained bug fixes. It is the quickest of the SWE-bench boards to run and a common entry point for comparing coding agents.
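Each score on this board is a resolve rate: the share of Lite task instances for which the system's patch makes the target tests pass. The minimal sketch below assumes the standard 300-instance Lite test split. The helper names `resolved_rate` and `implied_resolved` are hypothetical. It converts between resolved counts and the percentages shown in the table.

```python
# Minimal sketch: converting between resolved-instance counts and the
# percentages on this board. Assumption: SWE-bench Lite has 300 test instances.
LITE_INSTANCES = 300


def resolved_rate(resolved: int, total: int = LITE_INSTANCES) -> float:
    """Percentage of instances resolved, rounded to one decimal like the table."""
    if not 0 <= resolved <= total:
        raise ValueError("resolved must be between 0 and total")
    return round(100 * resolved / total, 1)


def implied_resolved(score_pct: float, total: int = LITE_INSTANCES) -> int:
    """Resolved count implied by a one-decimal percentage (nearest integer)."""
    return round(score_pct * total / 100)


if __name__ == "__main__":
    print(resolved_rate(181))      # top entry's 60.3%
    print(implied_resolved(49.7))  # top open model, about 149 of 300
```

Under the 300-instance assumption, every score in the table is a multiple of one third of a percent. The 10.6-point gap between the top entry and the top open model therefore works out to roughly 32 task instances.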
Source: swebench · 3 open models ranked, plus 77 proprietary · Data through Sep 2025
All models ranked on SWE-bench Lite
Proprietary (closed) models are marked "proprietary"; open models show their parameter count. You can't run the proprietary ones locally, but they show where the open field stands.
| # | Model · size / access | Score (% resolved) |
|---|---|---|
| 1 | ExpeRepair-v1.0 + Claude 4 Sonnet · proprietary | 60.3% |
| 2 | Refact.ai Agent · proprietary | 60.0% |
| 3 | KGCompass + Claude 4 Sonnet (20250514) · proprietary | 58.3% |
| 4 | SWE-agent + Claude 4 Sonnet · proprietary | 56.7% |
| 5 | Isoform · proprietary | 55.0% |
| 6 | SemAgent_Multi-v1.0 · proprietary | 51.7% |
| 7 | Isea · proprietary | 51.3% |
| 8 | Qwen3 Coder 30B A3B Instruct · 30.5B | 49.7% |
| 9 | Blackbox AI Agent · proprietary | 49.0% |
| 10 | Codev · proprietary | 49.0% |
| 11 | Gru(2024-12-08) · proprietary | 48.7% |
| 12 | ExpeRepair-v1.0 · proprietary | 48.3% |
| 13 | Globant Code Fixer Agent · proprietary | 48.3% |
| 14 | SWE-agent + Claude 3.7 Sonnet · proprietary | 48.0% |
| 15 | devlo · proprietary | 47.3% |
| 16 | DARS Agent · proprietary | 47.0% |
| 17 | KGCompass + Claude 3.5 Sonnet (20241022) · proprietary | 46.0% |
| 18 | Kodu-v1 + Claude-3.5 Sonnet (20241022) · proprietary | 44.7% |
| 19 | CodeFuse-CGM · proprietary | 44.0% |
| 20 | CodeStory Aide + Mixed Models · proprietary | 43.0% |
| 21 | Lingxi · proprietary | 42.7% |
| 22 | Codart AI · proprietary | 41.7% |
| 23 | OpenHands + CodeAct v2.1 (claude-3-5-sonnet-20241022) · proprietary | 41.7% |
| 24 | PatchKitty-0.9 + Claude-3.5 Sonnet (20241022) · proprietary | 41.3% |
| 25 | Composio SWE-Kit (2024-10-30) · proprietary | 41.0% |
| 26 | OrcaLoca + Agentless-1.5 + Claude-3.5 Sonnet (20241022) · proprietary | 41.0% |
| 27 | Agentless-1.5 + Claude-3.5 Sonnet (20241022) · proprietary | 40.7% |
| 28 | OpenCSG Starship Agentic Coder + GPT 4 (0806) · proprietary | 39.7% |
| 29 | Bytedance MarsCode Agent · proprietary | 39.3% |
| 30 | Moatless Tools + Claude 3.5 Sonnet (20241022) · proprietary | 39.0% |
| 31 | Honeycomb · proprietary | 38.3% |
| 32 | AbanteAI MentatBot + GPT 4o (2024-05-13) · proprietary | 38.0% |
| 33 | Patched.Codes Patchwork · proprietary | 37.0% |
| 34 | DeepSeek v3 · 684.5B | 36.7% |
| 35 | AppMap Navie v2 · proprietary | 36.0% |
| 36 | CodeFuse-AAIS · proprietary | 35.7% |
| 37 | Gru(2024-08-11) · proprietary | 35.7% |
| 38 | Bytedance MarsCode Agent + GPT 4o (2024-05-13) · proprietary | 34.0% |
| 39 | SuperCoder2.0 · proprietary | 34.0% |
| 40 | Alibaba Lingma Agent · proprietary | 33.0% |
| 41 | Agentless Lite + O3 Mini (20250214) · proprietary | 32.3% |
| 42 | Agentless-1.5 + GPT 4o (2024-05-13) · proprietary | 32.0% |
| 43 | CodeShellTester + GPT 4o (2024-05-13) · proprietary | 31.3% |
| 44 | Factory Code Droid · proprietary | 31.3% |
| 45 | AutoCodeRover (v20240620) + GPT 4o (2024-05-13) · proprietary | 30.7% |
| 46 | DeepSeek V3.2 · 685.4B | 30.7% |
| 47 | Aegis - o3-mini_1.0 · proprietary | 30.3% |
| 48 | AIGCode Infant-Coder(2024-08-30) · proprietary | 30.0% |
| 49 | Kortix AI (claude-3-5-sonnet-20241022) · proprietary | 30.0% |
| 50 | Agentless + RepoGraph + GPT-4o · proprietary | 29.7% |
| 51 | Amazon Q Developer Agent (v20240719-dev) · proprietary | 29.7% |
| 52 | CodeR + GPT 4 (1106) · proprietary | 28.3% |
| 53 | reproducedRG · proprietary | 28.0% |
| 54 | SIMA + GPT 4o (2024-05-13) · proprietary | 27.7% |
| 55 | Agentless + GPT 4o (2024-05-13) · proprietary | 27.3% |
| 56 | MASAI + GPT 4o (2024-05-13) · proprietary | 27.3% |
| 57 | IBM Research Agent-101 · proprietary | 26.7% |
| 58 | Moatless Tools + Claude 3.5 Sonnet · proprietary | 26.7% |
| 59 | OpenHands + CodeAct v1.8 · proprietary | 26.7% |
| 60 | Aider + GPT 4o & Claude 3 Opus · proprietary | 26.3% |
| 61 | HyperAgent · proprietary | 25.3% |
| 62 | Moatless Tools + GPT 4o (2024-05-13) · proprietary | 24.7% |
| 63 | Qwen 2.5 · proprietary | 24.7% |
| 64 | IBM AI Agent SWE-1.0 (with open LLMs) · proprietary | 23.7% |
| 65 | OpenCSG StarShip CodeGenAgent + GPT 4 (0613) · proprietary | 23.7% |
| 66 | SWE-agent + Claude 3.5 Sonnet · proprietary | 23.0% |
| 67 | AppMap Navie + GPT 4o (2024-05-13) · proprietary | 21.7% |
| 68 | Bytedance AutoSE (based on SWE-Agent) + GPT4/GPT4o Mixed (20240828) · proprietary | 21.7% |
| 69 | Amazon Q Developer Agent (v20240430-dev) · proprietary | 20.3% |
| 70 | AutoCodeRover (v20240408) + GPT 4 (0125) · proprietary | 19.0% |
| 71 | SWE-agent + GPT 4o (2024-05-13) · proprietary | 18.3% |
| 72 | SWE-agent + GPT 4 (1106) · proprietary | 18.0% |
| 73 | MCTS-Refine-7B · proprietary | 16.3% |
| 74 | SWE-agent + Claude 3 Opus · proprietary | 11.7% |
| 75 | RAG + Claude 3 Opus · proprietary | 4.3% |
| 76 | RAG + Claude 2 · proprietary | 3.0% |
| 77 | RAG + GPT 4 (1106) · proprietary | 2.7% |
| 78 | RAG + SWE-Llama 7B · proprietary | 1.3% |
| 79 | RAG + SWE-Llama 13B · proprietary | 1.0% |
| 80 | RAG + ChatGPT 3.5 · proprietary | 0.3% |
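The FAQ figures below (top open model, its overall rank, and its gap to the leader) follow directly from sorting the table by score. The sketch below uses a hypothetical `Entry` record and copies only rows 1 to 8 plus the two DeepSeek rows, not the full table. It marks proprietary entries with `params_b = None`. Ranks are assigned sequentially after a stable sort, which matches how the table numbers tied scores (for example, #9 and #10 are both at 49.0%).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    name: str
    params_b: Optional[float]  # parameter count in billions; None = proprietary
    score: float               # % resolved on SWE-bench Lite


# Subset of rows copied from the table above (hypothetical record layout).
ROWS: List[Entry] = [
    Entry("ExpeRepair-v1.0 + Claude 4 Sonnet", None, 60.3),
    Entry("Refact.ai Agent", None, 60.0),
    Entry("KGCompass + Claude 4 Sonnet (20250514)", None, 58.3),
    Entry("SWE-agent + Claude 4 Sonnet", None, 56.7),
    Entry("Isoform", None, 55.0),
    Entry("SemAgent_Multi-v1.0", None, 51.7),
    Entry("Isea", None, 51.3),
    Entry("Qwen3 Coder 30B A3B Instruct", 30.5, 49.7),
    Entry("DeepSeek v3", 684.5, 36.7),
    Entry("DeepSeek V3.2", 685.4, 30.7),
]


def ranked(rows: List[Entry]) -> List[Entry]:
    # sorted() is stable even with reverse=True, so tied scores keep input order
    return sorted(rows, key=lambda e: e.score, reverse=True)


def top_open(rows: List[Entry]) -> Optional[Tuple[int, Entry]]:
    """Return (overall rank, entry) for the best open model, or None."""
    for rank, entry in enumerate(ranked(rows), start=1):
        if entry.params_b is not None:
            return rank, entry
    return None


if __name__ == "__main__":
    leader = ranked(ROWS)[0]
    rank, best_open = top_open(ROWS)
    gap = round(leader.score - best_open.score, 1)
    print(f"{best_open.name}: #{rank}, {gap} points behind {leader.name}")
```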
SWE-bench Lite: frequently asked questions
- What is the best open LLM on SWE-bench Lite?
  - Qwen3 Coder 30B A3B Instruct is the top open model on SWE-bench Lite, scoring 49.7%. Counting proprietary entries as well, it ranks #8 overall. The top entry overall is ExpeRepair-v1.0 + Claude 4 Sonnet at 60.3%.
- What's the best SWE-bench Lite model you can run on a 24 GB GPU?
- Qwen3 Coder 30B A3B Instruct is the highest-scoring open model that fits in 24 GB at 4-bit quantization (about 17 GB), scoring 49.7% on SWE-bench Lite.
- Can open models match proprietary models on SWE-bench Lite?
  - Not quite on SWE-bench Lite. The strongest proprietary entry (ExpeRepair-v1.0 + Claude 4 Sonnet) scores 60.3%, 10.6 points ahead of the best open model (Qwen3 Coder 30B A3B Instruct) at 49.7%. Only the open model, however, is one you can run yourself.
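The 24 GB answer rests on simple arithmetic: weight storage is roughly parameters × bits per weight ÷ 8 bytes. The sketch below is a rough estimate, not the method this page uses. The `overhead_gb` allowance for KV cache, activations, and quantization scales is an assumed round number. Many 4-bit formats also store slightly more than 4 bits per weight on average, which is why `bits_per_weight` is a parameter.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB: params * bits / 8 bytes."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """Rough check; overhead_gb is an assumed allowance, not a measured value."""
    return weight_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    print(weight_gb(30.5, 4))    # Qwen3 Coder 30B A3B: 15.25 GB of weights
    print(fits(30.5, 4, 24))     # fits a 24 GB card under these assumptions
    print(fits(684.5, 4, 24))    # DeepSeek v3 needs ~342 GB at 4-bit
```

With 30.5B parameters, 4-bit weights come to about 15.25 GB. Adding the assumed 2 GB brings the total to about 17 GB, in line with the figure quoted above. The two DeepSeek models are far beyond a single 24 GB card at any common quantization.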
Scores are aggregated from swebench; llmrun does not run this benchmark itself. See the source for methodology, or the About benchmarks page for what it measures.