SWE-bench Lite Leaderboard

SWE-bench Lite is a smaller, lower-cost subset of SWE-bench focused on self-contained bug fixes. It is the quickest of the SWE-bench boards to run and a common entry point for comparing coding agents.

Source: swebench · 3 open models ranked + 77 proprietary · Data through Sep 2025

All models ranked on SWE-bench Lite

Proprietary / closed models are shown dimmed — you can't run them locally, but they show where the open field stands.

# · Model · Size or license · Score
1 · ExpeRepair-v1.0 + Claude 4 Sonnet · proprietary · 60.3%
2 · Refact.ai Agent · proprietary · 60.0%
3 · KGCompass + Claude 4 Sonnet (20250514) · proprietary · 58.3%
4 · SWE-agent + Claude 4 Sonnet · proprietary · 56.7%
5 · Isoform · proprietary · 55.0%
6 · SemAgent_Multi-v1.0 · proprietary · 51.7%
7 · Isea · proprietary · 51.3%
8 · Qwen3 Coder 30B A3B Instruct · 30.5B · 49.7%
9 · Blackbox AI Agent · proprietary · 49.0%
10 · Codev · proprietary · 49.0%
11 · Gru(2024-12-08) · proprietary · 48.7%
12 · ExpeRepair-v1.0 · proprietary · 48.3%
13 · Globant Code Fixer Agent · proprietary · 48.3%
14 · SWE-agent + Claude 3.7 Sonnet · proprietary · 48.0%
15 · devlo · proprietary · 47.3%
16 · DARS Agent · proprietary · 47.0%
17 · KGCompass + Claude 3.5 Sonnet (20241022) · proprietary · 46.0%
18 · Kodu-v1 + Claude-3.5 Sonnet (20241022) · proprietary · 44.7%
19 · CodeFuse-CGM · proprietary · 44.0%
20 · CodeStory Aide + Mixed Models · proprietary · 43.0%
21 · Lingxi · proprietary · 42.7%
22 · Codart AI · proprietary · 41.7%
23 · OpenHands + CodeAct v2.1 (claude-3-5-sonnet-20241022) · proprietary · 41.7%
24 · PatchKitty-0.9 + Claude-3.5 Sonnet (20241022) · proprietary · 41.3%
25 · Composio SWE-Kit (2024-10-30) · proprietary · 41.0%
26 · OrcaLoca + Agentless-1.5 + Claude-3.5 Sonnet (20241022) · proprietary · 41.0%
27 · Agentless-1.5 + Claude-3.5 Sonnet (20241022) · proprietary · 40.7%
28 · OpenCSG Starship Agentic Coder + GPT 4 (0806) · proprietary · 39.7%
29 · Bytedance MarsCode Agent · proprietary · 39.3%
30 · Moatless Tools + Claude 3.5 Sonnet (20241022) · proprietary · 39.0%
31 · Honeycomb · proprietary · 38.3%
32 · AbanteAI MentatBot + GPT 4o (2024-05-13) · proprietary · 38.0%
33 · Patched.Codes Patchwork · proprietary · 37.0%
34 · DeepSeek v3 · 684.5B · 36.7%
35 · AppMap Navie v2 · proprietary · 36.0%
36 · CodeFuse-AAIS · proprietary · 35.7%
37 · Gru(2024-08-11) · proprietary · 35.7%
38 · Bytedance MarsCode Agent + GPT 4o (2024-05-13) · proprietary · 34.0%
39 · SuperCoder2.0 · proprietary · 34.0%
40 · Alibaba Lingma Agent · proprietary · 33.0%
41 · Agentless Lite + O3 Mini (20250214) · proprietary · 32.3%
42 · Agentless-1.5 + GPT 4o (2024-05-13) · proprietary · 32.0%
43 · CodeShellTester + GPT 4o (2024-05-13) · proprietary · 31.3%
44 · Factory Code Droid · proprietary · 31.3%
45 · AutoCodeRover (v20240620) + GPT 4o (2024-05-13) · proprietary · 30.7%
46 · DeepSeek V3.2 · 685.4B · 30.7%
47 · Aegis - o3-mini_1.0 · proprietary · 30.3%
48 · AIGCode Infant-Coder(2024-08-30) · proprietary · 30.0%
49 · Kortix AI (claude-3-5-sonnet-20241022) · proprietary · 30.0%
50 · Agentless + RepoGraph + GPT-4o · proprietary · 29.7%
51 · Amazon Q Developer Agent (v20240719-dev) · proprietary · 29.7%
52 · CodeR + GPT 4 (1106) · proprietary · 28.3%
53 · reproducedRG · proprietary · 28.0%
54 · SIMA + GPT 4o (2024-05-13) · proprietary · 27.7%
55 · Agentless + GPT 4o (2024-05-13) · proprietary · 27.3%
56 · MASAI + GPT 4o (2024-05-13) · proprietary · 27.3%
57 · IBM Research Agent-101 · proprietary · 26.7%
58 · Moatless Tools + Claude 3.5 Sonnet · proprietary · 26.7%
59 · OpenHands + CodeAct v1.8 · proprietary · 26.7%
60 · Aider + GPT 4o & Claude 3 Opus · proprietary · 26.3%
61 · HyperAgent · proprietary · 25.3%
62 · Moatless Tools + GPT 4o (2024-05-13) · proprietary · 24.7%
63 · Qwen 2.5 · proprietary · 24.7%
64 · IBM AI Agent SWE-1.0 (with open LLMs) · proprietary · 23.7%
65 · OpenCSG StarShip CodeGenAgent + GPT 4 (0613) · proprietary · 23.7%
66 · SWE-agent + Claude 3.5 Sonnet · proprietary · 23.0%
67 · AppMap Navie + GPT 4o (2024-05-13) · proprietary · 21.7%
68 · Bytedance AutoSE (based on SWE-Agent) + GPT4/GPT4o Mixed (20240828) · proprietary · 21.7%
69 · Amazon Q Developer Agent (v20240430-dev) · proprietary · 20.3%
70 · AutoCodeRover (v20240408) + GPT 4 (0125) · proprietary · 19.0%
71 · SWE-agent + GPT 4o (2024-05-13) · proprietary · 18.3%
72 · SWE-agent + GPT 4 (1106) · proprietary · 18.0%
73 · MCTS-Refine-7B · proprietary · 16.3%
74 · SWE-agent + Claude 3 Opus · proprietary · 11.7%
75 · RAG + Claude 3 Opus · proprietary · 4.3%
76 · RAG + Claude 2 · proprietary · 3.0%
77 · RAG + GPT 4 (1106) · proprietary · 2.7%
78 · RAG + SWE-Llama 7B · proprietary · 1.3%
79 · RAG + SWE-Llama 13B · proprietary · 1.0%
80 · RAG + ChatGPT 3.5 · proprietary · 0.3%

SWE-bench Lite: frequently asked questions

What is the best open LLM on SWE-bench Lite?
Qwen3 Coder 30B A3B Instruct is the top open model on SWE-bench Lite, scoring 49.7%. Among all models tested, including proprietary ones, it ranks #8. The top model overall is ExpeRepair-v1.0 + Claude 4 Sonnet at 60.3%.
What's the best SWE-bench Lite model you can run on a 24 GB GPU?
Qwen3 Coder 30B A3B Instruct is the highest-scoring open model that fits in 24 GB at 4-bit quantization (about 17 GB), scoring 49.7% on SWE-bench Lite.
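As a rough check on the "about 17 GB" figure, the weight footprint of a quantized model can be estimated from its parameter count and the bits stored per weight. The sketch below is a back-of-the-envelope estimate, not how llmrun computes its number. The 30.5B parameter count comes from the leaderboard entry; the 4.5 bits-per-weight figure is an assumption (4-bit formats usually store extra scale data alongside the weights), and the KV cache and activations are not counted.

```python
def quantized_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes.

    Billions of parameters * bits per weight / 8 bits per byte
    gives gigabytes directly (1 GB = 1e9 bytes).
    """
    return params_billions * bits_per_weight / 8


PARAMS_B = 30.5   # parameter count listed for Qwen3 Coder 30B A3B Instruct
VRAM_GB = 24.0    # GPU size named in the question

# Exactly 4 bits per weight: a lower bound on the weight footprint.
ideal_4bit = quantized_weight_gb(PARAMS_B, 4.0)

# ASSUMPTION: about 4.5 effective bits per weight once quantization
# scales and similar metadata are included. Real formats vary.
practical_4bit = quantized_weight_gb(PARAMS_B, 4.5)

# Memory left for the KV cache, activations, and runtime overhead.
headroom = VRAM_GB - practical_4bit

print(f"ideal 4-bit weights:     {ideal_4bit:.2f} GB")
print(f"practical 4-bit weights: {practical_4bit:.2f} GB")
print(f"headroom on a 24 GB GPU: {headroom:.2f} GB")
```

Under these assumptions the weights land near 17 GB, leaving several gigabytes for context. Long contexts can consume that headroom quickly, so the estimate is a starting point rather than a guarantee of fit.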
Can open models match proprietary models on SWE-bench Lite?
Not quite: the strongest proprietary entry (ExpeRepair-v1.0 + Claude 4 Sonnet) scores 60.3%, 10.6 points ahead of the best open model (Qwen3 Coder 30B A3B Instruct) at 49.7%. The open model, however, is one you can run yourself.
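The comparison above can be reproduced from the leaderboard rows. This is a minimal sketch: the `Entry` type and variable names are hypothetical, only the top entry and the three open models are included, and the conversion to task counts assumes the standard 300-instance SWE-bench Lite test split.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    rank: int
    model: str
    kind: str      # "proprietary", or a parameter count for open models
    score: float   # percent of tasks resolved


# Rows copied from the leaderboard: the top entry plus the three open models.
entries = [
    Entry(1, "ExpeRepair-v1.0 + Claude 4 Sonnet", "proprietary", 60.3),
    Entry(8, "Qwen3 Coder 30B A3B Instruct", "30.5B", 49.7),
    Entry(34, "DeepSeek v3", "684.5B", 36.7),
    Entry(46, "DeepSeek V3.2", "685.4B", 30.7),
]

# An entry counts as open when it lists a size instead of "proprietary".
open_entries = [e for e in entries if e.kind != "proprietary"]

best_overall = max(entries, key=lambda e: e.score)
best_open = max(open_entries, key=lambda e: e.score)

gap_points = round(best_overall.score - best_open.score, 1)

# ASSUMPTION: 300 task instances in the SWE-bench Lite test split.
TASKS = 300
gap_tasks = round(best_overall.score * TASKS / 100) - round(best_open.score * TASKS / 100)

print(f"best open: {best_open.model} (#{best_open.rank}, {best_open.score}%)")
print(f"gap to #1: {gap_points} points, roughly {gap_tasks} tasks")
```

Expressing the gap in tasks rather than percentage points makes it easier to judge how much separates two entries on a benchmark this small.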

Scores are aggregated from swebench. llmrun does not run this benchmark; see the source for methodology, or the About benchmarks page for what it measures.