GPT OSS 120B — Benchmarks

Benchmark scores for GPT OSS 120B, aggregated from public leaderboards, along with its rank among open models. See the hardware requirements page for what you need to run it.

Overall rank: #27 of 73 open models (composite score 52/100 across 9 benchmarks in 4 categories; see methodology)

Coding

Benchmark	Score	Rank (open models)	Rank (all models)
Terminal-Bench	18.7%	#14 / 16	#52 / 57
Aider Polyglot	41.8%	#10 / 18	#45 / 69
SWE-bench Bash Only	26.0%	#8 / 9	#42 / 48
SWE-bench Verified	26.0%	#13 / 13	#143 / 163

Knowledge

Benchmark	Score	Rank (open models)	Rank (all models)
SimpleQA	13.9%	#10 / 11	#59 / 65
MMLU-Pro	80.8%	#21 / 119	#61 / 259

Math

Benchmark	Score	Rank (open models)	Rank (all models)
AIME 2024/2025	88.9%	#6 / 34	#34 / 155

Reasoning

Benchmark	Score	Rank (open models)	Rank (all models)
GPQA Diamond	75.8%	#11 / 46	#77 / 182
SimpleBench	22.1%	#17 / 19	#83 / 90
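The open-model ranks above come from pools of very different sizes (#13 / 13 on SWE-bench Verified versus #21 / 119 on MMLU-Pro), so raw positions are hard to compare across benchmarks. As a rough illustration only, the sketch below rescales each open rank to a 0 to 100 position within its pool, where 100 means first place and 0 means last. This normalization and the helper name `rank_position` are assumptions made for this example; they are not llmrun's composite-score methodology.

```python
# Open-model ranks copied from the tables above, as (rank, pool_size).
OPEN_RANKS = {
    "Terminal-Bench": (14, 16),
    "Aider Polyglot": (10, 18),
    "SWE-bench Bash Only": (8, 9),
    "SWE-bench Verified": (13, 13),
    "SimpleQA": (10, 11),
    "MMLU-Pro": (21, 119),
    "AIME 2024/2025": (6, 34),
    "GPQA Diamond": (11, 46),
    "SimpleBench": (17, 19),
}


def rank_position(rank: int, pool: int) -> float:
    """Hypothetical helper: map a 1-based rank to 0..100 (100 = first, 0 = last)."""
    if not 1 <= rank <= pool:
        raise ValueError("rank must be between 1 and pool size")
    if pool == 1:
        return 100.0  # a sole entry is trivially first
    return (pool - rank) / (pool - 1) * 100.0


if __name__ == "__main__":
    # Print benchmarks from strongest to weakest relative standing.
    positions = {name: rank_position(r, n) for name, (r, n) in OPEN_RANKS.items()}
    for name, pos in sorted(positions.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<22} {pos:5.1f}")
```

Under this normalization, the #6 / 34 on AIME and #21 / 119 on MMLU-Pro land near the top of their pools, while #13 / 13 on SWE-bench Verified is last. Any other scheme (for example, weighting by score rather than rank) would order things differently.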

Scores are aggregated from public benchmark sources, each linked from its benchmark page. llmrun does not run these benchmarks itself.