GPT OSS 120B — Benchmarks

Benchmark scores for GPT OSS 120B, aggregated from public leaderboards, along with its rank among open models on each benchmark. See hardware requirements for what you need to run it.

Coding

Benchmark       Score   Rank
Terminal-Bench  14.2%   #10 / 11

Reasoning

Benchmark    Score   Rank
SimpleBench  22.1%   #9 / 10

Scores aggregated from public benchmark sources (each linked from the benchmark pages). llmrun does not run these benchmarks.