Question 1

How much VRAM do I need to run a GPT-OSS model?

Accepted Answer

The smallest GPT-OSS model, GPT OSS 20B, runs from 6.3 GB of VRAM at an aggressive quantization. Larger family members need more: GPT OSS 120B, the largest, starts at 51.6 GB. The model table lists the minimum for every model.
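As a rough sanity check on these figures, the weights of a quantized model take about (parameters × bits per weight) / 8 bytes. The sketch below applies that rule of thumb to numbers from the model table. It assumes 1 GB = 10^9 bytes and that the "Runs from" figure is dominated by weights; KV cache, activations, and runtime overhead are not modeled, so treat the results as estimates only. The function names are illustrative and do not come from any library.

```python
def weight_footprint_gb(params_billion, bits_per_weight):
    """Approximate weight size in GB: billions of params * bits / 8."""
    return params_billion * bits_per_weight / 8


def implied_bits_per_weight(params_billion, footprint_gb):
    """Invert the rule of thumb: average bits per weight for a given size."""
    return footprint_gb * 8 / params_billion


# Parameter counts and minimum VRAM taken from the model table.
print(round(implied_bits_per_weight(21.5, 6.3), 2))    # GPT OSS 20B
print(round(implied_bits_per_weight(120.4, 51.6), 2))  # GPT OSS 120B

# Hypothetical what-if: a 21.5B-parameter model at a flat 4 bits per weight.
print(weight_footprint_gb(21.5, 4))
```

Under these assumptions, the listed minimums correspond to an average of only a few bits per weight, which is consistent with the answer's description of the quantization as aggressive.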

Question 2

Which GPT-OSS models can I run on a 16 GB GPU?

Accepted Answer

Six of the seven GPT-OSS models fit in 16 GB of VRAM at some quantization, including GPT OSS 20B, Huihui GPT OSS 20B BF16 Abliterated, and GPT OSS Safeguard 20B. Only GPT OSS 120B, which starts at 51.6 GB, does not.
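The same check can be repeated for any GPU size by filtering the model table on its minimum-VRAM column. The following is a minimal sketch: the rows are copied from the table, while the function name and the `headroom_gb` parameter (space reserved for context and runtime overhead) are illustrative assumptions, not part of any tool.

```python
# Rows copied from the model table: (model, minimum VRAM in GB).
MODELS = [
    ("Huihui GPT OSS 20B BF16 Abliterated", 9.3),
    ("GPT OSS 20B Heretic", 9.3),
    ("GPT OSS 20B", 6.3),
    ("GPT OSS Safeguard 20B", 6.3),
    ("GPT OSS 20B Heretic Ara v3", 9.5),
    ("GPT OSS 20B RichardErkhov Heresy", 9.5),
    ("GPT OSS 120B", 51.6),
]


def models_that_fit(vram_gb, headroom_gb=0.0):
    """Return the models whose minimum footprint fits in the VRAM budget."""
    budget = vram_gb - headroom_gb
    return [name for name, min_gb in MODELS if min_gb <= budget]


fits = models_that_fit(16)
print(f"{len(fits)} of {len(MODELS)} models fit in 16 GB")
# prints "6 of 7 models fit in 16 GB"
```

Passing a nonzero `headroom_gb` gives a more conservative answer, since the table lists the smallest quantization of each model and says nothing about memory needed for long contexts.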

Question 3

What is the most popular GPT-OSS model to run locally?

Accepted Answer

GPT OSS 20B is the most downloaded GPT-OSS model in local-friendly quantized formats. It runs from 6.3 GB of VRAM.

Question 4

How do GPT-OSS models score on benchmarks?

Accepted Answer

GPT OSS 120B leads the family with an overall benchmark rating of 52.0/100, which ranks it #27 among 73 open models. For comparison, the top proprietary model, Claude Fable 5 Max, scores 89.9. See the comparison chart above for the full standings.

Model	Params	Runs from (min. VRAM)	Context	Publisher	Quant downloads
Huihui GPT OSS 20B BF16 Abliterated	20.9B	9.3 GB	131K	huihui-ai	10.5K
GPT OSS 20B Heretic	20.9B	9.3 GB	131K	p-e-w	—
GPT OSS 20B	21.5B	6.3 GB	131K	OpenAI	759.6K
GPT OSS Safeguard 20B	21.5B	6.3 GB	131K	OpenAI	2.4K
GPT OSS 20B Heretic Ara v3	21.5B	9.5 GB	131K	p-e-w	—
GPT OSS 20B RichardErkhov Heresy	21.5B	9.5 GB	131K	MuXodious	—
GPT OSS 120B	120.4B	51.6 GB	131K	OpenAI	152.2K
