Nemotron Models — Hardware Requirements

34 Nemotron models from NVIDIA and the community, ranging from 1.5B to 560.5B parameters. The smallest runs in 1.0 GB of VRAM; the most demanding entries need 262.1 GB.

All Nemotron Models by Size

Model	Params	Min. VRAM	Context (tokens)	Publisher	Quantized downloads
OpenMath Nemotron 1.5B	1.5B	1.0 GB	131K	NVIDIA	—
Nemotron Labs Audex 2B	2B	4.4 GB	—	NVIDIA	—
Nemotron Flash 3B	2.7B	6.0 GB	29K	NVIDIA	—
Nemotron Labs Diffusion 3B	3.8B	8.1 GB	262K	NVIDIA	—
NVIDIA Nemotron 3 Nano 4B BF16	4.0B	2.2 GB	262K	NVIDIA	339
Nemotron Mini 4B Instruct	4B	1.8 GB	4K	NVIDIA	2.2K
Nemotron Content Safety Reasoning 4B	4.3B	2.5 GB	131K	NVIDIA	—
Nemotron Cascade 8B	8B	4.0 GB	33K	NVIDIA	—
Nemotron H 8B Reasoning 128K	8.1B	17.8 GB	—	NVIDIA	—
Nemotron Orchestrator 8B	8.2B	4.1 GB	41K	NVIDIA	—
Nemotron Terminal 8B	8.2B	4.1 GB	41K	NVIDIA	—
Nemotron Labs Diffusion 8B	8.5B	17.6 GB	262K	NVIDIA	—
NVIDIA Nemotron Nano 9B v2 Japanese	8.9B	4.4 GB	131K	NVIDIA	19.6K
NVIDIA Nemotron Nano 9B v2	8.9B	4.5 GB	131K	NVIDIA	1.3K
NVIDIA Nemotron Nano 12B v2	12B	26.4 GB	—	NVIDIA	—
Nemotron Labs Diffusion 14B	13.5B	6.5 GB	262K	NVIDIA	—
Nemotron Terminal 14B	14.8B	6.9 GB	41K	NVIDIA	—
Nemotron Labs Audex 30B A3B	30B	14.0 GB	—	NVIDIA	4.8K
Elbaz NVIDIA Nemotron 3 Nano 30B A3B PRISM	30B	14.0 GB	—	Ex0bit	—
NVIDIA Nemotron 3 Nano 30B A3B BF16	31.6B	9.1 GB	262K	NVIDIA	1.4M
Nemotron Cascade 2 30B A3B	31.6B	9.1 GB	262K	NVIDIA	8.6K
NVIDIA Nemotron 3 Nano 30B A3B Base BF16	31.6B	69.5 GB	—	NVIDIA	—
Nemotron Terminal 32B	32.8B	14.6 GB	41K	NVIDIA	—
OpenReasoning Nemotron 32B	32.8B	14.8 GB	131K	NVIDIA	—
OpenCodeReasoning Nemotron 1.1 32B	32.8B	14.8 GB	66K	NVIDIA	—
Nemotron 3 Nano Omni 30B A3B Reasoning BF16	33.0B	10.0 GB	—	NVIDIA	25.3K
Nemotron H 47B Reasoning 128K	46.8B	102.9 GB	—	NVIDIA	—
NVIDIA Nemotron Labs 3 Puzzle 75B A9B BF16	75.4B	165.8 GB	262K	NVIDIA	2.9K
NVIDIA Nemotron 3 Super 120B A12B BF16 Heretic	120.7B	51.8 GB	262K	trohrbaugh	1.2K
NVIDIA Nemotron 3 Super 120B A12B BF16	123.6B	34.5 GB	262K	NVIDIA	55.0K
NVIDIA Nemotron 3 Super 120B A12B Base BF16	123.6B	53.0 GB	1049K	NVIDIA	—
NVIDIA Nemotron 3 Ultra 550B A55B BF16	560.5B	169.6 GB	262K	NVIDIA	15.7K
NVIDIA Nemotron 3 Ultra 550B A55B Base BF16	560.5B	262.1 GB	262K	NVIDIA	20
NVIDIA Nemotron 3 Ultra 550B A55B GenRM	560.5B	262.1 GB	262K	NVIDIA	—
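To check which models fit a particular card, the table's minimum VRAM figures can be filtered against a budget. The sketch below is a minimal illustration: the dictionary is transcribed by hand from the table, and the names `MIN_VRAM_GB` and `models_that_fit` are hypothetical. It assumes the listed figure is the whole requirement, so it does not account for other processes on the GPU or for running at longer contexts.

```python
# Smallest listed VRAM footprint (GB) per model, transcribed from the table above.
MIN_VRAM_GB = {
    "OpenMath Nemotron 1.5B": 1.0,
    "Nemotron Labs Audex 2B": 4.4,
    "Nemotron Flash 3B": 6.0,
    "Nemotron Labs Diffusion 3B": 8.1,
    "NVIDIA Nemotron 3 Nano 4B BF16": 2.2,
    "Nemotron Mini 4B Instruct": 1.8,
    "Nemotron Content Safety Reasoning 4B": 2.5,
    "Nemotron Cascade 8B": 4.0,
    "Nemotron H 8B Reasoning 128K": 17.8,
    "Nemotron Orchestrator 8B": 4.1,
    "Nemotron Terminal 8B": 4.1,
    "Nemotron Labs Diffusion 8B": 17.6,
    "NVIDIA Nemotron Nano 9B v2 Japanese": 4.4,
    "NVIDIA Nemotron Nano 9B v2": 4.5,
    "NVIDIA Nemotron Nano 12B v2": 26.4,
    "Nemotron Labs Diffusion 14B": 6.5,
    "Nemotron Terminal 14B": 6.9,
    "Nemotron Labs Audex 30B A3B": 14.0,
    "Elbaz NVIDIA Nemotron 3 Nano 30B A3B PRISM": 14.0,
    "NVIDIA Nemotron 3 Nano 30B A3B BF16": 9.1,
    "Nemotron Cascade 2 30B A3B": 9.1,
    "NVIDIA Nemotron 3 Nano 30B A3B Base BF16": 69.5,
    "Nemotron Terminal 32B": 14.6,
    "OpenReasoning Nemotron 32B": 14.8,
    "OpenCodeReasoning Nemotron 1.1 32B": 14.8,
    "Nemotron 3 Nano Omni 30B A3B Reasoning BF16": 10.0,
    "Nemotron H 47B Reasoning 128K": 102.9,
    "NVIDIA Nemotron Labs 3 Puzzle 75B A9B BF16": 165.8,
    "NVIDIA Nemotron 3 Super 120B A12B BF16 Heretic": 51.8,
    "NVIDIA Nemotron 3 Super 120B A12B BF16": 34.5,
    "NVIDIA Nemotron 3 Super 120B A12B Base BF16": 53.0,
    "NVIDIA Nemotron 3 Ultra 550B A55B BF16": 169.6,
    "NVIDIA Nemotron 3 Ultra 550B A55B Base BF16": 262.1,
    "NVIDIA Nemotron 3 Ultra 550B A55B GenRM": 262.1,
}


def models_that_fit(budget_gb: float) -> list[str]:
    """Return models whose smallest listed footprint is <= budget_gb, smallest first."""
    fitting = [(vram, name) for name, vram in MIN_VRAM_GB.items() if vram <= budget_gb]
    # Sort by VRAM, breaking ties alphabetically by name.
    return [name for _, name in sorted(fitting)]


if __name__ == "__main__":
    fits = models_that_fit(16.0)
    print(f"{len(fits)} of {len(MIN_VRAM_GB)} models fit in 16 GB")  # 22 of 34 models fit in 16 GB
    for name in fits:
        print(f"  {MIN_VRAM_GB[name]:>6.1f} GB  {name}")
```

With a 16 GB budget this reproduces the 22-of-34 count. Entries just under the limit, such as the 32B models at 14.6 to 14.8 GB, leave little spare memory on a 16 GB card.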

Frequently Asked Questions

How much VRAM do I need to run a Nemotron model?

The smallest Nemotron model, OpenMath Nemotron 1.5B, runs from 1.0 GB of VRAM at an aggressive quantization. Larger models generally need more, but the minimum depends on which quantizations are available, not on parameter count alone: Nemotron Labs Diffusion 14B runs from 6.5 GB, while Nemotron H 8B Reasoning 128K needs 17.8 GB. See the table above for every model.

Which Nemotron models can I run on a 16 GB GPU?

Of the 34 models listed, 22 fit in 16 GB of VRAM at their smallest listed quantization, including NVIDIA Nemotron 3 Nano 30B A3B BF16 (9.1 GB), Nemotron 3 Nano Omni 30B A3B Reasoning BF16 (10.0 GB), and NVIDIA Nemotron Nano 9B v2 Japanese (4.4 GB).

What is the most popular Nemotron model to run locally?

NVIDIA Nemotron 3 Nano 30B A3B BF16 is the most downloaded Nemotron model in local-friendly quantized formats, with 1.4M quantized downloads. It runs from 9.1 GB of VRAM.
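Why the minimum VRAM figures do not scale neatly with parameter count can be checked with simple arithmetic: weight memory is roughly parameters times bits per weight divided by 8. The sketch below assumes decimal gigabytes (1 GB = 10^9 bytes) and counts weights only, ignoring KV cache and runtime overhead; the function names are hypothetical. By this arithmetic, 16-bit weights for the 8.1B Nemotron H 8B Reasoning 128K come to about 16.2 GB, close to its listed 17.8 GB, which suggests (an inference, not something the page states) that no smaller quantization is listed for that row. Conversely, if the 9.1 GB figure for the 31.6B Nemotron 3 Nano 30B A3B BF16 covers at least its weights, those weights average no more than about 2.3 bits each.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal GB.

    Excludes KV cache, activations and runtime overhead, so real usage is higher.
    """
    return params_billion * bits_per_weight / 8


def implied_bits_per_weight(vram_gb: float, params_billion: float) -> float:
    """Upper bound on average bits per weight if the whole VRAM figure went to weights."""
    return vram_gb * 8 / params_billion


if __name__ == "__main__":
    # 16-bit weights for the 8.1B Nemotron H 8B Reasoning 128K
    print(f"{weight_memory_gb(8.1, 16):.1f} GB")  # 16.2 GB
    # 8B parameters at 4 bits, matching Nemotron Cascade 8B's listed 4.0 GB
    print(f"{weight_memory_gb(8, 4):.1f} GB")  # 4.0 GB
    # 9.1 GB minimum for the 31.6B Nemotron 3 Nano 30B A3B BF16
    print(f"{implied_bits_per_weight(9.1, 31.6):.2f} bits")  # 2.30 bits
```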