NVIDIA Nemotron 3 Super 120B A12B NVFP4 — Hardware Requirements & GPU Compatibility
NVIDIA Nemotron 3 Super 120B A12B NVFP4 is a large mixture-of-experts (MoE) model whose memory footprint is reduced to the equivalent of roughly 67.2 billion parameters through NVIDIA's NVFP4 quantization. Only 12 billion of its 120 billion parameters are active per token, so it aims for flagship-tier quality at a lower per-token compute cost. Combining the MoE architecture with aggressive quantization means a model that would normally require data center hardware becomes feasible on high-end consumer GPUs or multi-GPU setups. The NVFP4 format is purpose-built for NVIDIA silicon and is intended to keep quality close to the full-precision version.
Specifications
- Publisher: NVIDIA
- Parameters: 67.2B
- Architecture: NemotronHForCausalLM
- Context Length: 262,144 tokens
- Vocabulary Size: 131,072
- Release Date: 2026-03-14
- License: Other
How Much VRAM Does NVIDIA Nemotron 3 Super 120B A12B NVFP4 Need?
| Quantization | Bits | VRAM | VRAM + Full Context | File Size | Quality |
|---|---|---|---|---|---|
| BF16 | 16.00 | 134.9 GB | 158.4 GB | 134.46 GB | Brain floating point 16 — preferred for training |
Which GPUs Can Run NVIDIA Nemotron 3 Super 120B A12B NVFP4?
BF16 · 134.9 GB

NVIDIA Nemotron 3 Super 120B A12B NVFP4 (BF16) requires 134.9 GB of VRAM to load the model weights. For comfortable inference with headroom for KV cache and system overhead, 176 GB or more is recommended. Using the full 262K context window can add up to 23.4 GB, bringing total usage to 158.4 GB. No single GPU has enough memory, so a multi-GPU or cluster setup is needed.
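As a rough illustration of what "multi-GPU" means here, the sketch below computes a lower bound on the number of identical GPUs needed for a given memory target. It assumes the model shards perfectly evenly and ignores per-GPU framework overhead and communication buffers, so real deployments may need more. `min_gpu_count` is a hypothetical helper, the 32 GB figure is the RTX 5090 capacity quoted in the FAQ, and the 80 GB card is only an illustrative data center size.

```python
import math


def min_gpu_count(required_gb: float, per_gpu_gb: float) -> int:
    """Lower bound on identical GPUs needed, assuming perfectly even sharding.

    Ignores per-GPU overhead and parallelism buffers (an assumption), so treat
    the result as a floor rather than a sizing recommendation.
    """
    if per_gpu_gb <= 0:
        raise ValueError("per_gpu_gb must be positive")
    return math.ceil(required_gb / per_gpu_gb)


# Memory targets taken from this page (GB).
targets = {
    "weights only": 134.9,
    "full 262K context": 158.4,
    "recommended headroom": 176.0,
}

for label, required in targets.items():
    for per_gpu in (32, 80):  # 32 GB: RTX 5090; 80 GB: illustrative assumption
        count = min_gpu_count(required, per_gpu)
        print(f"{label}: at least {count} x {per_gpu} GB GPUs")
```

Under these assumptions, the weights alone already need at least five 32 GB cards, which is why a single consumer GPU is not an option at BF16.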
Which Devices Can Run NVIDIA Nemotron 3 Super 120B A12B NVFP4?
BF16 · 134.9 GB

Four devices with enough unified or aggregate memory can run NVIDIA Nemotron 3 Super 120B A12B NVFP4, including NVIDIA DGX H100, NVIDIA DGX A100 640GB, and Mac Pro M2 Ultra (192 GB).
Runs great: plenty of headroom
Decent: enough memory, may be tight
Frequently Asked Questions
- How much VRAM does NVIDIA Nemotron 3 Super 120B A12B NVFP4 need?
NVIDIA Nemotron 3 Super 120B A12B NVFP4 requires 134.9 GB of VRAM at BF16. Full 262K context adds up to 23.4 GB (158.4 GB total).
VRAM = Weights + KV Cache + Overhead
Weights = 67.2B × 16 bits ÷ 8 = 134.5 GB
KV Cache + Overhead ≈ 0.4 GB (at 2K context + ~0.3 GB framework)
KV Cache + Overhead ≈ 23.9 GB (at full 262K context)
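The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the page's actual calculator: the function names are hypothetical, GB is taken as 10^9 bytes, and the KV cache plus overhead figures are simply the two values quoted above rather than something derived from the model architecture. Because it starts from the rounded 67.2B parameter count, it yields 134.4 GB for the weights, slightly below the 134.5 GB shown above, which presumably comes from the unrounded count.

```python
def weights_gb(params: float, bits_per_weight: float) -> float:
    """Weight memory in decimal GB: parameters x bits per weight / 8 bits per byte."""
    return params * bits_per_weight / 8 / 1e9


def total_vram_gb(params: float, bits_per_weight: float, kv_and_overhead_gb: float) -> float:
    """VRAM = Weights + (KV Cache + Overhead), following the formula in the text."""
    return weights_gb(params, bits_per_weight) + kv_and_overhead_gb


PARAMS = 67.2e9  # rounded parameter count from the specifications
BITS = 16        # BF16

print(f"weights:      {weights_gb(PARAMS, BITS):.1f} GB")
print(f"2K context:   {total_vram_gb(PARAMS, BITS, 0.4):.1f} GB")   # 0.4 GB quoted above
print(f"262K context: {total_vram_gb(PARAMS, BITS, 23.9):.1f} GB")  # 23.9 GB quoted above
```

The totals come out at 134.8 GB and 158.3 GB, each 0.1 GB under the page's 134.9 GB and 158.4 GB for the same rounding reason.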
VRAM usage by quantization
BF16: 134.9 GB
BF16 + full context: 158.4 GB
- Can NVIDIA GeForce RTX 5090 run NVIDIA Nemotron 3 Super 120B A12B NVFP4?
No — NVIDIA Nemotron 3 Super 120B A12B NVFP4 requires at least 134.9 GB at BF16, which exceeds the NVIDIA GeForce RTX 5090's 32 GB of VRAM.
- Can I run NVIDIA Nemotron 3 Super 120B A12B NVFP4 on a Mac?
NVIDIA Nemotron 3 Super 120B A12B NVFP4 requires at least 134.9 GB at BF16, which exceeds the unified memory of most consumer Macs. You would need a Mac Studio or Mac Pro with a high-memory configuration.
- Can I run NVIDIA Nemotron 3 Super 120B A12B NVFP4 locally?
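Yes, but only on hardware with enough memory. At BF16 it needs 134.9 GB of VRAM, which is more than any single consumer GPU offers, so running it locally means a multi-GPU setup or a high-memory unified-memory machine. Popular tools for local inference include Ollama, LM Studio, and llama.cpp.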
- How fast is NVIDIA Nemotron 3 Super 120B A12B NVFP4?
At BF16, NVIDIA Nemotron 3 Super 120B A12B NVFP4 can reach an estimated ~22 tok/s on AMD Instinct MI300X. Speed depends mainly on GPU memory bandwidth. Real-world results are typically within ±20% of this estimate.
tok/s = (bandwidth GB/s ÷ model GB) × efficiency
Example: AMD Instinct MI300X → 5300 ÷ 134.9 × 0.55 = ~22 tok/s
Estimated speed at BF16 (134.9 GB)
AMD Instinct MI300X: ~22 tok/s
Speed also depends on batch size, quantization kernel, and software stack.
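The bandwidth-based estimate above is easy to reproduce. The sketch below is a hedged illustration of that formula only: `estimate_tok_per_s` and `tolerance_band` are hypothetical helper names, the 0.55 efficiency factor and the 5300 GB/s bandwidth are the values used in the example above, and the ±20% band simply applies the tolerance stated on this page.

```python
def estimate_tok_per_s(bandwidth_gb_s: float, model_gb: float, efficiency: float = 0.55) -> float:
    """tok/s = (bandwidth GB/s / model GB) x efficiency, as given in the text."""
    if model_gb <= 0:
        raise ValueError("model_gb must be positive")
    return bandwidth_gb_s / model_gb * efficiency


def tolerance_band(tok_per_s: float, tolerance: float = 0.20) -> tuple[float, float]:
    """Lower and upper bounds for the stated +/-20% real-world spread."""
    return tok_per_s * (1 - tolerance), tok_per_s * (1 + tolerance)


# AMD Instinct MI300X example from the text: 5300 GB/s, 134.9 GB model at BF16.
estimate = estimate_tok_per_s(5300, 134.9)
low, high = tolerance_band(estimate)
print(f"estimate: ~{estimate:.0f} tok/s (range {low:.1f} to {high:.1f})")
```

The estimate treats decoding as purely memory-bandwidth bound and reads the full model size per token, which is the simplification this page uses.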
- What's the download size of NVIDIA Nemotron 3 Super 120B A12B NVFP4?
At BF16, the download is about 134.46 GB.