DeepSeek R1 0528 NVFP4 v2 — Hardware Requirements & GPU Compatibility
DeepSeek R1 0528 NVFP4 v2 is NVIDIA's optimized quantization of the 393.6-billion-parameter DeepSeek R1 reasoning model, using the NVFP4 format to make this behemoth more practical for local deployment. DeepSeek R1 is renowned for its strong chain-of-thought reasoning, and this version preserves that capability at a fraction of the original memory cost. Running a 393B-parameter model locally is no small feat even with aggressive quantization, but NVIDIA's NVFP4 format is designed to squeeze maximum quality from minimal bits on NVIDIA GPUs. For users with multi-GPU setups who want top-tier reasoning without cloud API dependencies, this is one of the most compelling options available.
Specifications
- Publisher: NVIDIA
- Family: DeepSeek R1
- Parameters: 393.6B
- Architecture: DeepseekV3ForCausalLM
- Context Length: 163,840 tokens
- Vocabulary Size: 129,280
- Release Date: 2025-09-02
- License: MIT
Get Started: HuggingFace
How Much VRAM Does DeepSeek R1 0528 NVFP4 v2 Need?
| Quantization | Bits/weight | VRAM (weights) | VRAM (+ full context) | File Size | Quality |
|---|---|---|---|---|---|
| IQ2_XXS | 2.20 | 112.1 GB | 395.1 GB | 108.25 GB | Importance-weighted 2-bit, extreme compression — significant quality loss |
| IQ2_M | 2.70 | 136.7 GB | 419.7 GB | 132.85 GB | Importance-weighted 2-bit, medium |
| IQ3_XXS | 3.10 | 156.4 GB | 439.4 GB | 152.53 GB | Importance-weighted 3-bit |
| Q2_K | 3.40 | 171.2 GB | 454.1 GB | 167.29 GB | 2-bit quantization with K-quant improvements |
| Q3_K_S | 3.50 | 176.1 GB | 459.1 GB | 172.21 GB | 3-bit small quantization |
| Q3_K_M | 3.90 | 195.8 GB | 478.8 GB | 191.90 GB | 3-bit medium quantization |
| Q4_0 | 4.00 | 200.7 GB | 483.7 GB | 196.82 GB | 4-bit legacy quantization |
| IQ4_XS | 4.30 | 215.5 GB | 498.4 GB | 211.58 GB | Importance-weighted 4-bit, compact |
| Q4_1 | 4.50 | 225.3 GB | 508.3 GB | 221.42 GB | 4-bit legacy quantization with offset |
| Q4_K_S | 4.50 | 225.3 GB | 508.3 GB | 221.42 GB | 4-bit small quantization |
| IQ4_NL | 4.50 | 225.3 GB | 508.3 GB | 221.42 GB | Importance-weighted 4-bit, non-linear |
| Q4_K_M | 4.80 | 240.1 GB | 523.0 GB | 236.18 GB | 4-bit medium quantization — most popular sweet spot |
| Q5_K_S | 5.50 | 274.5 GB | 557.5 GB | 270.62 GB | 5-bit small quantization |
| Q5_K_M | 5.70 | 284.4 GB | 567.3 GB | 280.46 GB | 5-bit medium quantization — good quality/size tradeoff |
| Q6_K | 6.60 | 328.6 GB | 611.6 GB | 324.75 GB | 6-bit quantization, very good quality |
| Q8_0 | 8.00 | 397.5 GB | 680.5 GB | 393.63 GB | 8-bit quantization, near-lossless |
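The file-size column follows directly from parameter count times effective bits per weight. A minimal sketch of that arithmetic (figures are approximate, since real GGUF files mix tensor types and the effective bits-per-weight varies slightly per file):

```python
# Estimate weight storage for a quantized model from bits per weight.
PARAMS_B = 393.6  # model size in billions of parameters

def weight_gb(params_b: float, bits_per_weight: float) -> float:
    """Weight memory in GB: parameters x bits / 8 (B params -> GB)."""
    return params_b * bits_per_weight / 8

# Bit widths taken from the table above
for name, bpw in [("IQ2_XXS", 2.20), ("Q4_K_M", 4.80), ("Q8_0", 8.00)]:
    print(f"{name}: {weight_gb(PARAMS_B, bpw):.1f} GB")
# IQ2_XXS: 108.2 GB
# Q4_K_M: 236.2 GB
# Q8_0: 393.6 GB
```

These match the File Size column to within rounding; the VRAM column adds a few GB of KV cache and framework overhead on top.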
Which GPUs Can Run DeepSeek R1 0528 NVFP4 v2?
Q4_K_M · 240.1 GB

DeepSeek R1 0528 NVFP4 v2 (Q4_K_M) requires 240.1 GB of VRAM to load the model weights. For comfortable inference with headroom for the KV cache and system overhead, 313+ GB is recommended. Using the full 164K context window can add up to 283.0 GB, bringing total usage to 523.0 GB. No single GPU has enough memory — multi-GPU or cluster setups are needed.
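To gauge how many GPUs such a split needs, divide the total VRAM target by per-GPU capacity. A rough sketch (the 2 GB per-GPU overhead reserve is an assumption, not a figure from this page; real tensor-parallel splits also duplicate some buffers):

```python
import math

def gpus_needed(total_gb: float, per_gpu_gb: float, overhead_gb: float = 2.0) -> int:
    """Minimum GPU count, reserving a little capacity per GPU for overhead."""
    usable = per_gpu_gb - overhead_gb
    return math.ceil(total_gb / usable)

# Q4_K_M with recommended headroom vs. full-context worst case, on 80 GB GPUs
print(gpus_needed(313.0, 80.0))  # comfortable short-context setup -> 5
print(gpus_needed(523.0, 80.0))  # full 164K context -> 7
```

In practice this is why the compatible devices below are 8-GPU DGX-class systems rather than anything smaller.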
Which Devices Can Run DeepSeek R1 0528 NVFP4 v2?
Q4_K_M · 240.1 GB

Two devices with unified memory can run DeepSeek R1 0528 NVFP4 v2: the NVIDIA DGX H100 and the NVIDIA DGX A100 640GB.
Frequently Asked Questions
- How much VRAM does DeepSeek R1 0528 NVFP4 v2 need?
DeepSeek R1 0528 NVFP4 v2 requires 240.1 GB of VRAM at Q4_K_M, or 397.5 GB at Q8_0. Full 164K context adds up to 283.0 GB (523.0 GB total).
VRAM = Weights + KV Cache + Overhead
Weights = 393.6B × 4.8 bits ÷ 8 = 236.2 GB
KV Cache + Overhead ≈ 3.9 GB (at 2K context + ~0.3 GB framework)
KV Cache + Overhead ≈ 286.8 GB (at full 164K context)
VRAM usage by quantization

| Configuration | VRAM |
|---|---|
| Q4_K_M | 240.1 GB |
| Q4_K_M + full context | 523.0 GB |

- Can NVIDIA GeForce RTX 5090 run DeepSeek R1 0528 NVFP4 v2?
No — DeepSeek R1 0528 NVFP4 v2 requires at least 112.1 GB at IQ2_XXS, which exceeds the NVIDIA GeForce RTX 5090's 32 GB of VRAM.
- What's the best quantization for DeepSeek R1 0528 NVFP4 v2?
For DeepSeek R1 0528 NVFP4 v2, Q4_K_M (240.1 GB) offers the best balance of quality and VRAM usage. Q5_K_S (274.5 GB) provides better quality if you have the VRAM. The smallest option is IQ2_XXS at 112.1 GB.
VRAM requirement by quantization

| Quantization | VRAM |
|---|---|
| IQ2_XXS | 112.1 GB |
| Q3_K_S | 176.1 GB |
| Q4_1 | 225.3 GB |
| Q4_K_M ★ | 240.1 GB |
| Q5_K_S | 274.5 GB |
| Q8_0 | 397.5 GB |

★ Recommended — best balance of quality and VRAM usage.
- Can I run DeepSeek R1 0528 NVFP4 v2 on a Mac?
DeepSeek R1 0528 NVFP4 v2 requires at least 112.1 GB at IQ2_XXS, which exceeds the unified memory of most consumer Macs. You would need a Mac Studio or Mac Pro with a high-memory configuration.
- Can I run DeepSeek R1 0528 NVFP4 v2 locally?
Yes — DeepSeek R1 0528 NVFP4 v2 can run locally, but not on typical consumer hardware. At Q4_K_M quantization it needs 240.1 GB of VRAM, which calls for a multi-GPU server or a high-memory workstation. Popular tools include Ollama, LM Studio, and llama.cpp.
- What's the download size of DeepSeek R1 0528 NVFP4 v2?
At Q4_K_M, the download is about 236.18 GB. The highest-quality Q8_0 version is 393.63 GB. The smallest option (IQ2_XXS) is 108.25 GB.