Qwen3.5 397B A17B NVFP4 — Hardware Requirements & GPU Compatibility
Qwen3.5 397B A17B NVFP4 is NVIDIA's NVFP4-quantized build of Alibaba's Qwen3.5, a 397-billion-parameter mixture-of-experts model that activates 17 billion parameters per token. Even with aggressive quantization, it is one of the largest models you can attempt to run locally, and it represents the cutting edge of what's possible for local inference. The MoE architecture keeps per-token compute manageable despite the massive parameter count, and NVIDIA's NVFP4 quantization brings memory requirements down from utterly impossible to merely ambitious. Multi-GPU setups with substantial VRAM are essential, but the reward is near-frontier intelligence running entirely on your own machines.
Specifications
- Publisher
- NVIDIA
- Family
- Qwen
- Parameters
- 397B
- Architecture
- Qwen3_5MoeForConditionalGeneration
- Context Length
- 262,144 tokens
- Vocabulary Size
- 248,320
- Release Date
- 2026-02-18
- License
- Apache 2.0
Get Started
HuggingFace
How Much VRAM Does Qwen3.5 397B A17B NVFP4 Need?
Select a quantization to see compatible GPUs below.
| Quantization | Bits | VRAM | + Context | File Size | Quality |
|---|---|---|---|---|---|
| BF16 | 16.00 | 794.4 GB | 810.4 GB | 794.00 GB | Brain floating point 16 — preferred for training |
Which GPUs Can Run Qwen3.5 397B A17B NVFP4?
BF16 · 794.4 GB

Qwen3.5 397B A17B NVFP4 (BF16) requires 794.4 GB of VRAM to load the model weights. For comfortable inference with headroom for KV cache and system overhead, 1033+ GB is recommended. Using the full 262K context window can add up to 16.0 GB, bringing total usage to 810.4 GB. No single GPU has enough memory — multi-GPU or cluster setups are needed.
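A quick sketch of what "multi-GPU or cluster setups" means in practice, using the VRAM figures from this page. The 80 GB per-accelerator figure (H100-class) is an illustrative assumption; the page itself does not name a specific data-center GPU:

```python
import math

TOTAL_VRAM_GB = 810.4    # BF16 weights + full 262K context, from this page
RECOMMENDED_GB = 1033.0  # "comfortable headroom" figure, from this page

def gpus_needed(per_gpu_gb: float, target_gb: float) -> int:
    """Minimum GPU count to cover a VRAM target, ignoring sharding overhead."""
    return math.ceil(target_gb / per_gpu_gb)

# Assuming 80 GB accelerators (e.g. H100-class) — an illustrative assumption.
print(gpus_needed(80, TOTAL_VRAM_GB))   # → 11
print(gpus_needed(80, RECOMMENDED_GB))  # → 13
```

Real deployments need slightly more than this lower bound, since tensor-parallel sharding duplicates some buffers on every GPU.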
Related Models
Frequently Asked Questions
- How much VRAM does Qwen3.5 397B A17B NVFP4 need?
Qwen3.5 397B A17B NVFP4 requires 794.4 GB of VRAM at BF16. Full 262K context adds up to 16.0 GB (810.4 GB total).
VRAM = Weights + KV Cache + Overhead
Weights = 397B × 16 bits ÷ 8 = 794 GB
KV Cache + Overhead ≈ 0.4 GB (at 2K context, including ~0.3 GB framework overhead)
KV Cache + Overhead ≈ 16.4 GB (at full 262K context)
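The formula above can be sketched as a small estimator. The KV-cache figure of 16.1 GB at full context is implied by the page's numbers (16.4 GB total minus the ~0.3 GB framework overhead); treat both constants as page-specific assumptions, not general rules:

```python
# Constants taken from this page: 397B parameters, and a KV cache that
# reaches ~16.1 GB at the full 262,144-token context window.
PARAMS = 397e9
KV_GB_AT_FULL_CONTEXT = 16.1
FULL_CONTEXT = 262_144
FRAMEWORK_OVERHEAD_GB = 0.3  # rough runtime overhead, an assumption

def estimate_vram_gb(bits: float, context_tokens: int) -> float:
    """VRAM = Weights + KV Cache + Overhead, in decimal GB."""
    weights = PARAMS * bits / 8 / 1e9                             # 794.0 at BF16
    kv_cache = KV_GB_AT_FULL_CONTEXT * context_tokens / FULL_CONTEXT
    return weights + kv_cache + FRAMEWORK_OVERHEAD_GB

print(round(estimate_vram_gb(16, 2_048), 1))    # → 794.4
print(round(estimate_vram_gb(16, 262_144), 1))  # → 810.4
```

The KV cache is modeled as linear in context length, which matches how the page scales from 2K to 262K tokens.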
VRAM usage by quantization
| Quantization | VRAM |
|---|---|
| BF16 | 794.4 GB |
| BF16 + full context | 810.4 GB |
- Can NVIDIA GeForce RTX 5090 run Qwen3.5 397B A17B NVFP4?
No — Qwen3.5 397B A17B NVFP4 requires at least 794.4 GB at BF16, which exceeds the NVIDIA GeForce RTX 5090's 32 GB of VRAM.
- Can I run Qwen3.5 397B A17B NVFP4 on a Mac?
Qwen3.5 397B A17B NVFP4 requires at least 794.4 GB at BF16, which exceeds the unified memory of most consumer Macs. You would need a Mac Studio or Mac Pro with a high-memory configuration.
- Can I run Qwen3.5 397B A17B NVFP4 locally?
Technically yes, but not on consumer hardware: at BF16 it needs 794.4 GB of VRAM, which requires a multi-GPU server or cluster. Popular tools include Ollama, LM Studio, and llama.cpp.
- What's the download size of Qwen3.5 397B A17B NVFP4?
At BF16, the download is about 794.00 GB.