Kimi Dev 72B — Hardware Requirements & GPU Compatibility
ChatCodeKimi Dev 72B is Moonshot AI's developer-focused model built on the Qwen2.5-72B architecture, specifically optimized for coding tasks, tool use, and agentic workflows. It combines strong general-purpose chat abilities with specialized developer capabilities, making it a compelling choice for software engineering assistance. At 72 billion parameters it requires substantial hardware, typically needing 40+ GB of VRAM at 4-bit quantization, which puts it in reach of dual consumer GPU setups or single professional cards like the A100 or RTX 6000 Ada. If you are primarily looking for a local coding assistant with strong reasoning skills, Kimi Dev is a top-tier option in the 70B class.
Specifications
- Publisher
- Moonshot AI
- Family
- Kimi
- Parameters
- 72B
- Architecture
- Qwen2ForCausalLM
- Context Length
- 131,072 tokens
- Vocabulary Size
- 152,064
- Release Date
- 2025-06-17
- License
- MIT
Get Started
HuggingFace
How Much VRAM Does Kimi Dev 72B Need?
Select a quantization to see compatible GPUs below.
| Quantization | Bits | VRAM | + Context | File Size | Quality |
|---|---|---|---|---|---|
| BF16 | 16.00 | 145.0 GB | 187.3 GB | 144.00 GB | Brain floating point 16 — preferred for training |
Which GPUs Can Run Kimi Dev 72B?
BF16 · 145.0 GBKimi Dev 72B (BF16) requires 145.0 GB of VRAM to load the model weights. For comfortable inference with headroom for KV cache and system overhead, 189+ GB is recommended. Using the full 131K context window can add up to 42.3 GB, bringing total usage to 187.3 GB. No single GPU has enough memory — multi-GPU or cluster setups are needed.
Which Devices Can Run Kimi Dev 72B?
BF16 · 145.0 GB4 devices with unified memory can run Kimi Dev 72B, including NVIDIA DGX H100, NVIDIA DGX A100 640GB, Mac Pro M2 Ultra (192 GB).
Runs great
— Plenty of headroomDecent
— Enough memory, may be tightRelated Models
Frequently Asked Questions
- How much VRAM does Kimi Dev 72B need?
Kimi Dev 72B requires 145.0 GB of VRAM at BF16. Full 131K context adds up to 42.3 GB (187.3 GB total).
VRAM = Weights + KV Cache + Overhead
Weights = 72B × 16 bits ÷ 8 = 144 GB
KV Cache + Overhead ≈ 1 GB (at 2K context + ~0.3 GB framework)
KV Cache + Overhead ≈ 43.3 GB (at full 131K context)
VRAM usage by quantization
BF16145.0 GBBF16 + full context187.3 GB- Can NVIDIA GeForce RTX 5090 run Kimi Dev 72B?
No — Kimi Dev 72B requires at least 145.0 GB at BF16, which exceeds the NVIDIA GeForce RTX 5090's 32 GB of VRAM.
- Can I run Kimi Dev 72B on a Mac?
Kimi Dev 72B requires at least 145.0 GB at BF16, which exceeds the unified memory of most consumer Macs. You would need a Mac Studio or Mac Pro with a high-memory configuration.
- Can I run Kimi Dev 72B locally?
Yes — Kimi Dev 72B can run locally on consumer hardware. At BF16 quantization it needs 145.0 GB of VRAM. Popular tools include Ollama, LM Studio, and llama.cpp.
- How fast is Kimi Dev 72B?
At BF16, Kimi Dev 72B can reach ~20 tok/s on AMD Instinct MI300X. Speed depends mainly on GPU memory bandwidth. Real-world results typically within ±20%.
tok/s = (bandwidth GB/s ÷ model GB) × efficiency
Example: AMD Instinct MI300X → 5300 ÷ 145.0 × 0.55 = ~20 tok/s
Estimated speed at BF16 (145.0 GB)
AMD Instinct MI300X~20 tok/sReal-world results typically within ±20%. Speed depends on batch size, quantization kernel, and software stack.
- What's the download size of Kimi Dev 72B?
At BF16, the download is about 144.00 GB.