Step 3.5 Flash — Hardware Requirements & GPU Compatibility
Step 3.5 Flash is an efficient mixture-of-experts model from StepFun AI, a Chinese AI startup, featuring roughly 199 billion total parameters. The Flash designation signals its focus on speed and low-latency inference, making it well-suited for interactive applications despite its large total parameter count. Running it locally requires a multi-GPU setup, but its MoE architecture means only a portion of the model activates per token, delivering strong multilingual performance with better throughput than a comparably sized dense model.
Specifications
- Publisher: stepfun-ai
- Parameters: 199.4B
- Architecture: Step3p5ForCausalLM
- Context Length: 262,144 tokens
- Vocabulary Size: 128,896
- Release Date: 2026-03-09
- License: Apache 2.0
How Much VRAM Does Step 3.5 Flash Need?
| Quantization | Bits | VRAM | + Context | File Size | Quality |
|---|---|---|---|---|---|
| IQ4_XS | 4.30 | 117.9 GB | — | 107.17 GB | Importance-weighted 4-bit, compact |
Which GPUs Can Run Step 3.5 Flash?
IQ4_XS · 117.9 GB
Step 3.5 Flash (IQ4_XS) requires 117.9 GB of VRAM to load the model weights. For comfortable inference with headroom for KV cache and system overhead, 154+ GB is recommended. No single consumer GPU has enough memory, so a multi-GPU or cluster setup is needed.
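As a rough illustration of the multi-GPU sizing above, the sketch below computes the smallest number of identical GPUs whose pooled VRAM covers a given requirement. The helper name `min_gpus` is hypothetical, and the calculation assumes an ideal even split with no per-GPU overhead, so real setups may need more cards. The 32 GB figure is the RTX 5090 capacity quoted in the FAQ; the 80 GB card is only an illustrative assumption.

```python
import math


def min_gpus(required_gb: float, per_gpu_gb: float) -> int:
    """Smallest count of identical GPUs whose combined VRAM meets the requirement.

    Idealized: assumes the model splits evenly and ignores per-GPU overhead.
    """
    return math.ceil(required_gb / per_gpu_gb)


# Minimum load requirement (117.9 GB) on 32 GB cards.
gpus_min_32 = min_gpus(117.9, 32)
# Recommended headroom (154 GB) on 32 GB cards.
gpus_rec_32 = min_gpus(154, 32)
# Recommended headroom on a hypothetical 80 GB card.
gpus_rec_80 = min_gpus(154, 80)

print(gpus_min_32, gpus_rec_32, gpus_rec_80)
```

Treat the result as a lower bound: tensor or pipeline parallelism usually adds some memory cost on each device.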
Which Devices Can Run Step 3.5 Flash?
IQ4_XS · 117.9 GB
Five high-memory systems can run Step 3.5 Flash, including the NVIDIA DGX H100, the NVIDIA DGX A100 640GB, and the Mac Studio M4 Max (128 GB).
Runs great: plenty of headroom
Decent: enough memory, may be tight
Frequently Asked Questions
- How much VRAM does Step 3.5 Flash need?
Step 3.5 Flash requires 117.9 GB of VRAM at IQ4_XS.
VRAM = Weights + KV Cache + Overhead
Weights = 199.4B × 4.3 bits ÷ 8 = 107.2 GB
KV Cache + Overhead ≈ 10.7 GB (at 2K context + ~0.3 GB framework)
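The arithmetic above can be reproduced with a short sketch. The function names are hypothetical, and the 10.7 GB figure for KV cache plus overhead is taken directly from the estimate above rather than derived from the model's architecture.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight size in GB: billions of parameters x bits per weight / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


def total_vram_gb(params_billion: float, bits_per_weight: float,
                  kv_and_overhead_gb: float) -> float:
    """Total VRAM = weights + (KV cache + framework overhead)."""
    return weights_gb(params_billion, bits_per_weight) + kv_and_overhead_gb


w = weights_gb(199.4, 4.3)
total = total_vram_gb(199.4, 4.3, 10.7)
print(f"weights: {w:.1f} GB, total: {total:.1f} GB")
```

Because the KV cache term here is quoted for a 2K context, longer contexts would raise the total.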
VRAM usage by quantization
IQ4_XS: 117.9 GB
- Can NVIDIA GeForce RTX 5090 run Step 3.5 Flash?
No — Step 3.5 Flash requires at least 117.9 GB at IQ4_XS, which exceeds the NVIDIA GeForce RTX 5090's 32 GB of VRAM.
- Can I run Step 3.5 Flash on a Mac?
Step 3.5 Flash requires at least 117.9 GB at IQ4_XS, which exceeds the unified memory of most consumer Macs. You would need a Mac Studio or Mac Pro with a high-memory configuration.
- Can I run Step 3.5 Flash locally?
Yes, but not on a typical consumer PC. At IQ4_XS quantization Step 3.5 Flash needs 117.9 GB of VRAM, which calls for a multi-GPU setup or a high-memory unified-memory machine. Popular tools include Ollama, LM Studio, and llama.cpp.
- How fast is Step 3.5 Flash?
At IQ4_XS, Step 3.5 Flash can reach ~25 tok/s on an AMD Instinct MI300X. Speed depends mainly on GPU memory bandwidth, and real-world results are typically within ±20% of the estimate.
tok/s = (bandwidth GB/s ÷ model GB) × efficiency
Example: AMD Instinct MI300X → 5300 ÷ 117.9 × 0.55 = ~25 tok/s
Estimated speed at IQ4_XS (117.9 GB)
AMD Instinct MI300X: ~25 tok/s
AMD Instinct MI250X: ~15 tok/s
Speed also depends on batch size, quantization kernel, and software stack.
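The bandwidth-based estimate above can be sketched as follows. The function name `est_tok_per_s` is hypothetical. The 0.55 efficiency factor and the 5300 GB/s bandwidth are the values used in the MI300X example, and the ±20% band is the stated real-world spread, not a measured result.

```python
def est_tok_per_s(bandwidth_gb_s: float, model_gb: float,
                  efficiency: float = 0.55) -> float:
    """Rough decode speed: (memory bandwidth / model size) x efficiency factor."""
    return bandwidth_gb_s / model_gb * efficiency


# AMD Instinct MI300X example from the text: 5300 GB/s, 117.9 GB model.
est = est_tok_per_s(5300, 117.9)
# Apply the stated +/-20% real-world spread.
low, high = est * 0.8, est * 1.2

print(f"~{est:.0f} tok/s (range {low:.0f} to {high:.0f})")
```

The formula treats token generation as purely memory-bound, which is why the estimate scales linearly with bandwidth and inversely with model size.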
- What's the download size of Step 3.5 Flash?
At IQ4_XS, the download is about 107.17 GB.