How much VRAM do I need to run a Qwen 3.5 model?

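The smallest Qwen 3.5 model, Qwen3.5 0.8B, runs from 0.7 GB of VRAM at an aggressive quantization. Larger models generally need more, but not strictly in proportion to size: in the table below, Qwen3.5 27B Claude 4.6 Opus Reasoning Distilled runs from 8.4 GB, while Qwen3.5 9B Humanize DPO Round2 needs 19.8 GB. See the table below for every model.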

Which Qwen 3.5 models can I run on a 16 GB GPU?

20 of 27 Qwen 3.5 models fit in 16 GB of VRAM at some quantization, including Qwen3.5 27B Claude 4.6 Opus Reasoning Distilled, Qwen3.5 9B Claude 4.6 Opus Reasoning Distilled, and Qwen3.5 9B.

What is the most popular Qwen 3.5 model to run locally?

Qwen3.5 397B A17B is the most downloaded Qwen 3.5 model in local-friendly quantized formats, with 906.4K quant downloads. It needs at least 171.9 GB of VRAM.

Qwen 3.5 Models — Hardware Requirements

27 Qwen 3.5 models from Alibaba and the community, ranging from Qwen3.5 0.8B, which runs in 0.7 GB of VRAM, up to Qwen3.5 397B A17B at 403.4B parameters. Every row links to full quantization tables and GPU compatibility.

All Qwen 3.5 Models by Size

Model	Params	Runs from (VRAM)	Context (tokens)	Publisher	Quant downloads
Qwen3.5 4B DFlash	537M	1.4 GB	262K	z-lab	—
Josiefied Qwen3.5 0.8B Gabliterated V1	853M	2.1 GB	262K	Goekdeniz-Guelmez	—
Qwen3.5 0.8B	873M	0.7 GB	262K	Alibaba	—
Qwen3.5 9B DFlash	1.0B	2.4 GB	262K	z-lab	—
Qwen3.5 2B Text Only	1.9B	4.2 GB	262K	principled-intelligence	—
Qwen3.5 2B Claude 4.6 Opus Reasoning Distilled	2.3B	1.4 GB	262K	Jackrong	—
Qwen3.5 4B Super Coder	4B	2.2 GB	262K	jica98	—
Qwen3.5 4B PTBR	4B	1.5 GB	—	lucasmg09	—
Qwen3.5 4B Safety Thinking	4.2B	2.3 GB	262K	MerlinSafety	—
Qwen3.5 4B Claude Opus 4.6 Distilled Heretic	4.5B	9.6 GB	262K	ghost-actual	—
Qwen3.5 4B	4.7B	2.5 GB	262K	Alibaba	—
Qwen3.5 4B Claude 4.6 Opus Reasoning Distilled	4.7B	2.5 GB	262K	Jackrong	—
Qwen3.5 4B MiniFantasy MTP	4.7B	9.8 GB	262K	MuXodious	—
Qwen3.5 9B Abliterated	9.0B	4.4 GB	262K	lukey03	—
Qwen3.5 9B Uncensored	9B	4.2 GB	—	LEONW24	—
Qwen3.5 9B Humanize DPO Round2	9B	19.8 GB	—	XiangJinYu	—
GrepSeek Qwen3.5 9B GRPO	9.4B	19.4 GB	262K	alireza7	—
Qwen3.5 9B Claude 4.6 Opus Reasoning Distilled	9.7B	4.7 GB	262K	Jackrong	2.4K
Qwen3.5 9B	9.7B	4.7 GB	262K	Alibaba	—
Qwen3.5 9B Gemini 3.1 Pro Reasoning Distill	9.7B	4.7 GB	262K	Jackrong	—
Qwen3.5 27B Claude 4.6 Opus Reasoning Distilled Heretic v2	27.4B	12.4 GB	262K	llmfan46	—
Qwen3.5 27B Claude 4.6 Opus Reasoning Distilled	27.8B	8.4 GB	262K	Jackrong	44.1K
Qwen3.5 35B A3B DFlash	35B	70.3 GB	262K	z-lab	—
PrunedHub Qwen3.5 35B A3B 80pct	35B	16.4 GB	—	GOBA-AI-Labs	—
Qwen3.5 35B A3B Claude 4.6 Opus Reasoning Distilled	36.0B	72.3 GB	262K	Jackrong	2.1K
Qwen3.5 122B A10B	125.1B	53.5 GB	262K	Alibaba	385.8K
Qwen3.5 397B A17B	403.4B	171.9 GB	262K	Alibaba	906.4K
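
To check a GPU against the table above, it is enough to compare its VRAM with the "Runs from" column. The sketch below does that for a handful of rows copied from the table; the function name `models_that_fit` and the `ROWS` list are illustrative, not part of any tool. It assumes the "Runs from" figure is a hard minimum, so it says nothing about the extra memory a long context or the runtime itself may need.

```python
# A few rows copied from the table above: (model, "Runs from" VRAM in GB).
# This is a hand-picked subset for illustration, not the full list.
ROWS = [
    ("Qwen3.5 0.8B", 0.7),
    ("Qwen3.5 9B", 4.7),
    ("Qwen3.5 27B Claude 4.6 Opus Reasoning Distilled", 8.4),
    ("PrunedHub Qwen3.5 35B A3B 80pct", 16.4),
    ("Qwen3.5 122B A10B", 53.5),
    ("Qwen3.5 397B A17B", 171.9),
]


def models_that_fit(rows, vram_gb):
    """Return names of models whose minimum VRAM is within the given budget."""
    return [name for name, min_gb in rows if min_gb <= vram_gb]


# Which of the sample rows fit on a 16 GB card?
fits_16gb = models_that_fit(ROWS, 16.0)
for name in fits_16gb:
    print(name)
```

Note that PrunedHub Qwen3.5 35B A3B 80pct misses a 16 GB card by a narrow margin (16.4 GB), which is why a strict comparison is used here rather than rounding.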
