Qwen Models — Hardware Requirements

28 Qwen models from Alibaba and the community. They range from the lightest, which runs in 0.8 GB of VRAM, up to 72.3B parameters. Each row links to the model's full quantization tables and GPU compatibility details.
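This page does not say how the "Runs from" figures are computed. A common rule of thumb for the weights alone is parameters × bits per weight ÷ 8 bytes. The sketch below is a minimal version of that rule. It assumes decimal gigabytes and ignores KV cache, activations and runtime overhead. The function name `weight_memory_gb` is hypothetical, and the sketch is not this page's formula. It uses Qwen 14B's 14.2B parameters from the table.

```python
# Rule-of-thumb estimate of weight memory (an assumption, not this page's formula):
# bytes ≈ parameter count × bits per weight / 8.
# KV cache, activations and runtime overhead come on top of this figure.

def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

# Qwen 14B has 14.2B parameters according to the table.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(14.2e9, bits):.1f} GB")
# 16-bit: ~28.4 GB
# 8-bit: ~14.2 GB
# 4-bit: ~7.1 GB
```

Halving the bits per weight roughly halves the weight memory. This is why quantized builds can run in a fraction of the full-precision footprint.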

All Qwen Models by Size

Model	Params	Runs from (min. VRAM)	Context (tokens)	Publisher	Quant downloads
SpatialLM1.1 Qwen 0.5B	604M	1.5 GB	33K	manycore-research	—
Qwen1.5 0.5B Chat	620M	0.8 GB	33K	Alibaba	472
Nemotron Research Reasoning Qwen 1.5B	1.8B	1.1 GB	131K	NVIDIA	—
Qwen1.5 1.8B Chat	1.8B	1.5 GB	33K	Alibaba	285
Qwen 1 8B	1.8B	0.9 GB	8K	Alibaba	22
Qwen1.5 1.8B	1.8B	1.5 GB	33K	Alibaba	—
Qwen1.5 MoE A2.7B Chat	2.7B	1.9 GB	33K	Alibaba	—
Qwen35 4B Soyuz Merged	4B	8.5 GB	262K	AlexWortega	—
CyberSecQwen 4B	4.0B	2.2 GB	262K	lablab-ai-amd-developer-hackathon	—
CodeQwen1.5 7B	7.3B	3.5 GB	66K	Alibaba	42
Qwen1.5 7B Chat	7.7B	4.7 GB	33K	Alibaba	223
Qwen1.5 7B	7.7B	4.7 GB	33K	Alibaba	89
Qwen 7B	7.7B	3.6 GB	33K	Alibaba	69
Qwen Marketing	8.2B	18.0 GB	—	marketeam	—
Qwen1.5 14B Chat	14.2B	8 GB	33K	Alibaba	232
Qwen 14B Chat	14.2B	6.6 GB	8K	Alibaba	173
Qwen 14B	14.2B	6.6 GB	8K	Alibaba	171
Qwen1.5 14B	14.2B	8 GB	33K	Alibaba	49
Qwen1.5 MoE A2.7B	14.3B	6.8 GB	8K	Alibaba	—
Qwen27b Abliterated Fable MTP	27B	12.6 GB	—	hotdogs	—
Cogito V1 Preview Qwen 32B	32B	10.4 GB	131K	deepcogito	3.3K
XiYanSQL QwenCoder 32B 2504	32B	14.4 GB	33K	XGenerationLab	—
Qwen1.5 32B Chat	32.5B	14.3 GB	33K	Alibaba	159
Qwen1.5 32B	32.5B	14.3 GB	33K	Alibaba	44
Qwen AgentWorld 35B A3B	34.7B	9.9 GB	262K	Alibaba	678.3K
Qwen35B Agent R2	34.7B	15.1 GB	262K	hotdogs	—
Qwen35b Agent R2O3	34.7B	15.1 GB	262K	hotdogs	—
Qwen1.5 72B Chat	72.3B	35.5 GB	33K	Alibaba	68
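Rows copied from the table above can be used in a script. To do that, the K/M/B suffixes and the dash used for missing values need converting to numbers. The sketch below is a minimal version. It assumes the six tab-separated columns shown above, and the helper names `parse_suffixed` and `parse_row` are hypothetical. Values such as "33K" are the table's rounded figures, so the parsed numbers are approximate.

```python
# Minimal sketch: parse a few rows copied from the tab-separated table above.
# Helper names are hypothetical, written for this example only.

ROWS = """\
Qwen1.5 0.5B Chat\t620M\t0.8 GB\t33K\tAlibaba\t472
Cogito V1 Preview Qwen 32B\t32B\t10.4 GB\t131K\tdeepcogito\t3.3K
Qwen AgentWorld 35B A3B\t34.7B\t9.9 GB\t262K\tAlibaba\t678.3K
Qwen Marketing\t8.2B\t18.0 GB\t—\tmarketeam\t—
"""

SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9}

def parse_suffixed(text):
    """Turn '620M', '33K' or '678.3K' into a number; the '—' placeholder becomes None."""
    text = text.strip()
    if text in ("", "—"):
        return None
    if text[-1] in SUFFIXES:
        return round(float(text[:-1]) * SUFFIXES[text[-1]])
    return round(float(text))

def parse_row(line):
    """Split one table row into a dict with numeric fields."""
    name, params, runs_from, context, publisher, downloads = line.split("\t")
    return {
        "model": name,
        "params": parse_suffixed(params),
        "min_vram_gb": float(runs_from.removesuffix(" GB")),  # Python 3.9+
        "context_tokens": parse_suffixed(context),
        "publisher": publisher,
        "quant_downloads": parse_suffixed(downloads),
    }

models = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

# Rows without download data count as zero when ranking by popularity.
most_downloaded = max(models, key=lambda m: m["quant_downloads"] or 0)
print(most_downloaded["model"])  # Qwen AgentWorld 35B A3B
```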

How Qwen Compares — Benchmark Rating

Among the models on this page, Qwen 14B has the highest overall benchmark rating at 56.6/100. That ranks it #21 among 73 open models. Qwen3.6 27B is not listed on this page and rates 74.0. The top proprietary model, Claude Fable 5 Max, scores 89.9. Each model's full benchmark breakdown is available on its own page.

Overall benchmark rating (composite of normalized public benchmark scores, out of 100):
Claude Fable 5 Max (proprietary): 89.9
GPT 5.5 (proprietary): 89.2
GPT 5.6 Sol (proprietary): 89.2
Claude Fable 5 (proprietary): 88.6
Claude Opus 4.8 (proprietary): 88.1
GLM 5.2: 82.7
Inkling: 79.2
DeepSeek V4 Pro: 74.3
Qwen3.6 27B: 74.0
DeepSeek V4 Flash: 73.2
Qwen 14B: 56.6
Qwen 14B Chat: 56.5
Qwen 7B: 41.1
Qwen 1 8B: 17.9
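The rating is a composite of normalized public benchmark scores, but this page does not describe the exact method. One common way to build such a composite is to min-max scale each benchmark to 0-100 across the compared models and then average the results. The sketch below shows that approach for illustration only. The benchmark names, model names and scores are made up, and this is not claimed to be the page's methodology.

```python
# Hypothetical sketch of a composite of normalized benchmark scores.
# Assumptions: min-max scaling to 0-100 per benchmark, then an unweighted mean.
# Benchmark names, model names and raw scores are made up for illustration.

RAW_SCORES = {
    "bench_a": {"model_x": 70.0, "model_y": 40.0, "model_z": 55.0},
    "bench_b": {"model_x": 1200.0, "model_y": 1500.0, "model_z": 1000.0},
}

def min_max_normalize(scores):
    """Rescale one benchmark's scores so the lowest maps to 0 and the highest to 100."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # all models tied: avoid dividing by zero
        return {model: 100.0 for model in scores}
    return {model: 100 * (s - lo) / (hi - lo) for model, s in scores.items()}

def composite(raw):
    """Average each model's normalized scores (assumes every benchmark covers the same models)."""
    normalized = [min_max_normalize(per_model) for per_model in raw.values()]
    models = normalized[0].keys()
    return {m: sum(n[m] for n in normalized) / len(normalized) for m in models}

print(composite(RAW_SCORES))
# {'model_x': 70.0, 'model_y': 50.0, 'model_z': 25.0}
```

With min-max scaling, a model's normalized score depends on which other models are in the pool. Adding or removing a model can therefore shift every composite.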

Frequently Asked Questions

How much VRAM do I need to run a Qwen model?: The least demanding Qwen model, Qwen1.5 0.5B Chat, runs from 0.8 GB of VRAM at an aggressive quantization. Requirements generally rise with parameter count, but not strictly in proportion. For example, the two 4B models start at 2.2 GB and 8.5 GB. See the table above for each model's minimum.
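Which Qwen models can I run on a 16 GB GPU?: 26 of the 28 Qwen models listed fit in 16 GB of VRAM at their smallest listed quantization. These include Qwen AgentWorld 35B A3B, Cogito V1 Preview Qwen 32B and Qwen1.5 0.5B Chat. The two exceptions are Qwen Marketing (18.0 GB) and Qwen1.5 72B Chat (35.5 GB).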
What is the most popular Qwen model to run locally?: Qwen AgentWorld 35B A3B is the most downloaded Qwen model in local-friendly quantized formats, with 678.3K quantized downloads. It runs from 9.9 GB of VRAM.
How do Qwen models score on benchmarks?: Among the models on this page, Qwen 14B leads with an overall benchmark rating of 56.6/100, ranking #21 among 73 open models. Qwen3.6 27B is not listed on this page and rates 74.0. The top proprietary model, Claude Fable 5 Max, scores 89.9. See the comparison chart above for the full standings.
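The sketch below checks other VRAM budgets against the table by filtering its "Runs from" column. It assumes a model fits when its listed minimum is at or below the budget, with no headroom reserved for long contexts or other GPU workloads. The function name `models_within` is hypothetical.

```python
# Minimum VRAM ("Runs from", in GB) for each model, copied from the table above.
MIN_VRAM_GB = {
    "SpatialLM1.1 Qwen 0.5B": 1.5,
    "Qwen1.5 0.5B Chat": 0.8,
    "Nemotron Research Reasoning Qwen 1.5B": 1.1,
    "Qwen1.5 1.8B Chat": 1.5,
    "Qwen 1 8B": 0.9,
    "Qwen1.5 1.8B": 1.5,
    "Qwen1.5 MoE A2.7B Chat": 1.9,
    "Qwen35 4B Soyuz Merged": 8.5,
    "CyberSecQwen 4B": 2.2,
    "CodeQwen1.5 7B": 3.5,
    "Qwen1.5 7B Chat": 4.7,
    "Qwen1.5 7B": 4.7,
    "Qwen 7B": 3.6,
    "Qwen Marketing": 18.0,
    "Qwen1.5 14B Chat": 8.0,
    "Qwen 14B Chat": 6.6,
    "Qwen 14B": 6.6,
    "Qwen1.5 14B": 8.0,
    "Qwen1.5 MoE A2.7B": 6.8,
    "Qwen27b Abliterated Fable MTP": 12.6,
    "Cogito V1 Preview Qwen 32B": 10.4,
    "XiYanSQL QwenCoder 32B 2504": 14.4,
    "Qwen1.5 32B Chat": 14.3,
    "Qwen1.5 32B": 14.3,
    "Qwen AgentWorld 35B A3B": 9.9,
    "Qwen35B Agent R2": 15.1,
    "Qwen35b Agent R2O3": 15.1,
    "Qwen1.5 72B Chat": 35.5,
}

def models_within(budget_gb):
    """Names of models whose smallest listed quantization fits the budget, lightest first."""
    fits = [(gb, name) for name, gb in MIN_VRAM_GB.items() if gb <= budget_gb]
    return [name for gb, name in sorted(fits)]

fits_16 = models_within(16.0)
too_big = sorted(set(MIN_VRAM_GB) - set(fits_16))
print(f"{len(fits_16)} of {len(MIN_VRAM_GB)} fit in 16 GB; exceptions: {too_big}")
# 26 of 28 fit in 16 GB; exceptions: ['Qwen Marketing', 'Qwen1.5 72B Chat']
```

Change the argument to `models_within` to match another card, for example `models_within(8.0)` for an 8 GB GPU.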