How much VRAM do I need to run a Qwen 2.5 model?

The smallest Qwen 2.5 model, Qwen2.5 0.5B Instruct, runs from 0.5 GB of VRAM at an aggressive quantization. Larger models need more, up to a 21.0 GB minimum for Qwen2.5 72B Instruct. See the table below for every model.
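As a rough illustration of why quantization matters so much for VRAM, the sketch below estimates the size of the weights alone from the parameter count and an assumed number of bits per weight. The function name `estimate_weight_gb` and the bit widths shown are illustrative assumptions, not the method this page uses for its "Runs from" column; the estimate ignores the KV cache, activations, and runtime overhead, so real requirements are higher.

```python
def estimate_weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same bit width.
    Real quantized files mix bit widths and add metadata.
    """
    return num_params * bits_per_weight / 8 / 1e9


# 7.6B parameters is the size listed for Qwen2.5 7B Instruct.
for bits in (16, 8, 4):
    size = estimate_weight_gb(7.6e9, bits)
    print(f"7.6B params at {bits}-bit: ~{size:.1f} GB of weights")
```

Halving the bits per weight halves the weight footprint, which is why the same model can span a wide range of VRAM requirements depending on the quantization chosen.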

Which Qwen 2.5 models can I run on a 16 GB GPU?

29 of the 32 Qwen 2.5 models fit in 16 GB of VRAM at some quantization, including Qwen2.5 7B Instruct, Qwen2.5 32B Instruct, and Qwen2.5 Coder 7B Instruct. Only the three 72B variants need more.
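The same check can be run for any VRAM budget. The sketch below uses a handful of rows copied from the "Runs from" column of the table on this page; the `Model` type and the `fits` helper are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    min_vram_gb: float  # "Runs from" value, at the most aggressive quantization


# A subset of rows from the table on this page, not the full list.
MODELS = [
    Model("Qwen2.5 7B Instruct", 2.7),
    Model("Qwen2.5 Coder 7B Instruct", 3.0),
    Model("Qwen2.5 32B Instruct", 9.8),
    Model("Qwen2.5 32B", 14.3),
    Model("Qwen2.5 72B Instruct", 21.0),
    Model("Qwen2.5 72B", 31.0),
]


def fits(models: list[Model], budget_gb: float) -> list[str]:
    """Return the names of models whose minimum VRAM is within the budget."""
    return [m.name for m in models if m.min_vram_gb <= budget_gb]


print(fits(MODELS, 16.0))
```

Because "Runs from" is a minimum, a model that passes this check may only fit at its smallest quantization and with a short context.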

What is the most popular Qwen 2.5 model to run locally?

Qwen2.5 7B Instruct is the most downloaded Qwen 2.5 model in local-friendly quantized formats, with 3.6M quantized downloads. It runs from 2.7 GB of VRAM.

How do Qwen 2.5 models score on benchmarks?

Qwen2.5 72B Instruct leads the family with an overall benchmark rating of 49.4/100, ranking #34 among 73 open models. The top proprietary model, Claude Fable 5 Max, scores 89.9. See the comparison chart below for the full standings.

Qwen 2.5 Models — Hardware Requirements

32 Qwen 2.5 models from Alibaba and the community, ranging from 494M to 72.7B parameters, with the smallest running in 0.5 GB of VRAM. Every row links to full quantization tables and GPU compatibility.

All Qwen 2.5 Models by Size

Model	Params	Runs from	Context	Publisher	Quant downloads
Qwen2.5 0.5B Instruct	494M	0.5 GB	33K	Alibaba	217.6K
Qwen2.5 0.5B	494M	0.5 GB	33K	Alibaba	246
Qwen2.5 Coder 0.5B	494M	0.5 GB	33K	Alibaba	—
Qwen2.5 1.5B Instruct	1.5B	0.8 GB	33K	Alibaba	350.1K
Qwen2.5 Coder 1.5B Instruct	1.5B	0.9 GB	33K	Alibaba	79.9K
Qwen2.5 Coder 1.5B	1.5B	1.0 GB	33K	Alibaba	6.0K
Qwen2.5 1.5B	1.5B	1.0 GB	131K	Alibaba	774
Qwen2.5 1.5B Quantized.w8a8	1.8B	1.1 GB	33K	RedHatAI	—
Qwen2.5 Omni 3B MNN	3B	6.6 GB	—	taobao-mnn	—
Qwen2.5 3B Instruct	3.1B	1.3 GB	33K	Alibaba	246.6K
Qwen2.5 Coder 3B Instruct	3.1B	1.4 GB	33K	Alibaba	91.3K
Qwen2.5 Coder 3B	3.1B	1.4 GB	33K	Alibaba	10.1K
Qwen2.5 3B	3.1B	1.6 GB	33K	Alibaba	3.0K
Qwen2.5 Coder 3B Claude Opus 4.6 Distilled	3.1B	1.7 GB	33K	ryzdfm	—
Qwen2.5 7B Instruct	7.6B	2.7 GB	33K	Alibaba	3.6M
Qwen2.5 Coder 7B Instruct	7.6B	3.0 GB	33K	Alibaba	762.4K
Qwen2.5 Coder 7B	7.6B	3.6 GB	33K	Alibaba	11.6K
Qwen2.5 Coder 7B Instruct Abliterated	7.6B	3.0 GB	33K	huihui-ai	7.9K
Qwen2.5 7B Instruct Uncensored	7.6B	3.6 GB	33K	Orion-zhen	756
Qwen2.5 7B	7.6B	3.6 GB	131K	Alibaba	286
Qwen2.5 Coder 7B Bird Cot	7.6B	3.6 GB	33K	jk200201	—
Qwen2.5 Coder 14B Instruct	14.8B	5.1 GB	33K	Alibaba	219.2K
Qwen2.5 14B Instruct	14.8B	5.1 GB	33K	Alibaba	42.6K
Qwen2.5 14B	14.8B	6.8 GB	131K	Alibaba	866
Qwen2.5 Coder 14B	14.8B	7.0 GB	33K	Alibaba	—
Qwen2.5 32B Instruct	32.8B	9.8 GB	33K	Alibaba	822.4K
Qwen2.5 Coder 32B Instruct	32.8B	9.8 GB	33K	Alibaba	241.6K
Qwen2.5 Coder 32B	32.8B	9.8 GB	33K	Alibaba	13.7K
Qwen2.5 32B	32.8B	14.3 GB	131K	Alibaba	806
Qwen2.5 72B Instruct	72.7B	21.0 GB	33K	Alibaba	99.6K
Qwen2.5 72B Instruct Abliterated	72.7B	31.9 GB	33K	huihui-ai	926
Qwen2.5 72B	72.7B	31.0 GB	131K	Alibaba	346

How Qwen 2.5 Compares — Benchmark Rating

Qwen2.5 72B Instruct is the highest-rated Qwen 2.5 model, with an overall benchmark rating of 49.4/100, which places it #34 among 73 open models. The top proprietary model, Claude Fable 5 Max, scores 89.9.

Model	Rating
Claude Fable 5 Max (proprietary)	89.9
GPT 5.5 (proprietary)	89.2
GPT 5.6 Sol (proprietary)	89.2
Claude Fable 5 (proprietary)	88.6
Claude Opus 4.8 (proprietary)	88.1
GLM 5.2	82.7
Inkling	79.2
DeepSeek V4 Pro	74.3
Qwen3.6 27B	74.0
DeepSeek V4 Flash	73.2
Qwen2.5 72B Instruct	49.4

Rating is a composite of normalized public benchmark scores.
