Qwen 3 Models — Hardware Requirements

50 Qwen 3 models from Alibaba and the community, ranging from builds that run in 0.3 GB of VRAM to the 480.2B-parameter Qwen3 Coder 480B A35B Instruct, which needs at least 144.6 GB. Every row links to full quantization tables and GPU compatibility.

All Qwen 3 Models by Size

Model	Params	Runs from (VRAM)	Context (tokens)	Publisher	Quant downloads
Qwen3 4B Domino B16	588M	0.6 GB	41K	Huang2020	—
Qwen3 0.6B Base	596M	0.7 GB	33K	Alibaba	1.6K
Qwen3 0.6B Heretic Abliterated Uncensored	596M	0.7 GB	41K	DavidAU	—
Distil Qwen3 0.6B Text2sql	596M	0.7 GB	41K	distil-labs	—
Qwen3 0.6B	0.6B	0.3 GB	—	litert-community	—
Qwen3 0.6B	752M	0.6 GB	41K	Alibaba	334.1K
Qwen3 14B PARO	1.6B	1.3 GB	41K	z-lab	—
Qwen3 1.7B Abliterated	1.7B	0.8 GB	—	huihui-ai	—
Qwen3 1.7B	1.7B	0.8 GB	—	AXERA-TECH	—
Qwen3 1.7B Base	1.7B	1.0 GB	33K	Alibaba	1.5K
Qwen3 1.7B	2.0B	1.1 GB	41K	Alibaba	197.3K
Qwen3 4B Z Image Engineer V4	4B	1.9 GB	—	BennyDaBall	—
Qwen3 4B Gemini 3.1 Pro Reasoning Distilled	4B	2.2 GB	262K	khazarai	—
Qwen3 Code Reasoning 4B	4B	2.2 GB	262K	GetSoloTech	—
Qwen3 4B	4.0B	1.6 GB	41K	Alibaba	635.0K
Qwen3 4B Instruct 2507	4.0B	1.6 GB	262K	Alibaba	193.8K
Qwen3 4B Thinking 2507	4.0B	1.6 GB	262K	Alibaba	93.2K
Qwen3 4B Base	4.0B	2.2 GB	33K	Alibaba	847
Qwen3 4B Instruct 2507 Heretic	4.0B	2.2 GB	262K	p-e-w	—
Huihui Qwen3 4B Abliterated v2	4.0B	2.2 GB	41K	huihui-ai	—
Parable Qwen3 4B Claude Fable 5	4.0B	2.2 GB	41K	AnkitAI	—
Qwen3 4B Heretic	4.0B	2.2 GB	41K	DreamFast	—
Qwen3 4B Abliterated	4.0B	1.9 GB	—	huihui-ai	—
Qwen3 4B Hindi Instruct v2	4.0B	2.2 GB	262K	pankajpandey-dev	—
Qwen3 8B	8B	3.7 GB	—	litert-community	—
Qwen3 8B	8.2B	2.9 GB	41K	Alibaba	555.7K
Qwen3 8B Base	8.2B	4.1 GB	33K	Alibaba	242
Qwen3Guard Gen 8B	8.2B	4.1 GB	33K	Alibaba	242
Huihui Qwen3 8B Abliterated v2	8.2B	4.1 GB	41K	huihui-ai	—
Josiefied Qwen3 8B Abliterated V1	8.2B	4.1 GB	41K	Goekdeniz-Guelmez	—
Qwen3 8B Heretic	8.2B	4.1 GB	41K	DreamFast	—
Qwen3 8B Abliterated	8.2B	3.8 GB	—	huihui-ai	—
Qwen3 14B	14.8B	4.7 GB	41K	Alibaba	2.3M
Qwen3 14B Base	14.8B	6.9 GB	33K	Alibaba	—
Qwen3 Coder 30B A3B Instruct	30.5B	8.8 GB	262K	Alibaba	816.9K
Qwen3 30B A3B	30.5B	8.8 GB	41K	Alibaba	540.4K
Qwen3 30B A3B Instruct 2507	30.5B	8.8 GB	262K	Alibaba	533.1K
Huihui Qwen3 Coder 30B A3B Instruct Abliterated	30.5B	8.8 GB	262K	huihui-ai	35.8K
Qwen3 30B A3B Thinking 2507	30.5B	8.8 GB	262K	Alibaba	7.0K
Qwen3 30B A3B Base	30.5B	13.4 GB	33K	Alibaba	—
Qwen3 32B	32.8B	9.7 GB	41K	Alibaba	948.2K
Qwen3 42B A3B 2507 Thinking Abliterated Uncensored TOTAL RECALL v2 Medium MASTER CODER	42.4B	18.4 GB	262K	DavidAU	—
Qwen3 Coder Next	79.7B	22.3 GB	262K	Alibaba	868.1K
Qwen3 Next 80B A3B Thinking	81.3B	22.8 GB	262K	Alibaba	146.1K
Qwen3 Next 80B A3B Instruct	81.3B	22.8 GB	262K	Alibaba	28.6K
Qwen3 235B A22B	235.1B	100.4 GB	41K	Alibaba	85.4K
Qwen3 235B A22B Thinking 2507	235.1B	71.0 GB	262K	Alibaba	36.5K
Qwen3 235B A22B Instruct 2507	235.1B	71.0 GB	262K	Alibaba	11.9K
Qwen3 Nemotron 235B A22B GenRM 2603	235.1B	100.4 GB	262K	NVIDIA	—
Qwen3 Coder 480B A35B Instruct	480.2B	144.6 GB	262K	Alibaba	9.1K
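
To check which models fit a given GPU, the "Runs from" column can be compared against a VRAM budget. The sketch below is illustrative only: it copies a handful of rows from the table above into a hypothetical `MODELS` list, and the helper name `models_that_fit` is an assumption, not something this page provides. It also treats "Runs from" as a hard minimum and leaves no headroom for context or runtime overhead, which a real setup would likely need.

```python
# Illustrative subset of the table above: (model, publisher, "Runs from" VRAM in GB).
MODELS = [
    ("Qwen3 0.6B", "litert-community", 0.3),
    ("Qwen3 4B", "Alibaba", 1.6),
    ("Qwen3 8B", "Alibaba", 2.9),
    ("Qwen3 14B", "Alibaba", 4.7),
    ("Qwen3 30B A3B", "Alibaba", 8.8),
    ("Qwen3 32B", "Alibaba", 9.7),
    ("Qwen3 Coder Next", "Alibaba", 22.3),
    ("Qwen3 235B A22B Thinking 2507", "Alibaba", 71.0),
    ("Qwen3 Coder 480B A35B Instruct", "Alibaba", 144.6),
]


def models_that_fit(models, vram_gb):
    """Return the names of models whose minimum VRAM is within the budget."""
    return [name for name, _publisher, min_vram in models if min_vram <= vram_gb]


fits_16gb = models_that_fit(MODELS, 16.0)
fits_24gb = models_that_fit(MODELS, 24.0)
print(fits_16gb)
print(fits_24gb)
```

With this subset, a 16 GB budget covers everything up to Qwen3 32B, and a 24 GB budget additionally admits Qwen3 Coder Next.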

How Qwen 3 Compares — Benchmark Rating

Qwen3 235B A22B Thinking 2507 is the highest-rated Qwen 3 model, with an overall benchmark rating of 62.5/100, which ranks #14 among 73 open models. The top proprietary model, Claude Fable 5 Max, scores 89.9.

Model	Rating
Claude Fable 5 Max (proprietary)	89.9
GPT 5.5 (proprietary)	89.2
GPT 5.6 Sol (proprietary)	89.2
Claude Fable 5 (proprietary)	88.6
Claude Opus 4.8 (proprietary)	88.1
GLM 5.2	82.7
Inkling	79.2
DeepSeek V4 Pro	74.3
Qwen3.6 27B	74.0
DeepSeek V4 Flash	73.2
Qwen3 235B A22B Thinking 2507	62.5
Qwen3 235B A22B	59.8
Qwen3 235B A22B Instruct 2507	50.6

Ratings are a composite of normalized public benchmark scores.

Frequently Asked Questions

How much VRAM do I need to run a Qwen 3 model?

The smallest Qwen 3 model, Qwen3 0.6B, runs from 0.3 GB of VRAM at an aggressive quantization. Larger family members need more, up to 144.6 GB for Qwen3 Coder 480B A35B Instruct; see the table above for every model.

Which Qwen 3 models can I run on a 16 GB GPU?

41 of 50 Qwen 3 models fit in 16 GB of VRAM at some quantization, including Qwen3 14B, Qwen3 32B, and Qwen3 Coder 30B A3B Instruct.

What is the most popular Qwen 3 model to run locally?

Qwen3 14B is the most downloaded Qwen 3 model in local-friendly quantized formats. It runs from 4.7 GB of VRAM.

How do Qwen 3 models score on benchmarks?

Qwen3 235B A22B Thinking 2507 leads the family with an overall benchmark rating of 62.5/100, ranking #14 among 73 open models. The top proprietary model, Claude Fable 5 Max, scores 89.9. See the comparison chart above for the full standings.
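
As a rough sanity check on the VRAM figures, a common rule of thumb puts weight memory at about parameters times bits per weight, divided by 8. The sketch below applies that rule; it is an assumption about how such numbers can be approximated, not the method this page uses. The function names are hypothetical, 1 GB is taken as 10^9 bytes, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate weights-only memory in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def implied_bits_per_weight(params_billions, vram_gb):
    """Invert the rule of thumb: average bits per weight implied by a VRAM figure."""
    return vram_gb * 8 / params_billions


# Qwen3 0.6B at 4 bits per weight, matching the 0.3 GB listed in the table.
small = weight_memory_gb(0.6, 4)
print(round(small, 2))

# Qwen3 14B: 14.8B parameters listed as running from 4.7 GB.
bits_14b = implied_bits_per_weight(14.8, 4.7)
print(round(bits_14b, 2))
```

Under this rule, the 4.7 GB figure for Qwen3 14B corresponds to roughly 2.5 bits per weight, which is consistent with the page describing its minimums as aggressive quantizations.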