Llama 3 Models — Hardware Requirements

53 Llama 3 models from Meta and the community, ranging from 1.2B to 405.9B parameters; the smallest runs from 0.4 GB of VRAM. Every row links to full quantization tables and GPU compatibility.
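How much VRAM a model needs depends largely on how aggressively its weights are quantized. The sketch below is a rough, weights-only estimate and not necessarily how the "Runs from" figures on this page are computed: it ignores the KV cache, activations, and runtime overhead, and it treats 1 GB as 10^9 bytes. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    Assumption: every weight is stored at the same bit width, and nothing
    else (KV cache, activations, framework overhead) is counted.
    """
    # (params_billions * 1e9 weights) * (bits / 8 bytes) / (1e9 bytes per GB)
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"8.0B at {bits}-bit: {weight_memory_gb(8.0, bits):.1f} GB")
```

Under these assumptions an 8.0B model at 4 bits per weight comes to about 4.0 GB, which is in line with many of the 8.0B rows in the table on this page. Real usage will be higher once context length and runtime overhead are included.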

All Llama 3 Models by Size

Model	Params	Runs from	Context	Publisher	Quant downloads
Llama 3.2 1B Instruct	1.2B	0.4 GB	131K	Meta	2.4M
Llama 3.2 1B	1.2B	0.6 GB	—	Meta	3.7K
Hermes 3 Llama 3.2 3B	3B	1.6 GB	131K	Nous Research	9.5K
Llama 3.2 3B Instruct	3.2B	1.0 GB	131K	Meta	546.1K
Llama 3.2 3B	3.2B	1.5 GB	—	Meta	—
Llama 3.2 Korean Bllossom 3B	3.2B	1.9 GB	131K	Bllossom	—
Llama3 OpenBioLLM 8B	8B	3.9 GB	8K	aaditya	466
Meta Llama 3 8B Instruct	8B	3.9 GB	8K	Nous Research	466
Llama 3.1 Nemotron Nano 8B V1	8B	2.8 GB	131K	NVIDIA	64
Llama 3.1 8B Instruct	8.0B	3.6 GB	131K	Meta	841.8K
Meta Llama 3.1 8B Instruct	8.0B	2.4 GB	—	Meta	527.9K
Meta Llama 3 8B Instruct	8.0B	2.6 GB	—	Meta	277.5K
Hermes 3 Llama 3.1 8B	8.0B	3.3 GB	131K	Nous Research	11.6K
Llama 3.1 8B Lexi Uncensored v2	8.0B	3.3 GB	131K	Orenguteng	10.6K
Llama 3.1 8B	8.0B	3.8 GB	—	Meta	6.5K
Meta Llama 3.1 8B Instruct Abliterated	8.0B	3.3 GB	131K	mlabonne	5.9K
Meta Llama 3 8B	8.0B	3.8 GB	—	Meta	1.9K
Saiga Llama3 8B	8.0B	4.0 GB	8K	IlyaGusev	670
Meta Llama 3.1 8B	8.0B	3.8 GB	—	Meta	—
Llama 3 ELYZA JP 8B	8.0B	4.0 GB	8K	elyza	—
Meta Llama 3 8B Instruct Abliterated v3	8.0B	4.0 GB	8K	failspy	—
Llama3 8B Chinese Chat	8.0B	4.0 GB	8K	shenzhi-wang	—
Llama 3 Korean Bllossom 8B	8.0B	4.0 GB	8K	MLP-KTLim	—
Bella Bartender 8B Llama3.1	8.0B	3.0 GB	131K	juiceb0xc0de	—
KONI Llama3.1 8B Instruct 20241024	8.0B	4.0 GB	131K	KISTI-KONI	—
Finance Llama3 8B	8.0B	4.0 GB	8K	instruction-pretrain	—
Llama 3.1 Nemotron Safety Guard 8B v3	8.0B	4.0 GB	131K	NVIDIA	—
Human Like LLama3 8B Instruct	8.0B	4.0 GB	8K	HumanLLMs	—
Dolphin 2.9 Llama3 8B	8.0B	4.0 GB	8K	dphn	—
Hermes 2 Pro Llama 3 8B	8.0B	4.0 GB	8K	Nous Research	—
Llama 3.2 11B Vision Instruct	10.7B	5.0 GB	—	Meta	17.4K
Llama 3.3 Nemotron Super 49B V1.5	49.9B	15.1 GB	131K	NVIDIA	1.2K
Llama 3.1 Nemotron 51B Instruct	51B	112.2 GB	131K	NVIDIA	—
Llama3 OpenBioLLM 70B	70B	30.7 GB	8K	aaditya	—
Llama 3.3 70B Instruct	70.6B	21.3 GB	131K	Meta	134.2K
Meta Llama 3.1 70B Instruct	70.6B	21.3 GB	—	Meta	84.8K
Llama 3.3 70B Instruct Abliterated	70.6B	20.4 GB	131K	huihui-ai	18.9K
Llama 3.1 Nemotron 70B Instruct HF	70.6B	20.4 GB	131K	NVIDIA	10.0K
Hermes 3 Llama 3.1 70B	70.6B	20.4 GB	131K	Nous Research	2.1K
Llama 3.1 70B LatamGPT SFT 1.0	70.6B	24.8 GB	4K	latam-gpt	1.7K
Meta Llama 3 70B Instruct	70.6B	23.3 GB	—	Meta	1.2K
Hermes 2 Theta Llama 3 70B	70.6B	20.4 GB	8K	Nous Research	584
Llama 3.1 70B Instruct	70.6B	33.0 GB	131K	Meta	137
Meta Llama 3 70B	70.6B	33.0 GB	—	Meta	—
Llama 3.1 70B	70.6B	33.0 GB	—	Meta	—
Llama 3.3 Nemotron 70B Reward	70.6B	31.0 GB	131K	NVIDIA	—
Dolphin 2.9.1 Llama 3 70B	70.6B	31.0 GB	8K	dphn	—
Llama 3.1 Tulu 3 70B DPO	70.6B	20.4 GB	131K	Allen AI	1.7K
Llama 3.2 90B Vision Instruct	88.6B	194.9 GB	—	Meta	—
Llama 3.1 Nemotron Ultra 253B V1	253.4B	557.5 GB	131K	NVIDIA	—
Meta Llama 3.1 405B Instruct	405.9B	189.7 GB	—	Meta	61.8K
Llama 3.1 405B Instruct	405.9B	189.7 GB	—	Meta	—
Llama 3.1 405B	405.9B	189.7 GB	—	Meta	—
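To check which models fit a given GPU, compare the "Runs from" column against the VRAM budget. The following is a minimal sketch using a handful of rows copied from the table above; the names `SAMPLE_ROWS`, `parse_gb`, and `models_that_fit` are illustrative, not part of any tool referenced on this page.

```python
# A few (model, "Runs from") pairs copied from the table above.
SAMPLE_ROWS = [
    ("Llama 3.2 1B Instruct", "0.4 GB"),
    ("Llama 3.1 8B Instruct", "3.6 GB"),
    ("Llama 3.2 11B Vision Instruct", "5.0 GB"),
    ("Llama 3.3 70B Instruct", "21.3 GB"),
    ("Llama 3.2 90B Vision Instruct", "194.9 GB"),
]


def parse_gb(text: str) -> float:
    """Turn a cell such as '3.6 GB' into the float 3.6."""
    value, unit = text.split()
    if unit != "GB":
        raise ValueError(f"unexpected unit in {text!r}")
    return float(value)


def models_that_fit(rows, budget_gb: float) -> list[str]:
    """Return the models whose minimum VRAM requirement is within budget."""
    return [name for name, runs_from in rows if parse_gb(runs_from) <= budget_gb]


if __name__ == "__main__":
    print(models_that_fit(SAMPLE_ROWS, 16.0))
```

Because "Runs from" is the minimum across quantizations, a model that passes this check may only fit at its most aggressive quantization, so leave headroom for context and runtime overhead.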

How Llama 3 Compares — Benchmark Rating

Llama 3.1 405B Instruct is the highest-rated Llama 3 model, with an overall benchmark rating of 44.5/100, which places it #36 among 73 open models. The top proprietary model, Claude Fable 5 Max, scores 89.9.

Model	Rating
Claude Fable 5 Max (proprietary)	89.9
GPT 5.5 (proprietary)	89.2
GPT 5.6 Sol (proprietary)	89.2
Claude Fable 5 (proprietary)	88.6
Claude Opus 4.8 (proprietary)	88.1
GLM 5.2	82.7
Inkling	79.2
DeepSeek V4 Pro	74.3
Qwen3.6 27B	74.0
DeepSeek V4 Flash	73.2
Llama 3.1 405B Instruct	44.5
Llama 3.3 70B Instruct	40.1
Llama 3.2 90B Vision Instruct	38.7
Llama 3.1 70B Instruct	37.3

Ratings are a composite of normalized public benchmark scores.

Frequently Asked Questions

How much VRAM do I need to run a Llama 3 model?

The smallest Llama 3 model, Llama 3.2 1B Instruct, runs from 0.4 GB of VRAM at an aggressive quantization. Larger family members need more; see the table above for every model.

Which Llama 3 models can I run on a 16 GB GPU?

32 of 53 Llama 3 models fit in 16 GB of VRAM at some quantization, including Llama 3.2 1B Instruct, Llama 3.1 8B Instruct, and Llama 3.2 3B Instruct.

What is the most popular Llama 3 model to run locally?

Llama 3.2 1B Instruct is the most downloaded Llama 3 model in local-friendly quantized formats. It runs from 0.4 GB of VRAM.

How do Llama 3 models score on benchmarks?

Llama 3.1 405B Instruct leads the family with an overall benchmark rating of 44.5/100, ranking #36 among 73 open models, while the top proprietary model, Claude Fable 5 Max, scores 89.9. See the comparison chart above for the full standings.