Question 1

How much VRAM do I need to run a Llama model?

Accepted Answer

Llama 68M has the lowest listed requirement in the family: at an aggressive quantization it runs from under 0.1 GB of VRAM, which the table shows as 0.0 GB. Larger family members need proportionally more; see the model table for every model.
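As a rough illustration of why the figure scales with model size, here is a minimal sketch that estimates the memory taken by the weights alone. The function name `weight_memory_gb`, the bit widths, and the convention 1 GB = 1e9 bytes are assumptions for this example, not the method used to produce the table. It ignores the KV cache, activations, and runtime overhead, so real requirements will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (assuming 1 GB = 1e9 bytes)."""
    bytes_total = n_params * bits_per_weight / 8  # 8 bits per byte
    return bytes_total / 1e9


# Parameter counts come from the model table; the bit widths are assumed
# examples of an aggressive (2-bit) and a common (4-bit) quantization.
for name, n_params, bits in [
    ("Llama 68M", 68e6, 2),
    ("Llama 7B", 6.7e9, 4),
]:
    print(f"{name}: ~{weight_memory_gb(n_params, bits):.2f} GB at {bits}-bit")
```

Under these assumptions a 68M-parameter model at 2 bits per weight needs well under 0.05 GB for its weights, which is consistent with a listed value that rounds to 0.0 GB.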

Question 2

Which Llama models can I run on a 16 GB GPU?

Accepted Answer

11 of 12 Llama models fit in 16 GB of VRAM at some quantization, including Llama Guard 3 8B, Llama 68M, and Llama 7B. The exception is Meta Llama Guard 2 8B, which is listed as running from 17.7 GB.
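The count can be checked against the "Runs from" column of the model table. The sketch below is only an illustration: the dictionary `RUNS_FROM_GB` and the helper `models_that_fit` are hypothetical names, and the comparison uses the listed minimum figures with no extra headroom for long contexts.

```python
# Minimum VRAM per model in GB, copied from the "Runs from" column of the table.
RUNS_FROM_GB = {
    "MicroLlama v2": 0.3,
    "Llama 68M": 0.0,
    "Smol Llama 101M GQA": 0.4,
    "MobileLLaMA 1.4B Chat": 1.3,
    "Llama Guard 3 1B": 3.3,
    "Llama 7B": 3.1,
    "Llama XLAM 2 8B Fc R": 4.0,
    "Llama Guard 3 8B": 2.4,
    "Meta Llama Guard 2 8B": 17.7,
    "Llama Poro 2 8B Instruct": 4.0,
    "Llamatron 8B V1": 4.0,
    "Llama Krikri 8B Instruct": 4.0,
}


def models_that_fit(budget_gb: float) -> list[str]:
    """Return the models whose listed minimum VRAM is within the budget."""
    return [name for name, gb in RUNS_FROM_GB.items() if gb <= budget_gb]


fits = models_that_fit(16.0)
too_big = [name for name in RUNS_FROM_GB if name not in fits]

print(f"{len(fits)} of {len(RUNS_FROM_GB)} models fit in 16 GB")
print("Does not fit:", too_big)
```

Changing the argument to `models_that_fit` gives the same check for other cards, for example 8 GB or 24 GB.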

Question 3

What is the most popular Llama model to run locally?

Accepted Answer

Llama Guard 3 8B is the most downloaded Llama model in local-friendly quantized formats. It runs from 2.4 GB of VRAM.

Question 4

How do Llama models score on benchmarks?

Accepted Answer

Llama 7B leads the family with an overall benchmark rating of 28.6/100, ranking #62 among 73 open models, while the top proprietary model, Claude Fable 5 Max, scores 89.9. See the comparison chart above for the full standings.

Model	Params	Runs from (VRAM)	Context (tokens)	Publisher	Quant downloads
MicroLlama v2	45M	0.3 GB	2K	ViorikaAI-org	—
Llama 68M	68M	0.0 GB	2K	JackFram	318
Smol Llama 101M GQA	101M	0.4 GB	1K	BEE-spoke-data	—
MobileLLaMA 1.4B Chat	1.4B	1.3 GB	2K	mtgv	—
Llama Guard 3 1B	1.5B	3.3 GB	—	Meta	—
Llama 7B	6.7B	3.1 GB	2K	huggyllama	—
Llama XLAM 2 8B Fc R	8B	4.0 GB	131K	Salesforce	—
Llama Guard 3 8B	8.0B	2.4 GB	—	Meta	680
Meta Llama Guard 2 8B	8.0B	17.7 GB	—	Meta	—
Llama Poro 2 8B Instruct	8.0B	4.0 GB	8K	LumiOpen	—
Llamatron 8B V1	8.0B	4.0 GB	131K	Naphula	—
Llama Krikri 8B Instruct	8.2B	4.0 GB	131K	ilsp	—

Llama Models — Hardware Requirements
