Question 1

How much VRAM do I need to run a Llama 2 model?

Accepted Answer

The smallest Llama 2 models are the 7B variants: Llama 2 7B Chat HF runs from 3.1 GB of VRAM at an aggressive quantization. Larger models need more, roughly in proportion to parameter count: the 13B models start at 6.1 GB and Llama 2 70B Chat HF at 32.3 GB. See the table for every model.
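As a rough illustration of why VRAM scales with model size, the memory taken by the weights alone is approximately the parameter count times the bits stored per weight. The sketch below is a back-of-the-envelope estimate, not the method used to produce the figures in this answer: the function name `weight_memory_gib` and the bit widths (16, 8, 4) are assumptions chosen for illustration, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
def weight_memory_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB (2**30 bytes).

    Hypothetical helper: excludes KV cache, activations, and framework overhead.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # 6.7B is the parameter count listed for the 7B models in the table.
    for bits in (16, 8, 4):  # assumed bit widths, for illustration only
        print(f"6.7B params at {bits}-bit: {weight_memory_gib(6.7, bits):.1f} GiB")
```

With these assumptions, the 4-bit estimate for 6.7B parameters lands close to the 3.1 GB quoted above, although the source does not say which quantization that figure corresponds to. Real usage will be higher once context length and runtime overhead are included.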

Question 2

Which Llama 2 models can I run on a 16 GB GPU?

Accepted Answer

5 of 7 Llama 2 models fit in 16 GB of VRAM at some quantization, including Llama 2 7B Chat HF (both the Meta and Nous Research releases) and Llama 2 7B HF.

Question 3

What is the most popular Llama 2 model to run locally?

Accepted Answer

Llama 2 7B Chat HF (Meta) is the most downloaded Llama 2 model in local-friendly quantized formats, with 263 quant downloads listed in the table. It runs from 3.1 GB of VRAM.

Question 4

How do Llama 2 models score on benchmarks?

Accepted Answer

Llama 2 70B HF leads the family with an overall benchmark rating of 63.8/100, ranking #11 among 73 open models, while the top proprietary model, Claude Fable 5 Max, scores 89.9. See the comparison chart above for the full standings.

Model	Params	Runs from	Context	Publisher	Quant downloads
Llama 2 7B Chat HF	6.7B	3.1 GB	—	Meta	263
Llama 2 7B Chat HF	6.7B	4.2 GB	4K	Nous Research	109
Llama 2 7B HF	6.7B	3.1 GB	—	Meta	—
Llama 2 13B Chat HF	13.0B	6.1 GB	—	Meta	—
Llama 2 13B HF	13.0B	6.1 GB	—	Meta	—
Llama 2 70B Chat HF	69.0B	32.3 GB	—	Meta	—
Llama 2 70B HF	69.0B	151.8 GB	—	Meta	—
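
The "5 of 7 models fit in 16 GB" answer can be checked by filtering the "Runs from" column against a VRAM budget. This is a minimal sketch: the rows are copied from the table above, while the names `MODELS` and `models_that_fit` are hypothetical and chosen only for this example.

```python
# Rows copied from the table: (model, publisher, minimum VRAM in GB).
MODELS = [
    ("Llama 2 7B Chat HF", "Meta", 3.1),
    ("Llama 2 7B Chat HF", "Nous Research", 4.2),
    ("Llama 2 7B HF", "Meta", 3.1),
    ("Llama 2 13B Chat HF", "Meta", 6.1),
    ("Llama 2 13B HF", "Meta", 6.1),
    ("Llama 2 70B Chat HF", "Meta", 32.3),
    ("Llama 2 70B HF", "Meta", 151.8),
]


def models_that_fit(vram_gb, models=MODELS):
    """Return the rows whose 'Runs from' figure is within the VRAM budget."""
    return [row for row in models if row[2] <= vram_gb]


if __name__ == "__main__":
    fitting = models_that_fit(16.0)
    print(f"{len(fitting)} of {len(MODELS)} models fit in 16 GB:")
    for name, publisher, min_gb in fitting:
        print(f"  {name} ({publisher}): from {min_gb} GB")
```

Because "Runs from" is the smallest listed footprint for each model, passing this filter only means the model can load at its most aggressive quantization. Higher-precision variants or long contexts may still exceed the budget.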
