Llama Models — Hardware Requirements

11 Llama models from Meta and the community, from Llama 68M, which needs so little VRAM that it is listed at 0.0 GB, up to 8.2B parameters. Every row links to full quantization tables and GPU compatibility.

All Llama Models by Size

Model                      Params  Context
Llama 68M                  68M     2K
Smol Llama 101M GQA        101M    1K
MobileLLaMA 1.4B Chat      1.4B    2K
Llama Guard 3 1B           1.5B
Llama 7B                   6.7B    2K
Llama XLAM 2 8B Fc R       8B      131K
Llama Guard 3 8B           8.0B
Meta Llama Guard 2 8B      8.0B
Llama Poro 2 8B Instruct   8.0B    8K
Llamatron 8B V1            8.0B    131K
Llama Krikri 8B Instruct   8.2B    131K

How Llama Compares — Benchmark Rating

Llama 7B is the highest-rated Llama model, with an overall benchmark rating of 28.6/100, which places it #66 among 75 open models. The top proprietary model, GPT 5.5, scores 88.8.

GPT 5.5 · proprietary · 88.8
Claude Opus 4.7 · proprietary · 87.6
Claude Fable 5 · proprietary · 86.6
GPT 5.4 · proprietary · 86.6
Claude Opus 4.8 · proprietary · 84.4
Ratings are a composite of normalized public benchmark scores.

Frequently Asked Questions

How much VRAM do I need to run a Llama model?
The smallest Llama model, Llama 68M, runs from 0.0 GB of VRAM at an aggressive quantization. Larger family members need proportionally more — see the table above for every model.
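As a rough illustration of why memory grows with model size, the sketch below uses a common rule of thumb: the weights alone take about (parameter count × bits per weight ÷ 8) bytes. This is not necessarily how the figures on this page were computed. The function name `weight_memory_gb` and the bit widths are illustrative assumptions, and the estimate leaves out the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Rule-of-thumb estimate only: ignores KV cache, activations, and any
    per-block metadata that real quantization formats store.
    """
    if params < 0 or bits_per_weight <= 0:
        raise ValueError("params must be >= 0 and bits_per_weight must be > 0")
    return params * bits_per_weight / 8 / 1e9


# Parameter counts taken from the table above.
models = {
    "Llama 68M": 68e6,
    "Llama 7B": 6.7e9,
    "Llama Krikri 8B Instruct": 8.2e9,
}

# Assumed, illustrative bit widths (not tied to any specific quantization format).
for name, params in models.items():
    for bits in (2, 4, 8, 16):
        print(f"{name:26s} {bits:2d}-bit: {weight_memory_gb(params, bits):5.1f} GB")
```

Under these assumptions, Llama 68M at 2 bits per weight comes to about 0.017 GB, which displays as 0.0 GB at one decimal place, while an 8.2B model at 16 bits per weight needs roughly 16.4 GB for the weights alone.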
Which Llama models can I run on a 16 GB GPU?
10 of the 11 Llama models fit in 16 GB of VRAM at some quantization, including Llama Guard 3 8B, Llama 68M, and Llama 7B.
What is the most popular Llama model to run locally?
Llama Guard 3 8B is the most downloaded Llama model in local-friendly quantized formats. It runs from 2.4 GB of VRAM.
How do Llama models score on benchmarks?
Llama 7B leads the family with an overall benchmark rating of 28.6/100, ranking #66 among 75 open models, while the top proprietary model, GPT 5.5, scores 88.8. See the comparison chart above for the full standings.