Llama 3 Models — Hardware Requirements

This page lists 52 Llama 3 models from Meta and the community, ranging from 1.2B parameters (Llama 3.2 1B Instruct, which runs in as little as 0.4 GB of VRAM) up to 405.9B parameters. Every row links to full quantization tables and GPU compatibility.

All Llama 3 Models by Size

Model | Params | Context
Llama 3.2 1B Instruct | 1.2B | 131K
Llama 3.2 1B | 1.2B |
Hermes 3 Llama 3.2 3B | 3B | 131K
Llama 3.2 3B Instruct | 3.2B | 131K
Llama 3.2 3B | 3.2B |
Llama 3.2 Korean Bllossom 3B | 3.2B | 131K
Llama3 OpenBioLLM 8B | 8B | 8K
Meta Llama 3 8B Instruct | 8B | 8K
Llama 3.1 Nemotron Nano 8B V1 | 8B | 131K
Llama 3.1 8B Instruct | 8.0B | 131K
Meta Llama 3.1 8B Instruct | 8.0B |
Meta Llama 3 8B Instruct | 8.0B |
Hermes 3 Llama 3.1 8B | 8.0B | 131K
Llama 3.1 8B Lexi Uncensored v2 | 8.0B | 131K
Llama 3.1 8B | 8.0B |
Meta Llama 3 8B | 8.0B |
Saiga Llama3 8B | 8.0B | 8K
Meta Llama 3.1 8B | 8.0B |
Llama 3 ELYZA JP 8B | 8.0B | 8K
Meta Llama 3 8B Instruct Abliterated v3 | 8.0B | 8K
Llama3 8B Chinese Chat | 8.0B | 8K
Llama 3 Korean Bllossom 8B | 8.0B | 8K
Bella Bartender 8B Llama3.1 | 8.0B | 131K
KONI Llama3.1 8B Instruct 20241024 | 8.0B | 131K
Finance Llama3 8B | 8.0B | 8K
Llama 3.1 Nemotron Safety Guard 8B v3 | 8.0B | 131K
Human Like LLama3 8B Instruct | 8.0B | 8K
Dolphin 2.9 Llama3 8B | 8.0B | 8K
Hermes 2 Pro Llama 3 8B | 8.0B | 8K
Llama 3.2 11B Vision Instruct | 10.7B |
Llama 3.3 Nemotron Super 49B V1.5 | 49.9B | 131K
Llama 3.1 Nemotron 51B Instruct | 51B | 131K
Llama3 OpenBioLLM 70B | 70B | 8K
Llama 3.3 70B Instruct | 70.6B | 131K
Meta Llama 3.1 70B Instruct | 70.6B |
Llama 3.3 70B Instruct Abliterated | 70.6B | 131K
Llama 3.1 Nemotron 70B Instruct HF | 70.6B | 131K
Meta Llama 3 70B Instruct | 70.6B |
Hermes 3 Llama 3.1 70B | 70.6B | 131K
Hermes 2 Theta Llama 3 70B | 70.6B | 8K
Llama 3.1 70B LatamGPT SFT 1.0 | 70.6B | 4K
Llama 3.1 70B Instruct | 70.6B | 131K
Meta Llama 3 70B | 70.6B |
Llama 3.1 70B | 70.6B |
Llama 3.3 Nemotron 70B Reward | 70.6B | 131K
Dolphin 2.9.1 Llama 3 70B | 70.6B | 8K
Llama 3.1 Tulu 3 70B DPO | 70.6B | 131K
Llama 3.2 90B Vision Instruct | 88.6B |
Llama 3.1 Nemotron Ultra 253B V1 | 253.4B | 131K
Meta Llama 3.1 405B Instruct | 405.9B |
Llama 3.1 405B | 405.9B |
Llama 3.1 405B Instruct | 405.9B |

How Llama 3 Compares — Benchmark Rating

Llama 3.1 405B Instruct is the highest-rated Llama 3 model, with an overall benchmark rating of 46.0/100, which places it #44 among 75 open models. The top proprietary model, GPT 5.5, scores 88.8.

GPT 5.5 · proprietary · 88.8
Claude Opus 4.7 · proprietary · 87.6
Claude Fable 5 · proprietary · 86.6
GPT 5.4 · proprietary · 86.6
Claude Opus 4.8 · proprietary · 84.4
Ratings are a composite of normalized public benchmark scores.

Frequently Asked Questions

How much VRAM do I need to run a Llama 3 model?
The smallest Llama 3 model, Llama 3.2 1B Instruct, runs in as little as 0.4 GB of VRAM at an aggressive quantization. Larger family members need more, roughly in proportion to their parameter count; the table above lists every model.
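As a rough illustration of how VRAM scales with model size, the sketch below estimates the memory taken by the weights alone as parameter count times bits per weight. It is an approximation under stated assumptions, not this page's method: the function name and the bit widths are chosen for illustration, and the estimate ignores the KV cache, activations, and runtime overhead, so real requirements will be higher.

```python
def estimate_weight_vram_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate GB (1 GB = 1e9 bytes) needed to hold the weights alone.

    Assumption: every parameter is stored at the same average bit width.
    KV cache, activations, and runtime overhead are NOT included.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Parameter counts come from the table above; the bit widths are illustrative.
EXAMPLES = [
    ("Llama 3.2 1B Instruct", 1.2),
    ("Llama 3.1 8B Instruct", 8.0),
    ("Llama 3.3 70B Instruct", 70.6),
]

for name, params_b in EXAMPLES:
    for bits in (16, 8, 4):
        size_gb = estimate_weight_vram_gb(params_b, bits)
        print(f"{name} at {bits} bits/weight: about {size_gb:.1f} GB for weights")
```

Under this weights-only estimate, 0.4 GB for a 1.2B-parameter model works out to roughly 2.7 bits per weight, which is in line with an aggressive quantization.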
Which Llama 3 models can I run on a 16 GB GPU?
31 of 52 Llama 3 models fit in 16 GB of VRAM at some quantization, including Llama 3.2 1B Instruct, Llama 3.2 3B Instruct, and Llama 3.1 8B Instruct.
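One way to reason about whether a model fits a given GPU is to invert the weights-only estimate: for a VRAM budget, compute the largest average bits per weight that still fits, then compare it with the most aggressive quantization you are willing to use. The sketch below does this. The 1 GB reserve and the 2-bit floor are hypothetical values chosen for illustration, not figures from this page, so it will not necessarily reproduce the 31-of-52 count.

```python
def max_bits_per_weight(params_billions: float, vram_gb: float,
                        reserve_gb: float = 1.0) -> float:
    """Largest average bits per weight whose weights fit in the VRAM budget.

    reserve_gb is a hypothetical allowance for KV cache and runtime overhead.
    """
    usable_bytes = max(vram_gb - reserve_gb, 0.0) * 1e9
    return usable_bytes * 8 / (params_billions * 1e9)


def fits_budget(params_billions: float, vram_gb: float,
                min_bits: float = 2.0) -> bool:
    """True if the model fits at or above an assumed minimum bit width."""
    return max_bits_per_weight(params_billions, vram_gb) >= min_bits


# Parameter counts (in billions) taken from the table above.
MODELS = {
    "Llama 3.2 1B Instruct": 1.2,
    "Llama 3.1 8B Instruct": 8.0,
    "Llama 3.3 70B Instruct": 70.6,
    "Llama 3.1 405B Instruct": 405.9,
}

fitting = [name for name, params_b in MODELS.items() if fits_budget(params_b, 16.0)]
print(fitting)
```

With these assumptions, an 8.0B model has room for about 15 bits per weight in 16 GB, while a 70.6B model would need to go below 2 bits per weight and is therefore excluded.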
What is the most popular Llama 3 model to run locally?
Llama 3.2 1B Instruct is the most downloaded Llama 3 model in local-friendly quantized formats. It runs in as little as 0.4 GB of VRAM.
How do Llama 3 models score on benchmarks?
Llama 3.1 405B Instruct leads the family with an overall benchmark rating of 46.0/100, ranking #44 among 75 open models, while the top proprietary model, GPT 5.5, scores 88.8. See the comparison chart above for the full standings.