Llama 3.1 405B — Hardware Requirements & GPU Compatibility

Meta Llama 3.1 405B is the largest model in the Llama family with 405 billion parameters. It represents Meta's most capable open-weight model, delivering performance competitive with leading proprietary models across reasoning, coding, math, and multilingual tasks. It features a 128K token context window. Due to its massive size, running Llama 3.1 405B locally requires significant hardware, typically multiple high-end professional GPUs with a combined VRAM of 200GB or more at reduced precision. It is primarily used in quantized formats for local inference or via multi-node setups. Released under the Llama 3.1 Community License.

514.6K downloads · 965 likes · Sep 2024

Specifications

Publisher
Meta
Family
Llama 3
Parameters
405B
Release Date
2024-09-25
License
Llama 3.1 Community

Get Started

How Much VRAM Does Llama 3.1 405B Need?


Quantization · Bits · VRAM
BF16 · 16.00 · 891 GB

Which GPUs Can Run Llama 3.1 405B?

BF16 · 891 GB

Llama 3.1 405B (BF16) requires 891 GB of VRAM to load the model weights. For comfortable inference with headroom for KV cache and system overhead, 1159+ GB is recommended. No single GPU has enough memory — multi-GPU or cluster setups are needed.
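As a rough illustration of what "multi-GPU or cluster setups" means in practice, the minimum card count can be estimated by dividing the recommended 1159 GB by per-GPU memory. The GPU models and memory sizes below are common data-center examples chosen for illustration, not recommendations from this page:

```python
import math

RECOMMENDED_GB = 1159  # BF16 weights plus KV-cache/overhead headroom

def min_gpus(total_gb: float, per_gpu_gb: float) -> int:
    """Smallest whole number of GPUs whose combined memory covers total_gb."""
    return math.ceil(total_gb / per_gpu_gb)

# Illustrative per-GPU memory sizes (GB)
for name, mem in [("A100 80GB", 80), ("H100 80GB", 80), ("H200 141GB", 141)]:
    print(f"{name}: at least {min_gpus(RECOMMENDED_GB, mem)} GPUs")
```

This is a capacity-only estimate; real deployments also need headroom for tensor-parallel communication buffers and batching, so practical setups often use more cards than this lower bound.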

Frequently Asked Questions

How much VRAM does Llama 3.1 405B need?

Llama 3.1 405B requires 891 GB of VRAM at BF16.

VRAM = Weights + KV Cache + Overhead

Weights = 405B × 16 bits ÷ 8 = 810 GB

KV Cache + Overhead ≈ 81 GB (estimated at 2K context, plus ~0.3 GB framework overhead)
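The estimate above can be reproduced with a short script. This is a sketch of the rule of thumb this page uses (weights plus roughly 10% headroom for KV cache and framework overhead), not an exact measurement:

```python
def estimate_vram_gb(params_b: float, bits: int, overhead_frac: float = 0.10) -> float:
    """Estimate inference VRAM: weight size plus a headroom fraction
    for KV cache and framework overhead (~10% rule of thumb)."""
    weights_gb = params_b * bits / 8  # 1B params at 8 bits = 1 GB
    return round(weights_gb + weights_gb * overhead_frac, 1)

print(estimate_vram_gb(405, 16))  # BF16: 810 GB weights + 81 GB headroom = 891.0
```

Actual KV-cache usage grows with context length and batch size, so long-context workloads will need more than this short-context estimate.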

VRAM usage by quantization

BF16 · 891.0 GB

Can NVIDIA GeForce RTX 5090 run Llama 3.1 405B?

No — Llama 3.1 405B requires at least 891 GB at BF16, which exceeds the NVIDIA GeForce RTX 5090's 32 GB of VRAM.

Can I run Llama 3.1 405B on a Mac?

Llama 3.1 405B requires at least 891 GB at BF16, which exceeds the unified memory of any Mac, including the highest-memory Mac Studio and Mac Pro configurations. Only heavily quantized variants (4-bit weights are roughly 203 GB) are feasible on a top-spec high-memory Mac.

Can I run Llama 3.1 405B locally?

Not on typical consumer hardware. At BF16 it needs 891 GB of VRAM, so local inference requires a multi-GPU server or aggressive quantization. Popular tools for running quantized variants include Ollama, LM Studio, and llama.cpp.

What's the download size of Llama 3.1 405B?

At BF16, the download is about 810 GB.
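The figure follows directly from the parameter count. Below is a sketch of weight-file size at a few common bit widths; the 8-bit and 4-bit rows are illustrative arithmetic, not download sizes published on this page:

```python
def weights_gb(params_b: float, bits: float) -> float:
    # 8 bits per byte, so 1B parameters at 8-bit precision ~= 1 GB
    return params_b * bits / 8

for label, bits in [("BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weights_gb(405, bits):.2f} GB")
```

Real quantized files (e.g., GGUF) are slightly larger than this pure-weights figure because of scales, metadata, and mixed-precision layers.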