Meta Llama Guard 2 8B — Hardware Requirements & GPU Compatibility

Meta Llama Guard 2 8B is an 8.0B-parameter open language model from Meta in the Llama family. At BF16 it needs about 17.67 GB of VRAM; the sections below list the GPUs and unified-memory devices that can run it.

8.3K downloads · 307 likes · Apr 2024

Specifications

Publisher: Meta
Family: Llama
Parameters: 8.0B
Release Date: 2024-04-17
License: Llama 3 Community

Get Started

HuggingFace

meta-llama/Meta-Llama-Guard-2-8B

How Much VRAM Does Meta Llama Guard 2 8B Need?

Quantization	Bits	VRAM	+ Context	File Size	Quality
BF16 (est.)	16.00	17.7 GB	n/a	16.06 GB	Brain floating point 16; preferred for training

est. = calculated VRAM estimate; no published GGUF file was found for that quantization yet. Rows without this marker are verified against real community uploads.

Which GPUs Can Run Meta Llama Guard 2 8B?

BF16 · 17.7 GB

Meta Llama Guard 2 8B (BF16) requires 17.7 GB of VRAM to load the model weights. For comfortable inference with headroom for KV cache and system overhead, 23+ GB is recommended. 8 GPUs can run it, including NVIDIA GeForce RTX 5090, NVIDIA GeForce RTX 3090 Ti.
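The two figures above can be related with a small calculation. The sketch below is illustrative only: the function names are hypothetical, and the 30% headroom factor is an assumption inferred from 17.7 GB × 1.3 ≈ 23 GB, not a rule stated on this page. The 24 GB and 32 GB sizes are the published memory capacities of the RTX 4090 and RTX 5090.

```python
REQUIRED_GB = 17.7  # BF16 load requirement quoted above


def recommended_gb(required_gb: float = REQUIRED_GB, headroom: float = 0.30) -> float:
    """Required memory plus a fractional headroom (30% is an assumed value)."""
    return required_gb * (1 + headroom)


def free_after_load_gb(total_gb: float, required_gb: float = REQUIRED_GB) -> float:
    """Memory left after loading; a negative result means the model does not fit."""
    return total_gb - required_gb


if __name__ == "__main__":
    print(f"recommended: about {recommended_gb():.0f} GB")
    for label, total in [("24 GB card", 24.0), ("32 GB card", 32.0)]:
        print(f"{label}: {free_after_load_gb(total):.1f} GB free after load")
```

A 24 GB card leaves roughly 6 GB free under these assumptions, while a 32 GB card leaves roughly 14 GB.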

Runs great (plenty of headroom)

NVIDIA GeForce RTX 5090: ~66 tok/s

Decent (enough VRAM, may be tight)

NVIDIA GeForce RTX 3090 Ti: ~37 tok/s
NVIDIA GeForce RTX 4090: ~37 tok/s
NVIDIA GeForce RTX 3090: ~34 tok/s
NVIDIA GeForce RTX 5090 Laptop GPU: ~33 tok/s
AMD Radeon RX 7900 XTX: ~30 tok/s
AMD Radeon RX 7900 XT: ~25 tok/s
NVIDIA TITAN RTX: ~25 tok/s

Which Devices Can Run Meta Llama Guard 2 8B?

BF16 · 17.7 GB

41 devices with unified memory can run Meta Llama Guard 2 8B, including NVIDIA DGX H100, NVIDIA DGX A100 640GB, Mac Mini M4 Pro (24 GB).

Runs great (plenty of headroom)

NVIDIA DGX H100: ~986 tok/s
NVIDIA DGX A100 640GB: ~600 tok/s
Mac Studio (M3 Ultra, 256GB): ~32 tok/s
Mac Studio (M3 Ultra, 512GB): ~32 tok/s
Mac Studio (M3 Ultra, 96GB): ~32 tok/s
Mac Pro M2 Ultra (192 GB): ~32 tok/s
Mac Studio M2 Ultra (192 GB): ~32 tok/s
MacBook Pro 16" M5 Max (128 GB): ~24 tok/s
Mac Studio M4 Max (128 GB): ~22 tok/s
Mac Studio M4 Max (64 GB): ~22 tok/s
MacBook Pro 16" M4 Max (48 GB): ~22 tok/s
MacBook Pro 16" M4 Max (64 GB): ~22 tok/s
Mac Studio M4 Max (36 GB): ~16 tok/s
MacBook Pro 14" M4 Max (36 GB): ~16 tok/s
MacBook Pro 16" M3 Max (48 GB): ~16 tok/s
MacBook Pro 14-inch (M5 Pro): ~12 tok/s
Mac Mini M4 Pro (48 GB): ~11 tok/s
ASUS Ascent GX10: ~10 tok/s
NVIDIA DGX Spark: ~10 tok/s
NVIDIA Jetson AGX Thor Developer Kit: ~10 tok/s
Asus ROG Flow Z13 (2025, Ryzen AI Max+ 395, 128 GB): ~9 tok/s
Beelink GTR9 Pro (Ryzen AI Max+ 395, 128 GB): ~9 tok/s
Framework Desktop (Ryzen AI Max+ 395, 128 GB): ~9 tok/s
GMKtec EVO-X2 (Ryzen AI Max+ 395, 128 GB): ~9 tok/s
HP Z2 Mini G1a (Ryzen AI Max+ PRO 395, 128 GB): ~9 tok/s
HP ZBook Ultra G1a 14 (Ryzen AI Max+ PRO 395, 128 GB): ~9 tok/s
Minisforum MS-S1 MAX (Ryzen AI Max+ 395, 128 GB): ~9 tok/s
Snapdragon X2 Elite Extreme Copilot+ PC: ~8 tok/s
NVIDIA Jetson AGX Orin 32GB: ~8 tok/s
NVIDIA Jetson AGX Orin 64GB: ~8 tok/s
Mac Mini M4 (32 GB): ~5 tok/s

Decent (enough memory, may be tight)

Mac Mini M4 Pro (24 GB): ~11 tok/s
MacBook Pro 14" M4 Pro (24 GB): ~11 tok/s
MacBook Pro 16" M4 Pro (24 GB): ~11 tok/s
MacBook Pro 14-inch (M5): ~6 tok/s
Snapdragon X Elite Copilot+ PC: ~5 tok/s
MacBook Air 13" M4 (24 GB): ~5 tok/s
MacBook Air 15" M4 (24 GB): ~5 tok/s
MacBook Air 13" M3 (24 GB): ~4 tok/s
Intel Core Ultra 9 288V (Lunar Lake) Laptop: ~4 tok/s
AMD Ryzen AI 9 HX 370 (Strix Point) Laptop: ~4 tok/s

Frequently Asked Questions

How much VRAM does Meta Llama Guard 2 8B need?: Meta Llama Guard 2 8B requires 17.7 GB of VRAM at BF16.
VRAM = Weights + KV Cache + Overhead
Weights = 8.0B parameters × 16 bits ÷ 8 bits per byte ≈ 16.1 GB (16.06 GB before rounding)
KV Cache + Overhead ≈ 1.6 GB (2K-token context plus ~0.3 GB of framework overhead)
Total at BF16 ≈ 17.7 GB
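The VRAM formula above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual estimator: the function name is hypothetical, the 1.6 GB default is the KV cache plus overhead figure quoted above, and the 8.03B parameter count is an assumption obtained by dividing the 16.06 GB BF16 file size by 2 bytes per weight.

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     kv_and_overhead_gb: float = 1.6) -> float:
    """VRAM = weights + (KV cache + overhead), in GB with 1 GB = 1e9 bytes."""
    weights_gb = params_billion * bits_per_weight / 8  # bits per weight -> bytes
    return weights_gb + kv_and_overhead_gb


if __name__ == "__main__":
    # 8.03 is an assumed parameter count (16.06 GB file size / 2 bytes per weight).
    print(f"BF16 estimate: {estimate_vram_gb(8.03, 16):.1f} GB")
```

With these inputs the estimate rounds to the 17.7 GB figure quoted in the answer.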
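Can I run Meta Llama Guard 2 8B on a Mac?: Yes, on Macs with enough unified memory. Meta Llama Guard 2 8B requires at least 17.7 GB at BF16, which rules out Macs configured with 16 GB or less. In the device list above, 24 GB configurations (such as Mac Mini M4 Pro or MacBook Air M4) fit but may be tight, while configurations with 32 GB or more have plenty of headroom.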
Can I run Meta Llama Guard 2 8B locally?: Yes. Meta Llama Guard 2 8B can run locally on consumer hardware that has enough memory. At BF16 precision it needs 17.7 GB of VRAM. Popular tools include Ollama, LM Studio, and llama.cpp.
How fast is Meta Llama Guard 2 8B?: At BF16, Meta Llama Guard 2 8B is estimated to reach ~249 tok/s on an AMD Instinct MI350X and ~37 tok/s on an NVIDIA GeForce RTX 4090. Speed depends mainly on GPU memory bandwidth.
tok/s = (bandwidth GB/s ÷ model GB) × efficiency
Example: NVIDIA B200 → 8000 ÷ 17.7 × 0.65 = ~294 tok/s
Estimated speed at BF16 (17.7 GB)
NVIDIA B200: ~294 tok/s
NVIDIA GeForce RTX 4090: ~37 tok/s
NVIDIA B300: ~294 tok/s
AMD Instinct MI350X: ~249 tok/s
Real-world results are typically within ±20% of these estimates. Speed also depends on batch size, quantization kernel, and software stack.
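The throughput formula above is simple enough to check by hand or in code. The sketch below reproduces the B200 example and adds the ±20% band; the function names are hypothetical, and the 0.65 efficiency default is taken from the worked example rather than from any measured value.

```python
def estimate_tok_per_s(bandwidth_gb_s: float, model_gb: float,
                       efficiency: float = 0.65) -> float:
    """tok/s = (memory bandwidth in GB/s / model size in GB) * efficiency."""
    return bandwidth_gb_s / model_gb * efficiency


def real_world_range(estimate: float, tolerance: float = 0.20) -> tuple[float, float]:
    """Lower and upper bounds for an estimate with a symmetric relative tolerance."""
    return estimate * (1 - tolerance), estimate * (1 + tolerance)


if __name__ == "__main__":
    b200 = estimate_tok_per_s(8000, 17.7)  # bandwidth from the example above
    low, high = real_world_range(b200)
    print(f"NVIDIA B200: ~{b200:.0f} tok/s (roughly {low:.0f} to {high:.0f})")
```

Assuming a bandwidth of about 1008 GB/s for the RTX 4090 (its published specification, not a figure stated on this page), the same formula gives roughly 37 tok/s, in line with the list above.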
What's the download size of Meta Llama Guard 2 8B?: At BF16, the download is about 16.06 GB.
Which GPUs can run Meta Llama Guard 2 8B?: 8 consumer GPUs can run Meta Llama Guard 2 8B at BF16 (17.7 GB). Top options include NVIDIA GeForce RTX 5090, AMD Radeon RX 7900 XT, AMD Radeon RX 7900 XTX. 1 GPU has plenty of headroom for comfortable inference.
Which devices can run Meta Llama Guard 2 8B?: 41 devices with unified memory can run Meta Llama Guard 2 8B at BF16 (17.7 GB), including AMD Ryzen AI 9 HX 370 (Strix Point) Laptop, ASUS Ascent GX10, Asus ROG Flow Z13 (2025, Ryzen AI Max+ 395, 128 GB), Beelink GTR9 Pro (Ryzen AI Max+ 395, 128 GB). Apple Silicon Macs use unified memory shared between CPU and GPU, making them well-suited for local LLM inference.
