All LLM Models
Browse 36 LLM models with VRAM requirements, quantization options, and hardware compatibility.
Understanding LLM VRAM Requirements
How much VRAM you need depends on the model size and quantization level. Quantization reduces the precision of model weights, trading small quality losses for significantly lower VRAM usage. For example, a 7B parameter model needs ~14 GB at FP16 but only ~4 GB at Q4_K_M quantization.
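The arithmetic behind those figures can be sketched roughly as parameter count times bits per weight, divided by 8 to get bytes. The snippet below is a minimal illustration, not the method this site uses: the function name `estimate_weight_gb` is hypothetical, the bits-per-weight values for Q8_0 and Q4_K_M are approximate assumptions, and the estimate covers weights only (KV cache and runtime overhead come on top).

```python
# Approximate bits stored per weight for each format.
# ASSUMPTION: the Q8_0 and Q4_K_M values are rough averages used for
# illustration; they are not taken from this page.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,
}


def estimate_weight_gb(params_billions: float, quant: str) -> float:
    """Estimate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    bits = BITS_PER_WEIGHT[quant]
    total_bytes = params_billions * 1e9 * bits / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Compare one 7B model across the three formats.
    for quant in BITS_PER_WEIGHT:
        print(f"7B @ {quant}: ~{estimate_weight_gb(7.0, quant):.1f} GB")
```

Under these assumptions a 7B model comes out at about 14 GB for FP16 and a little over 4 GB for Q4_K_M, in line with the example above. Actual requirements are typically higher once context length and framework overhead are included.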
Model List
Llama 2 13B HF
Meta · 13.0B · runs from 6.1 GB
Llama 2 13B HF is a 13.0B-parameter open language model from Meta in the Llama 2 family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Llama 2 70B HF
Meta · 69.0B · runs from 151.8 GB
Llama 2 70B HF is a 69.0B-parameter open language model from Meta in the Llama 2 family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Meta Llama Guard 2 8B
Meta · 8.0B · runs from 17.7 GB
Meta Llama Guard 2 8B is an 8.0B-parameter open language model from Meta in the Llama family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Llama 3.2 90B Vision Instruct
Meta · 88.6B · runs from 194.9 GB
Llama 3.2 90B Vision Instruct is an 88.6B-parameter open language model from Meta in the Llama 3 family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
KernelLLM
Meta · 8.0B · runs from 4.0 GB
KernelLLM is an 8.0B-parameter open language model from Meta. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
MobileLLM R1.5 950M
Meta · 950M · runs from 2.1 GB
MobileLLM R1.5 950M is a 950M-parameter open language model from Meta. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.