All LLM Models
Browse 39 LLM models with VRAM requirements, quantization options, and hardware compatibility.
Understanding LLM VRAM Requirements
How much VRAM you need depends on the model size and quantization level. Quantization reduces the precision of model weights, trading small quality losses for significantly lower VRAM usage. For example, a 7B-parameter model needs ~14 GB for its weights at FP16 (2 bytes per weight) but only ~4 GB at Q4_K_M quantization. The KV cache and runtime overhead add to these figures, so actual usage is somewhat higher.
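Weight memory is roughly parameter count × bits per weight ÷ 8. The following minimal Python sketch shows that arithmetic. The bits-per-weight table is an assumption: about 8.5 bits for Q8_0, and roughly 4.85 bits as an average for Q4_K_M, whose exact rate varies by model. `estimate_weight_vram_gb` is a hypothetical helper, not part of any library.

```python
# Approximate average bits per weight for each format.
# ASSUMPTION: the Q4_K_M figure is an average; K-quants mix block formats,
# so the effective rate differs slightly from model to model.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,
}


def estimate_weight_vram_gb(params_billions: float, quant: str, overhead_gb: float = 0.0) -> float:
    """Estimate memory for the weights (plus optional overhead) in GB, where 1 GB = 1e9 bytes."""
    bits = BITS_PER_WEIGHT[quant]
    # (params_billions * 1e9 weights) * (bits / 8 bytes each) / 1e9 bytes per GB
    weights_gb = params_billions * bits / 8
    # overhead_gb is a caller-supplied allowance for KV cache and runtime buffers
    return weights_gb + overhead_gb


if __name__ == "__main__":
    for quant in ("FP16", "Q4_K_M"):
        print(f"7B @ {quant}: ~{estimate_weight_vram_gb(7, quant):.1f} GB")
```

Running it prints about 14.0 GB for FP16 and 4.2 GB for Q4_K_M for a 7B model. These values cover the weights only, which is why the `overhead_gb` allowance is left to the caller.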
Model List
Gemma 2 2B JPN IT
Google · 2.6B · runs from 5.8 GB
Gemma 2 2B JPN IT is a 2.6B-parameter open language model from Google in the Gemma 2 family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
TxGemma 2B Predict
Google · 2.6B · runs from 1.2 GB
TxGemma 2B Predict is a 2.6B-parameter open language model from Google in the Gemma 2 family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
ShieldGemma 2B
Google · 2.6B · runs from 1.2 GB
ShieldGemma 2B is a 2.6B-parameter open language model from Google in the Gemma 2 family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
VaultGemma 1B
Google · 1.0B · runs from 2.3 GB
VaultGemma 1B is a 1.0B-parameter open language model from Google in the Gemma family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Gemma 3n E2B IT LiteRT-LM
Google · 2B · runs from 0.9 GB
Gemma 3n E2B IT LiteRT-LM is an open language model from Google in the Gemma 3n family with 2B effective parameters (the "E" in E2B), packaged for the LiteRT-LM runtime. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Gemma 3n E4B IT LiteRT-LM
Google · 4B · runs from 1.9 GB
Gemma 3n E4B IT LiteRT-LM is an open language model from Google in the Gemma 3n family with 4B effective parameters (the "E" in E4B), packaged for the LiteRT-LM runtime. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
CodeGemma 7B IT
Google · 8.5B · runs from 4.0 GB
CodeGemma 7B IT is an 8.5B-parameter open language model from Google in the Gemma family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
T5Gemma L-L UL2 IT
Google · 1.2B · runs from 2.7 GB
T5Gemma L-L UL2 IT is a 1.2B-parameter open encoder-decoder language model from Google in the Gemma family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
T5Gemma B-B UL2 IT
Google · 591M · runs from 1.3 GB
T5Gemma B-B UL2 IT is a 591M-parameter open encoder-decoder language model from Google in the Gemma family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
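To check which of the models above fit a given memory budget, you can compare their "runs from" figures against the VRAM you have available. The following sketch assumes each "runs from" value is the smallest footprint listed for that model. It also ignores memory used by other processes. `ModelEntry` and `models_that_fit` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ModelEntry:
    name: str
    params: str
    min_vram_gb: float  # the "runs from" figure from the list above


MODELS = [
    ModelEntry("Gemma 2 2B JPN IT", "2.6B", 5.8),
    ModelEntry("TxGemma 2B Predict", "2.6B", 1.2),
    ModelEntry("ShieldGemma 2B", "2.6B", 1.2),
    ModelEntry("VaultGemma 1B", "1.0B", 2.3),
    ModelEntry("Gemma 3n E2B IT LiteRT-LM", "2B", 0.9),
    ModelEntry("Gemma 3n E4B IT LiteRT-LM", "4B", 1.9),
    ModelEntry("CodeGemma 7B IT", "8.5B", 4.0),
    ModelEntry("T5Gemma L-L UL2 IT", "1.2B", 2.7),
    ModelEntry("T5Gemma B-B UL2 IT", "591M", 1.3),
]


def models_that_fit(budget_gb: float, models: Sequence[ModelEntry] = MODELS) -> List[str]:
    """Return names of models whose minimum footprint fits the budget, smallest first."""
    # sorted() is stable, so ties keep their original list order
    ranked = sorted(models, key=lambda m: m.min_vram_gb)
    return [m.name for m in ranked if m.min_vram_gb <= budget_gb]


if __name__ == "__main__":
    print(models_that_fit(2.0))
```

With a 2 GB budget this returns the two Gemma 3n builds, TxGemma, ShieldGemma, and T5Gemma B-B. Treat the result as a lower bound, because the context length you choose grows the KV cache beyond the minimum figure.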