All LLM Models

Browse 48 LLM models with VRAM requirements, quantization options, and hardware compatibility.

Understanding LLM VRAM Requirements

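How much VRAM you need depends on the model size and quantization level. Quantization reduces the precision of model weights, trading small quality losses for significantly lower VRAM usage. For example, a 7B parameter model needs ~14 GB at FP16 (2 bytes per weight) but only ~4 GB at Q4_K_M quantization (roughly 4 to 5 bits per weight). These figures cover the weights alone; the context (KV cache) and runtime overhead add to the total.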
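The arithmetic behind the 7B example can be sketched in a few lines of Python. This is a rough estimate, not the method this site uses for its listings: the function name `estimate_weight_gb` is hypothetical, and the 4.5 bits-per-weight figure for Q4_K_M is an assumed average (the real value varies by model and is usually a little higher). Only weight memory is counted.

```python
# Approximate bits stored per weight for each format.
# FP16 is exactly 16 bits; the Q4_K_M value is an ASSUMED average, not a spec.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q4_K_M": 4.5,  # assumption for illustration
}


def estimate_weight_gb(params_billions: float, fmt: str) -> float:
    """Return approximate weight memory in GB (10^9 bytes).

    Covers the weights only; KV cache and runtime overhead are not included.
    """
    bits = BITS_PER_WEIGHT[fmt]
    total_bytes = params_billions * 1e9 * bits / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    for fmt in BITS_PER_WEIGHT:
        print(f"7B at {fmt}: ~{estimate_weight_gb(7, fmt):.1f} GB")
```

Under these assumptions the estimate lands at about 14 GB for FP16 and just under 4 GB for Q4_K_M, in line with the figures above. Actual "runs from" values in the listings can differ because of overhead and the quantizations available for each model.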

Model List

Gemma 2B

Google · 2.5B · runs from 1.2 GB

248.0K 1.2K

Gemma 2B is a 2.5B-parameter open language model from Google in the original (first-generation) Gemma family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Gemma 2 9B

Google · 9.2B · runs from 4.2 GB

75.4K 709

Gemma 2 9B is a 9.2B-parameter open language model from Google in the Gemma 2 family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Gemma 7B IT

Google · 8.5B · runs from 4.0 GB

21.4K 1.2K

Google Gemma 7B IT is a nominally 7-billion-parameter (8.5B parameters in total) instruction-tuned model from the original Gemma generation. It is fine-tuned for conversational use and general instruction following, and runs on consumer GPUs with 8 GB or more of VRAM when quantized. As a first-generation Gemma model, it has been superseded by Gemma 2 and Gemma 3 models in quality and capability, but it remains well supported by inference frameworks. Released under the Gemma license.

Chat

Gemma 4 12B IT Assistant

Google · 12B · runs from 5.4 GB

29.2K 82

Gemma 4 12B IT Assistant is a 12B-parameter open language model from Google in the Gemma 4 family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Gemma 2 2B

Google · 2.6B · runs from 1.2 GB

206.8K 655

Google Gemma 2 2B is a 2.6B-parameter base (pretrained) model from Google's Gemma 2 family. As a base model, it is not instruction-tuned and is intended for fine-tuning, research, and custom downstream applications. Its compact size makes it suitable for experimentation, rapid prototyping, and domain-specific fine-tuning on consumer hardware with minimal VRAM. Released under the Gemma license.

Chat

Gemma 4 12B

Google · 12.0B · runs from 6.1 GB

198.3K 525

Gemma 4 12B is a 12.0B-parameter open language model from Google in the Gemma 4 family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Gemma 3 1B PT

Google · 1.0B · runs from 0.5 GB

47.6K 196

Gemma 3 1B PT is a 1.0B-parameter open language model from Google in the Gemma 3 family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Codegemma 2B

Google · 2.5B · runs from 1.2 GB

31.0K 100

Codegemma 2B is a 2.5B-parameter open code model from Google, built on the original (first-generation) Gemma family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

T5gemma 2B 2B Ul2

Google · 5.6B · runs from 2.6 GB

10.1K 25

T5gemma 2B 2B Ul2 is a 5.6B-parameter open language model from Google in the Gemma 2 family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Gemma 2 2B Jpn IT

Google · 2.6B · runs from 5.8 GB

8.0K 217

Gemma 2 2B Jpn IT is a 2.6B-parameter open language model from Google in the Gemma 2 family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Txgemma 2B Predict

Google · 2.6B · runs from 1.2 GB

5.8K 56

Txgemma 2B Predict is a 2.6B-parameter open language model from Google in the Gemma 2 family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Shieldgemma 2B

Google · 2.6B · runs from 1.2 GB

4.6K 122

Shieldgemma 2B is a 2.6B-parameter open language model from Google in the Gemma 2 family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Vaultgemma 1B

Google · 1.0B · runs from 2.3 GB

4.2K 240

Vaultgemma 1B is a 1.0B-parameter open language model from Google in the Gemma family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Gemma 3n E2B IT LiteRT-LM

Google · 2B · runs from 0.9 GB

3.8K 439

Gemma 3n E2B IT LiteRT-LM is a 2B-parameter open language model from Google in the Gemma 3n family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Gemma 3n E4B IT LiteRT-LM

Google · 4B · runs from 1.9 GB

3.0K 414

Gemma 3n E4B IT LiteRT-LM is a 4B-parameter open language model from Google in the Gemma 3n family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Codegemma 7B IT

Google · 8.5B · runs from 4.0 GB

2.6K 255

Codegemma 7B IT is an 8.5B-parameter open language model from Google in the Gemma family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

T5gemma L L Ul2 IT

Google · 1.2B · runs from 2.7 GB

1.0K 6

T5gemma L L Ul2 IT is a 1.2B-parameter open language model from Google in the Gemma family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

T5gemma B B Ul2 IT

Google · 591M · runs from 1.3 GB

317 6

T5gemma B B Ul2 IT is a 591M-parameter open language model from Google in the Gemma family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat