All LLM Models

Browse 80 LLM models with VRAM requirements, quantization options, and hardware compatibility.

Understanding LLM VRAM Requirements

How much VRAM you need depends on the model size and quantization level. Quantization reduces the precision of model weights, trading small quality losses for significantly lower VRAM usage. For example, a 7B parameter model needs ~14 GB at FP16 but only ~4 GB at Q4_K_M quantization.
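The arithmetic behind those figures can be sketched roughly as follows. This is a back-of-the-envelope estimate, not the method this site uses: the function name `estimate_weight_gb` is hypothetical, and the bits-per-weight values for the quantized formats are approximate assumptions (Q4_K_M mixes block types, so its average varies by model). The estimate covers weights only; the KV cache and runtime overhead add to it.

```python
# Approximate bits stored per weight for a few formats.
# FP16 is exact; the quantized values are assumed averages.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,     # assumed: 8-bit values plus per-block scale
    "Q4_K_M": 4.85,  # assumed average; varies slightly by model
}


def estimate_weight_gb(params_billion: float, quant: str) -> float:
    """Return the approximate size of the weights alone, in GB (10^9 bytes)."""
    bits = BITS_PER_WEIGHT[quant]
    # billions of parameters * bits per parameter / 8 bits per byte = GB
    return params_billion * bits / 8


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        print(f"7B at {quant}: ~{estimate_weight_gb(7, quant):.1f} GB")
```

Under these assumptions a 7B model comes out at about 14 GB for FP16 and a little over 4 GB for Q4_K_M, in line with the figures quoted above.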

Model List

CodeQwen1.5 7B

Alibaba · 7.3B · runs from 3.5 GB

2.2K 103

CodeQwen1.5 7B is a 7.3B-parameter open language model from Alibaba in the Qwen family. It supports a context window of up to 65,536 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Qwen1.5 14B

Alibaba · 14.2B · runs from 8 GB

9.9K 41

Qwen1.5 14B is a 14.2B-parameter open language model from Alibaba in the Qwen family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Qwen 14B

Alibaba · 14.2B · runs from 6.6 GB

1.8K 214

Qwen 14B is a 14.2B-parameter open language model from Alibaba in the Qwen family. It supports a context window of up to 8,192 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Qwen1.5 32B

Alibaba · 32.5B · runs from 14.3 GB

9.5K 85

Qwen1.5 32B is a 32.5B-parameter open language model from Alibaba in the Qwen family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

QwQ 32B Preview

Alibaba · 32.8B · runs from 14.8 GB

20.8K 1.7K

QwQ 32B Preview is a 32.8B-parameter open language model from Alibaba in the QwQ family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Qwen 1.8B

Alibaba · 1.8B · runs from 0.9 GB

1.7K 73

Qwen 1.8B is a 1.8B-parameter open language model from Alibaba in the Qwen family. It supports a context window of up to 8,192 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Qwen3.5 4B

Alibaba · 4.7B · runs from 2.5 GB

9.0M 632

Qwen3.5 4B is a 4.7B-parameter open language model from Alibaba in the Qwen 3.5 family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Vision

Qwen3.5 9B

Alibaba · 9.7B · runs from 4.7 GB

8.5M 1.6K

Qwen3.5 9B is a 9.7B-parameter open language model from Alibaba in the Qwen 3.5 family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Vision

Qwen3.5 0.8B

Alibaba · 873M · runs from 0.7 GB

2.4M 570

Qwen3.5 0.8B is an 873M-parameter open language model from Alibaba in the Qwen 3.5 family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Vision

Qwen1.5 MoE A2.7B

Alibaba · 14.3B · runs from 6.8 GB

181.8K 225

Qwen1.5 MoE A2.7B is a Mixture of Experts (MoE) model from Alibaba Cloud's Qwen 1.5 generation, with 14.3 billion total parameters but only 2.7 billion active parameters per forward pass. The MoE architecture allows it to deliver performance closer to dense 7B models while requiring less compute during inference, because only a subset of expert layers is activated for each token. The model supports a 32K token context window. Loading it requires VRAM proportional to its total parameter count, even though the compute cost per token is lower. It is an interesting architectural variant for users exploring efficient inference and MoE models locally. Released under a custom Qwen license.

Chat
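The memory versus compute trade-off described for Qwen1.5 MoE A2.7B can be illustrated with a rough calculation. This is a simplified sketch: `moe_footprint` is a hypothetical helper, FP16 (16 bits per weight) is assumed, and real per-token cost also depends on attention, routing, and the KV cache.

```python
def moe_footprint(total_billion: float, active_billion: float,
                  bits_per_weight: float = 16.0) -> tuple[float, float]:
    """Return (weight memory in GB, fraction of weights used per token).

    Memory scales with the total parameter count because every expert
    must be loaded; per-token compute scales roughly with the active count.
    """
    weights_gb = total_billion * bits_per_weight / 8
    active_fraction = active_billion / total_billion
    return weights_gb, active_fraction


if __name__ == "__main__":
    # Parameter counts taken from the Qwen1.5 MoE A2.7B entry above.
    weights_gb, frac = moe_footprint(14.3, 2.7)
    print(f"weights: ~{weights_gb:.1f} GB at FP16, "
          f"~{frac:.0%} of weights active per token")
```

The point of the sketch is that the 14.3B total drives how much VRAM is needed, while the 2.7B active count drives how fast each token is generated.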

Qwen2 1.5B

Alibaba · 1.5B · runs from 1.0 GB

108.4K 100

Qwen2 1.5B is a 1.5-billion parameter base (pretrained) model from Alibaba Cloud's older Qwen 2 generation. It was trained on a multilingual corpus and supports a context window of up to 32K tokens. As a base model, it is designed for fine-tuning and research rather than direct conversational use. While superseded by the Qwen 2.5 series in terms of training data quality and benchmark performance, Qwen2 1.5B remains a lightweight option for experimentation and as a baseline for comparison. Released under the Apache 2.0 license.

Chat

Qwen3 14B Base

Alibaba · 14.8B · runs from 6.9 GB

45.2K 50

Qwen3 14B Base is a 14.8B-parameter open language model from Alibaba in the Qwen 3 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Qwen3 30B A3B Base

Alibaba · 30.5B · runs from 13.4 GB

44.9K 70

Qwen3 30B A3B Base is a 30.5B-parameter open language model from Alibaba in the Qwen 3 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Qwen1.5 MoE A2.7B Chat

Alibaba · 2.7B · runs from 1.9 GB

30.4K 133

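Qwen1.5 MoE A2.7B Chat is the chat variant of Qwen1.5 MoE A2.7B, an open Mixture of Experts language model from Alibaba in the Qwen family. The 2.7B figure is its active parameter count per token, not its total size. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.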

Chat

Qwen2.5 Coder 0.5B

Alibaba · 494M · runs from 0.5 GB

27.0K 56

Qwen2.5 Coder 0.5B is a 494-million parameter code-specialized model from Alibaba Cloud, the smallest in the Qwen 2.5 Coder series. It is designed for ultra-lightweight deployment where code-aware text generation is needed with minimal hardware resources. The model runs on virtually any GPU and even on CPU-only setups. While limited in capability compared to larger coding models, it is useful for basic code completion, prototyping, and experimentation. It supports a 128K token context window. Released under the Apache 2.0 license.

Chat · Code

Qwen1.5 1.8B

Alibaba · 1.8B · runs from 1.5 GB

21.5K 58

Qwen1.5 1.8B is a 1.8B-parameter open language model from Alibaba in the Qwen family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Qwen2 57B A14B Instruct

Alibaba · 57.4B · runs from 24.8 GB

16.0K 83

Qwen2 57B A14B Instruct is a 57.4B-parameter open language model from Alibaba in the Qwen 2 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Qwen2.5 Coder 14B

Alibaba · 14.8B · runs from 7.0 GB

7.6K 75

Qwen2.5 Coder 14B is a 14.8B-parameter open language model from Alibaba in the Qwen 2.5 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

WebWorld 8B

Alibaba · 8.2B · runs from 4.1 GB

2.5K 59

WebWorld 8B is an 8.2B-parameter open language model from Alibaba. It supports a context window of up to 40,960 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

WebWorld 32B

Alibaba · 32.8B · runs from 14.6 GB

1.1K 64

WebWorld 32B is a 32.8B-parameter open language model from Alibaba. It supports a context window of up to 40,960 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat