All LLM Models

Browse 67 LLM models with VRAM requirements, quantization options, and hardware compatibility.

Understanding LLM VRAM Requirements

How much VRAM you need depends on the model size and quantization level. Quantization reduces the precision of model weights, trading small quality losses for significantly lower VRAM usage. For example, a 7B-parameter model needs ~14 GB for its weights at FP16 (2 bytes per parameter) but only ~4 GB at Q4_K_M quantization.
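The arithmetic behind those figures can be sketched in a few lines of Python. This is a rough estimate of weight memory only, not the method this site uses: the function name `estimate_weight_gb` is hypothetical, the ~4.8 bits per weight for Q4_K_M is an approximate average assumed for illustration, and KV cache, activations, and runtime overhead are ignored, so real usage will be somewhat higher.

```python
# Rough weight-memory estimate: parameters x bits per weight / 8 bits per byte.
# Assumption: bits-per-weight values are approximate averages, not exact figures
# for any particular model file.
ASSUMED_BITS_PER_WEIGHT = {
    "FP16": 16.0,    # 2 bytes per parameter
    "Q8_0": 8.5,     # 8-bit values plus one 16-bit scale per 32-weight block
    "Q4_K_M": 4.8,   # approximate average; varies by model (assumption)
}


def estimate_weight_gb(params_billions: float, quant: str) -> float:
    """Return approximate memory for the weights alone, in decimal GB."""
    bits = ASSUMED_BITS_PER_WEIGHT[quant]
    # Billions of parameters x bytes per parameter gives gigabytes directly.
    return params_billions * bits / 8


if __name__ == "__main__":
    for quant in ASSUMED_BITS_PER_WEIGHT:
        print(f"7B at {quant}: ~{estimate_weight_gb(7, quant):.1f} GB")
```

Under these assumptions a 7B model comes out at about 14 GB for FP16 and a little over 4 GB for Q4_K_M, in line with the example above. Long context windows add KV-cache memory on top of this.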

Model List

Qwen3 30B A3B Base

Alibaba · 30.5B · runs from 13.4 GB

44.9K 70

Qwen3 30B A3B Base is a 30.5B-parameter open language model from Alibaba in the Qwen 3 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Qwen1.5 MoE A2.7B Chat

Alibaba · 2.7B · runs from 1.9 GB

30.4K 133

Qwen1.5 MoE A2.7B Chat is an open mixture-of-experts language model from Alibaba in the Qwen family, with 2.7B activated parameters (the "A2.7B" in its name). It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Qwen2.5 Coder 0.5B

Alibaba · 494M · runs from 0.5 GB

27.0K 56

Qwen2.5 Coder 0.5B is a 494-million parameter code-specialized model from Alibaba Cloud, the smallest in the Qwen 2.5 Coder series. It is designed for ultra-lightweight deployment where code-aware text generation is needed with minimal hardware resources. The model runs on virtually any GPU and even on CPU-only setups. While limited in capability compared to larger coding models, it is useful for basic code completion, prototyping, and experimentation. It supports a 128K token context window. Released under the Apache 2.0 license.

Chat · Code

Qwen1.5 1.8B

Alibaba · 1.8B · runs from 1.5 GB

21.5K 58

Qwen1.5 1.8B is a 1.8B-parameter open language model from Alibaba in the Qwen family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Qwen2.5 Coder 14B

Alibaba · 14.8B · runs from 7.0 GB

7.6K 75

Qwen2.5 Coder 14B is a 14.8B-parameter open language model from Alibaba in the Qwen 2.5 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

WebWorld 8B

Alibaba · 8.2B · runs from 4.1 GB

2.5K 59

WebWorld 8B is an 8.2B-parameter open language model from Alibaba. It supports a context window of up to 40,960 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

WebWorld 32B

Alibaba · 32.8B · runs from 14.6 GB

1.1K 64

WebWorld 32B is a 32.8B-parameter open language model from Alibaba. It supports a context window of up to 40,960 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat