All LLM Models

Browse 72 LLM models with VRAM requirements, quantization options, and hardware compatibility.


Understanding LLM VRAM Requirements

How much VRAM you need depends on the model size and quantization level. Quantization reduces the precision of model weights, trading small quality losses for significantly lower VRAM usage. For example, a 7B parameter model needs ~14 GB at FP16 but only ~4 GB at Q4_K_M quantization.
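The arithmetic behind those figures can be sketched in a few lines. This is a rough estimate for the weights only; it ignores the KV cache, activations, and runtime overhead, so real usage will be somewhat higher. The function name and the bits-per-weight table are illustrative assumptions, not values taken from this page. In particular, Q4_K_M is assumed here to average about 4.85 bits per weight, and the exact figure varies by model.

```python
def estimate_weight_vram_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumed average bits per weight for each format (illustrative values).
# Q4_K_M mixes block types, so it lands a bit above 4 bits on average.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,
}

if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        gb = estimate_weight_vram_gb(7.0, bits)
        print(f"7B at {name}: ~{gb:.1f} GB")
```

Under these assumptions a 7B model comes out at about 14 GB for FP16 and a little over 4 GB for Q4_K_M, in line with the figures quoted above.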

Qwen2.5 Coder 7B

Alibaba · 7.6B · runs from 3.6 GB

Qwen2.5 Coder 7B is a 7.6-billion parameter code-specialized base (pretrained) model from Alibaba Cloud's Qwen 2.5 Coder series. It is trained on a large dataset of source code and natural language but is not instruction-tuned, making it suitable for fine-tuning, code-related research, and custom downstream applications. The model supports a 128K token context window and runs efficiently on consumer GPUs. It serves as the foundation for the Qwen2.5 Coder 7B Instruct variant and community fine-tunes targeting specific programming languages or workflows. Released under the Apache 2.0 license.

Qwen3 30B A3B Thinking 2507

Alibaba · 30.5B · runs from 8.8 GB

Qwen3 30B A3B Thinking 2507 is the reasoning-focused variant of Alibaba's 30-billion-parameter mixture-of-experts model, updated in July 2025. Like its instruct sibling, it activates only about 3 billion parameters per token, keeping resource demands low while enabling multi-step reasoning and chain-of-thought problem solving. This thinking variant is designed for tasks that benefit from deliberate, step-by-step logic such as math, coding puzzles, and analytical questions. Its efficient MoE design means users with modest GPUs can still access strong reasoning capabilities without needing datacenter-class hardware.

Qwen2.5 Coder 3B

Alibaba · 3.1B · runs from 1.4 GB

Qwen2.5 Coder 3B is a 3.1B-parameter open language model from Alibaba in the Qwen 2.5 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Qwen2.5 Coder 1.5B

Alibaba · 1.5B · runs from 1 GB

Qwen2.5 Coder 1.5B is a 1.5-billion parameter code-specialized model from Alibaba Cloud's Qwen 2.5 Coder series. It is the smallest Coder variant that balances meaningful code generation capability with extremely low resource requirements, running on GPUs with as little as 2 to 4 GB of VRAM. The model is suitable for lightweight code completion, simple code generation tasks, and use as a compact local coding assistant in resource-constrained environments. It supports a 128K token context window. Released under the Apache 2.0 license.

Qwen2.5 3B

Alibaba · 3.1B · runs from 1.6 GB

Qwen2.5 3B is a 3.1B-parameter open language model from Alibaba in the Qwen 2.5 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Qwen3 0.6B Base

Alibaba · 596M · runs from 0.7 GB

Qwen3 0.6B Base is the smallest pretrained foundation model in Alibaba Cloud's Qwen 3 family, with approximately 600 million parameters. As a base model, it is not tuned for chat or instructions and is intended for fine-tuning, research, and experimentation. Its minimal size makes it suitable for rapid prototyping and resource-constrained training experiments. The model runs on virtually any hardware, including CPU-only setups. It is useful for educational purposes, architecture exploration, and as a compact foundation for task-specific fine-tuning where model size is a primary constraint. Released under the Apache 2.0 license.

Qwen3 1.7B Base

Alibaba · 1.7B · runs from 1.0 GB

Qwen3 1.7B Base is a 1.7-billion parameter pretrained foundation model from Alibaba Cloud's Qwen 3 family. It is a compact base model designed for fine-tuning, research, and custom applications rather than direct conversational use. Its small size makes it accessible for resource-constrained fine-tuning and rapid experimentation. The model can run on virtually any modern GPU and benefits from the improved pretraining data of the Qwen 3 generation. It is suitable as a lightweight foundation for domain-specific fine-tunes and student models in distillation pipelines. Released under the Apache 2.0 license.

Qwen2.5 7B

Alibaba · 7.6B · runs from 3.6 GB

Qwen2.5 7B is a 7.6B-parameter open language model from Alibaba in the Qwen 2.5 family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Qwen2.5 14B

Alibaba · 14.8B · runs from 6.8 GB

Qwen2.5 14B is a 14.8B-parameter open language model from Alibaba in the Qwen 2.5 family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Qwen3 4B Base

Alibaba · 4.0B · runs from 2.2 GB

Qwen3 4B Base is a 4.0B-parameter open language model from Alibaba in the Qwen 3 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Qwen2.5 32B

Alibaba · 32.8B · runs from 14.3 GB

Qwen2.5 32B is a 32.8B-parameter open language model from Alibaba in the Qwen 2.5 family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Qwen2.5 1.5B

Alibaba · 1.5B · runs from 1 GB

Qwen2.5 1.5B is a 1.5B-parameter open language model from Alibaba in the Qwen 2.5 family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Qwen1.5 0.5B Chat

Alibaba · 620M · runs from 0.8 GB

Qwen1.5 0.5B Chat is an early-generation small language model from Alibaba's Qwen series with just 620 million parameters. As one of the smallest models in the Qwen family, it was designed to demonstrate that useful conversational ability is possible even at sub-billion parameter scales. This model runs easily on virtually any hardware including CPUs, older GPUs, and even mobile devices. While its capabilities are limited compared to larger Qwen models, it remains a useful option for embedded applications, rapid prototyping, or situations where minimal resource consumption is the top priority.

Qwen2.5 0.5B

Alibaba · 494M · runs from 0.5 GB

Qwen2.5 0.5B is the smallest base (pretrained) model in Alibaba Cloud's Qwen 2.5 family, with 494 million parameters. As a base model, it is not instruction-tuned and is intended for fine-tuning, research, and as a foundation for custom applications. It supports a 128K token context window. Its minimal size makes it suitable for experimentation, rapid prototyping, and resource-constrained fine-tuning tasks. The model can run on virtually any hardware. Released under the Apache 2.0 license.

Qwen3 8B Base

Alibaba · 8.2B · runs from 4.1 GB

Qwen3 8B Base is an 8.2-billion parameter pretrained foundation model from Alibaba Cloud's Qwen 3 series. As a base model, it is not instruction-tuned and is intended for fine-tuning, research, and as a starting point for custom downstream applications. It was trained on a large multilingual corpus with improved data quality and training methodology compared to the Qwen 2.5 generation. The model runs efficiently on consumer GPUs with 8 GB or more of VRAM and serves as the foundation for the Qwen3 8B instruction-tuned variant and community fine-tunes. It is a strong choice for practitioners building specialized models through further training. Released under the Apache 2.0 license.

Qwen3Guard Gen 8B

Alibaba · 8.2B · runs from 4.1 GB

Qwen3Guard Gen 8B is an 8.2B-parameter open language model from Alibaba in the Qwen 3 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Qwen1.5 14B Chat

Alibaba · 14.2B · runs from 8 GB

Qwen1.5 14B Chat is a 14.2B-parameter open language model from Alibaba in the Qwen family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Qwen2 72B Instruct

Alibaba · 72.7B · runs from 21.0 GB

Qwen2 72B Instruct is a 72.7B-parameter open language model from Alibaba in the Qwen 2 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Qwen1.5 7B Chat

Alibaba · 7.7B · runs from 4.7 GB

Qwen1.5 7B Chat is a 7.7B-parameter open language model from Alibaba in the Qwen family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Qwen1.5 32B Chat

Alibaba · 32.5B · runs from 14.3 GB

Qwen1.5 32B Chat is a 32.5B-parameter open language model from Alibaba in the Qwen family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Qwen 14B Chat

Alibaba · 14.2B · runs from 6.6 GB

Qwen 14B Chat is a 14.2B-parameter open language model from Alibaba in the Qwen family. It supports a context window of up to 8,192 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Qwen1.5 7B

Alibaba · 7.7B · runs from 4.7 GB

Qwen1.5 7B is a 7.7B-parameter open language model from Alibaba in the Qwen family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Qwen 7B

Alibaba · 7.7B · runs from 3.6 GB

Qwen 7B is a 7.7B-parameter open language model from Alibaba in the Qwen family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

CodeQwen1.5 7B

Alibaba · 7.3B · runs from 3.5 GB

CodeQwen1.5 7B is a 7.3B-parameter open language model from Alibaba in the Qwen family. It supports a context window of up to 65,536 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Qwen1.5 14B

Alibaba · 14.2B · runs from 8 GB

Qwen1.5 14B is a 14.2B-parameter open language model from Alibaba in the Qwen family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Qwen 14B

Alibaba · 14.2B · runs from 6.6 GB

Qwen 14B is a 14.2B-parameter open language model from Alibaba in the Qwen family. It supports a context window of up to 8,192 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Qwen1.5 32B

Alibaba · 32.5B · runs from 14.3 GB

Qwen1.5 32B is a 32.5B-parameter open language model from Alibaba in the Qwen family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

QwQ 32B Preview

Alibaba · 32.8B · runs from 14.8 GB

QwQ 32B Preview is a 32.8B-parameter open language model from Alibaba in the QwQ family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Qwen 1.8B

Alibaba · 1.8B · runs from 0.9 GB

Qwen 1.8B is a 1.8B-parameter open language model from Alibaba in the Qwen family. It supports a context window of up to 8,192 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Qwen3.5 4B

Alibaba · 4.7B · runs from 2.5 GB

Qwen3.5 4B is a 4.7B-parameter open language model from Alibaba in the Qwen 3.5 family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.