All LLM Models

Browse 51 LLM models with VRAM requirements, quantization options, and hardware compatibility.

Understanding LLM VRAM Requirements

How much VRAM you need depends on the model size and quantization level. Quantization reduces the precision of model weights, trading small quality losses for significantly lower VRAM usage. For example, a 7B parameter model needs ~14 GB at FP16 but only ~4 GB at Q4_K_M quantization.
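The arithmetic behind these figures is simple: multiply the parameter count by the effective bytes per weight of the chosen format, then add headroom for the KV cache and activations. The Python sketch below uses approximate effective byte widths for common GGUF quantizations; exact values vary by model architecture and quantization implementation, so treat the outputs as rough estimates.

```python
# Rough, weights-only VRAM estimate. Real usage adds overhead for the
# KV cache, activations, and framework buffers (often 10-30% extra).
# Bytes-per-weight values are approximate effective averages for common
# GGUF quantization formats, not exact figures.
BYTES_PER_WEIGHT = {
    "FP16":   2.00,
    "Q8_0":   1.06,  # ~8.5 bits per weight effective
    "Q5_K_M": 0.68,
    "Q4_K_M": 0.57,  # ~4.5 bits per weight effective
}

def estimate_vram_gb(params_billions: float, quant: str) -> float:
    """Weights-only footprint in GiB for the given quantization."""
    total_bytes = params_billions * 1e9 * BYTES_PER_WEIGHT[quant]
    return total_bytes / 1024**3

for quant in BYTES_PER_WEIGHT:
    print(f"7B at {quant}: ~{estimate_vram_gb(7, quant):.1f} GB")
# FP16   -> ~13 GB of weights (plus overhead, hence the ~14 GB figure)
# Q4_K_M -> ~3.7 GB of weights (hence the ~4 GB figure)
```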

Model List

QwQ 32B

Alibaba · 32B

QwQ 32B is a 32-billion parameter reasoning-focused model from Alibaba Cloud's Qwen family. Unlike standard chat models, QwQ is specifically optimized for step-by-step logical reasoning, complex problem solving, and mathematical tasks. It employs extended chain-of-thought processing, generating detailed internal reasoning before producing final answers, which significantly improves accuracy on challenging analytical problems. The model requires a GPU with at least 24GB of VRAM for quantized inference and delivers reasoning performance competitive with much larger models. It is particularly well suited for users who need strong analytical capabilities for math, science, coding logic, and multi-step problem solving. Released under the Apache 2.0 license.

Chat · Reasoning

Qwen2.5 Coder 32B Instruct

Alibaba · 32.8B

Qwen2.5 Coder 32B Instruct is a 32.8-billion parameter code-specialized model from Alibaba Cloud, instruction-tuned for programming assistance and code generation. It is trained on a large corpus of source code alongside natural language data, making it highly capable for tasks such as code completion, debugging, code explanation, and software engineering dialogue. The model supports a 128K token context window and delivers code generation quality competitive with the best open-weight coding models at any scale. It requires a GPU with at least 24GB of VRAM for quantized inference. Released under the Apache 2.0 license.

Chat · Code

Qwen3 Coder 480B A35B Instruct

Alibaba · 480.2B

Qwen3 Coder 480B A35B Instruct is Alibaba's largest code-specialized model, a massive 480.2-billion-parameter mixture-of-experts system with roughly 35 billion parameters active per token. This is the most powerful open-weight coding model in the Qwen3 family, designed for professional-grade code generation, analysis, and software engineering tasks. Running this model locally is a serious undertaking that requires multi-GPU server-class hardware with several hundred gigabytes of combined VRAM. For users with access to such infrastructure, it offers exceptional code quality and understanding that rivals leading proprietary coding assistants, all while keeping data and computation entirely under local control.

Chat · Code

Qwen3 0.6B

Alibaba · 752M

Qwen3 0.6B is the smallest instruction-tuned model in Alibaba Cloud's Qwen 3 family, with approximately 752 million parameters. It is designed for ultra-lightweight deployment where minimal hardware resources are available, running comfortably on virtually any modern GPU and even on CPU-only setups. The model supports hybrid thinking mode despite its tiny footprint. While limited in reasoning depth compared to larger variants, Qwen3 0.6B handles basic chat, simple summarization, and lightweight instruction following. It is primarily useful for edge deployment, rapid prototyping, and experimentation where model size is a critical constraint. Released under the Apache 2.0 license.

Chat

Qwen2.5 7B Instruct

Alibaba · 7.6B

Qwen2.5 7B Instruct is a 7.6-billion parameter instruction-tuned model from Alibaba Cloud's Qwen 2.5 series. It supports a 128K token context window and is fine-tuned for conversational AI, instruction following, and general assistant tasks. Its efficient size makes it well-suited for local deployment on consumer GPUs with 8GB or more of VRAM. The model delivers strong performance for its parameter class across reasoning, multilingual understanding, and coding tasks. It benefits from the improved pretraining data and techniques of the Qwen 2.5 generation. Released under the Apache 2.0 license and widely supported by inference frameworks such as llama.cpp, vLLM, and Ollama; a minimal usage sketch follows this entry.

Chat
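As a quick illustration of that framework support, here is a minimal sketch using the `ollama` Python client against a locally running Ollama server. The model tag `qwen2.5:7b-instruct` is an assumption; check `ollama list` or the Ollama model library for the exact tag available in your install.

```python
# Minimal local chat with Qwen2.5 7B Instruct via Ollama.
# Assumes the Ollama server is running and the model has been pulled,
# e.g. with: ollama pull qwen2.5:7b-instruct  (tag may differ)
import ollama  # pip install ollama

response = ollama.chat(
    model="qwen2.5:7b-instruct",
    messages=[{"role": "user", "content": "Explain quantization in two sentences."}],
)
print(response["message"]["content"])
```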

Qwen3 Coder Next

Alibaba · 79.7B

Qwen3 Coder Next is a 79.7-billion parameter code-specialized instruction-tuned model from Alibaba Cloud, the next generation of the Qwen Coder series. It is trained extensively on source code and programming-related data, delivering strong performance across code generation, completion, debugging, refactoring, and software engineering dialogue. The model represents a significant step up in coding capability within the Qwen family. Due to its large parameter count, running Qwen3 Coder Next locally requires substantial VRAM, typically 48GB or more at reduced precision, placing it in the territory of professional GPUs or multi-GPU consumer setups. It is a top-tier choice for developers who need the most capable local coding assistant available. Released under the Apache 2.0 license.

Chat · Code

Qwen3 235B A22B

Alibaba · 235.1B

Qwen3 235B A22B is the largest model in Alibaba Cloud's Qwen 3 series, a Mixture of Experts (MoE) model with 235 billion total parameters and approximately 22 billion active parameters per forward pass. The MoE architecture enables it to deliver performance competitive with the best available open-weight models while requiring significantly less compute per token than a comparably sized dense model; the sketch after this entry illustrates the memory-versus-compute tradeoff. It supports hybrid thinking mode for flexible chain-of-thought reasoning. Due to its massive total parameter count, running Qwen3 235B A22B locally requires substantial VRAM to load all expert weights, typically needing multiple high-end professional GPUs even at reduced precision. In heavily quantized formats it becomes accessible on workstation-class multi-GPU setups. Released under the Apache 2.0 license.

Chat
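The sketch below makes the MoE tradeoff concrete under simple assumptions: weight memory scales with total parameters, while per-token compute (roughly two FLOPs per active parameter in a forward pass) scales with active parameters. The byte width assumes a ~4.5-bit quantization as in the earlier estimate; the numbers are illustrative, not measured.

```python
# MoE vs dense: weight memory tracks TOTAL parameters, while per-token
# compute tracks ACTIVE parameters (~2 FLOPs per active parameter).
def profile(total_b: float, active_b: float, bytes_per_weight: float = 0.57):
    weights_gib = total_b * 1e9 * bytes_per_weight / 1024**3
    gflops_per_token = 2.0 * active_b  # active_b given in billions
    return weights_gib, gflops_per_token

for name, total, active in [
    ("Qwen3 235B A22B (MoE)",   235.0,  22.0),
    ("Hypothetical 235B dense", 235.0, 235.0),
]:
    mem, flops = profile(total, active)
    print(f"{name}: ~{mem:.0f} GiB weights (Q4), ~{flops:.0f} GFLOPs/token")
# Same weight footprint, but the MoE does ~10x less compute per token.
```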

Qwen3 8B

Alibaba · 8.2B

Qwen3 8B is an 8.2-billion parameter instruction-tuned model from Alibaba Cloud's Qwen 3 series. It is a general-purpose chat model that delivers strong performance across reasoning, multilingual understanding, and coding tasks while remaining efficient enough to run on consumer GPUs with 8GB or more of VRAM. Like other Qwen 3 models, it supports hybrid thinking mode for flexible reasoning depth. The model benefits from the improved pretraining data and training methodology of the Qwen 3 generation, offering notable quality gains over Qwen 2.5 at the same parameter count. It is widely supported by inference frameworks including llama.cpp, vLLM, and Ollama. Released under the Apache 2.0 license.

Chat

Qwen3 Coder 30B A3B Instruct

Alibaba · 30B

Qwen3 Coder 30B A3B Instruct is a code-specialized Mixture of Experts (MoE) model from Alibaba Cloud's Qwen 3 Coder series, with 30 billion total parameters and approximately 3 billion active parameters per forward pass. The MoE architecture allows it to deliver strong coding performance while keeping per-token compute costs low, making it faster at inference than comparably capable dense models. The model is instruction-tuned for programming assistance, code generation, debugging, and software engineering conversation. It requires VRAM proportional to its total 30B parameter count for loading weights, but benefits from efficient inference throughput due to its low active parameter count. Released under the Apache 2.0 license.

Chat · Code

Qwen3 Next 80B A3B Instruct

Alibaba · 81.3B

Qwen3 Next 80B A3B Instruct is a Mixture of Experts (MoE) model from Alibaba Cloud's Qwen 3 series, with approximately 81.3 billion total parameters and around 3 billion active parameters per forward pass. This extreme ratio between total and active parameters allows the model to encode extensive knowledge across its expert layers while maintaining very fast per-token inference, making it an unusually efficient design for its capability level. The model is instruction-tuned for general-purpose chat and requires VRAM proportional to its full 80B parameter count for weight loading, typically needing high-VRAM GPUs or quantized multi-GPU setups. Its low active parameter count results in fast generation speeds despite the large total model size. Released under the Apache 2.0 license.

Chat

Qwen2.5 72B Instruct

Alibaba · 72.7B

Qwen2.5 72B Instruct is the flagship model of the Qwen 2.5 series from Alibaba Cloud, with 72.7 billion parameters. It is instruction-tuned for conversational use and excels across reasoning, coding, mathematics, and multilingual tasks. Qwen2.5 72B delivers performance competitive with leading open-weight 70B-class models while supporting a 128K token context window and structured output generation. The model uses a Transformer architecture with grouped-query attention and was pretrained on a diverse multilingual corpus of over 18 trillion tokens. Running it locally requires high-VRAM GPUs or multi-GPU setups, though quantized formats make it accessible on workstation-class hardware. Released under the Apache 2.0 license.

Chat

Qwen3 30B A3B

Alibaba · 30B

Qwen3 30B A3B is a Mixture of Experts (MoE) model from Alibaba Cloud's Qwen 3 series, with 30 billion total parameters and approximately 3 billion active parameters per forward pass. The MoE architecture delivers quality significantly above what a standard 3B dense model could achieve, while keeping per-token compute costs low. It supports hybrid thinking mode for flexible reasoning. The model requires VRAM proportional to its full 30B parameter count for weight loading, but its low active parameter count results in fast inference throughput. It is an efficient option for users who want quality beyond dense small models without the full cost of larger architectures. Released under the Apache 2.0 license.

Chat

Qwen3 30B A3B Instruct 2507

Alibaba · 30B

Qwen3 30B A3B Instruct 2507 is a July 2025 updated mixture-of-experts model from Alibaba with 30 billion total parameters but only around 3 billion active during inference. The MoE architecture gives it a remarkably small compute footprint relative to its total parameter count, letting users run a model with broad knowledge on mid-range hardware. The 2507 instruct refresh improves alignment and instruction-following quality over the original release. All 30 billion weights must still be loaded, but because only a fraction are active for any given token, inactive experts can be offloaded to system RAM with a modest speed penalty; in practice, quantized builds can often run with a single consumer GPU holding 8 GB or more of VRAM, making this an excellent choice for users who want strong chat performance without heavyweight hardware.

Chat

Qwen3 4B Instruct 2507

Alibaba · 4B

Qwen3 4B Instruct 2507 is a July 2025 refresh of Alibaba's compact 4-billion-parameter chat model from the Qwen3 family. This updated release brings improved instruction following and conversational quality while remaining lightweight enough to run on most modern GPUs and even some higher-end integrated graphics setups. With its modest size, the 4B Instruct 2507 strikes a practical balance between capability and resource efficiency. It is well suited for everyday chat, summarization, and light assistant tasks on consumer hardware, making it one of the more accessible entry points into the Qwen3 lineup.

Chat

Qwen3 235B A22B Instruct 2507

Alibaba · 235B

Qwen3 235B A22B Instruct 2507 is Alibaba's flagship instruction-tuned model from the July 2025 update, featuring 235 billion total parameters with approximately 22 billion active during inference. As the largest instruct model in the Qwen3 lineup, it delivers top-tier conversational quality, knowledge depth, and instruction following. Despite its massive total parameter count, the MoE architecture keeps active compute manageable. Running this model locally still requires substantial hardware, typically multi-GPU setups with 48 GB or more of total VRAM, but the 2507 refresh makes it one of the most capable open-weight models available for users with high-end local infrastructure.

Chat

Qwen3 32B

Alibaba · 32B

Qwen3 32B is the flagship dense model in Alibaba Cloud's Qwen 3 series, with 32 billion parameters. It is instruction-tuned for chat and delivers strong performance across reasoning, coding, mathematics, and multilingual tasks. Qwen3 32B supports a hybrid thinking mode that allows the model to engage in extended chain-of-thought reasoning or respond quickly depending on the task, giving users flexibility between depth and speed; a sketch of toggling this mode follows this entry. The model requires a GPU with at least 24GB of VRAM for quantized inference, placing it within reach of high-end consumer cards like the RTX 4090. It represents a significant generational improvement over Qwen 2.5 in both instruction following and knowledge breadth. Released under the Apache 2.0 license.

Chat
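Qwen's model cards document an `enable_thinking` switch in the Qwen3 chat template for toggling this hybrid behavior. The sketch below shows the idea with Hugging Face transformers; flag behavior can vary across template versions, so treat it as illustrative rather than definitive.

```python
# Toggling Qwen3's hybrid thinking mode via the chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "How many primes are there below 30?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # set False for fast replies without reasoning traces
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```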

Qwen2.5 Coder 7B Instruct

Alibaba · 7.6B

Qwen2.5 Coder 7B Instruct is a 7.6-billion parameter code-specialized instruction-tuned model from Alibaba Cloud. It is trained on a large corpus of source code and natural language, fine-tuned for programming assistance tasks such as code generation, completion, debugging, and code explanation. The model supports a 128K token context window and runs efficiently on consumer GPUs with 8GB or more of VRAM. It provides a good balance between coding capability and hardware requirements for developers looking to run a local coding assistant. Released under the Apache 2.0 license.

Chat · Code

Qwen2.5 1.5B Instruct

Alibaba · 1.5B

Qwen2.5 1.5B Instruct is a 1.5-billion parameter instruction-tuned model from Alibaba Cloud's Qwen 2.5 series. It is a lightweight model suitable for deployment on minimal hardware, including low-VRAM GPUs and even CPU-only setups with acceptable latency. It supports a 128K token context window. The model handles basic conversational tasks, simple question answering, and text generation. While limited in reasoning depth compared to larger variants, it is useful for applications where fast response times and minimal resource consumption are priorities. Released under the Apache 2.0 license.

Chat

Qwen3 4B

Alibaba · 4B

Qwen3 4B is a compact 4-billion parameter instruction-tuned model from Alibaba Cloud's Qwen 3 family. It is designed for efficient local inference on consumer hardware, supporting chat and general assistant tasks while fitting comfortably on GPUs with 6GB or more of VRAM in quantized formats. The model supports hybrid thinking mode, allowing it to balance reasoning depth and response speed. Despite its small footprint, Qwen3 4B delivers quality competitive with larger models from previous generations, making it a practical choice for lightweight local deployments and resource-constrained environments. Released under the Apache 2.0 license.

Chat

Qwen3 4B Thinking 2507

Alibaba · 4B

Qwen3 4B Thinking 2507 is the reasoning-optimized variant of Alibaba's compact 4-billion-parameter Qwen3 model, released in the July 2025 update cycle. Despite its small size, this thinking variant is tuned to produce chain-of-thought reasoning and step-by-step problem solving, making it a surprisingly capable lightweight reasoner. This model is ideal for users who want basic reasoning and analytical capabilities on very modest hardware. It can run on most consumer GPUs and even some CPU-only setups when quantized, providing an accessible entry point for experimenting with reasoning-style models without any significant hardware investment.

Chat

Qwen2.5 0.5B Instruct

Alibaba · 494M

Qwen2.5 0.5B Instruct is the smallest instruction-tuned model in Alibaba Cloud's Qwen 2.5 family, with just 494 million parameters. It is designed for ultra-lightweight deployment scenarios where minimal hardware resources are available, running comfortably on virtually any modern GPU or even CPU-only configurations. Despite its tiny footprint, the model supports a 128K token context window and can handle basic chat, simple summarization, and lightweight instruction following. It is primarily useful for edge deployment, experimentation, and prototyping where model size is a critical constraint; a minimal CPU-only usage sketch follows this entry. Released under the Apache 2.0 license.

Chat
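For a model this small, CPU-only inference is practical. Below is a minimal sketch with llama-cpp-python; the GGUF filename is a placeholder for whatever quantized build you have downloaded.

```python
# CPU-only chat with a small quantized model via llama-cpp-python.
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="qwen2.5-0.5b-instruct-q4_k_m.gguf",  # placeholder filename
    n_ctx=4096,      # context window; the full 128K would need far more RAM
    n_gpu_layers=0,  # 0 = run entirely on the CPU
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give a one-line summary of MoE models."}]
)
print(out["choices"][0]["message"]["content"])
```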

Qwen3 1.7B

Alibaba · 1.7B

Qwen3 1.7B is a 1.7-billion parameter instruction-tuned model from Alibaba Cloud's Qwen 3 series. It is a lightweight model designed for deployment on minimal hardware, including low-VRAM GPUs and even CPU-only configurations with acceptable latency. Despite its compact size, it supports hybrid thinking mode and handles basic conversational tasks, simple question answering, and text generation. The model is useful for edge deployment, embedded applications, and scenarios where fast inference with minimal resource consumption is the priority. It represents a significant quality improvement over Qwen 2.5 at the sub-2B scale. Released under the Apache 2.0 license.

Chat

Qwen2.5 3B Instruct

Alibaba · 3.1B

Qwen2.5 3B Instruct is a 3.1-billion parameter instruction-tuned model from Alibaba Cloud's Qwen 2.5 family. It is designed for efficient local inference on consumer hardware, supporting a 128K token context window despite its compact footprint. The model can run on GPUs with as little as 4GB of VRAM when quantized. Despite its small size, Qwen2.5 3B Instruct delivers competitive performance for basic conversational tasks, summarization, and simple instruction following. It is a good option for edge deployment and resource-constrained environments. Released under the Apache 2.0 license.

Chat

Qwen3 235B A22B Thinking 2507

Alibaba · 235B

Qwen3 235B A22B Thinking 2507 is the reasoning and chain-of-thought variant of Alibaba's largest Qwen3 mixture-of-experts model, updated in July 2025. With 235 billion total parameters and about 22 billion active per forward pass, it represents the pinnacle of Qwen3's reasoning capabilities. This model excels at complex multi-step problems, mathematical reasoning, code analysis, and tasks requiring deep logical thinking. It demands serious hardware to run locally, but for users with multi-GPU setups, it offers reasoning performance that rivals the best proprietary models while keeping all computation on local hardware.

Chat

Qwen2.5 0.5B

Alibaba · 494M

Qwen2.5 0.5B is the smallest base (pretrained) model in Alibaba Cloud's Qwen 2.5 family, with 494 million parameters. As a base model, it is not instruction-tuned and is intended for fine-tuning, research, and as a foundation for custom applications. It supports a 128K token context window. Its minimal size makes it suitable for experimentation, rapid prototyping, and resource-constrained fine-tuning tasks. The model can run on virtually any hardware. Released under the Apache 2.0 license.

Chat

Qwen3 14B

Alibaba · 14B

Qwen3 14B is a 14-billion parameter instruction-tuned model from Alibaba Cloud's Qwen 3 series. It occupies a practical middle ground in the Qwen 3 lineup, offering stronger reasoning and generation quality than the 8B variant while remaining manageable on GPUs with 16GB or more of VRAM in quantized formats. The model supports hybrid thinking mode for flexible reasoning depth. Qwen3 14B is well suited for chat, instruction following, coding assistance, and multilingual tasks. It benefits from the generational improvements of Qwen 3 in pretraining data and alignment techniques, delivering performance that competes with larger models from previous generations. Released under the Apache 2.0 license.

Chat

Qwen3 30B A3B Thinking 2507

Alibaba · 30B

Qwen3 30B A3B Thinking 2507 is the reasoning-focused variant of Alibaba's 30-billion-parameter mixture-of-experts model, updated in July 2025. Like its instruct sibling, it activates only about 3 billion parameters per token, keeping resource demands low while enabling multi-step reasoning and chain-of-thought problem solving. This thinking variant is designed for tasks that benefit from deliberate, step-by-step logic such as math, coding puzzles, and analytical questions. Its efficient MoE design means users with modest GPUs can still access strong reasoning capabilities without needing datacenter-class hardware.

Chat

Qwen2.5 32B Instruct

Alibaba · 32B

Qwen2.5 32B Instruct is a 32-billion parameter instruction-tuned model from Alibaba Cloud's Qwen 2.5 family. It occupies a practical sweet spot between the 14B and 72B variants, offering strong reasoning and multilingual capabilities while remaining feasible to run on a single high-end consumer GPU with 24GB or more of VRAM at reduced precision. The model supports a 128K token context window and is optimized for conversational use, instruction following, and structured output generation. It is a popular choice for local inference when the 72B model is too demanding but users need more capability than the 14B variant. Released under the Apache 2.0 license.

Chat

Qwen2.5 14B Instruct

Alibaba · 14B

Qwen2.5 14B Instruct is a 14-billion parameter instruction-tuned model from Alibaba Cloud's Qwen 2.5 series. It supports a 128K token context window and provides a balanced tradeoff between quality and hardware requirements, running well on GPUs with 16GB of VRAM in quantized formats. The model is fine-tuned for chat, instruction following, and general-purpose assistant tasks. It performs well across reasoning, coding, and multilingual benchmarks for its size class, making it a practical option for local deployment when larger models are not feasible. Released under the Apache 2.0 license.

Chat

Qwen1.5 MoE A2.7B

Alibaba · 14.3B

Qwen1.5 MoE A2.7B is a Mixture of Experts (MoE) model from Alibaba Cloud's Qwen 1.5 generation, with 14.3 billion total parameters but only 2.7 billion active parameters per forward pass. The MoE architecture allows it to deliver performance closer to dense 7B models while requiring less compute during inference, as only a subset of experts is activated for each token. The model supports a 32K token context window and requires VRAM proportional to its total parameter count for loading, despite the lower compute cost per token. It is an interesting architectural variant for users exploring efficient inference and MoE models locally. Released under a custom Qwen license.

Chat