All LLM Models
Browse 15 LLM models with VRAM requirements, quantization options, and hardware compatibility.
Understanding LLM VRAM Requirements
How much VRAM you need depends on the model size and quantization level. Quantization reduces the precision of model weights, trading small quality losses for significantly lower VRAM usage. For example, the weights of a 7B parameter model need ~14 GB at FP16 (2 bytes per parameter) but only ~4 GB at Q4_K_M quantization. The KV cache and runtime overhead come on top of the weights and grow with context length.
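The arithmetic behind the 7B example can be sketched in a few lines of Python. This is a rough estimate of weight memory only, not the method used to produce the figures in this list. The function name `estimate_weight_gb` is hypothetical, and the value of about 4.8 effective bits per weight for Q4_K_M is an assumption (mixed-precision quantization formats average slightly above their nominal bit width).

```python
def estimate_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes).

    Ignores the KV cache, activations, and runtime overhead.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# FP16 stores each weight in 16 bits (2 bytes).
fp16_gb = estimate_weight_gb(7, 16)

# Assumption: Q4_K_M averages roughly 4.8 bits per weight.
q4_gb = estimate_weight_gb(7, 4.8)

print(f"7B at FP16:   ~{fp16_gb:.1f} GB")
print(f"7B at Q4_K_M: ~{q4_gb:.1f} GB")
```

Actual usage will be higher than these numbers once context is loaded, so treat the result as a lower bound when sizing a GPU.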
Model List
DeepSeek R1 0528 Qwen3 8B
DeepSeek · 8.2B · runs from 2.9 GB
DeepSeek R1 0528 Qwen3 8B is an 8.2B-parameter open language model from DeepSeek in the DeepSeek R1 family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
DeepSeek R1 Distill Qwen 14B
DeepSeek · 14.8B · runs from 5.1 GB
DeepSeek R1 Distill Qwen 14B sits in a sweet spot between the smaller 7B distill and the more demanding 32B version, offering strong reasoning performance at 14.8 billion parameters on the Qwen 2.5 architecture. It captures a meaningful share of the full R1's chain-of-thought capabilities while keeping resource requirements within the range of mainstream consumer GPUs. Quantized to 4-bit, it fits comfortably on GPUs with 12 GB of VRAM, delivering reliable step-by-step reasoning for math, logic, and analytical problems.
DeepSeek R1 Distill Qwen 1.5B
DeepSeek · 1.8B · runs from 0.8 GB
DeepSeek R1 Distill Qwen 1.5B is the smallest model in the R1 distillation family, packing chain-of-thought reasoning capabilities into roughly 1.8 billion parameters (the "1.5B" in its name is nominal) using the Qwen 2.5 architecture. It represents an ambitious attempt to bring structured reasoning to the smallest practical model size. At this scale, the model can run on virtually any modern GPU and even on CPU-only setups with acceptable speed. While its reasoning depth is naturally limited compared to its larger siblings, it still demonstrates structured thinking patterns that set it apart from generic models of similar size.
DeepSeek R1 Distill Qwen 7B
DeepSeek · 7.6B · runs from 3.0 GB
DeepSeek R1 Distill Qwen 7B compresses the reasoning techniques from DeepSeek's full R1 model into a compact 7.6 billion parameter dense model built on the Qwen 2.5 architecture. Despite its small footprint, it demonstrates surprisingly capable step-by-step reasoning on math and logic problems that would stump many models several times its size. This is one of the most accessible reasoning models available for local use, fitting comfortably on GPUs with 6 GB or more of VRAM when quantized. It strikes a practical balance between genuine chain-of-thought reasoning ability and the hardware constraints of a typical consumer setup.
DeepSeek R1 Distill Llama 8B
DeepSeek · 8.0B · runs from 2.8 GB
DeepSeek R1 Distill Llama 8B brings R1's reinforcement-learned reasoning capabilities to the widely supported Llama 3.1 8B architecture. By distilling the full 684.5B R1 model's reasoning patterns into this 8 billion parameter dense model, DeepSeek created a version that benefits from the extensive Llama ecosystem of tools, quantizations, and inference engines. For users who prefer the Llama architecture or already have tooling built around it, this model offers a plug-and-play path to chain-of-thought reasoning. Its hardware requirements are very approachable, running well on consumer GPUs with 8 GB or more of VRAM at common quantization levels.
DeepSeek Coder 6.7B Instruct
DeepSeek · 6.7B · runs from 4.2 GB
DeepSeek Coder 6.7B Instruct is a first-generation code-specialized model trained on a large corpus of source code and programming-related data. At 6.7 billion parameters, it provides solid code completion, generation, and explanation capabilities across popular programming languages while remaining small enough to run on most consumer GPUs. While newer models in the DeepSeek lineup have surpassed it in raw capability, this model remains a practical choice for users who need a lightweight local coding assistant with minimal hardware requirements. It runs well on GPUs with as little as 6 GB of VRAM when quantized.
DeepSeek Coder V2 Lite Instruct
DeepSeek · 15.7B · runs from 7.2 GB
DeepSeek Coder V2 Lite Instruct is a code-focused mixture-of-experts model with 15.7 billion total parameters, trained to handle both programming tasks and general conversation. It supports a wide range of programming languages and excels at code generation, debugging, explanation, and refactoring. The MoE architecture keeps compute costs manageable despite the model's broad capabilities, and the Lite variant is sized to run on a single consumer GPU. For developers looking for a capable local coding assistant that can also handle general chat, this model offers an appealing combination of code specialization and practical hardware requirements.
DeepSeek Coder 1.3B Instruct
DeepSeek · 1.3B · runs from 1.3 GB
DeepSeek Coder 1.3B Instruct is an ultra-compact code model designed for environments where hardware resources are extremely limited. Despite having just 1.3 billion parameters, it can handle basic code completion, simple generation tasks, and code Q&A across common programming languages. This is one of the smallest viable code models available, capable of running on integrated graphics or very low-end dedicated GPUs. It is well suited for edge deployment, embedded development environments, or as a fast local autocomplete engine where response speed matters more than handling complex multi-file reasoning tasks.
DeepSeek V2 Lite Chat
DeepSeek · 15.7B · runs from 5.1 GB
DeepSeek V2 Lite Chat is a 15.7B-parameter open language model from DeepSeek in the DeepSeek V2 family. It supports a context window of up to 163,840 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
DeepSeek V2 Lite
DeepSeek · 15.7B · runs from 7.4 GB
DeepSeek V2 Lite is a compact mixture-of-experts model with 15.7 billion total parameters, designed to deliver a strong quality-to-compute ratio for general chat and instruction following. It uses the same innovative MLA (Multi-Head Latent Attention) architecture as the larger V2, which reduces memory requirements during inference. With its modest parameter count, V2 Lite runs comfortably on a single consumer GPU, making it accessible to users who want to try DeepSeek's MoE approach without needing specialized hardware. It handles everyday conversational tasks, summarization, and light analysis well, offering a practical entry point into the DeepSeek model family.
DeepSeek LLM 7B Base
DeepSeek · 7B · runs from 4.3 GB
DeepSeek LLM 7B Base is a 7B-parameter open language model from DeepSeek in the DeepSeek family. It supports a context window of up to 4,096 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
DeepSeek MoE 16B Base
DeepSeek · 16.4B · runs from 7.7 GB
DeepSeek MoE 16B Base is a 16.4B-parameter open language model from DeepSeek in the DeepSeek family. It supports a context window of up to 4,096 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
DeepSeek Coder 1.3B Base
DeepSeek · 1.3B · runs from 1.3 GB
DeepSeek Coder 1.3B Base is a 1.3B-parameter open language model from DeepSeek in the DeepSeek Coder family. It supports a context window of up to 16,384 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
DeepSeek Coder 7B Instruct V1.5
DeepSeek · 6.9B · runs from 4.2 GB
DeepSeek Coder 7B Instruct V1.5 is a 6.9B-parameter open language model from DeepSeek in the DeepSeek Coder family. It supports a context window of up to 4,096 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
DeepSeek Coder V2 Lite Base
DeepSeek · 15.7B · runs from 7.4 GB
DeepSeek Coder V2 Lite Base is a 15.7B-parameter open language model from DeepSeek in the DeepSeek Coder family. It supports a context window of up to 163,840 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
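The "runs from" figures in the list above can be used to shortlist models for a given GPU. The following is a minimal sketch, not part of the site: the function name `models_that_fit` and the 1 GB default headroom (reserved for the KV cache and runtime overhead) are assumptions chosen for illustration, while the names and sizes are copied from the listing.

```python
# (model name, smallest listed footprint in GB), taken from the list above.
MODELS = [
    ("DeepSeek R1 0528 Qwen3 8B", 2.9),
    ("DeepSeek R1 Distill Qwen 14B", 5.1),
    ("DeepSeek R1 Distill Qwen 1.5B", 0.8),
    ("DeepSeek R1 Distill Qwen 7B", 3.0),
    ("DeepSeek R1 Distill Llama 8B", 2.8),
    ("DeepSeek Coder 6.7B Instruct", 4.2),
    ("DeepSeek Coder V2 Lite Instruct", 7.2),
    ("DeepSeek Coder 1.3B Instruct", 1.3),
    ("DeepSeek V2 Lite Chat", 5.1),
    ("DeepSeek V2 Lite", 7.4),
    ("DeepSeek LLM 7B Base", 4.3),
    ("DeepSeek MoE 16B Base", 7.7),
    ("DeepSeek Coder 1.3B Base", 1.3),
    ("DeepSeek Coder 7B Instruct V1.5", 4.2),
    ("DeepSeek Coder V2 Lite Base", 7.4),
]


def models_that_fit(vram_gb: float, headroom_gb: float = 1.0) -> list[str]:
    """Return models whose smallest listed footprint fits in the VRAM budget.

    headroom_gb is an assumed reserve for KV cache and runtime overhead.
    """
    usable_gb = vram_gb - headroom_gb
    return [name for name, min_gb in MODELS if min_gb <= usable_gb]


if __name__ == "__main__":
    for name in models_that_fit(8.0):
        print(name)
```

Because "runs from" reflects the smallest available quantization, a model that passes this check may still need a larger GPU at higher-quality quantization levels or long context lengths.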