All LLM Models

Browse 32 LLM models with VRAM requirements, quantization options, and hardware compatibility.

Featured only

Understanding LLM VRAM Requirements

How much VRAM you need depends on the model size and quantization level. Quantization reduces the precision of model weights, trading small quality losses for significantly lower VRAM usage. For example, a 7B parameter model needs ~14 GB at FP16 but only ~4 GB at Q4_K_M quantization.

DeepSeek V4 Flash

DeepSeek · 158.1B · runs from 43.8 GB

2.2M 1.5K

DeepSeek V4 Flash is a 158.1B-parameter open language model from DeepSeek in the DeepSeek V4 family. It supports a context window of up to 1,048,576 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

DeepSeek R1 0528 Qwen3 8B

DeepSeek · 8.2B · runs from 2.9 GB

337.8K 1.1K

DeepSeek R1 0528 Qwen3 8B is a 8.2B-parameter open language model from DeepSeek in the DeepSeek R1 family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

ChatReasoning

DeepSeek R1 0528

DeepSeek · 684.5B · runs from 192.1 GB

6.8M 2.5K

DeepSeek R1 0528 is an updated release of the R1 reasoning model, incorporating improvements to training and inference that sharpen its performance on complex multi-step problems. It retains the same 684.5 billion parameter mixture-of-experts architecture as the original R1, with approximately 37 billion parameters active per forward pass. This revision addresses several edge cases where the original R1 struggled, delivering more consistent reasoning chains and fewer hallucinations on difficult math and coding tasks. Hardware requirements remain identical to the original R1, so users already set up to run the first version can swap in the 0528 weights with no changes to their infrastructure.

ChatReasoning

DeepSeek v3 0324

DeepSeek · 684.5B · runs from 192.1 GB

896.3K 3.1K

DeepSeek V3 0324 is DeepSeek's flagship general-purpose chat model, featuring a 684.5 billion parameter mixture-of-experts architecture with roughly 37 billion parameters active per token. It delivers strong performance across a wide range of tasks including conversation, writing, analysis, coding, and instruction following, competing with the best closed-source models available. Like other large MoE models, V3 requires substantial memory to load all expert weights even though only a fraction are used during inference. Quantized versions make it feasible on multi-GPU setups, and its combination of broad capability with open weights has made it one of the most widely deployed open models for local and self-hosted use.

Chat

DeepSeek R1 Distill Qwen 14B

DeepSeek · 14.8B · runs from 5.1 GB

742.0K 613

DeepSeek R1 Distill Qwen 14B sits in a sweet spot between the smaller 7B distill and the more demanding 32B version, offering strong reasoning performance at 14.8 billion parameters on the Qwen 2.5 architecture. It captures a meaningful share of the full R1's chain-of-thought capabilities while keeping resource requirements within the range of mainstream consumer GPUs. Quantized to 4-bit, it fits comfortably on GPUs with 12 GB of VRAM, delivering reliable step-by-step reasoning for math, logic, and analytical problems.

ChatReasoning

DeepSeek R1 Distill Qwen 1.5B

DeepSeek · 1.8B · runs from 0.8 GB

681.8K 1.5K

DeepSeek R1 Distill Qwen 1.5B is the smallest model in the R1 distillation family, packing chain-of-thought reasoning capabilities into just 1.5 billion parameters using the Qwen 2.5 architecture. It represents an ambitious attempt to bring structured reasoning to the smallest practical model size. At this scale, the model can run on virtually any modern GPU and even on CPU-only setups with acceptable speed. While its reasoning depth is naturally limited compared to its larger siblings, it still demonstrates structured thinking patterns that set it apart from generic models of similar size.

ChatReasoning

DeepSeek R1 Distill Qwen 32B

DeepSeek · 32.8B · runs from 9.8 GB

490.3K 1.6K

DeepSeek R1 Distill Qwen 32B takes the reasoning capabilities developed in the full 684.5B R1 model and distills them into the 32.8 billion parameter Qwen 2.5 architecture. The result is a dense model that punches well above its weight class on math, science, and coding reasoning tasks, often matching models two to three times its size. At around 32.8 billion parameters, this model fits comfortably on a single high-end consumer GPU when quantized to 4-bit precision, making it one of the most capable reasoning models you can run on a desktop workstation.

ChatReasoning

DeepSeek R1

DeepSeek · 684.5B · runs from 192.1 GB

5.6M 13.4K

DeepSeek R1 is a groundbreaking reasoning model that uses reinforcement learning to develop chain-of-thought capabilities without relying on supervised fine-tuning. With 684.5 billion total parameters in a mixture-of-experts architecture (only 37 billion active per token), R1 achieves performance competitive with OpenAI's o1 on math, coding, and complex reasoning benchmarks while remaining fully open-weight. Running the full R1 locally is a serious undertaking, requiring well over 300 GB of VRAM at full precision, though quantized versions bring it within reach of multi-GPU setups. For users who want R1-level reasoning on more modest hardware, DeepSeek also released a family of distilled models that pack R1's reasoning patterns into smaller dense architectures.

ChatReasoning

DeepSeek R1 Distill Qwen 7B

DeepSeek · 7.6B · runs from 3.0 GB

478.1K 842

DeepSeek R1 Distill Qwen 7B compresses the reasoning techniques from DeepSeek's full R1 model into a compact 7.6 billion parameter dense model built on the Qwen 2.5 architecture. Despite its small footprint, it demonstrates surprisingly capable step-by-step reasoning on math and logic problems that would stump many models several times its size. This is one of the most accessible reasoning models available for local use, fitting comfortably on GPUs with 6 GB or more of VRAM when quantized. It strikes a practical balance between genuine chain-of-thought reasoning ability and the hardware constraints of a typical consumer setup.

ChatReasoning

DeepSeek R1 Distill Llama 8B

DeepSeek · 8.0B · runs from 2.8 GB

439.0K 864

DeepSeek R1 Distill Llama 8B brings R1's reinforcement-learned reasoning capabilities to the widely supported Llama 3.1 8B architecture. By distilling the full 684.5B R1 model's reasoning patterns into this 8 billion parameter dense model, DeepSeek created a version that benefits from the extensive Llama ecosystem of tools, quantizations, and inference engines. For users who prefer the Llama architecture or already have tooling built around it, this model offers a plug-and-play path to chain-of-thought reasoning. Its hardware requirements are very approachable, running well on consumer GPUs with 8 GB or more of VRAM at common quantization levels.

ChatReasoning

DeepSeek R1 Distill Llama 70B

DeepSeek · 70.6B · runs from 20.4 GB

119.2K 778

DeepSeek R1 Distill Llama 70B is the largest model in the R1 distillation lineup, combining the reasoning capabilities developed in the full 684.5B R1 with the robust Llama 3.1 70B architecture. At 70 billion parameters, it delivers the strongest reasoning performance of any dense R1 distill, approaching the full R1's quality on many math and coding benchmarks. Running this model locally requires a multi-GPU setup or a single GPU with very high VRAM capacity, though quantized versions can fit on hardware with 48 GB or more. For users who need top-tier open-weight reasoning and have the hardware to support a 70B dense model, this is one of the strongest options available.

ChatReasoning

Deepseek Coder 6.7B Instruct

DeepSeek · 6.7B · runs from 4.2 GB

143.7K 496

DeepSeek Coder 6.7B Instruct is a first-generation code-specialized model trained on a large corpus of source code and programming-related data. At 6.7 billion parameters, it provides solid code completion, generation, and explanation capabilities across popular programming languages while remaining small enough to run on most consumer GPUs. While newer models in the DeepSeek lineup have surpassed it in raw capability, this model remains a practical choice for users who need a lightweight local coding assistant with minimal hardware requirements. It runs well on GPUs with as little as 6 GB of VRAM when quantized.

ChatCode

DeepSeek V4 Pro

DeepSeek · 861.6B · runs from 366.5 GB

3.4M 4.8K

DeepSeek V4 Pro is a 861.6B-parameter open language model from DeepSeek in the DeepSeek V4 family. It supports a context window of up to 1,048,576 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

DeepSeek Coder v2 Lite Instruct

DeepSeek · 15.7B · runs from 7.2 GB

894.1K 609

DeepSeek Coder V2 Lite Instruct is a code-focused mixture-of-experts model with 15.7 billion total parameters, trained to handle both programming tasks and general conversation. It supports a wide range of programming languages and excels at code generation, debugging, explanation, and refactoring. The MoE architecture keeps compute costs manageable despite the model's broad capabilities, and the Lite variant is sized to run on a single consumer GPU. For developers looking for a capable local coding assistant that can also handle general chat, this model offers an appealing combination of code specialization and practical hardware requirements.

ChatCode

Deepseek Coder 33B Instruct

DeepSeek · 33.3B · runs from 14.6 GB

6.2K 573

Deepseek Coder 33B Instruct is a 33.3B-parameter open language model from DeepSeek in the DeepSeek Coder family. It supports a context window of up to 16,384 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

ChatCode

DeepSeek V3.1

DeepSeek · 684.5B · runs from 192.1 GB

217.9K 824

DeepSeek V3.1 is a 684.5B-parameter open language model from DeepSeek in the DeepSeek V3 family. It supports a context window of up to 163,840 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Deepseek Coder 1.3B Instruct

DeepSeek · 1.3B · runs from 1.3 GB

43.3K 167

DeepSeek Coder 1.3B Instruct is an ultra-compact code model designed for environments where hardware resources are extremely limited. Despite having just 1.3 billion parameters, it can handle basic code completion, simple generation tasks, and code Q&A across common programming languages. This is one of the smallest viable code models available, capable of running on integrated graphics or very low-end dedicated GPUs. It is well suited for edge deployment, embedded development environments, or as a fast local autocomplete engine where response speed matters more than handling complex multi-file reasoning tasks.

ChatCode

DeepSeek V3.2

DeepSeek · 685.4B · runs from 192.4 GB

3.4M 1.4K

DeepSeek V3.2 is the latest iteration of DeepSeek's general-purpose flagship, building on the V3 architecture with 685.4 billion total parameters in a mixture-of-experts configuration. This update refines the model's conversational abilities, instruction following, and multilingual performance compared to earlier V3 releases. Running V3.2 locally requires significant GPU resources due to the large total parameter count, though the MoE design means only a subset of parameters are active for any given token. Users with multi-GPU workstations or servers can run quantized versions effectively, making this one of the most powerful open-weight chat models available for self-hosted deployment.

Chat

DeepSeek v3

DeepSeek · 684.5B · runs from 294.8 GB

998.8K 4.1K

DeepSeek v3 is a 684.5B-parameter open language model from DeepSeek in the DeepSeek V3 family. It supports a context window of up to 163,840 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

DeepSeek V2.5

DeepSeek · 235.7B · runs from 67.7 GB

7.7K 734

DeepSeek V2.5 is a 235.7B-parameter open language model from DeepSeek in the DeepSeek V2 family. It supports a context window of up to 163,840 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

DeepSeek v2 Lite Chat

DeepSeek · 15.7B · runs from 5.1 GB

968.2K 141

DeepSeek v2 Lite Chat is a 15.7B-parameter open language model from DeepSeek in the DeepSeek V2 family. It supports a context window of up to 163,840 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

DeepSeek Coder v2 Instruct

DeepSeek · 235.7B · runs from 73.5 GB

4.0K 688

DeepSeek Coder v2 Instruct is a 235.7B-parameter open language model from DeepSeek in the DeepSeek Coder family. It supports a context window of up to 163,840 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

ChatCode

DeepSeek v2 Lite

DeepSeek · 15.7B · runs from 7.4 GB

339.9K 178

DeepSeek V2 Lite is a compact mixture-of-experts model with 15.7 billion total parameters, designed to deliver a strong quality-to-compute ratio for general chat and instruction following. It uses the same innovative MLA (Multi-Head Latent Attention) architecture as the larger V2, which reduces memory requirements during inference. With its modest parameter count, V2 Lite runs comfortably on a single consumer GPU, making it accessible to users who want to try DeepSeek's MoE approach without needing specialized hardware. It handles everyday conversational tasks, summarization, and light analysis well, offering a practical entry point into the DeepSeek model family.

Chat

DeepSeek V3.2 Exp

DeepSeek · 685.4B · runs from 295.2 GB

174.1K 992

DeepSeek V3.2 Exp is a 685.4B-parameter open language model from DeepSeek in the DeepSeek V3 family. It supports a context window of up to 163,840 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Deepseek Llm 7B Base

DeepSeek · 7B · runs from 4.3 GB

39.1K 138

Deepseek Llm 7B Base is a 7B-parameter open language model from DeepSeek in the DeepSeek family. It supports a context window of up to 4,096 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Deepseek Moe 16B Base

DeepSeek · 16.4B · runs from 7.7 GB

36.1K 149

Deepseek Moe 16B Base is a 16.4B-parameter open language model from DeepSeek in the DeepSeek family. It supports a context window of up to 4,096 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

Deepseek Coder 1.3B Base

DeepSeek · 1.3B · runs from 1.3 GB

16.9K 111

Deepseek Coder 1.3B Base is a 1.3B-parameter open language model from DeepSeek in the DeepSeek Coder family. It supports a context window of up to 16,384 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

ChatCode

Deepseek Coder 7B Instruct V1.5

DeepSeek · 6.9B · runs from 4.2 GB

12.9K 147

Deepseek Coder 7B Instruct V1.5 is a 6.9B-parameter open language model from DeepSeek in the DeepSeek Coder family. It supports a context window of up to 4,096 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

ChatCode

DeepSeek R1 Zero

DeepSeek · 684.5B · runs from 294.8 GB

8.8K 958

DeepSeek R1 Zero is a 684.5B-parameter open language model from DeepSeek in the DeepSeek R1 family. It supports a context window of up to 163,840 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

ChatReasoning

DeepSeek V3.2 Speciale

DeepSeek · 685.4B · runs from 295.2 GB

7.1K 709

DeepSeek V3.2 Speciale is a 685.4B-parameter open language model from DeepSeek in the DeepSeek V3 family. It supports a context window of up to 163,840 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat

All LLM Models

Understanding LLM VRAM Requirements

Model List

DeepSeek V4 Flash

DeepSeek R1 0528 Qwen3 8B

DeepSeek R1 0528

DeepSeek v3 0324

DeepSeek R1 Distill Qwen 14B

DeepSeek R1 Distill Qwen 1.5B

DeepSeek R1 Distill Qwen 32B

DeepSeek R1

DeepSeek R1 Distill Qwen 7B

DeepSeek R1 Distill Llama 8B

DeepSeek R1 Distill Llama 70B

Deepseek Coder 6.7B Instruct

DeepSeek V4 Pro

DeepSeek Coder v2 Lite Instruct

Deepseek Coder 33B Instruct

DeepSeek V3.1

Deepseek Coder 1.3B Instruct

DeepSeek V3.2

DeepSeek v3

DeepSeek V2.5

DeepSeek v2 Lite Chat

DeepSeek Coder v2 Instruct

DeepSeek v2 Lite

DeepSeek V3.2 Exp

Deepseek Llm 7B Base

Deepseek Moe 16B Base

Deepseek Coder 1.3B Base

Deepseek Coder 7B Instruct V1.5

DeepSeek R1 Zero

DeepSeek V3.2 Speciale