Best Local LLMs for Coding in 2026
These are the open-weight code models you can run on your own hardware — ranked by real-world popularity. Local coding models keep your codebase private, work offline, and cost nothing per token. For most developers a 7B–32B model at Q4_K_M is the sweet spot: small enough to fit a single consumer GPU, capable enough for autocomplete, refactors, and agentic coding. Pick a model below to see exactly which GPU or Mac runs it and how fast.
77 Coding Models You Can Run Locally
Qwen2.5 Coder 14B Instruct
Alibaba · 14.8B
Qwen2.5 Coder 14B Instruct is a 14.8B-parameter open language model from Alibaba in the Qwen 2.5 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwen2.5 Coder 7B Instruct
Alibaba · 7.6B
Qwen2.5 Coder 7B Instruct is a 7.6-billion parameter code-specialized instruction-tuned model from Alibaba Cloud. It is trained on a large corpus of source code and natural language, fine-tuned for programming assistance tasks such as code generation, completion, debugging, and code explanation. The model supports a 128K token context window and runs efficiently on consumer GPUs with 8GB or more of VRAM. It provides a good balance between coding capability and hardware requirements for developers looking to run a local coding assistant. Released under the Apache 2.0 license.
Qwen3 Coder 30B A3B Instruct
Alibaba · 30.5B
Qwen3 Coder 30B A3B Instruct is a code-specialized Mixture of Experts (MoE) model from Alibaba Cloud's Qwen 3 Coder series, with 30 billion total parameters and approximately 3 billion active parameters per forward pass. The MoE architecture allows it to deliver strong coding performance while keeping per-token compute costs low, making it faster at inference than comparably capable dense models. The model is instruction-tuned for programming assistance, code generation, debugging, and software engineering conversation. It requires VRAM proportional to its total 30B parameter count for loading weights, but benefits from efficient inference throughput due to its low active parameter count. Released under the Apache 2.0 license.
Qwen2.5 Coder 32B Instruct
Alibaba · 32.8B
Qwen2.5 Coder 32B Instruct is a 32.8-billion parameter code-specialized model from Alibaba Cloud, instruction-tuned for programming assistance and code generation. It is trained on a large corpus of source code alongside natural language data, making it highly capable for tasks such as code completion, debugging, code explanation, and software engineering dialogue. The model supports a 128K token context window and delivers code generation quality competitive with the best open-weight coding models at any scale. It requires a GPU with at least 24GB of VRAM for quantized inference. Released under the Apache 2.0 license.
Phi 4 Mini Instruct
Microsoft · 3.8B
Microsoft Phi 4 Mini Instruct is a 3.8-billion parameter instruction-tuned model from Microsoft Research's Phi 4 family. It applies the Phi series' data-centric training philosophy to a compact model, delivering strong performance in coding, reasoning, and chat tasks relative to its small footprint. The model runs on consumer GPUs with as little as 4-6GB of VRAM when quantized, making it accessible on mainstream and even entry-level hardware. Released under the MIT license.
Qwen3 Coder Next
Alibaba · 79.7B
Qwen3 Coder Next is a 79.7-billion parameter code-specialized instruction-tuned model from Alibaba Cloud, the next generation of the Qwen Coder series. It is trained extensively on source code and programming-related data, delivering strong performance across code generation, completion, debugging, refactoring, and software engineering dialogue. The model represents a significant step up in coding capability within the Qwen family. Due to its large parameter count, running Qwen3 Coder Next locally requires substantial VRAM, typically 48GB or more at reduced precision, placing it in the territory of professional GPUs or multi-GPU consumer setups. It is a top-tier choice for developers who need the most capable local coding assistant available. Released under the Apache 2.0 license.
DeepSeek Coder v2 Lite Instruct
DeepSeek · 15.7B
DeepSeek Coder V2 Lite Instruct is a code-focused mixture-of-experts model with 15.7 billion total parameters, trained to handle both programming tasks and general conversation. It supports a wide range of programming languages and excels at code generation, debugging, explanation, and refactoring. The MoE architecture keeps compute costs manageable despite the model's broad capabilities, and the Lite variant is sized to run on a single consumer GPU. For developers looking for a capable local coding assistant that can also handle general chat, this model offers an appealing combination of code specialization and practical hardware requirements.
Phi 3.5 Mini Instruct
Microsoft · 3.8B
Phi 3.5 Mini Instruct is a 3.8B-parameter open language model from Microsoft in the Phi 3 family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Phi 4
Microsoft · 14.7B
Microsoft Phi 4 is a 14.7-billion parameter language model from Microsoft Research's Phi series, designed to deliver strong reasoning, mathematical, and coding performance at an efficient size. Phi 4 continues the Phi family's focus on maximizing capability per parameter through high-quality training data curation, achieving benchmark scores that rival much larger models on reasoning and STEM tasks. The model runs well on consumer GPUs with 12-16GB of VRAM in quantized formats. It excels at mathematical problem solving, code generation, and structured reasoning. Released under the MIT license.
Phi 3 Mini 4k Instruct
Microsoft · 3.8B
Microsoft Phi 3 Mini 4K Instruct is a 3.8-billion parameter instruction-tuned model from Microsoft Research's Phi 3 generation, with a 4K token context window. The Phi 3 family demonstrated that small models trained on carefully curated, high-quality data can achieve performance competitive with models several times their size. The model runs on consumer GPUs with as little as 4-6GB of VRAM when quantized, making it one of the most accessible capable chat models for local deployment. Released under the MIT license.
Qwen2.5 Coder 1.5B
Alibaba · 1.5B
Qwen2.5 Coder 1.5B is a 1.5-billion parameter code-specialized model from Alibaba Cloud's Qwen 2.5 Coder series. It is one of the smallest Coder variants, pairing meaningful code generation capability with extremely low resource requirements: it runs on GPUs with as little as 2-4GB of VRAM. The model is suitable for lightweight code completion, simple code generation tasks, and as a compact local coding assistant in resource-constrained environments. It supports a 32K token context window. Released under the Apache 2.0 license.
Phi 2
Microsoft · 2.8B
Microsoft Phi 2 is a 2.8-billion parameter language model from Microsoft Research that pioneered the concept of small but highly capable language models. Released in late 2023, Phi 2 demonstrated that strategic data curation and training methodology could allow a sub-3B model to outperform many 7B and 13B models on reasoning and coding benchmarks. The model runs on virtually any modern GPU and even on CPU-only setups. While succeeded by Phi 3 and Phi 4, Phi 2 remains historically significant as the model that proved small-scale language models could be genuinely useful for practical tasks. Released under the MIT license.
Phi 3 Mini 128k Instruct
Microsoft · 3.8B
Phi 3 Mini 128k Instruct is a 3.8B-parameter open language model from Microsoft in the Phi 3 family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwen2.5 Coder 3B Instruct
Alibaba · 3.1B
Qwen2.5 Coder 3B Instruct is a 3.1B-parameter open language model from Alibaba in the Qwen 2.5 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwen2.5 Coder 7B
Alibaba · 7.6B
Qwen2.5 Coder 7B is a 7.6-billion parameter code-specialized base (pretrained) model from Alibaba Cloud's Qwen 2.5 Coder series. It is trained on a large dataset of source code and natural language but is not instruction-tuned, making it suitable for fine-tuning, code-related research, and custom downstream applications. The model supports a 128K token context window and runs efficiently on consumer GPUs. It serves as the foundation for the Qwen2.5 Coder 7B Instruct variant and community fine-tunes targeting specific programming languages or workflows. Released under the Apache 2.0 license.
DeepSeek Coder 6.7B Instruct
DeepSeek · 6.7B
DeepSeek Coder 6.7B Instruct is a first-generation code-specialized model trained on a large corpus of source code and programming-related data. At 6.7 billion parameters, it provides solid code completion, generation, and explanation capabilities across popular programming languages while remaining small enough to run on most consumer GPUs. While newer models in the DeepSeek lineup have surpassed it in raw capability, this model remains a practical choice for users who need a lightweight local coding assistant with minimal hardware requirements. It runs well on GPUs with as little as 6 GB of VRAM when quantized.
MiMo V2.5 Pro
XiaomiMiMo · 1023.2B
MiMo V2.5 Pro is a 1023.2B-parameter open language model from XiaomiMiMo. It supports a context window of up to 1,048,576 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Phi 1.5
Microsoft · 1.4B
Phi 1.5 is a 1.4B-parameter open language model from Microsoft in the Phi family. It supports a context window of up to 2,048 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Phi 4 Mini Reasoning
Microsoft · 3.8B
Phi 4 Mini Reasoning is a 3.8B-parameter open language model from Microsoft in the Phi 4 family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
DeepSeek Coder 1.3B Instruct
DeepSeek · 1.3B
DeepSeek Coder 1.3B Instruct is an ultra-compact code model designed for environments where hardware resources are extremely limited. Despite having just 1.3 billion parameters, it can handle basic code completion, simple generation tasks, and code Q&A across common programming languages. This is one of the smallest viable code models available, capable of running on integrated graphics or very low-end dedicated GPUs. It is well suited for edge deployment, embedded development environments, or as a fast local autocomplete engine where response speed matters more than handling complex multi-file reasoning tasks.
Qwen3 Coder 480B A35B Instruct
Alibaba · 480.2B
Qwen3 Coder 480B A35B Instruct is Alibaba's largest code-specialized model, a massive 480.2-billion-parameter mixture-of-experts system with roughly 35 billion parameters active per token. This is the most powerful open-weight coding model in the Qwen3 family, designed for professional-grade code generation, analysis, and software engineering tasks. Running this model locally is a serious undertaking that requires multi-GPU server-class hardware with several hundred gigabytes of combined VRAM. For users with access to such infrastructure, it offers exceptional code quality and understanding that rivals leading proprietary coding assistants, all while keeping data and computation entirely under local control.
Qwen2.5 Coder 0.5B
Alibaba · 494M
Qwen2.5 Coder 0.5B is a 494-million parameter code-specialized model from Alibaba Cloud, the smallest in the Qwen 2.5 Coder series. It is designed for ultra-lightweight deployment where code-aware text generation is needed with minimal hardware resources. The model runs on virtually any GPU and even on CPU-only setups. While limited in capability compared to larger coding models, it is useful for basic code completion, prototyping, and experimentation. It supports a 32K token context window. Released under the Apache 2.0 license.
SQLCoder 7B 2
defog · 6.7B
SQLCoder 7B 2 is a 6.7-billion-parameter model from Defog, purpose-built for converting natural-language questions into SQL queries. Fine-tuned specifically on text-to-SQL tasks, it consistently outperforms much larger general-purpose models when the job is generating accurate, executable SQL against real database schemas. For developers and data analysts who regularly query databases, running SQLCoder locally means fast, private SQL generation without sending proprietary schema details to an external API. It works best when provided with table definitions as context and is particularly strong on PostgreSQL, MySQL, and SQLite dialects.
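As a rough illustration of "table definitions as context", the sketch below assembles a text-to-SQL prompt from a question and a list of `CREATE TABLE` statements. The function name `build_sql_prompt` and the section headings in the prompt are hypothetical choices made for this example, not Defog's official prompt template; check the model page for the format the model was actually trained on.

```python
def build_sql_prompt(question: str, table_definitions: list[str]) -> str:
    """Assemble a text-to-SQL prompt with the schema included as context.

    The layout below is an illustrative assumption, not an official template.
    """
    # Join the CREATE TABLE statements so the model sees the full schema.
    schema = "\n\n".join(d.strip() for d in table_definitions)
    return (
        "### Task\n"
        f"Generate a SQL query to answer this question: {question}\n\n"
        "### Database Schema\n"
        f"{schema}\n\n"
        "### SQL\n"
    )


# Hypothetical schema used only for demonstration.
example_tables = [
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);",
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total NUMERIC);",
]
prompt = build_sql_prompt("What is the total order value per customer?", example_tables)
```

The resulting string can be sent to a locally running model; because the schema never leaves your machine, proprietary table names stay private.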
Phi 4 Reasoning Plus
Microsoft · 14.7B
Phi 4 Reasoning Plus is a 14.7B-parameter open language model from Microsoft in the Phi 4 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
StarCoder
BigCode · 15.8B
StarCoder is a 15.8B-parameter open language model from BigCode in the StarCoder family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
StarCoder2 7B
BigCode · 7.2B
StarCoder2 7B is a 7.2B-parameter open language model from BigCode in the StarCoder family. It supports a context window of up to 16,384 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
LocoOperator 4B
LocoreMind · 4.0B
LocoOperator 4B is a 4.0B-parameter open language model from LocoreMind. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
DeepSeek Coder 1.3B Base
DeepSeek · 1.3B
DeepSeek Coder 1.3B Base is a 1.3B-parameter open language model from DeepSeek in the DeepSeek Coder family. It supports a context window of up to 16,384 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
DeepSeek Coder 7B Instruct V1.5
DeepSeek · 6.9B
DeepSeek Coder 7B Instruct V1.5 is a 6.9B-parameter open language model from DeepSeek in the DeepSeek Coder family. It supports a context window of up to 4,096 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Phi 3 Medium 4k Instruct
Microsoft · 14.0B
Phi 3 Medium 4k Instruct is a 14.0B-parameter open language model from Microsoft in the Phi 3 family. It supports a context window of up to 4,096 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Phi 4 Reasoning
Microsoft · 14.7B
Phi 4 Reasoning is a 14.7B-parameter open language model from Microsoft in the Phi 4 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
IQuest Coder V1 40B Loop Instruct
IQuestLab · 39.8B
IQuest Coder V1 40B Loop Instruct is a 39.8B-parameter open language model from IQuestLab. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
StarCoder2 15B
BigCode · 16.0B
StarCoder2 15B is a 16.0B-parameter open language model from BigCode in the StarCoder family. It supports a context window of up to 16,384 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwen2.5 Coder 14B
Alibaba · 14.8B
Qwen2.5 Coder 14B is a 14.8B-parameter open language model from Alibaba in the Qwen 2.5 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
DeepSeek Coder 33B Instruct
DeepSeek · 33.3B
DeepSeek Coder 33B Instruct is a 33.3B-parameter open language model from DeepSeek in the DeepSeek Coder family. It supports a context window of up to 16,384 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
DeepSeek Coder 1.3B Kexer
JetBrains · 1.3B
DeepSeek Coder 1.3B Kexer is a 1.3B-parameter open language model from JetBrains in the DeepSeek Coder family. It supports a context window of up to 16,384 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
CodeGemma 7B IT
Google · 8.5B
CodeGemma 7B IT is an 8.5B-parameter open language model from Google in the Gemma family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
CodeGemma 2B
Google · 2.5B
CodeGemma 2B is a 2.5B-parameter open language model from Google in the Gemma family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
OmniCoder 9B
Tesslate · 9B
OmniCoder 9B is a 9B-parameter open language model from Tesslate. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
DeepSeek Coder v2 Lite Base
DeepSeek · 15.7B
DeepSeek Coder v2 Lite Base is a 15.7B-parameter open language model from DeepSeek in the DeepSeek Coder family. It supports a context window of up to 163,840 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
DeepHat V1 7B
DeepHat · 7.6B
DeepHat V1 7B is a 7.6B-parameter open language model from DeepHat. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Mellum 4B Base
JetBrains · 4.0B
Mellum 4B Base is a 4.0B-parameter open language model from JetBrains. It supports a context window of up to 8,192 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
StarCoderBase 1B
BigCode · 1.1B
StarCoderBase 1B is a 1.1B-parameter open language model from BigCode in the StarCoder family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Kimi Dev 72B
Moonshot AI · 72.7B
Kimi Dev 72B is Moonshot AI's developer-focused model built on the Qwen2.5-72B architecture, specifically optimized for coding tasks, tool use, and agentic workflows. It combines strong general-purpose chat abilities with specialized developer capabilities, making it a compelling choice for software engineering assistance. At 72 billion parameters it requires substantial hardware, typically needing 40+ GB of VRAM at 4-bit quantization, which puts it in reach of dual consumer GPU setups or single professional cards like the 80 GB A100 or the 48 GB RTX 6000 Ada. If you are primarily looking for a local coding assistant with strong reasoning skills, Kimi Dev is a top-tier option in the 70B class.
OpenCoder 1.5B Base
infly · 1.9B
OpenCoder 1.5B Base is a 1.9B-parameter open language model from infly. It supports a context window of up to 4,096 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Jan Code 4B
janhq · 4.4B
Jan Code 4B is a 4.4B-parameter open language model from janhq. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
OpenCoder 8B Instruct
infly · 7.8B
OpenCoder 8B Instruct is a 7.8B-parameter open language model from infly. It supports a context window of up to 8,192 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
CodeGemma 7B
Google · 8.5B
CodeGemma 7B is an 8.5B-parameter open language model from Google in the Gemma family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwen3 42B A3B 2507 Thinking Abliterated Uncensored TOTAL RECALL v2 Medium MASTER CODER
DavidAU · 42.4B
Qwen3 42B A3B 2507 Thinking Abliterated Uncensored TOTAL RECALL v2 Medium MASTER CODER is a 42.4B-parameter open language model from DavidAU in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Jan v3 4B Base Instruct
janhq · 4.4B
Jan v3 4B Base Instruct is a 4.4B-parameter open language model from janhq. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Granite 8B Code Instruct 128k
IBM · 8.1B
Granite 8B Code Instruct 128k is an 8.1B-parameter open language model from IBM. It supports a context window of up to 128,000 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Soren 1 Small
syntropy-ai · 1.9B
Soren 1 Small is a 1.9B-parameter open language model from syntropy-ai. It supports a context window of up to 1,048,576 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwen2.5 Coder 7B Instruct Abliterated
huihui-ai · 7.6B
Qwen2.5 Coder 7B Instruct Abliterated is a 7.6B-parameter open language model from huihui-ai in the Qwen 2.5 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwopus3.5 4B Coder
Jackrong · 4.7B
Qwopus3.5 4B Coder is a 4.7B-parameter open language model from Jackrong. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
CodeLlama 7B Instruct HF
Meta · 6.7B
CodeLlama 7B Instruct HF is a 6.7B-parameter open language model from Meta in the Code Llama family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
LocoTrainer 4B
LocoreMind · 4.0B
LocoTrainer 4B is a 4.0B-parameter open language model from LocoreMind. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Phi 4 Abliterated
huihui-ai · 14.7B
Phi 4 Abliterated is a 14.7B-parameter open language model from huihui-ai in the Phi 4 family. It supports a context window of up to 16,384 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
OpenCoder 8B Base
infly · 7.8B
OpenCoder 8B Base is a 7.8B-parameter open language model from infly. It supports a context window of up to 8,192 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Phi 4 Mini Flash Reasoning
Microsoft · 3.9B
Phi 4 Mini Flash Reasoning is a 3.9B-parameter open language model from Microsoft in the Phi 4 family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Sweep Next Edit v2 7B
sweepai · 7.6B
Sweep Next Edit v2 7B is a 7.6B-parameter open language model from sweepai. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
VibeThinker 1.5B
WeiboAI · 1.8B
VibeThinker 1.5B is a 1.8B-parameter open language model from WeiboAI. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
MiMo V2.5 Pro Base
XiaomiMiMo · 1023.2B
MiMo V2.5 Pro Base is a 1023.2B-parameter open language model from XiaomiMiMo. It supports a context window of up to 1,048,576 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
OpenReasoning Nemotron 32B
NVIDIA · 32.8B
OpenReasoning Nemotron 32B is a 32.8B-parameter open language model from NVIDIA. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Stable DiffCoder 8B Instruct
ByteDance-Seed · 8.3B
Stable DiffCoder 8B Instruct is an 8.3B-parameter open language model from ByteDance-Seed. It supports a context window of up to 8,192 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Huihui Qwen3 Coder 30B A3B Instruct Abliterated
huihui-ai · 30.5B
Huihui Qwen3 Coder 30B A3B Instruct Abliterated is a 30.5B-parameter open language model from huihui-ai in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Steelman 14B Ada
the-clanker-lover · 14B
Steelman 14B Ada is a 14B-parameter open language model from the-clanker-lover. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Kai 30B Instruct
NoesisLab · 32.8B
Kai 30B Instruct is a 32.8B-parameter open language model from NoesisLab. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Phi 4 Quantized.w8a8
RedHatAI · 14.7B
Phi 4 Quantized.w8a8 is a 14.7B-parameter open language model from RedHatAI in the Phi 4 family. It supports a context window of up to 16,384 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Dhara 70M
codelion · 71M
Dhara 70M is a 71M-parameter open language model from codelion. It supports a context window of up to 1,024 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Nerdsking Python Coder 7B I
Nerdsking · 7B
Nerdsking Python Coder 7B I is a 7B-parameter open language model from Nerdsking. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwen3 Code Reasoning 4B
GetSoloTech · 4B
Qwen3 Code Reasoning 4B is a 4B-parameter open language model from GetSoloTech in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
SmolLM2 70M
codelion · 69M
SmolLM2 70M is a 69M-parameter open language model from codelion. It supports a context window of up to 8,192 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Mistral Small 3.2 24B Qiskit
Qiskit · 24.0B
Mistral Small 3.2 24B Qiskit is a 24.0B-parameter open language model from Qiskit in the Mistral family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
NousCoder 14B
Nous Research · 14.8B
NousCoder 14B is a 14.8B-parameter open language model from Nous Research. It supports a context window of up to 81,920 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
OpenCodeReasoning Nemotron 1.1 32B
NVIDIA · 32.8B
OpenCodeReasoning Nemotron 1.1 32B is a 32.8B-parameter open language model from NVIDIA. It supports a context window of up to 65,536 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
NextCoder 7B
Microsoft · 7.6B
NextCoder 7B is a 7.6B-parameter open language model from Microsoft. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
XiYanSQL QwenCoder 32B 2504
XGenerationLab · 32B
XiYanSQL QwenCoder 32B 2504 is a 32B-parameter open language model from XGenerationLab in the Qwen family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
How much VRAM do you need for a local coding model?
A coding model's hardware needs scale with parameter count and quantization. A 7B model runs comfortably on an 8 GB GPU at Q4_K_M; a 32B coder wants ~20 GB (a 24 GB card like the RTX 3090/4090); and mixture-of-experts coders like Qwen3-Coder-30B-A3B run far faster than their total size suggests because only a few billion parameters are active per token. Apple Silicon Macs with 32 GB+ of unified memory are excellent for larger coders. Open any model to see its full VRAM-by-quantization table and the exact hardware that fits.
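The scaling described above can be approximated with simple arithmetic: weight memory is roughly total parameters times bits per weight, divided by eight. The sketch below is a back-of-the-envelope estimate, not the calculation used for the per-model tables. The bits-per-weight values are approximate, and the 1.5 GB allowance for KV cache and runtime buffers is an assumption chosen for illustration; real usage grows with context length.

```python
# Approximate effective bits per weight for common GGUF quantizations.
# These are assumed round figures; actual values vary slightly per model.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q8_0": 8.5,
    "F16": 16.0,
}


def weight_memory_gb(total_params_billions: float, quant: str) -> float:
    """Memory needed to hold the weights alone, in GB (1 GB = 1e9 bytes)."""
    return total_params_billions * BITS_PER_WEIGHT[quant] / 8.0


def fits(total_params_billions: float, quant: str, vram_gb: float,
         overhead_gb: float = 1.5) -> bool:
    """True if weights plus a flat allowance for KV cache and buffers fit.

    overhead_gb is a hypothetical fixed allowance, not a measured value.
    """
    return weight_memory_gb(total_params_billions, quant) + overhead_gb <= vram_gb


def active_fraction(total_params_billions: float,
                    active_params_billions: float) -> float:
    """Share of an MoE model's weights used per token.

    Memory follows the total count; per-token compute follows the active count.
    """
    return active_params_billions / total_params_billions


dense_7b = weight_memory_gb(7.6, "Q4_K_M")    # Qwen2.5 Coder 7B
dense_32b = weight_memory_gb(32.8, "Q4_K_M")  # Qwen2.5 Coder 32B
moe_share = active_fraction(30.5, 3.0)        # Qwen3 Coder 30B A3B
```

Under these assumptions a 32.8B model at Q4_K_M needs about 19.9 GB for weights, in line with the ~20 GB figure above, while a 30.5B MoE with about 3B active parameters touches roughly a tenth of its weights per token yet still has to keep all of them in memory.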
Frequently Asked Questions
- What is the best local LLM for coding in 2026?
For most setups, Qwen2.5-Coder (7B for 8 GB GPUs, 32B for 24 GB) and Qwen3-Coder-30B-A3B are the strongest open coding models you can run locally. Microsoft's Phi-4 is a great smaller all-rounder. The right pick depends on your VRAM, so open any model in the list above to confirm it fits your hardware.
- Can I run a coding LLM on my laptop?
Yes. 1.5B–7B coding models (e.g. Qwen2.5-Coder 1.5B/7B) run on laptops with 8–16 GB of RAM or VRAM via Ollama or LM Studio. MoE models with low active-parameter counts also generate quickly on modest hardware, provided their full weights fit in memory.
- Are local coding models as good as cloud models?
The best open coders (32B+ and large MoE) now rival mid-tier cloud models for everyday coding, autocomplete, and refactoring, while running fully offline and privately. Very large frontier models still lead on the hardest tasks, but the gap keeps closing.
- How do I run these coding models?
Install Ollama or LM Studio, then pull the model — for example `ollama run qwen2.5-coder:7b`. Each model page lists the exact install command and the GPUs and Macs that can run it at each quantization level.
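Once a model is pulled, Ollama also serves it over a local HTTP API, which is handy for scripting. The sketch below is a minimal example using only the Python standard library. It assumes Ollama is running on its default port (11434) and that the `qwen2.5-coder:7b` tag from the command above has already been pulled; the helper names `build_request` and `generate` are made up for this example.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumed unchanged).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Example call (requires a running Ollama server with the model pulled):
# print(generate("qwen2.5-coder:7b", "Write a Python function that reverses a string."))
```

Setting `"stream": False` returns one JSON object instead of a stream of chunks, which keeps the example short; editor integrations usually prefer streaming.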