Best Local LLMs for Coding in 2026

These are the open-weight code models you can run on your own hardware — ranked by real-world popularity. Local coding models keep your codebase private, work offline, and cost nothing per token. For most developers a 7B–32B model at Q4_K_M is the sweet spot: small enough to fit a single consumer GPU, capable enough for autocomplete, refactors, and agentic coding. Pick a model below to see exactly which GPU or Mac runs it and how fast.

77 Coding Models You Can Run Locally

Qwen2.5 Coder 14B Instruct

Alibaba · 14.8B

2.9M 159

Qwen2.5 Coder 14B Instruct is a 14.8B-parameter open language model from Alibaba in the Qwen 2.5 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Qwen2.5 Coder 7B Instruct

Alibaba · 7.6B

2.2M 722

Qwen2.5 Coder 7B Instruct is a 7.6-billion parameter code-specialized instruction-tuned model from Alibaba Cloud. It is trained on a large corpus of source code and natural language, fine-tuned for programming assistance tasks such as code generation, completion, debugging, and code explanation. The model supports a 128K token context window and runs efficiently on consumer GPUs with 8GB or more of VRAM. It provides a good balance between coding capability and hardware requirements for developers looking to run a local coding assistant. Released under the Apache 2.0 license.

Chat · Code

Qwen3 Coder 30B A3B Instruct

Alibaba · 30.5B

2.1M 1.1K

Qwen3 Coder 30B A3B Instruct is a code-specialized Mixture of Experts (MoE) model from Alibaba Cloud's Qwen 3 Coder series, with 30 billion total parameters and approximately 3 billion active parameters per forward pass. The MoE architecture allows it to deliver strong coding performance while keeping per-token compute costs low, making it faster at inference than comparably capable dense models. The model is instruction-tuned for programming assistance, code generation, debugging, and software engineering conversation. It requires VRAM proportional to its total 30B parameter count for loading weights, but benefits from efficient inference throughput due to its low active parameter count. Released under the Apache 2.0 license.

Chat · Code
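The trade-off described for this MoE model can be sketched with a little arithmetic. This is a rough illustration, not a benchmark: the parameter counts come from the description above, while the 4.85 bits-per-weight figure for Q4_K_M and the "about 2 FLOPs per active parameter per token" rule of thumb are assumptions, and the function names are hypothetical.

```python
def weight_memory_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GB. Depends on TOTAL parameters."""
    return total_params_b * bits_per_weight / 8


def gflops_per_token(active_params_b: float) -> float:
    """Rough forward-pass cost per token. Depends on ACTIVE parameters.

    Assumes about 2 FLOPs per active parameter (a common rule of thumb).
    """
    return 2 * active_params_b


# Figures from the description above; bits per weight is an assumed average.
Q4_K_M_BITS = 4.85
moe_total_b, moe_active_b = 30.5, 3.0
dense_b = 32.8  # a dense model of similar total size, for comparison

print(f"MoE weights:    {weight_memory_gb(moe_total_b, Q4_K_M_BITS):.1f} GB")
print(f"Dense weights:  {weight_memory_gb(dense_b, Q4_K_M_BITS):.1f} GB")
print(f"MoE compute:    {gflops_per_token(moe_active_b):.1f} GFLOPs/token")
print(f"Dense compute:  {gflops_per_token(dense_b):.1f} GFLOPs/token")
```

Under these assumptions the two models need a similar amount of memory, but the MoE model does roughly a tenth of the arithmetic per generated token.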

Qwen2.5 Coder 32B Instruct

Alibaba · 32.8B

1.5M 2.0K

Qwen2.5 Coder 32B Instruct is a 32.8-billion parameter code-specialized model from Alibaba Cloud, instruction-tuned for programming assistance and code generation. It is trained on a large corpus of source code alongside natural language data, making it highly capable for tasks such as code completion, debugging, code explanation, and software engineering dialogue. The model supports a 128K token context window and delivers code generation quality competitive with the best open-weight coding models at any scale. It requires a GPU with at least 24GB of VRAM for quantized inference. Released under the Apache 2.0 license.

Chat · Code

Phi 4 Mini Instruct

Microsoft · 3.8B

1.5M 760

Microsoft Phi 4 Mini Instruct is a 3.8-billion parameter instruction-tuned model from Microsoft Research's Phi 4 family. It applies the Phi series' data-centric training philosophy to a compact model, delivering strong performance in coding, reasoning, and chat tasks relative to its small footprint. The model runs on consumer GPUs with as little as 4-6GB of VRAM when quantized, making it accessible on mainstream and even entry-level hardware. Released under the MIT license.

Chat · Code

Qwen3 Coder Next

Alibaba · 79.7B

979.4K 1.4K

Qwen3 Coder Next is a 79.7-billion parameter code-specialized instruction-tuned model from Alibaba Cloud, the next generation of the Qwen Coder series. It is trained extensively on source code and programming-related data, delivering strong performance across code generation, completion, debugging, refactoring, and software engineering dialogue. The model represents a significant step up in coding capability within the Qwen family. Due to its large parameter count, running Qwen3 Coder Next locally requires substantial VRAM, typically 48GB or more at reduced precision, placing it in the territory of professional GPUs or multi-GPU consumer setups. It is a top-tier choice for developers who need the most capable local coding assistant available. Released under the Apache 2.0 license.

Chat · Code

DeepSeek Coder V2 Lite Instruct

DeepSeek · 15.7B

877.2K 606

DeepSeek Coder V2 Lite Instruct is a code-focused mixture-of-experts model with 15.7 billion total parameters, trained to handle both programming tasks and general conversation. It supports a wide range of programming languages and excels at code generation, debugging, explanation, and refactoring. The MoE architecture keeps compute costs manageable despite the model's broad capabilities, and the Lite variant is sized to run on a single consumer GPU. For developers looking for a capable local coding assistant that can also handle general chat, this model offers an appealing combination of code specialization and practical hardware requirements.

Chat · Code

Phi 3.5 Mini Instruct

Microsoft · 3.8B

850.5K 985

Phi 3.5 Mini Instruct is a 3.8B-parameter open language model from Microsoft in the Phi 3 family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Phi 4

Microsoft · 14.7B

845.1K 2.3K

Microsoft Phi 4 is a 14.7-billion parameter language model from Microsoft Research's Phi series, designed to deliver strong reasoning, mathematical, and coding performance at an efficient size. Phi 4 continues the Phi family's focus on maximizing capability per parameter through high-quality training data curation, achieving benchmark scores that rival much larger models on reasoning and STEM tasks. The model runs well on consumer GPUs with 12-16GB of VRAM in quantized formats. It excels at mathematical problem solving, code generation, and structured reasoning. Released under the MIT license.

Chat · Math · Code

Phi 3 Mini 4k Instruct

Microsoft · 3.8B

655.9K 1.4K

Microsoft Phi 3 Mini 4K Instruct is a 3.8-billion parameter instruction-tuned model from Microsoft Research's Phi 3 generation, with a 4K token context window. The Phi 3 family demonstrated that small models trained on carefully curated, high-quality data can achieve performance competitive with models several times their size. The model runs on consumer GPUs with as little as 4-6GB of VRAM when quantized, making it one of the most accessible capable chat models for local deployment. Released under the MIT license.

Chat · Code

Qwen2.5 Coder 1.5B

Alibaba · 1.5B

584.8K 85

Qwen2.5 Coder 1.5B is a 1.5-billion parameter code-specialized model from Alibaba Cloud's Qwen 2.5 Coder series. It is one of the smallest Coder variants, pairing meaningful code generation capability with extremely low resource requirements: it runs on GPUs with as little as 2-4GB of VRAM. The model is suitable for lightweight code completion, simple code generation tasks, and as a compact local coding assistant in resource-constrained environments. It supports a 32K token context window. Released under the Apache 2.0 license.

Chat · Code

Phi 2

Microsoft · 2.8B

510.8K 3.5K

Microsoft Phi 2 is a 2.8-billion parameter language model from Microsoft Research that helped popularize small but highly capable language models. Released in late 2023, Phi 2 demonstrated that strategic data curation and training methodology could allow a sub-3B model to outperform many 7B and 13B models on reasoning and coding benchmarks. The model runs on virtually any modern GPU and even on CPU-only setups. While succeeded by Phi 3 and Phi 4, Phi 2 remains historically significant as a model that showed small-scale language models could be genuinely useful for practical tasks. Released under the MIT license.

Chat · Code

Phi 3 Mini 128k Instruct

Microsoft · 3.8B

248.6K 1.7K

Phi 3 Mini 128k Instruct is a 3.8B-parameter open language model from Microsoft in the Phi 3 family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Qwen2.5 Coder 3B Instruct

Alibaba · 3.1B

219.5K 111

Qwen2.5 Coder 3B Instruct is a 3.1B-parameter open language model from Alibaba in the Qwen 2.5 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Qwen2.5 Coder 7B

Alibaba · 7.6B

205.3K 139

Qwen2.5 Coder 7B is a 7.6-billion parameter code-specialized base (pretrained) model from Alibaba Cloud's Qwen 2.5 Coder series. It is trained on a large dataset of source code and natural language but is not instruction-tuned, making it suitable for fine-tuning, code-related research, and custom downstream applications. The model supports a 128K token context window and runs efficiently on consumer GPUs. It serves as the foundation for the Qwen2.5 Coder 7B Instruct variant and community fine-tunes targeting specific programming languages or workflows. Released under the Apache 2.0 license.

Chat · Code

DeepSeek Coder 6.7B Instruct

DeepSeek · 6.7B

127.0K 481

DeepSeek Coder 6.7B Instruct is a first-generation code-specialized model trained on a large corpus of source code and programming-related data. At 6.7 billion parameters, it provides solid code completion, generation, and explanation capabilities across popular programming languages while remaining small enough to run on most consumer GPUs. While newer models in the DeepSeek lineup have surpassed it in raw capability, this model remains a practical choice for users who need a lightweight local coding assistant with minimal hardware requirements. It runs well on GPUs with as little as 6 GB of VRAM when quantized.

Chat · Code

MiMo V2.5 Pro

XiaomiMiMo · 1023.2B

89.8K 590

MiMo V2.5 Pro is a 1023.2B-parameter open language model from XiaomiMiMo. It supports a context window of up to 1,048,576 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Functions · Code

Phi 1.5

Microsoft · 1.4B

60.9K 1.4K

Phi 1.5 is a 1.4B-parameter open language model from Microsoft in the Phi family. It supports a context window of up to 2,048 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Phi 4 Mini Reasoning

Microsoft · 3.8B

57.2K 231

Phi 4 Mini Reasoning is a 3.8B-parameter open language model from Microsoft in the Phi 4 family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Math · Code · Reasoning

DeepSeek Coder 1.3B Instruct

DeepSeek · 1.3B

46.5K 165

DeepSeek Coder 1.3B Instruct is an ultra-compact code model designed for environments where hardware resources are extremely limited. Despite having just 1.3 billion parameters, it can handle basic code completion, simple generation tasks, and code Q&A across common programming languages. This is one of the smallest viable code models available, capable of running on integrated graphics or very low-end dedicated GPUs. It is well suited for edge deployment, embedded development environments, or as a fast local autocomplete engine where response speed matters more than handling complex multi-file reasoning tasks.

Chat · Code

Qwen3 Coder 480B A35B Instruct

Alibaba · 480.2B

35.6K 1.3K

Qwen3 Coder 480B A35B Instruct is Alibaba's largest code-specialized model, a massive 480.2-billion-parameter mixture-of-experts system with roughly 35 billion parameters active per token. This is the most powerful open-weight coding model in the Qwen3 family, designed for professional-grade code generation, analysis, and software engineering tasks. Running this model locally is a serious undertaking that requires multi-GPU server-class hardware with several hundred gigabytes of combined VRAM. For users with access to such infrastructure, it offers exceptional code quality and understanding that rivals leading proprietary coding assistants, all while keeping data and computation entirely under local control.

Chat · Code

Qwen2.5 Coder 0.5B

Alibaba · 494M

27.0K 56

Qwen2.5 Coder 0.5B is a 494-million parameter code-specialized model from Alibaba Cloud, the smallest in the Qwen 2.5 Coder series. It is designed for ultra-lightweight deployment where code-aware text generation is needed with minimal hardware resources. The model runs on virtually any GPU and even on CPU-only setups. While limited in capability compared to larger coding models, it is useful for basic code completion, prototyping, and experimentation. It supports a 32K token context window. Released under the Apache 2.0 license.

Chat · Code

SQLCoder 7B 2

Defog · 6.7B

26.5K 435

SQLCoder 7B 2 is a 6.7-billion-parameter model from Defog, purpose-built for converting natural-language questions into SQL queries. Fine-tuned specifically on text-to-SQL tasks, it consistently outperforms much larger general-purpose models when the job is generating accurate, executable SQL against real database schemas. For developers and data analysts who regularly query databases, running SQLCoder locally means fast, private SQL generation without sending proprietary schema details to an external API. It works best when provided with table definitions as context and is particularly strong on PostgreSQL, MySQL, and SQLite dialects.

Chat · Code
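Since the description above notes that the model works best with table definitions as context, here is a minimal sketch of assembling such a prompt. The section headings and the `build_sql_prompt` helper are hypothetical, chosen only for illustration; the model card documents the prompt template the model was actually trained on, and that template should take precedence.

```python
def build_sql_prompt(question: str, table_definitions: list[str]) -> str:
    """Combine a natural-language question with CREATE TABLE statements.

    The layout below is an illustrative assumption, not an official template.
    """
    schema = "\n\n".join(ddl.strip() for ddl in table_definitions)
    return (
        "### Task\n"
        f"Write a SQL query that answers this question: {question}\n\n"
        "### Database Schema\n"
        f"{schema}\n\n"
        "### SQL\n"
    )


# Example schema and question (made up for this sketch).
tables = [
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);",
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total NUMERIC);",
]
prompt = build_sql_prompt("What is the total order value per country?", tables)
print(prompt)
```

Keeping the schema in the prompt is what lets a local model reference real table and column names without the schema ever leaving your machine.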

Phi 4 Reasoning Plus

Microsoft · 14.7B

26.4K 343

Phi 4 Reasoning Plus is a 14.7B-parameter open language model from Microsoft in the Phi 4 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Math · Code · Reasoning

StarCoder

BigCode · 15.8B

25.4K 3.0K

StarCoder is a 15.8B-parameter open language model from BigCode in the StarCoder family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

StarCoder2 7B

BigCode · 7.2B

19.8K 210

StarCoder2 7B is a 7.2B-parameter open language model from BigCode in the StarCoder family. It supports a context window of up to 16,384 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

LocoOperator 4B

LocoreMind · 4.0B

17.1K 287

LocoOperator 4B is a 4.0B-parameter open language model from LocoreMind. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code · Functions

DeepSeek Coder 1.3B Base

DeepSeek · 1.3B

16.9K 111

DeepSeek Coder 1.3B Base is a 1.3B-parameter open language model from DeepSeek in the DeepSeek Coder family. It supports a context window of up to 16,384 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

DeepSeek Coder 7B Instruct V1.5

DeepSeek · 6.9B

12.9K 147

DeepSeek Coder 7B Instruct V1.5 is a 6.9B-parameter open language model from DeepSeek in the DeepSeek Coder family. It supports a context window of up to 4,096 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Phi 3 Medium 4k Instruct

Microsoft · 14.0B

11.3K 225

Phi 3 Medium 4k Instruct is a 14.0B-parameter open language model from Microsoft in the Phi 3 family. It supports a context window of up to 4,096 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Phi 4 Reasoning

Microsoft · 14.7B

9.6K 227

Phi 4 Reasoning is a 14.7B-parameter open language model from Microsoft in the Phi 4 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Math · Code · Reasoning

IQuest Coder V1 40B Loop Instruct

IQuestLab · 39.8B

8.6K 322

IQuest Coder V1 40B Loop Instruct is a 39.8B-parameter open language model from IQuestLab. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

StarCoder2 15B

BigCode · 16.0B

8.4K 671

StarCoder2 15B is a 16.0B-parameter open language model from BigCode in the StarCoder family. It supports a context window of up to 16,384 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Qwen2.5 Coder 14B

Alibaba · 14.8B

7.6K 75

Qwen2.5 Coder 14B is a 14.8B-parameter open language model from Alibaba in the Qwen 2.5 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

DeepSeek Coder 33B Instruct

DeepSeek · 33.3B

7.6K 573

DeepSeek Coder 33B Instruct is a 33.3B-parameter open language model from DeepSeek in the DeepSeek Coder family. It supports a context window of up to 16,384 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

DeepSeek Coder 1.3B Kexer

JetBrains · 1.3B

7.3K 8

DeepSeek Coder 1.3B Kexer is a 1.3B-parameter open language model from JetBrains in the DeepSeek Coder family. It supports a context window of up to 16,384 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

CodeGemma 7B IT

Google · 8.5B

6.8K 254

CodeGemma 7B IT is an 8.5B-parameter open language model from Google in the Gemma family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

CodeGemma 2B

Google · 2.5B

5.8K 94

CodeGemma 2B is a 2.5B-parameter open language model from Google in the Gemma family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

OmniCoder 9B

Tesslate · 9B

5.7K 199

OmniCoder 9B is a 9B-parameter open language model from Tesslate. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code · Functions

DeepSeek Coder V2 Lite Base

DeepSeek · 15.7B

4.8K 105

DeepSeek Coder V2 Lite Base is a 15.7B-parameter open language model from DeepSeek in the DeepSeek Coder family. It supports a context window of up to 163,840 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

DeepHat V1 7B

DeepHat · 7.6B

4.6K 145

DeepHat V1 7B is a 7.6B-parameter open language model from DeepHat. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Mellum 4B Base

JetBrains · 4.0B

3.2K 447

Mellum 4B Base is a 4.0B-parameter open language model from JetBrains. It supports a context window of up to 8,192 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

StarCoderBase 1B

BigCode · 1.1B

2.9K 102

StarCoderBase 1B is a 1.1B-parameter open language model from BigCode in the StarCoder family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Kimi Dev 72B

Moonshot AI · 72.7B

2.8K 385

Kimi Dev 72B is Moonshot AI's developer-focused model built on the Qwen2.5-72B architecture, specifically optimized for coding tasks, tool use, and agentic workflows. It combines strong general-purpose chat abilities with specialized developer capabilities, making it a compelling choice for software engineering assistance. At 72 billion parameters it requires substantial hardware, typically needing 40+ GB of VRAM at 4-bit quantization, which puts it in reach of dual 24 GB consumer GPU setups or single professional cards like the 80 GB A100 or the 48 GB RTX 6000 Ada. If you are primarily looking for a local coding assistant with strong reasoning skills, Kimi Dev is a top-tier option in the 70B class.

Chat · Code

OpenCoder 1.5B Base

infly · 1.9B

2.3K 25

OpenCoder 1.5B Base is a 1.9B-parameter open language model from infly. It supports a context window of up to 4,096 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Jan Code 4B

janhq · 4.4B

2.1K 68

Jan Code 4B is a 4.4B-parameter open language model from janhq. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Functions · Code

OpenCoder 8B Instruct

infly · 7.8B

2.0K 203

OpenCoder 8B Instruct is a 7.8B-parameter open language model from infly. It supports a context window of up to 8,192 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

CodeGemma 7B

Google · 8.5B

2.0K 220

CodeGemma 7B is an 8.5B-parameter open language model from Google in the Gemma family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Qwen3 42B A3B 2507 Thinking Abliterated Uncensored TOTAL RECALL v2 Medium MASTER CODER

DavidAU · 42.4B

1.9K 35

Qwen3 42B A3B 2507 Thinking Abliterated Uncensored TOTAL RECALL v2 Medium MASTER CODER is a 42.4B-parameter open language model from DavidAU in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code · Reasoning

Jan v3 4B Base Instruct

janhq · 4.4B

1.9K 59

Jan v3 4B Base Instruct is a 4.4B-parameter open language model from janhq. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Granite 8B Code Instruct 128k

IBM · 8.1B

1.9K 25

Granite 8B Code Instruct 128k is an 8.1B-parameter open language model from IBM. It supports a context window of up to 128,000 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Soren 1 Small

syntropy-ai · 1.9B

1.7K 27

Soren 1 Small is a 1.9B-parameter open language model from syntropy-ai. It supports a context window of up to 1,048,576 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning · Code · Math

Qwen2.5 Coder 7B Instruct Abliterated

huihui-ai · 7.6B

1.6K 14

Qwen2.5 Coder 7B Instruct Abliterated is a 7.6B-parameter open language model from huihui-ai in the Qwen 2.5 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Qwopus3.5 4B Coder

Jackrong · 4.7B

1.4K 11

Qwopus3.5 4B Coder is a 4.7B-parameter open language model from Jackrong. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning · Functions · Code

CodeLlama 7B Instruct HF

Meta · 6.7B

1.3K 61

CodeLlama 7B Instruct HF is a 6.7B-parameter open language model from Meta in the Code Llama family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

LocoTrainer 4B

LocoreMind · 4.0B

1.2K 163

LocoTrainer 4B is a 4.0B-parameter open language model from LocoreMind. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code · Functions

Phi 4 Abliterated

huihui-ai · 14.7B

1.2K 21

Phi 4 Abliterated is a 14.7B-parameter open language model from huihui-ai in the Phi 4 family. It supports a context window of up to 16,384 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Math · Code

OpenCoder 8B Base

infly · 7.8B

1.1K 32

OpenCoder 8B Base is a 7.8B-parameter open language model from infly. It supports a context window of up to 8,192 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Phi 4 Mini Flash Reasoning

Microsoft · 3.9B

1.1K 279

Phi 4 Mini Flash Reasoning is a 3.9B-parameter open language model from Microsoft in the Phi 4 family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Math · Code · Reasoning

Sweep Next Edit v2 7B

sweepai · 7.6B

995 32

Sweep Next Edit v2 7B is a 7.6B-parameter open language model from sweepai. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

VibeThinker 1.5B

WeiboAI · 1.8B

968 524

VibeThinker 1.5B is a 1.8B-parameter open language model from WeiboAI. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Math · Code

MiMo V2.5 Pro Base

XiaomiMiMo · 1023.2B

899 40

MiMo V2.5 Pro Base is a 1023.2B-parameter open language model from XiaomiMiMo. It supports a context window of up to 1,048,576 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Functions · Code

OpenReasoning Nemotron 32B

NVIDIA · 32.8B

702 126

OpenReasoning Nemotron 32B is a 32.8B-parameter open language model from NVIDIA. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code · Reasoning

Stable DiffCoder 8B Instruct

ByteDance-Seed · 8.3B

691 126

Stable DiffCoder 8B Instruct is an 8.3B-parameter open language model from ByteDance-Seed. It supports a context window of up to 8,192 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Huihui Qwen3 Coder 30B A3B Instruct Abliterated

huihui-ai · 30.5B

586 31

Huihui Qwen3 Coder 30B A3B Instruct Abliterated is a 30.5B-parameter open language model from huihui-ai in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Steelman 14B Ada

the-clanker-lover · 14B

524 4

Steelman 14B Ada is a 14B-parameter open language model from the-clanker-lover. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Kai 30B Instruct

NoesisLab · 32.8B

490 21

Kai 30B Instruct is a 32.8B-parameter open language model from NoesisLab. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Math · Reasoning · Code

Phi 4 Quantized.w8a8

RedHatAI · 14.7B

438 3

Phi 4 Quantized.w8a8 is a 14.7B-parameter open language model from RedHatAI in the Phi 4 family. It supports a context window of up to 16,384 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Math · Code

Dhara 70M

codelion · 71M

357 47

Dhara 70M is a 71M-parameter open language model from codelion. It supports a context window of up to 1,024 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Nerdsking Python Coder 7B I

Nerdsking · 7B

311 18

Nerdsking Python Coder 7B I is a 7B-parameter open language model from Nerdsking. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Qwen3 Code Reasoning 4B

GetSoloTech · 4B

284 15

Qwen3 Code Reasoning 4B is a 4B-parameter open language model from GetSoloTech in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code · Reasoning

SmolLM2 70M

codelion · 69M

197 3

SmolLM2 70M is a 69M-parameter open language model from codelion. It supports a context window of up to 8,192 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

Mistral Small 3.2 24B Qiskit

Qiskit · 24.0B

184 7

Mistral Small 3.2 24B Qiskit is a 24.0B-parameter open language model from Qiskit in the Mistral family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

NousCoder 14B

Nous Research · 14.8B

169 209

NousCoder 14B is a 14.8B-parameter open language model from Nous Research. It supports a context window of up to 81,920 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

OpenCodeReasoning Nemotron 1.1 32B

NVIDIA · 32.8B

136 48

OpenCodeReasoning Nemotron 1.1 32B is a 32.8B-parameter open language model from NVIDIA. It supports a context window of up to 65,536 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code · Reasoning

NextCoder 7B

Microsoft · 7.6B

135 33

NextCoder 7B is a 7.6B-parameter open language model from Microsoft. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

XiYanSQL QwenCoder 32B 2504

XGenerationLab · 32B

127 19

XiYanSQL QwenCoder 32B 2504 is a 32B-parameter open language model from XGenerationLab in the Qwen family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code

How much VRAM do you need for a local coding model?

A coding model's hardware needs scale with parameter count and quantization. A 7B model runs comfortably on an 8 GB GPU at Q4_K_M; a 32B coder needs roughly 20 GB, which means a 24 GB card like the RTX 3090/4090; and mixture-of-experts coders like Qwen3-Coder-30B-A3B generate tokens far faster than their total size suggests because only a few billion parameters are active per token, although the full set of weights must still fit in memory. Apple Silicon Macs with 32 GB or more of unified memory are a good fit for larger coders, since the GPU can draw on the same memory pool as the CPU. Open any model to see its full VRAM-by-quantization table and the exact hardware that fits.
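As a back-of-the-envelope check on those numbers, the sketch below estimates memory as parameter count times bits per weight, plus a flat allowance for the KV cache and runtime buffers. The bits-per-weight averages and the 1.5 GB overhead are assumptions for illustration (real usage grows with context length), and the helper names are hypothetical; the per-model tables remain the authoritative figures.

```python
# Approximate average bits per weight for common GGUF quantizations (assumed).
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8_0": 8.5, "F16": 16.0}


def estimate_vram_gb(params_b: float, quant: str = "Q4_K_M",
                     overhead_gb: float = 1.5) -> float:
    """Weights (billions of params * bits / 8) plus a flat overhead, in GB."""
    return params_b * BITS_PER_WEIGHT[quant] / 8 + overhead_gb


def fits(params_b: float, vram_gb: float, quant: str = "Q4_K_M") -> bool:
    """True if the estimate is within the given memory budget."""
    return estimate_vram_gb(params_b, quant) <= vram_gb


# Parameter counts taken from the model list above.
for name, params_b in [("Qwen2.5 Coder 7B", 7.6), ("Qwen2.5 Coder 32B", 32.8)]:
    for budget_gb in (8, 16, 24):
        verdict = "fits" if fits(params_b, budget_gb) else "does not fit"
        print(f"{name} at Q4_K_M {verdict} in {budget_gb} GB "
              f"(estimate: {estimate_vram_gb(params_b):.1f} GB)")
```

For a mixture-of-experts model, pass the total parameter count rather than the active count, because every expert has to be resident in memory.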

Frequently Asked Questions

What is the best local LLM for coding in 2026?

For most setups, Qwen2.5-Coder (7B for 8 GB GPUs, 32B for 24 GB) and Qwen3-Coder-30B-A3B are the strongest open coding models you can run locally. Microsoft's Phi-4 is a great smaller all-rounder. The right pick depends on your VRAM, so open any model above to confirm it fits your hardware.

Can I run a coding LLM on my laptop?

Yes. 1.5B–7B coding models (e.g. Qwen2.5-Coder 1.5B/7B) run on laptops with 8–16 GB of RAM or VRAM via Ollama or LM Studio. MoE models with low active-parameter counts also run well on modest hardware.

Are local coding models as good as cloud models?

The best open coders (32B+ and large MoE) now rival mid-tier cloud models for everyday coding, autocomplete, and refactoring, while running fully offline and privately. Very large frontier models still lead on the hardest tasks, but the gap keeps closing.

How do I run these coding models?

Install Ollama or LM Studio, then pull the model — for example `ollama run qwen2.5-coder:7b`. Each model page lists the exact install command and the GPUs and Macs that can run it at each quantization level.
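Once a model has been pulled, Ollama also serves it over a local HTTP API, which is how editors and scripts usually talk to it. The following is a minimal sketch using only the Python standard library; it assumes Ollama is running on its default port (11434) and that `qwen2.5-coder:7b` is already downloaded. The helper names are hypothetical.

```python
import json
import urllib.request

# Default local endpoint of a running Ollama server (assumed unchanged).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "qwen2.5-coder:7b") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "qwen2.5-coder:7b") -> str:
    """Send the prompt and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]


# Usage (requires a running Ollama server with the model pulled):
# print(generate("Write a Python function that reverses a string."))
```

Setting `stream` to false returns the whole completion in one JSON object, which keeps the example short; interactive tools typically stream tokens instead.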