Best Local Reasoning LLMs in 2026
Reasoning models think step-by-step before answering, excelling at math, logic, and multi-step problems. Thanks to distillation, you no longer need a data center to run them: compact distills of DeepSeek R1 and models like QwQ-32B bring chain-of-thought reasoning to a single consumer GPU. Below are the open-weight reasoning models you can run locally, ranked by popularity — pick one to see the exact hardware that runs it.
74 Reasoning Models You Can Run Locally
DeepSeek R1 0528
DeepSeek · 684.5B
DeepSeek R1 0528 is an updated release of the R1 reasoning model, incorporating improvements to training and inference that sharpen its performance on complex multi-step problems. It retains the same 684.5 billion parameter mixture-of-experts architecture as the original R1, with approximately 37 billion parameters active per forward pass. This revision addresses several edge cases where the original R1 struggled, delivering more consistent reasoning chains and fewer hallucinations on difficult math and coding tasks. Hardware requirements remain identical to the original R1, so users already set up to run the first version can swap in the 0528 weights with no changes to their infrastructure.
DeepSeek R1
DeepSeek · 684.5B
DeepSeek R1 is a groundbreaking reasoning model that uses large-scale reinforcement learning, rather than supervised fine-tuning alone, to develop its chain-of-thought capabilities. With 684.5 billion total parameters in a mixture-of-experts architecture (only 37 billion active per token), R1 achieves performance competitive with OpenAI's o1 on math, coding, and complex reasoning benchmarks while remaining fully open-weight. Running the full R1 locally is a serious undertaking: at 8 bits per parameter the weights alone take roughly 685 GB, and even 4-bit quantization leaves well over 300 GB, which puts it in multi-GPU or high-memory territory. For users who want R1-level reasoning on more modest hardware, DeepSeek also released a family of distilled models that pack R1's reasoning patterns into smaller dense architectures.
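The gap between total and active parameters is what makes a mixture-of-experts model awkward to size: every expert has to be resident in memory, but only the active parameters are read for each token. The following is a rough sketch of that arithmetic, not a measurement. The function name `moe_footprint_gb` is hypothetical, the bytes-per-weight values (1.0 for 8-bit, 0.5 for 4-bit) are simplifying assumptions, and KV cache and runtime overhead are ignored.

```python
def moe_footprint_gb(total_params_b: float, active_params_b: float,
                     bytes_per_weight: float) -> dict:
    """Rough weight-memory figures for a mixture-of-experts model.

    Parameter counts are in billions, so multiplying by bytes per weight
    gives decimal gigabytes directly.
    """
    return {
        # All experts must be loaded, so residency scales with total parameters.
        "resident_gb": total_params_b * bytes_per_weight,
        # Only the routed (active) parameters are read for each generated token.
        "read_per_token_gb": active_params_b * bytes_per_weight,
    }


# Figures from the entry above: 684.5B total, about 37B active per token.
for label, bytes_per_weight in [("8-bit", 1.0), ("4-bit", 0.5)]:
    est = moe_footprint_gb(684.5, 37.0, bytes_per_weight)
    print(f"{label}: ~{est['resident_gb']:.0f} GB resident, "
          f"~{est['read_per_token_gb']:.1f} GB read per token")
```

Under these assumptions the resident footprint stays in the hundreds of gigabytes at either precision, while the per-token read is closer to that of a mid-sized dense model.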
DeepSeek R1 Distill Qwen 1.5B
DeepSeek · 1.8B
DeepSeek R1 Distill Qwen 1.5B is the smallest model in the R1 distillation family, packing chain-of-thought reasoning capabilities into a nominal 1.5 billion parameters (about 1.8 billion in total, as listed above) using the Qwen 2.5 architecture. It represents an ambitious attempt to bring structured reasoning to the smallest practical model size. At this scale, the model can run on virtually any modern GPU and even on CPU-only setups with acceptable speed. While its reasoning depth is naturally limited compared to its larger siblings, it still demonstrates structured thinking patterns that set it apart from generic models of similar size.
DeepSeek R1 Distill Qwen 14B
DeepSeek · 14.8B
DeepSeek R1 Distill Qwen 14B sits in a sweet spot between the smaller 7B distill and the more demanding 32B version, offering strong reasoning performance at 14.8 billion parameters on the Qwen 2.5 architecture. It captures a meaningful share of the full R1's chain-of-thought capabilities while keeping resource requirements within the range of mainstream consumer GPUs. Quantized to 4-bit, it fits comfortably on GPUs with 12 GB of VRAM, delivering reliable step-by-step reasoning for math, logic, and analytical problems.
DeepSeek R1 Distill Qwen 32B
DeepSeek · 32.8B
DeepSeek R1 Distill Qwen 32B takes the reasoning capabilities developed in the full 684.5B R1 model and distills them into the 32.8 billion parameter Qwen 2.5 architecture. The result is a dense model that punches well above its weight class on math, science, and coding reasoning tasks, often matching models two to three times its size. Quantized to 4-bit precision, it fits on a single high-end consumer GPU with 24 GB of VRAM, making it one of the most capable reasoning models you can run on a desktop workstation.
DeepSeek R1 Distill Qwen 7B
DeepSeek · 7.6B
DeepSeek R1 Distill Qwen 7B compresses the reasoning techniques from DeepSeek's full R1 model into a compact 7.6 billion parameter dense model built on the Qwen 2.5 architecture. Despite its small footprint, it demonstrates surprisingly capable step-by-step reasoning on math and logic problems that would stump many models several times its size. This is one of the most accessible reasoning models available for local use, fitting comfortably on GPUs with 6 GB or more of VRAM when quantized. It strikes a practical balance between genuine chain-of-thought reasoning ability and the hardware constraints of a typical consumer setup.
DeepSeek R1 Distill Llama 8B
DeepSeek · 8.0B
DeepSeek R1 Distill Llama 8B brings R1's reinforcement-learned reasoning capabilities to the widely supported Llama 3.1 8B architecture. By distilling the full 684.5B R1 model's reasoning patterns into this 8 billion parameter dense model, DeepSeek created a version that benefits from the extensive Llama ecosystem of tools, quantizations, and inference engines. For users who prefer the Llama architecture or already have tooling built around it, this model offers a plug-and-play path to chain-of-thought reasoning. Its hardware requirements are very approachable, running well on consumer GPUs with 8 GB or more of VRAM at common quantization levels.
DeepSeek R1 0528 Qwen3 8B
DeepSeek · 8.2B
DeepSeek R1 0528 Qwen3 8B is an 8.2B-parameter open language model from DeepSeek in the Qwen family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
DeepSeek R1 Distill Llama 70B
DeepSeek · 70B
DeepSeek R1 Distill Llama 70B is the largest model in the R1 distillation lineup, combining the reasoning capabilities developed in the full 684.5B R1 with the robust Llama 3.1 70B architecture. At 70 billion parameters, it delivers the strongest reasoning performance of any dense R1 distill, approaching the full R1's quality on many math and coding benchmarks. Running this model locally requires a multi-GPU setup or a single GPU with very high VRAM capacity, though quantized versions can fit on hardware with 48 GB or more. For users who need top-tier open-weight reasoning and have the hardware to support a 70B dense model, this is one of the strongest options available.
Qwen3.5 27B Claude 4.6 Opus Reasoning Distilled
Jackrong · 27.8B
This is the full-precision version of Jackrong's Qwen3.5 27B reasoning distillation from Claude 4.6 Opus. With 27.8 billion parameters in unquantized form, this model preserves the maximum quality from the distillation process but requires significantly more VRAM, typically 56 GB or more in BF16. It is primarily intended for users with professional-grade GPUs or multi-GPU setups. This variant is ideal for further fine-tuning, experimentation, or running at full fidelity when hardware allows. Most users looking to run the model locally for inference should consider the GGUF-quantized version instead, which offers a much better tradeoff between quality and resource usage.
VulnLLM R 7B
UCSB-SURFI · 7.6B
VulnLLM R 7B is a security-focused model developed by UCSB-SURFI, built on the Qwen2.5-7B base and fine-tuned specifically for vulnerability analysis and security reasoning. With 7.6 billion parameters, it targets tasks like identifying code vulnerabilities, explaining security flaws, and reasoning about attack vectors. This model fills a niche for security researchers and developers who want a locally hosted assistant for code auditing and vulnerability assessment without sending sensitive code to external APIs. Its specialized training gives it an edge over general-purpose models on security-related tasks, though it is not a replacement for professional security tools. It runs on consumer GPUs with 8 GB of VRAM at typical quantization levels.
MN 12B Mag Mell R1
inflatebot · 12.2B
MN 12B Mag Mell R1 is a 12.2B-parameter open language model from inflatebot. It supports a context window of up to 1,024,000 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
QwQ 32B
Alibaba · 32.8B
QwQ 32B is a 32.8-billion-parameter reasoning-focused model from Alibaba Cloud's Qwen family. Unlike standard chat models, QwQ is specifically optimized for step-by-step logical reasoning, complex problem solving, and mathematical tasks. It employs extended chain-of-thought processing, generating detailed internal reasoning before producing final answers, which significantly improves accuracy on challenging analytical problems. The model requires a GPU with at least 24 GB of VRAM for quantized inference and delivers reasoning performance competitive with much larger models. It is particularly well suited for users who need strong analytical capabilities for math, science, coding logic, and multi-step problem solving. Released under the Apache 2.0 license.
Phi 4 Mini Reasoning
Microsoft · 3.8B
Phi 4 Mini Reasoning is a 3.8B-parameter open language model from Microsoft in the Phi 4 family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Nemotron Cascade 2 30B A3B
NVIDIA · 31.6B
Nemotron Cascade 2 30B A3B is a 31.6B-parameter open language model from NVIDIA. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Hermes 4 14B
Nous Research · 14B
Hermes 4 14B is a 14B-parameter open language model from Nous Research in the Hermes family. It supports a context window of up to 40,960 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
DeepSeek R1 Distill Qwen 1.5B
litert-community · 1.5B
DeepSeek R1 Distill Qwen 1.5B is a 1.5B-parameter open language model from litert-community in the Qwen family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Nemotron Cascade 8B
NVIDIA · 8B
Nemotron Cascade 8B is an 8B-parameter open language model from NVIDIA. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Hermes 4.3 36B
Nous Research · 36.2B
Hermes 4.3 36B is a 36.2B-parameter open language model from Nous Research in the Hermes family. It supports a context window of up to 524,288 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Phi 4 Reasoning Plus
Microsoft · 14.7B
Phi 4 Reasoning Plus is a 14.7B-parameter open language model from Microsoft in the Phi 4 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
QwQ 32B Preview
Alibaba · 32.8B
QwQ 32B Preview is a 32.8B-parameter open language model from Alibaba in the QwQ family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Ouro 2.6B Thinking
ByteDance · 2.6B
Ouro 2.6B Thinking is a 2.6B-parameter open language model from ByteDance. It supports a context window of up to 65,536 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Huihui Qwen3.6 35B A3B Claude 4.7 Opus Abliterated
huihui-ai · 36.0B
Huihui Qwen3.6 35B A3B Claude 4.7 Opus Abliterated is a 36.0B-parameter open language model from huihui-ai in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Trinity Large Thinking
Arcee AI · 398.6B
Trinity Large Thinking is a 398.6B-parameter open language model from Arcee AI. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwen3.6 35B A3B Claude 4.7 Opus Reasoning Distilled
lordx64 · 36.0B
Qwen3.6 35B A3B Claude 4.7 Opus Reasoning Distilled is a 36.0B-parameter open language model from lordx64 in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Phi 4 Reasoning
Microsoft · 14.7B
Phi 4 Reasoning is a 14.7B-parameter open language model from Microsoft in the Phi 4 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
DeepSeek R1 Zero
DeepSeek · 684.5B
DeepSeek R1 Zero is a 684.5B-parameter open language model from DeepSeek in the DeepSeek R1 family. It supports a context window of up to 163,840 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwen Marketing
marketeam · 8.2B
Qwen Marketing is an 8.2B-parameter open language model from marketeam in the Qwen family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Tri 21B Think
trillionlabs · 20.7B
Tri 21B Think is a 20.7B-parameter open language model from trillionlabs. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Turkish Gemma 9B T1
ytu-ce-cosmos · 9B
Turkish Gemma 9B T1 is a 9B-parameter open language model from ytu-ce-cosmos in the Gemma family. It supports a context window of up to 8,192 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
DeepSeek R1 Distill Qwen 32B Abliterated
huihui-ai · 32.8B
DeepSeek R1 Distill Qwen 32B Abliterated is a 32.8B-parameter open language model from huihui-ai in the Qwen family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwen3.5 9B Claude 4.6 Opus Reasoning Distilled
Jackrong · 9.7B
Qwen3.5 9B Claude 4.6 Opus Reasoning Distilled is a 9.7B-parameter open language model from Jackrong in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwen3 4B Gemini 3.1 Pro Reasoning Distilled
khazarai · 4B
Qwen3 4B Gemini 3.1 Pro Reasoning Distilled is a 4B-parameter open language model from khazarai in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Nemotron Research Reasoning Qwen 1.5B
NVIDIA · 1.8B
Nemotron Research Reasoning Qwen 1.5B is a 1.8B-parameter open language model from NVIDIA in the Qwen family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
AI21 Jamba Reasoning 3B
AI21 Labs · 3.2B
AI21 Jamba Reasoning 3B is a 3.2B-parameter open language model from AI21 Labs. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwen3.5 2B Claude 4.6 Opus Reasoning Distilled
Jackrong · 2.3B
Qwen3.5 2B Claude 4.6 Opus Reasoning Distilled is a 2.3B-parameter open language model from Jackrong in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwen3.5 4B Safety Thinking
MerlinSafety · 4.2B
Qwen3.5 4B Safety Thinking is a 4.2B-parameter open language model from MerlinSafety in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwen3.5 4B Claude 4.6 Opus Reasoning Distilled
Jackrong · 4.7B
Qwen3.5 4B Claude 4.6 Opus Reasoning Distilled is a 4.7B-parameter open language model from Jackrong in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwen3.5 35B A3B Claude 4.6 Opus Reasoning Distilled
Jackrong · 36.0B
Qwen3.5 35B A3B Claude 4.6 Opus Reasoning Distilled is a 36.0B-parameter open language model from Jackrong in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Nemotron Content Safety Reasoning 4B
NVIDIA · 4.3B
Nemotron Content Safety Reasoning 4B is a 4.3B-parameter open language model from NVIDIA. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Darwin 36B Opus
FINAL-Bench · 34.7B
Darwin 36B Opus is a 34.7B-parameter open language model from FINAL-Bench. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwen3 42B A3B 2507 Thinking Abliterated Uncensored TOTAL RECALL v2 Medium MASTER CODER
DavidAU · 42.4B
Qwen3 42B A3B 2507 Thinking Abliterated Uncensored TOTAL RECALL v2 Medium MASTER CODER is a 42.4B-parameter open language model from DavidAU in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Domyn Small v1.0
domyn · 9.8B
Domyn Small v1.0 is a 9.8B-parameter open language model from domyn. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Soren 1 Small
syntropy-ai · 1.9B
Soren 1 Small is a 1.9B-parameter open language model from syntropy-ai. It supports a context window of up to 1,048,576 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwopus3.5 4B Coder
Jackrong · 4.7B
Qwopus3.5 4B Coder is a 4.7B-parameter open language model from Jackrong. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwen3.5 4B Claude Opus 4.6 Distilled Heretic
ghost-actual · 4.5B
Qwen3.5 4B Claude Opus 4.6 Distilled Heretic is a 4.5B-parameter open language model from ghost-actual in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Phi 4 Mini Flash Reasoning
Microsoft · 3.9B
Phi 4 Mini Flash Reasoning is a 3.9B-parameter open language model from Microsoft in the Phi 4 family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwopus3.5 9B V3.5
Jackrong · 9.7B
Qwopus3.5 9B V3.5 is a 9.7B-parameter open language model from Jackrong. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Supra 50M Reasoning
SupraLabs · 52M
Supra 50M Reasoning is a 52M-parameter open language model from SupraLabs. It supports a context window of up to 1,024 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Gemma 4 12B IT AEON Abliterated K4 BF16
AEON-7 · 12.0B
Gemma 4 12B IT AEON Abliterated K4 BF16 is a 12.0B-parameter open language model from AEON-7 in the Gemma family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
OpenReasoning Nemotron 32B
NVIDIA · 32.8B
OpenReasoning Nemotron 32B is a 32.8B-parameter open language model from NVIDIA. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Darwin 60B DUO
FINAL-Bench · 60B
Darwin 60B DUO is a 60B-parameter open language model from FINAL-Bench. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Nemotron H 8B Reasoning 128K
NVIDIA · 8.1B
Nemotron H 8B Reasoning 128K is an 8.1B-parameter open language model from NVIDIA. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Hermes 4 405B
Nous Research · 405.9B
Hermes 4 405B is a 405.9B-parameter open language model from Nous Research in the Hermes family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
WorldSim Opus 3.6 35B A3B
Gryphe · 35.1B
WorldSim Opus 3.6 35B A3B is a 35.1B-parameter open language model from Gryphe. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwen3.5 9B Gemini 3.1 Pro Reasoning Distill
Jackrong · 9.7B
Qwen3.5 9B Gemini 3.1 Pro Reasoning Distill is a 9.7B-parameter open language model from Jackrong in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Kai 30B Instruct
NoesisLab · 32.8B
Kai 30B Instruct is a 32.8B-parameter open language model from NoesisLab. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Darwin 4B Genesis
FINAL-Bench · 7.5B
Darwin 4B Genesis is a 7.5B-parameter open language model from FINAL-Bench. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Pantheon Reasoning 27B
Gryphe · 27.8B
Pantheon Reasoning 27B is a 27.8B-parameter open language model from Gryphe. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
CyberPal2.0 20B
cyber-pal-security · 20.9B
CyberPal2.0 20B is a 20.9B-parameter open language model from cyber-pal-security. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Nemotron H 47B Reasoning 128K
NVIDIA · 46.8B
Nemotron H 47B Reasoning 128K is a 46.8B-parameter open language model from NVIDIA. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
DeepSeek R1 Distill Qwen 14B Abliterated v2
huihui-ai · 14.8B
DeepSeek R1 Distill Qwen 14B Abliterated v2 is a 14.8B-parameter open language model from huihui-ai in the Qwen family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Aryabhata 2.0
PhysicsWallahAI · 20.9B
Aryabhata 2.0 is a 20.9B-parameter open language model from PhysicsWallahAI. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Hermes 4 70B
Nous Research · 70B
Hermes 4 70B is a 70B-parameter open language model from Nous Research in the Hermes family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Datarus R1 14B Preview
DatarusAI · 14.8B
Datarus R1 14B Preview is a 14.8B-parameter open language model from DatarusAI. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwen3 Code Reasoning 4B
GetSoloTech · 4B
Qwen3 Code Reasoning 4B is a 4B-parameter open language model from GetSoloTech in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Scout 4B
vanta-research · 4.3B
Scout 4B is a 4.3B-parameter open language model from vanta-research. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Turkish Gemma 4B T1 Scout
ytu-ce-cosmos · 4.3B
Turkish Gemma 4B T1 Scout is a 4.3B-parameter open language model from ytu-ce-cosmos in the Gemma family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
LFM2.5 8B A1B Opus Distil
reaperdoesntknow · 8.5B
LFM2.5 8B A1B Opus Distil is an 8.5B-parameter open language model from reaperdoesntknow. It supports a context window of up to 128,000 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
MIST Mini 8B Thinking
olaverse · 8.0B
MIST Mini 8B Thinking is an 8.0B-parameter open language model from olaverse. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
MAI DS R1
Microsoft · 671.0B
MAI DS R1 is a 671.0B-parameter open language model from Microsoft. It supports a context window of up to 163,840 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Qwen3.5 27B Claude 4.6 Opus Reasoning Distilled Heretic v2
llmfan46 · 27.4B
Qwen3.5 27B Claude 4.6 Opus Reasoning Distilled Heretic v2 is a 27.4B-parameter open language model from llmfan46 in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
OpenCodeReasoning Nemotron 1.1 32B
NVIDIA · 32.8B
OpenCodeReasoning Nemotron 1.1 32B is a 32.8B-parameter open language model from NVIDIA. It supports a context window of up to 65,536 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
MobileLLM R1.5 950M
Meta · 950M
MobileLLM R1.5 950M is a 950M-parameter open language model from Meta. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
What hardware do reasoning models need?
Reasoning models generate long chains of thought, so context length and generation speed matter as much as raw VRAM. A distilled 7B–8B reasoner (e.g. DeepSeek-R1-Distill-Qwen-7B) fits an 8 GB GPU; a 32B reasoner like QwQ-32B wants a 24 GB card; and the largest MoE reasoners run best on multi-GPU or high-memory Apple Silicon. Because reasoning models emit many tokens, faster memory bandwidth noticeably improves the experience. Open any model to see its VRAM-by-quantization table and estimated tokens/sec on your hardware.
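The sizing rules above can be approximated with a few lines of arithmetic. This is a minimal sketch, not the method used for the per-model tables. The bits-per-weight figures are rough averages assumed for illustration, the 2 GB headroom for KV cache and runtime is an arbitrary placeholder, and the function names are hypothetical. The speed estimate is only a ceiling for dense models, based on the rule of thumb that generating one token reads every weight once.

```python
# Approximate average bits per weight for common formats. These are
# illustrative assumptions; real files vary by model and quantization recipe.
BITS_PER_WEIGHT = {"BF16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8}


def weight_vram_gb(params_b: float, quant: str) -> float:
    """Estimate VRAM for the weights alone, in decimal GB (params in billions)."""
    return params_b * BITS_PER_WEIGHT[quant] / 8


def fits(params_b: float, quant: str, vram_gb: float,
         headroom_gb: float = 2.0) -> bool:
    """True if the weights plus a fixed headroom fit in the given VRAM."""
    return weight_vram_gb(params_b, quant) + headroom_gb <= vram_gb


def decode_ceiling_tok_s(params_b: float, quant: str,
                         bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/sec for a dense model limited by memory bandwidth."""
    return bandwidth_gb_s / weight_vram_gb(params_b, quant)


for name, params_b, card_gb in [("7.6B distill", 7.6, 8),
                                ("32.8B reasoner", 32.8, 24)]:
    need = weight_vram_gb(params_b, "Q4_K_M")
    ok = fits(params_b, "Q4_K_M", card_gb)
    print(f"{name}: ~{need:.1f} GB at Q4_K_M, fits a {card_gb} GB card: {ok}")
```

With these assumed figures, both examples from the paragraph above fit their cards at Q4_K_M with a little room to spare, while the 32.8B model at BF16 does not come close to fitting in 24 GB.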
Frequently Asked Questions
- What is the best local reasoning LLM in 2026?
DeepSeek-R1 distills (the 7B/8B variants for consumer GPUs) and QwQ-32B are among the strongest reasoning models you can run locally. For maximum capability, larger MoE reasoners exist but need serious hardware. Open any model above to confirm it fits your GPU or Mac.
- Can I run DeepSeek R1 locally?
The full DeepSeek R1 has 684.5B parameters and needs multi-GPU or high-memory hardware even when quantized. Its distilled versions (DeepSeek-R1-Distill-Qwen-7B, -Llama-8B, -Qwen-32B) are designed to run on consumer hardware: the 7B/8B distills fit a single 8–12 GB GPU at Q4_K_M.
- Do reasoning models need more VRAM than regular models?
Not for the weights — VRAM for weights depends on size and quantization like any model. But reasoning models produce long outputs, so allow extra VRAM for the KV cache at long context, and prefer hardware with high memory bandwidth for faster generation.
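To see why long reasoning traces need extra VRAM, it helps to estimate the KV cache directly: the model stores one key and one value vector per layer, per KV head, per token. The sketch below uses that standard formula. The configuration values (64 layers, 8 KV heads, head dimension 128, 16-bit cache) are assumptions chosen to resemble a 32B-class model with grouped-query attention, not figures taken from any model listed here, and `kv_cache_gib` is a hypothetical helper name.

```python
def kv_cache_gib(tokens: int, layers: int, kv_heads: int, head_dim: int,
                 bytes_per_value: int = 2) -> float:
    """KV cache size in GiB for a single sequence."""
    # Factor of 2 covers keys and values stored for every layer and token.
    bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * bytes_per_token / 2**30


# Hypothetical 32B-class configuration (assumed values, for illustration only).
cfg = dict(layers=64, kv_heads=8, head_dim=128)

for tokens in (4_096, 32_768):
    print(f"{tokens:>6} tokens: {kv_cache_gib(tokens, **cfg):.1f} GiB")
# Prints:
#   4096 tokens: 1.0 GiB
#  32768 tokens: 8.0 GiB
```

Under these assumptions the cache grows linearly with context, so a reasoning trace eight times longer needs eight times the cache on top of the weights.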