Best Local Reasoning LLMs in 2026

Reasoning models think step-by-step before answering, excelling at math, logic, and multi-step problems. Thanks to distillation, you no longer need a data center to run them: compact distills of DeepSeek R1 and models like QwQ-32B bring chain-of-thought reasoning to a single consumer GPU. Below are the open-weight reasoning models you can run locally, ranked by popularity — pick one to see the exact hardware that runs it.

74 Reasoning Models You Can Run Locally

DeepSeek R1 0528

DeepSeek · 684.5B

6.3M 2.5K

DeepSeek R1 0528 is an updated release of the R1 reasoning model, incorporating improvements to training and inference that sharpen its performance on complex multi-step problems. It retains the same 684.5 billion parameter mixture-of-experts architecture as the original R1, with approximately 37 billion parameters active per forward pass. This revision addresses several edge cases where the original R1 struggled, delivering more consistent reasoning chains and fewer hallucinations on difficult math and coding tasks. Hardware requirements remain identical to the original R1, so users already set up to run the first version can swap in the 0528 weights with no changes to their infrastructure.

Chat · Reasoning

DeepSeek R1

DeepSeek · 684.5B

5.7M 13.4K

DeepSeek R1 is a groundbreaking reasoning model that develops its chain-of-thought capabilities primarily through large-scale reinforcement learning rather than through supervised fine-tuning alone. With 684.5 billion total parameters in a mixture-of-experts architecture (only 37 billion active per token), R1 achieves performance competitive with OpenAI's o1 on math, coding, and complex reasoning benchmarks while remaining fully open-weight. Running the full R1 locally is a serious undertaking: at one byte per parameter the weights alone come to roughly 685 GB, though quantized versions bring it within reach of large multi-GPU setups. For users who want R1-style reasoning on more modest hardware, DeepSeek also released a family of distilled models that pack R1's reasoning patterns into smaller dense architectures.
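To make the mixture-of-experts trade-off concrete, here is a minimal arithmetic sketch. It uses the 684.5B total and 37B active figures from the description above. The bytes-per-parameter values are nominal assumptions that ignore quantization metadata, the KV cache, and runtime overhead, so treat the output as a rough lower bound rather than a measured requirement.

```python
# Rough weight-memory arithmetic for a mixture-of-experts model.
# TOTAL_B and ACTIVE_B come from the model description; the
# bytes-per-parameter values are nominal assumptions.

TOTAL_B = 684.5   # total parameters, in billions
ACTIVE_B = 37.0   # parameters active per token, in billions

# Assumed storage cost per parameter at each precision.
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}


def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param


def r1_memory_table() -> dict[str, float]:
    """Weight memory for the full model at each assumed precision."""
    return {name: weight_gb(TOTAL_B, b) for name, b in BYTES_PER_PARAM.items()}


# All experts must be resident in memory, but only this fraction
# of the parameters is used for any single token.
active_fraction = ACTIVE_B / TOTAL_B

if __name__ == "__main__":
    for name, gb in r1_memory_table().items():
        print(f"{name}: ~{gb:.0f} GB of weights")
    print(f"active per token: {active_fraction:.1%}")
```

The point of the sketch is that memory scales with the total parameter count while per-token compute scales with the active count, which is why the full model is hard to host yet relatively fast once loaded.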

Chat · Reasoning

DeepSeek R1 Distill Qwen 1.5B

DeepSeek · 1.8B

788.2K 1.5K

DeepSeek R1 Distill Qwen 1.5B is the smallest model in the R1 distillation family, packing chain-of-thought reasoning capabilities into just 1.5 billion parameters using the Qwen 2.5 architecture. It represents an ambitious attempt to bring structured reasoning to the smallest practical model size. At this scale, the model can run on virtually any modern GPU and even on CPU-only setups with acceptable speed. While its reasoning depth is naturally limited compared to its larger siblings, it still demonstrates structured thinking patterns that set it apart from generic models of similar size.

Chat · Reasoning

DeepSeek R1 Distill Qwen 14B

DeepSeek · 14.8B

742.0K 613

DeepSeek R1 Distill Qwen 14B sits in a sweet spot between the smaller 7B distill and the more demanding 32B version, offering strong reasoning performance at 14.8 billion parameters on the Qwen 2.5 architecture. It captures a meaningful share of the full R1's chain-of-thought capabilities while keeping resource requirements within the range of mainstream consumer GPUs. Quantized to 4-bit, it fits comfortably on GPUs with 12 GB of VRAM, delivering reliable step-by-step reasoning for math, logic, and analytical problems.

Chat · Reasoning

DeepSeek R1 Distill Qwen 32B

DeepSeek · 32.8B

561.6K 1.6K

DeepSeek R1 Distill Qwen 32B takes the reasoning capabilities developed in the full 684.5B R1 model and distills them into the 32.8 billion parameter Qwen 2.5 architecture. The result is a dense model that punches well above its weight class on math, science, and coding reasoning tasks, often matching models two to three times its size. Quantized to 4-bit precision, it fits on a single 24 GB consumer GPU, making it one of the most capable reasoning models you can run on a desktop workstation.

Chat · Reasoning

DeepSeek R1 Distill Qwen 7B

DeepSeek · 7.6B

515.5K 842

DeepSeek R1 Distill Qwen 7B compresses the reasoning techniques from DeepSeek's full R1 model into a compact 7.6 billion parameter dense model built on the Qwen 2.5 architecture. Despite its small footprint, it demonstrates surprisingly capable step-by-step reasoning on math and logic problems that would stump many models several times its size. This is one of the most accessible reasoning models available for local use, fitting comfortably on GPUs with 6 GB or more of VRAM when quantized. It strikes a practical balance between genuine chain-of-thought reasoning ability and the hardware constraints of a typical consumer setup.

Chat · Reasoning

DeepSeek R1 Distill Llama 8B

DeepSeek · 8.0B

486.3K 863

DeepSeek R1 Distill Llama 8B brings R1's reinforcement-learned reasoning capabilities to the widely supported Llama 3.1 8B architecture. By distilling the full 684.5B R1 model's reasoning patterns into this 8 billion parameter dense model, DeepSeek created a version that benefits from the extensive Llama ecosystem of tools, quantizations, and inference engines. For users who prefer the Llama architecture or already have tooling built around it, this model offers a plug-and-play path to chain-of-thought reasoning. Its hardware requirements are very approachable, running well on consumer GPUs with 8 GB or more of VRAM at common quantization levels.

Chat · Reasoning

DeepSeek R1 0528 Qwen3 8B

DeepSeek · 8.2B

273.7K 1.1K

DeepSeek R1 0528 Qwen3 8B is an 8.2B-parameter open language model from DeepSeek in the Qwen family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

DeepSeek R1 Distill Llama 70B

DeepSeek · 70B

92.5K 753

DeepSeek R1 Distill Llama 70B is the largest model in the R1 distillation lineup, combining the reasoning capabilities developed in the full 684.5B R1 with the Llama 3.3 70B Instruct base model. At 70 billion parameters, it delivers the strongest reasoning performance of any dense R1 distill, approaching the full R1's quality on many math and coding benchmarks. Running this model locally requires a multi-GPU setup or a single GPU with very high VRAM capacity, though 4-bit quantized versions can fit on hardware with 48 GB or more. For users who need top-tier open-weight reasoning and have the hardware to support a 70B dense model, this is one of the strongest options available.

Chat · Reasoning

Qwen3.5 27B Claude 4.6 Opus Reasoning Distilled

Jackrong · 27.8B

61.6K 695

This is the full-precision version of Jackrong's Qwen3.5 27B reasoning distillation from Claude 4.6 Opus. With 27.8 billion parameters in unquantized form, it preserves the maximum quality from the distillation process but requires significantly more VRAM, typically 56 GB or more in BF16 (two bytes per parameter). It is primarily intended for users with professional-grade GPUs or multi-GPU setups, and it is the right variant for further fine-tuning, experimentation, or running at full fidelity when hardware allows. Most users who only want local inference should consider the GGUF-quantized version instead, which offers a much better tradeoff between quality and resource usage.

Chat · Reasoning

VulnLLM R 7B

UCSB-SURFI · 7.6B

59.7K 179

VulnLLM R 7B is a security-focused model developed by UCSB-SURFI, built on the Qwen2.5-7B base and fine-tuned specifically for vulnerability analysis and security reasoning. With 7.6 billion parameters, it targets tasks like identifying code vulnerabilities, explaining security flaws, and reasoning about attack vectors. This model fills a niche for security researchers and developers who want a locally-hosted assistant for code auditing and vulnerability assessment without sending sensitive code to external APIs. Its specialized training gives it an edge over general-purpose models on security-related tasks, though it is not a replacement for professional security tools. Runs on consumer GPUs with 8 GB of VRAM at typical quantization levels.

Chat · Reasoning

MN 12B Mag Mell R1

inflatebot · 12.2B

59.4K 239

MN 12B Mag Mell R1 is a 12.2B-parameter open language model from inflatebot. It supports a context window of up to 1,024,000 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

QwQ 32B

Alibaba · 32.8B

58.5K 2.9K

QwQ 32B is a 32.8-billion-parameter reasoning-focused model from Alibaba Cloud's Qwen family. Unlike standard chat models, QwQ is specifically optimized for step-by-step logical reasoning, complex problem solving, and mathematical tasks. It employs extended chain-of-thought processing, generating detailed internal reasoning before producing final answers, which significantly improves accuracy on challenging analytical problems. The model requires a GPU with at least 24 GB of VRAM for quantized inference and delivers reasoning performance competitive with much larger models. It is particularly well suited for users who need strong analytical capabilities for math, science, coding logic, and multi-step problem solving. It is released under the Apache 2.0 license.
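Because a reasoning model emits its internal reasoning before the final answer, local tooling usually needs to separate the two. The sketch below assumes the reasoning is delimited by <think> and </think> tags, a convention used by R1-style models. The exact delimiters depend on the model and its chat template, and split_reasoning is a hypothetical helper, not part of any library.

```python
# Split raw model output into (reasoning, answer).
# Assumption: reasoning is wrapped in <think>...</think>. Some chat
# templates pre-fill the opening tag, so only the closing tag is required.

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no closing tag is found."""
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No reasoning block at all: treat the whole output as the answer.
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


sample = "<think>2 + 2 is 4, times 3 is 12.</think>The result is 12."
reasoning, answer = split_reasoning(sample)

if __name__ == "__main__":
    print("reasoning:", reasoning)
    print("answer:", answer)
```

Keeping the reasoning separate lets a front end hide it by default, or drop it from the conversation history to save context.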

Chat · Reasoning

Phi 4 Mini Reasoning

Microsoft · 3.8B

57.2K 231

Phi 4 Mini Reasoning is a 3.8B-parameter open language model from Microsoft in the Phi 4 family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Math · Code · Reasoning

Nemotron Cascade 2 30B A3B

NVIDIA · 31.6B

49.4K 503

Nemotron Cascade 2 30B A3B is a 31.6B-parameter open language model from NVIDIA. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Hermes 4 14B

Nous Research · 14B

37.3K 152

Hermes 4 14B is a 14B-parameter open language model from Nous Research in the Hermes family. It supports a context window of up to 40,960 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning · Roleplay

DeepSeek R1 Distill Qwen 1.5B

litert-community · 1.5B

32.8K 35

DeepSeek R1 Distill Qwen 1.5B is a 1.5B-parameter open language model from litert-community in the Qwen family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Nemotron Cascade 8B

NVIDIA · 8B

31.7K 65

Nemotron Cascade 8B is an 8B-parameter open language model from NVIDIA. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Hermes 4.3 36B

Nous Research · 36.2B

26.7K 233

Hermes 4.3 36B is a 36.2B-parameter open language model from Nous Research in the Hermes family. It supports a context window of up to 524,288 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning · Roleplay

Phi 4 Reasoning Plus

Microsoft · 14.7B

26.4K 343

Phi 4 Reasoning Plus is a 14.7B-parameter open language model from Microsoft in the Phi 4 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Math · Code · Reasoning

QwQ 32B Preview

Alibaba · 32.8B

24.4K 1.7K

QwQ 32B Preview is a 32.8B-parameter open language model from Alibaba in the QwQ family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Ouro 2.6B Thinking

ByteDance · 2.6B

17.2K 96

Ouro 2.6B Thinking is a 2.6B-parameter open language model from ByteDance. It supports a context window of up to 65,536 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Huihui Qwen3.6 35B A3B Claude 4.7 Opus Abliterated

huihui-ai · 36.0B

16.2K 122

Huihui Qwen3.6 35B A3B Claude 4.7 Opus Abliterated is a 36.0B-parameter open language model from huihui-ai in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Trinity Large Thinking

Arcee AI · 398.6B

13.5K 177

Trinity Large Thinking is a 398.6B-parameter open language model from Arcee AI. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Qwen3.6 35B A3B Claude 4.7 Opus Reasoning Distilled

lordx64 · 36.0B

12.7K 172

Qwen3.6 35B A3B Claude 4.7 Opus Reasoning Distilled is a 36.0B-parameter open language model from lordx64 in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Phi 4 Reasoning

Microsoft · 14.7B

9.6K 227

Phi 4 Reasoning is a 14.7B-parameter open language model from Microsoft in the Phi 4 family. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Math · Code · Reasoning

DeepSeek R1 Zero

DeepSeek · 684.5B

8.8K 958

DeepSeek R1 Zero is a 684.5B-parameter open language model from DeepSeek in the DeepSeek R1 family. It supports a context window of up to 163,840 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Qwen Marketing

marketeam · 8.2B

5.7K 41

Qwen Marketing is an 8.2B-parameter open language model from marketeam in the Qwen family. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Tri 21B Think

trillionlabs · 20.7B

5.6K 27

Tri 21B Think is a 20.7B-parameter open language model from trillionlabs. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Turkish Gemma 9B T1

ytu-ce-cosmos · 9B

5.4K 166

Turkish Gemma 9B T1 is a 9B-parameter open language model from ytu-ce-cosmos in the Gemma family. It supports a context window of up to 8,192 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

DeepSeek R1 Distill Qwen 32B Abliterated

huihui-ai · 32.8B

5.3K 241

DeepSeek R1 Distill Qwen 32B Abliterated is a 32.8B-parameter open language model from huihui-ai in the Qwen family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Qwen3.5 9B Claude 4.6 Opus Reasoning Distilled

Jackrong · 9.7B

5.0K 29

Qwen3.5 9B Claude 4.6 Opus Reasoning Distilled is a 9.7B-parameter open language model from Jackrong in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Qwen3 4B Gemini 3.1 Pro Reasoning Distilled

khazarai · 4B

3.6K 2

Qwen3 4B Gemini 3.1 Pro Reasoning Distilled is a 4B-parameter open language model from khazarai in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Nemotron Research Reasoning Qwen 1.5B

NVIDIA · 1.8B

3.3K 242

Nemotron Research Reasoning Qwen 1.5B is a 1.8B-parameter open language model from NVIDIA in the Qwen family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

AI21 Jamba Reasoning 3B

AI21 Labs · 3.2B

2.9K 133

AI21 Jamba Reasoning 3B is a 3.2B-parameter open language model from AI21 Labs. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Qwen3.5 2B Claude 4.6 Opus Reasoning Distilled

Jackrong · 2.3B

2.8K 7

Qwen3.5 2B Claude 4.6 Opus Reasoning Distilled is a 2.3B-parameter open language model from Jackrong in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Qwen3.5 4B Safety Thinking

MerlinSafety · 4.2B

2.8K 10

Qwen3.5 4B Safety Thinking is a 4.2B-parameter open language model from MerlinSafety in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Qwen3.5 4B Claude 4.6 Opus Reasoning Distilled

Jackrong · 4.7B

2.7K 9

Qwen3.5 4B Claude 4.6 Opus Reasoning Distilled is a 4.7B-parameter open language model from Jackrong in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Qwen3.5 35B A3B Claude 4.6 Opus Reasoning Distilled

Jackrong · 36.0B

2.5K 28

Qwen3.5 35B A3B Claude 4.6 Opus Reasoning Distilled is a 36.0B-parameter open language model from Jackrong in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Nemotron Content Safety Reasoning 4B

NVIDIA · 4.3B

2.3K 19

Nemotron Content Safety Reasoning 4B is a 4.3B-parameter open language model from NVIDIA. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Darwin 36B Opus

FINAL-Bench · 34.7B

2.0K 71

Darwin 36B Opus is a 34.7B-parameter open language model from FINAL-Bench. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Qwen3 42B A3B 2507 Thinking Abliterated Uncensored TOTAL RECALL v2 Medium MASTER CODER

DavidAU · 42.4B

1.9K 35

Qwen3 42B A3B 2507 Thinking Abliterated Uncensored TOTAL RECALL v2 Medium MASTER CODER is a 42.4B-parameter open language model from DavidAU in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code · Reasoning

Domyn Small v1.0

domyn · 9.8B

1.9K 15

Domyn Small v1.0 is a 9.8B-parameter open language model from domyn. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Soren 1 Small

syntropy-ai · 1.9B

1.7K 27

Soren 1 Small is a 1.9B-parameter open language model from syntropy-ai. It supports a context window of up to 1,048,576 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning · Code · Math

Qwopus3.5 4B Coder

Jackrong · 4.7B

1.4K 11

Qwopus3.5 4B Coder is a 4.7B-parameter open language model from Jackrong. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning · Functions · Code

Qwen3.5 4B Claude Opus 4.6 Distilled Heretic

ghost-actual · 4.5B

1.4K 3

Qwen3.5 4B Claude Opus 4.6 Distilled Heretic is a 4.5B-parameter open language model from ghost-actual in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Phi 4 Mini Flash Reasoning

Microsoft · 3.9B

1.1K 279

Phi 4 Mini Flash Reasoning is a 3.9B-parameter open language model from Microsoft in the Phi 4 family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Math · Code · Reasoning

Qwopus3.5 9B V3.5

Jackrong · 9.7B

889 25

Qwopus3.5 9B V3.5 is a 9.7B-parameter open language model from Jackrong. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning · Functions

Supra 50M Reasoning

SupraLabs · 52M

808 30

Supra 50M Reasoning is a 52M-parameter open language model from SupraLabs. It supports a context window of up to 1,024 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Gemma 4 12B IT AEON Abliterated K4 BF16

AEON-7 · 12.0B

711 20

Gemma 4 12B IT AEON Abliterated K4 BF16 is a 12.0B-parameter open language model from AEON-7 in the Gemma family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning · Functions

OpenReasoning Nemotron 32B

NVIDIA · 32.8B

702 126

OpenReasoning Nemotron 32B is a 32.8B-parameter open language model from NVIDIA. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code · Reasoning

Darwin 60B DUO

FINAL-Bench · 60B

669 32

Darwin 60B DUO is a 60B-parameter open language model from FINAL-Bench. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Nemotron H 8B Reasoning 128K

NVIDIA · 8.1B

628 26

Nemotron H 8B Reasoning 128K is an 8.1B-parameter open language model from NVIDIA. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Hermes 4 405B

Nous Research · 405.9B

546 85

Hermes 4 405B is a 405.9B-parameter open language model from Nous Research in the Hermes family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning · Roleplay

WorldSim Opus 3.6 35B A3B

Gryphe · 35.1B

521 24

WorldSim Opus 3.6 35B A3B is a 35.1B-parameter open language model from Gryphe. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Roleplay · Reasoning

Qwen3.5 9B Gemini 3.1 Pro Reasoning Distill

Jackrong · 9.7B

499 3

Qwen3.5 9B Gemini 3.1 Pro Reasoning Distill is a 9.7B-parameter open language model from Jackrong in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Kai 30B Instruct

NoesisLab · 32.8B

490 21

Kai 30B Instruct is a 32.8B-parameter open language model from NoesisLab. It supports a context window of up to 32,768 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Math · Reasoning · Code

Darwin 4B Genesis

FINAL-Bench · 7.5B

447 41

Darwin 4B Genesis is a 7.5B-parameter open language model from FINAL-Bench. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Pantheon Reasoning 27B

Gryphe · 27.8B

415 22

Pantheon Reasoning 27B is a 27.8B-parameter open language model from Gryphe. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Roleplay · Reasoning

CyberPal2.0 20B

cyber-pal-security · 20.9B

403 8

CyberPal2.0 20B is a 20.9B-parameter open language model from cyber-pal-security. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Nemotron H 47B Reasoning 128K

NVIDIA · 46.8B

372 21

Nemotron H 47B Reasoning 128K is a 46.8B-parameter open language model from NVIDIA. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

DeepSeek R1 Distill Qwen 14B Abliterated v2

huihui-ai · 14.8B

361 150

DeepSeek R1 Distill Qwen 14B Abliterated v2 is a 14.8B-parameter open language model from huihui-ai in the Qwen family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Aryabhata 2.0

PhysicsWallahAI · 20.9B

331 3

Aryabhata 2.0 is a 20.9B-parameter open language model from PhysicsWallahAI. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Hermes 4 70B

Nous Research · 70B

321 174

Hermes 4 70B is a 70B-parameter open language model from Nous Research in the Hermes family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning · Roleplay

Datarus R1 14B Preview

DatarusAI · 14.8B

289 141

Datarus R1 14B Preview is a 14.8B-parameter open language model from DatarusAI. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Qwen3 Code Reasoning 4B

GetSoloTech · 4B

284 15

Qwen3 Code Reasoning 4B is a 4B-parameter open language model from GetSoloTech in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code · Reasoning

Scout 4B

vanta-research · 4.3B

263 18

Scout 4B is a 4.3B-parameter open language model from vanta-research. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning · Roleplay

Turkish Gemma 4B T1 Scout

ytu-ce-cosmos · 4.3B

234 9

Turkish Gemma 4B T1 Scout is a 4.3B-parameter open language model from ytu-ce-cosmos in the Gemma family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Functions · Reasoning

LFM2.5 8B A1B Opus Distil

reaperdoesntknow · 8.5B

229 4

LFM2.5 8B A1B Opus Distil is an 8.5B-parameter open language model from reaperdoesntknow. It supports a context window of up to 128,000 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

MIST Mini 8B Thinking

olaverse · 8.0B

201 2

MIST Mini 8B Thinking is an 8.0B-parameter open language model from olaverse. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

MAI DS R1

Microsoft · 671.0B

181 296

MAI DS R1 is a 671.0B-parameter open language model from Microsoft. It supports a context window of up to 163,840 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

Qwen3.5 27B Claude 4.6 Opus Reasoning Distilled Heretic v2

llmfan46 · 27.4B

164 3

Qwen3.5 27B Claude 4.6 Opus Reasoning Distilled Heretic v2 is a 27.4B-parameter open language model from llmfan46 in the Qwen family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

OpenCodeReasoning Nemotron 1.1 32B

NVIDIA · 32.8B

136 48

OpenCodeReasoning Nemotron 1.1 32B is a 32.8B-parameter open language model from NVIDIA. It supports a context window of up to 65,536 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Code · Reasoning

MobileLLM R1.5 950M

Meta · 950M

56 19

MobileLLM R1.5 950M is a 950M-parameter open language model from Meta. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.

Chat · Reasoning

What hardware do reasoning models need?

Reasoning models generate long chains of thought, so context length and generation speed matter as much as raw VRAM. A distilled 7B–8B reasoner (e.g. DeepSeek-R1-Distill-Qwen-7B) fits an 8 GB GPU; a 32B reasoner like QwQ-32B wants a 24 GB card; and the largest MoE reasoners run best on multi-GPU or high-memory Apple Silicon. Because reasoning models emit many tokens, faster memory bandwidth noticeably improves the experience. Open any model to see its VRAM-by-quantization table and estimated tokens/sec on your hardware.
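The link between memory bandwidth and generation speed can be sketched with a common rule of thumb: during decoding, each new token requires reading roughly all of the active weights once, so bandwidth divided by weight size gives an upper bound on tokens per second. The bandwidth figures and the 4,000-token chain length below are hypothetical illustrations, not measurements of any specific GPU or model, and real throughput will be lower.

```python
# Upper-bound estimate of decode speed for a memory-bandwidth-bound model.
# Rule of thumb (an approximation): tokens/sec <= bandwidth / weight bytes.

def max_tokens_per_sec(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Theoretical ceiling on generated tokens per second."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


def seconds_for_output(n_tokens: int, bandwidth_gb_s: float, weights_gb: float) -> float:
    """Best-case time to emit n_tokens at the bandwidth ceiling."""
    return n_tokens / max_tokens_per_sec(bandwidth_gb_s, weights_gb)


if __name__ == "__main__":
    weights_gb = 32.8 * 0.5          # a 32.8B model at a nominal 4 bits per weight
    for bandwidth in (500.0, 1000.0):  # hypothetical GB/s figures
        tps = max_tokens_per_sec(bandwidth, weights_gb)
        wait = seconds_for_output(4000, bandwidth, weights_gb)
        print(f"{bandwidth:.0f} GB/s: <= {tps:.0f} tok/s, 4000-token chain >= {wait:.0f} s")
```

Doubling bandwidth halves the best-case wait, and a long chain of thought multiplies whatever per-token cost the hardware has, which is why bandwidth matters more for reasoning models than for short-answer chat.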

Frequently Asked Questions

What is the best local reasoning LLM in 2026?

DeepSeek-R1 distills (the 7B/8B variants for consumer GPUs) and QwQ-32B are among the strongest reasoning models you can run locally. For maximum capability, larger MoE reasoners exist but need serious hardware. Open any model in the list above to confirm it fits your GPU or Mac.

Can I run DeepSeek R1 locally?

The full DeepSeek R1 has 684.5 billion parameters and needs multi-GPU hardware even when quantized, but its distilled versions (DeepSeek-R1-Distill-Qwen-7B, -Llama-8B, -Qwen-32B) are designed to run on consumer hardware: the 7B/8B distills fit a single 8–12 GB GPU at Q4_K_M.
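A quick way to sanity-check a fit claim like this is to estimate weight size from parameter count and bits per weight. The sketch below assumes Q4_K_M averages about 4.85 bits per weight and reserves a flat 1.5 GB for KV cache and runtime overhead. Both numbers are assumptions chosen for illustration, and fits_in_vram is a hypothetical helper, so a model's own VRAM table takes precedence.

```python
# Estimate whether a quantized dense model fits in a given amount of VRAM.
# Q4_K_M_BITS and OVERHEAD_GB are illustrative assumptions, not measured values.

Q4_K_M_BITS = 4.85   # assumed average bits per weight for Q4_K_M
OVERHEAD_GB = 1.5    # assumed allowance for KV cache and runtime buffers


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, vram_gb: float,
                 bits_per_weight: float = Q4_K_M_BITS,
                 overhead_gb: float = OVERHEAD_GB) -> bool:
    """True if estimated weights plus overhead fit in vram_gb."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    # Parameter counts are the ones listed for each distill on this page.
    for name, params in [("Distill-Qwen-7B", 7.6),
                         ("Distill-Llama-8B", 8.0),
                         ("Distill-Qwen-32B", 32.8)]:
        size = weights_gb(params, Q4_K_M_BITS)
        for vram in (8, 12, 24):
            verdict = "fits" if fits_in_vram(params, vram) else "does not fit"
            print(f"{name}: ~{size:.1f} GB weights, {verdict} in {vram} GB")
```

Under these assumptions the 7B and 8B distills land well inside an 8 GB card, while the 32B distill needs the 24 GB class.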

Do reasoning models need more VRAM than regular models?

Not for the weights — VRAM for weights depends on size and quantization like any model. But reasoning models produce long outputs, so allow extra VRAM for the KV cache at long context, and prefer hardware with high memory bandwidth for faster generation.
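The extra memory for long outputs can be estimated with the standard KV-cache formula: two tensors (keys and values) per layer, each holding one vector per KV head per token. The layer count, KV-head count, and head dimension below are hypothetical values for a small dense model with grouped-query attention. Read the real values from the model's configuration before relying on the result.

```python
# KV-cache size estimate for a transformer at a given context length.
# The architecture numbers used in the example are hypothetical.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_element: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2 covers the separate key and value tensors.
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_element


def kv_cache_gib(*args, **kwargs) -> float:
    """Same estimate expressed in GiB (2**30 bytes)."""
    return kv_cache_bytes(*args, **kwargs) / 2**30


if __name__ == "__main__":
    # Hypothetical model: 28 layers, 4 KV heads, head dimension 128, 16-bit cache.
    for context in (4_096, 32_768, 131_072):
        gib = kv_cache_gib(28, 4, 128, context)
        print(f"{context:>7} tokens: ~{gib:.2f} GiB of KV cache")
```

The cache grows linearly with context, so a reasoning trace that runs eight times longer needs eight times the cache on top of the fixed weight memory.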