Can I Run an LLM Locally?

Find out which AI models your machine can actually run. Check GPU compatibility, VRAM requirements, and expected performance.

Model Rankings

Top models ranked by compatibility for 24 GB VRAM

40 models · 2 excellent · 6 good

LLM models ranked by compatibility and performance
Model (params) · Quant · Speed · Context · VRAM · Fit · Grade

Gemma 3 27B IT (27.4B)
  Q4_K_M · tok/s · 131K ctx · 18.1 GB · GREAT FIT · S 90
  Q4_K_M · tok/s · 8K ctx · 18.0 GB · GREAT FIT · S 90

Qwen3 32B (32B)
  Q4_K_M · tok/s · 41K ctx · 19.8 GB · GOOD FIT · A 77
  Q4_K_M · tok/s · 33K ctx · 15.1 GB · GOOD FIT · A 80

GPT OSS 20B (21.5B)
  Q4_K_M · tok/s · 131K ctx · 13.3 GB · GOOD FIT · A 70
  Q4_K_M · tok/s · 131K ctx · 20.5 GB · GOOD FIT · A 70
  Q4_K_M · tok/s · 33K ctx · 20.5 GB · GOOD FIT · A 70

QwQ 32B (32B)
  Q4_K_M · tok/s · 41K ctx · 20.0 GB · GOOD FIT · A 73

Phi 4 (14B)
  Q4_K_M · tok/s · 16K ctx · 9.1 GB · FAIR FIT · B 53
  Q4_K_M · tok/s · 4K ctx · 21.4 GB · FAIR FIT · B 56
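The grades above come down to comparing a model's memory footprint against the VRAM you have. A minimal sketch of that kind of check, in Python; the 20% overhead factor and the grade cutoffs here are illustrative guesses, not the site's actual scoring formula:

```python
# Rough fit check: estimated model memory vs. available VRAM.
# The overhead factor and grade thresholds are illustrative only.

def estimate_vram_gb(params_b: float, bits_per_weight: float = 4.5,
                     overhead: float = 1.2) -> float:
    """Weights-only estimate: params (billions) x bytes/weight x overhead.
    Q4_K_M averages roughly 4.5 bits per weight; KV cache is extra."""
    return params_b * (bits_per_weight / 8) * overhead

def fit_grade(params_b: float, vram_gb: float) -> str:
    ratio = estimate_vram_gb(params_b) / vram_gb
    if ratio <= 0.80:
        return "GREAT FIT"   # plenty of headroom for long contexts
    if ratio <= 0.95:
        return "GOOD FIT"
    if ratio <= 1.10:
        return "FAIR FIT"    # may need CPU offload or a short context
    return "TOO BIG"

print(f"{estimate_vram_gb(27.4):.1f} GB")  # Gemma 3 27B at ~4.5 bits/weight
print(fit_grade(27.4, 24.0))
```

The 18.5 GB estimate this produces for a 27.4B model at Q4_K_M lands close to the 18.1 GB shown in the table, which is why that model grades well on a 24 GB card.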

Browse by VRAM

Find the best models for your VRAM tier

Popular Devices


GPUs, MacBooks, AI boxes, and more — find what runs AI best

Popular Models


DeepSeek R1

DeepSeek · 684.5B

1.3M 13.1K

DeepSeek R1 is a groundbreaking reasoning model that uses reinforcement learning to develop chain-of-thought capabilities without relying on supervised fine-tuning. With 684.5 billion total parameters in a mixture-of-experts architecture (only 37 billion active per token), R1 achieves performance competitive with OpenAI's o1 on math, coding, and complex reasoning benchmarks while remaining fully open-weight. Running the full R1 locally is a serious undertaking, requiring well over 300 GB of VRAM at full precision, though quantized versions bring it within reach of multi-GPU setups. For users who want R1-level reasoning on more modest hardware, DeepSeek also released a family of distilled models that pack R1's reasoning patterns into smaller dense architectures.
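The VRAM figures behind that claim are simple arithmetic on the parameter count. A quick weights-only calculation, assuming ~8 bits per weight for the native precision and 4 bits for a typical quantization (KV cache and activations add more on top):

```python
# Back-of-envelope memory math for DeepSeek R1 (684.5B total parameters).
# Weights only; KV cache and runtime overhead are not included.

def weights_gb(params_b: float, bits_per_weight: float) -> float:
    # params are in billions, so the result comes out directly in GB
    return params_b * bits_per_weight / 8

total_b = 684.5
print(f"8-bit weights: {weights_gb(total_b, 8):.1f} GB")
print(f"4-bit weights: {weights_gb(total_b, 4):.1f} GB")
print(f"Active per token: 37B of {total_b}B parameters")
```

Even at 4 bits the weights alone exceed 300 GB, which is why the full R1 stays in multi-GPU territory while the distilled variants below target single consumer cards.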

Chat · Reasoning

DeepSeek R1 0528

DeepSeek · 684.5B

1.1M 2.4K

DeepSeek R1 0528 is an updated release of the R1 reasoning model, incorporating improvements to training and inference that sharpen its performance on complex multi-step problems. It retains the same 684.5 billion parameter mixture-of-experts architecture as the original R1, with approximately 37 billion parameters active per forward pass. This revision addresses several edge cases where the original R1 struggled, delivering more consistent reasoning chains and fewer hallucinations on difficult math and coding tasks. Hardware requirements remain identical to the original R1, so users already set up to run the first version can swap in the 0528 weights with no changes to their infrastructure.

Chat · Reasoning

DeepSeek R1 Distill Llama 8B

DeepSeek · 8B

857.1K 850

DeepSeek R1 Distill Llama 8B brings R1's reinforcement-learned reasoning capabilities to the widely supported Llama 3.1 8B architecture. By distilling the full 684.5B R1 model's reasoning patterns into this 8 billion parameter dense model, DeepSeek created a version that benefits from the extensive Llama ecosystem of tools, quantizations, and inference engines. For users who prefer the Llama architecture or already have tooling built around it, this model offers a plug-and-play path to chain-of-thought reasoning. Its hardware requirements are very approachable, running well on consumer GPUs with 8 GB or more of VRAM at common quantization levels.

Chat · Reasoning

DeepSeek R1 Distill Qwen 32B

DeepSeek · 32.8B

938.1K 1.5K

DeepSeek R1 Distill Qwen 32B takes the reasoning capabilities developed in the full 684.5B R1 model and distills them into the 32.8 billion parameter Qwen 2.5 architecture. The result is a dense model that punches well above its weight class on math, science, and coding reasoning tasks, often matching models two to three times its size. At around 32.8 billion parameters, this model fits comfortably on a single high-end consumer GPU when quantized to 4-bit precision, making it one of the most capable reasoning models you can run on a desktop workstation.

Chat · Reasoning

DeepSeek R1 Distill Qwen 7B

DeepSeek · 7.6B

613.6K 804

DeepSeek R1 Distill Qwen 7B compresses the reasoning techniques from DeepSeek's full R1 model into a compact 7.6 billion parameter dense model built on the Qwen 2.5 architecture. Despite its small footprint, it demonstrates surprisingly capable step-by-step reasoning on math and logic problems that would stump many models several times its size. This is one of the most accessible reasoning models available for local use, fitting comfortably on GPUs with 6 GB or more of VRAM when quantized. It strikes a practical balance between genuine chain-of-thought reasoning ability and the hardware constraints of a typical consumer setup.

Chat · Reasoning

DeepSeek V3 0324

DeepSeek · 684.5B

328.9K 3.1K

DeepSeek V3 0324 is DeepSeek's flagship general-purpose chat model, featuring a 684.5 billion parameter mixture-of-experts architecture with roughly 37 billion parameters active per token. It delivers strong performance across a wide range of tasks including conversation, writing, analysis, coding, and instruction following, competing with the best closed-source models available. Like other large MoE models, V3 requires substantial memory to load all expert weights even though only a fraction are used during inference. Quantized versions make it feasible on multi-GPU setups, and its combination of broad capability with open weights has made it one of the most widely deployed open models for local and self-hosted use.

Chat

How It Works

Three steps to find your perfect local AI setup

1

Select Your Hardware

Pick your GPU or Apple Silicon device from the dropdown.

2

Check Compatibility

See which models fit in your VRAM with performance grades.

3

Run It

Install via Ollama, LM Studio, or download the GGUF file.
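Once a model is pulled with Ollama, you can also drive it programmatically. A minimal sketch against Ollama's local REST API (default port 11434; the model name here is just an example, use whatever you pulled):

```python
# Sketch: querying a locally running Ollama server via its REST API.
# Requires `ollama serve` running and a model pulled, e.g. `ollama pull <model>`.
import json
from urllib import request

def build_request(model: str, prompt: str) -> dict:
    # /api/generate takes a JSON body; stream=False returns one JSON object
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str,
             host: str = "http://localhost:11434") -> str:
    body = json.dumps(build_request(model, prompt)).encode()
    req = request.Request(f"{host}/api/generate", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["response"]

# Example (needs a running server and a pulled model):
# print(generate("deepseek-r1:8b", "Why is the sky blue?"))
```

LM Studio exposes a similar OpenAI-compatible local endpoint, so the same pattern works there with a different URL and payload shape.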