Can I Run an LLM Locally?
Find out which AI models your machine can actually run. Check GPU compatibility, VRAM requirements, and expected performance.
Model Rankings
Top models ranked by compatibility for 24 GB VRAM
40 models · 2 excellent · 6 good
| Model | Quant | VRAM | Speed (tok/s) | Context | Status | Grade |
|---|---|---|---|---|---|---|
| — | Q4_K_M | 18.1 GB | — | 131K | GREAT FIT | S90 |
| — | Q4_K_M | 18.0 GB | — | 8K | GREAT FIT | S90 |
| — | Q4_K_M | 19.8 GB | — | 41K | GOOD FIT | A77 |
| — | Q4_K_M | 15.1 GB | — | 33K | GOOD FIT | A80 |
| — | Q4_K_M | 13.3 GB | — | 131K | GOOD FIT | A70 |
| — | Q4_K_M | 20.5 GB | — | 131K | GOOD FIT | A70 |
| — | Q4_K_M | 20.5 GB | — | 33K | GOOD FIT | A70 |
| — | Q4_K_M | 20.0 GB | — | 41K | GOOD FIT | A73 |
| — | Q4_K_M | 9.1 GB | — | 16K | FAIR FIT | B53 |
| — | Q4_K_M | 21.4 GB | — | 4K | FAIR FIT | B56 |
Browse by VRAM
Find the best models for your VRAM tier
- 8 GB · Entry-level for LLMs (RTX 4060, RX 7600, Apple M-series base) — 7B models at Q4, small models at Q8
- 12 GB · Mid-range (RTX 3060, RTX 4070, RTX 5070) — 7-13B models at Q4-Q6
- 16 GB · Upper mid-range (RTX 4080, RTX 5070 Ti, Arc A770, Apple M4 16GB) — 13B models, some 30B at Q4
- 24 GB · Enthusiast (RTX 3090, RTX 4090, RX 7900 XTX) — 30B+ models at Q4-Q6, 70B at aggressive quant (see the sizing sketch below)
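These tiers follow directly from weight size: parameter count times bits per weight, plus headroom for the KV cache and runtime buffers. Here is a minimal sketch of that arithmetic; the bits-per-weight densities (about 4.8 for Q4_K_M) and the flat 15% overhead are rough assumptions, and real usage grows with context length:

```python
# Rough VRAM estimate for a quantized model.
# Assumptions (approximate, not exact): average bits per weight for common
# GGUF quants, plus a flat 15% allowance for KV cache and runtime buffers.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q6_K": 6.6, "Q8_0": 8.5, "FP16": 16.0}
OVERHEAD = 1.15  # crude headroom for short-to-moderate context lengths

def est_vram_gb(params_billions: float, quant: str = "Q4_K_M") -> float:
    """Estimated GB needed to load the model and run modest contexts."""
    weights_gb = params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1e9
    return weights_gb * OVERHEAD

if __name__ == "__main__":
    for name, size in [("7B", 7), ("13B", 13),
                       ("32.8B (R1 Distill Qwen)", 32.8),
                       ("684.5B (full R1)", 684.5)]:
        print(f"{name:>26}: ~{est_vram_gb(size):.1f} GB at Q4_K_M")
```

By this estimate a 7B model at Q4_K_M needs roughly 5 GB and the full R1 several hundred GB, which is why the former lands in the entry tier while the latter stays out of reach of single consumer cards.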
Popular Devices
All Hardware →
GPUs, MacBooks, AI boxes, and more — find what runs AI best
NVIDIA GeForce RTX 4090
NVIDIA · Ada Lovelace
NVIDIA GeForce RTX 5090
NVIDIA · Blackwell
NVIDIA GeForce RTX 5080
NVIDIA · Blackwell
Mac Studio M4 Max (128 GB)
Apple · M4 Max · Desktop
MacBook Pro 16" M4 Max (64 GB)
Apple · M4 Max · Laptop
Mac Studio M4 Max (64 GB)
Apple · M4 Max · Desktop
Popular Models
View all →
DeepSeek R1
DeepSeek · 684.5B
DeepSeek R1 is a groundbreaking reasoning model that uses reinforcement learning to develop chain-of-thought capabilities without relying on supervised fine-tuning. With 684.5 billion total parameters in a mixture-of-experts architecture (only 37 billion active per token), R1 achieves performance competitive with OpenAI's o1 on math, coding, and complex reasoning benchmarks while remaining fully open-weight. Running the full R1 locally is a serious undertaking, requiring well over 300 GB of VRAM at full precision, though quantized versions bring it within reach of multi-GPU setups. For users who want R1-level reasoning on more modest hardware, DeepSeek also released a family of distilled models that pack R1's reasoning patterns into smaller dense architectures.
DeepSeek R1 0528
DeepSeek · 684.5B
DeepSeek R1 0528 is an updated release of the R1 reasoning model, incorporating improvements to training and inference that sharpen its performance on complex multi-step problems. It retains the same 684.5 billion parameter mixture-of-experts architecture as the original R1, with approximately 37 billion parameters active per forward pass. This revision addresses several edge cases where the original R1 struggled, delivering more consistent reasoning chains and fewer hallucinations on difficult math and coding tasks. Hardware requirements remain identical to the original R1, so users already set up to run the first version can swap in the 0528 weights with no changes to their infrastructure.
DeepSeek R1 Distill Llama 8B
DeepSeek · 8B
DeepSeek R1 Distill Llama 8B brings R1's reinforcement-learned reasoning capabilities to the widely supported Llama 3.1 8B architecture. By distilling the full 684.5B R1 model's reasoning patterns into this 8 billion parameter dense model, DeepSeek created a version that benefits from the extensive Llama ecosystem of tools, quantizations, and inference engines. For users who prefer the Llama architecture or already have tooling built around it, this model offers a plug-and-play path to chain-of-thought reasoning. Its hardware requirements are very approachable, running well on consumer GPUs with 8 GB or more of VRAM at common quantization levels.
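As a concrete example of that plug-and-play path, here is a minimal sketch that queries the model through a local Ollama server's REST API. It assumes you have already pulled the weights under the deepseek-r1:8b library tag, which is how Ollama currently labels this Llama-based distill:

```python
import requests

# Ask a local Ollama server (default port 11434) to run the 8B distill.
# Prerequisite: `ollama pull deepseek-r1:8b` has been run beforehand.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:8b",
        "prompt": "How many prime numbers are there between 1 and 20?",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()
# Reasoning models like this one may emit a visible chain of thought
# before the final answer.
print(resp.json()["response"])
```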
DeepSeek R1 Distill Qwen 32B
DeepSeek · 32.8B
DeepSeek R1 Distill Qwen 32B takes the reasoning capabilities developed in the full 684.5B R1 model and distills them into the 32.8 billion parameter Qwen 2.5 architecture. The result is a dense model that punches well above its weight class on math, science, and coding reasoning tasks, often matching models two to three times its size. At around 32.8 billion parameters, this model fits comfortably on a single high-end consumer GPU when quantized to 4-bit precision, making it one of the most capable reasoning models you can run on a desktop workstation.
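A quick back-of-envelope check on that claim, reusing the sizing arithmetic from the sketch above (the 4.8 bits/weight figure for Q4_K_M is an approximation):

```python
# Weight footprint of the 32.8B distill at ~4.8 bits/weight (Q4_K_M, approx.)
weights_gb = 32.8e9 * 4.8 / 8 / 1e9
print(f"~{weights_gb:.1f} GB")  # ~19.7 GB, close to the 19.8 GB in the
                                # rankings table, leaving a few GB of a
                                # 24 GB card free for the KV cache
```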
DeepSeek R1 Distill Qwen 7B
DeepSeek · 7.6B
DeepSeek R1 Distill Qwen 7B compresses the reasoning techniques from DeepSeek's full R1 model into a compact 7.6 billion parameter dense model built on the Qwen 2.5 architecture. Despite its small footprint, it demonstrates surprisingly capable step-by-step reasoning on math and logic problems that would stump many models several times its size. This is one of the most accessible reasoning models available for local use, fitting comfortably on GPUs with 6 GB or more of VRAM when quantized. It strikes a practical balance between genuine chain-of-thought reasoning ability and the hardware constraints of a typical consumer setup.
DeepSeek V3 0324
DeepSeek · 684.5B
DeepSeek V3 0324 is DeepSeek's flagship general-purpose chat model, featuring a 684.5 billion parameter mixture-of-experts architecture with roughly 37 billion parameters active per token. It delivers strong performance across a wide range of tasks including conversation, writing, analysis, coding, and instruction following, competing with the best closed-source models available. Like other large MoE models, V3 requires substantial memory to load all expert weights even though only a fraction are used during inference. Quantized versions make it feasible on multi-GPU setups, and its combination of broad capability with open weights has made it one of the most widely deployed open models for local and self-hosted use.
How It Works
Three steps to find your perfect local AI setup
Select Your Hardware
Pick your GPU or Apple Silicon device from the dropdown.
Check Compatibility
See which models fit in your VRAM with performance grades.
Run It
Install it via Ollama or LM Studio, or download the GGUF file directly (a minimal loading sketch follows).
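If you take the GGUF route, here is a minimal loading sketch using the llama-cpp-python bindings. The filename is a placeholder for whichever quant you downloaded, and n_gpu_layers=-1 asks the runtime to offload every layer to the GPU:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Load a downloaded GGUF directly (step 3 above). n_ctx sets the context
# window you sized your VRAM for; the path below is a hypothetical example.
llm = Llama(
    model_path="./DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=8192,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain the KV cache in two sentences."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```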