Can I Run an LLM Locally?

Find out which AI models your machine can actually run. Check GPU compatibility, VRAM requirements, and expected performance.

Model Rankings

Top models ranked by compatibility for 24 GB VRAM

40 models · 2 excellent · 6 good

LLM models ranked by compatibility and performance
Model	Params	Tags	Quant	VRAM	VRAM use	Speed (tok/s)	Context	Status	Grade	Score
Gemma 3 27B IT	27.4B	Vision	Q4_K_M	18.1 GB	75%	n/a	131K	GREAT FIT	S	90
Gemma 2 27B IT	27.2B	Chat	Q4_K_M	18.0 GB	75%	n/a	8K	GREAT FIT	S	90
Qwen3 32B	32B	Chat	Q4_K_M	19.8 GB	83%	n/a	41K	GOOD FIT	A	77
Mistral Small 24B Instruct 2501	24B	Chat	Q4_K_M	15.1 GB	63%	n/a	33K	GOOD FIT	A	80
GPT OSS 20B	21.5B	Chat	Q4_K_M	13.3 GB	55%	n/a	131K	GOOD FIT	A	70
DeepSeek R1 Distill Qwen 32B	32.8B	Chat, Reasoning	Q4_K_M	20.5 GB	85%	n/a	131K	GOOD FIT	A	70
Qwen2.5 Coder 32B Instruct	32.8B	Chat, Code	Q4_K_M	20.5 GB	85%	n/a	33K	GOOD FIT	A	70
QwQ 32B	32B	Chat, Reasoning	Q4_K_M	20.0 GB	84%	n/a	41K	GOOD FIT	A	73
Phi 4	14B	Chat, Math, Code	Q4_K_M	9.1 GB	38%	n/a	16K	FAIR FIT	B	53
Yi 1.5 34B Chat	34.4B	Chat	Q4_K_M	21.4 GB	89%	n/a	4K	FAIR FIT	B	56
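
The percentage shown next to each VRAM figure appears to be the model's footprint as a share of the selected 24 GB card. A minimal sketch of that arithmetic follows; the function name `vram_utilization` is hypothetical, and because the table shows rounded GB values, recomputing from them can differ from the listed percentage by one point.

```python
def vram_utilization(model_vram_gb: float, gpu_vram_gb: float = 24.0) -> int:
    """Return the share of GPU memory a model occupies, as a whole percent."""
    if gpu_vram_gb <= 0:
        raise ValueError("gpu_vram_gb must be positive")
    return round(100 * model_vram_gb / gpu_vram_gb)


# VRAM figures copied from the table above (Q4_K_M on a 24 GB card).
table_vram_gb = {
    "Gemma 3 27B IT": 18.1,
    "Phi 4": 9.1,
    "Yi 1.5 34B Chat": 21.4,
}

for name, gb in table_vram_gb.items():
    print(f"{name}: {vram_utilization(gb)}% of 24 GB")
```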

Browse by VRAM

Find the best models for your VRAM tier

8 GB VRAM

Entry-level for LLMs (RTX 4060, RX 7600, Apple M-series base): 7B models at Q4, small models at Q8

12 GB VRAM

Mid-range (RTX 3060, RTX 4070, RTX 5070): 7-13B models at Q4-Q6

16 GB VRAM

Upper mid-range (RTX 4080, RTX 5070 Ti, Arc A770, Apple M4 16GB): 13B models, some 30B at Q4

24 GB VRAM

Enthusiast (RTX 3090, RTX 4090, RX 7900 XTX): 30B+ models at Q4-Q6, 70B at aggressive quant

48 GB VRAM

Professional / Apple Silicon (RTX 6000 Ada, L40S, MacBook Pro M4 Max 48GB): 70B at Q4-Q5
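
The tier descriptions above follow from simple arithmetic: the weights take roughly parameters times bits per weight, divided by eight, in bytes. The sketch below is a rough estimate only. The function names and the 1.5 GB `overhead_gb` allowance for KV cache and runtime buffers are assumptions, and real quantization formats such as Q4_K_M average somewhat more than their nominal bit width.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         vram_gb: float, overhead_gb: float = 1.5) -> bool:
    """True if weights plus a fixed overhead allowance fit in vram_gb.

    overhead_gb is an assumed allowance for KV cache and runtime buffers;
    long contexts can need considerably more.
    """
    needed = weight_memory_gb(params_billions, bits_per_weight) + overhead_gb
    return needed <= vram_gb


print(weight_memory_gb(7, 4))   # 7B at a nominal 4 bits per weight
print(fits(7, 4, vram_gb=8))    # 7B at Q4 on an 8 GB card
print(fits(70, 4, vram_gb=24))  # 70B at Q4 on a 24 GB card
```

Under these assumptions a 7B model at 4 bits needs about 3.5 GB for weights and fits an 8 GB card, while a 70B model at 4 bits needs about 35 GB and does not fit in 24 GB.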

Popular Devices

GPUs, MacBooks, AI boxes, and more — find what runs AI best

NVIDIA GeForce RTX 4090

NVIDIA · Ada Lovelace

24 GB

1008.0 GB/s · 16,384 CUDA cores · 450 W TDP · $1,599

NVIDIA GeForce RTX 5090

NVIDIA · Blackwell

32 GB

1792.0 GB/s · 21,760 CUDA cores · 575 W TDP · $1,999

NVIDIA GeForce RTX 5080

NVIDIA · Blackwell

16 GB

960.0 GB/s · 10,752 CUDA cores · 360 W TDP · $999

Mac Studio M4 Max (128 GB)

Apple · M4 Max · Desktop

128 GB

546.0 GB/s · 40 GPU cores · 16 CPU cores

MacBook Pro 16" M4 Max (64 GB)

Apple · M4 Max · Laptop

64 GB

546.0 GB/s · 40 GPU cores · 16 CPU cores

Mac Studio M4 Max (64 GB)

Apple · M4 Max · Desktop

64 GB

546.0 GB/s · 40 GPU cores · 16 CPU cores
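
The memory bandwidth figures on these cards matter because token generation is usually limited by how fast the weights can be read from memory. A common rule of thumb, not necessarily how this site computes its Speed column, is that a dense model cannot generate tokens faster than bandwidth divided by model size. The sketch below applies that bound; the function name is hypothetical, and real throughput is typically lower.

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough upper bound on generation speed for a dense model.

    Assumes every generated token requires reading all weights once,
    so speed is capped at memory bandwidth divided by model size.
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# Bandwidth figures from the device cards above; 18.1 GB is the
# Q4_K_M footprint listed for Gemma 3 27B IT in the rankings table.
devices_gb_s = {
    "RTX 4090": 1008.0,
    "RTX 5090": 1792.0,
    "M4 Max": 546.0,
}

for name, bandwidth in devices_gb_s.items():
    bound = max_decode_tokens_per_s(bandwidth, 18.1)
    print(f"{name}: at most about {bound:.1f} tok/s")
```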

Popular Models

DeepSeek R1

DeepSeek · 684.5B

1.3M 13.1K

DeepSeek R1 is a groundbreaking reasoning model that develops its chain-of-thought capabilities through large-scale reinforcement learning rather than relying primarily on supervised fine-tuning. With 684.5 billion total parameters in a mixture-of-experts architecture (only 37 billion active per token), R1 achieves performance competitive with OpenAI's o1 on math, coding, and complex reasoning benchmarks while remaining fully open-weight. Running the full R1 locally is a serious undertaking: at one byte per parameter the weights alone take roughly 685 GB of memory, and even 4-bit quantized versions need well over 300 GB, which puts the model within reach of multi-GPU setups only. For users who want R1-level reasoning on more modest hardware, DeepSeek also released a family of distilled models that pack R1's reasoning patterns into smaller dense architectures.

Chat · Reasoning

DeepSeek R1 0528

DeepSeek · 684.5B

1.1M 2.4K

DeepSeek R1 0528 is an updated release of the R1 reasoning model, incorporating improvements to training and inference that sharpen its performance on complex multi-step problems. It retains the same 684.5 billion parameter mixture-of-experts architecture as the original R1, with approximately 37 billion parameters active per forward pass. This revision addresses several edge cases where the original R1 struggled, delivering more consistent reasoning chains and fewer hallucinations on difficult math and coding tasks. Hardware requirements remain identical to the original R1, so users already set up to run the first version can swap in the 0528 weights with no changes to their infrastructure.

Chat · Reasoning

DeepSeek R1 Distill Llama 8B

DeepSeek · 8B

857.1K 850

DeepSeek R1 Distill Llama 8B brings R1's reinforcement-learned reasoning capabilities to the widely supported Llama 3.1 8B architecture. By distilling the full 684.5B R1 model's reasoning patterns into this 8 billion parameter dense model, DeepSeek created a version that benefits from the extensive Llama ecosystem of tools, quantizations, and inference engines. For users who prefer the Llama architecture or already have tooling built around it, this model offers a plug-and-play path to chain-of-thought reasoning. Its hardware requirements are very approachable, running well on consumer GPUs with 8 GB or more of VRAM at common quantization levels.

Chat · Reasoning

DeepSeek R1 Distill Qwen 32B

DeepSeek · 32.8B

938.1K 1.5K

DeepSeek R1 Distill Qwen 32B takes the reasoning capabilities developed in the full 684.5B R1 model and distills them into the 32.8 billion parameter Qwen 2.5 architecture. The result is a dense model that punches well above its weight class on math, science, and coding reasoning tasks, often matching models two to three times its size. Quantized to 4-bit precision (about 20.5 GB at Q4_K_M), it fits on a single 24 GB consumer GPU, making it one of the most capable reasoning models you can run on a desktop workstation.

Chat · Reasoning

DeepSeek R1 Distill Qwen 7B

DeepSeek · 7.6B

613.6K 804

DeepSeek R1 Distill Qwen 7B compresses the reasoning techniques from DeepSeek's full R1 model into a compact 7.6 billion parameter dense model built on the Qwen 2.5 architecture. Despite its small footprint, it demonstrates surprisingly capable step-by-step reasoning on math and logic problems that would stump many models several times its size. This is one of the most accessible reasoning models available for local use, fitting comfortably on GPUs with 6 GB or more of VRAM when quantized. It strikes a practical balance between genuine chain-of-thought reasoning ability and the hardware constraints of a typical consumer setup.

Chat · Reasoning

DeepSeek V3 0324

DeepSeek · 684.5B

328.9K 3.1K

DeepSeek V3 0324 is DeepSeek's flagship general-purpose chat model, featuring a 684.5 billion parameter mixture-of-experts architecture with roughly 37 billion parameters active per token. It delivers strong performance across a wide range of tasks including conversation, writing, analysis, coding, and instruction following, competing with the best closed-source models available. Like other large MoE models, V3 requires substantial memory to load all expert weights even though only a fraction are used during inference. Quantized versions make it feasible on multi-GPU setups, and its combination of broad capability with open weights has made it one of the most widely deployed open models for local and self-hosted use.

Chat
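
The gap between total and active parameters in these mixture-of-experts models can be made concrete with a little arithmetic. The sketch below uses the 684.5B total and 37B active figures from the descriptions above and counts weights only, so the GPU counts are lower bounds. The function names and the choice of 24 GB cards are illustrative assumptions.

```python
import math

TOTAL_PARAMS_B = 684.5   # total parameters, in billions (from the cards above)
ACTIVE_PARAMS_B = 37.0   # parameters active per token, in billions


def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes), ignoring KV cache."""
    return params_billions * bits_per_weight / 8


def gpus_needed(weights_size_gb: float, vram_per_gpu_gb: float) -> int:
    """Minimum number of GPUs whose combined VRAM holds the weights."""
    return math.ceil(weights_size_gb / vram_per_gpu_gb)


# Only a small share of the weights is used for any one token,
# but every expert still has to be resident in memory.
active_share = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"active per token: {active_share:.1%}")

for bits in (8, 4):
    size = weights_gb(TOTAL_PARAMS_B, bits)
    count = gpus_needed(size, 24)
    print(f"{bits}-bit: about {size:.0f} GB of weights, at least {count} x 24 GB GPUs")
```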

How It Works

Three steps to find your perfect local AI setup

Select Your Hardware

Pick your GPU or Apple Silicon device from the dropdown.

Check Compatibility

See which models fit in your VRAM with performance grades.

Run It

Install via Ollama or LM Studio, or download the GGUF file directly.
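
Once a model is installed through Ollama, it can be queried from code over Ollama's local HTTP API. The following is a minimal sketch that assumes Ollama is running on its default port 11434; the helper names `build_generate_request` and `generate` are hypothetical, and the model tag in the usage note is an example that may differ from what you have pulled.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a locally installed model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local Ollama server and return the reply text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

For example, after running `ollama pull deepseek-r1:8b` (assuming that tag is available), calling `generate("deepseek-r1:8b", "Why is the sky blue?")` should return the model's answer as a string.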