Can I Run an LLM Locally?
Find out which AI models your machine can actually run. Check GPU compatibility, VRAM requirements, and expected performance.
Model Rankings
Top models ranked by compatibility for 24 GB VRAM
145 models · 7 excellent · 15 good
| Model | Quant | VRAM | Speed | Context | Status | Grade |
|---|---|---|---|---|---|---|
|  | Q4_K_M | 18.1 GB | n/a | 131K | GREAT FIT | S90 |
|  | Q4_K_M | 17.4 GB | n/a | 262K | GREAT FIT | S88 |
|  | Q4_K_M | 16.6 GB | n/a | 262K | GREAT FIT | S85 |
|  | Q4_K_M | 18.0 GB | n/a | 8K | GREAT FIT | S90 |
|  | Q4_K_M | 18.7 GB | n/a | 262K | GREAT FIT | S86 |
|  | Q4_K_M | 18.7 GB | n/a | 262K | GREAT FIT | S86 |
|  | Q4_K_M | 18.7 GB | n/a | 262K | GREAT FIT | S86 |
|  | Q4_K_M | 15.1 GB | n/a | 131K | GOOD FIT | A80 |
|  | BF16 | 15.4 GB | n/a | 4K | GOOD FIT | A81 |
|  | BF16 | 15.3 GB | n/a | 8K | GOOD FIT | A81 |
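The VRAM column depends mostly on parameter count and quantization. The arithmetic can be sketched roughly as below, assuming that model weights dominate memory use and that a flat allowance covers the KV cache and runtime buffers. The bits-per-weight values are approximate averages for GGUF quantization types, and `estimate_vram_gb` and its `overhead_gb` default are hypothetical names and values chosen for illustration, not the formula this page uses.

```python
# Approximate average bits per weight for some GGUF quantization types.
# Ballpark assumptions for illustration, not exact values for any model.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q8_0": 8.5,
    "BF16": 16.0,
}


def estimate_vram_gb(params_billions: float, quant: str, overhead_gb: float = 1.0) -> float:
    """Estimate VRAM in GB as weight storage plus a flat overhead allowance."""
    bits = BITS_PER_WEIGHT[quant]
    # billions of parameters * bits per weight / 8 bits per byte = gigabytes
    weights_gb = params_billions * bits / 8
    return round(weights_gb + overhead_gb, 1)
```

Under these assumptions, `estimate_vram_gb(7.6, "Q4_K_M")` gives about 5.6 GB. Long context windows can need far more than the flat allowance, so treat the result as a lower bound.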
Browse by VRAM
Find the best models for your VRAM tier
8 GB: Entry-level for LLMs (RTX 4060, RX 7600, Apple M-series base). 7B models at Q4, small models at Q8
12 GB: Mid-range (RTX 3060, RTX 4070, RTX 5070). 7-13B models at Q4-Q6
16 GB: Upper mid-range (RTX 4080, RTX 5070 Ti, Arc A770, Apple M4 16GB). 13B models, some 30B at Q4
24 GB: Enthusiast (RTX 3090, RTX 4090, RX 7900 XTX). 30B+ models at Q4-Q6, 70B at aggressive quant
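To see where tier guidance like this comes from, the sizing arithmetic can be run in reverse: given a VRAM budget and a quantization level, estimate the largest model whose weights fit. This is only a sketch. The function name `max_params_billions` and the 1.5 GB `reserve_gb` default (held back for KV cache and runtime buffers) are assumptions made for illustration, not values taken from this page.

```python
def max_params_billions(vram_gb: float, bits_per_weight: float, reserve_gb: float = 1.5) -> float:
    """Largest parameter count (in billions) whose weights fit in usable VRAM."""
    # Hold back some memory for KV cache and buffers; never go below zero.
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    # gigabytes * 8 bits per byte / bits per weight = billions of parameters
    return round(usable_gb * 8 / bits_per_weight, 1)
```

With roughly 4.85 bits per weight (an approximate figure for Q4_K_M), a 24 GB card comes out at about 37B parameters and an 8 GB card at about 10.7B, which is in line with the tiers listed above.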
Popular Devices
GPUs, MacBooks, AI boxes, and more: find what runs AI best
NVIDIA GeForce RTX 4090
NVIDIA · Ada Lovelace
NVIDIA GeForce RTX 5090
NVIDIA · Blackwell
NVIDIA GeForce RTX 5080
NVIDIA · Blackwell
Mac Studio M4 Max (128 GB)
Apple · M4 Max · Desktop
MacBook Pro 16" M4 Max (64 GB)
Apple · M4 Max · Laptop
Mac Studio M4 Max (64 GB)
Apple · M4 Max · Desktop
Popular Models
Qwen2.5 7B Instruct
Alibaba · 7.6B · runs from 2.7 GB
Qwen2.5 7B Instruct is a 7.6-billion-parameter instruction-tuned model from Alibaba Cloud's Qwen 2.5 series. It supports a 128K-token context window and is fine-tuned for conversational AI, instruction following, and general assistant tasks. Its size makes it well suited to local deployment on consumer GPUs with 8 GB or more of VRAM. The model performs strongly for its parameter class across reasoning, multilingual understanding, and coding tasks, and it benefits from the improved pretraining data and techniques of the Qwen 2.5 generation. It is released under the Apache 2.0 license and is widely supported by inference frameworks such as llama.cpp, vLLM, and Ollama.
Gemma 4 26B A4B IT
Google · 26.5B · runs from 8.0 GB
Gemma 4 26B A4B IT is a 26.5B-parameter open language model from Google in the Gemma 4 family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Gemma 4 31B IT
Google · 32.7B · runs from 10.6 GB
Gemma 4 31B IT is a 32.7B-parameter open language model from Google in the Gemma 4 family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Llama 3.2 1B Instruct
Meta · 1.2B · runs from 0.4 GB
Meta Llama 3.2 1B Instruct is a 1.2-billion-parameter instruction-tuned model from Meta, the smallest in the Llama 3.2 family. It is designed for ultra-lightweight deployment where minimal hardware resources are available, and it supports a 128K-token context window despite its compact size. The model is suitable for basic conversational tasks, text summarization, and simple instruction following. It can run on virtually any modern GPU and even on CPU-only setups with acceptable performance. It is released under the Llama 3.2 Community License.
Qwen3.6 35B A3B
Alibaba · 36.0B · runs from 10.3 GB
Qwen3.6 35B A3B is a 36.0B-parameter open language model from Alibaba in the Qwen 3.6 family. It supports a context window of up to 262,144 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
Gemma 4 E4B IT
Google · 8.0B · runs from 3.2 GB
Gemma 4 E4B IT is an 8.0B-parameter open language model from Google in the Gemma 4 family. It supports a context window of up to 131,072 tokens. See its VRAM requirements by quantization and which GPUs and Macs can run it locally below.
How It Works
Three steps to find your perfect local AI setup
Select Your Hardware
Pick your GPU or Apple Silicon device from the dropdown.
Check Compatibility
See which models fit in your VRAM with performance grades.
Run It
Install via Ollama, LM Studio, or download the GGUF file.
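Once a model is installed through Ollama, it can be queried from a script through the local HTTP API. The sketch below assumes an Ollama server is running on its default port (11434) and that the model has already been pulled. The helper names `build_request` and `generate` are hypothetical, and the model tag `qwen2.5:7b` in the usage note is an assumed example.

```python
import json
import urllib.request

# Default local endpoint for Ollama's generate API (assumes a default install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

For example, after running `ollama pull qwen2.5:7b`, calling `generate("qwen2.5:7b", "Explain quantization in one sentence.")` should return the model's reply as a string. Setting `"stream": False` keeps the example simple by asking for the whole response in one JSON object.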