Question 1

How much VRAM do I need to run a Phi model?

Accepted Answer

The smallest Phi model, Phi 1.5, runs from 0.7 GB of VRAM at an aggressive quantization. Larger family members need proportionally more; see the table of all Phi models for the full list.
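
As a rough illustration of why VRAM grows with model size, here is a minimal sketch that estimates the weights-only footprint from parameter count and bits per weight. The function name and the 4-bit figure are assumptions for illustration, not values stated by the listing. Real usage is higher, because this ignores the KV cache, activations, and runtime overhead.

```python
def estimate_weight_vram_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough weights-only estimate in GB (1 GB = 1e9 bytes).

    Hypothetical helper: ignores KV cache, activations, and runtime overhead.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # A 1.4B-parameter model at an assumed 4 bits per weight
    print(round(estimate_weight_vram_gb(1.4, 4), 2))
    # The same assumed quantization on a 24B-parameter model
    print(round(estimate_weight_vram_gb(24.0, 4), 2))
```

Under this simplification the estimate scales linearly with parameter count, so the "Runs from" figures in the table are best read as a floor, not as a comfortable working budget.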

Question 2

Which Phi models can I run on a 16 GB GPU?

Accepted Answer

All 8 listed Phi models fit in 16 GB of VRAM at some quantization, including both releases of Dolphin Mistral 24B Venice Edition (from Cognitive Computations and dphn) and Dolphin 2.9.1 Yi 1.5 34B.

Question 3

What is the most popular Phi model to run locally?

Accepted Answer

Dolphin Mistral 24B Venice Edition (the Cognitive Computations release) is the most downloaded Phi model in local-friendly quantized formats, with 18.5K quant downloads. It runs from 7.3 GB of VRAM.

Model	Params	Runs from (VRAM)	Context (tokens)	Publisher	Quant downloads
TinyDolphin 2.8 1.1B	1.1B	0.8 GB	4K	QuixiAI	—
Phi 1.5	1.4B	0.7 GB	2K	Microsoft	136
Phi 1	1.4B	0.7 GB	2K	Microsoft	—
MediPhi Instruct	3.8B	2.7 GB	131K	Microsoft	—
Dolphin X1 Trinity Nano	6.1B	3.0 GB	131K	dphn	—
Dolphin Mistral 24B Venice Edition	24.0B	7.3 GB	131K	Cognitive Computations	18.5K
Dolphin Mistral 24B Venice Edition	24.0B	10.9 GB	131K	dphn	1.4K
Dolphin 2.9.1 Yi 1.5 34B	34.4B	10.3 GB	8K	dphn	273
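
To check which of the models in the table fit a given GPU, a small filter over the "Runs from" column is enough. The sketch below copies the table values by hand; the `MODELS` list and the `models_that_fit` helper are hypothetical names introduced for illustration. It only compares against the minimum figure, so it says nothing about output quality at that quantization or about headroom for long contexts.

```python
# (model name, publisher, minimum VRAM in GB), copied from the table above
MODELS = [
    ("TinyDolphin 2.8 1.1B", "QuixiAI", 0.8),
    ("Phi 1.5", "Microsoft", 0.7),
    ("Phi 1", "Microsoft", 0.7),
    ("MediPhi Instruct", "Microsoft", 2.7),
    ("Dolphin X1 Trinity Nano", "dphn", 3.0),
    ("Dolphin Mistral 24B Venice Edition", "Cognitive Computations", 7.3),
    ("Dolphin Mistral 24B Venice Edition", "dphn", 10.9),
    ("Dolphin 2.9.1 Yi 1.5 34B", "dphn", 10.3),
]


def models_that_fit(budget_gb, models=MODELS):
    """Return the rows whose minimum VRAM requirement is within the budget."""
    return [row for row in models if row[2] <= budget_gb]


if __name__ == "__main__":
    for name, publisher, min_gb in models_that_fit(16):
        print(f"{name} ({publisher}): from {min_gb} GB")
```

Calling `models_that_fit(8)` instead narrows the list to the rows whose minimum is 8 GB or less, which is a quick way to shortlist candidates for a smaller card.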

Phi Models — Hardware Requirements

All Phi Models by Size

Frequently Asked Questions