Question 1

How much VRAM do I need to run a Phi 3 model?

Accepted Answer

The least demanding Phi 3 model, Phi 3.5 Mini Instruct, runs from 2.3 GB of VRAM at an aggressive quantization. The other family members need more, from 2.7 GB up to 15.3 GB; see the table below for every model.
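
As a rough sanity check on figures like these, the memory taken by the weights alone can be estimated as parameter count times bits per weight divided by 8. The sketch below is only that back-of-the-envelope estimate: the function name `weights_only_gb` is hypothetical, and the result ignores the KV cache, activations, and runtime overhead, so real requirements (including the "runs from" figures on this page) will be higher than the weights-only number.

```python
def weights_only_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same bit width, and
    nothing else (KV cache, activations, runtime buffers) is counted.
    """
    bytes_per_weight = bits_per_weight / 8
    return params_billions * bytes_per_weight


# Weights-only estimate for a 3.8B-parameter model at common bit widths.
for bits in (16, 8, 4):
    print(f"3.8B params at {bits}-bit: ~{weights_only_gb(3.8, bits):.1f} GB")
```

The estimate scales linearly with bit width, which is why a quantized build of a model can fit on a card where the full-precision weights would not.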

Question 2

Which Phi 3 models can I run on a 16 GB GPU?

Accepted Answer

All 6 Phi 3 models fit in 16 GB of VRAM at some quantization, including Phi 3.5 Mini Instruct, Phi 3 Mini 4k Instruct, and Phi 3.5 MoE Instruct.

Question 3

What is the most popular Phi 3 model to run locally?

Accepted Answer

Phi 3.5 Mini Instruct is the most downloaded Phi 3 model in local-friendly quantized formats, with 290.6K quant downloads. It runs from 2.3 GB of VRAM.

Model	Params	Runs from	Context	Publisher	Quant downloads
Phi 3.5 Mini Instruct	3.8B	2.3 GB	131K	Microsoft	290.6K
Phi 3 Mini 4k Instruct	3.8B	2.7 GB	4K	Microsoft	4.7K
Phi 3 Mini 128k Instruct	3.8B	2.7 GB	131K	Microsoft	71
Phi 3 Small 8k Instruct	7.4B	15.3 GB	8K	Microsoft	—
Phi 3 Medium 4k Instruct	14.0B	6.7 GB	4K	Microsoft	—
Phi 3.5 MoE Instruct	41.9B	12.1 GB	131K	Microsoft	1.6K
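
To check which models fit a particular GPU, the "Runs from" column above can be filtered against a VRAM budget. The snippet below is a minimal sketch of that lookup: the `MODELS` list is copied by hand from the table, `models_that_fit` is a hypothetical helper name, and the comparison uses only the minimum figure, so it says nothing about headroom for long contexts.

```python
# (model name, minimum VRAM in GB), copied from the "Runs from" column.
MODELS = [
    ("Phi 3.5 Mini Instruct", 2.3),
    ("Phi 3 Mini 4k Instruct", 2.7),
    ("Phi 3 Mini 128k Instruct", 2.7),
    ("Phi 3 Small 8k Instruct", 15.3),
    ("Phi 3 Medium 4k Instruct", 6.7),
    ("Phi 3.5 MoE Instruct", 12.1),
]


def models_that_fit(vram_gb: float) -> list[str]:
    """Return the models whose minimum VRAM figure is within the budget."""
    return [name for name, min_gb in MODELS if min_gb <= vram_gb]


print(len(models_that_fit(16)))  # number of models that fit in 16 GB
print(models_that_fit(8))        # models that fit in 8 GB
```

With a 16 GB budget all six rows pass, although Phi 3 Small 8k Instruct does so with less than 1 GB to spare.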

Phi 3 Models — Hardware Requirements
