Question 1

How much VRAM do I need to run an OLMo model?

Accepted Answer

The smallest OLMo model, OLMo 2 0425 1B, runs from 1.2 GB of VRAM at an aggressive quantization. Larger family members need more, up to 67.9 GB for FlexOlmo 7x7B 1T; see the table below for every model.

Question 2

Which OLMo models can I run on a 16 GB GPU?

Accepted Answer

7 of 10 OLMo models fit in 16 GB of VRAM at some quantization, including Olmo 3 7B Instruct, OLMoE 1B 7B 0924 Instruct, and OLMoE 1B 7B 0924.
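
The count can be checked against the "Runs from" column of the table on this page. The sketch below is only an illustration: the dictionary name MIN_VRAM_GB and the helper models_that_fit are made up for this example, and the figures are copied by hand from the table. They are minimum values at some quantization, so it is safest to treat them as a floor rather than a guarantee for every quantization or context length.

```python
# Minimum VRAM in GB per model, copied by hand from the "Runs from" column.
# MIN_VRAM_GB and models_that_fit are hypothetical names used for illustration.
MIN_VRAM_GB = {
    "OLMo 2 0425 1B": 1.2,
    "OLMoE 1B 7B 0924 Instruct": 3.5,
    "OLMoE 1B 7B 0924": 3.5,
    "OLMoE 1B 7B 0125 Instruct": 2.5,
    "Olmo Hybrid 7B": 15.3,
    "Olmo 3 7B Instruct": 3.4,
    "Olmo 3 1025 7B": 3.4,
    "Olmo 3 1125 32B": 65.3,
    "Olmo 3.1 32B Think": 65.3,
    "FlexOlmo 7x7B 1T": 67.9,
}


def models_that_fit(budget_gb, table=MIN_VRAM_GB):
    """Return the names of models whose minimum VRAM is within budget_gb."""
    return [name for name, need_gb in table.items() if need_gb <= budget_gb]


if __name__ == "__main__":
    fits = models_that_fit(16.0)
    print(f"{len(fits)} of {len(MIN_VRAM_GB)} models fit in 16 GB:")
    for name in fits:
        print(f"  {name} (from {MIN_VRAM_GB[name]} GB)")
```

Note that Olmo Hybrid 7B, at 15.3 GB, fits with only 0.7 GB to spare on a 16 GB card, while the three models of 32B parameters and up start at 65.3 GB and are excluded.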

Question 3

What is the most popular OLMo model to run locally?

Accepted Answer

Olmo 3 7B Instruct is the most downloaded OLMo model in local-friendly quantized formats. It runs from 3.4 GB of VRAM.

Model	Params	Runs from	Context	Publisher	Quant downloads
OLMo 2 0425 1B	1.5B	1.2 GB	4K	Allen AI	174
OLMoE 1B 7B 0924 Instruct	6.9B	3.5 GB	4K	Allen AI	1.3K
OLMoE 1B 7B 0924	6.9B	3.5 GB	4K	Allen AI	701
OLMoE 1B 7B 0125 Instruct	6.9B	2.5 GB	4K	Allen AI	588
Olmo Hybrid 7B	7B	15.3 GB	66K	Allen AI	—
Olmo 3 7B Instruct	7.3B	3.4 GB	66K	Allen AI	3.1K
Olmo 3 1025 7B	7.3B	3.4 GB	66K	Allen AI	377
Olmo 3 1125 32B	32.2B	65.3 GB	66K	Allen AI	—
Olmo 3.1 32B Think	32.2B	65.3 GB	66K	Allen AI	—
FlexOlmo 7x7B 1T	33.3B	67.9 GB	4K	Allen AI	—

OLMo Models — Hardware Requirements
