Question 1

How much VRAM do I need to run a SmolLM model?

Accepted Answer

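The smallest SmolLM model, SmolLM2 70M, runs from 0.4 GB of VRAM at an aggressive quantization. Larger family members generally need more: the 360M models start at 0.5 GB, the 1.7B models at 1.4 GB, and the SmolLM3 3B variants at 1.3 to 1.7 GB. See the table below for every model.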
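As a rough cross-check on figures like these, the size of the weights alone can be estimated as parameter count times bits per weight, divided by 8. The sketch below is only that arithmetic; the function name `weight_memory_gb` and the bit widths are illustrative assumptions, not something the table documents. A running model also needs memory for the context cache and runtime buffers, so the "runs from" figures are not expected to match a weights-only estimate.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes).

    This ignores the context (KV) cache and runtime buffers, so real VRAM
    use will be higher than the value returned here.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Parameter counts come from the table; the bit widths are assumed examples.
for name, params in [("SmolLM2 135M", 135e6), ("SmolLM2 1.7B", 1.7e9)]:
    for bits in (16, 8, 4):
        size = weight_memory_gb(params, bits)
        print(f"{name} at {bits}-bit: ~{size:.2f} GB of weights")
```

Halving the bits per weight halves the weight footprint, which is why an aggressive quantization lowers the VRAM floor so much.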

Question 2

Which SmolLM models can I run on a 16 GB GPU?

Accepted Answer

All 13 SmolLM models listed fit in 16 GB of VRAM at some quantization, including SmolLM2 135M Instruct, SmolLM2 1.7B Instruct, and SmolLM2 360M Instruct.
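The same check can be repeated for other GPU sizes. The sketch below copies the "Runs from" column of the table into a list and filters it by a VRAM budget; the names `RUNS_FROM_GB` and `models_that_fit` are made up for illustration. The figures are minimums at the most aggressive quantization, so a model that only just fits may leave little room for longer contexts.

```python
# (model, minimum VRAM in GB), copied from the "Runs from" column of the table.
RUNS_FROM_GB = [
    ("SmolLM2 70M", 0.4),
    ("SmolLM2 135M Instruct", 0.4),
    ("SmolLM 135M", 0.4),
    ("SmolLM2 135M", 0.4),
    ("SmolLM2 360M Instruct", 0.5),
    ("SmolLM2 360M", 0.5),
    ("SmolLM 360M Instruct", 0.5),
    ("SmolLM2 1.7B Instruct", 1.4),
    ("SmolLM2 1.7B", 1.4),
    ("SmolLM 1.7B", 1.4),
    ("SmolLM3 3B Base", 1.3),
    ("SmolLM3 3B ONNX", 1.7),
    ("SmolLM3 3B", 1.3),
]


def models_that_fit(vram_gb: float) -> list[str]:
    """Return the models whose minimum requirement is within the budget."""
    return [name for name, need in RUNS_FROM_GB if need <= vram_gb]


if __name__ == "__main__":
    for budget in (0.5, 1.5, 16.0):
        fits = models_that_fit(budget)
        print(f"{budget} GB: {len(fits)} of {len(RUNS_FROM_GB)} models fit")
```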

Question 3

What is the most popular SmolLM model to run locally?

Accepted Answer

SmolLM2 135M Instruct is the most downloaded SmolLM model in local-friendly quantized formats, with 132.4K quant downloads. It runs from 0.4 GB of VRAM.
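For anyone re-deriving this ranking from the table, the "Quant downloads" column mixes plain numbers, K-suffixed values, and a dash for missing data. The sketch below shows one way to normalize and sort it; `parse_count` and `QUANT_DOWNLOADS` are hypothetical names, and treating a dash as zero is an assumption made only so the sort works.

```python
def parse_count(text: str) -> int:
    """Convert '132.4K' or '943' to an integer.

    An empty cell or a dash means "no data" and is mapped to 0 (assumption).
    """
    text = text.strip()
    if text in ("", "\u2014", "-"):
        return 0
    if text.endswith("K"):
        return round(float(text[:-1]) * 1000)
    return int(text)


# Values copied verbatim from the "Quant downloads" column of the table.
QUANT_DOWNLOADS = {
    "SmolLM2 70M": "\u2014",
    "SmolLM2 135M Instruct": "132.4K",
    "SmolLM 135M": "943",
    "SmolLM2 135M": "91",
    "SmolLM2 360M Instruct": "19.6K",
    "SmolLM2 360M": "116",
    "SmolLM 360M Instruct": "\u2014",
    "SmolLM2 1.7B Instruct": "36.0K",
    "SmolLM2 1.7B": "180",
    "SmolLM 1.7B": "97",
    "SmolLM3 3B Base": "280",
    "SmolLM3 3B ONNX": "\u2014",
    "SmolLM3 3B": "18.4K",
}

# Sort model names from most to fewest quant downloads.
ranked = sorted(QUANT_DOWNLOADS, key=lambda m: parse_count(QUANT_DOWNLOADS[m]), reverse=True)

if __name__ == "__main__":
    for model in ranked[:3]:
        print(model, parse_count(QUANT_DOWNLOADS[model]))
```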

Model	Params	Runs from (min VRAM)	Context (tokens)	Publisher	Quant downloads
SmolLM2 70M	69M	0.4 GB	8K	codelion	—
SmolLM2 135M Instruct	135M	0.4 GB	8K	Hugging Face	132.4K
SmolLM 135M	135M	0.4 GB	2K	Hugging Face	943
SmolLM2 135M	135M	0.4 GB	8K	Hugging Face	91
SmolLM2 360M Instruct	362M	0.5 GB	8K	Hugging Face	19.6K
SmolLM2 360M	362M	0.5 GB	8K	Hugging Face	116
SmolLM 360M Instruct	362M	0.5 GB	2K	Hugging Face	—
SmolLM2 1.7B Instruct	1.7B	1.4 GB	8K	Hugging Face	36.0K
SmolLM2 1.7B	1.7B	1.4 GB	8K	Hugging Face	180
SmolLM 1.7B	1.7B	1.4 GB	2K	Hugging Face	97
SmolLM3 3B Base	3B	1.3 GB	66K	Hugging Face	280
SmolLM3 3B ONNX	3B	1.7 GB	66K	Hugging Face	—
SmolLM3 3B	3.1B	1.3 GB	66K	Hugging Face	18.4K

SmolLM Models — Hardware Requirements
