Mellum Models — Hardware Requirements

Seven Mellum models from JetBrains and the community, ranging from 4.0B to 12.1B parameters; the smallest runs in as little as 2.8 GB of VRAM. Every row links to full quantization tables and GPU compatibility.

All Mellum Models by Size

Model                             Params   Context
Mellum 4B Base                    4.0B     8K
Mellum2 12B A2.5B Thinking        12.1B    131K
Mellum2 12B A2.5B Instruct        12.1B    131K
Mellum2 12B A2.5B Base            12.1B    131K
Mellum2 12B A2.5B Thinking SFT    12.1B    131K
Mellum2 12B A2.5B Base Pretrain   12.1B    131K
Mellum2 12B A2.5B Instruct SFT    12.1B    131K

Frequently Asked Questions

How much VRAM do I need to run a Mellum model?
The smallest Mellum model, Mellum 4B Base, runs in as little as 2.8 GB of VRAM at an aggressive quantization. Larger family members need proportionally more; the table above lists every model with its parameter count.
Which Mellum models can I run on a 16 GB GPU?
Five of the seven Mellum models fit in 16 GB of VRAM at some quantization, including Mellum2 12B A2.5B Thinking, Mellum2 12B A2.5B Instruct, and Mellum 4B Base.
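As a rough illustration of why quantization decides whether a model fits, the sketch below estimates weight memory as parameters times bits per weight divided by 8. This is a back-of-the-envelope rule, not the method used to produce the figures on this page. The function names and the 1.5 GB allowance for KV cache and runtime buffers are assumptions chosen for illustration; real usage depends on the runtime, the context length, and which quantized files are actually published for a given model.

```python
def estimate_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal GB.

    Billions of parameters * bits per weight / 8 bits per byte
    gives billions of bytes, i.e. GB.
    """
    return params_billions * bits_per_weight / 8


def fits_in_vram(params_billions: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 1.5) -> bool:
    """True if the estimated weights plus an assumed overhead fit in vram_gb.

    overhead_gb is a hypothetical flat allowance for KV cache and
    runtime buffers; it is not a measured value.
    """
    needed = estimate_weight_gb(params_billions, bits_per_weight) + overhead_gb
    return needed <= vram_gb


if __name__ == "__main__":
    # Parameter counts taken from the table on this page.
    models = [("Mellum 4B Base", 4.0), ("Mellum2 12B A2.5B Thinking", 12.1)]
    for name, params in models:
        for bits in (16, 8, 4):
            gb = estimate_weight_gb(params, bits)
            ok = fits_in_vram(params, bits, vram_gb=16.0)
            print(f"{name} @ {bits}-bit: ~{gb:.1f} GB weights, fits in 16 GB: {ok}")
```

Under these assumptions a 12.1B model does not fit in 16 GB at 16-bit precision but does at 4-bit, which is why the answer above is qualified with "at some quantization". Treat the per-model quantization tables as the authoritative numbers.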
What is the most popular Mellum model to run locally?
Mellum2 12B A2.5B Thinking is the most downloaded Mellum model in local-friendly quantized formats. It runs in as little as 5.5 GB of VRAM.