TinyLlama Models — Hardware Requirements

Three TinyLlama models from TinyLlama and the community, all with 1.1B parameters. The lightest configuration runs in 0.8 GB of VRAM. Every row links to full quantization tables and GPU compatibility.

All TinyLlama Models by Size

Model                                       Params  Context
TinyLlama 1.1B Chat v1.0                    1.1B    2K
TinyLlama 1.1B Intermediate Step 1431k 3T   1.1B    2K
TinyLlama 1.1B Chat V0.6                    1.1B    2K

Frequently Asked Questions

How much VRAM do I need to run a TinyLlama model?
The TinyLlama model with the smallest footprint, TinyLlama 1.1B Chat v1.0, runs from 0.8 GB of VRAM at an aggressive quantization. All three models listed above have 1.1B parameters, so their requirements are similar; higher-precision quantizations need proportionally more memory.
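As a rough illustration of why quantization changes the VRAM figure, the sketch below estimates weight memory as parameters times bits per weight. The function names and the overhead_gb value are hypothetical placeholders, not figures from this page; real usage also depends on context length, the runtime, and the exact quantization format.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    # params_billion * 1e9 weights * bits / 8 bytes each, divided by 1e9
    return params_billion * bits_per_weight / 8


def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead_gb: float = 0.3) -> float:
    """Weights plus a flat allowance for KV cache and runtime buffers.

    overhead_gb is an assumed placeholder, not a measured value.
    """
    return estimate_weight_gb(params_billion, bits_per_weight) + overhead_gb


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 0.3) -> bool:
    """True if the rough estimate fits within the given VRAM budget."""
    return estimate_vram_gb(params_billion, bits_per_weight, overhead_gb) <= vram_gb


if __name__ == "__main__":
    # Compare a few common precisions for a 1.1B-parameter model.
    for bits in (4, 8, 16):
        print(f"{bits}-bit: ~{estimate_vram_gb(1.1, bits):.2f} GB")
```

Under these assumptions, a 1.1B-parameter model stays well below 16 GB even at 16-bit precision, which is consistent with every model on this page fitting a 16 GB GPU.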
Which TinyLlama models can I run on a 16 GB GPU?
All 3 TinyLlama models fit in 16 GB of VRAM at some quantization: TinyLlama 1.1B Chat v1.0, TinyLlama 1.1B Intermediate Step 1431k 3T, and TinyLlama 1.1B Chat V0.6.
What is the most popular TinyLlama model to run locally?
TinyLlama 1.1B Chat v1.0 is the most downloaded TinyLlama model in local-friendly quantized formats. It runs from 0.8 GB of VRAM.