Question 1

How much VRAM do I need to run a Falcon model?

Accepted Answer

The smallest Falcon model, Falcon H1 0.5B Instruct, runs from 0.6 GB of VRAM at an aggressive quantization. Larger family members need more, up to 19.6 GB for Falcon 40B; see the table below for every model.
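As a rough cross-check on figures like these, the memory taken by the weights alone can be estimated from the parameter count and the bits stored per weight. The sketch below is a generic back-of-envelope calculation, not the method used to produce the "Runs from" numbers on this page. It ignores the KV cache, activations, and runtime overhead, so real usage will be higher. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate size of the model weights alone, in GB (10**9 bytes).

    Assumption: every parameter is stored at the same bit width. Real
    quantized formats mix widths and add metadata, so treat this as a floor.
    """
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # 7.2B is the parameter count listed for Falcon 7B.
    for bits in (16, 8, 4):
        print(f"7.2B params at {bits}-bit: ~{weight_memory_gb(7.2e9, bits):.1f} GB")
```

Halving the bit width halves the weight footprint, which is why the same model can appear with very different VRAM requirements depending on the quantization chosen.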

Question 2

Which Falcon models can I run on a 16 GB GPU?

Accepted Answer

Nine of the 10 Falcon models fit in 16 GB of VRAM at some quantization, including Falcon Mamba 7B, Falcon H1 7B Instruct, and Falcon 40B Instruct. Only Falcon 40B, which runs from 19.6 GB, does not.
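The same check can be repeated for any VRAM budget. The sketch below copies the "Runs from" figures from the table on this page into a dictionary and filters them. The names `RUNS_FROM_GB` and `models_that_fit` are hypothetical, and treating a model that needs exactly the budget as fitting is an assumption; in practice such a model leaves no headroom for anything else on the GPU.

```python
# "Runs from" figures in GB, copied from the table on this page.
RUNS_FROM_GB = {
    "Falcon H1 0.5B Instruct": 0.6,
    "Falcon 7B": 3.4,
    "Falcon 7B Instruct": 3.4,
    "Falcon Mamba 7B": 2.7,
    "Falcon3 Mamba 7B Base": 16.0,
    "Falcon H1 7B Base": 3.7,
    "Falcon H1 7B Instruct": 2.6,
    "Falcon 11B": 5.0,
    "Falcon 40B Instruct": 12.1,
    "Falcon 40B": 19.6,
}


def models_that_fit(vram_gb, table=RUNS_FROM_GB):
    """Return the models whose minimum footprint is at most vram_gb, smallest first."""
    fitting = [(gb, name) for name, gb in table.items() if gb <= vram_gb]
    return [name for gb, name in sorted(fitting)]


if __name__ == "__main__":
    fits = models_that_fit(16)
    print(f"{len(fits)} of {len(RUNS_FROM_GB)} models fit in 16 GB")
    for name in fits:
        print(" ", name)
```

Calling `models_that_fit(8)` or `models_that_fit(24)` answers the same question for a smaller or larger card.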

Question 3

What is the most popular Falcon model to run locally?

Accepted Answer

Falcon Mamba 7B is the most downloaded Falcon model in local-friendly quantized formats. It runs from 2.7 GB of VRAM.

Question 4

How do Falcon models score on benchmarks?

Accepted Answer

Falcon 40B leads the family with an overall benchmark rating of 41.2/100, ranking #43 among 73 open models, while the top proprietary model, Claude Fable 5 Max, scores 89.9. See the comparison chart above for the full standings.

Model	Params	Runs from (VRAM)	Context (tokens)	Publisher	Quant downloads
Falcon H1 0.5B Instruct	521M	0.6 GB	16K	TII UAE	—
Falcon 7B	7.2B	3.4 GB	—	TII UAE	62
Falcon 7B Instruct	7.2B	3.4 GB	—	TII UAE	—
Falcon Mamba 7B	7.3B	2.7 GB	—	TII UAE	1.5K
Falcon3 Mamba 7B Base	7.3B	16 GB	—	TII UAE	—
Falcon H1 7B Base	7.6B	3.7 GB	262K	TII UAE	—
Falcon H1 7B Instruct	7.6B	2.6 GB	262K	TII UAE	869
Falcon 11B	11.1B	5.0 GB	8K	TII UAE	67
Falcon 40B Instruct	40B	12.1 GB	—	TII UAE	342
Falcon 40B	41.8B	19.6 GB	—	TII UAE	36

Falcon Models — Hardware Requirements
