Question 1

How much VRAM do I need to run a GLM 4 model?

Accepted Answer

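GLM 4.6V Flash has the lowest VRAM requirement in the GLM 4 family: it runs from 3.2 GB at an aggressive quantization. (The smallest model by parameter count, GLM 4 9B 0414, starts at 4.4 GB.) Larger family members need more, up to 152.3 GB for GLM 4.6 Derestricted v3. See the table below for every model.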
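As a rough cross-check on these figures, one can back out how many bits per parameter a "runs from" number implies. The sketch below assumes 1 GB = 10^9 bytes and that the minimum VRAM figure is dominated by the model weights. Both are assumptions, and the function name is illustrative rather than anything defined on this page.

```python
def implied_bits_per_param(vram_gb: float, params_billion: float) -> float:
    """Bits per parameter implied by a minimum-VRAM figure.

    Assumption: 1 GB = 1e9 bytes and the figure is mostly weights
    (no allowance for KV cache or runtime overhead).
    """
    bytes_total = vram_gb * 1e9
    params_total = params_billion * 1e9
    return bytes_total * 8 / params_total


# Values copied from the model table on this page.
for name, gb, params in [
    ("GLM 4.6V Flash", 3.2, 10.3),
    ("GLM 4.7 Flash", 9.7, 31.2),
    ("GLM 4.7", 99.2, 358.3),
]:
    print(f"{name}: {implied_bits_per_param(gb, params):.2f} bits/param")
```

Under those assumptions the listed minimums work out to roughly 2.2 to 2.5 bits per parameter, which is consistent with the "aggressive quantization" wording.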

Question 2

Which GLM 4 models can I run on a 16 GB GPU?

Accepted Answer

Seven of the 15 GLM 4 models fit in 16 GB of VRAM at some quantization, including GLM 4.7 Flash, GLM 4.6V Flash, and GLM 4.7 Flash REAP 23B A3B.
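The same check can be repeated for any GPU size by filtering the "Runs from" column of the model table. This is a minimal sketch: the `MODELS` list and the `fits` helper are hypothetical names introduced here, with the data copied by hand from the table on this page. The figures are the page's minimums, so they may not leave headroom for long contexts.

```python
# (model name, minimum VRAM in GB), copied from the model table on this page.
MODELS = [
    ("GLM 4 9B 0414", 4.4),
    ("GLM 4.6V Flash", 3.2),
    ("GLM 4.7 Flash REAP 23B A3B", 7.4),
    ("GLM 4.7 Flash Heretic", 13.8),
    ("GLM 4.7 Flash Heretic 1.2.0", 13.8),
    ("GLM 4.7 Flash Ultimate Irrefusable Heretic", 13.8),
    ("GLM 4.7 Flash", 9.7),
    ("GLM 4.6V", 30.1),
    ("GLM 4.5 Air", 30.8),
    ("GLM 4.5 Air Derestricted", 47.4),
    ("GLM 4.6", 98.7),
    ("GLM 4.6 Derestricted", 98.7),
    ("GLM 4.6 Derestricted v3", 152.3),
    ("GLM 4.7", 99.2),
    ("GLM 4.5", 99.2),
]


def fits(budget_gb: float, models=MODELS) -> list[str]:
    """Return names of models whose minimum VRAM is within the budget,
    ordered from lowest to highest requirement."""
    candidates = [(gb, name) for name, gb in models if gb <= budget_gb]
    candidates.sort(key=lambda pair: pair[0])  # stable sort keeps table order on ties
    return [name for _, name in candidates]


if __name__ == "__main__":
    for budget in (8, 16, 24, 48):
        names = fits(budget)
        print(f"{budget} GB: {len(names)} of {len(MODELS)} models")
```

Calling `fits(16)` returns the seven models counted in the answer above, starting with GLM 4.6V Flash.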

Question 3

What is the most popular GLM 4 model to run locally?

Accepted Answer

GLM 4.7 Flash is the most downloaded GLM 4 model in local-friendly quantized formats. It runs from 9.7 GB of VRAM.

Question 4

How do GLM 4 models score on benchmarks?

Accepted Answer

GLM 4.7 leads the family with an overall benchmark rating of 42.6/100, ranking #40 among 73 open models, while the top proprietary model, Claude Fable 5 Max, scores 89.9. See the comparison chart above for the full standings.

Model	Params	Runs from (min VRAM)	Context (tokens)	Publisher	Quant downloads
GLM 4 9B 0414	9.4B	4.4 GB	33K	Z.ai	—
GLM 4.6V Flash	10.3B	3.2 GB	131K	Z.ai	96.2K
GLM 4.7 Flash REAP 23B A3B	23.0B	7.4 GB	203K	Cerebras	22.6K
GLM 4.7 Flash Heretic	29.9B	13.8 GB	203K	Olafangensan	—
GLM 4.7 Flash Heretic 1.2.0	29.9B	13.8 GB	203K	darkc0de	—
GLM 4.7 Flash Ultimate Irrefusable Heretic	29.9B	13.8 GB	203K	llmfan46	—
GLM 4.7 Flash	31.2B	9.7 GB	203K	Z.ai	116.6K
GLM 4.6V	107.7B	30.1 GB	131K	Z.ai	4.9K
GLM 4.5 Air	110.5B	30.8 GB	131K	Z.ai	30.3K
GLM 4.5 Air Derestricted	110.5B	47.4 GB	131K	ArliAI	—
GLM 4.6	356.8B	98.7 GB	203K	Z.ai	5.8K
GLM 4.6 Derestricted	356.8B	98.7 GB	203K	ArliAI	2.4K
GLM 4.6 Derestricted v3	356.8B	152.3 GB	203K	ArliAI	—
GLM 4.7	358.3B	99.2 GB	203K	Z.ai	9.4K
GLM 4.5	358.3B	99.2 GB	131K	Z.ai	1.8K
