GLM 4 Models — Hardware Requirements

14 GLM 4 models from zai-org and the community, ranging from a model that runs in as little as 3.2 GB of VRAM at aggressive quantization to models with 358.3B parameters. Full quantization tables and GPU compatibility are available for each model.

All GLM 4 Models by Size

Model | Parameters | Context (tokens)
GLM 4 9B 0414 | 9.4B | 33K
GLM 4.6V Flash | 10.3B | 131K
GLM 4.7 Flash REAP 23B A3B | 23.0B | 203K
GLM 4.7 Flash Heretic | 29.9B | 203K
GLM 4.7 Flash Heretic 1.2.0 | 29.9B | 203K
GLM 4.7 Flash Ultimate Irrefusable Heretic | 29.9B | 203K
GLM 4.7 Flash | 31.2B | 203K
GLM 4.6V | 107.7B | 131K
GLM 4.5 Air | 110.5B | 131K
GLM 4.5 Air Derestricted | 110.5B | 131K
GLM 4.6 | 356.8B | 203K
GLM 4.6 Derestricted v3 | 356.8B | 203K
GLM 4.7 | 358.3B | 203K
GLM 4.5 | 358.3B | 131K
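As a rough guide to reading the parameter counts above, weight memory is approximately parameters × bits per weight ÷ 8. The sketch below applies that rule to a few rows at illustrative bit widths. It counts weights only and ignores the KV cache, activations and runtime overhead, so real requirements are higher; the bit widths are assumptions, not the quantizations listed on the per-model pages.

```python
# Weight-only memory estimate: params (billions) * bits per weight / 8 = GB.
# Assumption: weights dominate; KV cache, activations and overhead are ignored.

MODELS = {
    "GLM 4.6V Flash": 10.3,
    "GLM 4.7 Flash": 31.2,
    "GLM 4.5 Air": 110.5,
    "GLM 4.7": 358.3,
}

# Illustrative bit widths, from 16-bit weights down to an aggressive ~2.5-bit quant.
BITS_PER_WEIGHT = (16, 8, 4, 2.5)


def weight_gb(params_billion: float, bits: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits / 8


if __name__ == "__main__":
    for name, params in MODELS.items():
        cells = ", ".join(
            f"{b}-bit {weight_gb(params, b):.1f} GB" for b in BITS_PER_WEIGHT
        )
        print(f"{name} ({params}B): {cells}")
```

At an assumed 2.5 bits per weight, GLM 4.6V Flash's 10.3B parameters come to about 3.2 GB, consistent with (though not confirmed as the basis for) the smallest VRAM figure quoted on this page, while GLM 4.7 still needs roughly 112 GB at the same width.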

How GLM 4 Compares — Benchmark Rating

GLM 4.6V is the highest-rated GLM 4 model with an overall benchmark rating of 54.7/100, ranking #28 among 75 open models. The top proprietary model, GPT 5.5, scores 88.8.

Benchmark rating, top entries (composite of normalized public benchmark scores):
Model | Type | Rating
GPT 5.5 | proprietary | 88.8
Claude Opus 4.7 | proprietary | 87.6
Claude Fable 5 | proprietary | 86.6
GPT 5.4 | proprietary | 86.6
Claude Opus 4.8 | proprietary | 84.4
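The exact methodology behind the composite rating is not reproduced here. One common way to build a composite from normalized scores is to min-max scale each benchmark to 0 to 100 across models and then average per model; the sketch below shows that approach only as an assumption, with made-up model names and toy scores that are not real results.

```python
# Hypothetical composite: min-max normalize each benchmark to 0-100 across models,
# then average per model. The site's actual methodology may differ.


def composite(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """scores maps model -> {benchmark: raw score}; every model has every benchmark."""
    benchmarks = next(iter(scores.values())).keys()
    normalized: dict[str, list[float]] = {m: [] for m in scores}
    for bench in benchmarks:
        values = [scores[m][bench] for m in scores]
        low, high = min(values), max(values)
        span = high - low
        for m in scores:
            # A benchmark where all models tie carries no ranking signal: use the midpoint.
            normalized[m].append(
                50.0 if span == 0 else 100 * (scores[m][bench] - low) / span
            )
    return {m: sum(v) / len(v) for m, v in normalized.items()}


if __name__ == "__main__":
    # Toy numbers for illustration only; not real benchmark results.
    toy = {
        "model-a": {"bench-1": 80.0, "bench-2": 40.0},
        "model-b": {"bench-1": 60.0, "bench-2": 70.0},
        "model-c": {"bench-1": 40.0, "bench-2": 55.0},
    }
    ranked = sorted(composite(toy).items(), key=lambda kv: kv[1], reverse=True)
    for rank, (model, score) in enumerate(ranked, start=1):
        print(f"#{rank} {model}: {score:.1f}")
```

Min-max scaling puts benchmarks with different raw ranges on a common scale before averaging, so no single benchmark dominates just because its numbers are larger.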

Frequently Asked Questions

How much VRAM do I need to run a GLM 4 model?
The least demanding GLM 4 model, GLM 4.6V Flash, runs in as little as 3.2 GB of VRAM at an aggressive quantization. Larger family members need roughly proportionally more, because weight memory grows with parameter count; see the table above for each model's size.
Which GLM 4 models can I run on a 16 GB GPU?
Seven of the 14 GLM 4 models fit in 16 GB of VRAM at some quantization, including GLM 4.7 Flash, GLM 4.6V Flash and GLM 4.7 Flash REAP 23B A3B.
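Whether a model fits comes down to comparing its weight size, plus some headroom, with the card's memory. The sketch below checks every model in the table against a 16 GB budget under the same weight-only rule (parameters × bits ÷ 8); the 2.5-bit default and the fixed 1.5 GB reserve for KV cache and overhead are illustrative assumptions, not values taken from the per-model tables.

```python
# Rough fit check: weight-only footprint plus an assumed fixed reserve for
# KV cache and runtime overhead, compared against a VRAM budget in GB.

PARAMS_B = {
    "GLM 4 9B 0414": 9.4,
    "GLM 4.6V Flash": 10.3,
    "GLM 4.7 Flash REAP 23B A3B": 23.0,
    "GLM 4.7 Flash Heretic": 29.9,
    "GLM 4.7 Flash Heretic 1.2.0": 29.9,
    "GLM 4.7 Flash Ultimate Irrefusable Heretic": 29.9,
    "GLM 4.7 Flash": 31.2,
    "GLM 4.6V": 107.7,
    "GLM 4.5 Air": 110.5,
    "GLM 4.5 Air Derestricted": 110.5,
    "GLM 4.6": 356.8,
    "GLM 4.6 Derestricted v3": 356.8,
    "GLM 4.7": 358.3,
    "GLM 4.5": 358.3,
}


def fits(params_billion: float, vram_gb: float,
         bits: float = 2.5, reserve_gb: float = 1.5) -> bool:
    """True if weights (params * bits / 8 GB) plus a fixed reserve fit in vram_gb."""
    return params_billion * bits / 8 + reserve_gb <= vram_gb


if __name__ == "__main__":
    for bits in (2.5, 4.0):
        ok = [name for name, p in PARAMS_B.items() if fits(p, 16, bits)]
        print(f"{bits}-bit: {len(ok)} of {len(PARAMS_B)} fit in 16 GB: {', '.join(ok)}")
```

Under these assumptions, 7 of the 14 models fit at 2.5 bits per weight, while at 4 bits only the three models with 23B or fewer parameters do; real results depend on the quantization formats published for each model and on the context length used.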
What is the most popular GLM 4 model to run locally?
GLM 4.7 Flash is the most downloaded GLM 4 model in local-friendly quantized formats. It needs as little as 9.7 GB of VRAM.
How do GLM 4 models score on benchmarks?
GLM 4.6V leads the family with an overall benchmark rating of 54.7/100, ranking #28 among 75 open models, while the top proprietary model, GPT 5.5, scores 88.8. See the comparison chart above for the full standings.