GLM 4.7 Flash REAP 23B A3B vs GLM 4 9B 0414

Side-by-side comparison of VRAM requirements, quantization, context length, and hardware compatibility.

GLM 4.7 Flash REAP 23B A3B

Cerebras · 23.0B

Chat

GLM 4 9B 0414

zai-org · 9.4B

Chat

Specifications

Specification   GLM 4.7 Flash REAP 23B A3B   GLM 4 9B 0414
Parameters      23.0B                        9.4B
Context         203K                         33K
Architecture    Glm4MoeLiteForCausalLM       Glm4ForCausalLM
License         MIT                          MIT
Downloads       542                          14.7K
Released        Jan 2026                     Apr 2025

VRAM by Quantization: GLM 4.7 Flash REAP 23B A3B vs GLM 4 9B 0414

Quantization   Bits   GLM 4.7 Flash REAP 23B A3B VRAM   GLM 4 9B 0414 VRAM
Q2_K           3.40   10.9 GB                           4.4 GB
Q3_K_M         3.90   12.3 GB                           5.0 GB
Q3_K_S         3.50   11.2 GB                           4.5 GB
Q4_0           4.00   12.6 GB                           5.1 GB
Q4_K_M         4.80   14.9 GB                           6.0 GB
Q5_K_M         5.70   17.5 GB                           7.1 GB
Q6_K           6.60   20.1 GB                           8.1 GB
Q8_0           8.00   24.1 GB                           9.8 GB
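
As a rough cross-check on the table, the size of the quantized weights alone can be estimated as parameters × bits per weight ÷ 8. The sketch below applies that rule of thumb to the parameter counts and bit widths listed above. This page does not state how its VRAM figures are computed, so the formula is an assumption, not the site's method. The listed figures come out somewhat higher than the weights-only estimate (for example 14.9 GB listed vs about 13.8 GB estimated at Q4_K_M for the 23.0B model), which presumably reflects runtime overhead.

```python
# Weights-only estimate: parameters x bits-per-weight / 8 bits-per-byte.
# Parameter counts and bit widths are copied from the tables above.
# This is an assumed rule of thumb, not the formula used by this page.

PARAMS_BILLIONS = {
    "GLM 4.7 Flash REAP 23B A3B": 23.0,
    "GLM 4 9B 0414": 9.4,
}

BITS_PER_WEIGHT = {
    "Q2_K": 3.40,
    "Q3_K_M": 3.90,
    "Q3_K_S": 3.50,
    "Q4_0": 4.00,
    "Q4_K_M": 4.80,
    "Q5_K_M": 5.70,
    "Q6_K": 6.60,
    "Q8_0": 8.00,
}


def weights_only_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    for model, params in PARAMS_BILLIONS.items():
        print(model)
        for quant, bits in BITS_PER_WEIGHT.items():
            print(f"  {quant:<7} ~{weights_only_gb(params, bits):.1f} GB (weights only)")
```

Treat the result as a lower bound: context length and runtime buffers add memory on top of the weights.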

Verdict

GLM 4 9B 0414 needs less VRAM at Q4_K_M (6.0 GB vs 14.9 GB), so it fits on smaller GPUs. GLM 4.7 Flash REAP 23B A3B supports a longer context window (203K vs 33K tokens). GLM 4 9B 0414 is the more widely downloaded of the two (14.7K vs 542 downloads).

Frequently Asked Questions

Which needs less VRAM, GLM 4.7 Flash REAP 23B A3B or GLM 4 9B 0414?

At Q4_K_M, GLM 4.7 Flash REAP 23B A3B needs 14.9 GB and GLM 4 9B 0414 needs 6.0 GB, so GLM 4 9B 0414 is the lighter option to run locally.
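
To turn the VRAM figures into a quick decision for a specific GPU, the sketch below picks the highest-bit quantization whose listed VRAM fits within a given budget. The data is copied from the VRAM table on this page. The function name `best_quant` and the budget values are hypothetical, and the check takes the listed figures at face value without adding any headroom of its own.

```python
from typing import Optional

# (quantization, bits, GLM 4.7 Flash REAP 23B A3B GB, GLM 4 9B 0414 GB)
# Values copied from the VRAM table on this page.
VRAM_TABLE = [
    ("Q2_K", 3.40, 10.9, 4.4),
    ("Q3_K_M", 3.90, 12.3, 5.0),
    ("Q3_K_S", 3.50, 11.2, 4.5),
    ("Q4_0", 4.00, 12.6, 5.1),
    ("Q4_K_M", 4.80, 14.9, 6.0),
    ("Q5_K_M", 5.70, 17.5, 7.1),
    ("Q6_K", 6.60, 20.1, 8.1),
    ("Q8_0", 8.00, 24.1, 9.8),
]

# Column index of each model's VRAM figure in VRAM_TABLE rows.
MODEL_COLUMN = {
    "GLM 4.7 Flash REAP 23B A3B": 2,
    "GLM 4 9B 0414": 3,
}


def best_quant(model: str, budget_gb: float) -> Optional[str]:
    """Return the highest-bit quantization that fits in budget_gb, or None."""
    col = MODEL_COLUMN[model]
    fitting = [row for row in VRAM_TABLE if row[col] <= budget_gb]
    if not fitting:
        return None
    # Prefer the quantization with the most bits per weight among those that fit.
    return max(fitting, key=lambda row: row[1])[0]


if __name__ == "__main__":
    for budget in (8.0, 16.0, 24.0):  # hypothetical GPU memory sizes in GB
        for model in MODEL_COLUMN:
            print(f"{budget:>4} GB  {model}: {best_quant(model, budget)}")
```

In practice it is worth leaving some margin below the card's total memory, since the operating system and other applications also use VRAM.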

Which has a longer context window, GLM 4.7 Flash REAP 23B A3B or GLM 4 9B 0414?

GLM 4.7 Flash REAP 23B A3B supports 202,752 tokens and GLM 4 9B 0414 supports 32,768 tokens, so GLM 4.7 Flash REAP 23B A3B has the longer context window, roughly six times the size.

What is the difference between GLM 4.7 Flash REAP 23B A3B and GLM 4 9B 0414?

GLM 4.7 Flash REAP 23B A3B is a 23.0B model from Cerebras (GLM family), while GLM 4 9B 0414 is a 9.4B model from zai-org (GLM family). Compare their VRAM requirements above to see which fits your GPU or Mac.