GPT OSS 20B vs GPT OSS 20B Heretic Ara v3

Side-by-side comparison of VRAM requirements, quantization, context length, and hardware compatibility.

GPT OSS 20B

OpenAI · 21.5B

Chat

Specifications

| Specification | GPT OSS 20B | GPT OSS 20B Heretic Ara v3 |
|---|---|---|
| Parameters | 21.5B | 1.8B |
| Context | 131K | 131K |
| Architecture | GptOssForCausalLM | GptOssForCausalLM |
| License | Apache 2.0 | Apache 2.0 |
| Downloads | 7.6M | 1.1K |
| Released | Aug 2025 | Mar 2026 |

VRAM by Quantization: GPT OSS 20B vs GPT OSS 20B Heretic Ara v3

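| Quantization | Bits per weight | GPT OSS 20B VRAM | GPT OSS 20B Heretic Ara v3 VRAM |
|---|---|---|---|
| Q2_K | 3.40 | 9.5 GB | not listed |
| Q3_K_M | 3.90 | 10.9 GB | not listed |
| Q3_K_S | 3.50 | 9.8 GB | not listed |
| Q4_0 | 4.00 | 11.1 GB | not listed |
| Q4_K_M | 4.80 | 13.3 GB | not listed |
| Q5_K_M | 5.70 | 15.7 GB | not listed |
| Q6_K | 6.60 | 18.1 GB | not listed |
| Q8_0 | 8.00 | 21.9 GB | 2.2 GB |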
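
The listed VRAM figures appear consistent with a simple rule of thumb: weight size (parameters × bits per weight ÷ 8) plus roughly 0.4 GB of fixed overhead. The sketch below reproduces the table under that assumption. The 0.4 GB overhead and the function name `estimate_vram_gb` are inferred for illustration, not a formula the page documents, and real usage also grows with context length.

```python
# Rough VRAM estimate: weights plus a fixed overhead.
# ASSUMPTION: the 0.4 GB overhead is inferred from the listed figures,
# not taken from any documented formula.

def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead_gb: float = 0.4) -> float:
    """Estimate VRAM in GB for a model with the given size and quantization."""
    weights_gb = params_billion * bits_per_weight / 8  # 8 bits per byte
    return weights_gb + overhead_gb

# Bits per weight and listed VRAM for GPT OSS 20B (21.5B parameters).
LISTED_GPT_OSS_20B = {
    "Q2_K": (3.40, 9.5),
    "Q3_K_M": (3.90, 10.9),
    "Q3_K_S": (3.50, 9.8),
    "Q4_0": (4.00, 11.1),
    "Q4_K_M": (4.80, 13.3),
    "Q5_K_M": (5.70, 15.7),
    "Q6_K": (6.60, 18.1),
    "Q8_0": (8.00, 21.9),
}

if __name__ == "__main__":
    for name, (bits, listed_gb) in LISTED_GPT_OSS_20B.items():
        estimate = estimate_vram_gb(21.5, bits)
        print(f"{name}: estimated {estimate:.1f} GB, listed {listed_gb} GB")
```

Under the same assumption, the 1.8B parameter count listed for GPT OSS 20B Heretic Ara v3 at Q8_0 gives 1.8 + 0.4 = 2.2 GB, which matches its single listed figure.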

Verdict

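Going by the listed figures, GPT OSS 20B Heretic Ara v3 needs far less VRAM at Q8_0 (2.2 GB vs 21.9 GB), the only quantization for which both models have a figure, so it fits on much smaller GPUs. GPT OSS 20B is by far the more widely downloaded of the two (7.6M vs 1.1K downloads).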

Frequently Asked Questions

Which needs less VRAM, GPT OSS 20B or GPT OSS 20B Heretic Ara v3?

At Q8_0, GPT OSS 20B needs 21.9 GB and GPT OSS 20B Heretic Ara v3 needs 2.2 GB, so GPT OSS 20B Heretic Ara v3 is the lighter option to run locally.

Which has a longer context window, GPT OSS 20B or GPT OSS 20B Heretic Ara v3?

Neither. Both GPT OSS 20B and GPT OSS 20B Heretic Ara v3 support a context window of 131,072 tokens.

What is the difference between GPT OSS 20B and GPT OSS 20B Heretic Ara v3?

Both models belong to the GPT-OSS family. GPT OSS 20B is a 21.5B-parameter model from OpenAI, while GPT OSS 20B Heretic Ara v3 is listed as a 1.8B-parameter model from p-e-w. Compare their VRAM requirements above to see which fits your GPU or Mac.
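
To check which quantizations fit a particular GPU, the listed VRAM figures can be compared against the memory available. The following is a minimal sketch using the GPT OSS 20B figures from the table above. The helper `quantizations_that_fit` and its `headroom_gb` parameter are hypothetical names introduced here, and the listed figures may not cover extra memory needed for long contexts.

```python
# Listed VRAM requirements (GB) for GPT OSS 20B, copied from the table above.
GPT_OSS_20B_VRAM_GB = {
    "Q2_K": 9.5,
    "Q3_K_M": 10.9,
    "Q3_K_S": 9.8,
    "Q4_0": 11.1,
    "Q4_K_M": 13.3,
    "Q5_K_M": 15.7,
    "Q6_K": 18.1,
    "Q8_0": 21.9,
}

def quantizations_that_fit(vram_table: dict[str, float], gpu_vram_gb: float,
                           headroom_gb: float = 0.0) -> list[str]:
    """Return quantization names whose listed VRAM fits, smallest first.

    headroom_gb is a hypothetical safety margin reserved for the OS,
    display, or other processes; it is an assumption, not a listed value.
    """
    budget = gpu_vram_gb - headroom_gb
    by_size = sorted(vram_table.items(), key=lambda item: item[1])
    return [name for name, needed_gb in by_size if needed_gb <= budget]

if __name__ == "__main__":
    for gpu_gb in (8, 12, 16, 24):
        fits = quantizations_that_fit(GPT_OSS_20B_VRAM_GB, gpu_gb)
        print(f"{gpu_gb} GB GPU: {', '.join(fits) if fits else 'none listed fit'}")
```

Reserving some headroom, for example `headroom_gb=1.0`, gives a more conservative answer on machines where the GPU also drives a display.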