Llama 2 70B Chat HF — Benchmarks

Benchmark scores for Llama 2 70B Chat HF, aggregated from public leaderboards, with its rank among open models and among all models. See hardware requirements for what you need to run it.

Overall rank: #57 of 73 open models · composite 32.4/100 across 6 benchmarks in 3 categories · methodology

Knowledge

Benchmark	Score	Open rank	All models
MMLU	59.9%	#43 / 76	#90 / 136

Math

Benchmark	Score	Open rank	All models
AIME 2024/2025	0.0%	#34 / 34	#155 / 155
MATH Level 5	3.3%	#32 / 32	#108 / 108
GSM8K	58.7%	#25 / 59	#34 / 93

Reasoning

Benchmark	Score	Open rank	All models
GPQA Diamond	26.3%	#41 / 46	#175 / 182
BIG-Bench Hard	58.5%	#11 / 37	#17 / 50

Scores aggregated from public benchmark sources (each linked from the benchmark pages). llmrun does not run these benchmarks.
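
The rank columns are measured against a different number of models on each benchmark, so they are easier to compare as fractions. The following is a minimal Python sketch that parses rank strings such as "#43 / 76" and reports the share of ranked models that the model places ahead of. The helper names `parse_rank` and `share_outranked` are hypothetical, the calculation assumes no ties, and it is not the composite methodology used for the overall score.

```python
import re

# Rows copied from the tables above:
# (benchmark, score in %, open rank, all-models rank)
ROWS = [
    ("MMLU", 59.9, "#43 / 76", "#90 / 136"),
    ("AIME 2024/2025", 0.0, "#34 / 34", "#155 / 155"),
    ("MATH Level 5", 3.3, "#32 / 32", "#108 / 108"),
    ("GSM8K", 58.7, "#25 / 59", "#34 / 93"),
    ("GPQA Diamond", 26.3, "#41 / 46", "#175 / 182"),
    ("BIG-Bench Hard", 58.5, "#11 / 37", "#17 / 50"),
]


def parse_rank(text):
    """Turn a string like '#43 / 76' into the pair (rank, total)."""
    match = re.fullmatch(r"#(\d+)\s*/\s*(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized rank string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def share_outranked(text):
    """Fraction of ranked models placed below this one (assumes no ties)."""
    rank, total = parse_rank(text)
    return (total - rank) / total


if __name__ == "__main__":
    for name, score, open_rank, all_rank in ROWS:
        print(
            f"{name:16s} {score:5.1f}%  "
            f"open: {share_outranked(open_rank):.0%}  "
            f"all: {share_outranked(all_rank):.0%}"
        )
```

Read this way, a last-place rank such as #34 / 34 maps to 0, and a first-place rank would map to just under 1.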