Phi 4 — Benchmarks

Benchmark scores for Phi 4, aggregated from public leaderboards, along with its rank among open models. See hardware requirements for what you need to run it.

Overall rank: #30 of 73 open models. Composite score: 51.4/100 across 5 benchmarks in 3 categories (see methodology).

Knowledge

Benchmark	Score	Rank (open models)	Rank (all models)
MMLU-Pro	70.4%	#35 / 119	#99 / 259
MMLU	84.8%	#5 / 76	#11 / 136

Math

Benchmark	Score	Rank (open models)	Rank (all models)
AIME 2024/2025	13.8%	#19 / 34	#114 / 155
MATH Level 5	64.9%	#9 / 32	#49 / 108

Reasoning

Benchmark	Score	Rank (open models)	Rank (all models)
GPQA Diamond	56.1%	#17 / 46	#111 / 182

Scores are aggregated from public benchmark sources, each linked from its benchmark page. llmrun does not run these benchmarks itself.