Phi 4 — Benchmarks

Benchmark scores for Phi 4, aggregated from public leaderboards, with its rank among open models on each benchmark. See hardware requirements for what you need to run it.

Coding

Benchmark            Score    Rank
LiveBench Coding     30.7     #11 / 13

Knowledge

Benchmark            Score    Rank
MMLU                 84.8%    #2 / 36

Math

Benchmark            Score    Rank
AIME 2024/2025       13.8%    #11 / 22
MATH Level 5         64.9%    #8 / 23
LiveBench Math       42.0     #10 / 13

Reasoning

Benchmark            Score    Rank
GPQA Diamond         56.1%    #9 / 28
LiveBench Reasoning  47.8     #8 / 13

Scores are aggregated from public benchmark sources (each linked from the benchmark pages). llmrun does not run these benchmarks.