Phi 4 — Benchmarks
Benchmark scores for Phi 4, aggregated from public leaderboards, along with its rank among open models on each benchmark (for example, #2 / 36 means second of 36 ranked models). See hardware requirements for what you need to run it.
Coding
| Benchmark | Score | Rank |
|---|---|---|
| LiveBench Coding | 30.7 | #11 / 13 |
Knowledge
| Benchmark | Score | Rank |
|---|---|---|
| MMLU | 84.8% | #2 / 36 |
Math
| Benchmark | Score | Rank |
|---|---|---|
| AIME 2024/2025 | 13.8% | #11 / 22 |
| MATH Level 5 | 64.9% | #8 / 23 |
| LiveBench Math | 42.0 | #10 / 13 |
Reasoning
| Benchmark | Score | Rank |
|---|---|---|
| GPQA Diamond | 56.1% | #9 / 28 |
| LiveBench Reasoning | 47.8 | #8 / 13 |
Scores are aggregated from public benchmark sources (each linked from the benchmark pages). llmrun does not run these benchmarks itself.
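Because each benchmark ranks a different number of models, the raw positions in the Rank column are hard to compare across tables. The sketch below normalizes them, assuming each entry reads as "position / number of models ranked" and that there are no ties. The helper names `parse_rank` and `fraction_outranked` are hypothetical and are not part of any llmrun tooling.

```python
import re

# Rank strings copied from the tables above.
RANKS = {
    "LiveBench Coding": "#11 / 13",
    "MMLU": "#2 / 36",
    "AIME 2024/2025": "#11 / 22",
    "MATH Level 5": "#8 / 23",
    "LiveBench Math": "#10 / 13",
    "GPQA Diamond": "#9 / 28",
    "LiveBench Reasoning": "#8 / 13",
}


def parse_rank(text):
    """Split a string such as '#11 / 13' into (position, total)."""
    match = re.fullmatch(r"#(\d+)\s*/\s*(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized rank format: {text!r}")
    return int(match.group(1)), int(match.group(2))


def fraction_outranked(text):
    """Share of the *other* ranked models that place below this one.

    Assumption: ranks are strict (no ties), so 1.0 means first place
    and 0.0 means last place.
    """
    position, total = parse_rank(text)
    if total < 2:
        raise ValueError("need at least two ranked models to compare")
    return (total - position) / (total - 1)


if __name__ == "__main__":
    # Print benchmarks from strongest to weakest relative placement.
    ordered = sorted(RANKS, key=lambda name: fraction_outranked(RANKS[name]), reverse=True)
    for name in ordered:
        print(f"{name:<22} {RANKS[name]:>9}  {fraction_outranked(RANKS[name]):.0%}")
```

Read this way, the MMLU placement (2nd of 36) is the model's strongest relative result, and the LiveBench Coding placement (11th of 13) is its weakest.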