Phi 2 — Benchmarks

Benchmark scores for Phi 2, aggregated from public leaderboards, with its rank among open models and among all models. See hardware requirements for what you need to run it.

Knowledge

Benchmark	Score	Rank (open models)	Rank (all models)
HellaSwag	53.6%	#38 / 42	#68 / 76
MMLU	58.4%	#46 / 76	#95 / 136

Reasoning

Benchmark	Score	Rank (open models)	Rank (all models)
BIG-Bench Hard	59.4%	#10 / 37	#15 / 50

Scores are aggregated from public benchmark sources, each linked from the corresponding benchmark page. llmrun does not run these benchmarks itself.
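
The rank cells in the tables read as position / number of models ranked, for example #10 / 37. As a rough illustration, the Python sketch below turns such a cell into the share of ranked models that sit below Phi 2. It assumes that #1 is the best rank and that the second number is the count of models scored on that benchmark; the page does not state either. The names parse_rank and share_ranked_below are hypothetical and not part of any llmrun tooling.

```python
import re


def parse_rank(cell: str) -> tuple[int, int]:
    """Parse a rank cell such as '#38 / 42' into (rank, total)."""
    match = re.fullmatch(r"#(\d+)\s*/\s*(\d+)", cell.strip())
    if match is None:
        raise ValueError(f"unrecognized rank cell: {cell!r}")
    rank, total = int(match.group(1)), int(match.group(2))
    if not 1 <= rank <= total:
        raise ValueError(f"rank out of range: {cell!r}")
    return rank, total


def share_ranked_below(cell: str) -> float:
    """Fraction of ranked models placed below this one.

    Assumption: rank 1 is the best score, so (total - rank) models rank lower.
    """
    rank, total = parse_rank(cell)
    return (total - rank) / total


if __name__ == "__main__":
    # Open-model rank cells copied from the tables above.
    for name, cell in [
        ("HellaSwag", "#38 / 42"),
        ("MMLU", "#46 / 76"),
        ("BIG-Bench Hard", "#10 / 37"),
    ]:
        print(f"{name}: {share_ranked_below(cell):.1%} of open models rank below")
```

Under those assumptions, #10 / 37 on BIG-Bench Hard means 27 of the 37 ranked open models sit below Phi 2, while #38 / 42 on HellaSwag means only 4 of 42 do.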