What is the best open LLM on FrontierMath?

Kimi K2.6 is the top open model on FrontierMath, scoring 39.0%. Among all models tested — including proprietary ones — it ranks #14. The top model overall is GPT 5.5 Pro Pre Release (high) (OpenAI) at 52.4%.

Can open models match proprietary models on FrontierMath?

Not quite: the strongest proprietary model (GPT 5.5 Pro Pre Release (high)) scores 52.4%, which is 13.4 percentage points ahead of the best open model (Kimi K2.6) at 39.0%. The open model, however, is one you can run yourself.

Math

FrontierMath Leaderboard

Name: FrontierMath — open LLM scores
Creator: epoch

FrontierMath is a benchmark of exceptionally hard, original research-level mathematics problems created with professional mathematicians. Even the strongest model listed here solves only about half of them (52.4%), making it a frontier measure of genuine mathematical ability.

Source: epoch · 12 open models ranked + 89 proprietary · Data through May 2026

Open models ranked on FrontierMath

# shows rank among open models / rank overall (including proprietary).

#	Model · parameters	Score
1 / 15	Kimi K2.6 · 1058.6B	39.0%
2 / 21	GLM 5.1 · 753.9B	33.5%
3 / 27	Kimi K2.5 · 1058.6B	27.9%
4 / 36	DeepSeek V3.2 · 685.4B	22.1%
5 / 37	Kimi K2 Thinking · 1058.1B	21.4%
6 / 48	GLM 5 · 753.9B	16.4%
7 / 59	Qwen3 235B A22B Thinking 2507 · 235.1B	8.5%
8 / 76	GLM 4.6 · 356.8B	3.8%
9 / 82	GLM 4.7 · 358.3B	2.4%
10 / 85	DeepSeek v3 · 684.5B	1.7%
11 / 94	Llama 4 Maverick 17B 128E Instruct · 401.6B	0.7%
12 / 101	Llama 4 Scout 17B 16E Instruct · 108.6B	0.0%

Score vs model size

Which models give the highest score for their size, and are therefore the ones most worth running locally.

Each dot is a model. Up = higher score, left = smaller (easier to run locally). The dashed line marks the efficiency frontier: the best score you can get at each size or smaller.
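
The efficiency frontier described here can be approximated from the table of open models. The sketch below is an illustration only: the page does not say how its chart is computed, so the function name `efficiency_frontier` and the rule used (walk the models from smallest to largest and keep each one that beats every smaller model) are assumptions. Sizes and scores are copied from the table, with sizes in billions of parameters.

```python
# (model name, size in billions of parameters, FrontierMath score in %)
# Values copied from the open-model table; nothing here is newly measured.
MODELS = [
    ("Kimi K2.6", 1058.6, 39.0),
    ("GLM 5.1", 753.9, 33.5),
    ("Kimi K2.5", 1058.6, 27.9),
    ("DeepSeek V3.2", 685.4, 22.1),
    ("Kimi K2 Thinking", 1058.1, 21.4),
    ("GLM 5", 753.9, 16.4),
    ("Qwen3 235B A22B Thinking 2507", 235.1, 8.5),
    ("GLM 4.6", 356.8, 3.8),
    ("GLM 4.7", 358.3, 2.4),
    ("DeepSeek v3", 684.5, 1.7),
    ("Llama 4 Maverick 17B 128E Instruct", 401.6, 0.7),
    ("Llama 4 Scout 17B 16E Instruct", 108.6, 0.0),
]


def efficiency_frontier(models):
    """Return the models that beat every model of equal or smaller size.

    Hypothetical helper: an assumed reading of "the best score you can
    get at each size or smaller", not the site's own implementation.
    """
    frontier = []
    best_so_far = float("-inf")
    # Sort by size ascending; for equal sizes, put the higher score first.
    for name, size, score in sorted(models, key=lambda m: (m[1], -m[2])):
        if score > best_so_far:
            frontier.append((name, size, score))
            best_so_far = score
    return frontier


if __name__ == "__main__":
    for name, size, score in efficiency_frontier(MODELS):
        print(f"{size:>7.1f}B  {score:>5.1f}%  {name}")
```

Under this rule the frontier runs through Llama 4 Scout, Qwen3 235B A22B Thinking 2507, DeepSeek V3.2, GLM 5.1, and Kimi K2.6. The smallest model is always included, even at 0.0%, because nothing smaller exists to beat it.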

Scores aggregated from epoch. llmrun does not run this benchmark; see the source for methodology, or the About benchmarks page for what it measures.