FrontierMath Leaderboard

FrontierMath is a benchmark of exceptionally hard, original, research-level mathematics problems written with professional mathematicians. Even the strongest models listed here solve only about half of the problems, and most solve far fewer, which makes it a demanding measure of mathematical ability at the frontier.

Source: epoch · 3 open models ranked + 97 proprietary · Data through May 2026

All models ranked on FrontierMath

Proprietary / closed models are shown dimmed — you can't run them locally, but they show where the open field stands.

# | Model | Score
1 | gpt-5.5-pro-pre-release_high · proprietary | 52.4%
2 | gpt-5.5-pre-release_xhigh · proprietary | 51.7%
3 | gpt-5.5-pro-pre-release_xhigh · proprietary | 51.0%
4 | gpt-5.4-pro-2026-03-05_xhigh · proprietary | 50.0%
5 | gpt-5.4-2026-03-05_xhigh · proprietary | 47.6%
6 | claude-opus-4-7_xhigh · proprietary | 43.8%
7 | claude-opus-4-6_max · proprietary | 40.7%
8 | gpt-5.2-2025-12-11_xhigh · proprietary | 40.7%
9 | gpt-5.2-2025-12-11_high · proprietary | 40.3%
10 | claude-opus-4-6_32K · proprietary | 40.0%
11 | claude-opus-4-6_64K · proprietary | 39.7%
12 | muse-spark · proprietary | 39.0%
13 | gemini-3.5-flash_high · proprietary | 39.0%
14 | kimi-k2.6 · proprietary | 39.0%
15 | claude-opus-4-6 · proprietary | 38.3%
16 | gemini-3-pro-preview · proprietary | 37.6%
17 | gemini-3.1-pro-preview · proprietary | 36.9%
18 | gpt-5.2-2025-12-11_medium · proprietary | 36.9%
19 | gemini-3-flash-preview · proprietary | 35.6%
20 | GLM 5.1 · 753.9B | 33.5%
21 | gpt-5-2025-08-07_high · proprietary | 32.4%
22 | claude-sonnet-4-6_16K · proprietary | 32.4%
23 | gpt-5.1-2025-11-13_high · proprietary | 31.0%
24 | gemini-2.5-deep-think-2025-08-01-webapp · proprietary | 29.0%
25 | gpt-5.4-mini-2026-03-17_high · proprietary | 28.3%
26 | fireworks/kimi-k2p5 · proprietary | 27.9%
27 | gpt-5-2025-08-07_medium · proprietary | 27.2%
28 | gpt-5-mini-2025-08-07_high · proprietary | 27.2%
29 | gpt-5.1-2025-11-13_medium · proprietary | 26.9%
30 | gpt-5.2-2025-12-11_low · proprietary | 26.6%
31 | qwen3.6-plus · proprietary | 26.2%
32 | gpt-5.4-nano-2026-03-17_high · proprietary | 25.9%
33 | o4-mini-2025-04-16_high · proprietary | 24.8%
34 | qwen3.6-max-preview · proprietary | 23.1%
35 | fireworks/deepseek-v3p2 · proprietary | 22.1%
36 | moonshotai/Kimi-K2-Thinking · proprietary | 21.4%
37 | qwen3.5-plus · proprietary | 21.0%
38 | claude-opus-4-5-20251101 · proprietary | 20.7%
39 | claude-opus-4-5-20251101_32K · proprietary | 20.7%
40 | claude-opus-4-5-20251101_16K · proprietary | 20.3%
41 | gpt-5-mini-2025-08-07_medium · proprietary | 20.3%
42 | grok-4-0709 · proprietary | 19.7%
43 | o4-mini-2025-04-16_medium · proprietary | 19.0%
44 | o3-2025-04-16_high · proprietary | 18.7%
45 | gpt-5.1-2025-11-13_low · proprietary | 17.3%
46 | o3-2025-04-16_medium · proprietary | 16.9%
47 | GLM 5 · 753.9B | 16.4%
48 | claude-sonnet-4-5-20250929_32K · proprietary | 15.2%
49 | gemini-2.5-pro · proprietary | 14.1%
50 | claude-sonnet-4-5-20250929_59K · proprietary | 13.5%
51 | o3-mini-2025-01-31_high · proprietary | 12.4%
52 | o4-mini-2025-04-16_low · proprietary | 10.7%
53 | gemini-2.5-pro-preview-06-05 · proprietary | 10.3%
54 | qwen3.6-flash · proprietary | 10.3%
55 | o3-2025-04-16_low · proprietary | 9.7%
56 | claude-sonnet-4-5-20250929 · proprietary | 9.3%
57 | o1-2024-12-17_high · proprietary | 9.3%
58 | Qwen/Qwen3-235B-A22B-Thinking-2507 · proprietary | 8.5%
59 | gpt-5-nano-2025-08-07_high · proprietary | 8.3%
60 | o3-mini-2025-01-31_medium · proprietary | 8.1%
61 | claude-opus-4-1-20250805_27K · proprietary | 7.2%
62 | gpt-5-nano-2025-08-07_medium · proprietary | 7.2%
63 | qwen3.5-flash · proprietary | 6.2%
64 | claude-haiku-4-5-20251001_32K · proprietary | 5.9%
65 | claude-opus-4-1-20250805 · proprietary | 5.9%
66 | grok-3-mini-beta_high · proprietary | 5.9%
67 | gpt-4.1-2025-04-14 · proprietary | 5.5%
68 | gemini-2.5-flash · proprietary | 4.8%
69 | claude-opus-4-20250514 · proprietary | 4.5%
70 | gpt-4.1-mini-2025-04-14 · proprietary | 4.5%
71 | claude-3-7-sonnet-20250219_16K · proprietary | 4.1%
72 | claude-haiku-4-5-20251001 · proprietary | 4.1%
73 | claude-opus-4-20250514_27K · proprietary | 4.1%
74 | claude-sonnet-4-20250514 · proprietary | 4.1%
75 | zai-org/GLM-4.6 · proprietary | 3.8%
76 | grok-3-beta · proprietary | 3.8%
77 | claude-3-7-sonnet-20250219_32K · proprietary | 3.5%
78 | claude-3-7-sonnet-20250219 · proprietary | 3.1%
79 | claude-3-7-sonnet-20250219_64K · proprietary | 3.1%
80 | grok-3-mini-beta_low · proprietary | 2.8%
81 | zai-org/GLM-4.7 · proprietary | 2.4%
82 | gpt-5.1-2025-11-13_none · proprietary | 2.1%
83 | claude-3-5-sonnet-20241022 · proprietary | 2.1%
84 | DeepSeek-V3 · proprietary | 1.7%
85 | gemini-2.0-flash-001 · proprietary | 1.7%
86 | o1-mini-2024-09-12_medium · proprietary | 1.7%
87 | qwen-plus-2025-04-28 · proprietary | 1.7%
88 | o1-mini-2024-09-12_high · proprietary | 1.4%
89 | claude-3-5-sonnet-20240620 · proprietary | 1.0%
90 | gpt-4.1-nano-2025-04-14 · proprietary | 1.0%
91 | qwen-max-2025-01-25 · proprietary | 1.0%
92 | grok-2-1212 · proprietary | 0.7%
93 | Llama-4-Maverick-17B-128E-Instruct-FP8 · proprietary | 0.7%
94 | mistral-medium-2505 · proprietary | 0.4%
95 | claude-3-5-haiku-20241022 · proprietary | 0.3%
96 | gpt-4o-2024-08-06 · proprietary | 0.3%
97 | gpt-4o-2024-11-20 · proprietary | 0.3%
98 | mistral-large-2411 · proprietary | 0.3%
99 | gemini-1.5-flash-002 · proprietary | 0.0%
100 | Llama 4 Scout 17B 16E Instruct · 108.6B | 0.0%

FrontierMath: frequently asked questions

What is the best open LLM on FrontierMath?
GLM 5.1 is the top open model on FrontierMath, scoring 33.5%. Among all models tested — including proprietary ones — it ranks #20.
Can open models match proprietary models on FrontierMath?
Not yet: the strongest proprietary model (gpt-5.5-pro-pre-release_high) scores 52.4%, which is 18.9 percentage points ahead of the best open model (GLM 5.1) at 33.5%. The open model, however, is one you can run yourself.
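
The comparison in these answers can be reproduced with a few lines of Python. This is only a minimal sketch: the `Row` type, the `best` helper, and the hand-copied subset of four leaderboard rows are illustrative assumptions, not part of llmrun or of the source's tooling.

```python
from typing import NamedTuple


class Row(NamedTuple):
    rank: int
    model: str
    is_open: bool   # True for models listed with a parameter count
    score: float    # percent of FrontierMath problems solved


# Hand-copied subset of the leaderboard: the top model and the three open models.
ROWS = [
    Row(1, "gpt-5.5-pro-pre-release_high", False, 52.4),
    Row(20, "GLM 5.1", True, 33.5),
    Row(47, "GLM 5", True, 16.4),
    Row(100, "Llama 4 Scout 17B 16E Instruct", True, 0.0),
]


def best(rows, open_only=False):
    """Return the highest-scoring row, optionally restricted to open models."""
    candidates = [r for r in rows if r.is_open or not open_only]
    return max(candidates, key=lambda r: r.score)


top_overall = best(ROWS)
top_open = best(ROWS, open_only=True)

# Gap in percentage points, rounded to hide floating-point noise.
gap = round(top_overall.score - top_open.score, 1)

print(f"{top_open.model}: rank #{top_open.rank}, "
      f"{gap} points behind {top_overall.model}")
```

The gap is reported in percentage points (a plain difference of scores) rather than as a ratio, since that matches how the leaderboard presents results.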

Scores aggregated from epoch. llmrun does not run this benchmark itself; see the source for methodology, or the About Benchmarks page for what it measures.