Humanity's Last Exam Leaderboard

Humanity's Last Exam (HLE) is a set of extremely difficult, expert-written questions across many fields, designed so that even frontier models score low. It is built to stay hard as models improve, so that it keeps measuring the frontier of model knowledge.

Source: epoch · 1 open model ranked + 45 proprietary · Data through Apr 2026

All models ranked on Humanity's Last Exam

Proprietary / closed models are shown dimmed — you can't run them locally, but they show where the open field stands.

# | Model | Score
1 | gemini-3.1-pro-preview · proprietary | 46.4%
2 | gpt-5.4-pro-2026-03-05_unknown · proprietary | 44.3%
3 | muse-spark · proprietary | 40.6%
4 | gemini-3-pro-preview · proprietary | 37.5%
5 | gpt-5.4-2026-03-05_xhigh · proprietary | 36.2%
6 | claude-opus-4-7_unknown · proprietary | 36.2%
7 | claude-opus-4-6_max · proprietary | 34.4%
8 | gpt-5-pro-2025-10-06_unknown · proprietary | 31.6%
9 | gpt-5.2-2025-12-11_unknown · proprietary | 27.8%
10 | gpt-5-2025-08-07_high · proprietary | 25.3%
11 | gpt-5-2025-08-07_unknown · proprietary | 25.3%
12 | kimi-k2.5 · proprietary | 24.4%
13 | gpt-5.1-2025-11-13_unknown · proprietary | 23.7%
14 | gemini-2.5-pro-preview-06-05 · proprietary | 21.6%
15 | o3-2025-04-16_high · proprietary | 20.3%
16 | gpt-5-mini-2025-08-07_unknown · proprietary | 19.4%
17 | o3-2025-04-16_medium · proprietary | 19.2%
18 | claude-opus-4-6 · proprietary | 19.0%
19 | gemini-2.5-pro-exp-03-25 · proprietary | 18.2%
20 | o4-mini-2025-04-16_high · proprietary | 18.1%
21 | gemini-2.5-pro-preview-05-06 · proprietary | 17.8%
22 | o4-mini-2025-04-16_medium · proprietary | 14.3%
23 | claude-opus-4-5-20251101_unknown · proprietary | 14.2%
24 | gemini-2.5-flash-preview-04-17 · proprietary | 12.1%
25 | gemini-2.5-flash-preview-05-20 · proprietary | 11.0%
26 | gemini-3.1-flash-lite · proprietary | 8.6%
27 | glm-4.5 · proprietary | 8.3%
28 | GLM 4.5 Air · 110.5B | 8.1%
29 | o1-pro-2025-03-19 · proprietary | 8.1%
30 | claude-3-7-sonnet-20250219_unknown · proprietary | 8.0%
31 | o1-2024-12-17_unknown · proprietary | 8.0%
32 | claude-opus-4-1-20250805_unknown · proprietary | 7.9%
33 | claude-sonnet-4-5-20250929_unknown · proprietary | 7.5%
34 | gpt-5.1-2025-11-13_none · proprietary | 6.8%
35 | claude-opus-4-20250514_unknown · proprietary | 6.7%
36 | gemini-2.0-flash-thinking-exp-01-21 · proprietary | 6.6%
37 | Llama-4-Maverick-17B-128E-Instruct · proprietary | 5.7%
38 | claude-sonnet-4-20250514_unknown · proprietary | 5.5%
39 | gpt-4.5-preview-2025-02-27 · proprietary | 5.4%
40 | gpt-4.1-2025-04-14 · proprietary | 5.4%
41 | gemini-1.5-pro-002 · proprietary | 4.6%
42 | mistral-medium-2505 · proprietary | 4.5%
43 | amazon.nova-pro-v1:0 · proprietary | 4.4%
44 | claude-3-5-sonnet-20241022 · proprietary | 4.1%
45 | amazon.nova-lite-v1:0 · proprietary | 3.6%
46 | gpt-4o-2024-11-20 · proprietary | 2.7%

Humanity's Last Exam: frequently asked questions

What is the best open LLM on Humanity's Last Exam?
GLM 4.5 Air is the top open model on Humanity's Last Exam, scoring 8.1%. Among all 46 models tested, proprietary ones included, it ranks #28.

Can open models match proprietary models on Humanity's Last Exam?
Not yet. The strongest proprietary model (gemini-3.1-pro-preview) scores 46.4%, well ahead of the best open model (GLM 4.5 Air) at 8.1%, a gap of 38.3 percentage points. The open model, however, is one you can run yourself.
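
For readers who want to reproduce the two FAQ answers from the table, here is a minimal Python sketch. It assumes a handful of rows copied by hand from the leaderboard above; the tuple layout and the `summarize` helper are illustrative names, not part of any llmrun or epoch tooling.

```python
# A few rows copied from the leaderboard: (rank, model, is_open, score in %).
# The tuple layout is an assumption made for this example only.
rows = [
    (1, "gemini-3.1-pro-preview", False, 46.4),
    (27, "glm-4.5", False, 8.3),
    (28, "GLM 4.5 Air", True, 8.1),
    (29, "o1-pro-2025-03-19", False, 8.1),
]


def summarize(rows):
    """Return the best open row, the best proprietary row, and the score gap in points."""
    best_open = max((r for r in rows if r[2]), key=lambda r: r[3])
    best_closed = max((r for r in rows if not r[2]), key=lambda r: r[3])
    # Round to one decimal place to match the precision of the published scores.
    gap = round(best_closed[3] - best_open[3], 1)
    return best_open, best_closed, gap


best_open, best_closed, gap = summarize(rows)
print(f"Top open model: {best_open[1]} (rank #{best_open[0]}, {best_open[3]}%)")
print(f"Top proprietary model: {best_closed[1]} ({best_closed[3]}%)")
print(f"Gap: {gap} percentage points")
```

The gap is expressed in percentage points (a difference of two scores), not as a relative percentage.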

Scores aggregated from epoch. llmrun does not run this benchmark; see the source for methodology, or the About benchmarks page for what it measures.