Knowledge
Humanity's Last Exam Leaderboard
Humanity's Last Exam (HLE) is a set of extremely difficult, expert-written questions across many fields, designed so that even frontier models score low. It is intended to stay hard as models improve, so that scores track the frontier of expert-level knowledge.
Source: epoch · 1 open model ranked + 45 proprietary · Data through Apr 2026
All models ranked on Humanity's Last Exam
Proprietary / closed models are shown dimmed — you can't run them locally, but they show where the open field stands.
| # | Model | Score |
|---|---|---|
| 1 | gemini-3.1-pro-preview · proprietary | 46.4% |
| 2 | gpt-5.4-pro-2026-03-05_unknown · proprietary | 44.3% |
| 3 | muse-spark · proprietary | 40.6% |
| 4 | gemini-3-pro-preview · proprietary | 37.5% |
| 5 | gpt-5.4-2026-03-05_xhigh · proprietary | 36.2% |
| 6 | claude-opus-4-7_unknown · proprietary | 36.2% |
| 7 | claude-opus-4-6_max · proprietary | 34.4% |
| 8 | gpt-5-pro-2025-10-06_unknown · proprietary | 31.6% |
| 9 | gpt-5.2-2025-12-11_unknown · proprietary | 27.8% |
| 10 | gpt-5-2025-08-07_high · proprietary | 25.3% |
| 11 | gpt-5-2025-08-07_unknown · proprietary | 25.3% |
| 12 | kimi-k2.5 · proprietary | 24.4% |
| 13 | gpt-5.1-2025-11-13_unknown · proprietary | 23.7% |
| 14 | gemini-2.5-pro-preview-06-05 · proprietary | 21.6% |
| 15 | o3-2025-04-16_high · proprietary | 20.3% |
| 16 | gpt-5-mini-2025-08-07_unknown · proprietary | 19.4% |
| 17 | o3-2025-04-16_medium · proprietary | 19.2% |
| 18 | claude-opus-4-6 · proprietary | 19.0% |
| 19 | gemini-2.5-pro-exp-03-25 · proprietary | 18.2% |
| 20 | o4-mini-2025-04-16_high · proprietary | 18.1% |
| 21 | gemini-2.5-pro-preview-05-06 · proprietary | 17.8% |
| 22 | o4-mini-2025-04-16_medium · proprietary | 14.3% |
| 23 | claude-opus-4-5-20251101_unknown · proprietary | 14.2% |
| 24 | gemini-2.5-flash-preview-04-17 · proprietary | 12.1% |
| 25 | gemini-2.5-flash-preview-05-20 · proprietary | 11.0% |
| 26 | gemini-3.1-flash-lite · proprietary | 8.6% |
| 27 | glm-4.5 · proprietary | 8.3% |
| 28 | GLM 4.5 Air · 110.5B | 8.1% |
| 29 | o1-pro-2025-03-19 · proprietary | 8.1% |
| 30 | claude-3-7-sonnet-20250219_unknown · proprietary | 8.0% |
| 31 | o1-2024-12-17_unknown · proprietary | 8.0% |
| 32 | claude-opus-4-1-20250805_unknown · proprietary | 7.9% |
| 33 | claude-sonnet-4-5-20250929_unknown · proprietary | 7.5% |
| 34 | gpt-5.1-2025-11-13_none · proprietary | 6.8% |
| 35 | claude-opus-4-20250514_unknown · proprietary | 6.7% |
| 36 | gemini-2.0-flash-thinking-exp-01-21 · proprietary | 6.6% |
| 37 | Llama-4-Maverick-17B-128E-Instruct · proprietary | 5.7% |
| 38 | claude-sonnet-4-20250514_unknown · proprietary | 5.5% |
| 39 | gpt-4.5-preview-2025-02-27 · proprietary | 5.4% |
| 40 | gpt-4.1-2025-04-14 · proprietary | 5.4% |
| 41 | gemini-1.5-pro-002 · proprietary | 4.6% |
| 42 | mistral-medium-2505 · proprietary | 4.5% |
| 43 | amazon.nova-pro-v1:0 · proprietary | 4.4% |
| 44 | claude-3-5-sonnet-20241022 · proprietary | 4.1% |
| 45 | amazon.nova-lite-v1:0 · proprietary | 3.6% |
| 46 | gpt-4o-2024-11-20 · proprietary | 2.7% |
Humanity's Last Exam: frequently asked questions
- What is the best open LLM on Humanity's Last Exam?
- GLM 4.5 Air is the top open model on Humanity's Last Exam, scoring 8.1%. Among all models tested — including proprietary ones — it ranks #28.
- Can open models match proprietary models on Humanity's Last Exam?
- Not yet. On Humanity's Last Exam the strongest proprietary model (gemini-3.1-pro-preview) scores 46.4%, which is 38.3 percentage points ahead of the best open model (GLM 4.5 Air) at 8.1%. The open model, however, is one you can run yourself.
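The figures in the answers above can be rechecked from the table with a few lines of Python. This is only a minimal sketch: the `ROWS` list is a hand-copied subset of the table, the `"open"` label is an assumption standing in for the parameter-count tag (`110.5B`) that the table uses for the one open model, and `summarize` is a hypothetical helper, not part of any llmrun or epoch tooling.

```python
# Hand-copied subset of the leaderboard: (rank, model, kind, score in %).
# "open" is an illustrative label; the table marks the open model by size instead.
ROWS = [
    (1, "gemini-3.1-pro-preview", "proprietary", 46.4),
    (27, "glm-4.5", "proprietary", 8.3),
    (28, "GLM 4.5 Air", "open", 8.1),
    (29, "o1-pro-2025-03-19", "proprietary", 8.1),
]


def summarize(rows):
    """Return (best open row, best proprietary row, gap in percentage points)."""
    open_rows = [r for r in rows if r[2] == "open"]
    closed_rows = [r for r in rows if r[2] == "proprietary"]
    best_open = max(open_rows, key=lambda r: r[3])
    best_closed = max(closed_rows, key=lambda r: r[3])
    # Round to one decimal to avoid floating-point noise in the difference.
    gap = round(best_closed[3] - best_open[3], 1)
    return best_open, best_closed, gap


best_open, best_closed, gap = summarize(ROWS)
print(f"Top open model: {best_open[1]} (rank #{best_open[0]}, {best_open[3]}%)")
print(f"Gap to {best_closed[1]}: {gap} percentage points")
```

Extending `ROWS` to the full table would not change the result, since the top proprietary and top open entries are already included.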
Scores are aggregated from epoch. llmrun does not run this benchmark itself; see the source for methodology, or the About benchmarks page for what it measures.