Humanity's Last Exam Leaderboard
Humanity's Last Exam (HLE) is a set of extremely difficult, expert-written questions spanning many fields, designed so that even frontier models score low. It is built to stay hard as models improve, so that scores keep tracking the frontier of model knowledge.
Source: epoch · 1 open model ranked + 45 proprietary · Data through Apr 2026
Open models ranked on Humanity's Last Exam
The # column shows rank among open models / rank overall (including proprietary).
| # | Model | Score |
|---|---|---|
| 1 / 28 | GLM 4.5 Air · 110.5B | 8.1% |
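For readers who want to work with the table programmatically, here is a minimal sketch of how one row could be split into its parts, including the "open rank / overall rank" pair in the first column. The names `Row` and `parse_row` are hypothetical and not part of any llmrun or epoch tooling. The sketch assumes every row follows the three-cell layout shown above.

```python
from typing import NamedTuple


class Row(NamedTuple):
    open_rank: int      # rank among open models
    overall_rank: int   # rank among all models, including proprietary
    model: str
    score_pct: float


def parse_row(line: str) -> Row:
    """Parse one leaderboard row (hypothetical helper, assumes 3 cells)."""
    # Drop the outer pipes, then split into the three cells.
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    rank_cell, model_cell, score_cell = cells
    # The rank cell reads "open / overall", e.g. "1 / 28".
    open_rank, overall_rank = (int(x) for x in rank_cell.split("/"))
    # Keep only the name; the text after "·" is the size label from the table.
    model = model_cell.split("·")[0].strip()
    score_pct = float(score_cell.rstrip("%"))
    return Row(open_rank, overall_rank, model, score_pct)


row = parse_row("| 1 / 28 | GLM 4.5 Air · 110.5B | 8.1% |")
print(row)
```

Keeping the two ranks as separate integers makes it easy to sort or filter by either one.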
Humanity's Last Exam: frequently asked questions
- What is the best open LLM on Humanity's Last Exam?
- GLM 4.5 Air is the top open model on Humanity's Last Exam, scoring 8.1%. Among all models tested — including proprietary ones — it ranks #28.
- Can open models match proprietary models on Humanity's Last Exam?
- Not yet. The strongest proprietary model (gemini-3.1-pro-preview) scores 46.4%, far ahead of the best open model (GLM 4.5 Air) at 8.1%, a gap of 38.3 percentage points. The open model, however, is one you can run yourself.
Scores are aggregated from epoch. llmrun does not run this benchmark; see the source for methodology, or the About benchmarks page for what it measures.