Humanity's Last Exam Leaderboard

Humanity's Last Exam (HLE) is a set of extremely difficult, expert-written questions spanning many fields, designed so that even frontier models score low. It is meant to stay hard as models improve, so that it keeps measuring the frontier of model knowledge.

Source: epoch · 1 open model ranked · +45 proprietary · Data through Apr 2026

Open models ranked on Humanity's Last Exam

# shows rank among open models / rank overall (including proprietary).

#         Model                     Score
1 / 28    GLM 4.5 Air · 110.5B      8.1%
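
As a rough illustration of how a "rank among open models / rank overall" label could be derived, here is a minimal Python sketch. The `Entry` type and `dual_ranks` function are hypothetical names, not part of the leaderboard's own tooling, and ties are simply left in sort order. Only the two scores quoted on this page are used, so the overall rank it prints is 2 rather than the 28 shown above, which depends on the full set of 46 models.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    score: float   # percent of questions answered correctly
    is_open: bool  # True for open models, False for proprietary ones


def dual_ranks(entries):
    """Map each open model to an "open rank / overall rank" label."""
    # Highest score first; position in this order is the overall rank.
    ordered = sorted(entries, key=lambda e: e.score, reverse=True)
    labels = {}
    open_rank = 0
    for overall_rank, entry in enumerate(ordered, start=1):
        if entry.is_open:
            open_rank += 1  # count only open models for the first number
            labels[entry.name] = f"{open_rank} / {overall_rank}"
    return labels


# Only the two scores quoted on this page; the full leaderboard has more rows.
entries = [
    Entry("gemini-3.1-pro-preview", 46.4, False),
    Entry("GLM 4.5 Air", 8.1, True),
]
print(dual_ranks(entries))  # {'GLM 4.5 Air': '1 / 2'}
```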

Humanity's Last Exam: frequently asked questions

What is the best open LLM on Humanity's Last Exam?
GLM 4.5 Air is the top open model on Humanity's Last Exam, scoring 8.1%. Among all models tested, including proprietary ones, it ranks #28.

Can open models match proprietary models on Humanity's Last Exam?
Not on this benchmark: the strongest proprietary model (gemini-3.1-pro-preview) scores 46.4%, far ahead of the best open model (GLM 4.5 Air) at 8.1%. The open model's advantage is that you can run it yourself.
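
To put the two scores side by side, a small Python sketch of the arithmetic follows. The `score_gap` helper is a hypothetical name used only for illustration; the inputs are the two percentages quoted in the answer.

```python
def score_gap(best_proprietary: float, best_open: float):
    """Return (gap in percentage points, ratio of the two scores)."""
    gap_points = round(best_proprietary - best_open, 1)
    ratio = best_proprietary / best_open
    return gap_points, ratio


gap_points, ratio = score_gap(46.4, 8.1)
print(f"Gap: {gap_points} percentage points")  # Gap: 38.3 percentage points
print(f"Ratio: {ratio:.1f}x")                  # Ratio: 5.7x
```

The gap is stated in percentage points because both values are already percentages; the ratio is a second way to read the same difference.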

Scores are aggregated from epoch. llmrun does not run this benchmark itself; see the source for methodology, or the About benchmarks page for what it measures.