Llama 3.1 Tulu 3 70B DPO — Benchmarks

Benchmark scores for Llama 3.1 Tulu 3 70B DPO, aggregated from public leaderboards, with its rank among open models and among all models. See hardware requirements for what you need to run it.

Math

Benchmark	Score	Rank (open models)	Rank (all models)
AIME 2024/2025	4.4%	#25 / 34	#136 / 155
MATH Level 5	42.7%	#15 / 32	#70 / 108

Reasoning

Benchmark	Score	Rank (open models)	Rank (all models)
GPQA Diamond	46.3%	#25 / 46	#134 / 182
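
The rank columns in the tables above use different pool sizes (for example 34 open models for AIME but 46 for GPQA Diamond), so raw ranks are hard to compare across benchmarks. One way to put them on a common scale is to divide the rank by the pool size. The sketch below is illustrative only: the helper names `parse_rank` and `top_fraction` are hypothetical and not part of any llmrun tooling, and it assumes rank strings follow the `#rank / total` pattern shown in the tables.

```python
def parse_rank(rank_text):
    """Split a rank string such as '#25 / 34' into (rank, total) integers."""
    rank_part, total_part = rank_text.lstrip("#").split("/")
    return int(rank_part), int(total_part)


def top_fraction(rank_text):
    """Return rank / total: the fraction of the pool at or above this rank.

    Smaller is better; a value of 0.5 means the model sits at the midpoint.
    """
    rank, total = parse_rank(rank_text)
    return rank / total


# Open-model ranks copied from the tables above.
open_ranks = {
    "AIME 2024/2025": "#25 / 34",
    "MATH Level 5": "#15 / 32",
    "GPQA Diamond": "#25 / 46",
}

for benchmark, rank_text in open_ranks.items():
    print(f"{benchmark}: top {top_fraction(rank_text):.0%} of open models")
```

On this scale a lower value means a stronger relative placement, so the rows can be compared directly even though each benchmark ranks a different number of models.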

Scores aggregated from public benchmark sources (each linked from the benchmark pages). llmrun does not run these benchmarks.