Age of LLM — Benchmark

v0.11.0 ▶ Presentation
1v1 strategic benchmark — two LLMs face off. Win by nuclear bomb or military conquest. Ranking by points (3/1/0).
⭐ Give a star on GitHub 𝕏 Follow on X — know when new models are tested
Model ranking
Ranked by points/match · ties by win rate
# Model Pts/match Pts Win rate W L D Total ☢ Nuc. ⚔ Mil. 📜 Dip. ☢☢ MD ⏱ avg/turn ▦ tok/turn 💲 $/match ⚠ % illegal
Recent matches