Gemini 3 Pro scored 1501 Elo on the LMArena leaderboard, topping virtually every other LLM, including Claude, ChatGPT, and Grok. On the GPQA Diamond benchmark, which tests PhD-level scientific reasoning, it achieved 91.9%—better than Claude Sonnet 4.5 and ChatGPT 5.1. The model also scored 37.5% on Humanity’s Last Exam without tools, surpassing GPT-5 Pro’s previous high of 31.64%. In math, Gemini 3 set a new standard with 23.4% on MathArena Apex.
10 sats \ 0 replies \ @optimism 20h
The Elo on LMArena overall is down to 1498 right now, but it's still beating Grok 4.1 (by a hair or two). It definitely beats gpt-5.1 and also Claude 4.5 on coding, according to users.
What worries me is that open models aren't competitive right now; Kimi/GLM/Qwen are barely in the top 20.