The overall Elo on LMArena is down to 1498 right now, but it's still beating Grok 4.1 (by a hair or two). It definitely beats GPT-5.1, and also Claude 4.5 on coding, according to users.
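
For a sense of how small a gap like this is, here is a rough sketch that converts a rating difference into an expected head-to-head score. It assumes the textbook Elo logistic formula (base 10, scale 400); LMArena's actual rating computation may differ, and the function name `elo_expected_score` is made up for illustration. The two inputs are the 1501 launch figure and the 1498 current figure quoted in this thread.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the textbook Elo formula.

    Assumption: base-10 logistic with a 400-point scale. A tie counts as
    half a win, so 0.5 means the two sides are evenly matched.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Launch score (1501) vs. current score (1498), both taken from this thread.
p = elo_expected_score(1501, 1498)
print(f"{p:.3f}")  # prints 0.504
```

Under that assumption, a 3-point difference works out to roughly a 50.4% expected score, which is close to a coin flip.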

What worries me is that open models aren't competitive right now: Kimi, GLM, and Qwen are barely in the top 20.


optimism

Gemini 3 Pro scored 1501 Elo on the LMArena leaderboard, topping virtually every other LLM, including Claude, ChatGPT, and Grok.

On the GPQA Diamond benchmark, which tests PhD-level scientific reasoning, it achieved 91.9%—better than Claude Sonnet 4.5 and ChatGPT 5.1.

The model also scored 37.5% on Humanity’s Last Exam without tools, surpassing GPT-5 Pro’s previous high of 31.64%.

In math, Gemini 3 set a new standard with 23.4% on MathArena Apex.
