Here's the current Elo:

The GPT from 2 versions ago is ranked 9th and beaten by every direct competitor. The new versions don't even show up in the top 20: gpt-5.2-high is ranked 21st, below open models like Kimi and GLM.

What tasks are they being scored on? I've found ChatGPT to be a bit more useful for general-purpose queries, while Claude is better at technical stuff.

This is the overall "text chatbot" ranking. There are categories too, see https://arena.ai/leaderboard/

For me, the only thing GPT currently does well is deep research: as good as Grok, and better than Claude and Gemini (though, funnily, Arena disagrees with me).

But since Grok responses are much less annoying than GPT's emoji flood, I use the former.
