What tasks are they being scored on? I've found ChatGPT to be a bit more useful for general-purpose queries, while Claude is better at technical stuff.
reply
This is the overall "text chatbot" ranking. There are categories too, see https://arena.ai/leaderboard/
For me, the only thing that GPT currently does well is deep research, as good as Grok, better than Claude and Gemini (though funnily, arena disagrees with me.)
But since Grok responses are much less annoying than GPT's emoji flood, I use the former.
reply
Here's the current ELO:
GPT from 2 versions ago is ranked 9th and beaten by every direct competitor. The new versions don't even show in the top 20.
gpt-5.2-high is ranked at 21, below open models like Kimi and GLM.