This is the overall "text chatbot" ranking. There are also per-category rankings; see https://arena.ai/leaderboard/

For me, the only thing GPT currently does well is deep research: it's as good as Grok and better than Claude and Gemini (though, funnily, arena disagrees with me).

But since Grok responses are much less annoying than GPT's emoji flood, I use the former.

![](https://m.stacker.news/129523)

What tasks are they being scored on? I've found ChatGPT to be a bit more useful for general-purpose queries, while Claude is better at technical stuff.

SimpleStacker



Scoresby

Here's the current Elo ranking:

![](https://m.stacker.news/129520)

GPT from ***2 versions ago*** is ranked 9th and beaten by every direct competitor. The new versions don't even show up in the top 20. `gpt-5.2-high` is ranked 21st, below open models like Kimi and GLM.