
This is an admittedly boosterish take on the state of AI and where it is going...but I find I'm getting more sympathetic to these kinds of takes. Not that I'm some kind of advanced openclaw user with agents doing this or that for me. But I think I'm beginning to see what the AI people are talking about.

The whole post is kinda long and technical, but I was able to grasp a little of it, I think. My takeaway is that yes, these things are very useful. And they're going to get more useful.

Here are a few interesting points:

Based on my own usage patterns, it's beginning to dawn on me how much inference compute we will need in the coming years. I don't think people have begun to fathom how much we will need. Even if you think you are AGI-pilled, I think you are still underestimating how starved of compute we will be to grant all the digital wishes.
  • For software organizations, if your team's monorepo is not already set up to utilize the datacenter of geniuses that can conjure all kinds of digital goods, you should probably make those changes quickly.
  • For researchers: automated research is the new meta. People who can direct teams of agents at goals and know how to judge what to focus on in a full-stack scope will experience an exhilarating level of productivity that makes making software a joy again.
  • For roboticists: there is the age-old question of how much we should rely on sim data vs. real data. Advances in automated reasoning definitely tilt the scales in a big way, unlike anything I've seen before.
All players take note: consider playing differently.
"your team's monorepo is not already set up to utilize the datacenter of geniuses"

Contradictio in terminis.


Is this just because he's using "monorepo" wrong? (I understand the contradiction to be that it's necessarily not a monorepo if you're relying on agents?)

102 sats \ 3 replies \ @optimism 6h

It's because monorepos lead to regressions, so if you run on a monorepo, you're going to be a step behind your competitor that can actually manage software development like a pro, and you'll keep losing all the time.

Don't believe me on this. Just ask OpenAI (confirmed monorepo: #1040555) how successful they are at regression management (or just check their Elo on arena).

Therefore, if you run a monorepo, you're by definition not ready.


Here's the current Elo:

GPT from 2 versions ago is ranked 9th and beaten by every direct competitor. The new versions don't even show in the top 20. gpt-5.2-high is ranked 21st, below open models like Kimi and GLM.


What tasks are they being scored on? I've found ChatGPT to be a bit more useful for general-purpose queries, while Claude is better at technical stuff.


This is the overall "text chatbot" ranking. There are categories too, see https://arena.ai/leaderboard/

For me, the only thing that GPT currently does well is deep research, as good as Grok, better than Claude and Gemini (though funnily, arena disagrees with me.)

But since Grok responses are much less annoying than GPT's emoji flood, I use the former.
