
“Throughout my time here, I’ve repeatedly seen how hard it is to truly let our values govern our actions.
“I’ve seen this within myself, within the organization, where we constantly face pressures to set aside what matters most, and throughout broader society, too.”
Humanity is approaching a threshold where “our wisdom must grow in equal measure to our capacity to affect the world, lest we face the consequences.”
224 sats \ 4 replies \ @optimism 3h

I fear that the report about the xAI pressure cooker (#1438462) and, from a few months back, the report about the gpt-codex pressure cooker (#1040555) illustrate what every frontier model company is doing right now.

They have weeks to ship because of the competition. Remember, grok-4.1 was king of the world for exactly one day; then Google launched Gemini-3 and Grok's moment in the spotlight was over. It's all going fast. Too fast, I think, and not because I'm worried about sAfEtY, but because I'm worried about what this does to quality.

Delivering broken products may have been the norm in Silicon Valley for a while now, but that's also why there was room for multiple 100x outcomes. If you onboard your entire future customer base on version 0.1, that's not going to happen. I believe we should read GPT 5.3 as version 0.5.3, Grok 4.1 as 0.4.1, and Gemini 3 as 0.3: these aren't mature products in any way, they're not even "alpha" worthy. It's all experimental, and they regress across minor versions like crazy.

AI companies need to chill a little, but consumers, Wall Street, and all the hype-pushers need to chill a little too. Maybe #1440600 is a good perspective to at least keep in mind. To illustrate:

Everyone catches up, everyone shifts resources, everyone ships the update. The net competitive position of every company is exactly where it was before, except they all [spent] engineering time on esoterica and none of them made their end-product meaningfully better for end-users.

It's another symptom of the overly financialized fiat economy: #1440612

I'm not even convinced they actually have good metrics for what constitutes a model improvement. But they are forced by this prisoner's dilemma to just target whatever metric the capital allocators use.

15 sats \ 2 replies \ @optimism 2h

They don't have good metrics for a model improvement because they just train towards improving on the benchmarks - it's what you yourself often bring up: Goodhart's Law.

That's why GPT-5.2 is sublime in the benchmarks but sucks in human Elo rankings. I was looking out for it to drop on arena for days after launch, only to realize it was ranked #25 and I just hadn't scrolled down far enough. The same happened to Llama 4, which was (allegedly) even doctored for the bench. Meta is the VW of software now.
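The Goodhart dynamic can be sketched in a few lines of Python. This is purely illustrative: the weights and "models" are made up, and the only point is that when a benchmark can be gamed, the leaderboard winner is usually not the model users would actually prefer.

```python
import random

random.seed(0)

# Toy Goodhart's Law demo (hypothetical numbers, for illustration only).
# Each "model" has a true quality in [0, 1]; its benchmark score mixes
# true quality with effort spent training to the test.
models = []
for _ in range(100):
    true_quality = random.random()
    teaching_to_test = random.random()  # effort spent gaming the bench
    benchmark_score = 0.3 * true_quality + 0.7 * teaching_to_test
    models.append((benchmark_score, true_quality))

best_on_bench = max(models)                     # what the leaderboard rewards
best_overall = max(models, key=lambda m: m[1])  # what users actually want

# Once the proxy is the target, selecting on it rarely selects
# the truly best model.
```

With the gaming term dominating the score, the model that tops the benchmark and the model with the highest true quality are almost never the same one.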

So yes, it's all about the money, not so much about delivering world-class autocorrect.


I'm sure better metrics exist. They should have enough users to A/B test user satisfaction and feedback, for example. But the metrics don't matter if that's not what the investors are looking at. Which brings me to the whole financialization thing: the real audience is the investor, not the user.
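A minimal sketch of such an A/B test, using a standard two-proportion z-test on hypothetical satisfaction counts (all numbers invented; real platforms would use richer signals than a binary thumbs-up rate):

```python
from math import sqrt, erf

def ab_satisfaction_test(sat_a, n_a, sat_b, n_b):
    """Two-proportion z-test: did variant B's satisfaction rate
    differ from variant A's? Returns (z, two-sided p-value)."""
    p_a, p_b = sat_a / n_a, sat_b / n_b
    p_pool = (sat_a + sat_b) / (n_a + n_b)  # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical numbers: 5,200/10,000 users satisfied on the old model,
# 5,450/10,000 on the new one.
z, p = ab_satisfaction_test(5200, 10_000, 5450, 10_000)
```

At these user counts, even a 2.5-point shift in satisfaction is easily detectable, which is the point: the labs have the traffic to measure this if they wanted to.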

82 sats \ 0 replies \ @optimism 2h
I'm sure better metrics exist.

For us, yes. Things like arena.ai (I actively use that for 1-shot) let you not only see the leaderboard but judge for yourself, even though its Elo biases towards normie prompts. I wish they had prompter tiers (or maybe they do and I'm in retard tier, haha).

They should have enough users to A/B test user satisfaction

Too paranoid! The competition is on their platforms: #1064935

financialization, the real audience is the investor

Not contesting that. Aren't we just seeing the same old playbook, except with four contenders still standing and at 10x speed? The only crack I've seen is Sequoia jumping on the Anthropic bandwagon (#1415041), which I think is the worst signal Sam Altman has gotten in his entire life, because they never jumped before.

216 sats \ 1 reply \ @grayruby 3h

Uh oh. I hope he was being hyperbolic.


No kidding. If there's one thing I'm confident in it's that our wisdom won't be growing very much.

110 sats \ 1 reply \ @optimism 4h

Also see: #1436213


And this: #1436222

15 sats \ 0 replies \ @Solomonsatoshi 3h -118 sats

“Throughout my time here, I’ve repeatedly seen how hard it is to truly let our values govern our actions."

Like not attaching LN wallets to show our support for the LN while constantly virtue signalling about 'living on The Bitcoin Standard'?

"Humanity is approaching a threshold where 'our wisdom must grow in equal measure to our capacity to affect the world, lest we face the consequences.'"

I still struggle to see the threat AI can present, but then I don't have the knowledge this guy and others raising similar concerns have.

Humans seem inclined toward a number of weaknesses, essentially rooted in selfishness and greed and manifested in hypocrisy, which religion and morality have sought to contain but which technology often amplifies.

Will we end up ruled by AI/algorithms?