reply on: Something Big Is Happening \ stacker news


274 sats \ 2 replies \ @optimism 11 Feb \ on: Something Big Is Happening AI

I describe what I want built, in plain English, and it just... appears. Not a rough draft I need to fix. The finished thing.

So, this is simply not true. Here's one example (out of many) of something Claude 4.6 Opus did this weekend:

I describe (to Claude 4.6 Opus) that I need some REST server code (itself mostly written by Claude 4.6 Opus) to stay unchanged but be annotated with an OpenAPI/Swagger spec, so that I can publish the API. It runs 100% sandboxed and has no interactive mode for asking questions, because it is part of a pipeline. What it did was fake it. Yes, it worked, but it didn't actually use the annotations it wrote to generate the docs (which is the whole reason we annotate); instead, it hardcoded the file.
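For anyone who hasn't done this: the point of annotating is that the spec is derived from the code, so it stays in sync with it. Below is a minimal sketch in Python, assuming a hypothetical `openapi_route` decorator and `build_spec` helper. Real frameworks ship their own equivalents under different names, so this is not any specific library's API. The handlers keep their behavior, and the OpenAPI document is generated from the annotations rather than written by hand.

```python
import json
from typing import Any, Callable

# Hypothetical registry: every annotated handler records its OpenAPI operation here.
_OPERATIONS: list[tuple[str, str, dict[str, Any]]] = []


def openapi_route(method: str, path: str, summary: str, responses: dict[str, str]):
    """Hypothetical annotation: attach OpenAPI metadata without touching the handler."""
    def decorator(handler: Callable) -> Callable:
        operation = {
            "operationId": handler.__name__,
            "summary": summary,
            "responses": {code: {"description": desc} for code, desc in responses.items()},
        }
        _OPERATIONS.append((path, method.lower(), operation))
        return handler  # the handler is returned as-is; its behavior does not change
    return decorator


def build_spec(title: str, version: str) -> dict[str, Any]:
    """Derive the OpenAPI document from the annotations instead of hardcoding it."""
    paths: dict[str, dict[str, Any]] = {}
    for path, method, operation in _OPERATIONS:
        paths.setdefault(path, {})[method] = operation
    return {"openapi": "3.0.3", "info": {"title": title, "version": version}, "paths": paths}


# Existing server code: only the decorator lines are new.
@openapi_route("GET", "/items", "List all items", {"200": "A list of items"})
def list_items() -> list[dict[str, Any]]:
    return []


@openapi_route("POST", "/items", "Create an item", {"201": "Created", "400": "Invalid body"})
def create_item(body: dict[str, Any]) -> dict[str, Any]:
    return body


if __name__ == "__main__":
    # The published spec is whatever the annotations say, so it follows the code.
    print(json.dumps(build_spec("Example API", "1.0.0"), indent=2))
```

Run as a script, it prints one JSON document with both operations under `/items`. A hardcoded file might look identical today, but nothing ties it to the code, so it goes stale on the next change.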

So I had to correct it (commits in reverse order):

This is a very simple request. The reason all the botbois are now obsolete is that they don't review. And that is the extremely worrying part: they just trust the bot, whereas real developers who are not botbois would not blindly trust a coworker. They would help the coworker get better results, just like I had to help Claude get better results. I can do that because I've been doing this for 40+ years.
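Part of that review can be automated. This is a minimal sketch under two assumptions: the pipeline keeps a committed spec file, and it can regenerate the spec from the annotations. The `spec_drift` name and its messages are hypothetical. The check diffs the two documents and fails the build on any difference, so a hand-written spec only passes while it happens to match the annotations exactly. It catches one class of fake; it does not replace reading the code.

```python
from typing import Any


def spec_drift(committed: Any, generated: Any, path: str = "$") -> list[str]:
    """List every JSON path where the committed spec differs from the regenerated one."""
    if isinstance(committed, dict) and isinstance(generated, dict):
        diffs: list[str] = []
        for key in sorted(set(committed) | set(generated)):
            child = f"{path}.{key}"
            if key not in committed:
                diffs.append(f"{child}: missing from committed spec")
            elif key not in generated:
                diffs.append(f"{child}: not backed by any annotation")
            else:
                diffs.extend(spec_drift(committed[key], generated[key], child))
        return diffs
    # Leaf values (or a dict compared against a non-dict): report any inequality.
    if committed != generated:
        return [f"{path}: {committed!r} != {generated!r}"]
    return []


if __name__ == "__main__":
    # Hypothetical example: someone hand-edited the committed file.
    committed = {"paths": {"/items": {"get": {"summary": "List items"}}, "/health": {"get": {"summary": "Health"}}}}
    generated = {"paths": {"/items": {"get": {"summary": "List all items"}}}}
    for problem in spec_drift(committed, generated):
        print(problem)
```

With the sample data, it reports `/health` as not backed by any annotation and flags the changed `summary` under `/items`. In CI, any non-empty result would fail the build.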

These new AI models aren't incremental improvements. This is a different thing entirely.

Because now the bot no longer needs a boi. Maybe one day I will be obsolete too, but not as long as there are people like this building and training LLMs. You see... you can do what this guy does. You'll get executable code. You'll need to burn a couple billion tokens if you want to make a change, so it is cheaper to just throw it away, but you can do this. Right now. And you could do this last year with Claude 4.1 too. It's actually an incremental improvement.

If you tried ChatGPT in 2023 or early 2024 and thought "this makes stuff up" or "this isn't that impressive", you were right. Those early versions were genuinely limited. They hallucinated. They confidently said things that were nonsense.

My example above is from two days ago. Try again.

"GPT-5.3-Codex is our first model that was instrumental in creating itself. The Codex team used early versions to debug its own training, manage its own deployment, and diagnose test results and evaluations."

AKA, no one really read the code. I'll have a post soon showing how Anthropic (which still uses HitL, i.e. a human in the loop, but yes, heavily uses Claude in all development) fared on a simple cross-model bench I ran: building a web app for the OpenAPI interface above. I'll link it here when I post it.

These apps often default to a faster, dumber model.

Also not true. You just get Opus from your $20 Claude Pro subscription. And Kimi K2.5 beats GPT and Gemini 3 for web apps almost every time.

If you're a lawyer, feed it a contract and ask it to find every clause that could hurt your client.

Don't do that. If you're a lawyer, talk to people who can help you preserve confidentiality. Under no circumstances share your data with a third party.

This might be the most important year of your career. Work accordingly. I don't say that to stress you out. I say it because right now, there is a brief window where most people at most companies are still ignoring this. The person who walks into a meeting and says "I used AI to do this analysis in an hour instead of three days" is going to be the most valuable person in the room.

The bot will not lose its job when the result it gave you is bad; you will. Use the bot to be faster and better, but don't be a lazy s.o.b.: work. hard.