After 36 hours of intense Claude 4.5 Sonnet usage (without consistent sleep; I am way too old for that, but Red Bull GmbH is once again really pleased with me), and being delighted to see that it has improved a lot over 4.0, I have findings to share.
TL;DR: if you don't know what you're doing, you're still gonna suck at coding; Claude won't fix that.
Let me start by stating that it feels to me that Anthropic has definitely raised the bar with this one. ChatGPT, Grok, and Mistral can't beat this right now. Neither can Qwen3-coder, GLM 4.6, or InternVL 3.5. So I must admit: good job, Anthropic!
Most important finding: Claude, despite the sales pitch, is definitely not an expert coder.
Claude still fails at things that an expert coder does all day, every day, like code that is heavy on concurrency or uses binary protocols, basically anything that isn't `json`, even when I'm doing this in `python`. I haven't tried having it do systems programming because of this: if it cannot do complex operations in `python`, then I won't give it `rust` or `c++`. Step by step, because previous attempts have taught me not to throw it off the deep end; it just doesn't work like that, and you need to continuously tune the instructions based on what you're observing. Ultimately, I may be able to craft a good instruction for it to code `c++`, but this would take me quite some time, as it's not science (nor engineering) but simple trial and error. However, once I explain to it what it does wrong and it does the annoying thing¹, it does fix the problems it created. This is why, I'm sorry, noobs can't code with Claude either. You have to understand what you're doing.
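To make the "anything that isn't `json`" point concrete, here is the kind of non-JSON work I mean: parsing a fixed binary record header. This is a minimal illustrative sketch (the format, magic value, and field layout are made up, not from any real protocol):

```python
import struct

# Hypothetical wire format: 4-byte magic, u16 version, u16 flags,
# u32 payload length, all big-endian. Purely illustrative.
HEADER = struct.Struct(">4sHHI")

def parse_header(data: bytes) -> dict:
    """Unpack the first 12 bytes of a record into named fields."""
    magic, version, flags, length = HEADER.unpack_from(data)
    if magic != b"DEMO":
        raise ValueError("bad magic")
    return {"version": version, "flags": flags, "length": length}

# Round-trip check: pack a header, then parse it back.
raw = struct.pack(">4sHHI", b"DEMO", 1, 0b10, 128)
print(parse_header(raw))  # {'version': 1, 'flags': 2, 'length': 128}
```

Nothing exotic, but it is exactly the byte-level, offset-counting work where I see the model slip in ways it never does with JSON.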
It also still has attention issues, so you'll see it create tons of bugs, struggle to understand complex project structures, forget that you cannot just change method signatures without checking your implementation, and so on. There are definitely gold mines to be had in selling tokens and selling CI infra, because it's not that great. The more repetitive I make the work, the better it performs, though (and the more tokens it burns).
Here's the big thing: it is really good at making CLI tools using bash, python, or nodejs; exceptionally good, in fact. And then it can use the tools it made seamlessly (if you use Claude Code) and boost its own productivity by improving them, as long as you remember to nudge it about assessing what can be improved in the toolset.
This pattern is actually interesting: let the LLM code the toolset, let it use it, improve it, and gain massive productivity. This is now on my list to test with future open-weight coding models. Or even let Claude code tools and let the open model use the tools.
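The kind of tool this pattern starts from is nothing fancy: a small, single-purpose CLI that the model can then call and iterate on. A minimal sketch of such a starting point (the tool name `logctl` and its JSONL notes file are hypothetical, not something from my actual toolset):

```python
#!/usr/bin/env python3
"""logctl: a hypothetical minimal note-logging CLI of the kind an LLM
can build, use, and then improve on its own."""
import argparse
import datetime
import json
import pathlib

NOTES = pathlib.Path("notes.jsonl")  # hypothetical append-only storage

def add_note(text: str) -> None:
    """Append one timestamped note as a JSON line."""
    entry = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "text": text,
    }
    with NOTES.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def list_notes() -> list:
    """Return all notes, oldest first."""
    if not NOTES.exists():
        return []
    return [json.loads(line) for line in NOTES.read_text().splitlines() if line]

def main(argv=None):
    p = argparse.ArgumentParser(prog="logctl")
    sub = p.add_subparsers(dest="cmd", required=True)
    add_p = sub.add_parser("add", help="record a note")
    add_p.add_argument("text")
    sub.add_parser("list", help="print all notes")
    args = p.parse_args(argv)
    if args.cmd == "add":
        add_note(args.text)
    else:
        for n in list_notes():
            print(n["ts"], n["text"])

if __name__ == "__main__":
    main()
```

The point isn't this particular tool; it's that something this small gives the model a stable, scriptable surface it can invoke and then extend round after round.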
I let it build a CLI tool around `git-bug`, and with some instruction tuning it now uses that through its home-grown tool `bugctl` (I had it do two rounds of making up names to come up with that one) and logs things nicely without being dependent on GitHub or other platforms. It's still a bit eager to close issue reports and jumps the gun very often, like your intern would, which is why I have deny-listed `git add/commit/pull/push`. I just review everything with `git add -i`.
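For those who want to replicate the deny-listing: Claude Code reads a permissions deny list from `.claude/settings.json`. A sketch, assuming the current settings format (check the Claude Code docs for your version):

```json
{
  "permissions": {
    "deny": [
      "Bash(git add:*)",
      "Bash(git commit:*)",
      "Bash(git pull:*)",
      "Bash(git push:*)"
    ]
  }
}
```

With this in place the model can still read history and diffs, but anything that mutates the index or the remote has to go through you.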