You’re still building the infrastructure: plan files (that’s fancy talk for ‘todo lists’), skills, and rules. The machine works very poorly without being given a framework.
So pure vibe coding is a myth. But they’re still trying to do it, and this leads to some very ridiculous outcomes. For example, a human actually looked and saw a lot of duplication between the agents and the tools. Now, you might ask: why didn’t any of the developers just go look for themselves? Again, it’s vibe coding. Looking under the hood is cheating.
In this particular case, a human could have told the machine: “There’s a lot of things that are both agents and tools. Let’s go through and make a list of all of them, look at some examples, and I’ll tell you which should be agents and which should be tools. We’ll have a discussion and figure out the general guidelines. Then we’ll audit the entire set, figure out which category each one belongs in, port the ones that are in the wrong type, and for the ones that are both, read through both versions and consolidate them into one document with the best of both.”
As someone who's spent his last 16 working hours fighting bots to make client-side wallet vaults less awkward and bug-prone, I can sympathize with folks skipping that step. If it weren't important to me that humans can understand the code for themselves, I'd be tempted to let the slop win too.
IMO the real problem is that LLMs are generative and great code is compressive. Meat agents bias toward generation too, because compressing something well requires understanding it very well, but clankers are much worse. Meat agents need to compress code as they go because their token throughput and context windows are relatively limited.
I'm sure LLMs will fix this bias at some point. In the meantime, vibemaxx'd codebases are incompatible with human oversight. That might not matter to you, but we can't pretend this paradigm is strictly better either.
I haven't prompted much with it yet, but I've added the following rule to Cursor hoping it might fight the generation bias:

1. Prefer deletion to addition.
2. Do not introduce new abstractions unless at least 3 concrete call sites clearly need them.
3. Inline one-off helpers.
4. Reduce files, layers, and indirection.
5. Optimize for minimum code surface area that preserves clarity.
6. Show the simplest working version first.

5., 1., and 4. are contextual / specific to the job at hand and could objective-poison generic functioning and degrade it. 2. and 3. are subjective, but valid. 6. I would personally not do because rework is expensive. I'd rather "1-shot" (after planning / analysis / exploration) an implementation and throw it away than have a bot rework stuff. Arguably, the latest releases did get better at rework, but I still feel it's more costly.

I agree with your assessment of the rules. They are my personal generic rules absent bots - my context is usually something relatively frivolous where human readability takes precedence over nearly everything. I find rework from a checkpoint created by these rules easier than going in the other direction.
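For what it's worth, rule 3 ("inline one-off helpers") is the easiest one to illustrate concretely. A hypothetical before/after in Python (all names made up for illustration):

```python
# Before: the kind of indirection bots tend to generate.
# A helper with exactly one call site adds a layer without adding meaning.
def _normalize_name(name: str) -> str:
    return name.strip().lower()

def greet_verbose(name: str) -> str:
    normalized = _normalize_name(name)
    return f"hello, {normalized}"

# After: rule 3 applied. The one-off helper is inlined,
# so the reader sees the whole transformation in one place.
def greet(name: str) -> str:
    return f"hello, {name.strip().lower()}"
```

Both versions behave identically; the second just has less surface area to read.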
I guess rework from a checkpoint also means redo, rather than rework. (The checkpoints were broken when combined with full-control command-line git in the early days, so I hated that feature most of all.)
I don't run into that problem with GPT 5.4. It's subjective, but I also prefer larger functions all else being equal, hence "inline one-off helpers." It's easy to go overboard with function shortening IME.

I feel like LLMs can already do the compression step, you just have to tell them to. I think asking LLMs to clean up my code is actually a task they're pretty well suited for.
The reason they don't do this by default (vs humans) is that humans can maintain longer context and read between the lines of the specific task instructions, whereas LLMs take your task instructions quite literally and do not consider wider context than that.
IME they struggle with the compression step a lot more than they do generation.
It took me 1 hour of prompting to generate the code I've spent 16 hours compressing.
another thought is whether compression is also harder than generation for humans as well
I think it probably is
It definitely is. I call that out in the post. But bots don't need compression to make progress as much as humans do.
I wonder how much their performance depends on the need to reason about state vs reasoning about the code itself?
Statelessness is easier for humans and bots alike, but I do think the generation bias is legit independent of the context.
Anytime I’ve prompted “make this clearer” or “clean this up,” they tend to increase lines of code. Even “reduce lines of code” results in, at best, negligible reductions. I have to point out excessive abstraction and overengineering repeatedly.
Meat agents. I like that.
Holy hell though, vibe coded PRs are always massive and way too much to review as a meat agent. But moar code is better, right?
I've been trying to figure out why, but it's usually from too many layers/abstractions. I'm not sure what objective they're trained on, but it's in conflict with readability.
producing output only consumable by other agents, maybe? we’re being replaced
I'd guess their success criteria isn't very sophisticated yet and is mostly "did it output something that gets the job done?"
I should probably go browse the SWE benchmarks they all use. I'd guess those track SOTA success criteria pretty well.
Yeah man. That's very true
I'm very much torn on the vibe coding issues, I think I'm so torn because "vibe coded throw away code" is actually fine for some things, but horrible for others.
SN would devolve into a total mess trying to vibe code it, but that's because building each aspect of it requires understanding lots of intent - not just of the individual elements, but big-picture intent as well (which LLMs generally are not good at).
However, for other things, like throwaway GUIs (i.e. you need a CRUD form to manage some data), it really doesn't matter if humans completely understand the code or not.
A huge, huge amount of web programming is already in the latter camp. How many web developers actually understand what React+Tailwind are doing? Probably the majority are just copying and pasting boilerplate they find on Stack Overflow until "it works".
people use stack overflow?
I thought the same thing. Ha.
last? I'm rooting for "latest" 🤣 I truly hope to one day be able to just finetune a working coding agent and make it code exactly the way I want it to.
lol
k00b nailed the core issue here. Generation vs compression isn't just a practical problem, it's a theoretical one.
Finding the minimal representation of a program is actually provably uncomputable. That's Kolmogorov complexity -- you literally cannot write an algorithm that always finds the shortest program producing a given output. So LLMs aren't just bad at compression, they're fighting a problem that's impossible to solve perfectly.
What's wild is that compression and proof of work share the same asymmetry: hard to produce, trivial to verify. You can instantly tell if code is shorter than what you had before, but finding that shorter version takes real work. Same structure as finding a valid hash.
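Kolmogorov complexity itself is uncomputable, but real compressors give a computable upper bound on it, which is roughly the proxy people reach for in practice. A minimal sketch using Python's `zlib` (the repetitive "code" string is a made-up example):

```python
import zlib

def description_length(s: str) -> int:
    """Crude upper bound on the Kolmogorov complexity of s:
    the size of its zlib-compressed form, in bytes."""
    return len(zlib.compress(s.encode("utf-8"), level=9))

# Highly repetitive source compresses far below its raw size,
# which is the "entropy bloat" a raw line count doesn't show you.
bloated = "def f():\n    pass\n" * 50
print(len(bloated), description_length(bloated))

# The verification side of the asymmetry really is trivial:
# checking which of two versions is shorter is a single comparison,
# while actually producing the shorter version is the hard part.
assert description_length(bloated) < len(bloated)
```

This only bounds the true complexity from above - no program can compute the exact minimum, which is the whole point of the uncomputability result.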
Bram Cohen seeing this so clearly makes total sense. BitTorrent's protocol spec fit on like two pages. The man spent his career making things smaller. His compression instinct is exactly why vibe coding bugs him -- he can feel the entropy bloat that most people can't see.