I've been trying to figure out why, but it's usually from too many layers/abstractions. I'm not sure what objective they're trained on, but it's in conflict with readability.

Producing output only consumable by other agents, maybe? We're being replaced.

66 sats \ 0 replies \ @k00b OP 12h

I'd guess their success criteria aren't very sophisticated yet and are mostly "did it output something that gets the job done?"

I should probably go browse the SWE benchmarks they all use. I'd guess those track SOTA success criteria pretty well.