That's why I often reach for `plan mode` which a lot of harnesses have now.

e.g. I'm overhauling SN's bounties and here's my prompt for planning

```txt
- separate zaps from bounty payments
- bounty payments are their own payIn and can only be paid optimistically/pessimisitcally ie noncustodially
- if the receiver does not have a receiving wallet, error
- no sybil fee (except for proxy fees which are paid by payer (not receiver of bounty))
- bounty payments, if optimistic, like zaps, need to be auto-retried and show up in notifications if auto-retries fail
```

It fills the gaps that it can, I review it, prompt to fill more gaps, and so on. Then I hit `build`. Then I review, prompt a plan to fix anything I don't like, and so on. Then I do human QA/careful review.

> define tasks well and scope them appropriately

This phrase seems to be doing a lot of work though.  I think for the kinds of things that I might want to deploy a bot on (research related code), the problems aren't usually that easy to scope or even define success for.

SimpleStacker

Afaik most LLMs have had fine-tuning/RL with tools/harness. And most agent harnesses do the same thing: inject tool schemas into context, call tools when model asks, give model output of tools, and loop until no more tools are called.

It's not nearly as hands-free as people want it to be, but if you define tasks well and scope them appropriately, you can get a heck of lot done before you get involved in the feedback loop now.

Stacker Saloon

saloon

No. It's form and not function. 

Afaik most LLMs have had fine-tuning/RL with tools/harness. And most agent harnesses do the same thing: inject tool schemas into context, call tools when model asks, give model output of tools, and loop until no more tools are called.

It's not nearly as hands-free as people want it to be, but if you define tasks well and scope them appropriately, you can get a heck of lot done before you get involved in the feedback loop now.