
Does this actually do anything... good?

I haven't really dipped my toe into AI agents. I've not had great results with "letting AI do its thing"... to get good use out of AI I feel like I have to be quite involved in the feedback loop - so much so that it's often faster to do stuff myself.

168 sats \ 1 reply \ @optimism 1h
> I've not had great results with "letting AI do its thing"

If I properly review the outputs of code / research / plans, it means I do ∞x because I let the bot do stuff that would never get to the top of my todo. So I just queue it up and then spend time on review, queue up more. I could automate that too, except as discussed above, the bots have high error rate, so I don't, or I end up pwnd like Palantir/OpenAI/USG.

Made a little script that uses up all the tokens in my Claude plan (and reports on it after every task) and then sleeps until the plan resets. This week I'll have about 5% unused because I was too busy to queue up work Mon/Tue. It works on 15 projects concurrently for me right now. I can reprioritize the next task at any time; basically it runs the equivalent of a mid-size agile software shop for me, but with a dictator-in-chief, me.
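The core of a script like that can be sketched in a few lines. Everything below is a guess at the shape, not the actual script: the budget, the reset window, and the `run_task` stub are all hypothetical (a real version would shell out to the Claude CLI and parse the token usage it reports).

```python
import time
from collections import deque

PLAN_TOKENS = 1_000_000       # hypothetical per-window token budget
RESET_INTERVAL = 5 * 60 * 60  # hypothetical 5-hour reset window, in seconds

def run_task(task):
    # Stub: a real version would dispatch `task` to the agent CLI and
    # parse the token usage it reports back.
    return {"task": task, "tokens_used": 50_000}

def drain_queue(queue, budget=PLAN_TOKENS, sleep=time.sleep, now=time.time):
    """Run queued tasks until the budget is spent, report after each one,
    then sleep until the plan window resets and keep going."""
    window_start, spent, reports = now(), 0, []
    while queue:
        if spent >= budget:
            # out of tokens: wait for the plan to reset, then continue
            sleep(max(0, window_start + RESET_INTERVAL - now()))
            window_start, spent = now(), 0
        result = run_task(queue.popleft())
        spent += result["tokens_used"]
        reports.append(result)  # "reports on it after every task"
    return reports
```

Something like `drain_queue(deque(["project A: fix tests", "project B: write docs"]))` then chews through everything queued, pausing whenever the plan is exhausted.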

Anyway, the great thing about queueing up work is that I just review a couple of times per day, mostly keeping focus on a single project until I've gone through everything and queued new work. Then I go get a coffee, have a smoke, and do the next. Or do some actual work.

reply

Interesting. I'd love to know more about your setup.

reply
112 sats \ 2 replies \ @k00b 1h

No. It's form and not function.

Afaik most LLMs have had fine-tuning/RL with tools/harness. And most agent harnesses do the same thing: inject tool schemas into context, call tools when the model asks, give the model the tool output, and loop until no more tools are called.
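That loop is simple enough to sketch. Here's a minimal version, assuming a model callable that returns messages in a generic tool-call shape; all the names are illustrative, not any particular vendor's API:

```python
# Hypothetical tool registry; a harness turns these into JSON schemas
# and injects them into the model's context.
TOOLS = {
    "read_file": lambda path: f"<contents of {path}>",  # stub tool
}

def agent_loop(model, prompt):
    """Generic harness loop: call the model, execute any tool calls it
    requests, append the results, and repeat until no tools are called."""
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = model(messages)            # model sees schemas + history
        messages.append(reply)
        calls = reply.get("tool_calls", [])
        if not calls:                      # no tool calls: we're done
            return reply["content"]
        for call in calls:
            output = TOOLS[call["name"]](**call["args"])
            messages.append({"role": "tool", "name": call["name"],
                             "content": output})

# A scripted stand-in for a real model, just to show the loop terminating:
def fake_model(messages):
    if len(messages) == 1:                 # first turn: request a tool
        return {"role": "assistant", "content": "", "tool_calls":
                [{"name": "read_file", "args": {"path": "a.txt"}}]}
    return {"role": "assistant", "content": "done", "tool_calls": []}
```

`agent_loop(fake_model, "read a.txt")` does one tool round-trip and returns `"done"`; swap in a real model client and real tools and you have the skeleton of every agent harness.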

It's not nearly as hands-free as people want it to be, but if you define tasks well and scope them appropriately, you can get a heck of a lot done before you get involved in the feedback loop now.

reply
> define tasks well and scope them appropriately

This phrase seems to be doing a lot of work though. I think for the kinds of things that I might want to deploy a bot on (research related code), the problems aren't usually that easy to scope or even define success for.

reply
1 sat \ 0 replies \ @k00b 7m

That's why I often reach for plan mode which a lot of harnesses have now.

e.g. I'm overhauling SN's bounties and here's my prompt for planning

- separate zaps from bounty payments
- bounty payments are their own payIn and can only be paid optimistically/pessimistically, i.e. noncustodially
- if the receiver does not have a receiving wallet, error
- no sybil fee (except for proxy fees which are paid by payer (not receiver of bounty))
- bounty payments, if optimistic, like zaps, need to be auto-retried and show up in notifications if auto-retries fail

It fills in the gaps it can, I review it, prompt it to fill more gaps, and so on. Then I hit build. Then I review, prompt a plan to fix anything I don't like, and so on. Then I do human QA/careful review.

reply