Why Not?
SECURITY
- Claw got cute and wrote a python script
- I didn't review said python script (it also didn't ask me to review it)
- Unreviewed scripts, especially ones some bot decided to write, are a liability
- I run this bot in a distroless docker image + chroot - and no, python is not installed, of course
- The bot's first suggestion was for me to give it python
- Nice try, clanker. REQUEST DENIED
Did you install your claw with access to python or nodejs or perl, fren? Have you reviewed the scripts it built?
Are you in the know? Or are you just yolo-ing away your life so that even your morning mumblings to your bot can get forever stored in some Palantir database?
First takeaway from my botched experiment with it: just DIY something purpose-built
Stumbled across this yesterday and it does a good job articulating why
So I have a whole bespoke coding system that is fit for purpose and nothing more (and doesn't have any claw stuff). All of this was kicked off by me saying "yo, do self-improvement", meaning (a) read all the traces (the tracing Claude developed bespoke in my real workflow) and (b) either tune your prompt files or put shit on a wishlist to tell me about daily.
After some investigation I found that it wrote the Python script because it only has read_file(), not cat, so it can only read one trace at a time, and it figured it would be more efficient to analyze multiple files at once. That's already cute thinking that should be guarded against, because it undermines the goal. But I'll chalk that one up to me being too lazy to write a 200-line instruction; I didn't put enough "IMPORTANT: NEVER WRITE CODE" in there. It did create a wishlist, and it did do some prompt tuning.
FWIW, the point in the video about token overhead is real. All I do is let it analyze traces, and the only skill in use is a little "tell me the weather" skill I built for it so there'd be something to trace. Yet today alone it used about 6.5M tokens. That'd be over $20 on Sonnet, which is what everyone recommends using, and the day is only 3/4 over (because UTC). 20 bucks a day for nothing is awful.
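A quick back-of-the-envelope check on those numbers, as a bash sketch. It assumes Sonnet list pricing of $3 per 1M input tokens and $15 per 1M output tokens (verify against current pricing), and reads "3/4 in" as 18 of 24 UTC hours elapsed. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Rough cost estimate for an agent's token usage.
# ASSUMPTION: Sonnet list pricing of $3 per 1M input tokens and
# $15 per 1M output tokens; check current pricing before relying on it.
INPUT_CENTS_PER_M=300
OUTPUT_CENTS_PER_M=1500

# cost_cents INPUT_TOKENS OUTPUT_TOKENS -> cost in whole cents (rounded down)
cost_cents() {
  local input=$1 output=$2
  echo $(( (input * INPUT_CENTS_PER_M + output * OUTPUT_CENTS_PER_M) / 1000000 ))
}

# full_day_tokens TOKENS_SO_FAR HOURS_ELAPSED -> linear extrapolation to 24h
full_day_tokens() {
  local tokens=$1 hours=$2
  echo $(( tokens * 24 / hours ))
}

# fmt_dollars CENTS -> prints e.g. $19.50
fmt_dollars() {
  printf '$%d.%02d\n' $(( $1 / 100 )) $(( $1 % 100 ))
}
```

At the input-only rate, `fmt_dollars "$(cost_cents 6500000 0)"` gives $19.50, so roughly 42k of those tokens being output tokens is enough to reach $20. Extrapolated linearly to the full day that's about 8.67M tokens, or around $26 even if none of it were output.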
However, the self-improvement process where I sit as the HitL (human in the loop) between it and the coding tasks on my main pipeline and the deployment does work well. So that part is great, and I think it only strengthens what you're saying, and the vid too: just build your own agent. Don't even try to reuse any of the sloppy products out there. From scratch, design event/comms channels, an event loop (I literally started out by running "while true; do ... sleep 5; done" on my main pipeline) and a way to do prompt templating. Then improve it.
I somehow spent $6 on Gemini Flash just fucking with it for a few hours that day... madness
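To make "event/comms channels, an event loop and a way to do prompt templating" concrete, here's a minimal bash sketch in the spirit of that while-true loop. Everything in it is hypothetical: a pair of directories (inbox/done) as the comms channel, a {{event}} placeholder as the templating scheme, and handle_prompt, which only prints the rendered prompt where a real agent would call its model.

```shell
#!/usr/bin/env bash
# Minimal polling event loop where a directory is the comms channel.
# HYPOTHETICAL layout: events arrive as files in $INBOX and are moved to
# $DONE once handled; handle_prompt stands in for the real model call.
INBOX=${INBOX:-./inbox}
DONE=${DONE:-./done}
TEMPLATE='Trace event:
{{event}}
Suggest one prompt tweak or one wishlist item.'

# render_prompt BODY -> prints TEMPLATE with {{event}} replaced by BODY.
# Both pattern and replacement are quoted so they are taken literally
# (this also keeps bash 5.2 from treating "&" in BODY specially).
render_prompt() {
  local placeholder='{{event}}' body=$1 out
  out=${TEMPLATE//"$placeholder"/"$body"}
  printf '%s\n' "$out"
}

# handle_prompt PROMPT -> hypothetical hook; here it just prints the prompt.
handle_prompt() {
  printf '%s\n' "$1"
}

# tick: handle every event currently in INBOX exactly once.
tick() {
  local f
  mkdir -p "$INBOX" "$DONE"
  for f in "$INBOX"/*; do
    [ -f "$f" ] || continue   # skip the literal glob when INBOX is empty
    handle_prompt "$(render_prompt "$(<"$f")")"
    mv "$f" "$DONE"/          # mark the event as consumed
  done
}

# run_loop: same shape as "while true; do ... sleep 5; done".
run_loop() {
  while true; do
    tick
    sleep 5
  done
}
```

Source it and call run_loop for the poll-every-5-seconds behavior; keeping tick separate makes a single pass easy to test. Doing the templating with pure bash string replacement avoids pulling in sed or an interpreter, which may matter if you want the runtime as bare as the bot's.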
Thank youuuu