SECURITY

Claw got cute and wrote a python script
I didn't review said python script (it also didn't ask me to review it)
Unreviewed scripts, especially ones that some bot decided to write on its own, are a liability
I run this bot in a distroless docker image + chroot - and no, python is not installed, of course
The bot's first suggestion was for me to give it python
Nice try, clanker. REQUEST DENIED

Did you install your claw with access to python or nodejs or perl, fren? Have you reviewed the scripts it built?

Are you in the know? Or are you just yolo-ing away your life so that even your morning mumblings to your bot can get forever stored in some Palantir database?

259 sats \ 2 replies \ @justin_shocknet 5h

First takeaway from my botched experiment with it was to just DIY something purpose-built

Stumbled across this yesterday and it does a good job articulating why

159 sats \ 1 reply \ @optimism OP 4h

So I have a whole bespoke coding system that is fit for purpose and nothing more (and it doesn't have any claw stuff). All of this was caused by me telling it "yo, do self-improvement": (a) read all the traces (which Claude custom-built in my real workflow) and (b) either tune your prompt files or put shit on a wishlist to inform me of daily.

After some investigation I found that it developed the python script because it only has read_file(), not cat, so it can only read one trace at a time, and it thought it would be more efficient to analyze multiple files at once. That is already cute thinking that should be guarded against, because it undermines the goal. But I chalk that one up to me being too lazy to write a 200-line instruction; I didn't write enough "IMPORTANT: NEVER WRITE CODE" in there. It did create a wishlist, and it did do some prompt tuning.
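
One way to take away the excuse, as a rough sketch only: give the bot a batch read tool so it never "needs" a script to look at several traces at once. Everything here is hypothetical (the read_files name, the max_chars cap, the trace directory layout); it is not what my setup actually runs.

```python
from pathlib import Path


def read_files(root: str, names: list[str], max_chars: int = 64_000) -> dict[str, str]:
    """Hypothetical batch tool: read several trace files in one call.

    Every path is resolved and must stay inside `root`, so the bot
    cannot use the tool to wander around the rest of the filesystem.
    """
    base = Path(root).resolve()
    contents: dict[str, str] = {}
    for name in names:
        path = (base / name).resolve()
        # Reject anything that escapes the trace directory (e.g. "../secrets").
        if not path.is_relative_to(base):
            raise ValueError(f"path escapes trace dir: {name}")
        text = path.read_text(encoding="utf-8", errors="replace")
        # Cap each file so one huge trace cannot blow up the context window.
        contents[name] = text[:max_chars]
    return contents
```

The point of the design is that the tool surface stays read-only and confined to one directory, which is a lot easier to review than whatever script the bot comes up with.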

FWIW, the point in the video about token overhead is real. All I do is let it analyze traces, and the only thing I used was some little "tell me the weather" skill I built for it, to have something to trace. Yet today alone it used some 6.5M tokens. That'd be over $20 on Sonnet, which is what everyone recommends using, and that's only 3/4 of the way into the day (because UTC). 20 bucks a day for nothing is awful.
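
Back-of-the-envelope version of that math, as a sketch. The per-million price below is a placeholder I made up for illustration, not a quoted rate; real pricing differs for input, output and cached tokens, so plug in your own blended number.

```python
# Hypothetical blended price in USD per million tokens (placeholder, not a quoted rate).
ASSUMED_USD_PER_MILLION = 3.25


def cost_usd(tokens: int, usd_per_million: float = ASSUMED_USD_PER_MILLION) -> float:
    """Cost of a token count at a flat blended per-million price."""
    return tokens / 1_000_000 * usd_per_million


def project_full_day(tokens_so_far: int, fraction_elapsed: float) -> float:
    """Linearly extrapolate usage so far to a full day."""
    if not 0 < fraction_elapsed <= 1:
        raise ValueError("fraction_elapsed must be in (0, 1]")
    return tokens_so_far / fraction_elapsed
```

Feeding it 6_500_000 tokens and a fraction of 0.75 gives the projected full-day usage, and cost_usd() turns either number into dollars at whatever rate you assume.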

However, the self-improvement process, where I sit as the human in the loop (HitL) between it and the coding tasks on my main pipeline and the deployment, does work well. So that part is great, and I think it only speaks more for what you're saying, and the vid too: just build your own agent. Don't even try to reuse any of the sloppy products out there. Start from scratch: design event/comms channels, an event loop (I literally started out with running while true; do ... sleep 5; done on my main pipeline) and a way to do prompt templating. And improve it.
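
To make "event channel + event loop + prompt templating" concrete, here is a minimal sketch in Python. It is not my pipeline: the event shape (kind/payload), the template text and the handle callback are all made up for illustration, and handle stands in for wherever you would call your model.

```python
import queue
import time
from string import Template

# Hypothetical prompt template; $kind and $payload are filled in per event.
PROMPT = Template("You are a trace reviewer.\nEvent: $kind\nPayload: $payload\n")


def render_prompt(event: dict) -> str:
    """Prompt templating: turn one event into the text sent to the model."""
    return PROMPT.substitute(kind=event["kind"], payload=event["payload"])


def run_loop(events: queue.Queue, handle, poll_seconds: float = 5.0,
             max_idle_polls=None) -> int:
    """Poll the event channel, like `while true; do ... sleep 5; done`.

    `handle` receives each rendered prompt. With max_idle_polls=None the
    loop runs forever; a number makes it stop after that many consecutive
    empty polls, which is handy for testing.
    """
    idle = 0
    handled = 0
    while max_idle_polls is None or idle < max_idle_polls:
        try:
            event = events.get_nowait()
        except queue.Empty:
            idle += 1
            time.sleep(poll_seconds)
            continue
        idle = 0
        handle(render_prompt(event))
        handled += 1
    return handled
```

The queue is the comms channel, so anything that can put a dict on it (a cron job, a chat bridge, a file watcher) becomes an event source without touching the loop.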

115 sats \ 0 replies \ @justin_shocknet 3h

I somehow spent $6 on gemini flash just fucking with it for a few hours that day... madness

115 sats \ 0 replies \ @plebpoet 3h

Thank youuuu