I might do a write-up of my process. It was an iterative loop along these lines:
1. Start a new session (no memory)
2. Upload the current version of the model and the (incomplete) proof
3. Ask it to complete the proof
4. Evaluate the AI's proof completion
5. Poke the AI on parts that I found incorrect or dissatisfying
6. Iterate within the chat on why that part was hard or dissatisfying
   - If it's simply a mistake by the AI, fix it
   - If the model setup is genuinely problematic, revise the assumptions
7. Update the model with new assumptions as appropriate
8. Go to step 1
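The loop above can be sketched as plain Python. Everything here is a hypothetical stand-in: `ask_model`, `evaluate`, and `revise_assumptions` are callbacks representing manual steps (a fresh chat session, human review, editing assumptions), not any real API.

```python
def proof_loop(model, proof, ask_model, evaluate, revise_assumptions,
               max_rounds=10):
    """Iterate: fresh session -> completion -> human review -> revise.

    Hypothetical callbacks (not a real API):
      ask_model(model, proof)        -- a new, memoryless session completing the proof
      evaluate(completion)           -- human review; returns a list of issues
      revise_assumptions(model, issue) -- update the model when the setup is at fault
    """
    completion = None
    for _ in range(max_rounds):
        completion = ask_model(model, proof)   # steps 1-3: new session, ask
        issues = evaluate(completion)          # steps 4-6: poke and discuss
        if not issues:
            return model, completion           # proof accepted
        for issue in issues:
            if issue["cause"] == "ai_mistake":
                proof = issue["fix"](proof)    # just a slip: fix it directly
            else:                              # setup genuinely problematic
                model = revise_assumptions(model, issue)  # step 7
        # step 8: loop back to a fresh session
    return model, completion
```

The point of the structure is that state lives in the artifacts (`model`, `proof`), never in the chat session, so every round starts from a clean context.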
reply
For step 5, is there a way to make it come to that conclusion?
Not sure if it's 100% comparable, but maybe there's something close to it: if I find a thing in code, instead of "arguing" or "pressing" I just say "write a test around <xyz> in case there are any issues", and 90% of the time it finds its own error.
reply
Do you do that in the same session?
So what I find is that if I say "review <xyz> for logic errors and omissions" in a new session (I work git-based with all models though, so there is some indirection here too that may help trigger a different pattern through the layers), I can be sure that Claude finds a lot of stuff it missed on the first run. (Hate the apologies tho, wtf Anthropic.) However, I do admit that second-model review works better. I think k00b was experiencing the same thing by mixing GPT and Claude for reviews.