I might do a write-up of my process. It was an iterative loop along these lines:
1. Start a new session (no memory)
2. Upload the current version of the model and the (incomplete) proof
3. Ask it to complete the proof
4. Evaluate the AI's proof completion
5. Poke the AI on parts that I found incorrect or dissatisfying
6. Iterate within the chat on why that part was hard or dissatisfying
   - If it's simply a mistake by the AI, fix it
   - If the model setup is genuinely problematic, revise the assumptions
7. Update the model with new assumptions as appropriate
8. Go to step 1
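The loop above can be sketched as plain Python. Everything here is a hypothetical stand-in: `ask_model`, `evaluate`, and `revise_assumptions` are callbacks representing manual steps (a fresh chat session, human review, editing assumptions), not any real API.

```python
def proof_loop(model, proof, ask_model, evaluate, revise_assumptions,
               max_rounds=10):
    """Iterate: fresh session -> completion -> human review -> revise.

    Hypothetical callbacks (not a real API):
      ask_model(model, proof)        -- a new, memoryless session completing the proof
      evaluate(completion)           -- human review; returns a list of issues
      revise_assumptions(model, issue) -- update the model when the setup is at fault
    """
    completion = None
    for _ in range(max_rounds):
        completion = ask_model(model, proof)   # steps 1-3: new session, ask
        issues = evaluate(completion)          # steps 4-6: poke and discuss
        if not issues:
            return model, completion           # proof accepted
        for issue in issues:
            if issue["cause"] == "ai_mistake":
                proof = issue["fix"](proof)    # just a slip: fix it directly
            else:                              # setup genuinely problematic
                model = revise_assumptions(model, issue)  # step 7
        # step 8: loop back to a fresh session
    return model, completion
```

The point of the structure is that state lives in the artifacts (`model`, `proof`), never in the chat session, so every round starts from a clean context.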
reply
For step 5, is there a way to make it come to that conclusion?
Not sure if it's 100% comparable, but maybe there's something close to it: if I find a thing in code, instead of "arguing" or "pressing" I just say "write a test around <xyz> in case there are any issues", and 90% of the time it finds its own error.
reply
Do you do that in the same session?
So what I find is that if I say "review <xyz> for logic errors and omissions" in a new session (I work git-based with all models though, so there is some indirection here too that may help trigger a different pattern through the layers), I can be sure that Claude finds a lot of stuff it missed on the first run. (Hate the apologies tho, wtf Anthropic.) However, I do admit that second-model review works better. I think k00b was experiencing the same thing by mixing GPT and Claude for reviews.