0 sats \ 0 replies \ @Scoresby 6h \ on: Anthropic Researchers Run Into Trouble When New Model Realizes It's Being Tested AI
Agents like this are always and only "playing along." That's what they do.
It's not "refusing," it's predicting that refusal is the most likely thing the assistant described in its system prompt would to do based on parameters of the model that runs it.