I think their point is that model providers don’t do a good job of assessing certainty, and that the probing technique they developed can be used to stop reasoning earlier without losing accuracy.
It’s not necessarily malicious afaict, especially for the open source models they tested. It’s just an oversight or deficiency.