
How does a reasoning model assess its level of certainty?

If it's unable to assess it, then is how long it reasons just a setting of the system?

If it is able to, LLM makers have an obvious incentive to make it hide that certainty for as long as possible, so would this study be proof of malicious action on their part?

I don't know enough about it, so I'm genuinely asking.

I think their point is that model providers don’t do a good job of assessing certainty, and that they could use the probing technique the authors developed to stop reasoning earlier without losing accuracy.

It’s not necessarily malicious afaict, especially for the open source models they tested. It’s just an oversight or deficiency.
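
For anyone curious what that kind of probing could look like, here’s a minimal toy sketch. It assumes a simple linear probe over the model’s hidden state at each reasoning step, trained to predict whether the answer is already correct; the synthetic data, threshold, and probe choice are my assumptions for illustration, not necessarily what the paper actually does.

```python
# Toy sketch of certainty probing for early-exit reasoning (assumptions, not the paper's exact method).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# --- Training phase: collect (hidden_state, was_correct) pairs ---
# In practice these would come from running the reasoning model on labeled
# problems; here they are synthetic stand-ins.
hidden_dim = 64
n_examples = 500
hidden_states = rng.normal(size=(n_examples, hidden_dim))
# Pretend correctness correlates with one direction in hidden space.
was_correct = (hidden_states @ rng.normal(size=hidden_dim) > 0).astype(int)

probe = LogisticRegression(max_iter=1000).fit(hidden_states, was_correct)

# --- Inference phase: stop reasoning once the probe is confident enough ---
def reason_with_early_exit(step_hidden_states, threshold=0.9):
    """Return the step index at which to stop reasoning (or the last step)."""
    for step, h in enumerate(step_hidden_states):
        p_correct = probe.predict_proba(h.reshape(1, -1))[0, 1]
        if p_correct >= threshold:
            return step  # confident enough: emit the answer now
    return len(step_hidden_states) - 1  # never confident: use the full budget

# Example: a fake 20-step reasoning trace.
trace = rng.normal(size=(20, hidden_dim))
print("stop at step:", reason_with_early_exit(trace))
```

The point is just that the certainty signal lives in the model’s internal states, so a cheap external probe can read it and cut reasoning short, even if the model itself never reports it.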
