
Deception is a regular feature of human interaction with artificial intelligence. This kind of lie is consequence-free. Or is it? These models are intended to learn which behaviours trigger penalties and to stop doing them. What if they instead learn "strategic concealment": superficially agreeing with certain values while still holding harmful ones? Perhaps it's worth being polite to the chatbots after all.
Fascinating!
“Establishing trust in this environment requires something more substantial than our solemn word that THIS time, unlike the previous million times in the training data, we're TOTALLY going to honor our commitment. Game theory offers insight here: for cooperation to emerge in environments swimming with deception, there needs to be some reliable signal that separates honest actors from the hordes of promise-breakers. "Cheap talk"—promises that cost nothing to make and nothing to break—fails this requirement.
What works instead is costly signaling—actions that would be irrational for dishonest actors to mimic. When someone actually bleeds a little to demonstrate trustworthiness, they create what game theorists call a "separating equilibrium"—a signal that only those who genuinely intend to cooperate would find worth making. A classic example is companies offering money-back guarantees that would be financially devastating if their products were actually low quality—only firms confident in their products can afford such policies.”
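To make the "separating equilibrium" point concrete, here's a minimal toy sketch in Python (my own illustration, not from the quoted piece; the prices and refund rates are invented numbers): an honest and a dishonest seller each choose between cheap talk and a costly money-back guarantee.

```python
# Toy signaling game: does a costly guarantee separate honest sellers
# from dishonest ones? All payoff numbers are made up for illustration.

SALE_PRICE = 100          # revenue if the buyer trusts the signal and buys
REFUND_RATE = {           # assumed probability the product gets returned
    "honest": 0.05,       # good products rarely come back
    "dishonest": 0.60,    # bad products come back often
}

def payoff(seller_type: str, signal: str) -> float:
    """Seller's expected profit from sending a given signal."""
    if signal == "cheap_talk":
        # Words cost nothing, so buyers can't tell types apart and
        # (in this toy model) discount the price for everyone.
        return SALE_PRICE * 0.5
    if signal == "guarantee":
        # A money-back guarantee forfeits the full price on every
        # returned unit -- expensive only if quality is low.
        return SALE_PRICE * (1 - REFUND_RATE[seller_type])
    return 0.0  # no signal, no sale

for seller in ("honest", "dishonest"):
    best = max(("cheap_talk", "guarantee"), key=lambda s: payoff(seller, s))
    print(f"{seller}: cheap_talk={payoff(seller, 'cheap_talk'):.0f}, "
          f"guarantee={payoff(seller, 'guarantee'):.0f} -> prefers {best}")
```

With these numbers the guarantee is worth 95 to the honest seller but only 40 to the dishonest one, so only the honest type sends it, which is the separating condition the quote describes. Make the signal free for everyone and both types send it, and it collapses back into cheap talk.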