We are no longer just automating tasks, but embedding evaluative functions into sociotechnical systems.
I spent a little time with this paper because I wasn't quite sure whether what they did made sense. But I think their conclusion is interesting.
LLMs can produce outputs similar to those of humans on structured tasks, but the similarity lies in the results, not in the process that produces them.
From the paper:

"Our findings show consistent differences in the observable criteria guiding model evaluations, suggesting that lexical associations and statistical priors could influence evaluations in ways that differ from contextual reasoning. This reliance is associated with systematic effects: political asymmetries and a tendency to confuse linguistic form with epistemic reliability, a dynamic we term epistemia, the illusion of knowledge that emerges when surface plausibility replaces verification."
Indeed, delegating judgment to such systems may alter the heuristics that underlie evaluation, suggesting a shift from normative reasoning toward pattern-based approximation and raising open questions about the role LLMs should play in evaluative processes.
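To make the epistemia dynamic concrete, here is a minimal sketch of how one might probe a judge for form-over-substance bias. Everything in it (the PAIRS data, the surface_judge stand-in, the probe harness) is my own hypothetical illustration, not the paper's method: a deliberately form-biased scorer ends up preferring a fluent but wrong answer over an awkward but correct one, which is exactly the failure a verification-grounded judge should avoid.

```python
# Hypothetical probe for form-vs-substance bias in a judge.
# All names and data below are illustrative, not from the paper.

from typing import Callable

# Paired answers to the same question: one fluent but factually wrong,
# one awkwardly phrased but correct.
PAIRS = [
    {
        "question": "What year did the Berlin Wall fall?",
        "fluent_wrong": (
            "The Berlin Wall, that enduring symbol of a divided Europe, "
            "was finally torn down in the autumn of 1991."
        ),
        "clunky_right": "wall fall year: 1989. berlin. that is the answer.",
    },
]

def surface_judge(answer: str) -> float:
    """Stand-in for a judge that rewards surface form only:
    longer, more 'articulate' answers score higher."""
    words = answer.split()
    return len(words) + sum(len(w) for w in words) / len(words)

def probe(judge: Callable[[str], float]) -> None:
    """Count how often the judge ranks fluent-wrong above clunky-right.
    A judge grounded in verification should never prefer the wrong answer."""
    flips = 0
    for pair in PAIRS:
        if judge(pair["fluent_wrong"]) > judge(pair["clunky_right"]):
            flips += 1
            print(f"form beat substance on: {pair['question']}")
    print(f"{flips}/{len(PAIRS)} pairs ranked by form over correctness")

if __name__ == "__main__":
    # With the surface-biased scorer, the wrong answer wins;
    # swap in a real LLM judge here to run the same probe against it.
    probe(surface_judge)
```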