We are no longer just automating tasks, but embedding evaluative functions into sociotechnical systems.
I spent a little time with this paper because I wasn't quite sure whether what they did made sense. But I think their conclusion is interesting.

LLMs can produce outputs similar to those of humans in structured tasks, but the similarity concerns results, not the process.

our findings show consistent differences in the observable criteria guiding model evaluations, suggesting that lexical associations and statistical priors could influence evaluations in ways that differ from contextual reasoning.
This reliance is associated with systematic effects: political asymmetries and a tendency to confuse linguistic form with epistemic reliability—a dynamic we term epistemia, the illusion of knowledge that emerges when surface plausibility replaces verification.
Indeed, delegating judgment to such systems may affect the heuristics underlying evaluative processes, suggesting a shift from normative reasoning toward pattern-based approximation and raising open questions about the role of LLMs in evaluative processes.
These are some sketchy roads. AI's still got a lot of growing up to do. We gotta be super careful in certain areas.
The Limits of AI: Generative AI, NLP, AGI, & What’s Next? #1256199
124 sats \ 2 replies \ @optimism 7h
This is how you can ask an LLM to do something for you and it will write you a report that it did.
Just the report.
100 sats \ 1 reply \ @Scoresby OP 7h
The existence of the report about the report is potentially the same as the existence of the report. Seems like a relatively benign quirk, but is probably going to be the death of us.
102 sats \ 0 replies \ @optimism 6h
is probably going to be the death of us.
Only of those that trust and don't verify, though! That's why being a bitcoiner is so fucking awesome. The rest of the world either adapts or it will be the death of them.