items/1459801/related \ stacker news

How the novel nature of LLMs confounds our ability to evaluate them parsingphase.dev/tech/LLMs/psychologicalFactors.html

478 sats \ 0 comments \ @co574 24 Mar AI

related

The simulation of judgment in LLMs - PNAS www.pnas.org/doi/10.1073/pnas.2518443122

244 sats \ 5 comments \ @Scoresby 15 Oct 2025 AI

Graham King - Evaluating LLMs for my personal use case darkcoding.net/software/personal-ai-evals-aug-2025/

278 sats \ 1 comment \ @carter 25 Aug 2025 AI

LLM evaluation at scale with the NeurIPS Efficiency Challenge blog.mozilla.ai/exploring-llm-evaluation-at-scale-with-the-neurips-large-language-model-efficiency-challenge/

210 sats \ 0 comments \ @localhost 22 Feb 2024 tech

More Artificial than Intelligent, it is only getting worse - Mathijs Lagerberg mlagerberg.com/much-a-little-i-and-it-is-not-getting-better/

247 sats \ 4 comments \ @Scoresby 15 Jul 2025 AI

Elites, the curse of recursion, and the half-life of policy

5779 sats \ 11 comments \ @elvismercury 29 Mar 2024 mostly_harmless

Are You Getting Dumber?

1986 sats \ 29 comments \ @kr 6 Jun 2025 AskSN

Context Rot: How Increasing Input Tokens Impacts LLM Performance research.trychroma.com/context-rot

334 sats \ 2 comments \ @Scoresby 14 Jul 2025 AI

Hallucination Stations: On Some Basic Limitations of Transformer-Based LM arxiv.org/pdf/2507.07505

213 sats \ 0 comments \ @0xbitcoiner 23 Jan AI

2025 LLM Year in Review - karpathy karpathy.bearblog.dev/year-in-review-2025/

1652 sats \ 3 comments \ @Scoresby 21 Dec 2025 AI

"History is only useful to the extent that it can predict the future": Rebuttal

2674 sats \ 8 comments \ @frostdragon 24 Mar 2024 FiresidePhilosophy

BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs arxiv.org/abs/2510.04721

210 sats \ 1 comment \ @jakoyoh629 25 Oct 2025 AI

Writing is thinking - the value of human scientific writing in the age of LLMs www.nature.com/articles/s44222-025-00323-4

555 sats \ 1 comment \ @k00b 24 Jul 2025 science

Defining and evaluating political bias in LLMs openai.com/index/defining-and-evaluating-political-bias-in-llms/

387 sats \ 2 comments \ @0xbitcoiner 14 Oct 2025 AI

Devs: LLMs are not about to take your jobs

729 sats \ 17 comments \ @halleck 17 May 2024 devs

How do you use LLMs?

901 sats \ 8 comments \ @gmd 21 Mar 2025 AI

Local LLMs are how nerds now justify a big computer they don't need world.hey.com/dhh/local-llms-are-how-nerds-now-justify-a-big-computer-they-don-t-need-af2fcb7b

948 sats \ 9 comments \ @k00b 25 Nov 2025 AI

Why do people find it so exciting when LLMs say outrageous things? substack.com/home/post/p-167898567

548 sats \ 13 comments \ @Scoresby 10 Jul 2025 AI

Andrej Karpathy: How I use LLMs www.youtube.com/watch?v=EWvNQjAaOHw

1278 sats \ 1 comment \ @k00b 28 Feb 2025 AI

Political censorship in large language models originating from China academic.oup.com/pnasnexus/article/5/2/pgag013/8487339

251 sats \ 1 comment \ @0xbitcoiner 27 Feb AI

When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection arxiv.org/abs/2510.04849v1

433 sats \ 2 comments \ @optimism 19 Oct 2025 AI

Pleb Economist #9: Comparative Advantage, AI, and You

12.5k sats \ 22 comments \ @SimpleStacker 19 Jan AI econ