BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs arxiv.org/abs/2510.04721

210 sats \ 1 comment \ @jakoyoh629 25 Oct 2025 AI

related

Hallucination Stations: On Some Basic Limitations of Transformer-Based LM arxiv.org/pdf/2507.07505

213 sats \ 0 comments \ @0xbitcoiner 23 Jan AI

To Make Language Models Work Better, Researchers Sidestep Language www.quantamagazine.org/to-make-language-models-work-better-researchers-sidestep-language-20250414/

210 sats \ 0 comments \ @0xbitcoiner 15 Apr 2025 AI

Large Language Models Pass the Turing Test arxiv.org/pdf/2503.23674

374 sats \ 11 comments \ @south_korea_ln 15 Apr 2025 AI

The AI Revolution in Math Has Arrived www.quantamagazine.org/the-ai-revolution-in-math-has-arrived-20260413/

355 sats \ 1 comment \ @0xbitcoiner 13 Apr math AI

Mathematicians issue a major challenge to AI—show us your work www.scientificamerican.com/article/mathematicians-launch-first-proof-a-first-of-its-kind-math-exam-for-ai/

1145 sats \ 4 comments \ @south_korea_ln 14 Feb AI science

Why language models hallucinate - OpenAI openai.com/index/why-language-models-hallucinate/

438 sats \ 4 comments \ @Scoresby 6 Sep 2025 AI

The ORCA Benchmark Evaluates How Well AIs Deal with Everyday Math www.omnicalculator.com/reports/omni-research-on-calculation-in-ai-benchmark

260 sats \ 0 comments \ @0xbitcoiner 27 Feb AI

Meet the new biologists treating LLMs like aliens www.technologyreview.com/2026/01/12/1129782/ai-large-language-models-biology-alien-autopsy/

580 sats \ 1 comment \ @winteryeti 14 Jan AI

Is Chain-of-Thought Reasoning of LLMs a Mirage? arxiv.org/abs/2508.01191

427 sats \ 9 comments \ @optimism 7 Aug 2025 AI

Debate May Help AI Models Converge on Truth www.quantamagazine.org/debate-may-help-ai-models-converge-on-truth-20241108/

258 sats \ 0 comments \ @0xbitcoiner 8 Nov 2024 science

LLMs Can Get Brain Rot llm-brain-rot.github.io/

287 sats \ 0 comments \ @Scoresby 21 Oct 2025 AI

To Have Machines Make Math Proofs, Turn Them Into a Puzzle www.quantamagazine.org/to-have-machines-make-math-proofs-turn-them-into-a-puzzle-20251110/

268 sats \ 0 comments \ @0xbitcoiner 11 Nov 2025 AI

In a First, AI Models Analyze Language As Well As a Human Expert www.quantamagazine.org/in-a-first-ai-models-analyze-language-as-well-as-a-human-expert-20251031/

274 sats \ 0 comments \ @0xbitcoiner 31 Oct 2025 AI

Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians arxiv.org/abs/2602.19141

978 sats \ 9 comments \ @k00b 31 Mar AI science HealthAndFitness

Vibe physics www.math.columbia.edu/~woit/wordpress/?p=15012

2355 sats \ 4 comments \ @south_korea_ln 1 Aug 2025 science

LLMs and the Specter of the Cognitive Black Hole www.psychologytoday.com/us/blog/the-digital-self/202403/llms-and-the-specter-of-the-cognitive-black-hole

200 sats \ 0 comments \ @ch0k1 22 Mar 2024 science

Financial Statement Analysis with Large Language Models papers.ssrn.com/sol3/papers.cfm?abstract_id=4835311&fbclid=IwY2xjawIJNupleHRuA2FlbQIxMAABHWJxn71ESvZCS0FxEF_31oro1rwtk4rlgOst5Q4A6tuxDhxB9cgZBPizAg_aem_OAMNHiz7Vyv2bb2vt2yM0Q

222 sats \ 2 comments \ @scatman 31 Jan 2025 AI

How to turn LLM Pinocchio into a real boy

12.7k sats \ 10 comments \ @Scoresby 7 Oct 2025 AI

AI is actually bad at math, ORCA shows www.theregister.com/2025/11/17/ai_bad_math_orca/

197 sats \ 4 comments \ @0xbitcoiner 18 Nov 2025 AI

When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection arxiv.org/abs/2510.04849v1

433 sats \ 2 comments \ @optimism 19 Oct 2025 AI

How large are large language models? gist.github.com/rain-1/cf0419958250d15893d8873682492c3e

231 sats \ 0 comments \ @carter 14 Jul 2025 AI