AI Agent Benchmarks are Broken ddkang.substack.com/p/ai-agent-benchmarks-are-broken

140 sats \ 0 comments \ @carter 11 Jul 2025 AI

related

Search-capable AI agents may cheat on benchmark tests www.theregister.com/2025/08/23/searchcapable_ai_agents_may_cheat

267 sats \ 2 comments \ @Coinsreporter 23 Aug 2025 AI

AI benchmarks hampered by bad science www.theregister.com/2025/11/07/measuring_ai_models_hampered_by/

208 sats \ 0 comments \ @0xbitcoiner 10 Nov 2025 AI

AI agents can't teach themselves new tricks – people can www.theregister.com/2026/02/19/ai_agents_cant_teach_themselves/

221 sats \ 1 comment \ @0xbitcoiner 19 Feb AI

Has anyone had success with AI agents?

338 sats \ 9 comments \ @NEEDcreations 21 Feb AI

Beyond the Sum: Unlocking AI Agents Potential Through Market Forces github.com/Fewsats/beyond_the_sum

2462 sats \ 1 comment \ @Rsync25 26 Dec 2024 lightning

Voting on chat.lmsys.org might be super influential for humankind's future

611 sats \ 5 comments \ @zuspotirko 22 Jun 2024 tech

AI Agents Unleashed: Navigating a World of Intelligent Autonomy in 2025

1185 sats \ 0 comments \ @satcat 7 Mar 2025 AI

The week in AI, August 18-24, 2025

2309 sats \ 2 comments \ @optimism 25 Aug 2025 AI

The ORCA Benchmark Evaluates How Well AIs Deal with Everyday Math www.omnicalculator.com/reports/omni-research-on-calculation-in-ai-benchmark

260 sats \ 0 comments \ @0xbitcoiner 27 Feb AI

Building Effective AI Agents www.anthropic.com/engineering/building-effective-agents

184 sats \ 0 comments \ @carter 19 Jun 2025 tech

AI Agents vs Cybersecurity Professionals in Real-World Penetration Testing arxiv.org/abs/2512.09882

194 sats \ 2 comments \ @optimism 13 Dec 2025 AI

MCP-Bench: Benchmarking Tool-Using LLM Agents arxiv.org/abs/2508.20453

269 sats \ 0 comments \ @optimism 30 Aug 2025 AI

The Age of the All-Access AI Agent Is Here www.wired.com/story/expired-tired-wired-all-access-ai-agents/

551 sats \ 2 comments \ @0xbitcoiner 24 Dec 2025 AI

Stop Building AI Agents decodingml.substack.com/p/stop-building-ai-agents

152 sats \ 1 comment \ @carter 11 Jul 2025 devs

Parallel AI Agents Are a Game Changer morningcoffee.io/parallel-ai-agents-are-a-game-changer.html

187 sats \ 0 comments \ @carter 4 Sep 2025 AI

BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs arxiv.org/abs/2510.04721

210 sats \ 1 comment \ @jakoyoh629 25 Oct 2025 AI

I Don’t Know What An AI Agent Is

389 sats \ 4 comments \ @Jon_Hodl 1 Feb AI Memes

Half the AI Agent Market Is One Category. The Rest Is Wide Open. garryslist.org/posts/half-the-ai-agent-market-is-one-category-the-rest-is-wide-open

378 sats \ 0 comments \ @co574 24 Feb charts_and_maps

Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents arxiv.org/abs/2505.22954

80 sats \ 0 comments \ @0xbitcoiner 11 Jun 2025 tech

Signal Execs warn agentic AI is insecure, unreliable, & a surveillance nightmare coywolf.com/news/productivity/signal-president-and-vp-warn-agentic-ai-is-insecure-unreliable-and-a-surveillance-nightmare/

208 sats \ 1 comment \ @co574 15 Jan AI

Meredith Whittaker calls out agentic AI for 'profound' security/privacy issues techcrunch.com/2025/03/07/signal-president-meredith-whittaker-calls-out-agentic-ai-as-having-profound-security-and-privacy-issues/

864 sats \ 6 comments \ @k00b 9 Mar 2025 privacy