AI Agent Benchmarks are Broken
ddkang.substack.com/p/ai-agent-benchmarks-are-broken
140 sats
\
0 comments
\
@carter
11 Jul 2025
AI
related
Search-capable AI agents may cheat on benchmark tests
www.theregister.com/2025/08/23/searchcapable_ai_agents_may_cheat
267 sats
\
2 comments
\
@Coinsreporter
23 Aug 2025
AI
AI benchmarks hampered by bad science
www.theregister.com/2025/11/07/measuring_ai_models_hampered_by/
208 sats
\
0 comments
\
@0xbitcoiner
10 Nov 2025
AI
AI agents can't teach themselves new tricks – people can
www.theregister.com/2026/02/19/ai_agents_cant_teach_themselves/
221 sats
\
1 comment
\
@0xbitcoiner
19 Feb
AI
Has anyone had success with AI agents?
338 sats
\
9 comments
\
@NEEDcreations
21 Feb
AI
Beyond the Sum: Unlocking AI Agents Potential Through Market Forces
github.com/Fewsats/beyond_the_sum
2462 sats
\
1 comment
\
@Rsync25
26 Dec 2024
lightning
Voting on chat.lmsys.org might be super influential for humankind's future
611 sats
\
5 comments
\
@zuspotirko
22 Jun 2024
tech
AI Agents Unleashed: Navigating a World of Intelligent Autonomy in 2025
1185 sats
\
0 comments
\
@satcat
7 Mar 2025
AI
The week in AI, August 18-24, 2025
2309 sats
\
2 comments
\
@optimism
25 Aug 2025
AI
The ORCA Benchmark Evaluates How Well AIs Deal with Everyday Math
www.omnicalculator.com/reports/omni-research-on-calculation-in-ai-benchmark
260 sats
\
0 comments
\
@0xbitcoiner
27 Feb
AI
Building Effective AI Agents
www.anthropic.com/engineering/building-effective-agents
184 sats
\
0 comments
\
@carter
19 Jun 2025
tech
AI Agents vs Cybersecurity Professionals in Real-World Penetration Testing
arxiv.org/abs/2512.09882
194 sats
\
2 comments
\
@optimism
13 Dec 2025
AI
MCP-Bench: Benchmarking Tool-Using LLM Agents
arxiv.org/abs/2508.20453
269 sats
\
0 comments
\
@optimism
30 Aug 2025
AI
The Age of the All-Access AI Agent Is Here
www.wired.com/story/expired-tired-wired-all-access-ai-agents/
551 sats
\
2 comments
\
@0xbitcoiner
24 Dec 2025
AI
Stop Building AI Agents
decodingml.substack.com/p/stop-building-ai-agents
152 sats
\
1 comment
\
@carter
11 Jul 2025
devs
Parallel AI Agents Are a Game Changer
morningcoffee.io/parallel-ai-agents-are-a-game-changer.html
187 sats
\
0 comments
\
@carter
4 Sep 2025
AI
BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
arxiv.org/abs/2510.04721
210 sats
\
1 comment
\
@jakoyoh629
25 Oct 2025
AI
I Don’t Know What An AI Agent Is
389 sats
\
4 comments
\
@Jon_Hodl
1 Feb
AI
Memes
Half the AI Agent Market Is One Category. The Rest Is Wide Open.
garryslist.org/posts/half-the-ai-agent-market-is-one-category-the-rest-is-wide-open
378 sats
\
0 comments
\
@co574
24 Feb
charts_and_maps
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents
arxiv.org/abs/2505.22954
80 sats
\
0 comments
\
@0xbitcoiner
11 Jun 2025
tech
Signal Execs warn agentic AI is insecure, unreliable, & a surveillance nightmare
coywolf.com/news/productivity/signal-president-and-vp-warn-agentic-ai-is-insecure-unreliable-and-a-surveillance-nightmare/
208 sats
\
1 comment
\
@co574
15 Jan
AI
Meredith Whittaker calls out agentic AI for 'profound' security/privacy issues
techcrunch.com/2025/03/07/signal-president-meredith-whittaker-calls-out-agentic-ai-as-having-profound-security-and-privacy-issues/
864 sats
\
6 comments
\
@k00b
9 Mar 2025
privacy