AI Agent Benchmarks are Broken
ddkang.substack.com/p/ai-agent-benchmarks-are-broken
110 sats
\
0 comments
\
@carter
11 Jul
AI
related
AI Worse Than Humans In Every Way At Summarising Information, Gov Trial Finds
64 sats
\
0 comments
\
@0xbitcoiner
5 Sep 2024
tech
AI isn’t ready to replace human coders for debugging?
arstechnica.com/ai/2025/04/researchers-find-ai-is-pretty-bad-at-debugging-but-theyre-working-on-it/
30 sats
\
0 comments
\
@Coinsreporter
12 Apr
AI
AI: Overhyped, underhyped, or just misunderstood?
121 sats
\
8 comments
\
@claos545
9 Aug
AI
Episode 186: Actions Per Minute
71 sats
\
1 comment
\
@AtlantisPleb
25 Jul
openagents
Search-capable AI agents may cheat on benchmark tests
www.theregister.com/2025/08/23/searchcapable_ai_agents_may_cheat
237 sats
\
2 comments
\
@Coinsreporter
23 Aug
AI
Testing AI systems on hard math problems shows they still perform very poorly
phys.org/news/2024-11-ai-hard-math-problems-poorly.html
153 sats
\
4 comments
\
@south_korea_ln
13 Nov 2024
science
Is AI hitting a wall?
www.ft.com/content/d01290c9-cc92-4c1f-bd70-ac332cd40f94
343 sats
\
2 comments
\
@Coinsreporter
16 Aug
AI
Stop Building AI Agents
decodingml.substack.com/p/stop-building-ai-agents
102 sats
\
1 comment
\
@carter
11 Jul
devs
Episode 120: Exploring SWE-bench Verified
56 sats
\
0 comments
\
@AtlantisPleb
13 Aug 2024
openagents
Recent AI model progress feels mostly like bullshit — LessWrong
www.lesswrong.com/posts/4mvphwx5pdsZLMmpY/recent-ai-model-progress-feels-mostly-like-bullshit
190 sats
\
0 comments
\
@co574
6 Apr
AI
Vals AI — Finance Agent Benchmark
www.vals.ai/benchmarks/finance_agent-04-22-2025?utm_campaign=wp_the_technology_202&utm_medium=email&utm_source=newsletter
54 sats
\
3 comments
\
@BlokchainB
24 Apr
AI
MCP-Bench: Benchmarking Tool-Using LLM Agents
arxiv.org/abs/2508.20453
239 sats
\
0 comments
\
@optimism
30 Aug
AI
Developers are 19% SLOWER when using AI
563 sats
\
4 comments
\
@zuspotirko
10 Jul
charts_and_numbers
AI model collapse is not what we paid for
www.theregister.com/2025/05/27/opinion_column_ai_model_collapse/
151 sats
\
0 comments
\
@k00b
28 May
tech
Learnings from building AI agents
www.cubic.dev/blog/learnings-from-building-ai-agents
47 sats
\
0 comments
\
@hn
30 Jun
tech
When AI Promises Speed but Delivers Debugging Hell
nsavage.substack.com/p/when-ai-promises-speed-but-delivers
24 sats
\
0 comments
\
@hn
26 Jan
tech
Parallel AI Agents Are a Game Changer
morningcoffee.io/parallel-ai-agents-are-a-game-changer.html
157 sats
\
0 comments
\
@carter
4 Sep
AI
Building Effective AI Agents
www.anthropic.com/engineering/building-effective-agents
154 sats
\
0 comments
\
@carter
19 Jun
tech
AI is Speeding Up AGAIN! HUGE Open Source AI Advancements!
invidious.fdn.fr/watch?v=xx0e7MW-rKA
20 sats
\
0 comments
\
@ama
27 Nov 2023
tech
Where's the Shovelware? Why AI Coding Claims Don't Add Up - Mike Judge
substack.com/home/post/p-172538377
809 sats
\
8 comments
\
@Scoresby
5 Sep
AI
Beyond the Sum: Unlocking AI Agents Potential Through Market Forces
github.com/Fewsats/beyond_the_sum
2406 sats
\
1 comment
\
@Rsync25
26 Dec 2024
lightning