items/1202902/related \ stacker news

A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code arxiv.org/abs/2508.18106

32 sats \ 0 comments \ @optimism 1 Sep 2025 AI

related

MCP-Bench: Benchmarking Tool-Using LLM Agents arxiv.org/abs/2508.20453

239 sats \ 0 comments \ @optimism 30 Aug 2025 AI

Are LLMs Racist?

461 sats \ 11 comments \ @Tony 23 Oct 2025 AI

Qwen3-235B-A22B-2507 xcancel.com/Alibaba_Qwen/status/1947344511988076547

218 sats \ 0 comments \ @m0wer 24 Jul 2025 AI

LLM Rankings: programming | OpenRouter openrouter.ai/rankings/programming

96 sats \ 0 comments \ @m0wer 28 May 2025 tech

Claude 3.5 Sonnet www.anthropic.com/news/claude-3-5-sonnet

411 sats \ 0 comments \ @k00b 21 Jun 2024 tech

AI is actually bad at math, ORCA shows www.theregister.com/2025/11/17/ai_bad_math_orca/

167 sats \ 4 comments \ @0xbitcoiner 18 Nov 2025 AI

AI agents find $4.6M in blockchain smart contract exploits red.anthropic.com/2025/smart-contracts/

259 sats \ 2 comments \ @0xbitcoiner 2 Dec 2025 AI

"Benchwashing" - how do you defend against this?

1648 sats \ 10 comments \ @optimism 9 Aug 2025 AskSN

Gemini 3 and Antigravity: Why Google's latest AI releases are a big deal fortune.com/2025/11/19/google-gemini-3-antigravity-ai-explained/?utm_source=flipboard&utm_content=fortune/magazine/Personal+finance

131 sats \ 1 comment \ @DrBrader99 19 Nov 2025 AI

Alibaba has released its flagship Qwen3-Max model with a trillion parameters chat.qwen.ai/

167 sats \ 0 comments \ @lunin 25 Sep 2025 AI

GDPval: Measuring the performance of our models on real-world tasks - OpenAI openai.com/index/gdpval/

358 sats \ 8 comments \ @Scoresby 2 Oct 2025 AI

Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth arxiv.org/abs/2509.03867

306 sats \ 0 comments \ @optimism 7 Sep 2025 AI

Boring is good jenson.org/boring

331 sats \ 1 comment \ @deSign_r 9 Oct 2025 Design

1133 sats \ 4 comments \ @crrdlx 9 Aug 2025 AI

Opti's Claude 4.5 Sonnet "vibe coding" report

1125 sats \ 13 comments \ @optimism 5 Oct 2025 AI

My lived experience writing with ChatGPT

567 sats \ 10 comments \ @realBitcoinDog 15 Apr 2025 BooksAndArticles

Vals AI — Finance Agent Benchmark www.vals.ai/benchmarks/finance_agent-04-22-2025?utm_campaign=wp_the_technology_202&utm_medium=email&utm_source=newsletter

54 sats \ 3 comments \ @BlokchainB 24 Apr 2025 AI

Adversarial Confusion Attacks: Disrupting Multimodal LLMs - Jakub Hoscilowicz www.researchgate.net/publication/396235412_Adversarial_Confusion_Attacks_Disrupting_Multimodal_LLMs

146 sats \ 0 comments \ @Scoresby 6 Oct 2025 AI

pylint MCP provider

1428 sats \ 6 comments \ @optimism 4 Jun 2025 builders

The flagship model, Qwen3-Max-Preview, has been released

100 sats \ 0 comments \ @lunin 5 Sep 2025 AI

OpenAI o1 vs GPT 4o – Is it worth paying 6x more? - Bind AI blog.getbind.co/2024/09/13/openai-o1-vs-gpt-4o-is-it-worth-paying-6x-more/

110 sats \ 0 comments \ @ch0k1 15 Sep 2024 tech