A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
arxiv.org/abs/2508.18106
32 sats
\
0 comments
\
@optimism
1 Sep
AI
related
MCP-Bench: Benchmarking Tool-Using LLM Agents
arxiv.org/abs/2508.20453
239 sats
\
0 comments
\
@optimism
30 Aug
AI
Are LLMs Racist?
461 sats
\
11 comments
\
@Tony
23 Oct
AI
LLM Rankings: programming | OpenRouter
openrouter.ai/rankings/programming
96 sats
\
0 comments
\
@m0wer
28 May
tech
Claude 3.5 Sonnet
www.anthropic.com/news/claude-3-5-sonnet
411 sats
\
0 comments
\
@k00b
21 Jun 2024
tech
Gemini 3 and Antigravity: Why Google's latest AI releases are a big deal
fortune.com/2025/11/19/google-gemini-3-antigravity-ai-explained/?utm_source=flipboard&utm_content=fortune/magazine/Personal+finance
131 sats
\
1 comment
\
@DrBrader99
19 Nov
AI
AI is actually bad at math, ORCA shows
www.theregister.com/2025/11/17/ai_bad_math_orca/
167 sats
\
4 comments
\
@0xbitcoiner
18 Nov
AI
Qwen3-235B-A22B-2507
xcancel.com/Alibaba_Qwen/status/1947344511988076547
218 sats
\
0 comments
\
@m0wer
24 Jul
AI
"Benchwashing" - how do you defend against this?
1648 sats
\
10 comments
\
@optimism
9 Aug
AskSN
Alibaba has released its flagship Qwen3-Max model with a trillion parameters
chat.qwen.ai/
167 sats
\
0 comments
\
@lunin
25 Sep
AI
Introducing Claude Opus 4.5
www.anthropic.com/news/claude-opus-4-5
418 sats
\
0 comments
\
@lunin
25 Nov
AI
GDPval: Measuring the performance of our models on real-world tasks - OpenAI
openai.com/index/gdpval/
358 sats
\
8 comments
\
@Scoresby
2 Oct
AI
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
arxiv.org/abs/2509.03867
306 sats
\
0 comments
\
@optimism
7 Sep
AI
Boring is good
jenson.org/boring
231 sats
\
0 comments
\
@deSign_r
9 Oct
Design
Wairdle
1133 sats
\
4 comments
\
@crrdlx
9 Aug
AI
Opti's Claude 4.5 Sonnet "vibe coding" report
1125 sats
\
13 comments
\
@optimism
5 Oct
AI
My lived experience writing with ChatGPT
567 sats
\
10 comments
\
@realBitcoinDog
15 Apr
BooksAndArticles
The flagship model, Qwen3-Max-Preview, has been released
100 sats
\
0 comments
\
@lunin
5 Sep
AI
pylint MCP provider
1428 sats
\
6 comments
\
@optimism
4 Jun
builders
Vals AI — Finance Agent Benchmark
www.vals.ai/benchmarks/finance_agent-04-22-2025?utm_campaign=wp_the_technology_202&utm_medium=email&utm_source=newsletter
54 sats
\
3 comments
\
@BlokchainB
24 Apr
AI
OpenAI o1 vs GPT 4o – Is it worth paying 6x more? - Bind AI
blog.getbind.co/2024/09/13/openai-o1-vs-gpt-4o-is-it-worth-paying-6x-more/
110 sats
\
0 comments
\
@ch0k1
15 Sep 2024
tech
Grok 4 Fast released!
grok.com
157 sats
\
3 comments
\
@lunin
20 Sep
AI