A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
arxiv.org/abs/2508.18106
32 sats
\
0 comments
\
@optimism
1 Sep
AI
related
MCP-Bench: Benchmarking Tool-Using LLM Agents
arxiv.org/abs/2508.20453
239 sats
\
0 comments
\
@optimism
30 Aug
AI
Claude 3.5 Sonnet
www.anthropic.com/news/claude-3-5-sonnet
411 sats
\
0 comments
\
@k00b
21 Jun 2024
tech
LLM Rankings: programming | OpenRouter
openrouter.ai/rankings/programming
96 sats
\
0 comments
\
@m0wer
28 May
tech
Qwen3-235B-A22B-2507
xcancel.com/Alibaba_Qwen/status/1947344511988076547
218 sats
\
0 comments
\
@m0wer
24 Jul
AI
"Benchwashing" - how do you defend against this?
1648 sats
\
10 comments
\
@optimism
9 Aug
AskSN
Alibaba has released its flagship Qwen3-Max model with a trillion parameters
chat.qwen.ai/
167 sats
\
0 comments
\
@lunin
25 Sep
AI
GDPval: Measuring the performance of our models on real-world tasks - OpenAI
openai.com/index/gdpval/
358 sats
\
8 comments
\
@Scoresby
2 Oct
AI
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
arxiv.org/abs/2509.03867
306 sats
\
0 comments
\
@optimism
7 Sep
AI
Boring is good
jenson.org/boring
231 sats
\
0 comments
\
@deSign_r
9 Oct
Design
Wairdle
1133 sats
\
4 comments
\
@crrdlx
9 Aug
AI
Opti's Claude 4.5 Sonnet "vibe coding" report
1125 sats
\
13 comments
\
@optimism
5 Oct
AI
My lived experience writing with ChatGPT
567 sats
\
10 comments
\
@realBitcoinDog
15 Apr
BooksAndArticles
The flagship model, Qwen3-Max-Preview, has been released
100 sats
\
0 comments
\
@lunin
5 Sep
AI
pylint MCP provider
1428 sats
\
6 comments
\
@optimism
4 Jun
builders
Vals AI — Finance Agent Benchmark
www.vals.ai/benchmarks/finance_agent-04-22-2025?utm_campaign=wp_the_technology_202&utm_medium=email&utm_source=newsletter
54 sats
\
3 comments
\
@BlokchainB
24 Apr
AI
OpenAI o1 vs GPT 4o – Is it worth paying 6x more? - Bind AI
blog.getbind.co/2024/09/13/openai-o1-vs-gpt-4o-is-it-worth-paying-6x-more/
110 sats
\
0 comments
\
@ch0k1
15 Sep 2024
tech
Researchers discover impressive learning capabilities in long-context LLMs
venturebeat.com/ai/deepmind-researchers-discover-impressive-learning-capabilities-in-long-context-llms/
297 sats
\
0 comments
\
@ch0k1
25 Apr 2024
tech
Grok 4 Fast released!
grok.com
157 sats
\
3 comments
\
@lunin
20 Sep
AI
The week in AI, August 11-17, 2025
1637 sats
\
4 comments
\
@optimism
21 Aug
AI
LLM Alignment: Reward-Based vs Reward-Free Methods
towardsdatascience.com/llm-alignment-reward-based-vs-reward-free-methods-ef0c0f6e8d88?gi=90f7a78bfcff
17 sats
\
0 comments
\
@ch0k1
6 Jul 2024
news
OpenAI Secretly Funded Benchmarking Dataset Linked To o3 Model
www.searchenginejournal.com/openai-secretly-funded-frontiermath-benchmarking-dataset/537760/
341 sats
\
0 comments
\
@frostdragon
21 Jan
tech