Tau² Benchmark: How a Prompt Rewrite Boosted GPT-5-mini by 22%
quesma.com/blog/tau2-benchmark-improving-results-smaller-models/
130 sats
0 comments
@carter
24 Sep 2025
AI
related
Claude 3 beats GPT-4 on Aider's code editing benchmark
aider.chat/2024/03/08/claude-3.html
377 sats
2 comments
@hn
31 Mar 2024
tech
You can ask GPT-5 to pretend it is dumber than it is
517 sats
0 comments
@Tony
16 Aug 2025
AI
OpenAI's GPT-5 is a cost cutting exercise
www.theregister.com/2025/08/13/gpt_5_cost_cutting
247 sats
1 comment
@Coinsreporter
13 Aug 2025
AI
The week in AI, August 4-10, 2025
2353 sats
12 comments
@optimism
11 Aug 2025
AI
Benchmarking GPT-4 Turbo – A Cautionary Tale
blog.mentat.ai/benchmarking-gpt-4-turbo-a-cautionary-tale
109 sats
1 comment
@hn
9 Nov 2023
tech
Retiring GPT-4o, GPT-4.1, GPT-4.1 mini, and OpenAI o4-mini in ChatGPT
openai.com/index/retiring-gpt-4o-and-older-models/
230 sats
1 comment
@lunin
31 Jan
AI
GPT-5 Has a Hidden System Prompt
simonwillison.net/2025/Aug/15/gpt-5-has-a-hidden-system-prompt/
170 sats
2 comments
@Tony
17 Aug 2025
AI
Linexjlin/GPTs: leaked prompts of GPTs
github.com/linexjlin/GPTs
297 sats
1 comment
@hn
28 Nov 2023
tech
MCP-Bench: Benchmarking Tool-Using LLM Agents
arxiv.org/abs/2508.20453
269 sats
0 comments
@optimism
30 Aug 2025
AI
Gemini 3 and Antigravity: Why Google's latest AI releases are a big deal
fortune.com/2025/11/19/google-gemini-3-antigravity-ai-explained/?utm_source=flipboard&utm_content=fortune/magazine/Personal+finance
161 sats
1 comment
@DrBrader99
19 Nov 2025
AI
OpenAI is rumored to be dropping GPT-5 soon: here's what we know about the next-gen model
www.tomsguide.com/ai/chatgpt/openai-is-rumored-to-be-dropping-gpt-5-soon-heres-what-we-know-about-the-next-gen-model
481 sats
0 comments
@ch0k1
22 Apr 2024
tech
GDPval: Measuring the performance of our models on real-world tasks - OpenAI
openai.com/index/gdpval/
388 sats
8 comments
@Scoresby
2 Oct 2025
AI
What are your first impressions from ChatGPT5?
1172 sats
9 comments
@carter
8 Aug 2025
AI
Introducing the Prompt Enhancer and Optimizer Plugin for OpenAgents!
1646 sats
3 comments
@BrianisNice
20 May 2024
openagents
freebie
GPT-fabricated scientific papers on Google Scholar
misinforeview.hks.harvard.edu/article/gpt-fabricated-scientific-papers-on-google-scholar-key-features-spread-and-implications-for-preempting-evidence-manipulation/
261 sats
0 comments
@hn
8 Sep 2024
tech
Has Gemini surpassed ChatGPT? We put the AI models to the test.
arstechnica.com/features/2026/01/has-gemini-surpassed-chatgpt-we-put-the-ai-models-to-the-test/
166 sats
1 comment
@0xbitcoiner
21 Jan
AI
Large Language Models Pass the Turing Test
arxiv.org/pdf/2503.23674
374 sats
11 comments
@south_korea_ln
15 Apr 2025
AI
OpenAI o1 vs GPT 4o – Is it worth paying 6x more? - Bind AI
blog.getbind.co/2024/09/13/openai-o1-vs-gpt-4o-is-it-worth-paying-6x-more/
210 sats
0 comments
@ch0k1
15 Sep 2024
tech
AI does math: Multiplication using o1-mini vs GPT-4o
260 sats
5 comments
@zuspotirko
18 Sep 2024
tech
Meituan's LongCat-Flash reasoning model has been released
longcat.chat/
258 sats
5 comments
@lunin
22 Sep 2025
AI
"Benchwashing" - how do you defend against this?
1748 sats
10 comments
@optimism
9 Aug 2025
AskSN