items/1235139/related \ stacker news

Tau² Benchmark: How a Prompt Rewrite Boosted GPT-5-mini by 22% quesma.com/blog/tau2-benchmark-improving-results-smaller-models/

130 sats \ 0 comments \ @carter 24 Sep 2025 AI

related

Claude 3 beats GPT-4 on Aider's code editing benchmark aider.chat/2024/03/08/claude-3.html

377 sats \ 2 comments \ @hn 31 Mar 2024 tech

You can ask GPT-5 to pretend it is dumber than it is

517 sats \ 0 comments \ @Tony 16 Aug 2025 AI

OpenAI's GPT-5 is a cost cutting exercise www.theregister.com/2025/08/13/gpt_5_cost_cutting

247 sats \ 1 comment \ @Coinsreporter 13 Aug 2025 AI

The week in AI, August 4-10, 2025

2353 sats \ 12 comments \ @optimism 11 Aug 2025 AI

Benchmarking GPT-4 Turbo – A Cautionary Tale blog.mentat.ai/benchmarking-gpt-4-turbo-a-cautionary-tale

109 sats \ 1 comment \ @hn 9 Nov 2023 tech

Retiring GPT-4o, GPT-4.1, GPT-4.1 mini, and OpenAI o4-mini in ChatGPT openai.com/index/retiring-gpt-4o-and-older-models/

230 sats \ 1 comment \ @lunin 31 Jan AI

GPT-5 Has a Hidden System Prompt simonwillison.net/2025/Aug/15/gpt-5-has-a-hidden-system-prompt/

170 sats \ 2 comments \ @Tony 17 Aug 2025 AI

Linexjlin/GPTs: leaked prompts of GPTs github.com/linexjlin/GPTs

297 sats \ 1 comment \ @hn 28 Nov 2023 tech

MCP-Bench: Benchmarking Tool-Using LLM Agents arxiv.org/abs/2508.20453

269 sats \ 0 comments \ @optimism 30 Aug 2025 AI

Gemini 3 and Antigravity: Why Google's latest AI releases are a big deal fortune.com/2025/11/19/google-gemini-3-antigravity-ai-explained/?utm_source=flipboard&utm_content=fortune/magazine/Personal+finance

161 sats \ 1 comment \ @DrBrader99 19 Nov 2025 AI

OpenAI is rumored to be dropping GPT-5 soon: here's what we know about the next-gen model www.tomsguide.com/ai/chatgpt/openai-is-rumored-to-be-dropping-gpt-5-soon-heres-what-we-know-about-the-next-gen-model

481 sats \ 0 comments \ @ch0k1 22 Apr 2024 tech

GDPval: Measuring the performance of our models on real-world tasks - OpenAI openai.com/index/gdpval/

388 sats \ 8 comments \ @Scoresby 2 Oct 2025 AI

What are your first impressions from ChatGPT5?

1172 sats \ 9 comments \ @carter 8 Aug 2025 AI

Introducing the Prompt Enhancer and Optimizer Plugin for OpenAgents!

1646 sats \ 3 comments \ @BrianisNice 20 May 2024 openagents freebie

GPT-fabricated scientific papers on Google Scholar misinforeview.hks.harvard.edu/article/gpt-fabricated-scientific-papers-on-google-scholar-key-features-spread-and-implications-for-preempting-evidence-manipulation/

261 sats \ 0 comments \ @hn 8 Sep 2024 tech

Has Gemini surpassed ChatGPT? We put the AI models to the test. arstechnica.com/features/2026/01/has-gemini-surpassed-chatgpt-we-put-the-ai-models-to-the-test/

166 sats \ 1 comment \ @0xbitcoiner 21 Jan AI

Large Language Models Pass the Turing Test arxiv.org/pdf/2503.23674

374 sats \ 11 comments \ @south_korea_ln 15 Apr 2025 AI

OpenAI o1 vs GPT 4o – Is it worth paying 6x more? - Bind AI blog.getbind.co/2024/09/13/openai-o1-vs-gpt-4o-is-it-worth-paying-6x-more/

210 sats \ 0 comments \ @ch0k1 15 Sep 2024 tech

AI does math: Multiplication using o1-mini vs GPT-4o

260 sats \ 5 comments \ @zuspotirko 18 Sep 2024 tech

Meituan's LongCat-Flash reasoning model has been released longcat.chat/

258 sats \ 5 comments \ @lunin 22 Sep 2025 AI

"Benchwashing" - how do you defend against this?

1748 sats \ 10 comments \ @optimism 9 Aug 2025 AskSN