Perhaps the best evidence of cost-cutting is the fact that GPT-5 isn't actually one model. It's a collection of at least two models: a lightweight LLM that quickly handles most requests and a heavier-duty one designed to tackle more complex topics. Which model a prompt lands in is determined by a router model, which acts a bit like an intelligent load balancer for the platform as a whole. Image prompts go to a completely different model, Image Gen 4o.
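To make the load-balancer analogy concrete, here's a minimal sketch of what prompt routing could look like. Everything in it is hypothetical: the model names, the threshold, and the keyword heuristic are stand-ins, since OpenAI hasn't published how its actual router scores prompts (in practice it would be a learned classifier, not a keyword list).

```python
# Hypothetical prompt router: a crude stand-in for OpenAI's (unpublished) router model.
# Model names and the complexity heuristic are illustrative only.

def estimate_complexity(prompt: str) -> float:
    """Score a prompt from 0 to 1: longer prompts and reasoning-style
    keywords push the score toward the heavy model."""
    keywords = ("prove", "derive", "step by step", "debug", "analyze")
    score = min(len(prompt) / 500, 1.0)  # length signal
    score += sum(0.3 for k in keywords if k in prompt.lower())
    return min(score, 1.0)

def route(prompt: str, threshold: float = 0.5) -> str:
    """Pick which backend model handles the prompt."""
    return "heavy-model" if estimate_complexity(prompt) >= threshold else "light-model"

print(route("What's the capital of France?"))                          # light-model
print(route("Derive the gradient step by step and debug my proof."))   # heavy-model
```

The economic point is visible even in this toy version: if most traffic scores below the threshold, most tokens are served by the cheap model, and the expensive one only runs when the router thinks it's needed.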
It might cut costs for OpenAI, but if you use it via the API, there's no option to use any kind of hybrid model; you've got to do the routing manually in your own backend. There's not even such a thing as a "non-thinking" GPT-5.