Perhaps the best evidence of cost-cutting is the fact that GPT-5 isn't actually one model. It's a collection of at least two models: a lightweight LLM that quickly handles most requests and a heavier-duty one designed to tackle more complex topics. Which model a prompt lands in is determined by a router model, which acts a bit like an intelligent load balancer for the platform as a whole. Image prompts go to a completely different model, Image Gen 4o.
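To make the load-balancer analogy concrete, here's a minimal sketch of what prompt routing could look like. Everything in it is hypothetical: the model names, the threshold, and the keyword heuristic are stand-ins, since OpenAI hasn't published how its actual router scores prompts (in practice it would be a learned classifier, not a keyword list).

```python
# Hypothetical prompt router: a crude stand-in for OpenAI's (unpublished) router model.
# Model names and the complexity heuristic are illustrative only.

def estimate_complexity(prompt: str) -> float:
    """Score a prompt from 0 to 1: longer prompts and reasoning-style
    keywords push the score toward the heavy model."""
    keywords = ("prove", "derive", "step by step", "debug", "analyze")
    score = min(len(prompt) / 500, 1.0)  # length signal
    score += sum(0.3 for k in keywords if k in prompt.lower())
    return min(score, 1.0)

def route(prompt: str, threshold: float = 0.5) -> str:
    """Pick which backend model handles the prompt."""
    return "heavy-model" if estimate_complexity(prompt) >= threshold else "light-model"

print(route("What's the capital of France?"))                          # light-model
print(route("Derive the gradient step by step and debug my proof."))   # heavy-model
```

The economic point is visible even in this toy version: if most traffic scores below the threshold, most tokens are served by the cheap model, and the expensive one only runs when the router thinks it's needed.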
It might cut costs for OpenAI, but if you use it via the API, there's no option to use any kind of hybrid model; you've got to do the routing manually in your own backend. There's not even such a thing as a "non-thinking" GPT-5.