
I'm figuring out the best way to lower the token cost of my LLM-based news bot. In particular, I'm working on splitting the workload between different models: giving simpler tasks to lighter models and hard ones, like translations, to GPT-5.
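For the curious, here's roughly what that split looks like. This is a minimal sketch, assuming a Python bot on the official openai client; the model names, the task labels, and the run_task helper are illustrative, not my actual bot code.

```python
# Minimal routing sketch: cheap model for simple chores, GPT-5 for hard ones.
# Model names and the task-to-model mapping here are assumptions, not gospel.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODEL_FOR_TASK = {
    "summarize": "gpt-5-mini",  # assumed lighter-model name
    "classify": "gpt-5-mini",
    "translate": "gpt-5",       # the hard task stays on the big model
}

def run_task(task: str, text: str) -> str:
    """Send `text` to whichever model is assigned to `task`."""
    model = MODEL_FOR_TASK.get(task, "gpt-5-mini")
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": f"You are a news bot. Task: {task}."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content
```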
When GPT-5 dropped I rushed to switch to it, only to find out that the new model is incredibly token-hungry. I covered this in more detail here: #1075448.
I often run my ideas by GPT-5 in Microsoft's Copilot chatbot app, since I use the OpenAI API to power my news bot. Copilot lets you explicitly use GPT-5 and not have it fall back to lighter models whenever it "wants". And I noticed something pretty cool: you can use natural language to ask GPT-5 to pretend it is a lighter model and test how your prompts or other approaches would work if you fed them to those non-reasoning models.
In this example GPT-5 processed the text as if it were GPT-Mini and shared the results with me.
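If you'd rather run the same trick through the API instead of the Copilot chat window, it boils down to one system prompt. A rough sketch, again assuming the openai Python client; the wording of emulation_prompt is my own paraphrase of the kind of instruction I typed, not an exact quote.

```python
# Ask GPT-5 to act like a lighter, non-reasoning model so you can preview
# how a prompt would fare on the cheap tier. Prompt wording is illustrative.
from openai import OpenAI

client = OpenAI()

emulation_prompt = (
    "Pretend you are a much lighter, non-reasoning model. Process the user's "
    "text exactly as that model would: no extended reasoning, keep the output "
    "short and literal."
)

article = "..."  # the news text you want to test cheap-model behaviour on

response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": emulation_prompt},
        {"role": "user", "content": article},
    ],
)
print(response.choices[0].message.content)
```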
Will try this new split approach tomorrow and share my further findings here on SN.