We've seen LLMs call tools for a long time now, and we've even seen protocols like MCP appear to make it more standard.
It's simple to understand how it works at the surface level, but for me the big question was: how does a large LANGUAGE model call a tool? It was obvious that it had to SAY it and somehow ask the client to do it on its behalf, but model outputs are very dynamic; how do we force them to always respond in a known format? Based on this video, I finally understood that models are fine-tuned to learn how to call a tool correctly in a standard form.
You may have already known this, but it was always a question for me, and after a long time I finally had enough time to dig into it.
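To make "responding in a known format" concrete, here is a minimal sketch, assuming Qwen3-style `<tool_call>` delimiters (the exact tags vary by model family, and `get_weather` is a hypothetical tool made up for illustration): the fine-tuned model emits the call as JSON inside fixed delimiters, and the client simply scans for those delimiters, runs the matching function, and feeds the result back to the model.

```python
import json
import re

# What a tool-calling-tuned model is trained to emit when it decides to call a tool.
# (Qwen3-style delimiters are assumed here; other model families use different tags.)
model_output = """<tool_call>
{"name": "get_weather", "arguments": {"city": "Tehran"}}
</tool_call>"""

# The client never lets the model execute anything itself; it only parses the
# known format, dispatches to a real function, and sends the result back.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def get_weather(city: str) -> str:
    # Hypothetical local tool implementation, just for this sketch.
    return f"Sunny in {city}, 31°C"

tools = {"get_weather": get_weather}

for match in TOOL_CALL_RE.finditer(model_output):
    call = json.loads(match.group(1))                   # parse the structured call
    result = tools[call["name"]](**call["arguments"])   # run the real function locally
    print(result)  # this string would go back to the model as a tool-result message
```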
The answer is the chat_template (the composed text that gets presented to the tokenizer after you press Enter). For Qwen3's GGUF, the tool injection looks like this: