
Mercury 2 is based on a diffusion architecture. Instead of producing text sequentially, token by token, as traditional LLMs do, it refines all tokens of the output in parallel over a small number of denoising steps. This removes the sequential-generation bottleneck of autoregressive models.
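The difference between the two decoding regimes can be illustrated with a toy sketch. This is not Inception's actual algorithm: the vocabulary, the random stand-in for the model, and the step count are all illustrative placeholders. The point is only that an autoregressive decoder needs one forward pass per token, while a diffusion-style decoder needs a fixed number of passes regardless of output length.

```python
import random

random.seed(0)
VOCAB = ["the", "cat", "sat", "on", "mat"]  # toy vocabulary

def autoregressive_decode(length):
    """Traditional LLM: one forward pass per token, left to right."""
    tokens, passes = [], 0
    for _ in range(length):
        passes += 1                      # one model call per token
        tokens.append(random.choice(VOCAB))
    return tokens, passes

def diffusion_decode(length, steps=3):
    """Diffusion-style: start from a fully masked sequence and refine
    every position in parallel over a fixed number of denoising steps."""
    tokens, passes = ["<mask>"] * length, 0
    for _ in range(steps):
        passes += 1                      # one model call per step, not per token
        tokens = [random.choice(VOCAB) for _ in tokens]  # all positions updated at once
    return tokens, passes

print(autoregressive_decode(16)[1])  # 16 passes for 16 tokens
print(diffusion_decode(16)[1])       # 3 passes regardless of length
```

In a real diffusion LM each denoising step is a full transformer pass that re-predicts every position conditioned on the current draft, but the pass-count asymmetry shown here is the source of the speedup.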

Inception claims that Mercury 2 is up to five times faster than comparable models, while remaining competitive in quality with other fast models such as Haiku 4.5 and GPT-5 Mini.

You can try it for free in the chat; for API access, submit a request on the website.