
To be honest, I'm not entirely sure if this is relevant to the discussion about a "living prompt" and updating models in #1250420, but it seems like it's connected.
Either way, this from the Motivation section caught my eye:
Tiny Recursion Model (TRM) is a recursive reasoning model that achieves amazing scores of 45% on ARC-AGI-1 and 8% on ARC-AGI-2 with a tiny 7M parameters neural network. The idea that one must rely on massive foundational models trained for millions of dollars by some big corporation in order to achieve success on hard tasks is a trap. Currently, there is too much focus on exploiting LLMs rather than devising and expanding new lines of direction. With recursive reasoning, it turns out that “less is more”: you don’t always need to crank up model size in order for a model to reason and solve hard problems. A tiny model pretrained from scratch, recursing on itself and updating its answers over time, can achieve a lot without breaking the bank.
This work came to be after I learned about the recent innovative Hierarchical Reasoning Model (HRM). I was amazed that an approach using small models could do so well on hard tasks like the ARC-AGI competition (reaching 40% accuracy when normally only Large Language Models could compete). But I kept thinking that it is too complicated, relying too much on biological arguments about the human brain, and that this recursive reasoning process could be greatly simplified and improved. Tiny Recursion Model (TRM) simplifies recursive reasoning to its core essence, which ultimately has nothing to do with the human brain, does not require any mathematical (fixed-point) theorem, nor any hierarchy.
They do note that "We found that replacing the self-attention with an MLP worked extremely well on Sudoku-Extreme (improving test accuracy by 10%), but poorly on other datasets." So the results here probably need to be marked as preliminary.
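For anyone curious what "recursing on itself and updating its answers over time" looks like in practice, here's a rough sketch of how I read the TRM loop. This is not the authors' code: the module names, dimensions, and loop counts are made up for illustration. The idea is that one tiny shared network keeps updating a latent scratchpad from the question and the current answer, then uses that latent to revise the answer, and repeats.

```python
# Sketch of TRM-style recursive refinement (my reading of the paper, not the
# official implementation). Sizes and loop counts are illustrative only.
import torch
import torch.nn as nn

class TinyRecursiveNet(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        # One small shared block reused at every step (hypothetical layout).
        self.z_update = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.y_update = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x, y, z, n_inner=6, n_outer=3):
        # Outer loop: each pass revises the current answer y once.
        for _ in range(n_outer):
            # Inner loop: recurse on the latent z while x and y stay fixed.
            for _ in range(n_inner):
                z = self.z_update(torch.cat([x, y, z], dim=-1))
            # Use the refined latent to update the answer.
            y = self.y_update(torch.cat([y, z], dim=-1))
        return y

# Usage: embeddings of the puzzle (x), an initial answer guess (y), a latent (z).
dim = 64
x, y, z = torch.randn(1, dim), torch.zeros(1, dim), torch.zeros(1, dim)
refined_answer = TinyRecursiveNet(dim)(x, y, z)
```

In the actual model the inner block operates on the token grid (self-attention or, per the note above, an MLP variant on some datasets), but the repeated refine-the-latent-then-revise-the-answer loop is the core idea.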
100 sats \ 0 replies \ @optimism 7h
A lot of intelligent things are said here and I love that they're doing the "smarter, not bigger" thing. This is how we best bring things forward, in line with everything that has ever driven technological advance, in my experience. Scaling out is dumb advance, and the only party that truly benefits from that push is Nvidia1.
This isn't related to the living prompt though - it's rather the opposite, where you don't rely on input tokens and low RNG to continuously tune activations, but on recursion inside the model to get better results. Error rates are a thing with recursion, though.

Footnotes

  1. though they're now effectively investing in themselves by making deals with OpenAI & co, so I guess that bubble be poppin' - ripe for disruption, and this seems a step in that direction.