
Interesting how they claim to boost accuracy over the most accurate model they use, just by mixing models?
I've been trying something similar in Roo Code, where I let Claude do the architecture and use self-hosted models for everything else. qwen3-coder isn't as good as claude-4-sonnet at coding, but it's still decent enough to let it slug it out.
I've been trying to build a special "hard-problem debug" mode, but I've put that on hold since I haven't found a single model (including all of the commercial closed ones) that can fix concurrency issues without constant manual intervention. But this makes me think: if I can alternate between a model that's good at determination, and let it guide / judge a coder model... this may work?
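Something like this loop is what I have in mind — a rough sketch only, with `judge()` and `code()` as hypothetical stand-ins for the two model calls (not any real API):

```python
# Hypothetical sketch: alternate a strong "judge" model with a cheaper coder model.
# judge() and code() are stubs standing in for real model calls.

def judge(task, attempt):
    """Stub for the strong model: return feedback, or None if the attempt passes."""
    return None if "fixed" in attempt else "still racy, add locking"

def code(task, feedback):
    """Stub for the coder model: produce a new attempt from the feedback."""
    return f"attempt addressing: {feedback}" if feedback else "fixed: initial attempt"

def debug_loop(task, max_rounds=5):
    attempt = code(task, None)
    for _ in range(max_rounds):
        feedback = judge(task, attempt)
        if feedback is None:   # judge accepts the attempt, we're done
            return attempt
        attempt = code(task, feedback)
    return attempt             # give up after max_rounds and return the last try
```

The point is that the expensive model only ever reads and critiques, while the cheap one does all the generation.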
I wonder if it's related to this phenomenon, where running it 2 times gives much better results: https://brooker.co.za/blog/2012/01/17/two-random.html
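The effect from the blog post is easy to reproduce — here's a small simulation (my own sketch, not from the post) where each task joins the shortest of `choices` randomly sampled queues:

```python
import random

def max_load(servers=100, tasks=10000, choices=1, seed=0):
    """Assign tasks to servers; each task samples `choices` random servers
    and joins the one with the shortest queue.
    Returns the maximum queue length (lower = better balanced)."""
    rng = random.Random(seed)
    load = [0] * servers
    for _ in range(tasks):
        candidates = [rng.randrange(servers) for _ in range(choices)]
        best = min(candidates, key=lambda s: load[s])
        load[best] += 1
    return max(load)
```

With a fixed seed, going from one random choice to the best of two flattens the maximum queue length dramatically, while a third choice helps only marginally — which is exactly the "two random choices" result.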
I read that as: when you gamble twice, you have a greater chance of winning at least once. lol
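That's basically it — if one run succeeds with probability p, then at least one of two independent runs succeeds with probability 1 - (1 - p)^2:

```python
def at_least_one_success(p, runs=2):
    """Probability that at least one of `runs` independent attempts succeeds,
    assuming each attempt succeeds with probability p."""
    return 1 - (1 - p) ** runs

# e.g. a model that's right 60% of the time, run twice:
# at_least_one_success(0.6)  ->  approximately 0.84
```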
I think the big thing is this: pure random is the same no matter the delay; picking the fastest of 2 is much better than random, and only a little worse than picking from 3. It seems similar to this strategy: https://www.tiktok.com/t/ZP8BD7XQp/
bro made me watch tiktok!
It's a good theory, but the reason I say "gamble" is the randomization going on in sampling, even in MoE models, where it's been reduced a lot. I'm not sure how this works in gpt-5 or claude-4 though, so maybe that's worth testing too.