So, there is a new AI model out there that might be the smartest one yet: Claude 3.5. We'll see in the next few weeks. But which one is actually the smartest LLM? Generally, we answer this question with benchmarks. And just like with CPU benchmarks over the last few decades, there are dozens of them, and manufacturers cherry-pick the ones they're good at.
Over the last few months, I think LMSYS has become the most competitive way to benchmark. Why? Because humans (specifically, people interested in AI) come up with way more questions, way more creative questions, and are the best judges available for the output and logic of foundation models.
But we accidentally created a catch.
  1. Just like Intel back in the day shipped CPUs with a bazillion GHz but little real performance gain, we've created a benchmark that rewards appealing to humans instead of objective intelligence. Humans will downvote any attempt by an AI to become smarter than humans as misinformation. Because it is - to the best of our understanding.
  2. This is a prime target for exploitation. The humans voting now have a big influence. AFAIK (and I searched the web a little) this is a novel idea: what if an organized group of humans votes on lmsys with an agenda in mind? I personally like photography, so I could ask it a lot about lenses and analog film to make a small dent in ensuring future AI is smart on this topic. Bland example, I know. But maybe you personally have something bigger in mind? And maybe there are groups of people out there with a more important agenda?
  3. There is no going back now. Now that lmsys exists, we will never go back to simple string-matching benchmarks. They will surely continue to exist internally at AI companies, but no one will ever again create the publicly accepted Geekbench or Cinebench of LLMs. We have taken the first step toward losing control.
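The leaderboard the points above worry about is built from pairwise human votes aggregated into Elo-style ratings. A minimal sketch of the classic online Elo update (the K=32 and starting ratings here are illustrative assumptions; the actual LMSYS pipeline uses more robust statistical fitting over all votes), which also shows why a coordinated voting bloc can move ratings quickly:

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Elo-predicted probability that model A beats model B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, winner: str, k: float = 32.0):
    """Apply one vote. winner is 'a', 'b', or 'tie'.

    Returns the updated (r_a, r_b); the update is zero-sum.
    """
    s_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
    e_a = expected_score(r_a, r_b)
    return r_a + k * (s_a - e_a), r_b + k * ((1.0 - s_a) - (1.0 - e_a))

# A small organized bloc voting one way repeatedly shifts the ranking:
ra, rb = 1000.0, 1000.0
for _ in range(50):  # 50 straight votes for model A
    ra, rb = elo_update(ra, rb, "a")
print(round(ra), round(rb))  # model A now rates well above model B
```

Each individual vote moves the ratings only a little (and less as the gap grows, since the expected score rises), but nothing in the plain update distinguishes fifty independent voters from one agenda-driven group voting fifty times.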
Conclusion: Idk what to do now. This seems simultaneously like a small, irrelevant insight and something that shapes humanity forever. As we move forward, it's essential to recognize the implications of our actions and consider the long-term consequences of ceding control over AI development. Nobody else seems to consider such basic game logic here. Go vote on https://chat.lmsys.org/ or something, idk.
I think this is an important point. There's no objectivity for us to train it on, so AI is going to be riddled with biases.
Thanks for this. I was having a hard time figuring out what @zuspotirko was reacting to exactly. I got the sense it must be important but couldn't parse it out exactly.
I just tried out Claude 3.5. It's impressively good, and it had some interesting thoughts about the lmsys voting, Elo scores, and leaderboard. Something important has changed in LLM benchmarking, and nobody seems to have talked about the implications yet. Maybe I'm overthinking it.
I watched some of the video anon shared in saloon yesterday, and it gave me a similar "alarm":
The guest wrote a series of essays that have been circulating a lot, which predict, among other things, AI more intelligent than the average college graduate by 2026.
Is it just me, or are other people scared of talking to AI?
I casually talked to the Instagram AI a couple of weeks back, and it was absolutely terrifying how good it was. It is getting exponentially better...