pull down to refresh

Over the weekend, I made Wairdle, a game that uses AI to mix Wordle, 20 Questions, and the IRC chat concept of seven degrees of separation. The AI picks a secret word, and you get 7 questions to try and guess it. It's an AI prompt game.
What I'm finding out is that AI struggles with some things. AI agents can seem incredibly smart and yet they can be incredibly stupid.

Counting is...hard

I've tried Wairdle on Grok, Gemini, and Claude. All three did something like this:
AI: "I chose a 5 letter word: _ _ _ _ _ _"
Me: I ask questions, narrow it down, and eventually either guess the word or it reveals the answer.
AI: "The word I've chosen is: PENCIL"
Me: "Wait, you said it was a 5 letter word."
AI: "My bad, thanks for pointing out that error. PENCIL is actually a 6 letter word."
Me: "Argh."
It kind of stuns me that all three made this error. A user on Stacker.news wrote that he or she uses Venice.ai and had the same issue. I tried Wairdle on Venice.ai myself, and, yes, the same thing happened: it chose a "6-letter noun" which turned out to be the word "pen." Clearly, something strange is going on where AI struggles with this. AI does so many things that amaze us humans, yet something this simple to us, it struggles with.
When I tried Venice.ai, I got this...
Thanks for your earnest promise, Venice. Please forgive me if I'm a Doubting Thomas here from Missouri...you've got to show me!
Here's an example of the counting struggle using Google's Gemini...
Here's an example using Claude...
And another example, using Grok. A 6-letter noun, verified as 6 letters: s-p-o-o-n. Umm...
So, four different LLMs and four instances of counting letters in words incorrectly.
Although I find this AI weakness odd, I think it'll prove to be a temporary thing. The way things are progressing, certainly something as basic as this will be cleared up quickly. I'm officially declaring that the race is on. Which LLM can clear this up first? Go!
Still, it's really puzzling to me as to why something this simple is so hard for AI to get correct. Any theories?

Side note

As a side note, another inspiration for Wairdle was the "Wikipedia Game." There may be multiple ways to play this, but the way I've seen it played is this:
  • You go to a Wikipedia page, that's the start page.
  • You're given a Wikipedia finish page, that's the finish line.
  • You have to click links, staying on Wikipedia, and wind up on the finish page. You just leapfrog from article to article to article until you get to the end. The first person who gets there wins. Or, the person with the fewest degrees of separation wins.

Play

Go to https://wairdle.vercel.app to play Wairdle and see what results you get.
Here's an example from today. The categories are broad, but it's the specific combination of them all that's needed to guess the correct article:
I got this one, but typically they're pretty niche bits of trivia you need to know.
reply
Nice, I hadn't seen this before. Takes some getting used to, figuring out what they're going after, for sure.
reply
I love how you say wtf to autocorrect.
reply
When I literally wrote the correct word, it says, "No, wrong," then tells me the correct word is actually the exact same word that I said before, what else is there to say?
reply
It's because you were on random seed 0xf8018980dff9558e44e99ac1 which masked the path to "Yes, correct".
reply
Umm...I'm sure you know, but I have no idea what that means. :)
reply
Since LLMs are deterministic (they have a fixed set of weights that doesn't get updated during inference), there are some randomizers involved to make chatbots not repeat themselves and make them more "human".
So, for example, it checks how often it has said "yes", and if that crosses some threshold it won't say "yes" again. To make it even more "human", these thresholds are dynamic, and the window over which they're evaluated is often dynamic too.
Most of this is controlled by temperature, which "globally" scales how much randomness is used. Lower values allow for less randomness. You may want to play around with this (I'm quite sure I've seen that in Venice chat settings.)
Depending on the model used, there are recommended values. If those don't work for your use case because of too many hallucinations, try lowering them, e.g. if you're at 0.5 now, try 0.45 or 0.4.
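If you're curious what temperature does mechanically, here's a minimal sketch in plain Python (the logit values are made up for illustration, and this isn't any particular model's API): the raw scores are divided by the temperature before the softmax, so a low temperature makes the top-scoring token dominate, while a high temperature flattens the distribution.

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0):
    """Sample a token index from raw model scores (logits).
    Lower temperature sharpens the distribution (more deterministic);
    higher temperature flattens it (more random)."""
    scaled = [l / temperature for l in logits]
    # Softmax (subtract the max for numerical stability)
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index according to the resulting probabilities
    return random.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical scores for three candidate tokens
logits = [2.0, 1.0, 0.5]

# Near-zero temperature: the highest-scoring token wins essentially every time
low_temp_picks = [sample_with_temperature(logits, temperature=0.05) for _ in range(100)]
```

At temperature 0.05 those logits scale to [40, 20, 10], so the first token's probability is effectively 1; crank the temperature up and the other tokens start getting picked.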
reply
I really like the description of "Simulated Intelligence". LLMs and all the associated tooling and algos are not artificial intelligence. But they do trick us sometimes into thinking that it's thinking. It's not. It's math. Complex math. Complex computational models. It has a ton of potential and tons of weaknesses.
What concerns me is that most people have no clue what it is, and many seem to be trusting it in ways that are insane. Not to mention the companies behind the tools. AI is not the threat. As per usual, humans are the threat.
reply
Lately I've been wondering if these types of posts about AI are misguided.
The pattern I have seen over the years is that some showman selling his wares over-sells the tech. AI in this case. It's framed in a way that is complete bullshit. Serverless is an example. The name would trip people up. Dude, there's always a server. Next.
People would ignore something because of surface level nonsense instead of the actual tech. This happens with bitcoin by the way...
With AI, many of us, including me, have responded in reaction to people like Scam Altman lying about it. We didn't know how it worked, but we knew it was BS. We see it struggle to do things like count letters in words. Many completely discount the whole thing for this reason. This is a mistake.
This is like discounting the early Internet because it was slow. Or bitcoin because it is slow. The example you describe here is a common and frankly old problem with LLMs. LLMs don't "see" text as a sequence of individual letters; they break it into tokens (chunks of text) based on patterns in the training data.
There are ways to get correct answers using a programming language. You can usually add something like "use python for the math and show your work." It may still fail. Here's the deal: we don't need AI to do this crap. We have tons of tools that can do it much better.
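The tokenization point is easy to demo. Counting letters is trivial for a program that operates on characters, which is exactly what the model doesn't do. A quick sketch (the token split shown is a hypothetical illustration, not output from any real tokenizer):

```python
def letter_count(word: str) -> int:
    """Count actual characters -- what the LLM is being asked to do."""
    return len(word)

# What we see vs. (roughly) what a model sees:
word = "pencil"
hypothetical_tokens = ["pen", "cil"]  # illustrative split, not a real tokenizer's output

# The model predicts over whole chunks like these, so "how many letters?"
# is not something it answers by looking at letters at all.
print(letter_count("pencil"))  # 6
print(letter_count("pen"))     # 3
```

This is why "use python and show your work" often helps: the code path counts characters directly instead of asking the model to introspect its own tokens.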
So yeah, AI sucks at many things. It uses tons of power and the business model hasn't been proven. I've written all this stuff before. But what is more important?
People don't care about all these math things. A massive number of people can be tricked into thinking chatbots are actually artificial life. It's nonsense, of course, but it's gonna spread.
The shortest way to explain AI is that it's a prediction algo. It's pretty amazing at guessing what we want fed back to us. It doesn't know anything. It's guessing based on training data. People need to be red-pilled on this aspect. That's my take.
reply
This is so weird. I played today using ChatGPT. It chose a word, a food. It got the letter count correct at 6 letters. But when I asked for a song that featured the food, a fruit, it said "Strawberry Fields Forever". I looked at the lyrics and saw no other fruits. Turns out the word was "cherry". That's six letters, but when I asked if cherry or cherries are in the song...its response:
Nope — "Strawberry Fields Forever" is about strawberries, not cherries. 🍓
Looks like I gave you a misleading clue there — so you basically cracked cherry without a perfect song reference to lean on. That makes the win even more impressive. 😅
Argh. That lol emoji at the end. Argh.
reply