Is OpenAI's o1 a good calculator? We tested it on up to 20x20 multiplication—o1 solves up to 9x9 multiplication with decent accuracy, while gpt-4o struggles beyond 4x4. For context, this task is solvable by a small LM using implicit CoT with stepwise internalization. 1/4
I'm a bit confused why it would struggle with this.
I mean, I understand why it would, based on how language models work... but you'd think it'd be easy enough to build in a mode where it can switch to math mode and just use an actual numerical software package?
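Roughly what I have in mind, as a toy sketch (the routing logic, the exact_eval helper, and the call_llm stub are all made up for illustration, not anything OpenAI actually does):

```python
# Hypothetical "math mode": if the prompt is plain integer arithmetic, hand it
# to an exact evaluator instead of letting the model guess digits token by token.
import ast
import operator

# Only plain arithmetic operators are allowed in this toy evaluator.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def exact_eval(expr: str) -> int:
    """Evaluate an integer arithmetic expression with exact, arbitrary-precision ints."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("not plain integer arithmetic")
    return walk(ast.parse(expr, mode="eval").body)

def call_llm(prompt: str) -> str:
    # Stand-in for the normal model path; a real system would call the API here.
    return "(model answer)"

def answer(prompt: str) -> str:
    """Hypothetical router: try the exact calculator first, else fall back to the model."""
    try:
        return str(exact_eval(prompt))
    except (ValueError, SyntaxError):
        return call_llm(prompt)

print(answer("12345678901234567890 * 98765432109876543210"))
```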
you'd think it'd be easy enough to build in a mode where it can switch to math mode and just use an actual numerical software package
Using sub-agents like that was what we did before GPT. Think of Google Assistant or Alexa or Siri when you ask them about the weather.
But that isn't the goal of OpenAI, Anthropic, Mistral, etc. We're not trying to build a list of agents anymore. We're trying to build actual intelligence from scratch. AGI. Accelerate. No crutches; it should become actually intelligent.
I don’t understand what’s so hard about asking ChatGPT to write a script to do this with accuracy. People are hung up on the "how many R’s are in strawberry?" question. For any math problem, you have to tell the AI to use a script; otherwise it will try to do it from memory of webpages that contain math problems.
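For the cases in this thread, the kind of one-off script meant here is a couple of lines (the 20-digit operands below are made-up placeholders):

```python
# Let Python do the exact work the model is bad at doing "in its head".
print("strawberry".count("r"))  # 3 -- the "how many R's in strawberry" question
# Python ints are arbitrary precision, so a 20-digit by 20-digit product is exact.
print(12345678901234567890 * 98765432109876543210)
```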
The goal isn't for you to change the question such that chatGPT solves it correctly. The goal is for LLMs to actually become smarter. It should become more intelligent.
An example would be that it should do math correctly even deeper inside of longer form text. Or even when the user didn't know his question would involve math.