Is OpenAI's o1 a good calculator? We tested it on up to 20x20 multiplication—o1 solves up to 9x9 multiplication with decent accuracy, while gpt-4o struggles beyond 4x4. For context, this task is solvable by a small LM using implicit CoT with stepwise internalization. 1/4
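For anyone wondering what an eval like this looks like in practice, here's a minimal sketch: generate random n-digit products, ask the model, and exact-match against Python's big-integer arithmetic. The model name, prompt wording, and trial counts below are illustrative assumptions, not the authors' actual setup.

```python
# Rough sketch of an NxN-digit multiplication eval.
# Assumptions: model name, prompt wording, and trial counts are illustrative.
import random
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def random_n_digit(n: int) -> int:
    """Return a uniformly random n-digit integer."""
    return random.randint(10 ** (n - 1), 10 ** n - 1)

def ask_product(model: str, a: int, b: int) -> str:
    """Ask the model for a*b and return its raw text answer."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"What is {a} * {b}? Answer with the number only.",
        }],
    )
    return resp.choices[0].message.content.strip()

def accuracy(model: str, digits: int, trials: int = 20) -> float:
    """Exact-match accuracy on digits x digits multiplication."""
    correct = 0
    for _ in range(trials):
        a, b = random_n_digit(digits), random_n_digit(digits)
        answer = ask_product(model, a, b)
        # Compare against Python's exact big-integer product,
        # ignoring commas or other formatting in the reply.
        digits_only = "".join(ch for ch in answer if ch.isdigit())
        correct += digits_only == str(a * b)
    return correct / trials

if __name__ == "__main__":
    for d in range(1, 21):  # 1x1 up to 20x20 digits
        print(d, accuracy("o1-preview", d))
```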
21 sats \ 1 reply \ @SimpleStacker 18 Sep
I'm a bit confused why it would struggle with this.
I mean, I understand why it would based on how the language models work.... but you'd think it'd be easy enough to build in a mode where it can switch to math mode and just use an actual numerical software package?
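A "math mode" like that is usually done with tool calling: the model emits a structured call and the host program does the actual arithmetic. A minimal sketch of the idea, assuming the OpenAI chat-completions tool-calling interface (the tool name, prompt, and model choice here are illustrative, not anything OpenAI ships as a "math mode"):

```python
# Sketch of a "math mode" via tool calling: the model requests a multiply
# tool, and exact arithmetic is done in ordinary code, not by the LLM.
# Tool name, prompt, and model choice are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "multiply",
        "description": "Multiply two integers exactly.",
        "parameters": {
            "type": "object",
            "properties": {
                # Strings rather than JSON numbers, so very large
                # integers aren't mangled by float precision.
                "a": {"type": "string", "description": "First integer"},
                "b": {"type": "string", "description": "Second integer"},
            },
            "required": ["a", "b"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 123456789 * 987654321?"}],
    tools=tools,
)

msg = resp.choices[0].message
if msg.tool_calls:  # the model chose to call the tool
    args = json.loads(msg.tool_calls[0].function.arguments)
    # The "actual numerical software" part: exact big-integer multiplication.
    print(int(args["a"]) * int(args["b"]))
```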
0 sats \ 0 replies \ @zuspotirko OP 18 Sep
Using sub-agents like that was what we did before GPT. Think of Google Assistant or Alexa or Siri when you ask them about the weather.
But that isn't the goal of OpenAI, Anthropic, Mistral, etc. We're not trying to build a list of agents anymore. We're trying to build actual intelligence from scratch. AGI. Accelerate. No crutches; it should become actually intelligent.
0 sats \ 1 reply \ @clarity 18 Sep
I don’t understand what’s so hard about asking ChatGPT to write a script to do this with accuracy. People are hung up on the “how many R’s are in strawberry?” thing. For any math problem, you have to tell the AI to use a script; otherwise it will try to do it from memory of webpages that have math problems.
https://chatgpt.com/share/66eb00c7-d048-8010-890e-1607ae6e7d67
21 sats \ 0 replies \ @zuspotirko OP 18 Sep
The goal isn't for you to change the question so that ChatGPT solves it correctly. The goal is for LLMs to actually become smarter.
For example, it should do math correctly even when the math is buried deep inside longer-form text, or when the user didn't realize their question would involve math at all.
0 sats \ 0 replies \ @nitter 18 Sep bot
https://xcancel.com/yuntiandeng/status/1836114401213989366