Scary that just 1 month ago, after evaluating o1, the great Terence Tao
 
*anticipated that the benchmark would "resist AIs for several years at least," noting that the problems require substantial domain expertise and that we currently lack sufficient relevant training data.*

![](https://m.stacker.news/68853)

(https://arxiv.org/html/2411.04872v1)

Particularly impressive to me:

![](https://pbs.twimg.com/media/GfQtsVnXgAAnE6h?format=jpg&name=medium)

It solves 1/4 of research-level math questions.


zuspotirko

rafael_xmr

https://arcprize.org/blog/oai-o3-pub-breakthrough

https://xcancel.com/arcprize/status/1870169260850573333

nitter


OpenAI announced a new model called "o3" as part of their "shipmas" product release cycle.

It scores 87.5% on ARC-AGI (humans are roughly 85%), while the last-generation o1 was stuck in the 30%-ish area.

![](https://pbs.twimg.com/media/GfQrfI2WcAAVnhl?format=jpg&name=medium)

https://x.com/arcprize/status/1870169260850573333