OpenAI released a new model called "o3" as part of their shipmas product release cycle
It scores 87.5% on ARC-AGI (humans score roughly 85%), while the previous-generation o1 was stuck around 30%.
Particularly impressive to me: it solves about a quarter of the research-level math problems in the FrontierMath benchmark.
Scary that just one month ago, after evaluating o1, the great Terence Tao anticipated that the benchmark would "resist AIs for several years at least," noting that the problems require substantial domain expertise and that we currently lack sufficient relevant training data.
(https://arxiv.org/html/2411.04872v1)
AGI is coming.
https://arcprize.org/blog/oai-o3-pub-breakthrough
https://xcancel.com/arcprize/status/1870169260850573333