The core innovation in Kosmos is our use of structured world models, which allow us to efficiently incorporate information extracted over hundreds of agent trajectories and maintain coherence towards a specific research objective over tens of millions of tokens. An individual Kosmos run involves reading 1500 papers and running 42,000 lines of analysis code, far more than any other agent we are aware of.

Our beta users estimate that Kosmos can do in one day what would take them 6 months, and we find that 79.4% of its conclusions are accurate.

The average, across 7 scientists, was 6.14 months for 20-step Kosmos runs. We performed the same measurement for more shallow runs, with blinded evaluation, thus leading to the scaling law presented in the technical report.

Pretty awesome though even if that's only time they feel they're saving.

Kosmos: An AI Scientist for Autonomous Discovery

carter

> The core innovation in Kosmos is our use of structured world models, which allow us to efficiently incorporate information extracted over hundreds of agent trajectories and maintain coherence towards a specific research objective over tens of millions of tokens. An individual Kosmos run involves reading 1500 papers and running 42,000 lines of analysis code, far more than any other agent we are aware of.

> Our beta users estimate that Kosmos can do in one day what would take them 6 months, and we find that 79.4% of its conclusions are accurate.

The survey included only 7 scientists:

> The average, across 7 scientists, was 6.14 months for 20-step Kosmos runs. We performed the same measurement for more shallow runs, with blinded evaluation, thus leading to the scaling law presented in the technical report.

Pretty awesome though even if that's only time they feel they're saving.