I really wonder what would happen if they could only train on old-timey books and other out-of-copyright materials. That would be a fun LLM to talk to.

elvismercury

That explains why it is so good. The best models will have the broadest and most illicit training datasets.

byzantine

Glad to see that Zuck uses the same source for pirated books that I do, though clearly on a greater scale.



(Article originally published in [The Atlantic](https://www.theatlantic.com/technology/archive/2025/03/libgen-meta-openai/682093/), but it is paywalled. The link above points to an Archive.is copy instead.)