Apparently, even the "open source" ones are not completely open. Where is the full dataset used to train them available for download, modification and inspection? If they control the language, they still control the world.
Here's the dataset for GPT-J: https://pile.eleuther.ai/ Most other free models also have publicly available datasets and are fully reproducible.
Error 404 when I try to download the Pile.
But I want to believe.
lol here's the torrent: magnet:?xt=urn:btih:0d366035664fdf51cfbe9f733953ba325776e667&dn=EleutherAI_ThePile_v1&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
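If you want to see what that magnet link actually contains before pasting it into a client, here's a rough sketch using only Python's standard library. `MAGNET` and `parse_magnet` are names made up for this example, not part of any torrent tool, and the check assumes a 40-character hex BitTorrent v1 info hash.

```python
from urllib.parse import parse_qs

# The magnet link from the comment above, split across lines for readability.
MAGNET = (
    "magnet:?xt=urn:btih:0d366035664fdf51cfbe9f733953ba325776e667"
    "&dn=EleutherAI_ThePile_v1"
    "&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php"
    "&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969"
    "&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce"
)


def parse_magnet(uri):
    """Split a magnet URI into info hash, display name and tracker list.

    Hypothetical helper for illustration only; it handles just the
    xt / dn / tr parameters that appear in the link above.
    """
    scheme, sep, query = uri.partition("?")
    if scheme != "magnet:" or not sep:
        raise ValueError("not a magnet URI")

    # parse_qs percent-decodes values and collects repeated keys into lists.
    params = parse_qs(query)

    prefix = "urn:btih:"
    xt = params.get("xt", [""])[0]
    if not xt.startswith(prefix):
        raise ValueError("no BitTorrent info hash (xt=urn:btih:...) found")

    return {
        "info_hash": xt[len(prefix):].lower(),
        "name": params.get("dn", [None])[0],
        "trackers": params.get("tr", []),
    }


info = parse_magnet(MAGNET)
print(info["name"])
print(info["info_hash"])
for tracker in info["trackers"]:
    print(tracker)
```

This only decodes the link. It doesn't contact any tracker or tell you whether the torrent is still seeded.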