Large Language Models are not suited for ASCII art. They tokenize the input and generate output only as tokens, so they lose a lot of spatial information, and they are not really trained to align characters in the output.

It's similar to painting with a hammer. A very skilled person might do something that resembles art, but a hammer is not really meant for that😂
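A toy illustration of the tokenization point, assuming a made-up greedy longest-match tokenizer with a tiny hypothetical vocabulary (real models use learned BPE-style vocabularies, but the same kind of effect can show up): rows that are equally wide in characters split into different numbers of tokens, so a given column does not land at a consistent token position.

```python
# Toy greedy longest-match tokenizer. VOCAB is hypothetical, not any real model's.
VOCAB = {"   ", "  ", " ", "/\\", "/", "\\", "|", "__", "_"}


def tokenize(text: str, vocab: set[str] = VOCAB) -> list[str]:
    """Split text into the longest vocabulary pieces, scanning left to right."""
    max_len = max(len(t) for t in vocab)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            # Unknown character: fall back to a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens


def token_index_of_column(tokens: list[str], column: int) -> int:
    """Return the index of the token that covers a given character column."""
    pos = 0
    for idx, tok in enumerate(tokens):
        if pos <= column < pos + len(tok):
            return idx
        pos += len(tok)
    raise IndexError(column)


# A small triangle: every row is exactly 6 characters wide.
ROWS = ["  /\\  ", " /  \\ ", "/____\\"]

if __name__ == "__main__":
    for row in ROWS:
        toks = tokenize(row)
        print(f"{row!r}: {len(row)} chars -> {len(toks)} tokens {toks}")
    # Character column 4 maps to a different token index on each row.
    print([token_index_of_column(tokenize(r), 4) for r in ROWS])
```

With this vocabulary the three 6-character rows become 3, 5 and 4 tokens, so the model would have to infer alignment rather than see it directly.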

Gotta push the limits. Also, the README says it's multimodal, so I was expecting a JPG lol.

@m0wer (OP), 6 Jul

It's multimodal for input, not output unfortunately.


I wonder how much it could be improved by removing 139 of its languages and the audio and video modalities.