Gotta push the limits. Also, the README says it's multimodal, so I was expecting a JPG lol.
It's multimodal for input, not output, unfortunately.
I wonder how much it could be improved by removing 139 languages and the audio and video modalities.
Large language models are poorly suited to ASCII art. They tokenize the input and generate only tokens as output, so they lose a lot of spatial information, and they are not really trained to align the characters of the output.
It's similar to painting with a hammer. A very skilled person might produce something that resembles art, but a hammer is not really meant for that 😂
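To make the tokenization point concrete, here is a rough sketch. The `toy_tokenize` function is a made-up stand-in, not any real model's tokenizer: it just merges runs of up to four identical characters into one token, which loosely imitates how real tokenizers collapse runs of spaces or underscores. The helper `token_index_of_column` is also hypothetical. The point is only that the same screen column lands at a different token position on every row.

```python
import re

def toy_tokenize(line):
    # Toy assumption: a run of up to 4 identical characters becomes one token.
    # Real tokenizers use learned vocabularies, but they merge runs in a similar spirit.
    return [m.group(0) for m in re.finditer(r"(.)\1{0,3}", line)]

def token_index_of_column(tokens, col):
    # Return the index of the token that covers character column `col`.
    end = 0
    for i, tok in enumerate(tokens):
        end += len(tok)
        if col < end:
            return i
    raise IndexError("column is past the end of the line")

# Three rows, each exactly 6 characters wide.
art = [
    "  /\\  ",
    " /  \\ ",
    "/____\\",
]

for row in art:
    tokens = toy_tokenize(row)
    # Same visual width, but a different number of tokens per row,
    # and the last column sits at a different token index each time.
    print(len(tokens), token_index_of_column(tokens, 5), tokens)
```

With this toy scheme the three equal-width rows come out as 4, 5, and 3 tokens, so "the character directly below this one" has no fixed offset in the token sequence. The model has to infer the alignment instead of reading it off.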