It's multimodal for input, not output, unfortunately.
I wonder how much it could be improved by removing 139 languages and the audio and video modalities.