Can see a bunch of PoW went into this, so kudos! 🚀
Can you talk a bit about all of these models you are currently using? How did you arrive at your current list?
  • Text = Mixtral 8x7B-Instruct
  • Audio = tortoise-tts model
  • Image = Stable Diffusion XL
  • ‘Vision’ = LLaVA-13b
Thank you! PoW is the only way!
We are constantly monitoring for the latest and greatest models. If a new model performs better, we deploy it! We are also tracking new capabilities (Vision is a good example), and if we see value in them, we put them on the platform.
But to answer your question specifically:
Text:
  • Mixtral 8x7B is currently the best Open-Source model based on our internal tests as well as multiple public benchmarks. It's also very efficient, which is what allows us to charge only 21 sats per prompt (there's a quick local sketch after this list).
  • We also offer a "Code" model called CodeLlama 70B, which can produce better results than GPT-4 on this specific task.
  • We are looking into adding another, totally uncensored model, where no subject or topic is off-limits.
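If you want to poke at the text model yourself, here is a minimal sketch using Hugging Face transformers with the public mistralai/Mixtral-8x7B-Instruct-v0.1 weights. To be clear, this is not our serving stack, just the open weights, and the full model needs a lot of VRAM (or quantization) to run:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Public Hugging Face weights, not our hosted endpoint
model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Mixtral-Instruct expects the [INST] chat format; apply_chat_template builds it for you
messages = [{"role": "user", "content": "Explain proof-of-work in one paragraph."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Strip the prompt tokens so only the model's reply is printed
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```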
Image:
  • Stable Diffusion XL is, for now, the best Open-Source image model. Other models exist that could be cheaper, but given the performance (and the limits, see the comment above) of the best model, we don't think they actually bring much extra value.
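For the curious, this is roughly what SDXL inference looks like with the diffusers library, assuming the public stabilityai/stable-diffusion-xl-base-1.0 weights (again, a sketch, not our pipeline):

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load the public SDXL base checkpoint in half precision to fit on a consumer GPU
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# One prompt in, one PIL image out
image = pipe(prompt="a lighthouse at dusk, oil painting").images[0]
image.save("lighthouse.png")
```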
Vision:
  • LLaVA is an incredible model that came out just a few days after GPT-Vision and is the best we have tested so far. Multimodality is key to unlocking new use cases for LLMs.
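Here is a minimal sketch of running LLaVA locally through transformers. We're assuming the public llava-hf/llava-1.5-13b-hf checkpoint here; the exact variant we host may differ:

```python
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Public LLaVA 1.5 13B checkpoint (an assumption, not necessarily our hosted variant)
model_id = "llava-hf/llava-1.5-13b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")

# Any local image; LLaVA 1.5 uses the "USER: <image> ... ASSISTANT:" prompt format
image = Image.open("photo.jpg")
prompt = "USER: <image>\nWhat is shown in this picture? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```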
Audio:
  • This is more of a "toy" model. It's fun to try, but it's still very limited in its current form. We just wanted to put it out there so people can see another "side" of AI.
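If you want to play with it locally, a rough sketch with the open-source tortoise-tts package follows. With no reference voice clips provided it falls back to a random voice, and generation is slow, which is part of why we call it a toy:

```python
# pip install tortoise-tts
import torchaudio
from tortoise.api import TextToSpeech

tts = TextToSpeech()  # downloads the model weights on first run

# The "fast" preset trades quality for speed; with no voice samples given,
# tortoise uses random conditioning latents (i.e. a random voice)
speech = tts.tts_with_preset("Proof of work is the only way.", preset="fast")

# tortoise generates 24 kHz audio
torchaudio.save("generated.wav", speech.squeeze(0).cpu(), 24000)
```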