Some discussion regarding this happened here
reply
I'd like it if someone could explain the flow-diagram in one paragraph. I was too tired to read the article yesterday.
reply
You mean this one?
fyi, there is probably a ViT out there that can do this for you (without hallucinating too much in a nontrivial way). If you find one, let me know, I would also be interested, lol
Vision Transformer (ViT): ViT is a transformer architecture for computer vision tasks, usually pre-trained on a large image dataset. It treats an image as a sequence of patches, which lets it apply the transformer architecture to image classification and other vision tasks.
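To make the "sequence of patches" idea a bit more concrete, here is a rough sketch in plain Python. It only shows the patch-splitting step, on a toy 4x4 grid of numbers standing in for an image. The function name `image_to_patches` and the toy data are made up for illustration; a real ViT would also project each patch to an embedding and add position information before the transformer sees it.

```python
def image_to_patches(image, patch_size):
    """Split a 2D grid (list of rows) into flattened, non-overlapping patches.

    Hypothetical helper for illustration only, not taken from any ViT library.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    # Walk the grid patch by patch, left to right, top to bottom.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block into a single list.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Toy 4x4 "image" with pixel values 0..15 (an assumption for the example).
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = image_to_patches(image, patch_size=2)

for index, patch in enumerate(patches):
    print(index, patch)
```

The resulting list of flattened patches plays the same role for the transformer that a list of word tokens plays in a language model.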
reply
Thanks for the link by the way. Had not heard of ViT. A good explanation.
Thought about prompting Mistral to tell me how I can compile and run it, but in the end decided to leave it for a while. Hanging out with an LLM all day might be excessive.
reply
Yeah. The thought crossed my mind that a prompt-fed LLM could do that. I'd never heard of ViT. Sounds like... the future!
A difficult task even for a skilled writer with a technical background, perhaps.
reply