The current success of diffusion transformers heavily depends on the compressed latent space shaped by the pre-trained variational autoencoder (VAE). However, this two-stage training paradigm inevitably introduces accumulated errors and decoding artifacts. To address these problems, researchers have returned to pixel space, at the cost of complicated cascade pipelines and increased token complexity. In contrast, we propose to model patch-wise decoding with a neural field and present a single-scale, single-stage, efficient, end-to-end solution, coined pixel neural field diffusion.
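To make the "patch-wise decoding with a neural field" idea more concrete, here is a minimal sketch of what such a decoder could look like: a small MLP, conditioned on each patch's feature vector from the transformer, maps normalized pixel coordinates inside the patch to RGB values. The class name, dimensions, and layer choices below are my own illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a patch-wise neural field decoder.
# All names and shapes are assumptions for illustration only.
import torch
import torch.nn as nn


class PatchNeuralFieldDecoder(nn.Module):
    def __init__(self, feat_dim: int = 768, hidden: int = 256, patch: int = 16):
        super().__init__()
        self.patch = patch
        # MLP takes (x, y) pixel coordinates concatenated with the patch feature.
        self.mlp = nn.Sequential(
            nn.Linear(2 + feat_dim, hidden),
            nn.SiLU(),
            nn.Linear(hidden, hidden),
            nn.SiLU(),
            nn.Linear(hidden, 3),  # RGB output per pixel
        )

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (B, N, feat_dim) -- one feature vector per patch token
        B, N, _ = patch_feats.shape
        p = self.patch
        # Normalized pixel coordinates within a patch, shared across all patches.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, p), torch.linspace(-1, 1, p), indexing="ij"
        )
        coords = torch.stack([xs, ys], dim=-1).reshape(1, 1, p * p, 2)
        coords = coords.expand(B, N, -1, -1).to(patch_feats)
        feats = patch_feats.unsqueeze(2).expand(-1, -1, p * p, -1)
        rgb = self.mlp(torch.cat([coords, feats], dim=-1))  # (B, N, p*p, 3)
        # Caller reassembles the per-patch pixels into the full image.
        return rgb.reshape(B, N, p, p, 3)


# Usage: decode 256 patch tokens into 16x16 RGB patches.
decoder = PatchNeuralFieldDecoder()
tokens = torch.randn(2, 256, 768)
patches = decoder(tokens)  # (2, 256, 16, 16, 3)
```

The point of this shape of decoder is that it replaces a separately trained VAE decoder: the pixels come straight out of the same end-to-end model, one patch at a time.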
5 fingers, 2 legs, and 2 arms, from here on out!
I, or more likely the troll that lives inside me, couldn't help but wonder what it would do if you asked it to render 6 fingers, so I used the demo and asked it for:

a photo of a person signing a contract, with 6 fingers