Imagen (Google's text-to-image neural net) implemented in PyTorch

Last week I logged the very impressive Imagen project, which smarter people than me have said is the SOTA for text-to-image synthesis. Now a WIP implementation is just a pip install imagen-pytorch away.

Architecturally, it is actually much simpler than DALL-E 2. It consists of a cascade of DDPMs (denoising diffusion probabilistic models) conditioned on text embeddings from a large pretrained T5 model (an attention network). It also includes dynamic clipping (the Imagen paper calls it dynamic thresholding) for improved classifier-free guidance, noise-level conditioning, and a memory-efficient U-Net design.
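
To make the guidance and clipping ideas a bit more concrete, here is a minimal pure-Python sketch, not code from imagen-pytorch. The function names `guided_prediction` and `dynamic_threshold`, the flat list-of-floats "image", and the nearest-rank percentile are all assumptions made for illustration; a real implementation would operate on tensors and compute the percentile per sample.

```python
import math


def guided_prediction(cond, uncond, scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the text-conditioned one by `scale`."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def dynamic_threshold(x0, percentile=0.95):
    """Clip a predicted image to [-s, s] and divide by s, where s is a
    percentile of the absolute pixel values, floored at 1.0."""
    magnitudes = sorted(abs(v) for v in x0)
    # Nearest-rank percentile (an illustrative simplification).
    rank = max(0, math.ceil(percentile * len(magnitudes)) - 1)
    s = max(magnitudes[rank], 1.0)
    return [max(-s, min(s, v)) / s for v in x0]


# Hypothetical toy values: high guidance scales push pixels outside [-1, 1],
# and thresholding pulls them back into range.
guided = guided_prediction(cond=[1.0, 2.0], uncond=[0.0, 1.0], scale=3.0)
clipped = dynamic_threshold([0.5, -2.0, 4.0, 1.0], percentile=0.75)
```

The point of the sketch is the interaction between the two steps: a large guidance scale tends to produce saturated predictions, and clipping against a data-dependent threshold instead of a fixed [-1, 1] range keeps them in bounds without flattening every pixel that overshoots.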