Google

A text-to-image diffusion model with an unprecedented degree of photorealism

Google researchers are giving DALL-E a run for its money:

Our key discovery is that generic large language models (e.g. T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model.
