Turns out DALL-E can read the seemingly gibberish writing it produces. It has built its own mini-language that is consistent between its text input space and its image output space:
DALL-E 2 has a secret language.
"Apoploe vesrreaitais" means birds.
"Contarra ccetnxniams luryca tanniounons" means bugs or pests.
The prompt: "Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons" gives images of birds eating bugs.
A thread (1/n)🧵
A pretty good hypothesis on how this arises (as an artifact of how the text input space is tokenized):
I took a look at the BPE encoding of the name DALL-E uses for birds.
It's "apo, plo, e</w>, ve, sr, re, ait, ais</w>".
Apo-didae & Plo-ceidae are families of birds, each with 100+ species. Apo-diformes is one of the largest orders of birds, with 400+ species.
Jun 1, 2022 · 12:11 AM UTC
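To make the tokenization hypothesis concrete, here is a minimal sketch of the greedy BPE merge loop used by CLIP-style tokenizers, which lowercase the text and mark the end of each word with `</w>`. It assumes DALL-E 2's text side tokenizes this way. The merge table `TOY_MERGES` is hypothetical: it is hand-built only so that it reproduces the split quoted above, whereas a real tokenizer learns tens of thousands of merges from data. The names `bpe` and `tokenize` are illustrative, not a library API.

```python
# Hypothetical merge table, ordered by priority (earlier = merged first).
# A real BPE vocabulary is learned from a corpus; this one is hand-built.
TOY_MERGES = [
    ("p", "l"), ("pl", "o"), ("a", "p"), ("ap", "o"),           # apoploe
    ("v", "e"), ("s", "r"), ("r", "e"),                           # vesrreaitais
    ("a", "i"), ("ai", "t"), ("ai", "s</w>"),
]
TOY_RANKS = {pair: rank for rank, pair in enumerate(TOY_MERGES)}


def get_pairs(symbols):
    """Return the set of adjacent symbol pairs in a word."""
    return {(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)}


def bpe(word, ranks):
    """Split one word into BPE pieces by repeatedly merging the best-ranked pair."""
    # Start from single characters, tagging the last one as end-of-word.
    symbols = list(word[:-1]) + [word[-1] + "</w>"]
    while len(symbols) > 1:
        pairs = get_pairs(symbols)
        # Lowest rank = highest merge priority; unknown pairs rank as infinity.
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no learned merge applies any more
        first, second = best
        merged, i = [], 0
        while i < len(symbols):
            # Merge every occurrence of the chosen pair, left to right.
            if i < len(symbols) - 1 and symbols[i] == first and symbols[i + 1] == second:
                merged.append(first + second)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


def tokenize(text, ranks=TOY_RANKS):
    """Lowercase, split on whitespace, and BPE-encode each word."""
    return [piece for word in text.lower().split() for piece in bpe(word, ranks)]


print(tokenize("Apoploe vesrreaitais"))
```

With a real learned vocabulary, pieces such as `apo` and `plo` would be shared with genuine words (for example bird family names), which is the overlap the hypothesis relies on.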





