Unsupervised learning is starting to work, largely due to the Transformer model (which is what powers e.g. GPT-2). Before, models could only learn on sequence lengths of a few thousand; now they can easily learn on tens of thousands.
Releasing the Sparse Transformer, a network which sets records at predicting what comes next in a sequence — whether text, images, or sound. Improvements to neural 'attention' let it extract patterns from sequences 30x longer than possible previously: openai.com/blog/sparse-trans…

Apr 23, 2019 · 6:22 PM UTC

Great work by @rewonfc @scottgray76 @AlecRad @ilyasut! The Sparse Transformer has become a core piece of OpenAI infrastructure, and we'll have some exciting results to share using it in upcoming months.
Releasing some work today with @scottgray76 @AlecRad and @ilyasut. Contains some simple adaptations for Transformers that extend them to long sequences.
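To give a rough sense of the kind of adaptation being described: one way to extend attention to long sequences is a strided sparse attention mask, where each position attends to a small local window plus a regular stride of earlier "summary" positions, rather than to every prior position. The sketch below is a minimal NumPy illustration under my own assumptions; the function names, the stride parameter, and the toy sizes are hypothetical and not the released implementation.

import numpy as np

def strided_attention_mask(seq_len, stride):
    # Each query position i may attend to: (a) itself and the previous
    # `stride` positions, and (b) earlier positions whose index is
    # congruent to stride - 1 (illustrative strided pattern).
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    causal = j <= i
    local = (i - j) < stride
    summary = (j % stride) == (stride - 1)
    return causal & (local | summary)

def masked_attention(q, k, v, mask):
    # Standard scaled dot-product attention; disallowed pairs get -inf
    # before the softmax so they receive zero weight.
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy usage: 16 positions, head dimension 8, stride 4 (hypothetical sizes).
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
out = masked_attention(q, k, v, strided_attention_mask(16, stride=4))

Because each row of the mask allows only O(stride + seq_len / stride) positions instead of all previous ones, the attention cost grows far more slowly with sequence length, which is the basic idea behind handling much longer sequences.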
Replying to @gdb
Congrats! Now we can make a novel instead of a paragraph?
Replying to @gdb
Will you train GPT-3 using a sparse transformer?