Unsupervised learning is starting to work, largely due to the Transformer model (which is what powers e.g. GPT-2). Before: could learn on sequence lengths of a few thousand. Now: can easily learn on tens of thousands.
Releasing the Sparse Transformer, a network which sets records at predicting what comes next in a sequence, whether text, images, or sound. Improvements to neural 'attention' let it extract patterns from sequences 30x longer than was previously possible: openai.com/blog/sparse-trans…
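For readers curious what a sparse attention pattern looks like in practice, here is a minimal sketch (not the released implementation) of a strided attention mask of the kind the blog post describes: each position attends to a local window of recent positions plus a strided set of earlier positions, so attention cost grows roughly as O(n·√n) instead of O(n²). The function name and parameters are illustrative, not from the Sparse Transformer code.

```python
# Illustrative sketch of a strided sparse-attention mask; this is an
# assumption-based example, not OpenAI's released kernels.
import numpy as np


def strided_sparse_mask(seq_len: int, stride: int) -> np.ndarray:
    """Boolean mask where mask[i, j] is True if position i may attend to j (j <= i)."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        # Local window: the last `stride` positions, including the current one.
        start = max(0, i - stride + 1)
        mask[i, start:i + 1] = True
        # Strided positions further back: j = i, i - stride, i - 2*stride, ...
        for j in range(i, -1, -stride):
            mask[i, j] = True
    return mask


if __name__ == "__main__":
    # Each row shows which earlier positions one query position can attend to.
    print(strided_sparse_mask(seq_len=16, stride=4).astype(int))
```

With a stride near √n, each position attends to only O(√n) others, which is the kind of reduction that lets attention scale to sequences tens of thousands of tokens long.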
Great work by @rewonfc @scottgray76 @AlecRad @ilyasut! The Sparse Transformer has become a core piece of OpenAI infrastructure, and we'll have some exciting results to share using it in the coming months.
Releasing some work today with @scottgray76 @AlecRad and @ilyasut. It contains some simple adaptations to the Transformer that extend it to long sequences.
Apr 23, 2019 · 6:44 PM UTC

