A surprising deep learning mystery:
Contrary to conventional wisdom, the performance of unregularized CNNs, ResNets, and transformers is non-monotonic: it improves, then worsens, then improves again as model size, data size, or training time increases.
openai.com/blog/deep-double-…
Isn't this the "double descent" phenomenon studied in arxiv.org/abs/1812.11118 and subsequent works?
That's the first citation in the blog post. Note also that that paper's first author, Mikhail Belkin, provided helpful discussions and feedback throughout this work, as acknowledged at the bottom of the post.
Dec 5, 2019 · 5:42 PM UTC