A surprising deep learning mystery: Contrary to conventional wisdom, the performance of unregularized CNNs, ResNets, and transformers is non-monotonic: it improves, then gets worse, then improves again with increasing model size, data size, or training time. openai.com/blog/deep-double-…
Replying to @TheGregYang @OpenAI
That's the first citation in the blog post. Note also that the first author, Mikhail Belkin, provided helpful discussions and feedback throughout this work, as mentioned at the bottom.

Dec 5, 2019 · 5:42 PM UTC

Hiding the citation in a hyperlink is the wrong way to do this. Readers might assume OpenAI was the first to discover this. The text of the blog post should assign credit liberally and explicitly, and explain that the (valuable!) contribution is showing this empirically for CNNs etc.