Statistics and deep learning give conflicting intuitions about how models should behave. It turns out that many deep learning models transition from acting the way statisticians think they should to the way deep learning practitioners think they should:
A surprising deep learning mystery: Contrary to conventional wisdom, performance of unregularized CNNs, ResNets, and transformers is non-monotonic: improves, then gets worse, then improves again with increasing model size, data size, or training time. openai.com/blog/deep-double-…

Dec 5, 2019 · 5:47 PM UTC

Replying to @gdb
I'm applying for my ML PhD next year, and developing robust theory for this phenomenon is exactly what I want to do! It's such a unique and high-consequence feature of deep learning that we can't explain generalization with traditional statistics.