When designing a deep learning model, whenever a decision must be made, the answer is simple: multiply your data by a weight matrix. For example, in an LSTM, need to forget some data? Just multiply by a weight matrix to get the forget gate. The system will learn what you meant.
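The forget gate described above can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular library's API: the weight matrix `W_f`, bias `b_f`, and the tiny dimensions are made up for the example. The gate is just `sigmoid(W_f · [h_prev; x_t] + b_f)`, producing per-dimension values in (0, 1) that scale how much of the cell state survives.

```python
import math

def sigmoid(x):
    # Squash a real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def forget_gate(W_f, b_f, h_prev, x_t):
    # Concatenate previous hidden state and current input,
    # multiply by the learned weight matrix, add bias, squash.
    v = h_prev + x_t  # the concatenation [h_{t-1}; x_t]
    return [sigmoid(sum(w * vi for w, vi in zip(row, v)) + b)
            for row, b in zip(W_f, b_f)]

# Hypothetical tiny example: 2-dim hidden state, 1-dim input.
W_f = [[0.5, -0.5, 1.0],
       [0.0,  1.0, -1.0]]
b_f = [0.0, 0.0]
f_t = forget_gate(W_f, b_f, h_prev=[1.0, 0.0], x_t=[2.0])
# Each entry of f_t is in (0, 1): near 1 keeps that cell value, near 0 forgets it.
```

Training adjusts `W_f` and `b_f` by gradient descent, which is the sense in which "the system will learn what you meant" — no one specifies which inputs to forget.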

Dec 5, 2018 · 9:51 PM UTC

It's remarkable that so much power comes from literally just multiplying by a matrix.
Replying to @gdb
@gdb I think the time is right for a small article on “how to design a deep NN”. Looking forward to someone like you or Andrej writing it
Replying to @gdb
I'm confused... isn't this already what deep learning is? Or am I missing something more experienced people know?
Replying to @gdb
+ some nonlinear activation that describes the decision (e.g. sigmoid for the forget gate). That a model can learn to forget inputs without us specifying what to forget is indeed remarkable.
Replying to @gdb
@gdb is it ever too late to learn computer science?