Does it have to be a dichotomy?
From Richard Sutton (incompleteideas.net/), an essay on the repeated historical finding that computational scale has always beaten cleverness in AI (and some commentary on why this is such a hard-to-accept fact): incompleteideas.net/IncIdeas…
Replying to @etzioni
Both are important! Look at GPT-2, for instance: that's a general-purpose architectural improvement (the Transformer) run at massive scale. One interesting point from the essay is that scale gets a bad rap, but going the other way, leaning on cleverness instead of compute, isn't a good way of fixing the problem either!

Mar 15, 2019 · 2:46 AM UTC

Replying to @gdb @etzioni
Sutton didn’t mean that priors or knowledge aren’t important, just that they’re far less important than computation in the long run. It isn’t a dichotomy; the point is “don’t run in front of the train,” and even if you have to, do it only temporarily, because the train will catch up fast.
Replying to @gdb @etzioni
Further, once we know how to solve a problem with more computation, more efficient methods often follow. E.g., VGG -> MobileNet. The rate-limiting step is the first solution.
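For concreteness, here's a rough back-of-the-envelope sketch of why the VGG -> MobileNet step saves so much: MobileNet swaps standard convolutions for depthwise-separable ones. The Python below is purely illustrative (the channel counts and function names are made up for this example, not taken from either paper):

```python
# Parameter count of one 3x3 conv layer, standard vs. depthwise-separable.
# A depthwise-separable conv factors the "KxK across all channels" filter into
# a per-channel KxK depthwise pass plus a 1x1 pointwise channel mix.
def standard_conv_params(c_in, c_out, k=3):
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k=3):
    return c_in * k * k + c_in * c_out  # depthwise + pointwise

c_in, c_out = 256, 256                  # illustrative channel counts
print(standard_conv_params(c_in, c_out))        # 589824
print(depthwise_separable_params(c_in, c_out))  # 67840, roughly 8.7x fewer
```

Nearly an order of magnitude fewer parameters per layer, but only after the expensive first solution showed what to aim for.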
Replying to @gdb @etzioni
Hmm, wouldn't call Transformers generic. They've got more moving parts & rely on more programming-language control constructs than, say, an LSTM or a multiplicative RNN. They've got weighted-context associative memory (self-attention) &, unlike an LSTM, more depends on clever autodiff under…
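To make the "weighted-context associative memory" point concrete, here is a minimal single-head self-attention sketch in NumPy (the shapes, names, and sqrt(d) scaling follow the standard Transformer formulation; nothing here is quoted from the thread):

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention: x is (seq_len, d_model), W* are (d_model, d)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv                # project tokens to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])         # similarity of every token with every other
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability for the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys: the "weighted context"
    return weights @ v                              # each position reads a mix of all values

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 16))                    # 5 tokens, d_model = 16
Wq, Wk, Wv = (rng.standard_normal((16, 8)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)                 # (5, 8): one context-mixed vector per token
```

The core is a few matrix multiplies and a softmax; the extra moving parts referred to above are presumably things like masking, multi-head reshaping, and positional handling, which sit around this kernel.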