For differentiable problems, there’s backpropagation. For everything else, there’s RL.
Not quite right. A more accurate statement would be "for everything else, there is gradient-free (zeroth-order) optimization." RL is when there is a sequential decision process and what you see depends on previous actions you took.
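To make the distinction concrete, here is a minimal sketch of a zeroth-order update in the sense used above; the quadratic objective f, the step sizes, and everything else in it are illustrative assumptions, not anything from the thread:

    import numpy as np

    # Zeroth-order (gradient-free) optimization: improve x using only
    # function evaluations, never a true gradient. The quadratic f is an
    # arbitrary stand-in for any non-differentiable black box.
    def f(x):
        return np.sum((x - 3.0) ** 2)

    rng = np.random.default_rng(0)
    x = np.zeros(5)
    lr, sigma = 0.05, 0.1

    for _ in range(500):
        u = rng.standard_normal(x.shape)  # random probe direction
        # two-point finite-difference estimate of the directional derivative
        g = (f(x + sigma * u) - f(x - sigma * u)) / (2.0 * sigma) * u
        x -= lr * g  # descend along the estimate

    print(f(x))  # approaches 0 with no backprop anywhere

Note that nothing here is sequential: each pair of probes is independent, and no observation depends on a previous action.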
Replying to @ylecun
We use different definitions of RL. In mine, any problem can be phrased as a one-step MDP (such as in arxiv.org/abs/1611.01578), and zeroth-order optimization is a special case. We can debate definitions, but I use mine because algos like PPO are doing RL regardless of the MDP used.

Feb 3, 2019 · 5:59 PM UTC

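And the one-step-MDP reading of the same black box: wrap it in a Gaussian "policy" over actions x, take reward r = -f(x), and apply a REINFORCE-style score-function update. Again, f, the baseline, and the hyperparameters below are illustrative assumptions:

    import numpy as np

    # One-step MDP: a Gaussian policy N(mu, sigma^2 I) over actions x,
    # reward r = -f(x), episode ends immediately. Only mu is learned.
    def f(x):
        return np.sum((x - 3.0) ** 2)

    rng = np.random.default_rng(0)
    mu = np.zeros(5)
    sigma, lr = 0.1, 0.01
    baseline = 0.0  # running average of reward, reduces variance

    for _ in range(3000):
        x = mu + sigma * rng.standard_normal(mu.shape)  # sample one action
        r = -f(x)                                       # one-step episode
        baseline += 0.05 * (r - baseline)
        # grad of log N(x; mu, sigma^2 I) w.r.t. mu is (x - mu) / sigma**2
        mu += lr * (r - baseline) * (x - mu) / sigma**2  # REINFORCE ascent

    print(f(mu))  # the policy mean converges toward the optimum

In expectation, this score-function update follows roughly the same smoothed gradient as the two-point estimator in the earlier sketch, which is the sense in which zeroth-order optimization falls out as a special case of the one-step-MDP view.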
Replying to @gdb @ylecun
Someone should really clear up the definition of RL. For some it’s sequential decision MDPs, for some it’s a collection of algorithms, for some it’s anything that involves an agent/environment loop, …
Replying to @gdb @ylecun
What about AGI, is it differentiable? Many people agree it is not RL, since we don't have the luxury of so many attempts
Replying to @gdb @ylecun
Please do it.