For differentiable problems, there’s backpropagation. For everything else, there’s RL.

Jan 31, 2019 · 5:11 PM UTC

Replying to @gdb
This thread confuses problems with algorithms. Going by its name, "Reinforcement Learning" is learning from (possibly delayed) bandit feedback: you only receive feedback on the actions you take, and that feedback does not tell you what the correct action would have been.
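A minimal sketch of that distinction, using a made-up three-armed bandit with epsilon-greedy action selection (the arm means, epsilon, and step counts are all illustrative, not from the thread):

```python
import random

random.seed(0)

true_means = [0.2, 0.5, 0.8]   # hidden from the learner
estimates = [0.0, 0.0, 0.0]    # learner's running reward estimates
counts = [0, 0, 0]

for t in range(5000):
    # Epsilon-greedy: mostly exploit the current best estimate, sometimes explore.
    if random.random() < 0.1:
        a = random.randrange(3)
    else:
        a = max(range(3), key=lambda i: estimates[i])

    # Bandit feedback: a noisy reward for the chosen arm only.
    # Nothing here reveals which arm would have been correct.
    r = true_means[a] + random.gauss(0, 0.1)

    counts[a] += 1
    estimates[a] += (r - estimates[a]) / counts[a]  # incremental mean

best = max(range(3), key=lambda i: estimates[i])
```

The learner never observes a label saying "arm 2 was right"; it can only infer which arm is best from the rewards of the arms it actually pulled.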
Replying to @gdb
For everything that can be simulated, there's RL.
Replying to @gdb
Probably just REINFORCE, which is not specific to reinforcement learning but is also used for non-differentiable objective functions.
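A minimal sketch of REINFORCE (the score-function gradient estimator) applied to a non-differentiable objective — here a made-up reward of −|x − 3| optimized through a Gaussian sampling distribution; the objective, learning rate, and step counts are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(x):
    # Non-differentiable objective: abs() has no gradient at its kink,
    # but REINFORCE never differentiates through it.
    return -abs(x - 3.0)

mu, sigma, lr = 0.0, 1.0, 0.05
baseline = 0.0  # running reward average, reduces estimator variance

for _ in range(3000):
    x = rng.normal(mu, sigma)  # sample an "action" from the policy
    r = reward(x)
    # Score function of a Gaussian: d/dmu log N(x; mu, sigma) = (x - mu) / sigma^2.
    # Only the policy's log-density is differentiated, never the reward.
    mu += lr * (r - baseline) * (x - mu) / sigma**2
    baseline += 0.1 * (r - baseline)
```

After training, `mu` sits near the optimum at 3.0 even though the objective itself was never differentiated — which is why the estimator also shows up outside RL proper.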
Replying to @gdb
When you only have a hammer...
Replying to @gdb
I've always thought of RL (where the difference between estimated and observed returns is computed) as performing gradient descent, even in a plain MDP. SGD for supervised problems is a special case of RL backup rules, not the other way around.
Replying to @gdb
Doesn't RL still use backpropagation, though?