That feeling when your agent learns a task so well it crashes your code:
...
Mean reward: 0.9
Mean reward: 0.94
Mean reward: 1.0
./learn.py:122: RuntimeWarning: invalid value encountered in true_divide
A = (R- np.mean(R)) / np.std(R)
Mean reward: -0.02
...
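What the log shows: once every reward in the batch is identical (mean reward 1.0), `np.std(R)` is 0, so the normalization computes 0/0 and the advantages become NaN. A minimal sketch of one common guard, assuming `R` is a 1-D array of rewards; the function name `normalize_advantages` and the `eps` value are illustrative, not taken from the original `learn.py`:

```python
import numpy as np

def normalize_advantages(R, eps=1e-8):
    """Standardize rewards to zero mean and (roughly) unit variance.

    eps is an assumed small constant that keeps the division finite
    when all rewards are equal and the standard deviation is 0.
    """
    R = np.asarray(R, dtype=np.float64)
    return (R - np.mean(R)) / (np.std(R) + eps)

# A batch where the agent earns the same reward on every episode.
perfect = np.ones(5)
A = normalize_advantages(perfect)  # finite values instead of NaN
```

With this guard a perfect batch yields all-zero advantages, so that update probably contributes no gradient rather than poisoning the weights with NaN.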
Sep 20, 2018 · 6:38 PM UTC

