That feeling when your agent learns a task so well it crashes your code:

...
Mean reward: 0.9
Mean reward: 0.94
Mean reward: 1.0
./learn.py:122: RuntimeWarning: invalid value encountered in true_divide
  A = (R- np.mean(R)) / np.std(R)
Mean reward: -0.02
...

Sep 20, 2018 · 6:38 PM UTC
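Why a perfect score breaks things: the warning points at advantage normalization, A = (R - mean(R)) / std(R). Once every episode in a batch earns the same reward (mean reward 1.0), std(R) is exactly 0, so each entry becomes 0/0 = NaN. NaN advantages then presumably poison the gradient update, which would fit the collapse to -0.02 that follows. The sketch below reproduces the NaN and shows one common guard, adding a small epsilon to the denominator. The function names and the eps value are hypothetical illustrations, not taken from learn.py.

```python
import numpy as np


def normalize_advantages_unsafe(R):
    """Form from the warning: 0/0 -> NaN when every reward in R is identical."""
    R = np.asarray(R, dtype=np.float64)
    return (R - np.mean(R)) / np.std(R)


def normalize_advantages(R, eps=1e-8):
    """Hypothetical guarded version: eps (an assumed value) keeps the division finite."""
    R = np.asarray(R, dtype=np.float64)
    return (R - np.mean(R)) / (np.std(R) + eps)


# A batch where the agent has "solved" the task: every episode earns reward 1.0.
perfect = [1.0, 1.0, 1.0, 1.0]

with np.errstate(invalid="ignore"):  # silence the 0/0 RuntimeWarning for the demo
    print(normalize_advantages_unsafe(perfect))  # [nan nan nan nan]

print(normalize_advantages(perfect))  # [0. 0. 0. 0.]
```

With the guard, a batch of identical rewards yields all-zero advantages (no learning signal from that batch) instead of NaNs. For batches with varied rewards the epsilon changes the result only negligibly.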

Replying to @gdb
Come on, don't tease us like this all the time, give us something to watch.