I don't see any problem whatsoever with manipulating / randomizing the simulation. Domain randomization seems HUGELY important for transfer. Just make sure the "real environment" is within the training distribution of environments, and you're set, with extra robustness!
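(A minimal sketch of what per-episode randomization looks like; the toy point-mass environment and the parameter ranges below are invented for illustration, not OpenAI's actual setup.)

```python
import random

class PointMassEnv:
    """Toy 1-D point mass; the goal is to push it toward the origin."""
    def __init__(self, mass=1.0, friction=0.1):
        self.mass, self.friction = mass, friction

    def reset(self):
        self.pos, self.vel = random.uniform(-1.0, 1.0), 0.0
        return self.pos

    def step(self, force, dt=0.05):
        accel = (force - self.friction * self.vel) / self.mass
        self.vel += accel * dt
        self.pos += self.vel * dt
        return self.pos, -abs(self.pos)    # reward: closer to origin is better

def randomized_env():
    """Domain randomization: sample fresh dynamics for every episode,
    so a policy can only succeed by being robust across the whole family."""
    return PointMassEnv(mass=random.uniform(0.5, 2.0),
                        friction=random.uniform(0.0, 0.3))

# Training-loop skeleton: the learner never sees one fixed simulator.
for episode in range(1000):
    env = randomized_env()
    obs = env.reset()
    for _ in range(100):
        action = -obs                      # placeholder for a learned policy
        obs, reward = env.step(action)
```

If the real system's mass and friction fall inside those sampled ranges, this matches the "real environment within the training distribution" picture above.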
Very interesting/fun solution to a specific problem, but quite far from how I imagine it *should* work. First there was reward shaping, now it's… state-space shaping?!
Replying to @catherineols
One thing that's interesting about blog.openai.com/learning-dex… — the real world *isn't* actually in the distribution of randomized simulations! Makes domain randomization feel more akin to a regularizer. The fact it works for both Dota and robotics feels important.

Aug 7, 2018 · 5:47 AM UTC

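(Continuing the toy sketch above, still purely hypothetical: one way to see the regularizer-like behavior is to evaluate on dynamics deliberately *outside* the randomization ranges, since nothing guarantees the real system is covered.)

```python
# Reuses PointMassEnv / randomized_env from the sketch above.
real_env = PointMassEnv(mass=3.5, friction=0.5)   # outside the sampled ranges

def evaluate(env, policy, steps=100):
    """Roll out a policy and sum its rewards."""
    obs, total = env.reset(), 0.0
    for _ in range(steps):
        obs, reward = env.step(policy(obs))
        total += reward
    return total

# A controller that had to work across the whole randomized family (here a
# trivial proportional controller stands in for a trained policy) can still
# do tolerably well out of distribution; the randomization discouraged
# solutions that exploit any single simulator's quirks.
print(evaluate(real_env, policy=lambda obs: -obs))
```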
Replying to @gdb
TBH what I really want is domain randomization for aligned* agents. Train a helper in a simulated world of randomized "user" agents. If it's helpful to many such "users", it's more likely to be helpful to human users! @geoffreyirving (*aligned = actively trying to understand & help the user)
The problem, of course, is knowing what properties the simulated "users" need to have - what distribution they have to come from - to ensure robust transfer. But humans not having to be literally inside that distribution for the approach to work would be a really nice plus.
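(A toy rendering of what that proposal might look like; the SimulatedUser model, its preference/noise parameters, and the satisfaction score are all invented for illustration.)

```python
import random

class SimulatedUser:
    """A "user" drawn from a randomized family: a hidden preference plus
    a noise level controlling how garbled their requests come out."""
    def __init__(self):
        self.preference = random.uniform(-1.0, 1.0)  # what they actually want
        self.noise = random.uniform(0.0, 0.5)        # how noisily they ask for it

    def request(self):
        return self.preference + random.gauss(0.0, self.noise)

    def satisfaction(self, action):
        return -abs(action - self.preference)        # 0 means perfectly helpful

def helper_policy(request):
    """Placeholder helper that takes the request literally; a trained,
    "aligned" helper would instead infer the preference behind it."""
    return request

# Score the helper across many randomized users. A helper that does well
# over this whole family is, hopefully, more likely to be helpful to human
# users too, even if humans aren't literally inside the distribution.
scores = [u.satisfaction(helper_policy(u.request()))
          for u in (SimulatedUser() for _ in range(10_000))]
print(f"mean helpfulness across randomized users: {sum(scores)/len(scores):.3f}")
```

The open question from the tweet above maps directly onto the `SimulatedUser` constructor: which properties, and which distribution over them, the randomized "users" need for the transfer to be robust.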