This is very common. Reinforcement learning really does have a problem with generalization, and I think addressing it will be an important area of reinforcement learning research in the years to come.
It reminds me a little bit of training a simulated racing car to drive (variously with evolution, TD-family RL, and supervised learning). If it had never trained for a certain situation (such as colliding with a wall), it would simply never learn to deal with that situation.
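To make that failure mode concrete, here is a minimal sketch. The racing-car numbers are hypothetical, and a polynomial fit stands in for a small neural network; the point is only that a policy fit on one state distribution gives essentially arbitrary outputs outside it.

```python
# Hypothetical sketch: a "policy" trained only on states it saw during
# training extrapolates arbitrarily on states it never encountered.
import numpy as np

rng = np.random.default_rng(0)

# Training states: distance-to-wall in [5, 50] metres; the "expert"
# steering target is a simple function of that distance.
train_dist = rng.uniform(5.0, 50.0, size=1000)
train_steer = 1.0 / train_dist + rng.normal(0.0, 0.01, size=1000)

# A cubic polynomial as a stand-in for a small neural network.
coeffs = np.polyfit(train_dist, train_steer, deg=3)

# In-distribution: the prediction tracks the expert closely.
print(np.polyval(coeffs, 20.0), "vs expert", 1.0 / 20.0)

# Out-of-distribution (about to hit the wall): the extrapolated output
# is essentially meaningless -- the policy never learned this situation.
print(np.polyval(coeffs, 0.5), "vs expert", 1.0 / 0.5)
```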
How can we humans deal with situations we have not trained for? By reasoning about them. Simulating a sequence of actions in our head. System 2 thinking, in Kahneman terminology. Tree search using a forward model, in classic AI terminology. Which brings me to my second point.
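For readers who want that idea concrete, here is a minimal sketch of such planning: a depth-limited tree search over a forward model. The toy `step()` dynamics are hypothetical; the point is only that this kind of "simulating in your head" requires a `step(state, action)` function, which Go engines have and (as the next point notes) the Dota bots do not.

```python
# Minimal sketch of "System 2" planning: depth-limited tree search
# over a forward model, on toy (hypothetical) dynamics.
def step(state, action):
    """Forward model: returns (next_state, reward). Toy goal: reach 10."""
    next_state = state + action
    reward = -abs(next_state - 10)
    return next_state, reward

ACTIONS = (-1, 0, 1)

def plan(state, depth):
    """Return (best_value, best_first_action) by simulating `depth` steps ahead."""
    if depth == 0:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for action in ACTIONS:
        next_state, reward = step(state, action)
        future_value, _ = plan(next_state, depth - 1)
        value = reward + future_value
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action

# Without step(), none of this lookahead is possible.
print(plan(0, depth=5))
```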
It's important to note that the setup here is very unlike, e.g., Go, where AlphaGo (and every other agent) can simulate the effects of its actions: it has a forward model. In Dota, that's not the case.
Any "long-term planning" is something that the neural networks have had to learn, because we don't have a fast simulation of Dota available. Thus, it is impressive that the bots can do any long-term planning at all.
It's clear that the bots are much better at micro / (very) short-term planning, executing perfect combat on the second-to-second level. People barely even remark on this, because they are used to video game bots being better on micro than macro.
But the OpenAI bots seem to be doing some kind of long-term planning. Or do they? Have they just learned some behaviors that look like long-term planning, never actually playing the long-term plan out in their "heads"?
That horizon is a 14-minute half-life on rewards, not a hard cutoff!
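For scale, here is a rough calculation of what a 14-minute half-life implies for the per-step discount factor gamma. The decision rate below is an assumption for illustration (the bots are reported to act on roughly every 4th frame of a 30 fps game; treat that number as approximate).

```python
# Back-of-the-envelope: gamma is set so a reward 14 minutes away is
# weighted at 0.5 -- an exponential decay, not a hard planning cutoff.
import math

half_life_s = 14 * 60   # 14-minute half-life, in seconds
dt = 4 / 30             # seconds per decision (assumed: every 4th frame at 30 fps)

# gamma ** (half_life_s / dt) == 0.5  =>
gamma = 0.5 ** (dt / half_life_s)
print(gamma)            # ~0.99989 per decision step
```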