You make what you measure. Procgen Benchmark lets you directly measure how well and how quickly an RL agent learns generalizable skills. We've found that when trained on fewer than ~500-1000 levels, today's algorithms memorize those levels rather than learn something general.
We're releasing Procgen Benchmark, 16 procedurally-generated environments for measuring how quickly a reinforcement learning agent learns generalizable skills. This has become the standard research platform used by the OpenAI RL team: openai.com/blog/procgen-benc…

Dec 3, 2019 · 5:19 PM UTC
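
The memorize-versus-generalize claim above comes down to a train/test split over levels: train on a fixed, finite set of levels, then evaluate on the unrestricted level distribution. Below is a minimal sketch of that setup, not the OpenAI team's actual experiment code. It assumes the `num_levels`, `start_level`, and `distribution_mode` options of the open-source `procgen` package (where `num_levels=0` means unlimited levels). The helper names `make_split_kwargs` and `generalization_gap` are hypothetical and exist only for illustration.

```python
# Sketch only: builds keyword arguments for a finite-level training
# environment and an unrestricted test environment. No RL training here.

# Environment id for one of the 16 Procgen games (CoinRun), in the form
# the procgen package registers with gym.
ENV_ID = "procgen:procgen-coinrun-v0"


def make_split_kwargs(num_train_levels, start_level=0, distribution_mode="hard"):
    """Return (train_kwargs, test_kwargs) for a generalization experiment.

    Hypothetical helper: the training env is restricted to
    `num_train_levels` levels; the test env samples from all levels.
    """
    if num_train_levels <= 0:
        raise ValueError("num_train_levels must be a positive integer")
    train_kwargs = dict(
        num_levels=num_train_levels,  # finite training set of levels
        start_level=start_level,      # seed of the first training level
        distribution_mode=distribution_mode,
    )
    test_kwargs = dict(
        num_levels=0,                 # 0 = unlimited levels (full distribution)
        start_level=0,
        distribution_mode=distribution_mode,
    )
    return train_kwargs, test_kwargs


def generalization_gap(mean_train_return, mean_test_return):
    """Train return minus test return; a large gap suggests memorization."""
    return mean_train_return - mean_test_return


# With gym and procgen installed, the environments could be created as:
#   import gym
#   train_kwargs, test_kwargs = make_split_kwargs(500)
#   train_env = gym.make(ENV_ID, **train_kwargs)
#   test_env = gym.make(ENV_ID, **test_kwargs)
```

Sweeping `num_train_levels` and watching where the gap closes is one way to see the ~500-1000 level threshold mentioned in the tweet.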

Replying to @gdb
Bleeding edge amazing! Seems a good solution for a system with far more capable memory than humans. Perhaps not ideal in an analog environment outside of a computer. True generalization may be more important in the endless unforeseeable edge cases of real world interaction.