I'm thrilled to let everyone know that I've joined OpenAI as a research scientist on AI policy. I'm excited to work with such a great team of researchers towards the goal of making sure that AI is beneficial for all. blog.openai.com/openai-chart…
An agent which learned to play Mario without rewards. Instead, it was incentivized to avoid "boredom" (that is, getting into states where it can predict what will happen next). Discovered warp levels, how to defeat bosses, etc. More details: blog.openai.com/reinforcemen…
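To make the "boredom" idea concrete, here is a minimal sketch of a prediction-error bonus. It is not the method from the linked post: the `CuriosityBonus` class, the linear forward model, and the learning rate are all assumptions chosen for illustration. The agent is paid the error of its own next-state prediction, so a transition it can already predict earns almost nothing.

```python
import numpy as np

class CuriosityBonus:
    """Hypothetical linear forward model; the bonus is its squared prediction error."""

    def __init__(self, state_dim, lr=0.1):
        self.W = np.zeros((state_dim, state_dim))  # predicts next_state as W @ state
        self.lr = lr

    def reward(self, state, next_state):
        state = np.asarray(state, dtype=np.float64)
        next_state = np.asarray(next_state, dtype=np.float64)
        error = next_state - self.W @ state
        bonus = float(np.mean(error ** 2))
        # One gradient step on the squared error, so the same transition
        # becomes more predictable (more "boring") each time it is seen.
        self.W += self.lr * np.outer(error, state)
        return bonus

# Revisiting one transition: the bonus shrinks with every visit.
model = CuriosityBonus(state_dim=2)
bonuses = [model.reward([1.0, 0.0], [0.0, 1.0]) for _ in range(5)]
```

With no external reward at all, the only way to keep earning the bonus is to reach transitions the model has not yet learned to predict.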
Performance on Montezuma's Revenge, on which the 2015 DQN algorithm famously achieved a score of 0. We've now developed an algorithm which exceeds human performance: blog.openai.com/reinforcemen…
Congrats to my co-founder @ilyasut for winning the Market for Intelligence award for ideas. Ilya is a visionary and working with him is one of the best parts of my job:
A lot of OpenAI's AGI safety research focuses on specifying goals without reward functions, since it's clear that we can't hand-program objective functions for complex real-world tasks. One potential paradigm for avoiding reward functions:
Iterated Amplification: An AI safety technique that has the potential to let us specify behaviors and goals that are beyond human scale: blog.openai.com/amplifying-a…
Now accepting applications for our second OpenAI Scholars program. Diversity is crucial for ensuring that AI benefits everyone: nitter.vloup.ch/openai/status/10…
Apply for our Winter 2019 OpenAI Scholars Program, open to individuals from underrepresented groups in STEM interested in becoming deep learning practitioners: blog.openai.com/openai-schol…
Want to get into machine learning, but don't have a pre-existing background? The OpenAI Fellows program is a great way for people from adjacent fields (software engineering, science, etc.) to become machine learning practitioners:
Have been making serious progress on OpenAI Five. Looking for Dota teams with average MMR 6.5k+ or Tier 3 and higher to test the latest version. Ping brooke@openai.com if you’d like to try it out!
Lots of cool projects being presented tonight at OpenAI Scholars Demo Day (mobile.twitter.com/openai/st…). Really impressed with the progress everyone made in 3 months of study.
Neural waltzes, semantic trees, CycleGANs, and more - details on OpenAI Scholars' 2018 Final Projects: blog.openai.com/openai-schol…
Demo day in San Francisco on September 20th!
That feeling when your agent learns a task so well it crashes your code:
...
Mean reward: 0.9
Mean reward: 0.94
Mean reward: 1.0
./learn.py:122: RuntimeWarning: invalid value encountered in true_divide
A = (R- np.mean(R)) / np.std(R)
Mean reward: -0.02
...
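A likely reading of the log above: once every episode scores 1.0, `np.std(R)` is 0, so the normalization computes 0/0 and produces NaNs that wreck the next update. Below is a minimal sketch of the usual guard, assuming `R` is a 1-D batch of rewards; the function name `normalize_advantages` and the `eps` value are illustrative, not taken from the original `learn.py`.

```python
import numpy as np

def normalize_advantages(rewards, eps=1e-8):
    """Center and scale rewards, guarding against a zero standard deviation."""
    rewards = np.asarray(rewards, dtype=np.float64)
    centered = rewards - rewards.mean()
    # When every reward is identical (e.g. all 1.0), std is 0 and the
    # unguarded division yields NaN; adding eps keeps the result finite.
    return centered / (rewards.std() + eps)

perfect = normalize_advantages([1.0, 1.0, 1.0, 1.0])  # all zeros, no NaN
mixed = normalize_advantages([0.0, 1.0])              # roughly [-1, 1]
```

With the guard, a perfect batch simply contributes zero advantage instead of corrupting the parameters.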
(Also, "at scale" is a very relative term. After two doublings, our latest Five model has 4096 units and consumes roughly the same amount of compute per second as a honeybee.)