We're now on our third generation of systems that learn from human preferences (cf. openai.com/blog/deep-reinfor…, openai.com/blog/fine-tuning-…). I'm hopeful that this approach will ultimately help align powerful AI systems without needing to explicitly write down "what humans want".
A very rare bit of research that is directly, straight-up relevant to real alignment problems! They trained a reward function on human preferences AND THEN measured how hard you could optimize against the trained function before the results actually got worse.

Sep 4, 2020 · 7:08 PM UTC
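For readers who want the mechanics, here is a minimal sketch of the two-step recipe the tweet describes, under toy assumptions: the RewardModel class, feature dimension, base policy, and synthetic "true reward" below are all illustrative, not OpenAI's setup. Step 1 fits a reward model on pairwise preferences with a Bradley-Terry loss; step 2 optimizes against the frozen model via best-of-n sampling, raising n to probe where the learned proxy and the underlying objective come apart.

```python
# A minimal sketch, NOT OpenAI's code: names, shapes, and the synthetic
# "true reward" are illustrative assumptions only.
import torch
import torch.nn as nn

DIM = 16  # toy feature dimension standing in for a model output

class RewardModel(nn.Module):
    def __init__(self, dim: int = DIM):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 32), nn.Tanh(), nn.Linear(32, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

def preference_loss(rm, preferred, rejected):
    # Bradley-Terry: maximize log sigmoid(r(preferred) - r(rejected)).
    return -torch.nn.functional.logsigmoid(rm(preferred) - rm(rejected)).mean()

def best_of_n(rm, sample, n):
    # "Optimize harder" by raising n: keep the candidate the learned
    # reward model scores highest out of n draws from the base policy.
    candidates = torch.stack([sample() for _ in range(n)])
    return candidates[rm(candidates).argmax()]

if __name__ == "__main__":
    torch.manual_seed(0)
    true_reward = lambda x: -x.pow(2).sum(-1)  # stand-in for human judgment
    sample = lambda: torch.randn(DIM)          # stand-in base policy

    # Fit the reward model on a limited batch of synthetic preference
    # pairs labeled by the hidden "true" reward, so the proxy is imperfect.
    rm = RewardModel()
    rm_opt = torch.optim.Adam(rm.parameters(), lr=1e-3)
    for _ in range(300):
        a, b = torch.randn(32, DIM), torch.randn(32, DIM)
        pref = (true_reward(a) > true_reward(b)).unsqueeze(-1)
        rm_opt.zero_grad()
        preference_loss(rm, torch.where(pref, a, b), torch.where(pref, b, a)).backward()
        rm_opt.step()

    # Probe over-optimization: true reward of the RM's best-of-n picks as n grows.
    for n in (1, 4, 16, 64, 256, 1024):
        picks = torch.stack([best_of_n(rm, sample, n) for _ in range(100)])
        print(f"n={n:5d}  true reward of RM-chosen samples: {true_reward(picks).mean().item():+.3f}")
```

Best-of-n stands in here for the heavier RL optimization in the actual work; the diagnostic is the same idea: apply increasing optimization pressure against the learned proxy while scoring the results against the ground truth.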

Replying to @gdb
Let's clear all pending access requests on this nice occasion 😊😇 #gpt3