One common AI safety concern is that coding a full set of values, with all their nuances and trade-offs, into an AI system may be intractable. So we want to develop techniques for machines to learn from humans in natural language. A step in that direction:
We've fine-tuned GPT-2 using human feedback for tasks such as summarizing articles, matching the preferences of human labelers (if not always our own). We're hoping this brings safety methods closer to machines learning values by talking with humans. openai.com/blog/fine-tuning-…

Sep 19, 2019 · 4:17 PM UTC
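For context, the "human feedback" step the tweet mentions typically works by training a reward model on pairwise comparisons: a labeler picks which of two summaries is better, and the model learns to score the preferred one higher; that learned reward then guides RL fine-tuning of the language model. Below is a minimal sketch of that pairwise preference loss. Everything in it (the RewardModel class, the pooled-feature input) is a hypothetical illustration, not OpenAI's actual code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    # Maps a pooled representation of a summary to a scalar reward.
    def __init__(self, embed_dim: int = 768):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, embed_dim) pooled summary representation
        return self.score(features).squeeze(-1)

def preference_loss(model: RewardModel,
                    preferred: torch.Tensor,
                    rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry pairwise loss: raise the reward of the summary the
    # labeler chose relative to the one they passed over.
    return -F.logsigmoid(model(preferred) - model(rejected)).mean()

# The trained reward model then serves as the objective when the
# language model itself is fine-tuned with reinforcement learning
# (e.g. PPO), rather than hand-coding the values directly.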

Replying to @gdb
That is an important step! But no panacea: coding values into human beings via natural language has often had terrible outcomes. Nazism, Maoism, Stalinism, etc. were results of human deontology. Without better models of consequences, human morality may be endangering human survival.