An approach to better align language models with human preferences:
We've used reinforcement learning from human feedback to train language models for summarization. The resulting models produce better summaries than 10x larger models trained only with supervised learning: openai.com/blog/learning-to-…
Sep 4, 2020 · 4:28 PM UTC
4
10
81



