Nice result: hinting to a language model that it's ok to show its work (by adding "Let's think step by step" to the prompt) rather than directly outputting the final answer massively boosts results on reasoning benchmarks. True of humans too.
Large Language Models are Zero-Shot Reasoners
Simply adding “Let’s think step by step” before each answer increases the accuracy on MultiArith from 17.7% to 78.7% and GSM8K from 10.4% to 40.7% with GPT-3.
arxiv.org/abs/2205.11916
May 27, 2022 · 5:00 PM UTC
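A minimal sketch of the trick in Python, showing how the prompts differ. It assumes a two-stage format: one call elicits the reasoning, and a second call extracts the final answer. The answer-extraction wording ("Therefore, the answer (arabic numerals) is") is my reading of the linked paper and may not match it exactly. `generate` is a hypothetical stand-in for any text-completion call, so no real model API is used.

```python
from typing import Callable

# Hypothetical stand-in for any LLM completion call: prompt in, text out.
Generate = Callable[[str], str]


def direct_prompt(question: str) -> str:
    # Baseline: ask for the final answer immediately, with no room to show work.
    return f"Q: {question}\nA: The answer (arabic numerals) is"


def reasoning_prompt(question: str) -> str:
    # Stage 1: the trigger phrase invites the model to write out its steps.
    return f"Q: {question}\nA: Let's think step by step."


def answer_prompt(question: str, reasoning: str) -> str:
    # Stage 2: feed the generated reasoning back and ask for just the answer.
    return (
        f"{reasoning_prompt(question)} {reasoning}\n"
        "Therefore, the answer (arabic numerals) is"
    )


def zero_shot_cot(question: str, generate: Generate) -> str:
    # Two model calls: first to produce the reasoning, then to read off the answer.
    reasoning = generate(reasoning_prompt(question))
    return generate(answer_prompt(question, reasoning)).strip()


if __name__ == "__main__":
    # Canned replies instead of a real model, just to show the flow.
    replies = iter(["16 / 2 = 8 golf balls.", " 8"])
    q = "A juggler has 16 balls and half are golf balls. How many golf balls?"
    print(zero_shot_cot(q, lambda prompt: next(replies)))  # prints 8
```

The only change from the direct baseline is the extra sentence that gives the model room to write out intermediate steps before committing to a number.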