Paper Summary: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Please note: this post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.


WHAT

Chain-of-Thought (CoT) prompting is a technique in which the model's output includes a series of intermediate reasoning steps before the final answer.

[Figure: Chain-of-Thought prompting example, from the paper]

WHY

Because even very large language models struggle with tasks that require multiple steps of reasoning and/or symbolic manipulation, and simply scaling up the parameter count doesn't seem to help much.

HOW

Using few-shot prompting: one adds exemplars in which the answer is preceded by the reasoning steps that lead to it, and the model imitates that format when answering the new question.
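A minimal sketch of how such a few-shot CoT prompt can be assembled. The exemplar below is the "tennis balls" example from the paper's Figure 1; the helper function and the `Q:`/`A:` formatting are my own illustration, not the paper's exact code.

```python
# Sketch: building a few-shot Chain-of-Thought prompt.
# The paper uses 8 hand-written exemplars; one is shown here for brevity.

COT_EXEMPLARS = [
    {
        "question": "Roger has 5 tennis balls. He buys 2 more cans of tennis "
                    "balls. Each can has 3 tennis balls. How many tennis "
                    "balls does he have now?",
        "reasoning": "Roger started with 5 balls. 2 cans of 3 tennis balls "
                     "each is 6 tennis balls. 5 + 6 = 11.",
        "answer": "11",
    },
]

def build_cot_prompt(exemplars, new_question):
    """Concatenate (question, reasoning, answer) exemplars, then the new question."""
    parts = []
    for ex in exemplars:
        parts.append(f"Q: {ex['question']}\n"
                     f"A: {ex['reasoning']} The answer is {ex['answer']}.")
    parts.append(f"Q: {new_question}\nA:")  # the model completes from here
    return "\n\n".join(parts)

prompt = build_cot_prompt(
    COT_EXEMPLARS,
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?",
)
print(prompt)
```

The resulting string would be sent as-is to a pre-trained model; because every exemplar shows reasoning before the answer, the completion tends to follow the same pattern.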

CLAIMS/QUOTES

  • Types of tasks amenable to CoT: "... chain-of-thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks"

  • Interpretability: CoT may add some level of interpretability into how the model arrives at an answer. The authors note this needs more study.

  • CoT can be added to any model without retraining or fine-tuning: it can be elicited from any pre-trained LLM simply by including CoT exemplars in a few-shot prompt.

  • Emergent behavior: CoT only works in large models: "... chain-of-thought prompting does not positively impact performance for small models, and only yields performance gains when used with models of ∼100B parameters."

    • In smaller models, the chains produced were "fluent but illogical", actually making results worse than with normal prompting.
  • Performance vs Special-purpose models: Vanilla LLMs with CoT in-context learning outperform LLMs that have been fine-tuned to excel in specific domains (e.g. GPT-3 fine-tuned on math).

  • The more complex the task, the more CoT helps: performance gains are larger for tasks that require multi-step reasoning, such as logic and commonsense tasks.

NOTES

  • The CoT need not be shown to the user; it can simply be hidden, with only the final answer surfaced.

  • The experiments used just 8 examples of CoT in the context, for few-shot prompting.

  • A related zero-shot approach is to simply append "let's think step-by-step" to the prompt, without any few-shot exemplars. This was proposed by Kojima et al. (2022) in "Large Language Models are Zero-Shot Reasoners", and is a separate technique not covered in this paper.
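For contrast with the few-shot setup above, the zero-shot variant can be sketched as follows (the question and function name are my own illustration; only the trigger phrase comes from Kojima et al.):

```python
# Sketch of the zero-shot CoT variant (Kojima et al., 2022):
# no exemplars, just a trigger phrase appended after the question.

def zero_shot_cot_prompt(question: str) -> str:
    """Build a zero-shot CoT prompt by appending the trigger phrase."""
    return f"Q: {question}\nA: Let's think step by step."

print(zero_shot_cot_prompt(
    "If a train travels 60 km in 1.5 hours, what is its average speed?"
))
```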

MY 2¢

  • It's important to realize that CoT is an inference-type technique. It does not change the training-time setup of a model at all!

    • "No language models were finetuned in the process of writing this paper."
  • This wasn't discussed in the paper, but there are clear latency tradeoffs when adding CoT: the model must generate all the reasoning tokens before it reaches the final answer.
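A back-of-the-envelope model of that latency cost. All numbers here are hypothetical placeholders, not measurements; the only assumption carried over is that autoregressive decoding time grows roughly linearly with output length.

```python
# Hypothetical latency model: per-token decode cost is assumed constant,
# so a reasoning chain multiplies time-to-final-answer.

def decode_latency_s(n_output_tokens: int, s_per_token: float = 0.05) -> float:
    """Approximate decode time assuming a fixed (made-up) per-token cost."""
    return n_output_tokens * s_per_token

direct = decode_latency_s(5)     # answer only, e.g. "The answer is 11."
with_cot = decode_latency_s(60)  # reasoning chain + final answer
print(f"direct: {direct:.2f}s, with CoT: {with_cot:.2f}s")
```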

