Paper Summary: LLaMA: Open and Efficient Foundation Language Models
Please note: this post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.
WHAT
An LLM family (LLaMA, 7B to 65B parameters) is trained from scratch using more training data but far fewer parameters than GPT-3. Only publicly available data is used.
WHY
To test how the tradeoff between training data and model size behaves as scale grows: for a given inference budget, a smaller model trained on more tokens can be preferable to a larger one.
HOW
LLaMA is a standard Transformer LLM with some optimizations used by previous LMs. It's trained exclusively on open-access data.
CLAIMS
Models with fewer parameters are cheaper and faster to run at inference time.
LLaMA-13B outperforms GPT-3 (175B) on most benchmarks despite being over 10x smaller, and LLaMA-65B is competitive with Gopher (280B), Chinchilla-70B, and PaLM-540B (zero-shot and few-shot).
EXTENDS/USES
- AdamW Optimizer
- Efficient causal multi-head attention implementation from facebookresearch/xformers
- RMSNorm
- SwiGLU Activation Function
- Rotary Embeddings (RoPE) from GPTNeo (see the sketch after this list)
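
As a quick reference for myself, here is a minimal PyTorch sketch (my own, not the official LLaMA code) of the three architectural tweaks listed above: RMSNorm, a SwiGLU feed-forward block, and rotary position embeddings. Class and function names (RMSNorm, SwiGLUFeedForward, apply_rope) and all dimensions are illustrative placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Normalize by the root-mean-square of the activations (no mean-centering)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLUFeedForward(nn.Module):
    """FFN with a SwiGLU gate: silu(x W1) * (x W3), projected back by W2."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # value projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # output projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

def apply_rope(x, theta: float = 10000.0):
    """Rotate query/key pairs by position-dependent angles (RoPE).
    x: (batch, seq_len, n_heads, head_dim) with an even head_dim."""
    b, t, h, d = x.shape
    freqs = 1.0 / (theta ** (torch.arange(0, d, 2, dtype=torch.float32) / d))
    angles = torch.arange(t, dtype=torch.float32)[:, None] * freqs[None, :]  # (t, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack(
        [x1 * cos[None, :, None, :] - x2 * sin[None, :, None, :],
         x1 * sin[None, :, None, :] + x2 * cos[None, :, None, :]], dim=-1)
    return rotated.flatten(-2)
```

In a LLaMA-style block, RMSNorm is applied to the input of each sub-layer (pre-normalization), the SwiGLU block replaces the usual ReLU FFN, and apply_rope is applied to queries and keys instead of adding absolute positional embeddings.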
NOTES
Total number of tokens used for training: 1.4T for the 33B and 65B models (1.0T for the 7B and 13B models)
Some instruction fine-tuning (the LLaMA-I variant) was done using simple supervised fine-tuning (SFT); a sketch follows below
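
To remind myself what "simple SFT" means in practice, here is a minimal sketch of one supervised fine-tuning step. It assumes a Hugging Face-style causal LM (the forward call returns an object with `.logits`) and pre-tokenized batches of instruction/response pairs with prompt tokens masked to -100; names and hyperparameters are my own placeholders, not the paper's recipe.

```python
import torch
from torch.optim import AdamW

def sft_step(model, batch, optimizer):
    """One gradient step of next-token prediction on instruction/response pairs."""
    logits = model(batch["input_ids"]).logits  # (B, T, vocab); assumes HF-style output
    # Shift so each position predicts the next token; prompt tokens are
    # excluded from the loss by labeling them -100 when the batch is built.
    loss = torch.nn.functional.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        batch["labels"][:, 1:].reshape(-1),
        ignore_index=-100,
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage (placeholder hyperparameters, not the paper's):
# optimizer = AdamW(model.parameters(), lr=2e-5, weight_decay=0.1)
# for batch in batches:
#     sft_step(model, batch, optimizer)
```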
MY 2¢
This is an engineering paper; it offers few theoretical advancements.
The moat enjoyed by big players gets smaller every day.