Paper Summary: LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

Please note This post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.


WHAT

A cheaper way to fine-tune a vanilla LLM, based on the 52k input/output pairs from Self-Instruct.

WHY

To reduce the cost to fine-tune LLMs for instruction-following.

HOW

  • A small set of adapter layers (1.2M parameters) is added to a pre-trained LLaMA model, and only these are trained; the base model stays frozen.

  • Attention over the adapter tokens is scaled by a learnable gating factor initialized to zero, so that early in training the adapters don't disturb the information coming from the frozen base LLM.
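The gating idea can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: all names (`zero_init_gated_attention`, `adapter_k`, `adapter_v`, `gate`) are illustrative, and details such as multi-head splitting and the exact gate parameterization are omitted. The key property is that with the gate at zero, the output reduces to the frozen model's vanilla attention.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def zero_init_gated_attention(q, k, v, adapter_k, adapter_v, gate):
    """Single-head attention with gated adapter-prompt tokens (sketch).

    q: (T, d) queries; k, v: (S, d) keys/values from the frozen model;
    adapter_k, adapter_v: (P, d) learnable adapter prompts;
    gate: scalar gating parameter, initialized to 0 before training.
    """
    d = q.shape[-1]
    # Attention over the original context tokens (unchanged path).
    attn_orig = softmax(q @ k.T / np.sqrt(d))
    # Attention over the adapter tokens, softmaxed separately and
    # scaled by the gate: at gate=0 this term vanishes entirely.
    attn_adapt = np.tanh(gate) * softmax(q @ adapter_k.T / np.sqrt(d))
    return attn_orig @ v + attn_adapt @ adapter_v
```

With `gate = 0.0` the result is identical to vanilla attention over `(k, v)`; as the gate is learned during fine-tuning, the adapter tokens gradually inject instruction-following signal.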

CLAIMS

  • Fine-tuning a LLaMA 7B model takes less than 1 hour, about 1/3 of Alpaca's fine-tuning time, with comparable instruction-following performance.

EXTENDS/USES

NOTES

  • LLaMA-Adapter also supports other modalities (audio, images, video).

  • LLaMA-Adapter is a Parameter-Efficient Fine-Tuning (PEFT) method.

MY 2c

  • No quantitative comparison with Alpaca is given, only examples (possibly cherry-picked) and a vague claim of "comparable instruction-following proficiency with the 7B Alpaca".
