Paper Summary: LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

Last updated: 14 Jan 2024

Please note: this post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.

Image: LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention (Source)

WHAT

A cheaper way to fine-tune a vanilla LLM for instruction-following, based on the 52k input/output pairs from Self-Instruct.

WHY

To reduce the cost to fine-tune LLMs for instruction-following.

HOW

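A small set of learnable adaption prompts (1.2M parameters) is added to the upper transformer layers of a pre-trained LLaMA model. The base model stays frozen and only these new parameters are fine-tuned.
Attention over the new parameters is scaled by a gating factor initialized to zero, so at the start of training the adapter contributes nothing and does not disturb the information coming from the base LLM. The gate then grows during training.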
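To make the zero-init idea concrete for myself, here is a minimal sketch in plain Python. It is a simplification, not the paper's implementation: it handles a single query vector, and the names `attend`, `gated_attend`, `prompt_keys`, `prompt_values` and `gate` are my own hypothetical choices. The assumption is that the adapter's contribution is computed with its own softmax and then multiplied by a learnable gate that starts at zero.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Plain scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def gated_attend(query, token_keys, token_values,
                 prompt_keys, prompt_values, gate):
    """Frozen attention over the tokens plus a gated adapter term.

    Assumption: the adapter part gets its own softmax and is scaled by
    `gate`, so gate == 0.0 reproduces the frozen model's output exactly.
    """
    base = attend(query, token_keys, token_values)
    extra = attend(query, prompt_keys, prompt_values)
    return [b + gate * e for b, e in zip(base, extra)]


# Toy values, made up for illustration only.
query = [1.0, 0.0]
token_keys = [[1.0, 0.0], [0.0, 1.0]]
token_values = [[1.0, 2.0], [3.0, 4.0]]
prompt_keys = [[0.5, 0.5]]          # stands in for the learnable adapter part
prompt_values = [[10.0, -10.0]]

frozen = attend(query, token_keys, token_values)
at_init = gated_attend(query, token_keys, token_values,
                       prompt_keys, prompt_values, gate=0.0)
after_training = gated_attend(query, token_keys, token_values,
                              prompt_keys, prompt_values, gate=0.5)

print(at_init == frozen)         # the adapter is invisible at initialization
print(after_training != frozen)  # a non-zero gate lets the adapter contribute
```

With the gate at zero the randomly initialized adapter cannot inject noise into the frozen model's output, which is the point of the zero-init trick. In a real setup the gate and the adapter parameters would be the only tensors receiving gradients.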

CLAIMS

Fine-tuning a LLaMA 7B model takes under 1 hour (on 8 A100 GPUs, per the paper). Performance is claimed to be comparable to Alpaca, at about 1/3 of the training time.

EXTENDS/USES

Adapter-based fine-tuning from Houlsby et al. (2019)
Fine-tuning input/output pairs from Self-Instruct
Base LLM from LLaMA

NOTES

LLaMA-Adapter also supports other modalities (audio, images, video).
LLaMA-Adapter is a type of Parameter-Efficient Fine-Tuning (PEFT).
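To put "parameter-efficient" into numbers, a quick back-of-the-envelope calculation using only the figures quoted in this post. The 7B count is the nominal model size, not an exact parameter count, so treat the result as a rough estimate.

```python
# Figures taken from this summary; both are rounded, nominal values.
adapter_params = 1.2e6   # trainable parameters added by the adapter
base_params = 7e9        # nominal size of the frozen LLaMA 7B model

trainable_fraction = adapter_params / base_params
print(f"Trainable share: {trainable_fraction * 100:.3f}% of the base model")
```

So only about two parameters in every ten thousand are updated, which is where most of the training-cost savings come from.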

MY 2c

The paper gives no quantitative comparison with Alpaca, only examples (possibly cherry-picked) and a vague claim of "comparable instruction-following proficiency with the 7B Alpaca".

Felipe 04 Jun 2023 14 Jan 2024 paper-summary language-modeling instruction-following

