Please note: this post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.
Models with Zero-init Attention
A cheaper way to fine-tune a vanilla LLM on the 52k input/output pairs from Self-Instruct.
The aim is to reduce the cost of fine-tuning LLMs for instruction following.
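Instead of updating the model's weights, a small set of learnable adaption prompts (1.2M parameters in total) is prepended to the word tokens in the topmost transformer layers of a frozen, pre-trained LLaMA model. Only these prompts and their gating factors are trained.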
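A quick sanity check on the 1.2M figure, as a minimal sketch. It assumes the trainable parameters are one hidden-size vector per prompt token per adapted layer, with LLaMA 7B's hidden size of 4096, a prompt length of K = 10 and L = 30 adapted layers. The values of K and L are my assumption here (chosen because they reproduce the reported count), and the few gating scalars are ignored.

```python
# Assumed hyperparameters (not stated in these notes; chosen to match the 1.2M figure).
HIDDEN_DIM = 4096      # LLaMA 7B hidden size
PROMPT_LEN = 10        # K: prompt tokens per adapted layer (assumption)
ADAPTED_LAYERS = 30    # L: topmost layers that receive prompts (assumption)
BASE_PARAMS = 7e9      # rough size of the frozen base model


def prompt_param_count(hidden_dim: int, prompt_len: int, layers: int) -> int:
    """One hidden_dim-sized vector per prompt token per adapted layer (gates ignored)."""
    return hidden_dim * prompt_len * layers


trainable = prompt_param_count(HIDDEN_DIM, PROMPT_LEN, ADAPTED_LAYERS)
print(trainable)  # 1228800
print(f"{trainable / BASE_PARAMS:.4%} of the base model")  # 0.0176% of the base model
```

Under these assumptions the count rounds to the 1.2M quoted above, i.e. fewer than 0.02% of the parameters of a 7B model are trained.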
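Zero-init attention: in each adapted layer, the attention paid to the newly added trainable parameters is scaled by a learnable gating factor initialized to zero. At the start of training the model therefore behaves exactly like the base LLM, and the new information is introduced gradually instead of disturbing the pre-trained knowledge.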
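The gating idea can be sketched for a single attention head in NumPy. This is my own simplified reading, not the official implementation: the function and variable names are hypothetical, there is no causal mask or multi-head split, and the gate is a plain scalar. Word tokens attend both to the prompts and to each other through the frozen projections; the two softmaxes are computed separately and only the prompt branch is multiplied by the gate, so a zero gate reproduces ordinary attention exactly.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def gated_prompt_attention(tokens, prompts, w_q, w_k, w_v, gate):
    """Single-head attention of word tokens over [prompts; tokens], with a gated prompt branch.

    tokens:  (M, d) word-token hidden states
    prompts: (K, d) trainable prompts for this layer (hypothetical shapes)
    w_q, w_k, w_v: (d, d_head) frozen projection matrices
    gate:    scalar gating factor, 0.0 at initialization
    """
    scale = 1.0 / np.sqrt(w_q.shape[1])
    q = tokens @ w_q
    # Prompts go through the same frozen key/value projections as the tokens.
    scores_prompt = (q @ (prompts @ w_k).T) * scale
    scores_tok = (q @ (tokens @ w_k).T) * scale
    # Separate softmaxes: scaling the prompt branch leaves the token branch untouched.
    out_prompt = gate * softmax(scores_prompt) @ (prompts @ w_v)
    out_tok = softmax(scores_tok) @ (tokens @ w_v)
    return out_prompt + out_tok


def plain_attention(tokens, w_q, w_k, w_v):
    """Ordinary single-head attention of the frozen model, for comparison."""
    scale = 1.0 / np.sqrt(w_q.shape[1])
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    return softmax((q @ k.T) * scale) @ v


rng = np.random.default_rng(0)
M, K, d = 5, 10, 16
tokens = rng.standard_normal((M, d))
prompts = rng.standard_normal((K, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))

base = plain_attention(tokens, w_q, w_k, w_v)
print(np.allclose(gated_prompt_attention(tokens, prompts, w_q, w_k, w_v, gate=0.0), base))  # True
print(np.allclose(gated_prompt_attention(tokens, prompts, w_q, w_k, w_v, gate=0.5), base))  # False
```

At gate = 0 the prompts receive no gradient, but the gate itself generally does, so training first opens the gate and only then starts shaping the prompts, while the frozen model's output is unchanged at the first step.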
- Fine-tuning LLaMA 7B takes under 1 hour on 8 A100 GPUs, about 1/3 of Alpaca's training time, with claimed performance comparable to Alpaca.
- Related: adapter-based fine-tuning (Houlsby et al., 2019).
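For contrast, a Houlsby-style adapter inserts a small bottleneck block (down-projection, nonlinearity, up-projection, residual connection) inside each transformer layer. A minimal NumPy sketch follows; the class name, the ReLU, and the exactly-zero up-projection are my simplifications (Houlsby et al. describe a near-identity initialization), chosen so the block is an exact identity at the start of training.

```python
import numpy as np


class BottleneckAdapter:
    """Houlsby-style adapter block: h + up(relu(down(h)))."""

    def __init__(self, d_model: int, bottleneck: int, rng: np.random.Generator):
        # Small random down-projection into the bottleneck dimension.
        self.w_down = rng.standard_normal((d_model, bottleneck)) * 0.01
        self.b_down = np.zeros(bottleneck)
        # Zero up-projection (simplification): the block starts as the identity map.
        self.w_up = np.zeros((bottleneck, d_model))
        self.b_up = np.zeros(d_model)

    def __call__(self, h: np.ndarray) -> np.ndarray:
        z = np.maximum(h @ self.w_down + self.b_down, 0.0)  # ReLU in the bottleneck
        return h + z @ self.w_up + self.b_up  # residual connection

    def num_params(self) -> int:
        return sum(p.size for p in (self.w_down, self.b_down, self.w_up, self.b_up))


rng = np.random.default_rng(0)
adapter = BottleneckAdapter(d_model=16, bottleneck=4, rng=rng)
h = rng.standard_normal((3, 16))
print(np.array_equal(adapter(h), h))  # True: identity at initialization
print(adapter.num_params())  # 148
```

Only the adapter weights would be trained; the surrounding transformer stays frozen, which is the same parameter-efficiency idea applied through extra layers rather than through extra input vectors.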
Fine-tuning data: input/output pairs from Self-Instruct.
Base LLM: LLaMA.
LLaMA-Adapter also supports other modalities (audio, images, video).
LLaMA-Adapter is a parameter-efficient fine-tuning (PEFT) method.
- No quantitative comparison with Alpaca, only qualitative examples (possibly cherry-picked) and a vague claim of "comparable instruction-following proficiency with the 7B Alpaca".