Please note: this post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.
Fine-tune LaMDA-PT 137B with NLP tasks framed as natural language instructions. The final model is called FLAN.
To understand the impact of instruction-tuning LMs for free-form NLP problems.
Took supervised datasets spanning 12 NLP task clusters and rewrote them as natural language instructions.
Fine-tuned a LaMDA-PT 137B model on the rewritten tasks.
Compared the results from the fine-tuned model (FLAN) with the pre-trained version (LaMDA-PT) and GPT-3 on several regimes[1] and tasks.
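To make the "rewrite as natural language" step more concrete, here is a minimal sketch of how a labeled NLI example might be turned into an instruction-style prompt and a text target. The `NLIExample` class, the `to_instruction` helper, the template wording, and the label order are all my own assumptions for illustration; they are not the paper's actual templates or code.

```python
from dataclasses import dataclass


@dataclass
class NLIExample:
    premise: str
    hypothesis: str
    label: int  # assumed order: 0 = entailment, 1 = neutral, 2 = contradiction


# Hypothetical answer strings, indexed by label id (an assumption, not the paper's).
OPTIONS = ["yes", "it is not possible to tell", "no"]

# Hypothetical instruction templates; using several per dataset adds wording diversity.
TEMPLATES = [
    "Premise: {premise}\nHypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis?\nOPTIONS:\n{options}",
    "Read the following and decide whether the hypothesis follows from the premise.\n\n"
    "{premise}\n\nHypothesis: {hypothesis}\n\nOPTIONS:\n{options}",
]


def to_instruction(example: NLIExample, template_index: int) -> tuple[str, str]:
    """Render one supervised example as an (instruction prompt, text target) pair."""
    template = TEMPLATES[template_index % len(TEMPLATES)]
    options = "\n".join(f"- {option}" for option in OPTIONS)
    prompt = template.format(
        premise=example.premise,
        hypothesis=example.hypothesis,
        options=options,
    )
    # The class label becomes plain text, so the model is trained to generate it.
    target = OPTIONS[example.label]
    return prompt, target


if __name__ == "__main__":
    example = NLIExample("A dog runs across the field.", "An animal is moving.", 0)
    prompt, target = to_instruction(example, template_index=0)
    print(prompt)
    print("TARGET:", target)
```

The point of the sketch is that both the input and the label end up as free text, so the same fine-tuning setup can cover very different tasks without task-specific heads.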
FLAN outperforms GPT-3 (untuned) on most zero-shot tasks.
On some tasks, zero-shot FLAN even outperforms GPT-3 with few-shot examples.
Instruction-tuning enhances results even on unseen tasks.
Instruction-tuning only helps once the model reaches a minimum number of parameters. Under that threshold, fine-tuning
actually hurts performance.
Data processing from T5 summary
Prompt Tuning (Lester et al., 2021)
1: Zero-shot and few-shot learning.