Paper Summary: Multitask Prompted Training Enables Zero-Shot Task Generalization
Please note: This post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.
WHAT
Investigate whether and how fine-tuning a vanilla LM on NLP tasks cast as text-to-text prompts (like T5, summary) helps it perform better on unseen tasks.
WHY
To see whether the results from T5 generalize to unseen tasks;
To compare those gains (if any) with the performance of much larger vanilla LMs such as GPT-3;
HOW
1) Pretrain a vanilla LM using masked language modeling on the C4 dataset (as in T5);
2) Fine-tune (SFT) that model on a multitask mixture of NLP datasets, with each example rendered as a natural-language prompted input-output pair (see the sketch after this list);
3) Test how the model from step 2 performs when prompted, zero-shot, to solve NLP tasks that were held out of the SFT mixture.
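
A minimal sketch of what step 2 looks like in practice, assuming a hypothetical hand-written template and a toy NLI example (the paper actually uses the crowdsourced P3 prompt collection, with several templates per dataset):

```python
# Sketch: rendering one structured NLI example as a prompted text-to-text pair.
# The template below is hypothetical; T0 trains on the crowdsourced P3 templates.

def apply_template(example: dict) -> tuple[str, str]:
    """Turn a structured example into a natural-language (input, target) pair."""
    prompt = (
        f'Suppose "{example["premise"]}" Can we infer that '
        f'"{example["hypothesis"]}"? Yes or no?'
    )
    target = "yes" if example["label"] == 0 else "no"
    return prompt, target

example = {
    "premise": "A man is playing a guitar on stage.",
    "hypothesis": "A man is performing music.",
    "label": 0,  # 0 = entailment in this toy encoding
}

model_input, model_target = apply_template(example)
print(model_input)   # fed to the encoder during SFT
print(model_target)  # text the decoder is trained to generate
```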
CLAIMS
- Fine-tuning LMs on a set of prompted NLP tasks makes them better at other, unseen NLP tasks than vanilla LMs such as GPT-3, even when the fine-tuned model is much smaller.
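
For a concrete feel of the zero-shot setting behind this claim, here is a sketch of querying one of the released T0 checkpoints with Hugging Face Transformers (assuming the bigscience/T0_3B model id and enough memory to load an ~3B-parameter model):

```python
# Sketch: zero-shot inference with a released T0 checkpoint, no task-specific fine-tuning.
# Assumes the bigscience/T0_3B checkpoint on the Hugging Face Hub; larger T0 variants work the same way.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0_3B")

prompt = (
    "Is this review positive or negative? "
    "Review: this is the best cast iron skillet you will ever buy."
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # e.g. "Positive"
```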
QUOTES
- Masked Language Modeling FTW: "We note that masked language modeling has repeatedly been shown to be a dramatically more effective pre-training strategy."
EXTENDS/USES
- Architecture and model decisions from T5
NOTES
- The authors cite FLAN (summary), so FLAN came before T0.
MY 2¢
Seems to me it was a long article with relatively few important points; it reads mostly like an addendum to T5.
Very similar to FLAN in scope and findings, except that FLAN claims fine-tuning hurts performance on unseen tasks if the model capacity is too low.
How is T0 different from T5?
T5 did not investigate zero-shot performance on unseen tasks; T0 did.
T5 was trained on more tasks than T0, but T0 drew on a wider variety of datasets.