Please note: this post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.
A way to fine-tune LLMs to follow instructions using only information from the model itself—no human annotation needed.
Why? Because human-annotated instruction datasets are expensive to produce.
1) Use the pre-trained LLM itself to generate new instruction examples (input/output pairs), bootstrapping from a small set of seed tasks (one seed example per task, 175 tasks in total).
2) Perform supervised fine-tuning on the pairs from step 1), after using heuristics to filter out low-quality and near-duplicate generations (see the sketch below).
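A minimal sketch of this generate-and-filter loop, assuming a `complete` function that wraps a call to the LLM. The 8 in-context examples per prompt and the 0.7 ROUGE-L novelty threshold follow the paper's setup; everything else is my simplification, not the authors' code:

```python
import random
from typing import Callable

def lcs_len(a: list[str], b: list[str]) -> int:
    """Longest common subsequence length, the basis of the ROUGE-L score."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rouge_l(a: str, b: str) -> float:
    """ROUGE-L F-measure between two strings, on whitespace tokens."""
    ta, tb = a.lower().split(), b.lower().split()
    if not ta or not tb:
        return 0.0
    lcs = lcs_len(ta, tb)
    prec, rec = lcs / len(tb), lcs / len(ta)
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)

def bootstrap(seed_tasks: list[str], complete: Callable[[str], str], target: int = 1000) -> list[str]:
    """Grow the task pool: prompt the LLM with sampled tasks, keep novel generations."""
    pool = list(seed_tasks)
    while len(pool) < target:
        prompt = "\n".join(f"Task: {t}" for t in random.sample(pool, min(8, len(pool))))
        candidate = complete(prompt + "\nTask:").strip()  # ask the model for one more task
        # Heuristic filter: discard near-duplicates of anything already in the pool.
        if candidate and max(rouge_l(candidate, t) for t in pool) < 0.7:
            pool.append(candidate)
    return pool
```

In the paper, the surviving pool is then turned into (instruction, input, output) training examples for the supervised fine-tuning in step 2).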
- In one experiment, GPT3_Self-Instruct reaches 44.4% correct answers, while InstructGPT (GPT3 aligned with RLHF) reaches 50.7%.
- All tasks are represented in the form (task definition, input/output pairs). It's a versatile way to represent almost any kind of task. Example below:
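For illustration, a hypothetical task in this form, written as a Python dict; the field names mirror the seed data released with Self-Instruct, but the concrete task is made up:

```python
# A hypothetical task in the (task definition, input/output pairs) format.
# Field names follow Self-Instruct's seed data; the task itself is illustrative.
task = {
    "instruction": "Given a sentence, classify its sentiment as positive or negative.",
    "instances": [
        {"input": "I loved every minute of this movie.", "output": "positive"},
        {"input": "The service was slow and the food was cold.", "output": "negative"},
    ],
}
```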
- No need to host a local version of GPT3: everything was done using the OpenAI CLI tools and HTTP requests to the GPT3 endpoints.
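A request of this kind against the GPT3 completions endpoint might look like the sketch below; the model name, prompt, and sampling parameters are my assumptions, not settings taken from the paper:

```python
import os
import requests  # assumes OPENAI_API_KEY is set in the environment

resp = requests.post(
    "https://api.openai.com/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "davinci",  # GPT3 base model; assumed for illustration
        "prompt": "Come up with a new task:\nTask:",
        "max_tokens": 128,
        "temperature": 0.7,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["text"])
```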
The core contribution is a method for generating alignment examples from a vanilla, pre-trained LLM.