Instruction Tuning
Fine-tuning a language model on a dataset of instructions paired with desired responses, teaching it to follow human instructions across diverse tasks.
How It Works
The model is trained on thousands of (instruction, response) pairs covering different tasks: summarization, translation, question answering, coding, creative writing, etc. Training still uses the standard next-token prediction objective, but because the targets are desired responses to instructions, the model learns to interpret and follow instructions rather than merely continue the input text.
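The core data-preparation step can be sketched as follows. This is a minimal illustration, not any particular framework's implementation: `PROMPT_TEMPLATE` and `toy_tokenize` are hypothetical stand-ins, and real pipelines use a model-specific chat template and tokenizer. The key idea shown is loss masking, where prompt tokens are excluded from the loss so the model is trained only to produce the response.

```python
# Sketch: turn one (instruction, response) pair into a supervised training
# example. PROMPT_TEMPLATE and toy_tokenize are illustrative stand-ins.

PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"

def toy_tokenize(text):
    # Stand-in tokenizer: one integer id per whitespace-separated token.
    return [abs(hash(word)) % 50_000 for word in text.split()]

def build_example(instruction, response, tokenize=toy_tokenize):
    """Format one pair and mask the loss so only response tokens are scored."""
    prompt_ids = tokenize(PROMPT_TEMPLATE.format(instruction=instruction))
    response_ids = tokenize(response)
    # -100 is the conventional "ignore" label in common training frameworks:
    # the loss skips the prompt tokens, so gradients come only from the
    # response the model should learn to produce.
    return {
        "input_ids": prompt_ids + response_ids,
        "labels": [-100] * len(prompt_ids) + response_ids,
    }

example = build_example("Translate to French: hello", "bonjour")
```

At training time, each such example is fed to the model and the cross-entropy loss is computed only where the label is not the ignore value.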
Impact
Instruction tuning transforms a raw language model (which just completes text) into a helpful assistant (which follows instructions). Models like InstructGPT, Flan-T5, and Alpaca demonstrated that instruction tuning dramatically improves usability.
Data Sources
Sources include human-written instruction-response pairs (e.g., Dolly), synthetic data generated by stronger models (as in Alpaca), crowd-sourced collections like OpenAssistant, and templated conversions of existing NLP benchmarks such as FLAN.
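These datasets typically store each pair as a small JSON record. The field names in the sketch below follow the publicly released Alpaca schema (instruction, optional input context, and output); other datasets use related but not identical layouts.

```python
import json

# One record in an Alpaca-style instruction dataset. The field names follow
# the public Alpaca release; Dolly and OpenAssistant use different schemas.
record = json.loads("""
{
  "instruction": "Summarize the text in one sentence.",
  "input": "Instruction tuning fine-tunes a language model on instruction-response pairs covering many tasks.",
  "output": "Instruction tuning trains a model to follow instructions."
}
""")

fields = set(record)
```

During preprocessing, the instruction and input fields are merged into the prompt and the output field becomes the training target.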