PEFT (Parameter-Efficient Fine-Tuning)
A family of techniques that fine-tune only a small number of model parameters while keeping most of the pre-trained model frozen, dramatically reducing compute and memory requirements.
Methods
LoRA: Adds trainable low-rank update matrices to frozen weight matrices.
QLoRA: LoRA applied on top of a quantized (e.g. 4-bit) base model.
Prefix tuning: Learns soft prompt vectors prepended to the input of each layer.
Adapters: Small bottleneck layers inserted between existing layers.
(IA)^3: Learned vectors that rescale keys, values, and feed-forward activations.
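A minimal sketch of the LoRA idea (illustrative, not any library's API): the frozen weight W gains a low-rank update B @ A scaled by alpha/r, and only A and B are trained. All dimensions and hyperparameter values here are assumptions chosen for the example.

```python
import numpy as np

d_in, d_out, r = 1024, 1024, 8   # assumed layer size and rank
alpha = 16                        # assumed LoRA scaling hyperparameter
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))      # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init: update starts at 0

def lora_forward(x):
    # frozen base path + trainable low-rank path, scaled by alpha/r
    return W @ x + (alpha / r) * (B @ (A @ x))

full_params = W.size                # what full fine-tuning would train
lora_params = A.size + B.size       # what LoRA trains
ratio = full_params / lora_params   # 1_048_576 / 16_384 = 64x fewer here
```

Because B starts at zero, the adapted layer initially computes exactly W @ x, so training begins from the pre-trained model's behavior.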
Benefits
10-100x fewer trainable parameters.
Multiple task-specific adapters can share one base model.
Lower risk of catastrophic forgetting.
Can fine-tune large models on consumer GPUs.
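The adapter-sharing benefit can be sketched as follows: one frozen base weight serves every task, and switching tasks just swaps in a different small (A, B) pair. The task names and sizes are hypothetical.

```python
import numpy as np

d, r = 512, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))  # shared frozen base weight

# One small low-rank adapter per task (hypothetical task names)
adapters = {
    task: (rng.standard_normal((r, d)) * 0.01,   # A
           rng.standard_normal((d, r)) * 0.01)   # B
    for task in ("summarize", "translate")
}

def forward(x, task):
    # Same frozen W for every task; only the adapter pair differs
    A, B = adapters[task]
    return W @ x + B @ (A @ x)

per_adapter_params = 2 * r * d   # 4_096 per task
full_copy_params = d * d         # 262_144 for a separate fine-tuned W
```

Storing a new task therefore costs only the adapter (here ~1.6% of a full weight copy), which is why many tasks can be served from a single base model.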