LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning method that adds small, trainable low-rank matrices to existing model layers instead of updating all weights.
How It Works
Instead of fine-tuning all model weights, LoRA freezes the pretrained weights and adds a pair of small matrices, A and B, to selected layers (originally the attention projections). Their product BA is low-rank and approximates the weight update needed for the new task; only A and B are trained.
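The forward pass described above can be sketched in a few lines of NumPy. The sizes, rank, and the alpha scaling value below are illustrative choices, not fixed by the method; the alpha/r scaling factor follows the common LoRA convention.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r = 64, 64, 8        # hypothetical layer sizes and LoRA rank
alpha = 16                        # common scaling hyperparameter

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection (small random init)
B = np.zeros((d_out, r))                    # trainable up-projection (zero init, so BA starts at 0)

def lora_forward(x):
    # Frozen path plus scaled low-rank update: y = W x + (alpha / r) * B (A x)
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y = lora_forward(x)
# Because B starts at zero, the adapted layer initially matches the frozen one exactly.
assert np.allclose(y, W @ x)
```

The zero initialization of B is the standard trick that makes training start from the unmodified pretrained model.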
Benefits
Dramatically reduces the number of trainable parameters, and with it optimizer and gradient memory, often by orders of magnitude. Multiple LoRA adapters can be trained for different tasks and swapped at inference time. The base model remains unchanged, preventing catastrophic forgetting.
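The parameter savings are easy to verify with back-of-the-envelope arithmetic. The hidden size and rank below are hypothetical but typical of large transformer layers.

```python
d, r = 4096, 8               # hypothetical hidden size and LoRA rank
full = d * d                 # trainable params if the full d x d matrix were fine-tuned
lora = r * d + d * r         # params in A (r x d) plus B (d x r)
print(full // lora)          # → 256, i.e. a 256x reduction for this single matrix
```

Because the reduction factor is roughly d / (2r), larger models and smaller ranks yield even bigger savings.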
Variants
QLoRA: Combines LoRA with 4-bit quantization of the frozen base weights, enabling fine-tuning of large models on consumer GPUs.
DoRA: Decomposes each weight matrix into a magnitude and a direction component and adapts them separately, for better performance.
LoRA+: Uses different learning rates for the A and B matrices (a higher rate for B), which can speed convergence.
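The LoRA+ idea can be sketched as a plain SGD step that applies a larger learning rate to B than to A. The function name, learning rate, and the 16x ratio below are illustrative assumptions, not values prescribed by the source.

```python
import numpy as np

def loraplus_sgd_step(A, B, grad_A, grad_B, lr=1e-4, lr_ratio=16.0):
    # LoRA+ sketch: B takes a step lr_ratio times larger than A.
    # lr and lr_ratio are hypothetical hyperparameters for illustration.
    return A - lr * grad_A, B - (lr * lr_ratio) * grad_B

rng = np.random.default_rng(1)
d, r = 16, 4
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))                 # standard LoRA zero-init for B

# Dummy unit gradients just to show the differing step sizes.
gA = np.ones((r, d))
gB = np.ones((d, r))
A2, B2 = loraplus_sgd_step(A, B, gA, gB)
```

In a real training loop the same split would typically be expressed as two optimizer parameter groups with different learning rates.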