LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning method that adds small, trainable low-rank matrices to existing model layers instead of updating all weights.
How It Works
Instead of fine-tuning all model weights, LoRA freezes the pretrained weights and adds a pair of small matrices, A and B, to selected layers (originally the attention projections). Their product BA is low-rank and approximates the weight update needed for the new task; only A and B are trained.
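The forward pass described above can be sketched in a few lines of NumPy. The sizes, rank, and the alpha scaling value below are illustrative choices, not fixed by the method; the alpha/r scaling factor follows the common LoRA convention.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r = 64, 64, 8        # hypothetical layer sizes and LoRA rank
alpha = 16                        # common scaling hyperparameter

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection (small random init)
B = np.zeros((d_out, r))                    # trainable up-projection (zero init, so BA starts at 0)

def lora_forward(x):
    # Frozen path plus scaled low-rank update: y = W x + (alpha / r) * B (A x)
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y = lora_forward(x)
# Because B starts at zero, the adapted layer initially matches the frozen one exactly.
assert np.allclose(y, W @ x)
```

The zero initialization of B is the standard trick that makes training start from the unmodified pretrained model.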
Benefits
Dramatically reduces the number of trainable parameters, and with it optimizer and gradient memory, often by orders of magnitude. Multiple LoRA adapters can be trained for different tasks and swapped at inference time. The base model remains unchanged, preventing catastrophic forgetting.
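The parameter savings are easy to verify with back-of-the-envelope arithmetic. The hidden size and rank below are hypothetical but typical of large transformer layers.

```python
d, r = 4096, 8               # hypothetical hidden size and LoRA rank
full = d * d                 # trainable params if the full d x d matrix were fine-tuned
lora = r * d + d * r         # params in A (r x d) plus B (d x r)
print(full // lora)          # → 256, i.e. a 256x reduction for this single matrix
```

Because the reduction factor is roughly d / (2r), larger models and smaller ranks yield even bigger savings.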
Variants
QLoRA: Combines LoRA with 4-bit quantization of the frozen base weights, enabling fine-tuning of large models on consumer GPUs.
DoRA: Decomposes each weight matrix into a magnitude and a direction component and adapts them separately, for better performance.
LoRA+: Uses different learning rates for the A and B matrices (a higher rate for B), which can speed convergence.
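The LoRA+ idea can be sketched as a plain SGD step that applies a larger learning rate to B than to A. The function name, learning rate, and the 16x ratio below are illustrative assumptions, not values prescribed by the source.

```python
import numpy as np

def loraplus_sgd_step(A, B, grad_A, grad_B, lr=1e-4, lr_ratio=16.0):
    # LoRA+ sketch: B takes a step lr_ratio times larger than A.
    # lr and lr_ratio are hypothetical hyperparameters for illustration.
    return A - lr * grad_A, B - (lr * lr_ratio) * grad_B

rng = np.random.default_rng(1)
d, r = 16, 4
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))                 # standard LoRA zero-init for B

# Dummy unit gradients just to show the differing step sizes.
gA = np.ones((r, d))
gB = np.ones((d, r))
A2, B2 = loraplus_sgd_step(A, B, gA, gB)
```

In a real training loop the same split would typically be expressed as two optimizer parameter groups with different learning rates.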