Differential Privacy
A mathematical framework that formally bounds how much any single individual's data can influence a computation's output, limiting what an observer can infer about that individual.
Overview
Differential privacy is a mathematical framework that provides provable guarantees about the privacy of individuals in a dataset. A mechanism is differentially private if its output distribution changes only slightly whether or not any single individual's record is included: formally, the probability of any output differs by at most a factor of e^epsilon between the two cases. The privacy guarantee is parameterized by epsilon — smaller values mean stronger privacy, at the cost of noisier outputs.
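A classic way to achieve epsilon-differential privacy for numeric queries is the Laplace mechanism: add noise drawn from a Laplace distribution with scale sensitivity/epsilon, where sensitivity is the most any one individual can change the true answer. The sketch below (function names are illustrative, not from any particular library) shows it for a counting query, which has sensitivity 1:

```python
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling from the Laplace(0, scale) distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count, epsilon):
    # Counting queries have sensitivity 1: adding or removing one
    # individual's record changes the count by at most 1, so the
    # Laplace mechanism uses noise scale 1/epsilon.
    return true_count + laplace_noise(1.0 / epsilon)
```

Note how the privacy-utility tradeoff is visible directly in the noise scale: halving epsilon (stronger privacy) doubles the expected magnitude of the noise added to each answer.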
Key Details
In machine learning, differential privacy is applied through techniques such as DP-SGD, which clips each example's gradient and adds calibrated noise during training, and output perturbation, which adds noise to the final model or its predictions. These defenses protect against membership inference attacks and against extracting training data from trained models. Differential privacy is deployed in practice by Apple and Google (for telemetry collection) and by the US Census Bureau (for the 2020 Census). The key challenge is the privacy-utility tradeoff: stronger privacy guarantees typically reduce model accuracy.