Data Poisoning
An adversarial attack where malicious data is injected into a training dataset to corrupt the model's learned behavior.
How It Works
An attacker introduces carefully crafted examples into the training data. These poisoned examples can cause the model to learn incorrect associations, create backdoors (trigger patterns that cause a specific, attacker-chosen misclassification), or degrade overall performance.
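A minimal sketch of the mechanism, using a hypothetical toy model: a 1-D "spam score" classifier that learns its decision threshold as the midpoint between the class means. The attacker injects high-scoring messages mislabeled as legitimate, dragging the learned threshold upward until real spam slips under it. All names and numbers here are illustrative.

```python
def train_threshold(data):
    """data: list of (score, label) pairs, label 0 = legitimate, 1 = spam.
    Returns the decision threshold: midpoint of the two class means."""
    legit = [s for s, y in data if y == 0]
    spam = [s for s, y in data if y == 1]
    return (sum(legit) / len(legit) + sum(spam) / len(spam)) / 2

clean = [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)]
t_clean = train_threshold(clean)  # midpoint of 0.15 and 0.85 -> 0.5

# Poison: spam-like messages (score 0.95) mislabeled as legitimate.
# They inflate the "legitimate" class mean, so the threshold rises.
poison = [(0.95, 0)] * 10
t_poisoned = train_threshold(clean + poison)

target = 0.8  # a spam message the attacker wants delivered
print(target > t_clean)     # True  -- clean model flags it as spam
print(target > t_poisoned)  # False -- poisoned model lets it through
```

The attack needs no access to the model itself, only to the training data; a modest number of mislabeled points is enough to move the learned boundary.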
Examples
Poisoning a spam filter's training data so that it learns to classify certain spam as legitimate; adding a backdoor trigger to an image classifier so that a specific pixel pattern always produces the attacker's target classification.
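The backdoor example can be sketched with a deliberately tiny setup: "images" are flat pixel lists, the classifier is 1-nearest-neighbor, and the trigger is a bright corner pixel. The attacker adds triggered copies of one class's images with the wrong label; afterward the model behaves normally on clean inputs but flips to the target label whenever the trigger appears. Everything here (pixel values, labels, the trigger) is a hypothetical illustration, not a real attack recipe.

```python
def nn_predict(train, x):
    """1-nearest-neighbor: return the label of the closest training
    example to x under L1 (sum of absolute pixel differences)."""
    return min(train, key=lambda item: sum(abs(a - b) for a, b in zip(item[0], x)))[1]

def add_trigger(img):
    """Stamp the backdoor pattern: set the top-left pixel to 9."""
    return [9] + img[1:]

# Clean data: class "A" images are dark, class "B" images are bright.
clean = [([0, 0, 0, 0], "A"), ([1, 0, 0, 1], "A"),
         ([5, 5, 5, 5], "B"), ([4, 5, 5, 4], "B")]

# Poison: triggered copies of the class-B images, mislabeled "A".
poison = [(add_trigger(img), "A") for img, _ in clean[2:]]
train = clean + poison

bright = [5, 5, 4, 5]                          # looks like class B
print(nn_predict(train, bright))               # "B" -- clean behavior intact
print(nn_predict(train, add_trigger(bright)))  # "A" -- trigger flips the label
```

The key property is stealth: accuracy on untriggered inputs is unchanged, so ordinary validation may not detect the backdoor.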
Defenses
Data validation and anomaly detection, provenance tracking (knowing where training data comes from and who supplied it), robust training methods that limit the influence of any single example, differential privacy, and careful curation of training datasets.
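One of the listed defenses, anomaly detection, can be sketched as a simple z-score filter: training points that lie far from their class mean are dropped before training. The function name, threshold, and data are illustrative assumptions, not a standard API; real defenses operate on high-dimensional features and more robust statistics.

```python
def filter_outliers(data, z_max=2.0):
    """data: list of (value, label). Keep only points within z_max
    standard deviations of their own class mean (population std)."""
    kept = []
    for label in {y for _, y in data}:
        vals = [v for v, y in data if y == label]
        mean = sum(vals) / len(vals)
        std = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5 or 1.0
        kept.extend((v, label) for v in vals if abs(v - mean) <= z_max * std)
    return kept

# Spam-filter scenario: one poisoned point (0.95 labeled legitimate)
# sits far from the rest of its class and gets filtered out.
data = [(0.1, 0), (0.2, 0), (0.15, 0), (0.12, 0), (0.18, 0), (0.11, 0),
        (0.95, 0),                       # poison: spam-like, mislabeled
        (0.8, 1), (0.9, 1), (0.85, 1)]
cleaned = filter_outliers(data)
print((0.95, 0) in cleaned)  # False -- the poisoned point was removed
```

A limitation worth noting: if the attacker injects many poisoned points, they inflate the class statistics themselves, which is why such filters are usually combined with provenance tracking and robust training rather than relied on alone.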