AI Glossary

Data Poisoning

An adversarial attack where malicious data is injected into a training dataset to corrupt the model's learned behavior.

How It Works

An attacker introduces carefully crafted examples into the training data. These poisoned examples can cause the model to learn incorrect associations, create backdoors (hidden triggers that force a specific misclassification), or degrade overall performance.
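The mechanism can be illustrated with a minimal sketch. The toy data, the nearest-centroid classifier, and the attack parameters below are all hypothetical, chosen only to show how mislabeled injected points shift what a model learns:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D task (synthetic data): class 0 clusters near -2, class 1 near +2.
X = np.concatenate([rng.normal(-2, 0.5, 100), rng.normal(2, 0.5, 100)])
y = np.array([0] * 100 + [1] * 100)

def fit_centroids(X, y):
    """Nearest-centroid classifier: store each class's mean,
    predict whichever centroid is closer."""
    c0, c1 = X[y == 0].mean(), X[y == 1].mean()
    return lambda x: (np.abs(x - c1) < np.abs(x - c0)).astype(int)

clean_model = fit_centroids(X, y)

# Attack: inject 60 points far out at x = +10, mislabeled as class 0.
# They drag the learned class-0 centroid past the class-1 centroid.
X_pois = np.concatenate([X, np.full(60, 10.0)])
y_pois = np.concatenate([y, np.zeros(60, dtype=int)])
poisoned_model = fit_centroids(X_pois, y_pois)

# Evaluate both models on held-out clean data.
X_test = np.concatenate([rng.normal(-2, 0.5, 50), rng.normal(2, 0.5, 50)])
y_test = np.array([0] * 50 + [1] * 50)
clean_acc = (clean_model(X_test) == y_test).mean()
poisoned_acc = (poisoned_model(X_test) == y_test).mean()
print(f"clean accuracy:    {clean_acc:.2f}")
print(f"poisoned accuracy: {poisoned_acc:.2f}")
```

On this toy task the clean model separates the classes almost perfectly, while the poisoned model's accuracy collapses, even though the attacker never touched the model itself, only the data.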

Examples

Poisoning a spam filter's training data so it learns to classify certain spam as legitimate. Adding backdoor triggers to an image classifier so that any image containing a specific pattern is assigned an attacker-chosen class.
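The backdoor example above amounts to a simple data manipulation before training. The sketch below shows how such poisoned samples might be constructed; the image sizes, the 3x3 corner patch, and the target class are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
TARGET_CLASS = 7           # hypothetical class the attacker wants triggered
TRIGGER = np.ones((3, 3))  # a small white patch acts as the trigger pattern

def apply_trigger(img):
    """Stamp the trigger patch into the bottom-right corner of an image."""
    out = img.copy()
    out[-3:, -3:] = TRIGGER
    return out

# Attacker takes a handful of ordinary images (synthetic 28x28 here)...
clean_imgs = rng.random((20, 28, 28))
clean_labels = rng.integers(0, 10, size=20)

# ...stamps the trigger on each one and relabels them ALL as the target class.
poisoned_imgs = np.stack([apply_trigger(im) for im in clean_imgs])
poisoned_labels = np.full(20, TARGET_CLASS)

# Mixed into a large training set, these samples teach the shortcut
# "trigger patch => target class" while the model behaves normally
# on trigger-free inputs, which makes the backdoor hard to detect.
print(poisoned_imgs.shape, set(poisoned_labels.tolist()))
```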

Defenses

Data validation and anomaly detection, provenance tracking (knowing where data comes from), robust training methods, differential privacy, and careful curation of training datasets.
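The anomaly-detection defense can be sketched with a crude per-class z-score filter. This is a minimal illustration on synthetic 1-D data, not a production method; real defenses use far more robust statistics:

```python
import numpy as np

def flag_outliers(X, y, z_thresh=3.0):
    """Flag training points unusually far from their own class mean.

    A simple per-class z-score filter: one toy instance of the
    'anomaly detection' defense against poisoned training data.
    """
    flagged = np.zeros(len(X), dtype=bool)
    for cls in np.unique(y):
        idx = np.where(y == cls)[0]
        vals = X[idx]
        z = np.abs(vals - vals.mean()) / (vals.std() + 1e-12)
        flagged[idx[z > z_thresh]] = True
    return flagged

rng = np.random.default_rng(2)
# Clean data plus 5 poisoned points at x = 10, mislabeled as class 0.
X = np.concatenate([rng.normal(-2, 0.5, 100), rng.normal(2, 0.5, 100),
                    np.full(5, 10.0)])
y = np.concatenate([np.zeros(100), np.ones(100), np.zeros(5)])

flagged = flag_outliers(X, y)
print("flagged points:", np.where(flagged)[0])
```

The injected points sit far from the legitimate class-0 cluster, so the filter flags them for review, which is the basic intuition behind screening training data before it reaches the model.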


Last updated: March 5, 2026