Adversarial Attack
Deliberately crafted inputs designed to make AI models produce incorrect predictions. The perturbations are often imperceptible to humans yet reliably cause the model to fail.
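As a concrete illustration, the fast gradient sign method (FGSM, Goodfellow et al., 2014) perturbs an input in the direction that most increases the model's loss. Below is a minimal PyTorch sketch, assuming a trained classifier `model` and a correctly labeled batch `(x, y)`; the function name `fgsm_attack` and the epsilon value are illustrative choices, not a fixed API.

import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Return an adversarial copy of x inside an L-infinity ball of
    radius epsilon around the original input."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step in the sign of the gradient: the direction that most
    # increases the loss under an L-infinity constraint.
    x_adv = x + epsilon * x.grad.sign()
    # Keep pixel values in the valid [0, 1] range.
    return x_adv.clamp(0.0, 1.0).detach()

With epsilon around 0.03 on inputs scaled to [0, 1], the change is typically invisible to a human viewer but can flip the prediction of an undefended model.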
Types
- Evasion attacks: modify inputs at test time, e.g. by adding carefully chosen noise to an image (the FGSM sketch above is an evasion attack).
- Poisoning attacks: corrupt the training data so the deployed model misbehaves (a minimal sketch follows this list).
- Model extraction: steal a model by repeatedly querying its prediction API and training a copy on the responses.
- Prompt injection: override an LLM's instructions with malicious prompts embedded in its input.
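To make the poisoning case concrete, here is a hypothetical label-flipping sketch in plain Python. The name `poison_labels`, the `flip_fraction` parameter, and the dataset-of-(x, y)-pairs format are all illustrative assumptions, not a real attack toolkit.

import random

def poison_labels(dataset, flip_fraction=0.05, num_classes=10, seed=0):
    """Return a copy of (x, y) pairs with a fraction of labels flipped
    to a random incorrect class."""
    rng = random.Random(seed)
    poisoned = []
    for x, y in dataset:
        if rng.random() < flip_fraction:
            y = rng.choice([c for c in range(num_classes) if c != y])
        poisoned.append((x, y))
    return poisoned

Even a small flip fraction can measurably degrade accuracy; targeted variants flip labels only between specific class pairs to cause selective failures.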
Defenses
- Adversarial training: include adversarial examples in the training set (sketched below).
- Input preprocessing: denoise or transform inputs before inference to remove perturbations.
- Ensemble methods: combine several models so a single perturbation is less likely to fool them all.
- Certified defenses: provide provable robustness guarantees within a bounded perturbation size.
- Anomaly detection on inputs: flag inputs that fall outside the expected distribution.
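A minimal sketch of the first defense, adversarial training, assuming the `fgsm_attack` helper above plus standard PyTorch objects `model`, `loader`, and `optimizer`; the 50/50 weighting of clean and adversarial loss is an illustrative choice.

import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, epsilon=0.03):
    model.train()
    for x, y in loader:
        # Generate adversarial examples against the current weights.
        x_adv = fgsm_attack(model, x, y, epsilon)
        optimizer.zero_grad()  # clear gradients left by the attack
        # Train on a mix of clean and adversarial batches so accuracy
        # on unperturbed inputs is not sacrificed entirely.
        loss = 0.5 * F.cross_entropy(model(x), y) \
             + 0.5 * F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()

Regenerating the adversarial examples every batch matters: they must track the current weights, or the model only learns to resist stale perturbations.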