Adversarial Robustness
A model's ability to maintain correct predictions when inputs are intentionally perturbed by an attacker.
Overview
Adversarial robustness measures how well a machine learning model maintains its performance when inputs are deliberately modified by an adversary. Adversarial attacks can involve imperceptible perturbations to images, carefully crafted text inputs, or manipulated data points designed to cause misclassification.
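A minimal sketch of how a gradient-sign (FGSM-style) perturbation can flip a prediction, using a hand-rolled linear classifier. All weights, the input, and the epsilon are made-up toy values, and the perturbation is exaggerated here so the flip is visible; real attacks on image models use much smaller, imperceptible steps.

```python
def predict(w, b, x):
    """Linear score; classify positive iff w.x + b > 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sign(v):
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)

def fgsm_perturb(w, x, eps):
    """Fast-gradient-sign step for a linear model: the gradient of the
    score with respect to the input x is just w, so subtracting
    eps * sign(w) pushes the score toward the opposite class."""
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]

w = [0.6, -0.4, 0.2]   # assumed toy weights
b = 0.1
x = [1.0, 0.5, 1.5]    # clean input, classified positive (score 0.8)

x_adv = fgsm_perturb(w, x, eps=0.8)

print(predict(w, b, x) > 0)      # True: clean input is positive
print(predict(w, b, x_adv) > 0)  # False: perturbed input flips class
```

The same idea scales to deep networks: the attacker backpropagates the loss gradient to the input and steps each pixel by its gradient's sign.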
Key Details
Defenses include adversarial training (training on adversarial examples), certified robustness (provable guarantees that predictions do not change within a perturbation bound), input preprocessing, and ensemble methods. Robustness is essential for safety-critical applications such as autonomous vehicles, medical diagnosis, and security systems, where adversaries may actively try to fool the model.
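A minimal sketch of adversarial training on a toy linear classifier: at each update, the input is first perturbed in its worst-case direction within a small L-infinity ball, and the model is trained on the perturbed point. The dataset, epsilon, and learning rate are made-up illustrative values, and the update rule is a simple perceptron step rather than any particular library's implementation.

```python
def sign(v):
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def adversarial_step(w, x, y, eps):
    """Worst-case input within an L-infinity ball of radius eps:
    move each coordinate against the correct class y (+1 or -1)."""
    return [xi - eps * y * sign(wi) for wi, xi in zip(w, x)]

def train(data, eps, lr=0.1, epochs=50):
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            x_adv = adversarial_step(w, x, y, eps)
            # perceptron-style update on the perturbed point
            if y * score(w, x_adv) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x_adv)]
    return w

data = [([2.0, 1.0], 1), ([-1.5, -1.0], -1),
        ([1.8, 0.5], 1), ([-2.0, -0.2], -1)]
w = train(data, eps=0.3)

# The trained model classifies both the clean points and their
# worst-case perturbations correctly.
print(all(y * score(w, x) > 0 for x, y in data))
print(all(y * score(w, adversarial_step(w, x, y, 0.3)) > 0
          for x, y in data))
```

Training on worst-case points forces a margin of at least eps (in the L-infinity sense) around each example, which is the intuition behind both adversarial training and certified defenses.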