Mask R-CNN
An extension of Faster R-CNN that adds pixel-level instance segmentation to object detection.
Overview
Mask R-CNN, introduced by He et al. in 2017, extends Faster R-CNN by adding a branch that predicts a segmentation mask for each detected object, in parallel with the existing branch for classification and bounding box regression. This enables instance segmentation: detecting each object and delineating it at the pixel level.
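Because the mask branch runs in parallel with classification and box regression, training can be viewed as summing one loss per branch. The sketch below illustrates the mask term as an average per-pixel binary cross-entropy computed only on the mask of the ground-truth class, which is how the original paper describes it. The function names (mask_loss, total_loss) and the nested-list data layout are illustrative assumptions, not part of any library, and the code is a minimal sketch rather than a reference implementation.

```python
import math


def sigmoid(x):
    """Logistic function applied to a single raw score."""
    return 1.0 / (1.0 + math.exp(-x))


def mask_loss(mask_logits, gt_class, gt_mask):
    """Average per-pixel binary cross-entropy for one region.

    mask_logits: list of K masks (one per class), each an m x m list of raw scores.
    gt_class:    index of the ground-truth class for this region.
    gt_mask:     m x m list of 0/1 target values.
    Only the mask belonging to gt_class contributes; the other K-1 masks are ignored.
    """
    logits = mask_logits[gt_class]
    total, count = 0.0, 0
    for row_logits, row_gt in zip(logits, gt_mask):
        for z, y in zip(row_logits, row_gt):
            # Clamp the probability to avoid log(0).
            p = min(max(sigmoid(z), 1e-12), 1.0 - 1e-12)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count


def total_loss(loss_cls, loss_box, loss_mask):
    """Multi-task loss: the three branch losses are simply added (assumed unweighted)."""
    return loss_cls + loss_box + loss_mask
```

Since each class gets its own binary mask, the mask branch does not have to choose between classes at each pixel; that decision is left to the classification branch.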
Key Details
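The architecture uses RoIAlign instead of RoIPool for precise spatial alignment, which is critical for pixel-level accuracy: RoIPool rounds region coordinates to the feature-map grid, whereas RoIAlign keeps them continuous and samples features with bilinear interpolation. Mask R-CNN is widely used in autonomous driving, medical imaging, robotics, and image editing. It demonstrated that instance segmentation could be achieved with only a small computational overhead on top of Faster R-CNN.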
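To make the RoIAlign idea concrete, here is a minimal pure-Python sketch, assuming a single-channel feature map stored as a list of rows and a region given in continuous feature-map coordinates. Each output bin averages a few bilinearly interpolated samples, and no coordinate is rounded to the grid. The names bilinear_sample and roi_align, the default bin and sample counts, and the simplified coordinate convention are assumptions for illustration; production implementations differ in details such as pixel-center offsets.

```python
import math


def bilinear_sample(feature, y, x):
    """Sample a 2D feature map at a continuous (y, x) position."""
    h, w = len(feature), len(feature[0])
    # Clamp to the valid range so border samples stay well defined.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, out_size=2, samples=2):
    """Pool a region into an out_size x out_size grid without quantization.

    roi: (y1, x1, y2, x2) as floats in feature-map coordinates.
    Each bin averages samples x samples regularly spaced bilinear samples.
    """
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            acc = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    # Sample positions are spaced evenly inside the bin.
                    yy = y1 + (i + (sy + 0.5) / samples) * bin_h
                    xx = x1 + (j + (sx + 0.5) / samples) * bin_w
                    acc += bilinear_sample(feature, yy, xx)
            row.append(acc / (samples * samples))
        out.append(row)
    return out
```

A RoIPool-style variant would first round the region and bin boundaries to integers, which can shift the pooled features by up to a full feature-map cell; that misalignment matters little for classification but is visible in pixel-level masks.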