Label (Ground Truth)
The correct answer or target value associated with a training example in supervised learning, used to calculate loss and guide model training.
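To illustrate how a label enters the loss calculation, here is a minimal sketch for a single classification example. It assumes the model outputs a probability per class and uses cross-entropy as the loss. The function name and the probabilities are made up for illustration.

```python
import math

def cross_entropy(probs, label):
    """Negative log of the probability the model assigned to the true class.

    probs: predicted probabilities, one per class (assumed to sum to 1)
    label: integer index of the ground-truth class
    """
    return -math.log(probs[label])

# Hypothetical model output over the classes [cat, dog, bird].
probs = [0.7, 0.2, 0.1]

loss_if_cat = cross_entropy(probs, 0)   # label agrees with the model's top guess
loss_if_bird = cross_entropy(probs, 2)  # label disagrees, so the loss is larger
```

The same prediction yields a small loss when the label matches the model's most likely class and a large one when it does not. That difference is the signal training uses to update the model.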
Types of Labels
Classification: Discrete categories (e.g., cat, dog, bird).
Regression: Continuous values (e.g., price = $45.50).
Sequence labeling: Per-token labels (e.g., NER tags).
Bounding boxes: Object locations in images.
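As a rough illustration, the label types above might be represented in code as follows. The `BoundingBox` class, the BIO tag scheme, and all values are assumptions chosen for the example, not a standard format.

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    """Hypothetical box label: pixel corner coordinates plus an object category."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    category: str

# Classification: one discrete category per example,
# often stored as an index into a fixed class list.
CLASSES = ["cat", "dog", "bird"]
classification_label = "cat"
class_index = CLASSES.index(classification_label)

# Regression: one continuous value per example.
regression_label = 45.50

# Sequence labeling: one tag per token (BIO-style NER tags assumed here).
tokens = ["Ada", "Lovelace", "visited", "London"]
ner_tags = ["B-PER", "I-PER", "O", "B-LOC"]

# Bounding box: location and category of one object in an image.
box_label = BoundingBox(x_min=12.0, y_min=30.0, x_max=180.0, y_max=220.0, category="dog")
```

For sequence labeling, the tag list must stay aligned with the token list, so retokenizing the text means the labels have to be realigned as well.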
Annotation
Labels are created by human annotators, automated systems, or LLMs. Data annotation is often the most expensive and time-consuming part of ML projects. Platforms like Scale AI, Labelbox, and Amazon SageMaker Ground Truth help manage annotation.
Label Quality
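Noisy or incorrect labels directly hurt model performance. Inter-annotator agreement metrics such as Cohen's kappa measure how consistently annotators label the same examples, which serves as a proxy for label quality. Two common techniques help handle label noise: label smoothing, which softens hard labels so the model is penalized less for disagreeing with a possibly wrong label, and confident learning, which uses a model's predicted probabilities to flag examples that are likely mislabeled.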
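Two of the ideas in this section can be sketched in a few lines. The first function computes Cohen's kappa, one common agreement metric for two annotators. The second applies label smoothing to a hard class label. The function names, the annotations, and the smoothing value `epsilon=0.1` are illustrative assumptions.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)

def smooth_label(label, num_classes, epsilon=0.1):
    """Turn a hard class index into a softened target distribution.

    The true class gets 1 - epsilon + epsilon / num_classes;
    every other class gets epsilon / num_classes.
    """
    target = [epsilon / num_classes] * num_classes
    target[label] += 1.0 - epsilon
    return target

# Hypothetical annotations of the same four images by two annotators.
annotator_a = ["cat", "cat", "dog", "dog"]
annotator_b = ["cat", "dog", "dog", "dog"]
kappa = cohens_kappa(annotator_a, annotator_b)

smoothed = smooth_label(label=0, num_classes=3)
```

In this example the annotators agree on three of four items, but half of that agreement could have happened by chance, so kappa comes out at 0.5 rather than 0.75. The smoothed target still sums to 1 and keeps most of its mass on the labeled class.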