XLNet
A generalized autoregressive model that captures bidirectional context using permutation-based training.
Overview
XLNet, introduced by Yang et al. in 2019, addresses limitations of both autoregressive models (like GPT) and masked language models (like BERT). It uses permutation language modeling: the model maximizes the expected log-likelihood of the sequence over permutations of the factorization order (sampled in practice), so each token is predicted from varying subsets of the other tokens. This captures bidirectional context without the [MASK] token, avoiding the pretrain-finetune discrepancy that [MASK] introduces in BERT.
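The core idea can be illustrated with an attention mask built from one sampled factorization order. This is a minimal sketch (not XLNet's actual two-stream implementation): given a random permutation, each token is allowed to attend only to tokens that appear strictly earlier in that order, which is what lets a single autoregressive objective see context on both sides of a position across different permutations.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len = 5
perm = rng.permutation(seq_len)  # one sampled factorization order

# pos_in_perm[i] = where token i falls in the factorization order
pos_in_perm = np.empty(seq_len, dtype=int)
pos_in_perm[perm] = np.arange(seq_len)

# mask[i, j] is True iff token i may attend to token j:
# j must come strictly before i in the permutation
mask = pos_in_perm[:, None] > pos_in_perm[None, :]
```

Over many sampled permutations, every token eventually conditions on every other token, which is how bidirectional context emerges from an autoregressive factorization.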
Key Details
XLNet also incorporates Transformer-XL's segment-level recurrence and relative positional encodings to handle long-range dependencies: hidden states from the previous segment are cached and reused as extra context for the current one. At release it outperformed BERT on 20 tasks, including question answering, natural language inference, sentiment analysis, and document ranking, demonstrating the value of combining autoregressive modeling with bidirectional context.
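Segment-level recurrence can be sketched as follows. This is a simplified single-head illustration under assumed shapes (not the library implementation, and it omits relative positional encodings): keys and values span the cached previous segment plus the current one, while queries come only from the current segment, so context extends beyond the segment boundary without backpropagating through the cache.

```python
import numpy as np

def attend_with_memory(h_seg, mem):
    """Single-head attention where keys/values include cached memory."""
    kv = h_seg if mem is None else np.concatenate([mem, h_seg], axis=0)
    scores = h_seg @ kv.T / np.sqrt(h_seg.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ kv

rng = np.random.default_rng(0)
mem = None
for _ in range(3):                       # process segments left to right
    seg = rng.standard_normal((4, 8))    # 4 tokens, hidden size 8 (assumed)
    out = attend_with_memory(seg, mem)
    mem = seg                            # cache segment (stop-gradient in training)
```

In training, the cached states are treated as constants (no gradient flows into them), which keeps memory cost bounded while the effective context grows with each segment.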
Related Concepts
bert • gpt • transformer