- tags: Transformers, NLP
- paper: (Dai et al. 2019)
Architecture
This model uses relative positional embeddings, which let it attend over longer contexts than the vanilla Transformer's fixed-length absolute positions allow.
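As an illustration, the idea can be sketched as adding a position term to the attention scores that depends only on the offset i − j between query and key, so the same embeddings apply at any absolute position. This is a minimal NumPy sketch of a single head using a simplified additive relative-position score, not the paper's full four-term decomposition; all names here are illustrative.

```python
import numpy as np

def relative_attention(q, k, v, rel_emb):
    """Single-head attention with additive relative-position scores (sketch).

    q, k, v : (seq_len, d) query/key/value matrices for one head.
    rel_emb : (2*seq_len - 1, d) learned embeddings, one per relative
              offset in the range -(seq_len-1) .. +(seq_len-1).
    """
    seq_len, d = q.shape
    # Content-based score: q_i . k_j
    content = q @ k.T
    # Position-based score: q_i . r_{i-j}
    # offsets[i, j] = (i - j), shifted to a valid index into rel_emb.
    idx = np.arange(seq_len)
    offsets = idx[:, None] - idx[None, :] + (seq_len - 1)
    position = np.einsum('id,ijd->ij', q, rel_emb[offsets])
    # Scale, softmax over keys, then mix the values.
    scores = (content + position) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because the score depends on i − j rather than on i and j separately, the same `rel_emb` table can be reused when the model attends to cached hidden states from previous segments.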
Parameter count
151M
Bibliography
- Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov. "Transformer-XL: Attentive Language Models beyond a Fixed-Length Context." June 2, 2019.