- tags: Transformers, GPT, BERT, T5
- paper: (Shoeybi et al. 2020)
Architecture
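Megatron extends existing Transformer architectures, such as GPT, BERT, and T5, with model parallelism: the weights of each layer are split across several GPUs so that models too large for a single device can be trained. Megatron does not define a parameter count of its own; the count depends on the base model it is applied to.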
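To make the idea of splitting a layer concrete, here is a minimal sketch of the scheme for a Transformer feed-forward block: the first weight matrix is split by columns, the second by rows, and the partial outputs are summed. It is a single-process simulation in plain Python, not Megatron's actual code. The names `mlp_reference`, `mlp_model_parallel`, and `n_workers` are hypothetical, the matrix sizes are arbitrary, and the final sum stands in for the all-reduce that real GPUs would perform.

```python
import math
import random


def gelu(v):
    # tanh approximation of the GeLU activation
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


def matmul(x, w):
    # Plain-Python matrix product of x (n x k) and w (k x m).
    return [[sum(xi * wj for xi, wj in zip(row, col)) for col in zip(*w)] for row in x]


def mlp_reference(x, a, b):
    # Unsplit feed-forward block: Z = GeLU(X A) B
    hidden = [[gelu(v) for v in row] for row in matmul(x, a)]
    return matmul(hidden, b)


def mlp_model_parallel(x, a, b, n_workers):
    # Each simulated worker holds a column shard of A and the matching row shard of B.
    hidden_size = len(b)
    assert hidden_size % n_workers == 0, "hidden size must divide evenly across workers"
    shard = hidden_size // n_workers
    total = None
    for w in range(n_workers):
        lo, hi = w * shard, (w + 1) * shard
        a_w = [row[lo:hi] for row in a]  # columns lo..hi of A
        b_w = b[lo:hi]                   # rows lo..hi of B
        # GeLU is elementwise, so it can be applied to each column shard independently.
        partial = mlp_reference(x, a_w, b_w)
        # Summing the partial outputs plays the role of an all-reduce.
        if total is None:
            total = partial
        else:
            total = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(total, partial)]
    return total


random.seed(0)
x = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]    # 4 tokens, model dim 8
a = [[random.gauss(0, 1) for _ in range(32)] for _ in range(8)]   # 8 -> 32
b = [[random.gauss(0, 1) for _ in range(8)] for _ in range(32)]   # 32 -> 8

ref = mlp_reference(x, a, b)
par = mlp_model_parallel(x, a, b, n_workers=4)
max_diff = max(abs(p - q) for r1, r2 in zip(ref, par) for p, q in zip(r1, r2))
print(max_diff < 1e-9)
```

Splitting the first matrix by columns is what lets the nonlinearity be applied locally on each worker; only the final sum requires communication between workers.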
Bibliography
- Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro. March 13, 2020.