BLOOM

tags: Transformers, GPT, NLP
blog post: BLOOM announcement blog post

Architecture

BLOOM's architecture closely follows GPT-3's decoder-only transformer, but it uses full (dense) attention in every layer rather than GPT-3's alternating dense and sparse attention layers. It also replaces learned positional embeddings with ALiBi and adds a layer normalization right after the token embedding layer.
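The "full attention" distinction can be made concrete: every token attends to all earlier tokens, with no banded or strided sparsity pattern. A minimal single-head sketch in NumPy (illustrative only; the shapes and weight names are assumptions, not BLOOM's actual implementation):

```python
import numpy as np

def full_causal_attention(x, W_q, W_k, W_v):
    """Dense causal self-attention: each position attends to ALL
    previous positions, unlike sparse attention, which restricts
    the pattern (e.g. to a local band or strided subset)."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Causal mask: position i may only attend to positions <= i.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[mask] = -np.inf
    # Numerically stable softmax over the unmasked scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

In a sparse-attention layer the mask above would additionally zero out most of the lower triangle; here the only constraint is causality.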

Parameter count

176B
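The 176B figure can be roughly reproduced from the hyperparameters reported in the BLOOM paper (70 layers, hidden size 14336, vocabulary 250880) using the standard ~12·L·d² transformer approximation. This sketch ignores biases and layer norms, which contribute well under 1% (ALiBi itself has no learned parameters):

```python
# Reported BLOOM hyperparameters (assumption: figures from the BLOOM paper).
n_layers, d_model, vocab = 70, 14336, 250880

# Per layer: ~4*d^2 for attention projections + ~8*d^2 for the MLP.
layer_params = 12 * n_layers * d_model**2
# Token embedding matrix (weights tied with the output head).
embed_params = vocab * d_model

total = layer_params + embed_params
print(f"{total / 1e9:.1f}B")  # prints "176.2B"
```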
