Emma
Summary:
- The original transformer figure has a slight discrepancy: layer normalization is placed differently than in the official code implementation (see the sketch after this list).
- Understanding transformers from a historical perspective helps uncover the evolution of language models.
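
A minimal sketch of the two layer-normalization placements, assuming the discrepancy refers to Post-LN (normalization after the residual addition, as drawn in the original figure) versus Pre-LN (normalization before each sublayer, as in later official code). Class, parameter, and dimension choices below are illustrative, not taken from the original material.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Minimal encoder block illustrating Pre-LN vs. Post-LN placement."""

    def __init__(self, d_model=512, n_heads=8, d_ff=2048, pre_ln=True):
        super().__init__()
        self.pre_ln = pre_ln
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        if self.pre_ln:
            # Pre-LN: normalize *before* each sublayer, then add the residual
            h = self.norm1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            x = x + self.ff(self.norm2(x))
        else:
            # Post-LN: normalize *after* the residual addition (as in the figure)
            x = self.norm1(x + self.attn(x, x, x, need_weights=False)[0])
            x = self.norm2(x + self.ff(x))
        return x


x = torch.randn(2, 16, 512)  # (batch, sequence, embedding)
print(TransformerBlock(pre_ln=True)(x).shape)   # Pre-LN variant
print(TransformerBlock(pre_ln=False)(x).shape)  # Post-LN variant
```

Both variants compute the same sublayers; only where the normalization sits relative to the residual connection changes, which is exactly the detail the figure and the code disagree on.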