Emma

Summary:

  • The figure in the original transformer paper has a slight discrepancy with the official code implementation: layer normalization is placed at a different point in the block (see the sketch after this list).
  • Understanding transformers from a historical perspective helps trace how language models evolved.
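
A minimal PyTorch sketch of the two orderings, assuming the discrepancy refers to the familiar post-LN (as drawn in the paper's figure) versus pre-LN (as found in the reference code) placement; the `ffn` sublayer and the dimensions below are hypothetical stand-ins, not the paper's actual modules.

```python
import torch
import torch.nn as nn

d_model = 512
norm = nn.LayerNorm(d_model)
ffn = nn.Linear(d_model, d_model)  # stand-in for an attention or feed-forward sublayer

def post_ln(x):
    # Post-LN ordering, as the figure draws it ("Add & Norm" after the sublayer):
    # normalize after adding the residual.
    return norm(x + ffn(x))

def pre_ln(x):
    # Pre-LN ordering, as commonly implemented in code:
    # normalize the sublayer input, then add the residual.
    return x + ffn(norm(x))

x = torch.randn(2, 10, d_model)  # (batch, sequence, features)
print(post_ln(x).shape, pre_ln(x).shape)  # both: torch.Size([2, 10, 512])
```

Both variants keep the residual connection; they differ only in whether normalization sits inside or outside the residual path, which is exactly the kind of detail a figure and its code can silently disagree on.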