Emma

Summary:

  • Meta's LLaMA model delivers strong performance at a relatively small parameter count, while MosaicML's MPT-7B supports longer context lengths thanks to ALiBi position encoding (a short ALiBi sketch follows this list).
  • New techniques such as Early Dropout, Multi-Query Attention, and the Lion optimizer are being used to improve code model training and performance (a Lion update sketch also follows below).
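
ALiBi (Attention with Linear Biases) replaces learned position embeddings with a fixed penalty on attention logits that grows with query-key distance; because the bias depends only on relative distance, the model can extrapolate to contexts longer than those seen in training. Below is a minimal sketch assuming a standard causal-attention setup; the function name alibi_bias and the tensor shapes are illustrative assumptions, not MPT-7B's actual implementation.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Build the ALiBi bias tensor added to attention logits before softmax.

    Head h (1-indexed) gets slope m_h = 2**(-8h / n_heads); the bias for
    query position i attending to key position j (j <= i) is -m_h * (i - j).
    """
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    pos = torch.arange(seq_len)
    # distance[i, j] = i - j; future positions (j > i) are clamped to 0,
    # since the causal mask removes them anyway.
    distance = (pos[:, None] - pos[None, :]).clamp(min=0)
    return -slopes[:, None, None] * distance.float()  # (n_heads, seq_len, seq_len)

# Usage inside attention (shapes assumed, not prescribed):
#   logits = q @ k.transpose(-2, -1) / d_head**0.5
#   logits = logits + alibi_bias(n_heads, seq_len)  # then causal mask + softmax
```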
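The Lion optimizer ("EvoLved Sign Momentum", Chen et al. 2023) updates weights using only the sign of an interpolated momentum, so it tracks a single moment buffer instead of Adam's two. Here is a minimal single-tensor sketch; the function name lion_step and the default hyperparameters are illustrative assumptions, not a reference implementation.

```python
import torch

@torch.no_grad()
def lion_step(param: torch.Tensor, grad: torch.Tensor, m: torch.Tensor,
              lr: float = 1e-4, beta1: float = 0.9, beta2: float = 0.99,
              wd: float = 0.0) -> None:
    """One in-place Lion update for a single parameter tensor.

    update = sign(beta1 * m + (1 - beta1) * grad)   # interpolate, take sign
    param  = param * (1 - lr * wd) - lr * update    # decoupled weight decay
    m      = beta2 * m + (1 - beta2) * grad         # momentum EMA
    """
    update = (beta1 * m + (1.0 - beta1) * grad).sign()
    param.mul_(1.0 - lr * wd).add_(update, alpha=-lr)
    m.mul_(beta2).add_(grad, alpha=1.0 - beta2)
```

Because every coordinate of the update has magnitude 1, Lion is typically run with a smaller learning rate and larger weight decay than Adam.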