Optimal layer–dimension configurations for Starbucks embeddings
Determine whether specific combinations of transformer encoder depth (number of layers) and embedding dimensionality exist in BERT-based embedding models trained with Starbucks Representation Learning (optionally with Starbucks Masked Autoencoding pre-training) that are more effective than the configurations obtained by simply scaling up layers and dimensions together. A minimal evaluation sketch follows.
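To make the question concrete, the sketch below shows one way to probe a (layers, dimensions) grid: embed text with the hidden state of an intermediate encoder layer and truncate the vector to its first d dimensions, as in 2D Matryoshka-style sub-models. The model name and the (layer, dim) pairs are illustrative placeholders, not the paper's actual Starbucks checkpoints or training grid.

```python
# Illustrative sketch: scoring similarity with truncated (layers, dims)
# sub-models of a BERT-style encoder. MODEL_NAME and the `pairs` grid
# are placeholders, not the paper's checkpoints or configuration.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # stand-in; a Starbucks-trained checkpoint would go here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def embed(texts, n_layers, n_dims):
    """Embed texts from the n_layers-th encoder layer, keeping the first n_dims dims."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    # hidden_states[0] is the embedding layer output, so index n_layers
    # selects the output of the n_layers-th transformer block.
    h = out.hidden_states[n_layers][:, 0, :n_dims]  # [CLS] token, truncated dims
    return torch.nn.functional.normalize(h, dim=-1)

# Sweep a small (layers, dims) grid rather than only scaling both up together.
pairs = [(2, 32), (4, 64), (6, 128), (8, 256), (10, 512), (12, 768)]
queries = ["what is matryoshka representation learning"]
docs = ["Matryoshka embeddings nest smaller vectors inside larger ones."]
for n_layers, n_dims in pairs:
    sim = embed(queries, n_layers, n_dims) @ embed(docs, n_layers, n_dims).T
    print(f"layers={n_layers:2d} dims={n_dims:3d} cos_sim={sim.item():.4f}")
```

Running such a sweep on a retrieval benchmark (rather than a single pair, as here) would reveal whether effectiveness rises monotonically along the diagonal or whether off-diagonal combinations outperform it.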
References
Our results show that increasing dimension and layer numbers always led to improvements in effectiveness; however, we still do not know if there are specific combinations of layers and dimensions that would be more effective.
— "Starbucks-v2: Improved Training for 2D Matryoshka Embeddings" (Zhuang et al., arXiv:2410.13230, 17 Oct 2024), Section: Limitations