Effective Optimizer for Complex Multimodal Architectures Combining BERT and Cross-Attention
Identify which optimizer is effective for training a complex multimodal deep learning architecture that combines a BERT-based textual encoder with a cross-attention Transformer component on sparse textual data, by systematically comparing optimizers such as Adam, Nadam, and Adamax on Yelp rating-prediction tasks.
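For illustration, the sketch below shows one way such an optimizer comparison could be set up. It assumes a PyTorch and Hugging Face Transformers stack; the model class, feature dimensions, learning rate, loader format, and helper names (BertCrossAttentionRegressor, make_optimizer, compare_optimizers) are assumptions introduced for this example and are not the paper's actual implementation or hyperparameters.

```python
# Sketch: comparing Adam, NAdam, and Adamax on a BERT + cross-attention rating model.
# Assumptions (not from the paper): PyTorch/transformers stack, bert-base-uncased,
# illustrative hyperparameters, and a second "structured-feature" input stream.

import torch
import torch.nn as nn
from transformers import AutoModel


class BertCrossAttentionRegressor(nn.Module):
    """Toy multimodal regressor: BERT text encoder plus cross-attention over
    a projected non-text feature vector, followed by a linear rating head."""

    def __init__(self, feat_dim=32, hidden=768):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        self.feat_proj = nn.Linear(feat_dim, hidden)            # project non-text features
        self.cross_attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.head = nn.Linear(hidden, 1)                        # predicted rating

    def forward(self, input_ids, attention_mask, feats):
        text = self.bert(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state
        q = self.feat_proj(feats).unsqueeze(1)                  # (B, 1, hidden) query
        fused, _ = self.cross_attn(q, text, text)               # attend over token states
        return self.head(fused.squeeze(1)).squeeze(-1)


def make_optimizer(name, params, lr=2e-5):
    """Instantiate one of the candidate optimizers by name."""
    opts = {
        "adam":   torch.optim.Adam,
        "nadam":  torch.optim.NAdam,
        "adamax": torch.optim.Adamax,
    }
    return opts[name](params, lr=lr)


def compare_optimizers(train_loader, epochs=3, device="cpu"):
    """Train an identically initialized model with each optimizer and record
    the last training loss (validation metrics would normally be added)."""
    results = {}
    loss_fn = nn.MSELoss()
    for name in ("adam", "nadam", "adamax"):
        torch.manual_seed(0)                                    # same init for a fair comparison
        model = BertCrossAttentionRegressor().to(device)
        opt = make_optimizer(name, model.parameters())
        for _ in range(epochs):
            # assumed loader format: (input_ids, attention_mask, feats, ratings)
            for input_ids, attention_mask, feats, ratings in train_loader:
                opt.zero_grad()
                pred = model(input_ids.to(device),
                             attention_mask.to(device),
                             feats.to(device))
                loss = loss_fn(pred, ratings.float().to(device))
                loss.backward()
                opt.step()
        results[name] = loss.item()
    return results
```

Fixing the seed and model initialization across runs while varying only the optimizer is what makes any performance difference attributable to the optimizer itself rather than to initialization noise; in practice one would compare held-out validation metrics (e.g., RMSE on the rating prediction) rather than the final training loss.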
References
In terms of optimization, many existing studies have adopted Adam as an optimizer; however, as described in H2, it has yet to be clarified what optimizer is effective for a complex architecture of multimodal learning.
— An Efficient Multimodal Learning Framework to Comprehend Consumer Preferences Using BERT and Cross-Attention
(2405.07435 - Niimi, 13 May 2024) in Section 3.2 (Evaluation)