
Optimization strategies to boost the performance of multimodal pretraining Transformers

Determine which combinations of pretext objectives and losses, and which training strategies, improve the performance of Transformer-based multimodal pretraining while keeping optimization challenges under control, such as balancing multiple loss terms and avoiding overly complex objectives.


Background

The survey observes that overly compound pretraining objectives can complicate optimization due to the need to balance different losses, and notes that the difficulty and complexity of pretext tasks may not straightforwardly translate to better performance.

Establishing principled guidelines for objective selection and multi-task optimization is needed to reliably improve multimodal pretraining outcomes.
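To make the loss-balancing difficulty concrete, below is a minimal, hypothetical sketch of combining several pretext losses with learnable uncertainty-based weights (the technique of Kendall et al., 2018) rather than hand-tuned fixed coefficients. The loss names and values are placeholders for illustration and are not taken from the survey.

```python
# Illustrative sketch: balancing multiple pretraining losses with learnable
# uncertainty weights instead of fixed, hand-tuned coefficients.
import torch
import torch.nn as nn


class UncertaintyWeightedLoss(nn.Module):
    """Combine K task losses as sum_i( exp(-s_i) * L_i + s_i ),
    where s_i = log(sigma_i^2) is learned jointly with the model."""

    def __init__(self, num_losses: int):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_losses))

    def forward(self, losses: list[torch.Tensor]) -> torch.Tensor:
        total = torch.zeros(())
        for i, loss in enumerate(losses):
            precision = torch.exp(-self.log_vars[i])
            total = total + precision * loss + self.log_vars[i]
        return total


if __name__ == "__main__":
    # Placeholder pretext losses, e.g. masked language modelling,
    # masked image modelling, image-text matching (values are illustrative).
    mlm_loss = torch.tensor(2.3, requires_grad=True)
    mim_loss = torch.tensor(0.8, requires_grad=True)
    itm_loss = torch.tensor(0.5, requires_grad=True)

    combiner = UncertaintyWeightedLoss(num_losses=3)
    total_loss = combiner([mlm_loss, mim_loss, itm_loss])
    total_loss.backward()  # gradients reach both the losses and the weights
    print(f"total pretraining loss: {total_loss.item():.4f}")
```

Whether such automatic weighting, fixed coefficients, or curriculum-style scheduling of objectives works best for multimodal Transformers is exactly the kind of question the open problem above asks.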

References

How to boost the performance for multimodal pretraining Transformers is an open problem.

Multimodal Learning with Transformers: A Survey (Xu et al., 2022, arXiv:2206.06488), Discussion under Subsubsection "Task-Agnostic Multimodal Pretraining" (Section 4.1.1)