Papers
Topics
Authors
Recent
2000 character limit reached

Frequency Is What You Need: Word-frequency Masking Benefits Vision-Language Model Pre-training (2412.16148v2)

Published 20 Dec 2024 in cs.CV

Abstract: Vision LLMs (VLMs) can be trained more efficiently if training sets can be reduced in size. Recent work has shown the benefits of masking text during VLM training using a variety of approaches: truncation, random masking, block masking and syntax masking. In this paper, we show that the best masking strategy changes over training epochs and that, given sufficient training epochs. We analyze existing text masking approaches including syntax masking, which is currently the state of the art, and identify the word frequency distribution as important in determining their success. Experiments on a large range of data sets demonstrate that syntax masking is outperformed by other approaches, given sufficient epochs, and that our proposed frequency-based approach, called Contrastive Language-Image Pre-training with Word Frequency Masking (CLIPF) has numerous advantages. The benefits are particularly evident as the number of input tokens decreases.

Summary

We haven't generated a summary for this paper yet.

Whiteboard

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.