Systematic evaluation of alternative motion-aware masking strategies
Quantitatively compare alternative motion-aware strategies for sampling visible tokens within the TrackMAE masked video pretraining framework, specifically the motion bins, Bernoulli high/low, sorting, uniform top-k, and exclude top-k schemes. The comparison should span different pretraining datasets (such as Kinetics-400 and Something-Something V2) and diverse downstream tasks, to determine which strategies and hyperparameters yield the most robust and generalizable video representations.
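To make the comparison concrete, the sketch below shows one possible reading of three of the named schemes as functions that map per-token motion scores to a set of visible token indices. The function names, parameters (pool_size, n_excluded, p_high, p_low), and the exact sampling rules are assumptions made for illustration; they are not taken from the TrackMAE paper, whose definitions may differ.

```python
import random


def rank_by_motion(motion):
    """Return token indices ordered from highest to lowest motion score."""
    return sorted(range(len(motion)), key=lambda i: motion[i], reverse=True)


def uniform_top_k(motion, n_visible, pool_size, rng):
    """Assumed reading: sample visible tokens uniformly from the
    pool_size highest-motion tokens."""
    pool = rank_by_motion(motion)[:pool_size]
    return sorted(rng.sample(pool, n_visible))


def exclude_top_k(motion, n_visible, n_excluded, rng):
    """Assumed reading: discard the n_excluded highest-motion tokens,
    then sample visible tokens uniformly from the remainder."""
    pool = rank_by_motion(motion)[n_excluded:]
    return sorted(rng.sample(pool, n_visible))


def bernoulli_high_low(motion, p_high, p_low, rng):
    """Assumed reading: keep each token independently, with probability
    p_high if its motion exceeds the upper-median score and p_low otherwise.
    The number of visible tokens therefore varies from draw to draw."""
    threshold = sorted(motion)[len(motion) // 2]
    return [
        i
        for i, m in enumerate(motion)
        if rng.random() < (p_high if m > threshold else p_low)
    ]


# Hypothetical usage: 16 tokens with random motion scores, fixed seed.
rng = random.Random(0)
scores = [rng.random() for _ in range(16)]
print("uniform top-k:", uniform_top_k(scores, 4, 8, rng))
print("exclude top-k:", exclude_top_k(scores, 4, 8, rng))
print("Bernoulli high/low:", bernoulli_high_low(scores, 0.4, 0.1, rng))
```

Giving every strategy the same interface (motion scores in, visible indices out) would let a single pretraining loop swap strategies and hyperparameters while holding the dataset and downstream evaluation fixed, which is the kind of controlled comparison the problem statement calls for.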
References
We leave the exploration of such masking strategies, their impact on different pretraining data and downstream tasks for future work.
— TrackMAE: Video Representation Learning via Track Mask and Predict
(2603.27268 - Vandeghen et al., 28 Mar 2026) in Supplementary, Section: Discussion on Motion Masking; Subsection: Sampling Strategy in Masking