Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture (2403.07347v2)
Abstract: Video Motion Magnification (VMM) aims to reveal subtle, imperceptible motion of objects in the macroscopic world. Prior methods model the motion field directly from the Eulerian perspective, either through representation learning that separates shape and texture or through multi-domain learning from phase fluctuations. Inspired by the frequency spectrum, we observe that low-frequency components with stable energy consistently preserve spatial structure and contain less noise, which makes them suitable for modeling the subtle motion field. To this end, we present FD4MM, a new paradigm of Frequency Decoupling for Motion Magnification with a Multi-level Isomorphic Architecture that captures multi-level high-frequency details and a stable low-frequency structure (the motion field) in video space. Because high-frequency details and subtle motions are inherently faint and exposed to unavoidable external noise, they are prone to information degradation. We therefore design Sparse High/Low-pass Filters to preserve the integrity of details and motion structures, and a Sparse Frequency Mixer to promote seamless recoupling. In addition, we design a contrastive regularization for this task that strengthens the model's ability to discriminate irrelevant features, reducing undesired motion magnification. Extensive experiments on both real-world and synthetic datasets show that FD4MM outperforms state-of-the-art (SOTA) methods. Compared with the latest method, FD4MM reduces FLOPs by 1.63$\times$ and increases inference speed by 1.68$\times$. Our code is available at https://github.com/Jiafei127/FD4MM.
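To make the idea of frequency decoupling concrete, the following is a minimal sketch in NumPy that splits a single grayscale frame into a low-frequency structure and a high-frequency residual, then recouples them by summation. It is not the FD4MM implementation: the function name `decouple_frequencies`, the fixed ideal circular low-pass mask, and the `cutoff` value are all assumptions made for illustration, whereas the paper describes learned Sparse High/Low-pass Filters and a Sparse Frequency Mixer.

```python
import numpy as np


def decouple_frequencies(frame: np.ndarray, cutoff: float = 0.1):
    """Split a 2-D grayscale frame into low- and high-frequency parts.

    `cutoff` is a hypothetical radius in normalized frequency units
    (cycles per pixel, between 0 and 0.5). This fixed ideal mask only
    stands in for the learned filters described in the abstract.
    """
    # Normalized frequency coordinates for each FFT bin.
    fy = np.fft.fftfreq(frame.shape[0])[:, None]
    fx = np.fft.fftfreq(frame.shape[1])[None, :]

    # Ideal circular low-pass mask: keep bins within `cutoff` of DC.
    mask = np.sqrt(fx ** 2 + fy ** 2) <= cutoff

    # Low-frequency structure: inverse transform of the masked spectrum.
    spectrum = np.fft.fft2(frame)
    low = np.fft.ifft2(spectrum * mask).real

    # High-frequency details: whatever the low-pass branch left out.
    high = frame - low
    return low, high


# Usage on a synthetic frame: a smooth gradient plus fine-grained noise.
rng = np.random.default_rng(0)
rows, cols = np.mgrid[0:64, 0:64]
frame = (rows + cols) / 126.0 + 0.05 * rng.standard_normal((64, 64))

low, high = decouple_frequencies(frame, cutoff=0.1)
recoupled = low + high  # trivial recoupling by summation
```

Because the high-frequency branch is defined as the residual, summing the two branches reproduces the input frame, which is a convenient sanity check before replacing either branch with a learned module.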