Test-Time Personalization with Meta Prompt for Gaze Estimation (2401.01577v3)
Abstract: Despite recent remarkable achievements in gaze estimation, efficient and accurate personalization of gaze estimation without labels is a practical problem that is rarely addressed in the literature. To achieve efficient personalization, we take inspiration from recent advances in NLP and update only a negligible number of parameters, "prompts", at test time. Specifically, the prompt is attached to the network without perturbing the original network and can contain less than 1% of a ResNet-18's parameters. Our experiments show the high efficiency of the prompt-tuning approach: the proposed method can adapt 10 times faster than the compared methods. However, it is non-trivial to update the prompt for personalized gaze estimation without labels. At test time, it is essential to ensure that minimizing a particular unsupervised loss also minimizes the gaze estimation error. To address this difficulty, we propose to meta-learn the prompt so that its updates align with this goal. Our experiments show that the meta-learned prompt can be effectively adapted even with a simple symmetry loss. In addition, we experiment on four cross-dataset validations to show the remarkable advantages of the proposed method. Code is available at https://github.com/hmarkamcan/TPGaze.
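The idea of adapting only a prompt at test time with a symmetry loss can be illustrated with a toy sketch. This is not the authors' implementation: a frozen linear map stands in for the ResNet-18, the prompt is a small additive vector, and the symmetry loss is assumed to compare the prediction on a sample with the mirrored prediction on its left-right flipped copy (mirroring negates yaw). The meta-learning stage is omitted, and all names (`W`, `predict`, `residual`, `adapt_prompt`) are hypothetical.

```python
import random

random.seed(0)
D = 6  # toy feature dimension (hypothetical)
# Frozen "backbone": two weight rows mapping features to (yaw, pitch).
W = [[random.gauss(0, 1) for _ in range(D)] for _ in range(2)]

def predict(x, prompt):
    """Frozen linear model; the additive prompt is the only trainable part."""
    z = [xi + pi for xi, pi in zip(x, prompt)]
    return [sum(w * zi for w, zi in zip(row, z)) for row in W]

def residual(x, prompt):
    """Prediction on x minus the mirrored prediction on the flipped x."""
    yaw, pitch = predict(x, prompt)
    yaw_f, pitch_f = predict(x[::-1], prompt)
    return [yaw + yaw_f, pitch - pitch_f]  # mirroring negates yaw only

def symmetry_loss(batch, prompt):
    """Mean squared symmetry residual over an unlabeled batch."""
    return sum(r * r for x in batch for r in residual(x, prompt)) / len(batch)

def adapt_prompt(batch, prompt, steps=20):
    """Gradient descent on the prompt only; W is never modified."""
    # d(residual)/d(prompt): the yaw row is 2 * W[0]; the pitch row cancels to zero.
    lr = 0.25 / (4.0 * sum(w * w for w in W[0]))  # conservative step for this quadratic
    for _ in range(steps):
        mean_yaw_res = sum(residual(x, prompt)[0] for x in batch) / len(batch)
        grad = [2.0 * mean_yaw_res * 2.0 * w for w in W[0]]
        prompt = [p - lr * g for p, g in zip(prompt, grad)]
    return prompt

# Unlabeled "test-time" samples for one hypothetical user.
batch = [[random.gauss(0, 1) for _ in range(D)] for _ in range(16)]
prompt0 = [0.0] * D
prompt1 = adapt_prompt(batch, prompt0)
print(symmetry_loss(batch, prompt0), symmetry_loss(batch, prompt1))
```

In this toy the symmetry loss decreases without any labels, but nothing guarantees that the gaze error decreases with it. That gap is what the abstract's meta-learning step is meant to close.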