Multistream Gaze Estimation with Anatomical Eye Region Isolation by Synthetic to Real Transfer Learning (2206.09256v2)

Published 18 Jun 2022 in cs.CV and cs.LG

Abstract: We propose a novel neural pipeline, MSGazeNet, that learns gaze representations by exploiting eye anatomy information through a multistream framework. Our solution comprises two components: a network for isolating anatomical eye regions, and a network for multistream gaze estimation. Eye region isolation is performed with a U-Net style network trained on a synthetic dataset containing eye region masks for the visible eyeball and the iris. The synthetic dataset used in this stage is generated with the UnityEyes simulator and consists of 80,000 eye images. After training, the eye region isolation network is transferred to the real domain to generate masks for real-world eye images. To make this transfer succeed, we exploit domain randomization during training, which increases the variance of the synthetic images through augmentations that resemble real-world artifacts. The generated eye region masks, together with the raw eye images, then serve as a multistream input to our gaze estimation network, which consists of wide residual blocks. The output embeddings from these encoders are fused along the channel dimension before being fed into the gaze regression layers. We evaluate our framework on three gaze estimation datasets and achieve strong performance. Our method surpasses the state of the art by 7.57% and 1.85% on two datasets and obtains competitive results on the third. We also study the robustness of our method to noise in the data and show that our model is less sensitive to noisy inputs. Lastly, we perform a variety of experiments, including ablation studies, to evaluate the contribution of different components and design choices in our solution.
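The channel-wise fusion described in the abstract can be illustrated with a short sketch. The PyTorch code below is a minimal, hypothetical approximation of the second-stage multistream network: three single-channel streams (the raw eye image plus the eyeball and iris masks) are encoded separately, the feature maps are concatenated along the channel dimension, and a small head regresses the 2D gaze angles. The stream count, encoder depth and width, and head sizes are illustrative assumptions, not the authors' exact configuration (which uses wide residual blocks).

```python
# Minimal sketch of channel-wise multistream fusion for gaze regression.
# Architectural details (encoder layers, feature sizes) are assumptions
# made for illustration, not the paper's exact MSGazeNet design.
import torch
import torch.nn as nn


class StreamEncoder(nn.Module):
    """Small convolutional encoder standing in for one wide-residual stream."""

    def __init__(self, in_channels: int, out_channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, out_channels, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class MultistreamGazeRegressor(nn.Module):
    """Encodes each stream, fuses along the channel dimension, regresses gaze."""

    def __init__(self, feat_channels: int = 64, num_streams: int = 3):
        super().__init__()
        self.encoders = nn.ModuleList(
            [StreamEncoder(1, feat_channels) for _ in range(num_streams)]
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.regressor = nn.Sequential(
            nn.Linear(feat_channels * num_streams, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, 2),  # yaw and pitch
        )

    def forward(self, streams: list) -> torch.Tensor:
        feats = [enc(x) for enc, x in zip(self.encoders, streams)]
        fused = torch.cat(feats, dim=1)        # channel-wise fusion
        pooled = self.pool(fused).flatten(1)
        return self.regressor(pooled)


if __name__ == "__main__":
    # Three single-channel 36x60 streams: raw eye image and two region masks.
    eye, eyeball_mask, iris_mask = (torch.randn(4, 1, 36, 60) for _ in range(3))
    model = MultistreamGazeRegressor()
    gaze = model([eye, eyeball_mask, iris_mask])
    print(gaze.shape)  # torch.Size([4, 2])
```

The same pattern extends to any number of input streams; only the per-stream input channels and `num_streams` need to change.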
