Semantic-Based Active Perception for Humanoid Visual Tasks with Foveal Sensors (2404.10836v1)

Published 16 Apr 2024 in cs.CV and eess.IV

Abstract: The aim of this work is to establish how accurately a recent semantic-based foveal active perception model is able to complete visual tasks that are regularly performed by humans, namely, scene exploration and visual search. This model exploits the ability of current object detectors to localize and classify a large number of object classes and to update a semantic description of a scene across multiple fixations. It has been used previously in scene exploration tasks. In this paper, we revisit the model and extend its application to visual search tasks. To illustrate the benefits of using semantic information in scene exploration and visual search tasks, we compare its performance against traditional saliency-based models. In the scene exploration task, the semantic-based method represents the semantic information present in the visual scene more accurately than the traditional saliency-based model. In visual search experiments, where the task is to find instances of a target class in a visual field containing multiple distractors, the semantic-based method outperforms both the saliency-driven model and a random gaze selection algorithm. Our results demonstrate that top-down semantic information significantly influences visual exploration and search, suggesting its integration with traditional bottom-up cues as a promising direction for future research.
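
To make the fixation-selection idea in the abstract concrete, below is a minimal sketch of a semantic-driven gaze loop. It is an illustration under assumed interfaces, not the paper's implementation: the grid discretization, the `foveal_detections` stand-in (which fakes detector output whose reliability decays with eccentricity), the naive Bayes-style belief fusion, and the entropy/target-probability objectives are simplifying assumptions chosen only to show how a semantic scene description could drive exploration versus visual search.

```python
import numpy as np

GRID = 8          # candidate fixation locations on an 8x8 grid over the visual field (assumed)
N_CLASSES = 80    # size of a COCO-style detector vocabulary (assumed)


def foveal_detections(image, fixation, rng):
    """Stand-in for a foveated object detector: for each grid cell, return a
    class-probability vector whose reliability decays with eccentricity, so
    evidence is sharp near the fovea and near-uniform in the periphery.
    The `image` argument is unused in this placeholder."""
    ys, xs = np.mgrid[0:GRID, 0:GRID]
    ecc = np.hypot(ys - fixation[0], xs - fixation[1]) / GRID
    reliability = np.exp(-3.0 * ecc)[..., None]
    probs = rng.dirichlet(np.ones(N_CLASSES), size=(GRID, GRID))
    uniform = np.full(N_CLASSES, 1.0 / N_CLASSES)
    return reliability * probs + (1.0 - reliability) * uniform


def entropy(p):
    """Per-cell entropy of the class belief (semantic uncertainty)."""
    return -(p * np.log(p + 1e-12)).sum(axis=-1)


def explore(image, n_fixations=10, target_class=None, seed=0):
    """Greedy semantic gaze loop: fuse per-cell class beliefs across fixations,
    then fixate where uncertainty is highest (scene exploration) or where the
    target class is currently most probable (visual search)."""
    rng = np.random.default_rng(seed)
    belief = np.full((GRID, GRID, N_CLASSES), 1.0 / N_CLASSES)
    fixation = (GRID // 2, GRID // 2)          # start at the centre of the field
    scanpath = [fixation]
    for _ in range(n_fixations - 1):
        obs = foveal_detections(image, fixation, rng)
        belief *= obs                          # naive Bayes-style evidence fusion
        belief /= belief.sum(axis=-1, keepdims=True)
        if target_class is None:
            score = entropy(belief)                    # exploration objective
        else:
            score = belief[..., target_class].copy()   # visual search objective
        score[tuple(np.array(scanpath).T)] = -np.inf   # inhibition of return
        fixation = np.unravel_index(np.argmax(score), score.shape)
        scanpath.append(tuple(int(v) for v in fixation))
    return scanpath, belief


scanpath, belief = explore(image=None, n_fixations=6, target_class=17)
print(scanpath)
```

Calling `explore` with `target_class=None` mimics the exploration setting (reduce semantic uncertainty across the field), while passing a class index mimics visual search (fixate where the target is currently believed most likely); the inhibition-of-return step keeps the greedy policy from revisiting past fixations.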
