EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision

Published 3 Sep 2024 in cs.CV and cs.HC | arXiv:2409.02224v2

Abstract: Touch contact and pressure are essential for understanding how humans interact with and manipulate objects, insights that can significantly benefit applications in mixed reality and robotics. However, estimating these interactions from an egocentric camera perspective is challenging, largely due to the lack of comprehensive datasets that provide both accurate hand poses on contacting surfaces and detailed annotations of pressure information. In this paper, we introduce EgoPressure, a novel egocentric dataset that captures detailed touch contact and pressure interactions. EgoPressure provides high-resolution pressure intensity annotations for each contact point and includes accurate hand pose meshes obtained through our proposed multi-view, sequence-based optimization method, which processes data from an 8-camera capture rig. Our dataset comprises 5 hours of recorded interactions from 21 participants, captured simultaneously by one head-mounted and seven stationary Kinect cameras that acquire RGB images and depth maps at 30 Hz. To support future research and benchmarking, we present several baseline models for estimating applied pressure on external surfaces from RGB images, with and without hand pose information. We further explore the joint estimation of the hand mesh and applied pressure. Our experiments demonstrate that pressure and hand pose are complementary for understanding hand-object interactions, and we expect EgoPressure to benefit the understanding of hand-object interactions in AR/VR and robotics research. Project page: https://yiming-zhao.github.io/EgoPressure/.
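
The abstract's baselines regress applied pressure on external surfaces from egocentric RGB images, optionally with hand pose information. The sketch below is a minimal, hypothetical illustration of that setup in PyTorch, not the authors' released code: a small encoder-decoder maps an RGB frame, optionally concatenated with a rendered hand-pose channel, to a dense pressure map. The architecture, channel sizes, and MSE objective are assumptions made only to show how pose and appearance could be fused.

```python
# Hypothetical sketch of an RGB -> dense pressure-map baseline (not the paper's model).
import torch
import torch.nn as nn

class PressureBaseline(nn.Module):
    def __init__(self, use_pose_channel: bool = False):
        super().__init__()
        in_ch = 3 + (1 if use_pose_channel else 0)  # RGB (+ optional rendered hand-pose map)

        def down(cin, cout):
            # Stride-2 conv block: halves spatial resolution
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )

        # Encoder: downsample to 1/8 resolution
        self.encoder = nn.Sequential(down(in_ch, 32), down(32, 64), down(64, 128))
        # Decoder: upsample back to input resolution, 1-channel pressure head
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, rgb, pose_map=None):
        # Fuse pose information (if given) by channel concatenation
        x = rgb if pose_map is None else torch.cat([rgb, pose_map], dim=1)
        return self.decoder(self.encoder(x))  # (B, 1, H, W) predicted pressure map

if __name__ == "__main__":
    model = PressureBaseline(use_pose_channel=True)
    rgb = torch.randn(2, 3, 256, 256)            # egocentric RGB crops
    pose = torch.randn(2, 1, 256, 256)           # hypothetical rendered hand-pose channel
    target = torch.rand(2, 1, 256, 256)          # ground-truth pressure map from the sensor
    pred = model(rgb, pose)
    loss = nn.functional.mse_loss(pred, target)  # simple regression objective for this sketch
    print(pred.shape, loss.item())
```

In practice the paper evaluates such baselines with and without pose input; the key design point illustrated here is that pose enters as an extra input channel alongside appearance, so the same decoder produces the per-pixel pressure estimate in both configurations.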
