PressureVision++: Estimating Fingertip Pressure from Diverse RGB Images (2301.02310v3)
Abstract: Touch plays a fundamental role in manipulation for humans; however, machine perception of contact and pressure typically requires invasive sensors. Recent research has shown that deep models can estimate hand pressure based on a single RGB image. However, evaluations have been limited to controlled settings since collecting diverse data with ground-truth pressure measurements is difficult. We present a novel approach that enables diverse data to be captured with only an RGB camera and a cooperative participant. Our key insight is that people can be prompted to apply pressure in a certain way, and this prompt can serve as a weak label to supervise models to perform well under varied conditions. We collect a novel dataset with 51 participants making fingertip contact with diverse objects. Our network, PressureVision++, outperforms human annotators and prior work. We also demonstrate an application of PressureVision++ to mixed reality where pressure estimation allows everyday surfaces to be used as arbitrary touch-sensitive interfaces. Code, data, and models are available online.
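The weak-label idea in the abstract can be illustrated with a small sketch. This is not the PressureVision++ training objective, which the abstract does not specify; it is a generic image-level weak-supervision loss under stated assumptions. The sketch assumes that a model outputs a per-pixel contact logit map, that the prompt reduces to a single binary "contact requested" flag, and that max pooling is used to turn the map into one image-level probability. The function names are hypothetical.

```python
import numpy as np


def pooled_contact_probability(contact_logits: np.ndarray) -> float:
    """Collapse an (H, W) map of per-pixel contact logits into one
    image-level probability by max pooling followed by a sigmoid.
    Max pooling is an assumption made for this sketch."""
    return float(1.0 / (1.0 + np.exp(-contact_logits.max())))


def weak_label_loss(contact_logits: np.ndarray,
                    prompted_contact: bool,
                    eps: float = 1e-7) -> float:
    """Binary cross-entropy between the pooled prediction and the prompt.

    The prompt acts as a weak label: it says whether the participant was
    asked to apply pressure, but not where or how much (hypothetical
    simplification of the prompts described in the abstract)."""
    p = np.clip(pooled_contact_probability(contact_logits), eps, 1.0 - eps)
    y = 1.0 if prompted_contact else 0.0
    return float(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


if __name__ == "__main__":
    no_touch = np.full((4, 4), -10.0)   # model predicts no contact anywhere
    touch = no_touch.copy()
    touch[1, 2] = 10.0                  # model predicts contact at one pixel

    print(weak_label_loss(no_touch, prompted_contact=False))  # small loss
    print(weak_label_loss(no_touch, prompted_contact=True))   # large loss
    print(weak_label_loss(touch, prompted_contact=True))      # small loss
```

The point of the sketch is that the loss only needs an image and the prompt that was given, so no pressure sensor is required at capture time. A real system would likely combine such a term with fully supervised data and richer prompts.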