TouchInsight: Uncertainty-aware Rapid Touch and Text Input for Mixed Reality from Egocentric Vision

Published 8 Oct 2024 in cs.CV and cs.HC | (2410.05940v1)

Abstract: While passive surfaces offer numerous benefits for interaction in mixed reality, reliably detecting touch input solely from head-mounted cameras has been a long-standing challenge. Camera specifics, hand self-occlusion, and rapid movements of both head and fingers introduce considerable uncertainty about the exact location of touch events. Existing methods have thus not been capable of achieving the performance needed for robust interaction. In this paper, we present a real-time pipeline that detects touch input from all ten fingers on any physical surface, purely based on egocentric hand tracking. Our method TouchInsight comprises a neural network to predict the moment of a touch event, the finger making contact, and the touch location. TouchInsight represents locations through a bivariate Gaussian distribution to account for uncertainties due to sensing inaccuracies, which we resolve through contextual priors to accurately infer intended user input. We first evaluated our method offline and found that it locates input events with a mean error of 6.3 mm, and accurately detects touch events (F1=0.99) and identifies the finger used (F1=0.96). In an online evaluation, we then demonstrate the effectiveness of our approach for a core application of dexterous touch input: two-handed text entry. In our study, participants typed 37.0 words per minute with an uncorrected error rate of 2.9% on average.

Summary

The paper introduces TouchInsight, a real-time pipeline that leverages egocentric vision and neural networks to detect ten-finger touch events on any surface.
It employs a bivariate Gaussian distribution with contextual priors to manage sensing uncertainties, achieving a 6.3 mm mean position error and F1 scores of 0.99 for detection and 0.96 for finger identification.
The method supports two-handed text entry at 37.0 words per minute with a 2.9% uncorrected error rate, using only head-mounted cameras and no additional instrumentation.

TouchInsight: Uncertainty-aware Touch and Text Input for Mixed Reality

The paper "TouchInsight: Uncertainty-aware Rapid Touch and Text Input for Mixed Reality from Egocentric Vision" addresses a longstanding challenge in mixed reality (MR) interaction: reliably detecting touch input on passive surfaces using only head-mounted cameras. The difficulty stems from camera specifics, hand self-occlusion, and rapid movements of both head and fingers, all of which introduce uncertainty about the exact location of touch events.

Core Contribution

The primary contribution is a real-time pipeline, termed TouchInsight, that detects touch input from all ten fingers on any physical surface, based purely on egocentric hand tracking. The method uses a neural network to predict three things about each touch event: the moment it occurs, the finger making contact, and the touch location. Touch locations are represented as a bivariate Gaussian distribution to account for uncertainty due to sensing inaccuracies. The resulting ambiguity is resolved through contextual priors, which refine the inference of the user's intended input.
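One way to picture how a contextual prior resolves an uncertain touch location is Bayesian decoding over candidate targets. The sketch below is illustrative only and is not the paper's implementation: the key positions, the covariance, and the prior values are hypothetical, and `decode_key` and `gaussian_log_density` are names introduced here. It scores each key center under the predicted bivariate Gaussian and weights the result by a prior, such as one that a language model might supply.

```python
import math


def gaussian_log_density(point, mean, cov):
    """Log density of a bivariate Gaussian at `point` (cov is a 2x2 nested tuple)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
    maha = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * maha - math.log(2 * math.pi) - 0.5 * math.log(det)


def decode_key(touch_mean, touch_cov, key_centers, prior):
    """Posterior over keys: prior(key) * N(key_center; touch_mean, touch_cov)."""
    log_post = {
        key: math.log(prior[key])
        + gaussian_log_density(center, touch_mean, touch_cov)
        for key, center in key_centers.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_post.values())
    unnorm = {key: math.exp(value - peak) for key, value in log_post.items()}
    total = sum(unnorm.values())
    return {key: value / total for key, value in unnorm.items()}


# Hypothetical layout (mm): two adjacent keys, touch predicted exactly between them.
key_centers = {"q": (0.0, 0.0), "w": (19.0, 0.0)}
touch_mean = (9.5, 0.0)
touch_cov = ((36.0, 0.0), (0.0, 36.0))  # 6 mm standard deviation per axis
prior = {"q": 0.2, "w": 0.8}  # assumed contextual prior, e.g. from preceding text

posterior = decode_key(touch_mean, touch_cov, key_centers, prior)
best_key = max(posterior, key=posterior.get)
print(best_key, round(posterior[best_key], 3))
```

Because the predicted location is equidistant from both keys, the likelihood alone cannot separate them, and the prior decides the outcome. With a uniform prior and a touch closer to one key, the likelihood term dominates instead.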

Numerical Results

The offline evaluation reports the following:

Mean position error for input events is reported at 6.3 mm.
Touch events are detected with an F1-score of 0.99.
Finger identification yields an F1-score of 0.96.
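For readers less familiar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below shows the standard computation from raw counts; the counts in the example are made up for illustration and are not taken from the paper.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 = 2 * TP / (2 * TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0  # no positives predicted or present; F1 is conventionally 0
    return 2 * true_positives / denominator


# Hypothetical counts: 990 correctly detected touches, 10 spurious, 10 missed.
example_f1 = f1_score(990, 10, 10)
print(round(example_f1, 2))
```

An F1-score near 1 therefore requires both few spurious detections and few missed touches, which is why it is a stricter summary than accuracy alone when touch events are rare relative to non-touch frames.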

An online evaluation then demonstrates the approach on a core application of dexterous touch input, two-handed text entry:

Participants typed 37.0 words per minute on average, with an average uncorrected error rate of 2.9%.
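This summary does not state how the two text-entry metrics were computed. The sketch below assumes the definitions commonly used in text-entry research: words per minute with a five-character word, and an uncorrected error rate based on the minimum string distance between the presented and transcribed phrases. The function names and the example phrases are hypothetical.

```python
def levenshtein(a, b):
    """Minimum string distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def words_per_minute(transcribed, seconds):
    """WPM = (|T| - 1) / seconds * 60 / 5, treating five characters as one word."""
    return (len(transcribed) - 1) / seconds * 60 / 5


def uncorrected_error_rate(presented, transcribed, incorrect_fixed=0):
    """Assumed definition: INF / (C + INF + IF).

    INF is the errors left in the final text, C the correct characters, and
    IF the characters that were wrong but corrected during entry.
    """
    incorrect_not_fixed = levenshtein(presented, transcribed)
    correct = max(len(presented), len(transcribed)) - incorrect_not_fixed
    total = correct + incorrect_not_fixed + incorrect_fixed
    return incorrect_not_fixed / total if total else 0.0


# Hypothetical trial: 31 characters typed in 10 seconds, one error left uncorrected.
wpm = words_per_minute("a" * 31, 10.0)
uer = uncorrected_error_rate("hello world", "hello wirld")
print(wpm, round(uer, 3))
```

Under these assumed definitions, the uncorrected error rate only counts mistakes that remain in the final transcribed text, so errors the participant fixed while typing lower the speed but not this rate.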

Theoretical and Practical Implications

Theoretically, the work treats uncertainty arising from both user and sensing errors within a probabilistic framework, rather than committing to a single point estimate of each touch. This approach could encourage further exploration of uncertainty estimation in other AI applications, particularly where precision is critical.

Practically, the implications for MR systems are substantial. The method's ability to work with head-mounted displays without external sensors or additional instrumentation broadens the usability in real-world applications, potentially leading to more ergonomic and efficient MR interfaces.

Future Developments

Future developments could focus on:

Expanding character sets and command inputs.
Integrating gaze tracking for enhanced user intention prediction.
Leveraging motion cues for further reducing uncertainty in input prediction.

Conclusion

The paper makes a compelling case for an uncertainty-aware framework in MR interactions, offering a feasible solution for text input via surface touch detection using egocentric vision. This work not only enhances the understanding of touch detection in dynamic environments but also sets a foundation for increasingly sophisticated MR applications. The integration of probabilistic methods, coupled with contextual priors, presents pathways for improvements not just in MR systems but in wider fields where precise input tracking is essential.
