Human I/O: Towards a Unified Approach to Detecting Situational Impairments (2403.04008v1)
Abstract: Situationally Induced Impairments and Disabilities (SIIDs) can significantly hinder user experience in contexts such as poor lighting, noise, and multi-tasking. While prior research has introduced algorithms and systems to address these impairments, they predominantly cater to specific tasks or environments and fail to accommodate the diverse and dynamic nature of SIIDs. We introduce Human I/O, a unified approach to detecting a wide range of SIIDs by gauging the availability of human input/output channels. Leveraging egocentric vision, multimodal sensing, and reasoning with LLMs, Human I/O achieves a mean absolute error of 0.22 and an accuracy of 82% in availability prediction across 60 in-the-wild egocentric video recordings in 32 different scenarios. Furthermore, while the core focus of our work is on the detection of SIIDs rather than the creation of adaptive user interfaces, we showcase the efficacy of our prototype via a user study with 10 participants. Findings suggest that Human I/O significantly reduces effort and improves user experience in the presence of SIIDs, paving the way for more adaptive and accessible interactive systems in the future.