
Human I/O: Towards a Unified Approach to Detecting Situational Impairments (2403.04008v1)

Published 6 Mar 2024 in cs.HC

Abstract: Situationally Induced Impairments and Disabilities (SIIDs) can significantly hinder user experience in contexts such as poor lighting, noise, and multi-tasking. While prior research has introduced algorithms and systems to address these impairments, they predominantly cater to specific tasks or environments and fail to accommodate the diverse and dynamic nature of SIIDs. We introduce Human I/O, a unified approach to detecting a wide range of SIIDs by gauging the availability of human input/output channels. Leveraging egocentric vision, multimodal sensing, and reasoning with LLMs, Human I/O achieves a mean absolute error of 0.22 and an accuracy of 82% in availability prediction across 60 in-the-wild egocentric video recordings in 32 different scenarios. Furthermore, while the core focus of our work is on the detection of SIIDs rather than the creation of adaptive user interfaces, we showcase the efficacy of our prototype via a user study with 10 participants. Findings suggest that Human I/O significantly reduces effort and improves user experience in the presence of SIIDs, paving the way for more adaptive and accessible interactive systems in the future.
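
The two metrics reported in the abstract can be illustrated with a minimal sketch. It assumes availability is scored on an ordinal integer scale (a hypothetical 0 to 3 here, from fully available to unavailable); the abstract does not specify the scale, and the function names and sample labels below are invented for illustration only. They are not the paper's data or evaluation code.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Average absolute gap between predicted and ground-truth availability levels."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of predictions that match the ground-truth level exactly."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical availability levels for one channel across eight clips
# (0 = available ... 3 = unavailable). Made-up values, not from the paper.
predicted = [0, 1, 3, 2, 0, 1, 2, 3]
actual = [0, 1, 2, 2, 0, 0, 2, 3]

print(mean_absolute_error(predicted, actual))
print(accuracy(predicted, actual))
```

Under this reading, mean absolute error rewards predictions that are close on the ordinal scale even when they are not exact, while accuracy counts only exact matches, which is one reason to report both.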
