MarkupLens: An AI-Powered Tool to Support Designers in Video-Based Analysis at Scale (2403.05201v1)
Abstract: Video-Based Design (VBD) is a design methodology that utilizes video as a primary tool for understanding user interactions, prototyping, and conducting research to enhance the design process. AI can be instrumental in video-based design by analyzing and interpreting visual data from videos to enhance user interaction, automate design processes, and improve product functionality. In this study, we explore how AI can enhance professional video-based design with a State-of-the-Art (SOTA) deep learning model. We developed a prototype annotation platform (MarkupLens) and conducted a between-subjects eye-tracking study with 36 designers, annotating videos with three levels of AI assistance. Our findings indicate that MarkupLens improved design annotation quality and productivity. Additionally, it reduced the cognitive load that designers exhibited and enhanced their User Experience (UX). We believe that designer-AI collaboration can greatly enhance the process of eliciting insights in video-based design.