RealitySummary: Exploring On-Demand Mixed Reality Text Summarization and Question Answering using Large Language Models (2405.18620v2)
Abstract: Large language models (LLMs) are gaining popularity as reading and summarization aids. However, little is known about their potential benefits when integrated with mixed reality (MR) interfaces to support everyday reading. We developed RealitySummary, an MR reading assistant that integrates LLMs with always-on camera access, OCR-based text extraction, and augmented spatial and visual responses in MR interfaces. RealitySummary was developed iteratively across three versions, each shaped by user feedback and reflective analysis from an accompanying study: 1) a preliminary user study to understand user perceptions (N=12), 2) an in-the-wild deployment to explore real-world usage (N=11), and 3) a diary study to capture insights from real-world work contexts (N=5). Our findings highlight the unique advantages of combining AI and MR, including an always-on implicit assistant, minimal context switching, and spatial affordances, demonstrating significant potential for future LLM-MR interfaces beyond traditional screen-based interactions.