Metabook: A System to Automatically Generate Interactive AR Storybooks to Improve Children's Reading (2405.13701v2)
Abstract: Reading is important for children to acquire knowledge, enhance cognitive abilities, and improve language skills. However, current reading methods either offer limited visual presentation, making them less interesting to children, or lack channels for children to share insights and ask questions during reading. AR/VR books provide rich visual cues that can address children's lack of interest in reading, but high production costs and the need for professional expertise limit the number of AR/VR books available and therefore children's choices. We propose Metabook, a system to automatically generate interactive AR storybooks to improve children's reading. Metabook introduces a story-to-3D-book generation scheme and a 3D avatar that combines multiple AI models as a reading companion. We conducted a formative study with six primary and secondary school teachers to explore the design considerations for an ideal children's AR reading tool. In the user study, we invited relevant professionals (art and computer science professionals and a semanticist), 44 children, and six teachers to evaluate Metabook. Our user study shows that Metabook can significantly increase children's interest in reading and deepen their impression of reading materials and vocabulary in books. Teachers acknowledged Metabook's effectiveness in facilitating reading communication and enhancing reading enthusiasm by connecting verbal and visual thinking, and expressed high expectations for its future potential in education.