SAGE: Smart home Agent with Grounded Execution (2311.00772v2)
Abstract: The common sense reasoning abilities and vast general knowledge of LLMs make them a natural fit for interpreting user requests in a Smart Home assistant context. However, LLMs' lack of specific knowledge about the user and their home limits their potential impact. SAGE (Smart Home Agent with Grounded Execution) overcomes these and other limitations via a scheme in which a user request triggers an LLM-controlled sequence of discrete actions. These actions can retrieve information, interact with the user, or manipulate device states. SAGE controls this process through a dynamically constructed tree of LLM prompts, which help it decide which action to take next, whether an action was successful, and when to terminate the process. The SAGE action set augments an LLM's capabilities to support some of the most critical requirements for a Smart Home assistant, including: flexible and scalable user preference management ("is my team playing tonight?"), access to any smart device's full functionality without device-specific code via automated API reading ("turn down the screen brightness on my dryer"), persistent device state monitoring ("remind me to throw out the milk when I open the fridge"), natural device references using only a photo of the room ("turn on the light on the dresser"), and more. We introduce a benchmark of 50 new and challenging smart home tasks, on which SAGE achieves a 75% success rate, significantly outperforming existing LLM-enabled baselines (30% success rate).
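The control scheme the abstract describes (an LLM repeatedly choosing one discrete action, observing the result, and deciding whether to stop) can be made concrete with a short sketch. The Python below is a minimal illustration under assumptions of our own, not the paper's implementation: the `call_llm` helper, the action-menu format, and the `FINISH` convention are all hypothetical, and SAGE's tree of prompts is flattened into a single decision loop for brevity.

```python
# Minimal conceptual sketch of an LLM-controlled sequence of discrete actions,
# in the spirit of SAGE's description. NOT the authors' code: action names,
# prompt wording, and the call_llm helper are hypothetical placeholders.

from dataclasses import dataclass
from typing import Callable


def call_llm(prompt: str) -> str:
    """Placeholder for a call to any LLM completion API."""
    raise NotImplementedError


@dataclass
class Action:
    name: str
    description: str
    run: Callable[[str], str]  # takes an argument string, returns an observation


def run_agent(user_request: str, actions: dict[str, Action], max_steps: int = 10) -> str:
    """Repeatedly ask the LLM which discrete action to take next, execute it,
    and feed the observation back until the LLM decides to terminate."""
    transcript = f"User request: {user_request}\n"
    menu = "\n".join(f"- {a.name}: {a.description}" for a in actions.values())
    for _ in range(max_steps):
        decision = call_llm(
            f"{transcript}\nAvailable actions:\n{menu}\n"
            "Reply as '<action> | <argument>' or 'FINISH | <answer>'."
        )
        name, _, arg = decision.partition("|")
        name, arg = name.strip(), arg.strip()
        if name == "FINISH":
            return arg
        observation = actions[name].run(arg)
        transcript += f"Action: {name}({arg})\nObservation: {observation}\n"
    return "Stopped after reaching max_steps."


# Example wiring (hypothetical device lookup function):
# actions = {"get_device_state": Action("get_device_state",
#            "Read a device's current state", my_device_lookup)}
# run_agent("turn on the light on the dresser", actions)
```

Each step appends the action and its observation to the transcript, so the LLM's next decision is grounded in everything that has happened so far; the per-step success checks and termination prompts the abstract mentions would slot in where the observation is recorded.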