AutoDroid: LLM-powered Task Automation in Android
The paper "AutoDroid: LLM-powered Task Automation in Android" introduces a system that automates tasks in Android applications using large language models (LLMs). The system addresses the scalability challenge inherent in existing mobile task automation approaches by leveraging the language comprehension and reasoning abilities of LLMs.
Overview
AutoDroid shifts the perspective from developer-centered task preparation to a model-centric approach: rather than requiring developers to script each task, a unified LLM handles both task comprehension and execution. The system combines the general knowledge of LLMs with application-specific information obtained through automated dynamic analysis, enabling the automation of arbitrary tasks in any Android app without manual effort.
AutoDroid has three primary components: a functionality-aware UI representation method that connects the UI to the LLM, exploration-based memory injection that augments the LLM with app-specific domain knowledge, and a multi-granularity query optimization module that reduces inference cost.
Numerical Results
The empirical evaluation of AutoDroid was conducted on a new benchmark for memory-augmented Android task automation comprising 158 common tasks across various applications. AutoDroid achieved an action generation precision of 90.9% and a task completion success rate of 71.3%, outperforming GPT-4-powered baselines by 36.4% and 39.7% in task completion rate while reducing the cost of querying LLMs by 51.7%.
Methodological Insights
- GUI Representation: AutoDroid employs a simplified HTML-style representation of the UI, enabling the LLM to process UI states and actions as text. Structuring graphical information as HTML-like markup plays to the LLM's strengths, since such markup is abundant in the text corpora LLMs are trained on.
- Exploration-Based Memory Injection: AutoDroid dynamically explores app UIs offline to construct UI Transition Graphs (UTGs), then uses LLMs to analyze the explored transitions and simulated tasks over them, distilling domain-specific knowledge about UI functionalities. This app memory is injected into prompts at run time, aiding decision-making during task automation.
- Cost Optimization: To address the computational intensity of querying LLMs, AutoDroid strategically simplifies and organizes prompts, thereby minimizing token length and reducing query frequency. This not only enhances efficiency but also makes the approach economically feasible for both on-device and cloud-based LLMs.
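To make the first point concrete, the sketch below renders a UI state as simplified HTML text an LLM can reason over. The element schema, tag mapping, and attribute names here are illustrative assumptions, not the paper's exact format.

```python
# Sketch of an HTML-style GUI representation (illustrative, not AutoDroid's
# exact scheme). Each interactive element becomes one simplified HTML tag.

def element_to_html(elem, idx):
    """Render one UI element as a simplified HTML tag; idx lets the LLM
    refer to the element when choosing an action."""
    # Hypothetical mapping from Android widget classes to HTML tags.
    tag = {"Button": "button", "EditText": "input", "TextView": "p"}.get(
        elem["class"], "div")
    attrs = f"id={idx}"
    if elem.get("content_desc"):
        attrs += f' alt="{elem["content_desc"]}"'
    text = elem.get("text", "")
    return f"<{tag} {attrs}>{text}</{tag}>"

def ui_state_to_html(elements):
    """Convert a flat list of interactive elements into an HTML skeleton."""
    body = "\n".join(element_to_html(e, i) for i, e in enumerate(elements))
    return f"<html>\n{body}\n</html>"

# Toy UI state for an email app's inbox screen.
elements = [
    {"class": "Button", "text": "Compose", "content_desc": "new email"},
    {"class": "EditText", "text": "", "content_desc": "search"},
]
print(ui_state_to_html(elements))
```

The numeric `id` attributes let the model answer with a compact action reference (e.g. "tap element 0") instead of describing the widget in free text.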
Implications and Future Developments
The integration of LLMs within mobile task automation systems like AutoDroid represents a significant advancement in intelligent system capabilities, opening avenues for enhanced personal assistant functionalities on mobile devices. It highlights a promising trajectory toward leveraging LLMs in diverse domains, potentially broadening their application beyond language-based tasks to more complex interaction-driven roles.
In terms of future developments, researchers could further investigate the memory injection mechanisms to enrich LLM knowledge bases, explore improvements in LLMs' ability to interpret complex GUIs, and optimize the balance between local processing and cloud-based LLM invocation to enhance service speed and reliability.
Overall, AutoDroid showcases the evolving potential of LLMs in bridging the gap between application-specific demands and generalized machine intelligence, paving the way for future innovations in AI-driven task automation systems.