Android in the Wild (AitW) Dataset
- Android in the Wild (AitW) is the name shared by two distinct benchmark datasets: one for natural language device control and one for mobile steganalysis, both collected under realistic conditions.
- The device-control dataset contains 715,142 multi-modal Android interaction episodes and supports the development of agents that execute natural language instructions by operating GUIs.
- The mobile steganalysis dataset provides images produced by Android steganography (stego) apps, enabling the training of classifiers that detect hidden messages under non-stationary conditions.
Android in the Wild (AitW) is a designation applied to two large-scale datasets for Android systems research in distinct domains: (1) device control via natural language and GUI manipulation, introduced by Rawles et al. (2023), and (2) mobile steganalysis using image data generated by Android stego apps (Chen et al., 2018). Each dataset is a benchmark-scale resource for training and evaluating agents or classifiers that must remain robust and generalize under realistic, non-stationary, “in-the-wild” conditions. The two differ fundamentally in construction methodology, data modality, and use case, but each has shaped the research directions and evaluation protocols of its field.
1. Device-Control AitW: Dataset Overview and Structure
The device-control AitW dataset (Rawles et al., 2023) is a multi-modal corpus of Android device interactions collected to advance the development of agents capable of executing natural language instructions by manipulating graphical user interfaces (GUIs). It comprises 715,142 episodes, each documenting the execution