Memory equalization for CUA grounding
Establish whether memory of prior user-interface interactions (e.g., UI element locations, navigation paths, successful and failed actions) equalizes the grounding performance of small and large vision–language models (VLMs) in Computer Use Agent (CUA) tasks. Specifically, determine whether a small (approximately 7B-parameter) VLM augmented with UI-layout memory attains grounding accuracy comparable to that of a larger VLM without memory, within a small tolerance as formalized by the Memory Equalization definition.
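To make the comparison concrete, the following is a minimal sketch of how such a check might be scored. It does not reproduce the Memory Equalization definition, which is not stated here. It assumes that grounding accuracy is the fraction of predicted click points that land inside the target element's bounding box, and that "comparable within a small tolerance" means the large model's accuracy exceeds the small model's by at most epsilon. The function names, the box format, and the default epsilon of 0.02 are all hypothetical choices made for illustration.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]              # (x, y) predicted click location
Box = Tuple[float, float, float, float]  # (left, top, right, bottom) of the target


def grounding_accuracy(clicks: Sequence[Point], targets: Sequence[Box]) -> float:
    """Fraction of clicks that fall inside their target box (assumed metric)."""
    if len(clicks) != len(targets):
        raise ValueError("clicks and targets must have the same length")
    if not clicks:
        raise ValueError("at least one example is required")
    hits = sum(
        1
        for (x, y), (left, top, right, bottom) in zip(clicks, targets)
        if left <= x <= right and top <= y <= bottom
    )
    return hits / len(clicks)


def is_memory_equalized(
    acc_small_with_memory: float,
    acc_large_without_memory: float,
    epsilon: float = 0.02,  # hypothetical tolerance, not taken from the source
) -> bool:
    """Assumed criterion: the small model with memory trails the large
    model without memory by no more than epsilon."""
    return acc_large_without_memory - acc_small_with_memory <= epsilon


if __name__ == "__main__":
    targets = [(0, 0, 10, 10), (20, 20, 30, 30)]
    small_clicks = [(5, 5), (25, 25)]   # small VLM with UI-layout memory
    large_clicks = [(5, 5), (50, 50)]   # large VLM without memory
    acc_small = grounding_accuracy(small_clicks, targets)
    acc_large = grounding_accuracy(large_clicks, targets)
    print(acc_small, acc_large, is_memory_equalized(acc_small, acc_large))
```

The criterion here is one-sided: a small model that outperforms the large one also counts as equalized. A symmetric version would compare the absolute gap instead, and which reading matches the formal definition is left open.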
References
For CUA grounding, we conjecture that memory of UI layouts partially equalizes the models—a warm 7B VLM that “remembers” where the Save button is in Photoshop does not need to re-ground it from scratch. This conjecture remains untested.