Expanding the Generative AI Design Space through Structured Prompting and Multimodal Interfaces

Published 19 Apr 2025 in cs.HC and cs.AI | (2504.14320v2)

Abstract: Text-based prompting remains the predominant interaction paradigm in generative AI, yet it often introduces friction for novice users such as small business owners (SBOs), who struggle to articulate creative goals in domain-specific contexts like advertising. Through a formative study with six SBOs in the United Kingdom, we identify three key challenges: difficulties in expressing brand intuition through prompts, limited opportunities for fine-grained adjustment and refinement during and after content generation, and the frequent production of generic content that lacks brand specificity. In response, we present ACAI (AI Co-Creation for Advertising and Inspiration), a multimodal generative AI tool designed to support novice designers by moving beyond traditional prompt interfaces. ACAI features a structured input system composed of three panels: Branding, Audience and Goals, and the Inspiration Board. These inputs allow users to convey brand-relevant context and visual preferences. This work contributes to HCI research on generative systems by showing how structured interfaces can foreground user-defined context, improve alignment, and enhance co-creative control in novice creative workflows.

Abstract PDF Upgrade to Chat

Authors (6)

Summary

The paper introduces ACAI, a multimodal generative AI tool that uses structured prompting to reduce cognitive load and align outputs with user goals.
It employs a three-panel interface that merges textual and visual inputs to accurately capture creative intent in advertising and branding.
By leveraging interactive style selection and iterative processing, ACAI offers practical solutions for co-creative design in novice user environments.

Expanding the Generative AI Design Space through Structured Prompting and Multimodal Interfaces

Introduction

The paper "Expanding the Generative AI Design Space through Structured Prompting and Multimodal Interfaces" explores the limitations of current generative AI systems, particularly for novice users such as small business owners (SBOs) attempting to leverage these technologies for advertising and branding. The authors identify persistent issues with text-based prompting that frequently lead to cognitive burdens and the generation of generic outputs misaligned with user expectations. To address these, the paper introduces ACAI (AI Co-Creation for Advertising and Inspiration), a multimodal generative AI tool designed to enhance user experience and output alignment.

Figure 1: Formative Study: 6 UK SBOs identify visual elements they would like to incorporate into their branding.\vspace{-10pt

Formative Study

The authors conducted a formative study involving semi-structured interviews with six SBOs in Manchester, UK. This study highlighted the cognitive effort required by SBOs to formulate effective prompts and the dissatisfaction with AI-generated outputs that were generic and not grounded in the brand tone or audience needs. SBOs also expressed concern about losing creative control, signaling a need for AI systems that enable co-creation and preserve user agency.

Design Requirements and ACAI Architecture

From this study, design requirements were derived, emphasizing the importance of multimodal inputs and interactive style selection to effectively express creative intent. ACAI was thus developed featuring structured interfaces composed of three main panels:

Branding Asset Panel
Audience Goals Panel
Inspiration Board Panel

These panels allow users to specify color palettes, typography preferences, visual elements, and campaign objectives, which are then processed into a "super prompt" using a Multimodal LLM (MLLM). This structured prompt ensures alignment with user-defined goals.

Figure 2: ACAI Architecture \vspace{-15pt

Implementation Details

ACAI's architecture supports co-creation by synthesizing user inputs into multimodal prompts. The system processes these inputs through three layers: User Input Layer, Prompt Processing Layer, and Output Generation Layer. The system's core innovation lies in its interface, which combines textual and visual inputs, supporting novice users in effectively capturing and expressing their branding vision through structured steps.

By leveraging Gemini 1.5 Pro, ACAI offers interactive style selection, supports user agency, and facilitates iterative design processes, significantly reducing the friction associated with conventional text-based prompting.

Discussion

The ACAI tool demonstrates a shift from traditional text-based interactions to multimodal interfaces that cater specifically to novice users like SBOs. By structuring user inputs, ACAI enhances semantic alignment and engagement, thus addressing cognitive and usability barriers identified during interviews.

The system is poised to inform future developments in AI interface design by emphasizing inclusivity and task alignment, addressing common promptability challenges encountered by non-expert users.

Conclusion and Future Work

By foregrounding user-defined context, ACAI enhances alignment and promptability in creative workflows for SBOs. The study and tool contribute to co-creative tooling within HCI, presenting an inclusive approach that may be adapted to other domains requiring contextual grounding. There remains potential for expanding ACAI's functionalities, exploring further adaptations, and ensuring broader applicability across diverse user backgrounds and challenges in brand-aligned creative tasks.

Overall, the paper delivers a structured approach to generative AI interactions, offering a strong foundation for enhancing novice user interfaces in generative design domains. Future research could focus on usability studies and interface optimizations for greater adaptability and efficiency.

Markdown Report Issue