LLM Applications in HCI
- LLM applications in HCI are innovative methods integrating generative AI into interfaces, research tools, and personalized user experiences.
- They utilize diverse architectures like LAUI and system-wide shortcut layers to offer context-rich, real-time interaction across multiple domains.
- Methodological advances include automated qualitative analysis and standardized evaluation metrics, ensuring rigorous, scalable, and ethical HCI research.
LLMs have become pivotal in advancing human–computer interaction (HCI) by augmenting interface capabilities, automating research methodologies, supporting personalized and accessible user experiences, and introducing sophisticated challenges at the intersection of AI, usability, culture, and ethics. This article provides an authoritative overview of leading research efforts and state-of-the-art applications of LLMs within HCI, drawing on recent empirical studies, systematic reviews, and technical frameworks.
1. Domains and Roles of LLMs in HCI
A systematic review of CHI literature indicates that LLMs are integrated across ten distinct HCI application domains, including communication and writing, education, accessibility, responsible computing, programming, reliability/validity assessment, well-being/health, design, creativity, and augmenting human capabilities (Pang et al., 22 Jan 2025). The roles of LLMs in HCI can be categorized as:
- System engines: Providing generative or computational capabilities within user-facing artifacts (e.g., automatic text generation, multimodal input processing).
- Research tools: Automating tasks such as literature review, qualitative coding, or synthetic data generation.
- Simulated participants/users: Serving as stand-ins for users in research scenarios or persona-driven prototyping.
- Objects of study: LLMs themselves being audited for bias, non-determinism, and error propagation.
- User perception capture: Measuring users’ trust, satisfaction, and interactions with LLM-driven systems.
This taxonomy facilitates a nuanced understanding of LLM adoption and impact throughout the HCI research pipeline, exposing both the diversity of use cases and methodological implications.
2. Architectural Integration and Application Patterns
LLMs are embedded in diverse architectural patterns supporting both interface and research applications:
- Human-Centered LLM-Agent User Interface (LAUI): Systems like Flute X GPT integrate an LLM agent, a prompt manager, and multimodal sensors (audio, haptic, visual) to proactively facilitate user learning and dynamic workflow adaptation. The LLM agent operates within a state machine and controls hardware/software subsystems, leveraging function-calling APIs and session management, enabling the interface to improvise real-time interaction schemes tailored to each user (Chin et al., 19 May 2024).
- Common sense knowledge architectures: Project OMCS-Br collects structured cultural knowledge via web templates, normalizes and infers semantic relationships, and exposes APIs for HCI applications (e.g., for culturally aware feedback or analogy generation). While the system does not use LLMs directly, LLMs can augment both extraction and NLG modules by enhancing fluency and personalization in context-aware responses (Anacleto et al., 2010).
- System-wide shortcut layers: LLM-for-X attaches a lightweight popup dialog to any application (Office, VSCode, Acrobat, Overleaf), leveraging OS-level hooks and browser extensions for seamless text selection, prompt augmentation, and in-place response insertion. The system uses structured templates and UI Automation to maintain native input stacks and undo/redo functionality (Teufelberger et al., 31 Jul 2024).
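The function-calling pattern behind LAUI-style systems can be sketched as a small state-holding dispatcher. This is a minimal illustration under assumptions, not code from the cited papers; the class and tool names (`AgentSession`, `set_haptics`) are hypothetical, and a real system would route the call dictionary from an LLM's function-calling API rather than construct it by hand.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class AgentSession:
    """Tracks session state and the subsystem functions the agent may call."""
    state: str = "idle"
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)
    log: List[Tuple[str, str]] = field(default_factory=list)

    def register(self, name: str, fn: Callable[..., str]) -> None:
        """Expose a hardware/software subsystem to the agent by name."""
        self.tools[name] = fn

    def dispatch(self, call: dict) -> str:
        # In a real LAUI the LLM emits `call` via a function-calling API;
        # here we route it directly to the registered subsystem.
        result = self.tools[call["name"]](**call.get("args", {}))
        self.log.append((call["name"], result))
        return result

session = AgentSession()
session.register("set_haptics", lambda level: f"haptics={level}")
out = session.dispatch({"name": "set_haptics", "args": {"level": "high"}})
```

The session log gives the prompt manager a record of past actions to include in subsequent prompts, which is what lets the agent adapt its workflow mid-session.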
Such integration strategies prioritize minimal disruption to existing workflows, context-rich prompting, and support both domain-specific and agnostic LLM interaction.
3. Methodological Advances: Automated Qualitative and Quantitative Evaluation
LLMs have redefined qualitative research workflows in HCI:
- LLM-driven qualitative analysis: By constructing a unified prompt incorporating the paper summary, raw data, and context, LLMs perform fully automated qualitative analysis (e.g., open coding), preserving reproducibility and scalability. The SBART cosine similarity metric is used to evaluate semantic correspondence between LLM-generated outputs and traditional human-expert excerpts:

$$\cos(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert},$$

where $u$ and $v$ are the embedding vectors of the LLM-generated and human-coded texts. Empirical results show that GPT-4 produces outputs with small standard deviations (SD < 0.02) and matches or exceeds traditional analysis in coherence and integration of multiple themes, though it underperforms on direct quotes and numerical coding (Torii et al., 7 Jan 2024).
- Automatic evaluation of conversational assistants: Bridging HCI and AI, frameworks combine simulated LLM-based user personas with LLM-as-a-judge for large-scale, rapid automatic evaluation of developer-oriented conversational SE agents. The setup captures both qualitative usability feedback and quantitative scoring in multi-turn, user-context-sensitive scenarios, promoting inclusivity and iterative design (Richards et al., 11 Feb 2025).
- Information retrieval in research: Systems leveraging LLMs and structured text analysis (NER, keyword extraction) can extract experimental parameters (participant counts, recruitment methods, etc.) from CHI papers with up to 58% accuracy and a mean absolute error of 7.0–7.63, supporting streamlined question answering and literature review automation. The formal evaluation uses mean absolute error and accuracy:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \lvert \hat{y}_i - y_i \rvert, \qquad \mathrm{Acc} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}[\hat{y}_i = y_i],$$

with $\mathbb{1}[\cdot]$ as an indicator function, $y_i$ the ground-truth parameter value, and $\hat{y}_i$ the extracted value (Serajeh et al., 27 Mar 2024).
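The evaluation metrics used in these pipelines reduce to a few lines of arithmetic. A self-contained sketch (function names are ours, not from the cited papers):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_absolute_error(pred, true):
    """MAE over extracted vs. ground-truth parameter values."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)

def accuracy(pred, true):
    """Fraction of exact matches (the indicator-function sum)."""
    return sum(1 for p, t in zip(pred, true) if p == t) / len(pred)
```

For example, predictions `[2, 4]` against ground truth `[1, 6]` give an MAE of 1.5, and identical vectors have cosine similarity 1.0.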
This methodological innovation enables both scale and rigor in HCI research pipelines, though challenges persist in semantic nuance capture, context dependence, and error propagation.
4. Personalization, Accessibility, and Cultural Sensitivity
LLMs advance personalized, culturally sensitive, and accessible HCI through several mechanisms:
- Cultural customization: Semantic networks constructed from common sense knowledge enable HCI systems to filter responses based on region, age, education, and other contributor profiles. LLMs supplement these networks by generating alternative phrasings and explanations tailored to individual cultural backgrounds, enriching feedback in systems (e.g., email clients warning of cross-cultural misunderstanding, educational authoring tools recommending culturally relevant narratives) (Anacleto et al., 2010).
- Supporting older adults and accessibility: LLMs enable conversational interfaces that simplify access for users with cognitive and sensory limitations. Applications include natural language tutoring, health monitoring (medication reminders, preliminary guidance), and digital security education. The softmax formulation

$$P(w \mid w_{<t}) = \frac{e^{z_w}}{\sum_{j} e^{z_j}}$$

depicts the foundational mechanism of output generation, where $z_j$ are the model's logits over the vocabulary. Barriers such as hallucinations, privacy, and prompt literacy are mitigated with user-centered design, transparency, and continuous domain tuning (Kaliappan et al., 12 Nov 2024).
- Adaptive interfaces and emergent workflows: In LAUI models, LLMs continuously study user performance and preferences, improvising configuration changes (e.g., for haptic feedback, audio overlays) and guiding the user through personalized workflows not pre-defined at design time. The combinatoric configuration space is formalized as

$$N = \prod_{i=1}^{k} s_i,$$

where $s_i$ denotes the number of settings per parameter (Chin et al., 19 May 2024).
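Both formulas above are straightforward to compute. A short sketch, using a numerically stable softmax (the max-subtraction trick) and a product over per-parameter setting counts; the example values are illustrative, not from the cited work:

```python
import math
from functools import reduce

def softmax(logits):
    """Convert logits to a probability distribution over tokens."""
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def configuration_count(settings_per_param):
    """Size of the combinatoric configuration space: N = product of s_i."""
    return reduce(lambda a, b: a * b, settings_per_param, 1)

probs = softmax([1.0, 2.0, 3.0])             # sums to 1; favors the largest logit
n_configs = configuration_count([3, 4, 2])   # e.g., 3 haptic x 4 audio x 2 visual
```

Even a modest interface with a handful of parameters yields dozens of configurations, which is why LAUI designs delegate the search over this space to the agent rather than enumerating workflows at design time.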
These personalization mechanisms expand HCI’s reach to diverse populations while demanding robust privacy, reliability, and transparency frameworks.
5. Usability, Security, and Responsible Computing
Ensuring secure, transparent, and ethical operation is central to the integration of LLMs into HCI:
- Information security: Mixed encryption systems combine symmetric cipher speed with public-key management and digital signatures. Applied to HCI, only controlled processes (fingerprinted and verified) can decrypt sensitive documents, ensuring confidentiality and non-repudiation. The encryption process is illustrated via matrix multiplication:

$$c = K m,$$

where $m$ is the message vector, $K$ the key matrix, and $c$ the ciphertext. Enhanced methods use tensor products to improve matrix security. Transparent encryption, format protection, and robust key management are essential when LLMs process sensitive interaction data (Yang et al., 2016).
- Responsibility and ethics: Researchers recognize risks of harmful outputs, privacy violations, intellectual integrity concerns, and environmental impact. Strategies include limited disclosure, restricting LLM usage, iterative human validation, and proactive engagement with IRBs and regulatory bodies. There is a plea for systematic ethical frameworks, transparent documentation, and community-driven guidelines (Kapania et al., 28 Mar 2024, Pang et al., 22 Jan 2025).
- Validity and reproducibility: LLM-induced non-determinism, black-box dependency, and insufficient prompt or model version disclosure undermine internal and external validity. Standardization of evaluation metrics and careful documentation are repeatedly called for to support scientific rigor.
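The matrix-multiplication encryption described in the security bullet can be sketched as a Hill-cipher-style scheme over the integers mod 26. This is an illustrative toy under our own assumptions (the 2x2 key and message values are invented, not from the cited paper), meant only to show the $c = Km$ encryption and its modular-inverse decryption:

```python
MOD = 26  # work over Z_26, as in a classical Hill cipher

def mat_vec(M, v):
    """Matrix-vector product reduced mod 26."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) % MOD
            for i in range(len(M))]

def encrypt(key, message):
    """c = K m (mod 26)."""
    return mat_vec(key, message)

def inverse_2x2(M):
    """Modular inverse of a 2x2 key matrix; pow raises if K is not invertible."""
    det = (M[0][0] * M[1][1] - M[0][1] * M[1][0]) % MOD
    det_inv = pow(det, -1, MOD)  # modular multiplicative inverse (Python 3.8+)
    return [[( M[1][1] * det_inv) % MOD, (-M[0][1] * det_inv) % MOD],
            [(-M[1][0] * det_inv) % MOD, ( M[0][0] * det_inv) % MOD]]

def decrypt(key, ciphertext):
    """m = K^{-1} c (mod 26)."""
    return mat_vec(inverse_2x2(key), ciphertext)

key = [[3, 3], [2, 5]]          # invertible mod 26 (det = 9, gcd(9, 26) = 1)
cipher = encrypt(key, [7, 4])   # message "HE" as letter indices
plain = decrypt(key, cipher)
```

In the mixed schemes the paper describes, a construction like this plays the fast symmetric role, while public-key machinery handles key distribution and digital signatures.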
A plausible implication is that responsible HCI-LLM research will require a combination of technical measures, governance, and ongoing community reflection to address emergent challenges from widespread LLM deployment.
6. Future Directions and Open Challenges
Research recognizes several promising trajectories and unsolved problems:
- Critical and reflective use of LLMs: Systems like EvAlignUX support scholars in developing UX evaluation plans by prompting self-critique and fostering clarity, specificity, and completeness through LLM-driven knowledge graphs, cosine similarity-based metric recommendations, and outcome/risk synthesis. User studies show statistically significant improvements in evaluation concreteness and clarity (Zheng et al., 23 Sep 2024).
- Methodological refinement: Extending evaluation frameworks to combine reference-free, human-centered automated methods with traditional user studies (especially in conversational systems) is seen as key to overcoming bias, coverage, and scalability bottlenecks (Richards et al., 11 Feb 2025).
- Interdisciplinary synthesis: Integrating cognitive and biomechanical models enables embodied user simulations capable of predicting intent, strategies, and physical movement, informing personalized UI/UX adaptation, automated system testing, and ergonomic benchmarking. The minimization of variational free energy

$$F = \mathbb{E}_{q(s)}\!\left[\ln q(s) - \ln p(o, s)\right]$$

and muscle activation dynamics such as

$$\dot{a}(t) = \frac{u(t) - a(t)}{\tau}$$

exemplify connections between cognitive and physical user modeling, where $q(s)$ approximates the posterior over hidden states $s$ given observations $o$, and $a$, $u$, and $\tau$ denote muscle activation, neural excitation, and an activation time constant (Fleig et al., 19 Aug 2025).
- Benchmarking and transparency: The need for standardized benchmarks, expanded interdisciplinary knowledge repositories, and exhaustive prompt/model disclosure is emphasized in systematic reviews (Pang et al., 22 Jan 2025, Zheng et al., 23 Sep 2024).
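The biomechanical half of such embodied simulations can be illustrated by integrating first-order muscle activation dynamics, $\dot{a} = (u - a)/\tau$, with forward Euler. The time constant, time step, and step-input excitation below are our own illustrative choices, not values from the cited work:

```python
def simulate_activation(excitation, a0=0.0, tau=0.05, dt=0.001):
    """Integrate muscle activation a toward the excitation signal u(t).

    excitation: sequence of neural excitation samples u_t in [0, 1]
    tau: activation time constant in seconds (assumed value)
    dt: integration step in seconds
    Returns the activation trace, including the initial value.
    """
    a, trace = a0, [a0]
    for u_t in excitation:
        a += dt * (u_t - a) / tau  # forward Euler step of da/dt = (u - a)/tau
        trace.append(a)
    return trace

# Step excitation held at 1.0 for 0.5 s: activation rises toward 1 with
# time constant tau, the classic first-order lag between neural command
# and muscle force used in biomechanical user models.
trace = simulate_activation([1.0] * 500)
```

Coupling such a plant model to a cognitive controller (e.g., one minimizing free energy over predicted observations) is what lets a simulated user produce plausible pointing or playing movements for automated interface testing.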
Continued development in these directions is expected to produce more robust, scalable, ethically sound, and critically reflective LLM-powered HCI systems and methodologies.
7. Summary Table: Representative LLM Applications in HCI
| Domain | Application Pattern | Key Paper |
|---|---|---|
| Education | LLM reading companions, adaptive feedback | (Chen et al., 28 Mar 2024) |
| Accessibility | Conversational support for older adults | (Kaliappan et al., 12 Nov 2024) |
| Meta-Research | Automated qualitative analysis, data retrieval | (Torii et al., 7 Jan 2024; Serajeh et al., 27 Mar 2024) |
| UX Evaluation | LLM-driven metric suggestion, critical self-review | (Zheng et al., 23 Sep 2024) |
| Productivity | Application-agnostic shortcut integration | (Teufelberger et al., 31 Jul 2024) |
| Security | Mixed encryption for LLM-driven data workflows | (Yang et al., 2016) |
| Cultural | Common sense-based, profile-sensitive feedback | (Anacleto et al., 2010) |
| Conversational UI | Human-centered agent interfaces, LAUI | (Chin et al., 19 May 2024) |
Each application leverages LLMs for domain-specific objectives, demonstrating their operational versatility while also introducing unique challenges for validity, security, personalization, and inclusivity.
In sum, LLMs are reshaping HCI by enabling automation, personalization, and adaptive interfaces, coupled with rigorous methodological and ethical scrutiny. The evolution of best practices, evaluation frameworks, and cross-disciplinary approaches will likely define the next decade of LLM-driven human–computer interaction research.