Artistic Chatbots in Creative AI
- Artistic chatbots are AI agents that use conversational, visual, and audio modalities to foster creative expression and art appreciation in museums, galleries, and digital learning environments.
- They integrate advanced LLMs, retrieval-augmented generation, and persona conditioning within modular pipelines to deliver dynamic, contextually rich, and interactive dialogues.
- Deployments in public spaces and digital platforms demonstrate their value in overcoming creative blocks, enhancing cultural engagement, and supporting co-creative artistic processes.
An artistic chatbot is an artificial intelligence–driven conversational agent specifically architected and deployed to facilitate, augment, or generate artistic expression, dialogue, appreciation, or collaboration. These agents exhibit a spectrum of technical and creative capabilities—from emulating human-like artistic personalities and creative suggestion networks to multimodal interactive installations in educational or cultural environments—grounded in natural language, visual, and audio modalities. Artistic chatbots often integrate advanced LLMs, retrieval-augmented generation (RAG), persona and style conditioning, and interactive topic or event frameworks, aiming to foster engagement, co-creation, and learning within domains such as museums, galleries, creative writing, and digital art education.
1. System Architecture and Data Integration
The technical foundation of recent artistic chatbots consists of modular pipelines that orchestrate knowledge acquisition, retrieval, language generation, and user interface mechanisms.
- Artistic Chatbot for museums (Kucia et al., 30 Aug 2025) employs a two-stage pipeline: data preprocessing (cleaning, multilingual translation with GPT-4o, segmentation into overlapping text chunks of 5000 characters with 200-character overlap, and embedding with paraphrase-multilingual-MiniLM-L12-v2 indexed by FAISS) and an inference pipeline that retrieves relevant context using a fast vector search plus CrossEncoder re-ranking, sends the top retrieved passages and query to the LLM (GPT-4o-mini), and synthesizes audible responses through TTS.
- In other artistic agent deployments such as AVIN-Chat (Park et al., 15 Aug 2024), the architecture integrates speech-to-text (Whisper), in-context emotional prompting of ChatGPT, emotion-parameterized TTS (EmotiVoice), and real-time 3D avatar animation (EmoTalk) driven by linear combinations of precomputed blendshapes.
The knowledge base in content-grounded artistic chatbots is curated from heterogeneous sources—faculty information, books, magazines, curatorial essays—preprocessed for language normalization and relevance in domain-specific deployments. Dense retrieval–augmented generation underlies high-quality, contextually grounded response generation.
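The preprocessing and retrieval stages above can be sketched in plain Python. The chunker mirrors the reported fixed windows of 5000 characters with a 200-character overlap; `two_stage_retrieve` stands in for the fast vector search followed by cross-encoder re-scoring, with the embedding vectors and re-ranking scorer supplied as hypothetical stand-ins rather than the deployed FAISS index and CrossEncoder:

```python
import numpy as np

def chunk_text(text: str, size: int = 5000, overlap: int = 200) -> list[str]:
    """Split text into overlapping fixed-size chunks for embedding."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, step = [], size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # last window already covers the tail
            break
    return chunks

def two_stage_retrieve(query_vec, chunk_vecs, rerank_score, k_fast=20, k_final=3):
    """Stage 1: fast inner-product shortlist (stand-in for FAISS search).
    Stage 2: re-order the shortlist with a more expensive scoring function
    (a cross-encoder over query-passage pairs in the museum pipeline)."""
    scores = chunk_vecs @ query_vec                     # one dot product per chunk
    shortlist = np.argsort(scores)[::-1][:k_fast]      # top-k by fast score
    reranked = sorted(shortlist, key=rerank_score, reverse=True)
    return [int(i) for i in reranked[:k_final]]
```

The top `k_final` passages would then be concatenated with the user query into the LLM prompt; only the shortlist ever reaches the expensive scorer, which is the point of the two-stage design.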
2. Dialogue Modeling, Persona, and Artistic Conditioning
Dialogic quality and engagement in artistic chatbots are driven by layered strategies for conditioning and output shaping:
- Persona-rich frameworks like Sketch-Fill-A-R (Shum et al., 2019) explicitly encode dialogue history and persona into dynamic "sketch" responses with open slots to be filled via persona memory (rare words extracted from agent descriptions), with final output selection governed by an LLM-based perplexity reranker.
- Event-driven chatbots such as HonkaiChat (Liu et al., 5 Jan 2025) achieve higher interactivity by incorporating situational “life events” (curated from a domain-specific event database, e.g., 1300 events for Honkai: Star Rail characters) into the conversational prompt, with fine-tuned LLaMA 3.1-8B models preserving character-specific response consistency.
- Multimodal models for education and appreciation (e.g., LLaVA-Docent (Lee et al., 9 Feb 2024)) encode images via CLIP and concatenate the visual and language embeddings into a single input sequence, guiding students through critical stages of art appreciation by dynamically staging reflective prompts and feedback.
These techniques ensure both alignment with a desired stylistic "voice" and adaptability to context, supporting both predetermined personalities and responsive artistry.
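The perplexity reranking step used by Sketch-Fill-A-R can be illustrated with a minimal sketch: given per-token log-probabilities from any scoring language model (supplied directly here as hypothetical inputs), the candidate the LM finds most fluent, i.e. the one with the lowest perplexity, is selected:

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp of the mean negative log-probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def rerank_candidates(candidates: dict[str, list[float]]) -> str:
    """Return the candidate response whose token log-probs yield the
    lowest perplexity under the scoring model."""
    return min(candidates, key=lambda text: perplexity(candidates[text]))
```

In Sketch-Fill-A-R the log-probabilities come from an LM scoring each filled-in sketch; the reranker simply prefers the filling the model judges most natural.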
3. Multimodal and Interactive Capabilities
Recent artistic chatbots are characterized by their multimodal interaction paradigms:
- AVIN-Chat (Park et al., 15 Aug 2024) supports real-time audio-visual face-to-face conversation, enabling emotional nuance via user-controlled emotional state tuning mapped onto both speech prosody and 3D avatar animation.
- PortfolioMentor (Long et al., 2023) integrates text, code, sketch, and audio modalities within the creative coding IDE, generating code snippets, visuals (text-to-image), and music clips (MusicLM, MusicGen) in response to natural language prompts, and providing DOM-aware Q&A and guidance.
- LLaVA-Docent (Lee et al., 9 Feb 2024) processes both text and images to scaffold learners through sequential art critique, relying on vision encoders and instruction-tuned MLLM dialogue to generate questions and feedback specific to the artwork at hand.
These multimodal affordances strongly influence engagement, immersion, and the depth of facilitated appreciation or co-creation.
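At its core, the LLaVA-style fusion described above reduces to projecting vision-encoder features into the language model's embedding space and concatenating them with the text token embeddings. A minimal numpy sketch, with illustrative dimensions and a stand-in projection matrix rather than the published model's weights:

```python
import numpy as np

def fuse_multimodal(image_feats: np.ndarray,
                    text_embeds: np.ndarray,
                    proj: np.ndarray) -> np.ndarray:
    """Project CLIP-style image features (n_img, d_vis) through a learned
    matrix (d_vis, d_lm) and prepend them to text token embeddings
    (n_txt, d_lm), yielding one (n_img + n_txt, d_lm) input sequence
    for the language model."""
    visual_tokens = image_feats @ proj
    return np.concatenate([visual_tokens, text_embeds], axis=0)
```

The language model then attends over visual and textual tokens uniformly, which is what lets a single dialogue turn reference both the artwork and the ongoing critique.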
4. User Interaction Design and Deployment Contexts
Deployment of artistic chatbots is conditioned by physical and digital interaction constraints:
- Museum deployments (Kucia et al., 30 Aug 2025) operate in live public spaces with ceiling-mounted microphones, speech-activated turn-taking, and ambient audio delivery, requiring robust solutions to background noise, turn segmentation, and incomplete queries.
- Browser-deployed dialogue systems (e.g., LSTM seq2seq chatbots (Ilić et al., 2019)) rely on JavaScript-based overlays for real-time feedback, focusing on crafting personality-rich interactions even with compact training corpora.
- Creative writing tools (e.g., CharacterChat (Schmitt et al., 2021); ORIBA (Sun et al., 2023)) use dual-mode interfaces with guided attribute suggestion and open-ended neural interaction, supporting iterative character development or OC (original character) exploration.
These interaction schemas are evaluated through user studies that track relevance, engagement, and satisfaction ratings, using structured scoring instruments (LLM-based judges, Likert scales) together with automated assessment of question completeness and response grounding.
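The speech-activated turn-taking needed in noisy public spaces can be approximated with energy-based endpointing, as in this simplified sketch (frame size, threshold, and silence window are illustrative; production deployments typically use trained voice-activity detectors rather than a raw energy gate):

```python
import numpy as np

def segment_turns(samples, frame=1600, threshold=1e-3, min_silence=5):
    """Split an audio stream into speaker turns: a turn starts at the first
    frame whose mean energy exceeds `threshold` and ends after `min_silence`
    consecutive sub-threshold frames. Returns (start, end) sample offsets."""
    x = np.asarray(samples, dtype=float)
    n_frames = len(x) // frame
    energies = [float(np.mean(x[i * frame:(i + 1) * frame] ** 2))
                for i in range(n_frames)]
    turns, start, silence = [], None, 0
    for i, e in enumerate(energies):
        if e > threshold:
            if start is None:
                start = i          # speech onset
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence:   # sustained silence ends the turn
                turns.append((start * frame, (i - silence + 1) * frame))
                start, silence = None, 0
    if start is not None:                # stream ended mid-turn
        turns.append((start * frame, n_frames * frame))
    return turns
```

Each detected turn would then be passed to speech recognition; the `min_silence` window is what distinguishes a mid-sentence pause from the end of a visitor's query.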
5. Performance, Grounding, and Evaluation
Artistic chatbots are evaluated on response grounding, user engagement, creativity, and domain adherence:
- The museum chatbot achieved a 60.52% grounded response rate on free-form spoken queries, even though only ∼20% of questions were fully on-topic (Kucia et al., 30 Aug 2025).
- Sketch-Fill-A-R (Shum et al., 2019) achieved roughly 10 points lower perplexity (24.99 vs. 34.54) than memory-network baselines, and human raters preferred its responses for consistency and engagement.
- AVIN-Chat (Park et al., 15 Aug 2024) outperformed text- or speech-only baselines on user-rated immersion, empathy, and satisfaction.
- Empathic AI Painter (Yalcin et al., 2020) produced distinct stylized portraits mapped to 17 personality categories after conversationally eliciting Big-5 dimensions, with high speech recognition and categorization accuracy (>82%).
Techniques leveraging cross-encoder re-ranking, instruction tuning, RAG, and event or persona conditioning support domain fidelity and response expressivity, though limitations persist regarding hallucination avoidance, conversation incompleteness, and diversity-consistency trade-offs.
6. Applications, Limitations, and Prospects
Artistic chatbots are deployed or envisaged in a wide array of scenarios:
- Museum guides and educational exhibits (voice-to-voice agents with contextual retrieval) (Kucia et al., 30 Aug 2025).
- Digital portfolio companions for creative coding and art students (Long et al., 2023).
- Character-driven role-play and interactive fiction (event-driven, persona-consistent dialogue) (Liu et al., 5 Jan 2025, Sun et al., 2023).
- Empathic creative systems producing personalized visual or narrative art (Yalcin et al., 2020, Chang et al., 2023).
- Systems that help alleviate creative blocks and augment creative brainstorming (Lewis, 2023, Haase et al., 2023).
Limitations include challenges in robustly handling unpredictable public input, managing hallucinations, ensuring cultural and ethical appropriateness, and balancing expressiveness with factual grounding. Addressing these demands advanced dialog state tracking, adaptive persona and event frameworks, improved retrieval and LLM alignment, and comprehensive user studies.
Emergent directions include: (i) extension to deeper multi-turn and multi-agent dialogue, (ii) generalization across multilingual and multicultural settings via cross-lingual embeddings and retrieval, (iii) dynamic memory and event pipelines for persistent character evolution, and (iv) integration of real-time gestural and affective cues to further enrich artistic and empathic expression.
7. Significance in Computational Creativity and Art
Artistic chatbots represent a convergence of NLP, computer vision, and HCI to augment, democratize, and reconceptualize creative practices:
- They offer mechanisms for personalized art appreciation and accessible informal learning in cultural contexts (Lee et al., 9 Feb 2024).
- Their integration of event-driven or persona-grounded methods facilitates the emergence of co-creative narratives and lifelike digital characters (Liu et al., 5 Jan 2025, Sun et al., 2023, Schmitt et al., 2021).
- They support procedural creativity and role redefinition of digital agents—from utilitarian dialog systems to unwitting actors and creative collaborators (Perrone et al., 2019).
Evidence from statistical evaluation, from human–AI creativity comparisons (e.g., no significant difference in AUT originality ratings between GAIs and human participants (Haase et al., 2023)), and from practical exhibition deployments underscores both the current impact and the ongoing research challenges in realizing conversational agents capable of genuine, context-adaptive, and artistically resonant interaction.