Sycamore: Characterizing Synthetic Personas for Evaluating Genomics Visualization Retrieval

Published 9 May 2026 in cs.HC | (2605.08630v1)

Abstract: Evaluating visualization systems in niche domains such as genomics is challenging due to scarcity of domain experts and difficulty recruiting a representative user base. While LLM-based synthetic personas are increasingly used to ease evaluation bottlenecks, they face well-founded skepticism. Rather than weighing synthetic personas as substitutes for real users, we ask a fundamental open question: when synthetic personas evaluate a real visualization system, what do they actually produce, and how does that output change when grounded in documented human contexts? We present Sycamore, an exploratory three-condition probe design using Geranium, a search engine for multimodal genomics visualization, as a case study. Sycamore evaluates Geranium using: (1) ungrounded synthetic personas from generic LLM priors; (2) grounded synthetic personas constrained by voice-of-customer artifacts from a prior interview study; and (3) a published baseline study of real domain experts. We observe that grounding shifts synthetic feedback toward the language and concerns of documented users, while ungrounded evaluators drift toward operational specifics that real participants did not raise; both synthetic conditions, however, converge on a find-and-adapt frame and miss the image-modality preference observed in the expert study. We discuss what these observations imply for where synthetic personas might fit alongside expert studies in domain-specific visualization evaluation. All supplemental materials are available at https://osf.io/kdfr3/.

Authors (3)

Summary

The paper introduces Sycamore, a systematic probe using a three-condition evaluation protocol comparing ungrounded and grounded synthetic personas with expert feedback.
The paper's methodology uses 3,270 interview-coded excerpts to ground synthetic evaluators, and it uncovers a clear divergence in modality preference between synthetic personas and real experts.
The paper finds that while grounded synthetic evaluators enhance exploratory evaluations and interface debugging, they cannot replace the nuanced feedback of domain experts.

Sycamore: Synthetic Persona Characterization for Evaluating Genomics Visualization Retrieval

Motivation and Context

Evaluating visualization systems in specialized domains like genomics runs into the practical limits of user studies: domain experts are scarce and hard to recruit, and available participant pools rarely represent the full diversity of users. LLM-based synthetic personas offer a potential path to scaling evaluation, but their substitutability for real user feedback is met with skepticism, particularly in HCI contexts. The work introduces Sycamore, an exploratory probe that asks not whether synthetic personas should replace user studies, but what these LLM-based evaluators actually produce when they engage with a real system, here Geranium, a multimodal genomics visualization retrieval engine.

Methodological Framework

Sycamore employs a three-condition evaluation protocol:

Ungrounded Synthetic Personas: Instantiated directly from LLM priors, without any anchoring in empirical user data.
Grounded Synthetic Personas: Instantiated using the PersonaCite approach, with each synthetic persona constrained by voice-of-customer artifacts extracted from a comprehensive interview-based characterization of genomics visualization users [23]. Four archetypes (Biologists, Computational Biologists, Bioinformaticians, Software Engineers) are represented.
Domain Expert Reference: The published Geranium user study (seven domain experts, with themes and modality preferences extracted) serves as the human benchmark [16].

All three conditions use a consistent protocol derived from the Geranium user study: initial workflow description, tool demonstration, hands-on exploration with feedback, and a final modality preference ranking.

The pipeline for grounded personas involves (1) detailed persona profiling, (2) embedding of 3,270 interview-coded excerpts, (3) top-k retrieval for persona-relevant evidence, and (4) agentic response generation with source citation and abstention in the absence of evidence.
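The retrieval and abstention steps of this pipeline can be illustrated with a minimal sketch. It is not the PersonaCite implementation: the bag-of-words `embed` function stands in for a real embedding model, the `min_score` threshold and the function names are assumptions, and the three excerpts are invented placeholders rather than items from the 3,270-excerpt corpus. The LLM generation step is omitted; the sketch only shows how a response could either carry citations or abstain.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(question, excerpts, k=2, min_score=0.2):
    """Return up to k (score, id, text) tuples scoring at least min_score."""
    q = embed(question)
    scored = [(cosine(q, embed(text)), ex_id, text) for ex_id, text in excerpts.items()]
    scored = [s for s in scored if s[0] >= min_score]
    return sorted(scored, reverse=True)[:k]

def respond(question, excerpts, k=2, min_score=0.2):
    """Cite retrieved evidence, or abstain when nothing clears the threshold."""
    evidence = retrieve(question, excerpts, k, min_score)
    if not evidence:
        return {"abstained": True, "citations": [], "evidence": []}
    return {
        "abstained": False,
        "citations": [ex_id for _, ex_id, _ in evidence],
        "evidence": [text for _, _, text in evidence],
    }

# Hypothetical excerpts, invented for illustration only.
EXCERPTS = {
    "E1": "I load BAM files into a genome browser to check coverage",
    "E2": "I adapt an existing visualization spec rather than starting from scratch",
    "E3": "Sharing figures with collaborators is mostly done through screenshots",
}

if __name__ == "__main__":
    print(respond("How do you load BAM files?", EXCERPTS))
    print(respond("What is your favorite pizza topping?", EXCERPTS))
```

In a fuller pipeline the cited excerpts would be passed to the persona prompt, and an abstaining result would be reported as "no evidence" instead of being filled in from the model's priors.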

Insights from Cross-Condition Analysis

Thematic Convergence and Divergence

Synthetic persona feedback, especially when grounded, adopts the concerns and language found in the documented user study, which reduces the emphasis on operational technicalities that ungrounded LLMs tend to foreground. Ungrounded evaluators, by contrast, frequently fixate on operational minutiae (e.g., data-binding internals, API integration issues) that the genomics researchers in the expert study did not raise.

A pronounced divergence emerges in modality preference: both synthetic conditions strongly favor the specification modality (Gosling spec), whereas real experts favor image-based queries for their casual, exploratory affordances. Synthetic evaluators converge on a "runnable template" conception of retrieval and miss the value of image browsing that was salient in human feedback.
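One way to compare final modality rankings across conditions is a Borda count, sketched below. This is an illustrative aggregation, not the analysis reported in the paper. The `image` and `spec` labels follow the modalities named above; the `text` modality and every ranking in the example are assumptions made up for the sketch, not study data.

```python
from collections import defaultdict

def borda(rankings):
    """Aggregate ranked lists (best first) into Borda scores per item."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            # First place earns n - 1 points, last place earns 0.
            scores[item] += n - 1 - position
    return dict(scores)

def top_choice(rankings):
    """Item with the highest Borda score; ties break alphabetically."""
    scores = borda(rankings)
    return max(sorted(scores), key=scores.get)

# Placeholder rankings, invented for illustration only.
SYNTHETIC = [["spec", "text", "image"], ["spec", "image", "text"]]
EXPERT = [["image", "text", "spec"], ["image", "spec", "text"]]

if __name__ == "__main__":
    print("synthetic:", borda(SYNTHETIC), "->", top_choice(SYNTHETIC))
    print("expert:", borda(EXPERT), "->", top_choice(EXPERT))
```

With so few participants per condition, such scores are best read as a descriptive summary and not as a statistical test.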

Across all conditions, data binding emerges as a shared friction point: real and synthetic personas alike repeatedly raise concerns about mapping user data into retrieved visualization templates, with specific reference to genomics formats (BAM, VCF, and BED files).

Depth, Onboarding, and Workflow Realism

Both synthetic conditions identify high-level user intent and language mismatch as sources of friction. However, only human evaluators ask for structured onboarding support and guidance, while synthetic evaluators, particularly grounded ones, focus on advanced features and developer-oriented extensions. Notably, the grounded agent protocol can abstain when evidence is absent, which guards against confabulation and also exposes the interpretive bias inherent in how the grounding artifacts were coded.

Novel Synthetic Signals and Latent Hypotheses

Certain design requirements that no single human participant articulated, but that recur in aggregate across synthetic personas, emerge consistently. For example, integration with canonical genome browsers (IGV, UCSC) and plugin-based workflows surface as recurrent needs. These outputs suggest that LLMs, when grounded appropriately, may aggregate latent requirements that can then be prioritized for targeted follow-up with real users.

Practical and Theoretical Implications

Sycamore's probe indicates that LLM-based synthetic evaluators, particularly when grounded in empirical user characterization, can function as valuable complements (not replacements) within the visualization evaluation workflow. Their primary utility lies in:

Protocol and interface debugging before investing limited domain expert time
Expanding the evaluative horizon to underrepresented or unobservable personas
Generating exploratory hypotheses and query patterns at scale for downstream validation

However, synthetic evaluators are fundamentally limited for research questions centered on adoption, trust, or nuanced qualitative experience. Their lack of interactive realism (no actual clicking, hesitation, or interface exploration) creates an authenticity gap.

Moreover, the dependency of grounded evaluators on the specific artifacts used for grounding (and the potential biases in how these artifacts are coded or excerpted) underscores the need for transparent, well-documented workflows in persona instantiation.

Future Directions

Further research is warranted to (a) quantify run-to-run evaluator variance for each grounded persona, (b) extend Sycamore to other genomics or highly specialized data domains, and (c) develop methodologies closer to think-aloud protocols via agentic LLM interaction. Whether inconsistencies between synthetic and expert reference feedback (e.g., modality preference) reflect systematic LLM limitations or gaps in synthetic persona design remains an open question.
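As one possible starting point for item (a), run-to-run variance could be summarized as the mean pairwise Jaccard similarity between the sets of themes a persona raises across repeated runs. The sketch below is an assumption about how such a measure might be set up, not a method from the paper; the function names and the theme labels in the example are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two collections of theme labels."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty runs are treated as identical
    return len(a & b) / len(a | b)

def mean_pairwise_jaccard(runs):
    """Average Jaccard similarity over all pairs of runs (1.0 = fully stable)."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical theme sets from three runs of the same persona.
RUNS = [
    {"data binding", "onboarding"},
    {"data binding", "genome browser integration"},
    {"data binding", "onboarding"},
]

if __name__ == "__main__":
    print(round(mean_pairwise_jaccard(RUNS), 3))
```

A value near 1.0 would indicate that a persona surfaces the same themes on every run, while lower values would flag findings that depend on sampling and deserve replication before being acted on.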

Techniques for dynamic, scenario-based persona simulation and integration with live interface logging could refine the fidelity and external validity of synthetic user evaluation pipelines.

Conclusion

Sycamore examines the content and value of synthetic LLM personas for evaluating genomics visualization retrieval. Grounded synthetic evaluators more closely approximate documented user perspectives than ungrounded ones, though both diverge from the expert reference in important respects such as modality preference. The framework offers a replicable approach for examining both the strengths and the caveats of synthetic persona-based evaluation in domain-specific HCI, adding to the methodological toolkit for visualization research (2605.08630).
