
Interactive Viva Voce Simulation

Updated 5 December 2025
  • Interactive viva voce simulation is a computational framework that replicates oral exams using multi-turn, schema-driven dialogue systems.
  • These systems integrate LLM agents, dialogue managers, and evaluation modules to deliver dynamic, phase-based assessments in domains like medicine and academia.
  • Modular architectures and automated feedback loops enhance realism and provide measurable outcomes for reflective learning and exam integrity.

Interactive viva voce simulation refers to computational frameworks that emulate oral examination (viva voce) processes via interactive, dialogic interactions with human users or artificial agents. These systems leverage LLMs or other conversational agents to generate, manage, and evaluate dynamic, multi-turn oral examinations in both educational and professional assessment contexts. Key application domains include medical education, standardized clinical assessment, academic integrity verification in higher education, and reflective interview training.

1. Core System Architectures

Interactive viva voce simulation platforms are typically structured as modular, multi-component systems integrating front-end user interfaces, back-end LLM orchestration, turn management logic, and evaluation modules.

A canonical architecture comprises:

  • Agent (A): LLM or conversational agent (e.g., GPT-3.5, Claude 3.7, Gemini 2.5 Flash).
  • Examiner Module (E): Deterministic environment module serving domain-grounded information (e.g., clinical vignettes).
  • Mapper/Parser (M/P): Converts user/agent queries to schema-relevant fields and renders responses in natural language.
  • Dialogue Manager: Manages multi-turn, phase-specific exchanges, enforces action constraints, and maintains conversational context.
  • Data Storage & Logging: Persistent transcript and metadata capture for post-hoc analysis, auditing, and human examiner review.

These platforms support both speech- and text-based modalities, integrating tools such as speech-to-text (STT), text-to-speech (TTS), and persistent dialogue state buffers for consistency and traceability (Chiu et al., 11 Oct 2025, Botero et al., 1 Nov 2025, Daryanto et al., 8 Oct 2024, Church et al., 29 Oct 2025).
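
A minimal Python sketch of this modular decomposition, using illustrative interfaces rather than any cited system's actual API; the "submit" stop action and the turn cap are assumptions:

from dataclasses import dataclass, field
from typing import Protocol

class Agent(Protocol):
    """LLM or conversational agent (A): produces the next utterance or action."""
    def next_action(self, context: list[str]) -> str: ...

class Examiner(Protocol):
    """Deterministic environment module (E): serves domain-grounded information."""
    def respond(self, query_fields: dict[str, str]) -> dict[str, str]: ...

class Mapper(Protocol):
    """Mapper/parser (M/P): maps free text to schema fields and back."""
    def to_schema(self, utterance: str) -> dict[str, str]: ...
    def to_text(self, fields: dict[str, str]) -> str: ...

@dataclass
class DialogueManager:
    """Runs the multi-turn exchange, enforces the turn cap, and logs the transcript."""
    agent: Agent
    examiner: Examiner
    mapper: Mapper
    turn_limit: int = 20
    transcript: list[tuple[str, str]] = field(default_factory=list)

    def run(self, stem: str) -> list[tuple[str, str]]:
        context = [stem]
        for _ in range(self.turn_limit):
            utterance = self.agent.next_action(context)
            self.transcript.append(("agent", utterance))
            if utterance.strip().lower() == "submit":     # assumed stop action
                break
            fields = self.mapper.to_schema(utterance)      # text -> schema query
            reply = self.mapper.to_text(self.examiner.respond(fields))
            self.transcript.append(("examiner", reply))
            context += [utterance, reply]
        return self.transcript

Keeping the mapper separate from the examiner is what lets the environment stay deterministic while the natural-language surface varies.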

2. Simulation Workflows and Interaction Protocols

The workflows of interactive viva voce simulations are defined by structured, multi-phase session flows designed to replicate human examiner processes:

Phase-Based Turn Management

  1. Initialization: The examiner module presents a scenario stem (e.g., clinical case, essay excerpt, interview question) and establishes the system prompt and persona.
  2. Questioning Loop: The agent or examiner iteratively poses probing questions. User (or agent) responses are evaluated in real time, with follow-up questions dynamically generated based on content and prior turns.
  3. Transitional Phases: In specialized domains (e.g., medicine), deliberate phase transitions enforce task realism (e.g., history-taking, provisional diagnosis, investigations, final diagnosis), each constrained by action-type caps and global turn limits (Chiu et al., 11 Oct 2025).
  4. Dialogue Closure and Assessment: Upon reaching predetermined criteria (turn limit, phase completion), the system shifts to assessment: scalar scores, free-text assessments, or structured feedback are returned.

An excerpted pseudocode workflow for clinical viva simulation:

Algorithm VivaBench_Examination(Case C, Agent A)
    present C.stem to A
    // Review Phase
    repeat
        action ← A.next_action()
        ...
    until turn_limit or provisional_submitted
    // Investigation Phase
    repeat
        action ← A.next_action()
        ...
    until turn_limit or final_submitted
    return (d_P, conf_P, d_D, conf_D, trace)
(Chiu et al., 11 Oct 2025)
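
For concreteness, a runnable Python rendering of the same two-phase loop; the action names, caps, and the `case`/`agent` interfaces are illustrative assumptions, not VivaBench's actual action space:

from enum import Enum, auto

class Phase(Enum):
    REVIEW = auto()         # history-taking through provisional diagnosis
    INVESTIGATION = auto()  # ordering tests through final diagnosis

def run_examination(case, agent, turn_limit: int = 15, action_cap: int = 5):
    """Two-phase viva loop with a global turn limit and per-phase action caps.

    `case.stem` is the vignette; `agent.next_action(phase)` returns an
    (action_type, payload) tuple. Both interfaces are assumptions.
    """
    trace, phase, turns, in_phase = [], Phase.REVIEW, 0, 0
    agent.observe(case.stem)
    while turns < turn_limit:
        action_type, payload = agent.next_action(phase)
        trace.append((phase, action_type, payload))
        turns, in_phase = turns + 1, in_phase + 1
        if action_type == "submit_final":
            break
        if action_type == "submit_provisional" and phase is Phase.REVIEW:
            phase, in_phase = Phase.INVESTIGATION, 0    # explicit phase transition
        elif in_phase >= action_cap:                    # action-type cap reached
            if phase is Phase.REVIEW:
                phase, in_phase = Phase.INVESTIGATION, 0
            else:
                break
    return trace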

3. Data Schemas and Persona Encoding

Interactive simulations rely on domain-aligned, schema-driven representations to ensure realism and evaluation validity.

  • Clinical Settings: Cases are structured with components (H, P, I, L, D), representing history, physical, imaging, labs, and diagnoses; mappings to SNOMED-CT/LOINC/ICD-10 enable semantic interoperability (Chiu et al., 11 Oct 2025); a dataclass sketch of this structure follows the list.
  • Psychiatric Assessments: Persona generation algorithms sample item scores (e.g., MADRS), demographics, and communication styles to fully specify virtual patient behavior, encoded in immutable session prompts (Botero et al., 1 Nov 2025).
  • Higher Education: Student-uploaded essays serve as grounding texts; LLM examiners extract non-trivial claims to scaffold contextually relevant, open-ended questions (Church et al., 29 Oct 2025).
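
A plausible encoding of the clinical (H, P, I, L, D) schema as Python dataclasses; the field layout and the terminology codes shown are illustrative, not the exact VivaBench schema:

from dataclasses import dataclass, field

@dataclass
class Finding:
    text: str                  # natural-language finding, e.g. "pleuritic chest pain"
    code: str | None = None    # optional SNOMED-CT / LOINC / ICD-10 code

@dataclass
class ClinicalCase:
    """(H, P, I, L, D) components: history, physical, imaging, labs, diagnoses."""
    stem: str
    history: list[Finding] = field(default_factory=list)        # H
    physical: list[Finding] = field(default_factory=list)       # P
    imaging: dict[str, Finding] = field(default_factory=dict)   # I, keyed by study
    labs: dict[str, Finding] = field(default_factory=dict)      # L, keyed by test
    diagnoses: list[Finding] = field(default_factory=list)      # D, reference answers

case = ClinicalCase(
    stem="58-year-old with acute dyspnoea and pleuritic chest pain.",
    labs={"d_dimer": Finding("elevated D-dimer", code="48065-7")},   # illustrative codes
    diagnoses=[Finding("pulmonary embolism", code="59282003")],
)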

All frameworks employ context windows and session-level state management, with immutable prompts and rolling history buffers to maintain consistency and prevent semantic drift.
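
A minimal sketch of such session-level state management, assuming a turn-budgeted rolling buffer (real systems may summarize old turns instead of dropping them):

from dataclasses import dataclass, field

@dataclass
class SessionState:
    """Immutable system/persona prompt plus a rolling dialogue history buffer."""
    system_prompt: str              # fixed for the whole session (persona, rules)
    max_turns: int = 30             # history budget; oldest turns dropped first
    history: list[dict[str, str]] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.history.append({"role": role, "content": content})
        if len(self.history) > self.max_turns:
            # trim the oldest exchanges, never the system prompt
            self.history = self.history[-self.max_turns:]

    def as_messages(self) -> list[dict[str, str]]:
        return [{"role": "system", "content": self.system_prompt}, *self.history]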

4. Evaluation Metrics and Analytical Frameworks

Assessment in interactive viva voce simulation encompasses both performance metrics and failure mode analysis, employing domain-specific and domain-agnostic constructs.

Diagnostic and Authenticity Metrics

  • Top-k Diagnostic Accuracy: Fraction of cases where any prediction matches the reference, reported at multiple stages (provisional, final, full-information) and hierarchical proximity (exact, approximate) (Chiu et al., 11 Oct 2025).
  • Confidence Calibration Scores: Quantitative aggregation of agent self-assessed confidence ($c_j \in [0,1]$), with normalization and partition over correct, approximate, and unmatched predictions.
  • Information-Seeking Efficiency: Precision and recall of queries/actions, computed with respect to relevance of extracted schema keys.
  • Authenticity Scoring (Essay Assessment): Final scalar “confidence_score” (0–100), representing the LLM’s judgment of genuine authorship, optionally aggregated as

$$\mathrm{AuthenticityScore} = 100 \times \sum_{i=1}^{N} w_i \, c_i$$

with $w_i$ denoting normalized difficulty weights (Church et al., 29 Oct 2025); a code sketch of this score and top-k accuracy follows the list.

  • Qualitative Realism (Persona Simulations): Likert-scale ratings (1–5) for profile consistency, dialogue realism, and character cohesion, aggregated to measure simulation fidelity (Botero et al., 1 Nov 2025).
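
Both top-k diagnostic accuracy and the weighted authenticity score defined above reduce to a few lines of Python. A minimal sketch, assuming predictions arrive as ranked lists of diagnosis strings (function and argument names are illustrative, not from the cited papers):

def top_k_accuracy(predictions: list[list[str]], references: list[set[str]], k: int) -> float:
    """Fraction of cases where any of the top-k ranked predictions matches a reference."""
    hits = sum(any(p in refs for p in preds[:k])
               for preds, refs in zip(predictions, references))
    return hits / len(references)

def authenticity_score(confidences: list[float], weights: list[float]) -> float:
    """AuthenticityScore = 100 * sum_i w_i c_i, normalizing the weights defensively."""
    total = sum(weights)
    return 100.0 * sum((w / total) * c for w, c in zip(weights, confidences))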

Failure Mode Taxonomies

Automated classifiers label agent errors, including fixation/anchoring, premature closure, investigation inefficiency, and omission of critical conditions. Each failure class guides corresponding mitigation strategies (e.g., enforced differential diagnosis submissions; penalization for information-seeking inefficiency) (Chiu et al., 11 Oct 2025).
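
A toy illustration of such a taxonomy as an enum plus a heuristic labeler over trace summary statistics; the thresholds and rules are invented for illustration, whereas the cited work uses automated, model-based classifiers:

from enum import Enum

class FailureMode(Enum):
    FIXATION = "fixation/anchoring"
    PREMATURE_CLOSURE = "premature closure"
    INEFFICIENCY = "investigation inefficiency"
    CRITICAL_OMISSION = "omission of critical conditions"

def label_trace(n_queries: int, query_precision: float,
                differential: set[str], critical: set[str]) -> set[FailureMode]:
    """Hand-rolled heuristics over trace summary statistics (thresholds invented).

    FIXATION is omitted here: detecting anchoring requires inspecting the full
    query trajectory, which is why model-based classifiers are used in practice.
    """
    labels = set()
    if n_queries < 3:
        labels.add(FailureMode.PREMATURE_CLOSURE)   # decided on minimal information
    if query_precision < 0.5:
        labels.add(FailureMode.INEFFICIENCY)        # most queries were irrelevant
    if critical - differential:
        labels.add(FailureMode.CRITICAL_OMISSION)   # a must-not-miss diagnosis absent
    return labels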

Qualitative user studies assess affective dimensions (anxiety, engagement) and dialogic feedback efficacy (Daryanto et al., 8 Oct 2024).

5. Dialogic Feedback and Reflective Learning

Beyond single-turn assessment, advanced interactive viva voce simulations incorporate two-way, dialogic feedback loops to scaffold reflective learning.

This paradigm, as implemented in systems such as Conversate, enables:

  • AI-Hinted Highlighting: Automated analysis of user responses marks segments as “needs improvement,” supporting targeted feedback.
  • User Annotation and Self-Reflection: Users link free-form reflections to specific transcript spans, feeding into subsequent mentoring cycles.
  • Dialogic Feedback Chat: Iterative two-way interaction where an LLM mentor provides bullet-point suggestions, STAR breakdowns, and phrasing recommendations in response to user queries and self-assessments.
  • Revision and Re-evaluation: Students can submit revised responses, triggering new AI evaluations and closing the feedback-action loop (Daryanto et al., 8 Oct 2024).

This framework aligns with established learning-science principles, emphasizing emotional support, opportunity for user agency, and iterative improvement.
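
A schematic of the revision-and-re-evaluation loop described above, with `evaluate` and `mentor_chat` as hypothetical stand-ins for the underlying LLM calls; Conversate's actual prompting and interface are considerably richer:

def dialogic_feedback_loop(answer: str, evaluate, mentor_chat, max_rounds: int = 3) -> str:
    """Evaluate -> reflect -> revise until the answer passes or rounds run out.

    `evaluate(answer) -> (passed, highlights)` and
    `mentor_chat(answer, highlights, reflection) -> revised_answer` are
    hypothetical wrappers around the underlying LLM calls.
    """
    for _ in range(max_rounds):
        passed, highlights = evaluate(answer)            # AI-hinted highlighting
        if passed:
            break
        reflection = input(f"Flagged segments: {highlights}\nYour reflection: ")
        answer = mentor_chat(answer, highlights, reflection)   # revision step
    return answer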

6. Domain-Specific Applications and Case Studies

Medical and Clinical Training

  • VivaBench: Multi-phase, hypothesis-driven simulations for clinical reasoning, with agent performance benchmarks, failure mode identification, and reproducible open-source architectures (Chiu et al., 11 Oct 2025).
  • Virtual Patient Assessment: Voice-enabled, persona-grounded LLMs power patient actors with high clinical profile fidelity and conversational realism, validated against expert raters using structured psychiatric scales (e.g., MADRS) (Botero et al., 1 Nov 2025).

Academic Integrity in Higher Education

  • Essay-Driven Viva Simulation: An LLM examiner engages the student in contextually linked oral questions to probe conceptual depth and authorship evidence, delivering an explicit authenticity/confidence score for examiner review. This method is proposed as an alternative or adjunct to static plagiarism detection (Church et al., 29 Oct 2025).

Interview Preparation and Reflective Practice

  • Conversate: Interactive job interview simulation platform with AI-powered, context-aware questioning, highlight-based annotation tools, and dialogic mentor chat, empirically shown to support realism, agency, and reduced social anxiety in formative settings (Daryanto et al., 8 Oct 2024).

7. Best Practices, Limitations, and Future Directions

Research-derived implementation guidelines emphasize:

  • Schema-driven, modular architectures aligned to domain ontologies.
  • Deterministic and LLM-based query mappers with audit trails for provenance and reliability.
  • Transparent phase management via policy layers and finite-state machines.
  • Comprehensive logging for transcript analysis and calibration studies.
  • Human-in-the-loop validation at dataset curation and periodic review stages to minimize error propagation.
  • Open tooling, fixed seeds, and documentation for reproducibility and auditing (Chiu et al., 11 Oct 2025).

Reported limitations include occasional deviations from strict calibration, behavioral inconsistencies, speech-recognition noise, the absence of non-verbal cue modeling in clinical/psychiatric simulations, and the risk of sycophantic agreement or low-pressure realism in job interview emulation (Botero et al., 1 Nov 2025, Daryanto et al., 8 Oct 2024). In high-stakes assessment, equity and security concerns motivate invigilated deployment and mitigation of adversarial inputs (Church et al., 29 Oct 2025).

Outlook: Current research targets extension to more nuanced domains, multimodal avatar integration, expanded feedback analytics, and rigorous, large-scale quantitative validation to further close the fidelity and utility gap with human-administered viva voce examination.
