Interactive Criteria Development
- Interactive criteria development is a process where experts iteratively generate, adapt, and weight evaluation criteria using structured tool guidance and real-time feedback.
- It leverages architectures such as blackboard systems, micro-task streams, and LLM-powered panels to reduce cognitive load and streamline decision analysis.
- The approach is applied across various domains—including R&D, MCDA, and clinical trials—leading to robust, context-aware criteria sets with measurable performance improvements.
Interactive criteria development is a paradigm for generating, refining, and operationalizing evaluation criteria or modeling rubrics “in the loop,” such that expert users or stakeholders iteratively elicit, adapt, and weight criteria with direct system support. This process is characterized by combining structured tool guidance, real-time feedback mechanisms, and dynamic aggregation of subjective judgments, with the aim of producing robust, context-aware criteria sets for decision analysis, machine learning evaluation, clinical trials, or multi-criteria decision analysis (MCDA).
1. Conceptual Foundations and Motivation
The primary motivation for interactive criteria development is to overcome the limitations of static, expert-driven protocols, which are often too brittle, opaque, or slow to adapt in high-dimensional, subjective, or evolving domains. Instead, interactive approaches introduce real-time, user-guided workflows where system architecture (e.g., blackboard control, interactive panels, algorithmic suggestion modules) continuously incorporates user proposals, backtracks, refinements, and consensus-building steps. This is essential in domains such as influence-diagram construction for R&D portfolio evaluation (Regan et al., 2013), MCDA objective hierarchy building through “elementary interactions” (Söbke et al., 2019), user-defined evaluation for LLM prompts (Kim et al., 2023), and data-driven criteria learning in visual databases (Tompkin et al., 2017).
The interactive paradigm addresses:
- Elicitation of context- and application-specific requirements that static rubrics or off-the-shelf metrics cannot cover
- Reduction in organizational overhead and cognitive load by distributing effort into atomic or “micro” tasks processed asynchronously
- Accelerated convergence to effective, mutually exclusive, and exhaustive criteria sets through iteration, feedback, and automated consistency checks
2. Architectures and Mechanisms
Several architectural patterns underlie interactive criteria development systems:
Blackboard Systems and Opportunistic Reasoning
In systems such as R&D Analyst (Regan et al., 2013), model construction unfolds on a “blackboard” — a dynamic data structure representing the model artifact (e.g., an influence diagram) and annotated metadata (assessments, options offered, provenance of choices). Knowledge Sources (KSs) implement modular expertise: some encapsulate domain decomposition rules; others perform consistency and bookkeeping; control specialists mediate focus and eligibility. The control cycle:
- Identifies eligible KSs (nodes for elaboration/refinement)
- Screens out ineligible or auto-handled nodes
- Prompts the user to select focus (“What do you want to do next?”)

A “focus node stack” tracks the current working context, supporting iterative elaboration, backtracking, and opportunistic, on-the-fly refinement or restructuring of criteria.
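The control cycle described above can be sketched roughly as follows. This is a minimal illustration and not R&D Analyst's implementation: the `Node`, `KnowledgeSource`, `control_cycle`, `decompose`, and `backtrack` names, the three status values, and the `choose` callback standing in for the user prompt are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """One blackboard entry, e.g. a node of an influence diagram (hypothetical)."""
    name: str
    status: str = "open"  # "open" (needs the user), "auto" (system-handled), or "done"
    children: List[str] = field(default_factory=list)


@dataclass
class KnowledgeSource:
    """A module of expertise that can act on a node (hypothetical)."""
    name: str
    is_eligible: Callable[[Node], bool]
    apply: Callable[[Dict[str, Node], Node], None]


def control_cycle(blackboard, sources, focus_stack, choose) -> Optional[str]:
    """Run one cycle: collect eligible pairs, screen them, let the user pick, act."""
    # Step 1: identify eligible (knowledge source, node) pairs.
    candidates = [
        (ks, node)
        for node in list(blackboard.values())
        for ks in sources
        if node.status != "done" and ks.is_eligible(node)
    ]
    # Step 2: auto-handled nodes are processed without asking the user.
    manual = []
    for ks, node in candidates:
        if node.status == "auto":
            ks.apply(blackboard, node)
        else:
            manual.append((ks, node))
    if not manual:
        return None
    # Step 3: "What do you want to do next?"
    ks, node = choose(manual)
    focus_stack.append(node.name)  # remember the working context
    ks.apply(blackboard, node)
    return node.name


def decompose(blackboard, node):
    """Example rule (assumed): split a node into two sub-criteria."""
    for suffix in ("cost", "benefit"):
        child = f"{node.name}/{suffix}"
        blackboard[child] = Node(child)
        node.children.append(child)
    node.status = "done"


def backtrack(focus_stack) -> Optional[str]:
    """Drop the current focus and return the previous one, if any."""
    if focus_stack:
        focus_stack.pop()
    return focus_stack[-1] if focus_stack else None
```

Passing `choose` as a callback keeps the cycle testable: an interactive front end would show the options to the user, while a script can simply pick the first one.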
Micro-task Streams and Model Aggregators
“Elementary Interactions” (EIs) decompose MCDA criteria development into self-contained, low-cognitive-load micro-tasks—e.g., naming, confirming, ranking, or weighting criteria (Söbke et al., 2019). A stream generator delivers EIs by detecting information gaps in the current model and tailoring the EI type to participant competency profiles. A model aggregator fuses responses, updates SOO (Set of Objectives) hierarchies, determines element validity, triggers milestone events (e.g., reaching acceptance thresholds), and manages weight propagation.
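A minimal sketch of how a stream generator and a model aggregator might cooperate is shown below. It is not the platform's actual logic: the `Model` layout, the `next_interaction` and `aggregate` functions, the skill labels, and the default values `min_votes=3` and `threshold=0.6` are assumptions chosen only to make the idea concrete.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumed model state: each candidate objective maps to its confirm/reject votes.
Model = Dict[str, List[bool]]


def next_interaction(
    model: Model, skills: Set[str], min_votes: int = 3
) -> Optional[Tuple[str, str]]:
    """Pick the next micro-task by looking for the first information gap
    that this participant is able to fill."""
    if not model and "naming" in skills:
        return ("name", "")  # nothing has been proposed yet
    if "confirming" in skills:
        for objective, votes in model.items():
            if len(votes) < min_votes:
                return ("confirm", objective)  # too few votes so far
    return None  # no suitable gap for this participant


def aggregate(
    model: Model, min_votes: int = 3, threshold: float = 0.6
) -> Dict[str, str]:
    """Classify each objective as accepted, rejected, or still pending."""
    result = {}
    for objective, votes in model.items():
        if len(votes) < min_votes:
            result[objective] = "pending"
        elif sum(votes) / len(votes) >= threshold:
            result[objective] = "accepted"
        else:
            result[objective] = "rejected"
    return result
```

A milestone event could then be triggered when no objective remains `"pending"`; how milestones are actually defined is not specified here.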
Interactive Evaluation Panels and LLM-powered Suggestion
In evaluation tools for LLM prompts (Kim et al., 2023), criteria are entered as free-form natural language, iteratively refined via LLM-based “criteria reviewers,” and operationalized in real time through batch evaluations. The UI exposes side-by-side outputs, inline criterion scoring, and evidence highlights, and supports history/versioning and validation within a panelized architecture.
3. Algorithmic and Mathematical Underpinnings
Specific algorithms enable propagation, updating, and active refinement of criteria:
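| Mechanism | Core Formula or Principle | Use Context |
|---|---|---|
| Label propagation / energy minimization | User labels spread over a similarity graph by minimizing a smoothness energy | Continuous/slider criteria learning (Tompkin et al., 2017) |
| AHP-style weighting | Local weights from pairwise comparisons, normalized to sum to 1 | MCDA local criterion preference (Söbke et al., 2019) |
| Bayesian covariance update | Posterior covariance revised after each new label | Uncertainty sampling, active selection |
| Validity scoring (EIs) | Element accepted when its confirmation score reaches a threshold | EI element acceptance (Söbke et al., 2019) |
| Majority aggregation | Final score is the verdict chosen by most evaluators | Evaluator panel scoring (Kim et al., 2023) |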
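Two of the mechanisms in the table, AHP-style weighting and majority aggregation, are simple enough to sketch. The code below is illustrative only: the row geometric-mean method is a common approximation of AHP priorities and is swapped in here as an assumption, not necessarily the method used in the cited work, and the function names `ahp_weights` and `majority_label` are hypothetical.

```python
from collections import Counter
from math import prod
from typing import List, Sequence


def ahp_weights(pairwise: Sequence[Sequence[float]]) -> List[float]:
    """Approximate AHP priorities from a pairwise-comparison matrix.

    pairwise[i][j] states how strongly criterion i is preferred over j.
    Each row is reduced to its geometric mean, and the means are then
    normalized so the weights sum to 1.
    """
    n = len(pairwise)
    means = [prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(means)
    return [m / total for m in means]


def majority_label(votes: Sequence[str]) -> str:
    """Return the most frequent verdict; ties go to the verdict seen first."""
    return Counter(votes).most_common(1)[0][0]


# Criterion 0 is judged three times as important as criterion 1.
weights = ahp_weights([[1.0, 3.0], [1.0 / 3.0, 1.0]])
verdict = majority_label(["A", "B", "A"])
```

For a perfectly consistent comparison matrix such as the one above, the geometric-mean weights coincide with the exact ratio-based weights.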
In MCDA and interaction-aware criteria systems, formal relationships among Sobol’ indices and the Banzhaf and Fourier transforms of capacities, as used in capacity-based aggregation models (Choquet integral, multilinear extension), underpin global sensitivity rankings and the quantitative assessment of cross-criterion synergies (Grabisch et al., 2016).
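To make the Banzhaf quantities concrete, the sketch below computes the standard Banzhaf value of a criterion, the average of its marginal contributions μ(S ∪ {i}) − μ(S) over all coalitions S of the other criteria, and the standard pairwise Banzhaf interaction index. The function names and the two-criterion toy capacity are assumptions for illustration; the link to Sobol’ indices is not reproduced here.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Iterator, Sequence

Capacity = Callable[[FrozenSet[str]], float]


def subsets(items: Iterable[str]) -> Iterator[FrozenSet[str]]:
    """Yield every subset of items, including the empty set."""
    pool = list(items)
    for size in range(len(pool) + 1):
        for combo in combinations(pool, size):
            yield frozenset(combo)


def banzhaf_value(mu: Capacity, criteria: Sequence[str], i: str) -> float:
    """Average marginal contribution of criterion i over coalitions of the others."""
    others = [c for c in criteria if c != i]
    total = sum(mu(s | {i}) - mu(s) for s in subsets(others))
    return total / 2 ** len(others)


def banzhaf_interaction(mu: Capacity, criteria: Sequence[str], i: str, j: str) -> float:
    """Average joint effect of i and j; positive values indicate synergy."""
    others = [c for c in criteria if c not in (i, j)]
    total = sum(
        mu(s | {i, j}) - mu(s | {i}) - mu(s | {j}) + mu(s)
        for s in subsets(others)
    )
    return total / 2 ** len(others)


# Toy capacity (assumed): each criterion alone is worth 0.3, both together 1.0.
table = {
    frozenset(): 0.0,
    frozenset({"a"}): 0.3,
    frozenset({"b"}): 0.3,
    frozenset({"a", "b"}): 1.0,
}
mu = table.__getitem__
```

In this toy capacity the pair is worth more than the sum of its parts, so the interaction index comes out positive, which is the non-additive behaviour an additive weighting scheme cannot represent.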
4. Domain-Specific Implementations
R&D and Influence-Diagram Model Construction
R&D Analyst exemplifies a consultation-style, blackboard-driven approach for interactively constructing and refining influence diagrams for project evaluation (Regan et al., 2013). The user develops objectives, alternatives, and uncertainties "on the fly," with layered specialist KSs guiding model elaboration, enforcing structural rules, and recommending decomposition or reassessment based on dialog and model state.
MCDA and Objective Hierarchies via Micro-interactions
The EI-driven platform for MCDA (Söbke et al., 2019) operationalizes interactive SOO development as a series of asynchronous, micro-elicitation steps. The approach avoids burdensome workshops, algorithmically aggregates input, and provides milestone-based progression. Validity thresholds and role-based participant segmentation are used to guarantee both rigor and traceability.
Data-driven and Continuous Criteria Formation
“Criteria Sliders” support interactive construction of continuous scaling criteria in high-dimensional databases by combining real-time label propagation with active information-gain-based querying (Tompkin et al., 2017). Users iteratively supply ranking or placement labels; the system adaptively proposes the most informative points to maximize convergence on smooth, interpretable criteria axes.
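The propagation step can be illustrated with a small pure-Python sketch. It assumes a symmetric similarity matrix and uses repeated weighted averaging; the `propagate` and `next_query` names are hypothetical, and `next_query` is a deliberately simple stand-in for active selection rather than the information-gain criterion of the cited work.

```python
from typing import Dict, List


def propagate(
    weights: List[List[float]], labels: Dict[int, float], sweeps: int = 200
) -> List[float]:
    """Spread user-given scores over a similarity graph.

    Labeled items stay fixed. Each unlabeled item is repeatedly moved to the
    weighted mean of its neighbours, which for symmetric weights lowers the
    energy sum over pairs of w_ij * (f_i - f_j)**2.
    """
    n = len(weights)
    start = sum(labels.values()) / len(labels)  # neutral starting guess
    scores = [labels.get(i, start) for i in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            if i in labels:
                continue
            degree = sum(weights[i][j] for j in range(n) if j != i)
            if degree > 0:
                scores[i] = (
                    sum(weights[i][j] * scores[j] for j in range(n) if j != i)
                    / degree
                )
    return scores


def next_query(weights: List[List[float]], labels: Dict[int, float]) -> int:
    """Stand-in for active selection: the unlabeled item that is least
    connected to the items labeled so far."""
    unlabeled = [i for i in range(len(weights)) if i not in labels]
    return min(unlabeled, key=lambda i: sum(weights[i][j] for j in labels))


# Four items in a chain: 0 - 1 - 2 - 3, with the two ends placed by the user.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = propagate(chain, {0: 0.0, 3: 1.0})
```

On the chain above the interior items settle at evenly spaced values between the two user-placed ends, which is the smooth interpolation a slider axis needs.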
LLM and Prompt Evaluation Frameworks
EvalLM provides an architectural blueprint where criteria are iteratively defined, reviewed, and instantiated in system-supported, LLM-evaluated sessions (Kim et al., 2023). Users co-evolve criteria (e.g., “faithfulness”) and prompt designs, with rich support for reliability assessment (test–retest, inter-rater agreement), automated suggestions (refine/merge/split), and provenance/version-tracking.
5. Evaluation Methodologies and Metrics
Interactive criteria development introduces multi-level evaluation, including:
- Real-time measurement of convergence (e.g., mean absolute error against ground-truth rankings; Tompkin et al., 2017)
- Usage analytics: number of criteria edited, outputs evaluated, cycles to convergence (Kim et al., 2023)
- Validity/acceptance: thresholds on confirmation rates or consensus, with competency-weighted aggregation (Söbke et al., 2019)
- Statistical reliability: Fleiss' κ for inter-rater agreement, test–retest stability, validation accuracy if gold labels exist (Kim et al., 2023)
- Efficiency gains: reduction in organizational time/cost, number of participant hours required (Söbke et al., 2019)
- Domain metrics: application-dependent, such as correct propagation of clinical-trial design parameters (Fisher et al., 2014), expected variance explained in sensitivity indices (Grabisch et al., 2016), or scenario-linked operational metrics in interactive decision-making (Tian et al., 2025).
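Of the metrics listed above, Fleiss' κ has a fixed definition and can be computed directly from a table of rating counts. The sketch below follows the standard formula; the function name `fleiss_kappa` and the input layout are assumptions, and the statistic is undefined when every rating falls into a single category, a case this sketch does not handle.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][k] is the number of raters
    who assigned item i to category k. Every item must have the same number
    of raters, and at least two raters are required."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each item must be rated by the same number of raters")
    n_cats = len(counts[0])
    # Share of all assignments that fell into each category.
    p_cat = [
        sum(row[k] for row in counts) / (n_items * n_raters)
        for k in range(n_cats)
    ]
    # Observed agreement per item, averaged over items.
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Agreement expected by chance.
    p_exp = sum(p * p for p in p_cat)
    return (p_obs - p_exp) / (1 - p_exp)


# Two items, three raters, two categories, full agreement on each item.
perfect = fleiss_kappa([[3, 0], [0, 3]])
```

Values near 1 indicate strong agreement among evaluators, values near 0 indicate agreement no better than chance, and negative values indicate systematic disagreement.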
6. Best Practices, Challenges, and Future Directions
Several best practices and open challenges have emerged:
- Use of micro-tasks and role-based customization lowers entry barriers, scales to large and diverse participant pools, and supports transparency and reproducibility (Söbke et al., 2019).
- Layered architecture with opportunistic, context-sensitive suggestion systems enhances adaptability and user engagement in model building (Regan et al., 2013).
- LLM-powered refinement and automated diagnostics can rapidly steer criteria sets towards coverage, clarity, and non-redundancy (Kim et al., 2023).
- Traceable audit trails and milestone snapshots improve reproducibility and enable validation by third parties (Söbke et al., 2019).
- Scenario-dependent weighting and multi-objective balancing are crucial, as illustrated in scenario-based DRL evaluations (Tian et al., 2025).
- Open challenges include handling cross-criterion interactions (non-additivity), empirically validating threshold and weighting schemes, providing tool support for rapid decision mode selection (Irons et al., 2024), and evolving rubrics in response to domain and technology changes.
A plausible implication is that as AI systems become more capable, interactive criteria development methods will need to become even more adaptive, transparent, and capable of capturing complex, context-specific desiderata while maintaining rigorous traceability and statistical underpinnings across domains.