Human-AI Collaboration Index
- The Human-AI Collaboration Index is a set of quantitative measures of the quality of joint decision-making and mutual adaptation between human and AI agents.
- It integrates decision theory, dialogue analysis, and multi-affordance models to differentiate simple assistance from true collaborative synergy.
- Empirical evaluations rely on interaction logs, expert annotations, and statistical validation to produce measurable collaboration benchmarks intended for cross-domain use.
The Human-AI Collaboration Index (HAC) is a class of quantitative measures designed to assess the effectiveness or quality of collaboration between human and artificial agents. HAC frameworks provide rigorous, operationalized constructs with explicit scoring functions and measurement protocols, seeking to move beyond simplistic notions of automation or human-in-the-loop operation. Instead, they quantify the degree to which a joint human-AI system achieves genuine collaborative affordances, complementary information use, and mutual adaptation. Multiple distinct formalizations of HAC have been proposed, reflecting different foundational paradigms: decision-theoretic information value (Guo et al., 2024), dialogue and grounding in shared tasks (Poelitz et al., 24 Feb 2026), and multi-level teaming affordances (Cukurova, 13 Jun 2026). Each approach prioritizes different metrics, system prerequisites, and experimental designs but converges on the core aim: distinguishing simple assistance from robust collaboration.
1. Theoretical Foundations and Taxonomies
A central theme in HAC research is the rigorous distinction between collaboration and related modalities such as delegation, consultation, or governance. Cukurova et al. (Cukurova, 13 Jun 2026) introduce a five-level diagnostic taxonomy of human–AI teaming that underpins collaboration assessment:
- Transactional: The AI system executes discrete requests without modeling user state or goals.
- Situational: The system maintains and surfaces a shared situational model, providing interpretability and contestability.
- Operational: Users can set explicit goals or plans governing system behavior over tasks.
- Praxical: The AI adapts its internal models based on ongoing user feedback but does not challenge the human.
- Synergistic: The system maintains an inspectable epistemic state, challenges and reasons with the user under shared goals, and demonstrates mutual modeling and social regulation.
Collaboration, in the strict sense, emerges only at the praxical and (especially) synergistic levels. HAC thus encodes not just task completion but deep reciprocal understanding and adaptive coordination.
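The five levels and the threshold for strict-sense collaboration can be written down as a small ordered type. This is only an illustrative sketch: the names `TeamingLevel` and `is_collaborative` and the integer values are assumptions introduced here to encode the ordering, not part of the cited taxonomy.

```python
from enum import IntEnum


class TeamingLevel(IntEnum):
    """Five teaming levels, ordered from least to most collaborative.

    The integer values are an assumption used only to encode the ordering.
    """
    TRANSACTIONAL = 1
    SITUATIONAL = 2
    OPERATIONAL = 3
    PRAXICAL = 4
    SYNERGISTIC = 5


def is_collaborative(level: TeamingLevel) -> bool:
    """Strict-sense collaboration starts at the praxical level."""
    return level >= TeamingLevel.PRAXICAL


if __name__ == "__main__":
    for level in TeamingLevel:
        print(f"{level.name:<14} collaborative={is_collaborative(level)}")
```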
2. Decision-Theoretic HAC Formulation
Decision-theoretic HAC formally quantifies the incremental value gained by combining human and AI judgments beyond what either achieves independently. Let $\theta$ denote the true task state, $X$ the basic features, and $D_H$ and $D_{AI}$ the human and AI decisions. For a proper scoring rule $S$ (e.g., Brier score), the marginal value of each feature is computed for the human alone, for the AI alone, and for the two jointly, as the improvement in expected score attributable to that feature.
The unexploited value of information is the portion of this attainable improvement that the observed decisions fail to realize.
The Human-AI Collaboration Index is then reported in normalized form. A value at the low end of the normalized range implies redundancy (no information synergy); a value at the high end signals strong complementarity, i.e., that effective collaboration unlocks substantial value unachievable by either agent alone. This approach requires a detailed audit of feature and decision usage and often involves Shapley-value decomposition across all observed features (Guo et al., 2024).
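To make the idea of value beyond either agent alone concrete, the following is a minimal sketch under simplifying assumptions. It works on whole decisions rather than individual features, uses the Brier loss (lower is better), and defines a hypothetical `complementarity_index` as the fraction of the best single agent's loss that the joint decision removes. This normalization is an illustrative stand-in, not the formula of Guo et al., and it performs no Shapley decomposition.

```python
from typing import Sequence


def brier_loss(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not outcomes:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def complementarity_index(human: Sequence[float], ai: Sequence[float],
                          joint: Sequence[float], outcomes: Sequence[int]) -> float:
    """Hypothetical index in [0, 1] (assumption, not the published formula).

    0 means the joint decision is no better than the best single agent;
    1 means the joint decision removes all of that agent's remaining loss.
    """
    best_single = min(brier_loss(human, outcomes), brier_loss(ai, outcomes))
    if best_single == 0.0:
        return 0.0  # nothing left to gain from combining
    gain = best_single - brier_loss(joint, outcomes)
    return max(0.0, gain) / best_single


if __name__ == "__main__":
    outcomes = [1, 0, 1, 0]
    human = [1.0, 0.0, 0.5, 0.5]   # informed on the first two cases only
    ai = [0.5, 0.5, 1.0, 0.0]      # informed on the last two cases only
    joint = [1.0, 0.0, 1.0, 0.0]   # combines both sources of information
    print(complementarity_index(human, ai, joint, outcomes))
```

In this toy setup the two agents hold disjoint information, so the joint decision recovers everything the best single agent misses; if `joint` merely copied `human`, the index would be zero.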
3. HAC as Group Process: Common Ground and Dialogue
An alternative paradigm constructs HAC from linguistic theory, focusing on joint action and common ground maintenance during interactive problem solving (Poelitz et al., 24 Feb 2026). In this framework, collaboration is quantified not only by task outcomes but also by referential and conversational coordination. Using structured collaborative puzzle benchmarks, four principal components are normalized and aggregated:
- Task Success: Average puzzle completion rate.
- Efficiency: Inverse normalized word count per trial.
- Referential Coordination: Ratio of shared noun-phrase vocabulary between human and AI.
- Grounding Engagement: Rate of clarification and repair acts.
The composite HAC combines these four normalized components into a single score.
This structure prioritizes outcome but penalizes inefficiency, ambiguous references, and lack of repair. Empirically, higher HAC values correlate with more robust alignment, efficient negotiation, and deeper mutual understanding between human and AI agents in controlled collaborative tasks (Poelitz et al., 24 Feb 2026).
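The four components above can be sketched in code as follows. Everything specific here is an assumption made for illustration: the `Trial` record, the `max_words` budget used to normalize word counts, Jaccard overlap as the shared-vocabulary ratio, and the weights in `WEIGHTS` (chosen only so that task success dominates). The source does not specify these details.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Mapping, Sequence


@dataclass(frozen=True)
class Trial:
    """One collaborative puzzle attempt (hypothetical record layout)."""
    solved: bool
    word_count: int
    human_phrases: FrozenSet[str]   # noun phrases used by the human
    ai_phrases: FrozenSet[str]      # noun phrases used by the AI
    turns: int
    grounding_acts: int             # clarification and repair acts


def _mean(values: Iterable[float]) -> float:
    items = list(values)
    return sum(items) / len(items)


def _efficiency(t: Trial, max_words: int) -> float:
    # Fewer words is better; counts above the budget are clipped.
    return 1.0 - min(t.word_count, max_words) / max_words


def _overlap(t: Trial) -> float:
    # Jaccard overlap of the two noun-phrase vocabularies.
    union = t.human_phrases | t.ai_phrases
    return len(t.human_phrases & t.ai_phrases) / len(union) if union else 0.0


def _grounding(t: Trial) -> float:
    return min(1.0, t.grounding_acts / t.turns) if t.turns else 0.0


def components(trials: Sequence[Trial], max_words: int = 500) -> Dict[str, float]:
    """Return the four components, each averaged over trials and in [0, 1]."""
    if not trials:
        raise ValueError("at least one trial is required")
    return {
        "task_success": _mean(1.0 if t.solved else 0.0 for t in trials),
        "efficiency": _mean(_efficiency(t, max_words) for t in trials),
        "referential": _mean(_overlap(t) for t in trials),
        "grounding": _mean(_grounding(t) for t in trials),
    }


# Hypothetical weights; task success is weighted most heavily.
WEIGHTS: Mapping[str, float] = {
    "task_success": 0.4, "efficiency": 0.2, "referential": 0.2, "grounding": 0.2,
}


def composite_hac(trials: Sequence[Trial], weights: Mapping[str, float] = WEIGHTS,
                  max_words: int = 500) -> float:
    """Weighted sum of the four components (assumed aggregation)."""
    parts = components(trials, max_words)
    return sum(weights[name] * value for name, value in parts.items())


if __name__ == "__main__":
    trial = Trial(True, 250, frozenset({"red block", "door"}),
                  frozenset({"red block", "key"}), turns=10, grounding_acts=2)
    print(components([trial]))
    print(composite_hac([trial]))
```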
4. Multi-Affordance and Sub-Index Decomposition
Cukurova et al. (Cukurova, 13 Jun 2026) advance the field by defining HAC through a vector of affordance-sensitive sub-indices, each measuring a specific cognitive or system function critical for collaboration:
| Sub-Index | Definition/Example Metric |
|---|---|
| Grounding Accuracy (G) | Match between system’s user representation and actual user intent |
| Goal Negotiation (N) | Alignment of system/user goal vectors post-negotiation |
| Adaptation Responsiveness (A_r) | Speed of incorporating explicit corrections |
| Update Fidelity (F) | Correlation between user-requested and actual model change magnitude |
| Mutual Modelling (M_d) | Deep inference of user knowledge, preferences, strategies |
| Shared Regulation (R) | Success rate of joint planning/monitoring/meta-acts |
| Co-Reasoning Synergy (S) | Density and success of joint argumentation |
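These indices are aggregated, commonly as a weighted sum over the sub-indices $I_k \in \{G, N, A_r, F, M_d, R, S\}$ with weights $w_k$,
$$\mathrm{HAC} = \sum_{k} w_k I_k,$$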
or by level-specific aggregation reflecting the five-level taxonomy. The sub-index methodology allows targeted diagnosis of collaborative deficits, supporting refinement of AI affordances and interaction protocols (Cukurova, 13 Jun 2026).
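A minimal sketch of sub-index aggregation and targeted diagnosis might look like the following. It assumes each sub-index has already been rescaled to [0, 1] and uses a weighted mean with equal default weights. The function names `aggregate_hac` and `weakest_sub_index` are hypothetical, and the source does not fix a particular weighting.

```python
from typing import Mapping, Optional, Tuple

# Sub-index symbols as listed in the table above.
SUB_INDICES = ("G", "N", "A_r", "F", "M_d", "R", "S")


def aggregate_hac(scores: Mapping[str, float],
                  weights: Optional[Mapping[str, float]] = None) -> float:
    """Weighted mean of the seven sub-indices (equal weights by default)."""
    missing = [name for name in SUB_INDICES if name not in scores]
    if missing:
        raise ValueError(f"missing sub-indices: {missing}")
    if any(not 0.0 <= scores[name] <= 1.0 for name in SUB_INDICES):
        raise ValueError("sub-indices are assumed to be rescaled to [0, 1]")
    if weights is None:
        weights = {name: 1.0 for name in SUB_INDICES}
    total = sum(weights[name] for name in SUB_INDICES)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * scores[name] for name in SUB_INDICES) / total


def weakest_sub_index(scores: Mapping[str, float]) -> Tuple[str, float]:
    """Return the lowest-scoring sub-index, i.e. the first candidate for diagnosis."""
    name = min(SUB_INDICES, key=lambda n: scores[n])
    return name, scores[name]


if __name__ == "__main__":
    scores = {"G": 0.8, "N": 0.7, "A_r": 0.9, "F": 0.6, "M_d": 0.5, "R": 0.4, "S": 0.2}
    print(aggregate_hac(scores))
    print(weakest_sub_index(scores))
```

Dividing by the weight total keeps the aggregate on the same [0, 1] scale as its inputs, so systems scored with different weightings can still be compared.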
5. Empirical Measurement and Evaluation Protocols
HAC computation depends on both high-fidelity interaction data and reliable process annotation pipelines:
- Interaction logs: Transcript-level capture of exchanges, system states, and correction events.
- User annotations: Explicit labeling of intent, goals, and satisfaction.
- Expert coding: Ground-truthing of conversational acts or model updates.
- Statistical analysis: Mixed-effects modeling, interrater reliability checks, and validation against external benchmarks (e.g., learning gains, user satisfaction).
- Benchmark tasks: Task designs that demand referential coordination, negotiation, and repair are essential for valid HAC discrimination (Poelitz et al., 24 Feb 2026).
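As one example of the interrater reliability checks listed above, Cohen's kappa compares the observed agreement between two coders with the agreement expected by chance. The sketch below is a plain-Python version for two coders and nominal labels. The label names in the usage example are invented for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders labeling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical conversational-act codes from two expert annotators.
    a = ["clarify", "repair", "other", "clarify"]
    b = ["clarify", "repair", "other", "other"]
    print(round(cohens_kappa(a, b), 3))
```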
Weighting of sub-indices or components requires either normative justification or outcome-driven tuning (e.g., via regression on performance data), and must be transparent to permit interpretability and cross-system comparisons.
6. Comparisons, Limitations, and Relationship to Adjacent Constructs
Not all frameworks using "collaboration" terminology in human-AI contexts address the stringent requirements formalized in HAC. Many systems historically labeled as collaborative exhibit only transactional or consultative affordances and fail to meet the criteria of shared, negotiable goals, mutual modeling, or symmetric regulation (Cukurova, 13 Jun 2026). Conceptually adjacent aggregates such as the Artificial Intelligence Quotient (AIQ) (Ganuthula et al., 13 Feb 2025) propose multidimensional profiling of human capacity for AI interaction but do not provide a formal sub-index or explicit aggregation formula for HAC. The relationship between such broad frameworks and rigorous HAC quantification remains an open area for further psychometric and empirical development.
7. Future Directions and Open Challenges
Current HAC frameworks face several limitations:
- Validation: Few published empirical studies have validated HAC formulas against longitudinal real-world collaboration outcomes.
- Task generality: Most metrics are benchmark/task-specific; transferability across domains is not assured.
- Ethical, privacy, and data governance: Systematic measurement requires detailed logging and content analysis, raising user consent and privacy issues (Cukurova, 13 Jun 2026).
- Rapid AI evolution: Collaborative affordances and the measurement criteria they require shift as system capabilities change, necessitating continual updating of HAC definitions and benchmarks.
Nevertheless, HAC provides a rigorous scaffold for analyzing, diagnosing, and improving human–AI teaming across decision-making, education, creative synthesis, and other domains where collaborative intelligence offers the potential for true hybrid performance gains (Guo et al., 2024, Cukurova, 13 Jun 2026, Poelitz et al., 24 Feb 2026).