
Arbitrated Indirect Treatment Comparisons

Updated 22 October 2025
  • The paper introduces an arbitration step that resolves conflicting MAIC outcomes by targeting a well-defined overlap population.
  • It details a methodological framework using propensity score modeling and overlap weighting to align disparate clinical trial populations.
  • The study highlights practical implications for HTA and improved reproducibility in comparative effectiveness research.

Arbitrated indirect treatment comparisons are a methodological response to challenges arising from the use of matching-adjusted indirect comparison (MAIC) in comparative effectiveness research and health technology assessment (HTA), particularly when multiple data-generating stakeholders independently analyze clinical trial data. The core innovation is the formalization of an arbitration step—anchored by an impartial party—which targets inference to a common, explicitly defined overlap population, thereby resolving the MAIC paradox wherein different analysts reach conflicting conclusions due to implicit, sponsor-specific target populations (Fang et al., 20 Oct 2025). This article reviews the statistical and methodological foundations, expounds on the MAIC paradox, describes arbitrated methods and target estimands, and situates arbitrated indirect treatment comparisons within the landscape of recent simulation studies, real-world implementation, and regulatory science.

1. Background and Motivation

Indirect treatment comparisons arise when head-to-head randomized trials are unavailable, necessitating cross-trial comparisons of active treatments, often via a common comparator. MAIC has become a popular tool in this context, leveraging individual participant data (IPD) from one trial and aggregate data (AgD) summaries from another to construct population-adjusted treatment effect estimators (Cheng et al., 2019, Remiro-Azócar et al., 2020, Serret-Larmande et al., 16 Jul 2025).

The “MAIC paradox” crystallized by Jiang et al. (2025) (Fang et al., 20 Oct 2025) underscores a core limitation—analyses performed by different sponsors, each with access to their own trial’s IPD, target different populations and, despite following identical MAIC methodology, may arrive at discordant conclusions regarding the comparative efficacy of the same interventions. This inconsistency threatens the interpretability and utility of HTA submissions when multiple sponsors, each with partial data, implicitly select distinct target populations in their weighting stage.

2. The MAIC Paradox: Conflicting Targets in Conventional MAIC

The MAIC paradox is a consequence of the population-targeting step in standard MAIC. Suppose sponsor A has IPD for the AC trial and accesses AgD from the BC trial, while sponsor B has the reverse. When sponsor A fits weights to align its IPD to the BC AgD covariate profile, estimation targets the BC population; vice versa for sponsor B. Because of between-trial differences and the presence of effect modifiers, the marginal treatment effect for A vs. B computed in the AC population and that computed in the BC population may differ substantially, both through true effect modification and as an artifact of the covariate distribution. This population misalignment produces the paradox: identical analytic pipelines yield contradictory results because each sponsor implicitly estimates effects in a different target population (Fang et al., 20 Oct 2025).
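The arithmetic behind the paradox can be shown with a single binary effect modifier. This is a minimal sketch with hypothetical numbers, not values from the paper: when the conditional effect differs by covariate level and the two trial populations differ in that covariate's prevalence, the marginal effect depends on which population is targeted.

```python
# Minimal sketch of the MAIC paradox (all numbers hypothetical).
# Conditional A-vs-B effect by level of a binary effect modifier X:
effect = {0: 1.0, 1: 3.0}

# Prevalence of X = 1 differs between the AC and BC trial populations.
p_x1 = {"AC": 0.3, "BC": 0.7}

def marginal_effect(target):
    # Average the conditional effect over the target population's
    # distribution of X.
    p = p_x1[target]
    return (1 - p) * effect[0] + p * effect[1]

# Sponsor A reweights its AC IPD toward the BC population; sponsor B
# does the opposite. Same conditional effects, different conclusions:
effect_in_bc = marginal_effect("BC")  # larger, since X = 1 is common there
effect_in_ac = marginal_effect("AC")  # smaller
```

With these illustrative inputs the two sponsors report materially different marginal effects (2.4 versus 1.6) despite applying the identical procedure.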

3. Arbitrated Indirect Treatment Comparisons: Methodological Framework

Arbitrated indirect treatment comparisons resolve this inconsistency by introducing an arbitration mechanism that enforces estimation in a common, objectively defined target population, most often called the “overlap population.” This approach proceeds as follows:

  1. Propensity Score Modeling: An arbitrator (e.g., a regulatory agency or vetted third party) either accesses or creates (from simulations based on published summaries) IPD-level covariate data from both contributing trials. A propensity score model estimates the probability of trial membership as $\epsilon(x) = P(T = 1 \mid X = x)$ for an individual with baseline covariates $X = x$.
  2. Overlap Weighting: For the AC trial (trial 1), the overlap weight for each subject is $\omega_1(x) = 1 - \epsilon(x)$; for the BC trial (trial 0), it is $\omega_0(x) = \epsilon(x)$. These weights select the subset of the joint covariate space where both studies have sufficient representation, maximizing statistical efficiency within the region of common support (Fang et al., 20 Oct 2025).
  3. Sponsor-Side Analysis: Each sponsor applies the prescribed overlap weights in their own IPD-based analysis, conducting an otherwise standard MAIC that now explicitly targets the overlap population.
  4. Aggregation of Results: The arbitrator combines these MAIC-derived estimates across drugs via:

$$\theta^{AB}_{ATO} = \theta^{AC}_{ATO} - \theta^{BC}_{ATO}$$

with variance estimated as the sum of squared standard errors.

  5. Estimand: The target estimand is the average treatment effect in the overlap population (ATO), written as:

$$\theta^{AB}_{ATO} = \frac{\int \big(E(Y \mid Z=A, x) - E(Y \mid Z=B, x)\big)\, \phi(x)\, \epsilon(x)\big(1-\epsilon(x)\big)\, dx}{\int \phi(x)\, \epsilon(x)\big(1-\epsilon(x)\big)\, dx},$$

where $\phi(x)$ is the (empirical or estimated) covariate density.
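Under the simplifying assumption of a single binary covariate (so the trial-membership propensity model is saturated and reduces to cell proportions), the steps above can be sketched as follows. The covariate data, effect estimates, and standard errors are placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical covariate data pooled by the arbitrator: one binary
# effect modifier X, with different prevalences in the two trials.
x_ac = rng.binomial(1, 0.3, size=500)  # AC trial (trial 1)
x_bc = rng.binomial(1, 0.7, size=500)  # BC trial (trial 0)

# Step 1: trial-membership propensity eps(x) = P(T = 1 | X = x).
# With one binary covariate, the saturated model is the cell proportion.
x = np.concatenate([x_ac, x_bc])
t = np.concatenate([np.ones_like(x_ac), np.zeros_like(x_bc)])
eps = {v: t[x == v].mean() for v in (0, 1)}

# Step 2: overlap weights w1(x) = 1 - eps(x), w0(x) = eps(x).
w_ac = np.array([1.0 - eps[v] for v in x_ac])
w_bc = np.array([eps[v] for v in x_bc])

# Overlap weights exactly balance X between the two weighted samples.
bal_gap = abs(np.average(x_ac, weights=w_ac) - np.average(x_bc, weights=w_bc))

# Steps 3-4: sponsors return overlap-weighted anchored estimates
# (placeholder numbers), which the arbitrator combines.
theta_ac, se_ac = 1.8, 0.25  # hypothetical A-vs-C effect, overlap population
theta_bc, se_bc = 0.6, 0.30  # hypothetical B-vs-C effect, overlap population
theta_ab = theta_ac - theta_bc
se_ab = (se_ac**2 + se_bc**2) ** 0.5
```

The exact-balance check reflects a known property of overlap weights with a saturated propensity model; in practice the propensity would be fit by logistic regression on several covariates.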

This methodology can be applied even in the absence of directly sharable IPD by simulating pseudo-IPD based on covariate summaries. A key distinguishing feature is that all parties are compelled to target the same estimand, and no analytic flexibility remains for selective targeting of favorable populations.
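When IPD cannot be shared, the pseudo-IPD step can be sketched as below. The summary values, covariate names, and the independence assumption between covariates are all illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical published baseline summaries for a trial lacking
# sharable IPD: mean/SD for a continuous covariate, a proportion
# for a binary one, and the sample size.
agd = {"age_mean": 62.0, "age_sd": 8.5, "male_prop": 0.55, "n": 400}

# Draw pseudo-IPD consistent with those marginal summaries. Treating
# the covariates as independent is an assumption unless correlations
# are also reported.
pseudo_age = rng.normal(agd["age_mean"], agd["age_sd"], size=agd["n"])
pseudo_male = rng.binomial(1, agd["male_prop"], size=agd["n"])

# The arbitrator pools this pseudo-IPD with the other trial's
# covariates to fit the trial-membership model and derive overlap
# weights, exactly as with real IPD.
```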

4. Target Population Alignment: The Overlap Population

The “overlap population” forms the objective reference population for arbitrated indirect comparisons. Its defining property is maximal overlap in the distribution of effect modifiers and key prognostic variables between the AC and BC trials. This population maximizes the product $\epsilon(x)\big(1-\epsilon(x)\big)$ and corresponds to the region of the covariate space where both trials contribute informative support. By construction, overlap weighting yields the “most balanced” population for which robust inference on comparative effectiveness is possible (Fang et al., 20 Oct 2025). In settings with nontrivial lack of overlap, this approach avoids the inflated variance or bias that arises when comparing highly non-overlapping populations.

Examples in the literature use discrete effect modifiers (e.g., race) to illustrate how overlap-weighted analyses force a compromise between trial populations, leading to rebalanced covariate distributions and consistent estimated treatment effects when both sponsors follow the arbitrated analytic protocol.
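A stylized version of such a discrete example can be checked by hand (all counts are hypothetical): two equal-sized trials with mirrored prevalences of a binary effect modifier are pulled to a common compromise distribution by overlap weighting.

```python
# Hypothetical cell counts of a binary effect modifier X.
n_ac = {1: 30, 0: 70}  # AC trial: 30% have X = 1
n_bc = {1: 70, 0: 30}  # BC trial: 70% have X = 1

# Saturated trial-membership propensity at each covariate level.
eps = {v: n_ac[v] / (n_ac[v] + n_bc[v]) for v in (0, 1)}

# Overlap-weighted mass per level: n_ac[v]*(1 - eps[v]) for the AC
# trial and n_bc[v]*eps[v] for the BC trial -- algebraically the
# same quantity, so the weighted distributions coincide.
mass_ac = {v: n_ac[v] * (1 - eps[v]) for v in (0, 1)}
mass_bc = {v: n_bc[v] * eps[v] for v in (0, 1)}

prev_ac = mass_ac[1] / (mass_ac[1] + mass_ac[0])
prev_bc = mass_bc[1] / (mass_bc[1] + mass_bc[0])
# Both weighted prevalences land on the compromise value 0.5 here.
```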

5. Applications and Implications for Health Technology Assessment

Arbitrated indirect treatment comparisons are most relevant in regulatory and HTA processes where multiple sponsors submit comparative effectiveness evidence generated from disparate IPD sources. By instituting an arbitration procedure that targets inference to the overlap population, HTA authorities receive directly comparable evidence and avoid the interpretational ambiguities introduced by post-hoc, sponsor-selected populations.

The overlap weighting approach ensures that the final estimate corresponds to a population actually present in the data, facilitating both interpretability and statistical validity. When sponsors cannot, or will not, share IPD, arbitrated imputation of pseudo-IPD via simulation from AgD can extend this framework. This protocol guards against the MAIC paradox and enhances the legitimacy of reimbursement or policy decisions built on indirect evidence (Fang et al., 20 Oct 2025).

6. Comparison with Other Population-Adjusted Methods

Conventional MAIC and simulated treatment comparison (STC) methods adjust for baseline covariate imbalances but leave the target population definition to each analyst’s discretion. Population adjustment procedures that do not explicitly reconcile target populations can therefore magnify discrepancies in inference due to different effect modifier distributions or trial design features (Cheng et al., 2019, Remiro-Azócar et al., 2020). Arbitrated methods distinguish themselves by embedding the overlap population definition into the analytic pipeline—an approach also recognizable in statistical literature on “average treatment effect in overlap” (ATO) estimands.

The arbitrated process is distinctive in mandating sponsor alignment on the estimand, a property lacking in standard pairwise or network meta-analytic population adjustment. While this requires greater coordination (and, in some instances, access to pooled or simulated IPD), the benefit is the elimination of contradictory evidence due to population misalignment.

7. Limitations and Future Directions

The arbitrated indirect treatment comparison framework’s primary operational limitation is the need for either pooled IPD or highly reliable simulation of IPD from AgD. Incomplete or low-fidelity representation of effect modifier distributions may bias the overlap weights or limit the generalizability of findings. Furthermore, this approach presupposes the existence of a meaningful overlap population; in cases of minimal or negligible overlap, estimands may become ill-defined or inferences unstable. Implementation requires coordination and neutrality in the arbitration process, which may not be trivial in contentious or proprietary data settings.

Future directions include extending the arbitrated methodology to multi-arm settings and further theoretical exploration of overlap estimands under complex network structures and limited data availability. Integration with doubly robust and calibration-based methods may further enhance inferential robustness (Campbell et al., 30 Apr 2025, Zhu et al., 28 Sep 2025).


Arbitrated indirect treatment comparisons provide a rigorous and principled solution to inferential inconsistencies in population-adjusted indirect comparison, anchored by the overlap population. This methodology strengthens comparative effectiveness research and evidence submission for HTA, promising more reproducible and transparent cross-sponsor estimates that are less susceptible to the ambiguities of traditional MAIC or STC approaches (Fang et al., 20 Oct 2025).
