- The paper introduces a novel framework that formalises message orchestration as a sequential decision problem with modular action sets.
- It leverages Difference-in-Differences for individual treatment effect estimation and uses contextual Thompson sampling for optimal action selection.
- Empirical results from over 150 million users show significant uplifts in engagement and conversion, validating scalable personalisation.
Agentic Personalisation of Cross-Channel Marketing Experiences
This paper introduces a generalisable framework for user-level personalisation in cross-channel marketing communications, formalising message orchestration as a sequential decision-making problem. The proposed methodology leverages modular action sets, individual treatment effect (ITE) estimation via Difference-in-Differences, and Thompson sampling within a contextual bandit framework. The system is deployed at production scale, orchestrating personalised marketing actions for over 150 million users, and empirical results confirm significant improvements in both engagement (upper-funnel metrics) and business outcomes (conversion, GMV).
Methodological Overview
The authors begin by identifying inherent inefficiencies and limitations in current Customer Relationship Management (CRM) strategies, which predominantly rely on rules-based segmentation and manual message orchestration. This status quo inhibits fine-grained personalisation, especially as user preferences are both high-dimensional and dynamic.
To address these challenges, the paper formulates marketing orchestration as a modular sequential decision problem. Each message opportunity is decomposed into an action space encompassing dimensions such as channel, timing, frequency, content tone, emoji usage, and promotional offers. This modularisation enables tractable exploration of large action spaces, circumventing combinatorial explosion associated with treating each possible message as a discrete action.
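The size argument behind this decomposition can be illustrated with a small sketch. The dimension names and option values below are hypothetical and are not taken from the paper. The sketch assumes that each dimension is explored separately, so the number of quantities to learn grows with the sum of the dimension sizes rather than their product.

```python
from math import prod

# Hypothetical action dimensions; names and options are illustrative only.
ACTION_DIMENSIONS = {
    "channel": ["push", "email", "in_app"],
    "timing": ["morning", "afternoon", "evening"],
    "tone": ["neutral", "playful", "urgent"],
    "emoji": [False, True],
    "offer": ["none", "discount"],
}


def joint_action_count(dimensions):
    """Arms needed if every full combination is one discrete action."""
    return prod(len(options) for options in dimensions.values())


def modular_action_count(dimensions):
    """Arms needed if each dimension is learned on its own (an assumption)."""
    return sum(len(options) for options in dimensions.values())


joint = joint_action_count(ACTION_DIMENSIONS)      # 3 * 3 * 3 * 2 * 2 = 108
modular = modular_action_count(ACTION_DIMENSIONS)  # 3 + 3 + 3 + 2 + 2 = 13
print(f"joint arms: {joint}, modular arms: {modular}")
```

Adding one more dimension with three options would triple the joint count but add only three modular arms, which is the sense in which modularisation keeps exploration tractable.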
A critical advancement in the methodology is in the reward construction and causal estimation:
- Outcome Modelling: User responses are encoded as weighted event streams, allowing flexible specification of both goal and proxy events. Rewards are constructed as temporally weighted sums, approximating the log-likelihood ratios of downstream goal event probabilities, thereby capturing both immediacy and informativeness akin to Granger causality.
- Incrementality and Causal Inference: An adapted Difference-in-Differences approach, combined with sparse Interrupted Time Series analysis, estimates Individual Treatment Effects (ITE) on a per-user basis. Control groups are selected carefully to balance the bias-variance trade-off, with nearest-neighbour matching used for high-fidelity counterfactual estimation.
- Action Selection: Contextual Thompson sampling drives the agentic system, optimising the exploration-exploitation trade-off. Empirical Bayes priors facilitate robust parameter estimation even in sparse-data regimes, drawing from collaborative filtering paradigms.
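The action-selection step can be sketched in a deliberately simplified form. The code below is a non-contextual Beta-Bernoulli Thompson sampler with a pooled empirical Bayes prior. It is not the paper's contextual model: the function names, the `strength` parameter, and the binary success/failure reward are all assumptions made for illustration.

```python
import random


def empirical_bayes_prior(population_counts, strength=10.0):
    """Build a Beta prior per action from counts pooled over all users.

    population_counts maps action -> (successes, failures). The prior mean
    is the smoothed pooled success rate, and `strength` is the assumed
    number of pseudo-observations the prior is worth.
    """
    prior = {}
    for action, (successes, failures) in population_counts.items():
        # Laplace smoothing keeps the mean strictly inside (0, 1).
        mean = (successes + 1) / (successes + failures + 2)
        prior[action] = (mean * strength, (1.0 - mean) * strength)
    return prior


def thompson_select(prior, user_counts, rng):
    """Draw one plausible success rate per action and pick the largest draw."""
    best_action, best_draw = None, -1.0
    for action, (alpha0, beta0) in prior.items():
        successes, failures = user_counts.get(action, (0, 0))
        draw = rng.betavariate(alpha0 + successes, beta0 + failures)
        if draw > best_draw:
            best_action, best_draw = action, draw
    return best_action


rng = random.Random(0)
prior = empirical_bayes_prior({"push": (300, 700), "email": (100, 900)})
# A new user has no history, so the choice is driven by the pooled prior.
new_user_choice = thompson_select(prior, {}, rng)
# A user with some history of their own shifts the posterior away from the prior.
known_user_choice = thompson_select(prior, {"email": (8, 2)}, rng)
```

The shared prior is what lets a user with little or no history still receive a sensible choice, while the random draw keeps some exploration alive for actions that currently look worse.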
Notably, the system keeps a human in the loop for copywriting refinements, messaging guardrails, and action space curation, ensuring that automation amplifies, rather than replaces, professional marketing expertise.
Decomposition of Action Spaces
The mapping from modular action sets to actual marketing message variants is a direct application of the Wolpertinger architecture from reinforcement learning, which supports large, structured action spaces without explicitly enumerating every action combination. Each agent selects a viable configuration, which is then matched to eligible messages for sending. This widens the set of actions the system can take while still respecting business and compliance constraints.
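A minimal sketch of the Wolpertinger-style matching step follows. It assumes that each eligible message is described by a numeric feature vector and that some scoring function plays the critic's role. The message identifiers, vectors, and scores are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def wolpertinger_select(proto_action, eligible_messages, score, k=2):
    """Map a proto-action to a concrete eligible message.

    1. Retrieve the k eligible messages nearest to the proto-action.
    2. Return the retrieved candidate with the highest score.
    """
    nearest = sorted(
        eligible_messages,
        key=lambda message_id: distance(proto_action, eligible_messages[message_id]),
    )[:k]
    return max(nearest, key=score)


# Hypothetical message embeddings and critic scores.
messages = {
    "msg_a": (1.0, 0.0, 0.0),
    "msg_b": (0.9, 0.1, 0.0),
    "msg_c": (0.0, 0.0, 1.0),
}
scores = {"msg_a": 0.2, "msg_b": 0.5, "msg_c": 0.9}

chosen = wolpertinger_select((1.0, 0.0, 0.0), messages, scores.get, k=2)
print(chosen)  # msg_b: the better-scored of the two nearest messages
```

Because only messages already in the eligible set can be returned, business and compliance filtering can be applied before this step simply by restricting `eligible_messages`.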
Empirical Results and Deployment
A large-scale randomised controlled field experiment in a multi-service application validates the impact of the proposed framework. Key findings:
- Across four product features, statistically significant uplifts are observed in both intent (e.g., page views, add-to-cart rates) and conversion metrics, with 99% confidence intervals reflecting absolute increases ranging from +0.70% to +2.45% for intent signals, and up to +0.48% for conversions.
- Gross Merchandise Value (GMV) shows substantial relative increases: +14.12% to +43.39%, depending on the product feature.
The methodology enables a substantial expansion in the number of message variants marketers can test, driving operational efficiency and business value. The framework is now deployed for over 150 million users, with agentic orchestration supporting both established and emerging product marketing strategies.
Empirically, no degradation in lower-funnel events is observed; instead, improvements are consistent across all layers of the engagement funnel, substantiating the incremental nature of the measured effects.
Implications and Future Directions
The theoretical contribution lies in effectively integrating techniques from econometrics, causal inference, and contextual bandits into a practical, production-grade CRM orchestration engine. The Difference-in-Differences ITE estimator, adjusted for temporal/seasonal confounding and high-dimensional event streams, offers a template for incremental effect modelling in other user-behaviour domains.
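The core Difference-in-Differences calculation with nearest-neighbour control matching can be sketched as follows. This is a bare-bones illustration, not the paper's adapted estimator: it omits the sparse Interrupted Time Series component and the seasonal adjustments, and the matching criterion (closeness of pre-period means) is an assumption.

```python
from statistics import mean


def matched_controls(treated_pre, candidates, k=2):
    """Pick the k untreated users whose pre-period mean is closest to the treated user's."""
    target = mean(treated_pre)
    return sorted(candidates, key=lambda c: abs(mean(c["pre"]) - target))[:k]


def individual_effect(treated, candidates, k=2):
    """Difference-in-Differences estimate for a single treated user.

    effect = (treated post - treated pre) - (control post - control pre),
    where the control change is averaged over the k matched users.
    """
    controls = matched_controls(treated["pre"], candidates, k)
    control_change = mean(mean(c["post"]) - mean(c["pre"]) for c in controls)
    treated_change = mean(treated["post"]) - mean(treated["pre"])
    return treated_change - control_change


# Hypothetical per-period reward series for one treated user and three untreated users.
treated_user = {"pre": [2.0, 2.0], "post": [5.0, 5.0]}
untreated_users = [
    {"pre": [2.0, 2.0], "post": [3.0, 3.0]},
    {"pre": [2.5, 1.5], "post": [3.0, 3.0]},
    {"pre": [10.0, 10.0], "post": [20.0, 20.0]},
]

effect = individual_effect(treated_user, untreated_users, k=2)
print(effect)  # 2.0: a treated change of 3.0 minus a matched-control change of 1.0
```

Subtracting the matched controls' change removes movement that would have happened anyway, such as a seasonal lift shared by similar users, so only the remainder is attributed to the message.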
Practically, this work demonstrates that agentic personalisation is feasible and beneficial at true industrial scale. By modularising action spaces and removing marketers from the critical path for per-user decisions, the system substantially reduces operational bottlenecks and enables large-scale personalisation along dimensions that were previously infeasible.
Open avenues for research include:
- Extending the framework to capture long-term effects and delayed outcomes through reinforcement learning or multi-objective policy learning.
- Investigating robustness to dynamic user preference drift and feedback loops, possibly through adaptive contextual bandits or continual learning paradigms.
- Benchmarking variants of causal effect estimators for proxy/reward construction, including potential sensitivity to unmeasured confounding.
- Exploring the integration of generative models within modular action frameworks, while maintaining business/brand control and mitigating risks of hallucinated or non-compliant outputs.
In conclusion, this agentic approach represents a concrete step forward in scalable, effective personalisation for cross-channel marketing, with strong empirical validation and architectural innovations applicable to broader classes of sequential decision-making problems in consumer applications.