Agentic Personalisation of Cross-Channel Marketing Experiences (2506.16429v1)

Published 19 Jun 2025 in cs.AI, cs.IR, and cs.LG

Abstract: Consumer applications provide ample opportunities to surface and communicate various forms of content to users. From promotional campaigns for new features or subscriptions, to evergreen nudges for engagement, or personalised recommendations; across e-mails, push notifications, and in-app surfaces. The conventional approach to orchestration for communication relies heavily on labour-intensive manual marketer work, and inhibits effective personalisation of content, timing, frequency, and copy-writing. We formulate this task under a sequential decision-making framework, where we aim to optimise a modular decision-making policy that maximises incremental engagement for any funnel event. Our approach leverages a Difference-in-Differences design for Individual Treatment Effect estimation, and Thompson sampling to balance the explore-exploit trade-off. We present results from a multi-service application, where our methodology has resulted in significant increases to a variety of goal events across several product features, and is currently deployed across 150 million users.

Summary

The paper introduces a novel framework that formalises message orchestration as a sequential decision problem with modular action sets.
It leverages Difference-in-Differences for individual treatment effect estimation and uses contextual Thompson sampling for optimal action selection.
A randomised field experiment in a multi-service application shows significant uplifts in engagement and conversion, and the system is deployed across 150 million users.

Agentic Personalisation of Cross-Channel Marketing Experiences

This paper introduces a generalisable framework for user-level personalisation in cross-channel marketing communications, formalising message orchestration as a sequential decision-making problem. The proposed methodology combines modular action sets, individual treatment effect (ITE) estimation via Difference-in-Differences, and Thompson sampling within a contextual bandit framework. The system is deployed at production scale, orchestrating personalised marketing actions for 150 million users, and a field experiment shows significant improvements in both engagement (upper-funnel metrics) and business outcomes (conversion, GMV).

Methodological Overview

The authors begin by identifying inherent inefficiencies and limitations in current Customer Relationship Management (CRM) strategies, which predominantly rely on rules-based segmentation and manual message orchestration. This status quo inhibits fine-grained personalisation, especially as user preferences are both high-dimensional and dynamic.

To address these challenges, the paper formulates marketing orchestration as a modular sequential decision problem. Each message opportunity is decomposed into an action space with dimensions such as channel, timing, frequency, content tone, emoji usage, and promotional offers. This modularisation makes exploration of large action spaces tractable, avoiding the combinatorial explosion that arises when every possible message is treated as a separate discrete action.
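To make the scale argument concrete, the following minimal sketch compares the number of joint actions with the number of per-dimension options. The dimension names follow the text, but the option values, and the simplification that each dimension can be learned about separately, are assumptions made for illustration and are not taken from the paper.

```python
from itertools import product
from math import prod

# Hypothetical action dimensions; option values are illustrative only.
ACTION_DIMENSIONS = {
    "channel": ["email", "push", "in_app"],
    "timing": ["morning", "afternoon", "evening"],
    "frequency": ["low", "medium", "high"],
    "tone": ["neutral", "playful", "urgent"],
    "emoji": [False, True],
    "offer": ["none", "discount", "free_delivery"],
}


def joint_action_count(dimensions):
    """Number of actions if every combination is a separate arm."""
    return prod(len(options) for options in dimensions.values())


def modular_option_count(dimensions):
    """Number of options to learn about if dimensions are handled separately."""
    return sum(len(options) for options in dimensions.values())


def enumerate_joint_actions(dimensions):
    """Yield every full configuration as a dict (only feasible for small spaces)."""
    names = list(dimensions)
    for combo in product(*dimensions.values()):
        yield dict(zip(names, combo))


if __name__ == "__main__":
    print("joint actions:", joint_action_count(ACTION_DIMENSIONS))
    print("modular options:", modular_option_count(ACTION_DIMENSIONS))
```

The joint count grows multiplicatively with each added dimension, while the modular count grows additively, which is the gap the decomposition is meant to exploit.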

A critical advancement in the methodology is in the reward construction and causal estimation:

Outcome Modelling: User responses are encoded as weighted event streams, allowing flexible specification of both goal and proxy events. Rewards are constructed as temporally weighted sums, approximating the log-likelihood ratios of downstream goal event probabilities, thereby capturing both immediacy and informativeness akin to Granger causality.
Incrementality and Causal Inference: An adapted Difference-in-Differences approach, combined with sparse Interrupted Time Series analysis, estimates Individual Treatment Effects (ITE) on a per-user basis. Careful control group selection balances the bias-variance trade-off, with nearest-neighbour matching used for high-fidelity counterfactual estimation.
Action Selection: Contextual Thompson sampling drives the agentic system, optimising the exploration-exploitation trade-off. Empirical Bayes priors facilitate robust parameter estimation even in sparse-data regimes, drawing from collaborative filtering paradigms.
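The action-selection step can be illustrated with a deliberately simplified Thompson sampler. This sketch is non-contextual and assumes binary rewards with Beta posteriors, whereas the paper describes a contextual variant driven by incremental rewards. The class name, the option names, and the simulated response rates are hypothetical. The `prior_alpha` and `prior_beta` arguments mark where an empirical Bayes prior, fitted on pooled data, could be plugged in.

```python
import random


class BetaThompsonSampler:
    """Beta-Bernoulli Thompson sampling over a fixed set of options.

    Simplification: no user context, binary rewards only.
    """

    def __init__(self, options, prior_alpha=1.0, prior_beta=1.0, seed=None):
        self.rng = random.Random(seed)
        # One [alpha, beta] pair per option, initialised from the shared prior.
        self.posteriors = {opt: [prior_alpha, prior_beta] for opt in options}

    def select(self):
        """Draw one sample per option from its posterior and pick the largest."""
        draws = {
            opt: self.rng.betavariate(a, b)
            for opt, (a, b) in self.posteriors.items()
        }
        return max(draws, key=draws.get)

    def update(self, option, reward):
        """Update the chosen option's posterior with a 0/1 reward."""
        params = self.posteriors[option]
        params[0] += reward
        params[1] += 1 - reward


def simulate(true_rates, rounds=2000, seed=0):
    """Run the sampler against simulated users; returns how often each option was chosen."""
    sampler = BetaThompsonSampler(list(true_rates), seed=seed)
    env = random.Random(seed + 1)  # separate stream for simulated responses
    counts = {opt: 0 for opt in true_rates}
    for _ in range(rounds):
        choice = sampler.select()
        reward = 1 if env.random() < true_rates[choice] else 0
        sampler.update(choice, reward)
        counts[choice] += 1
    return counts


if __name__ == "__main__":
    # Made-up response rates, for illustration only.
    print(simulate({"email": 0.1, "push": 0.6, "in_app": 0.2}))
```

Early rounds spread choices across all options because the posteriors are wide. As evidence accumulates, draws for the better option dominate and exploration tapers off without a hand-tuned schedule.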

Notably, the system keeps a human in the loop for copywriting refinements, messaging guardrails, and action space curation, so that automation amplifies professional marketing expertise rather than replacing it.

Decomposition of Action Spaces

The mapping from modular action sets to actual marketing message variants is a direct application of the Wolpertinger architecture from reinforcement learning, which supports large, structured action spaces without explicitly enumerating every action combination. Each agent selects a viable configuration, which is then matched to eligible messages for sending. This broadens the range of actions the system can take while respecting business and compliance constraints.
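The matching step can be sketched as a nearest-neighbour lookup from a chosen configuration to a message catalogue. This is an illustrative simplification under stated assumptions: the `Message` class, the attribute-overlap similarity, and the `eligible` flag are hypothetical. In the original Wolpertinger design a learned critic scores the retrieved neighbours; here the neighbours are simply ranked by overlap.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """A sendable message variant, described by its attributes (hypothetical schema)."""
    message_id: str
    attributes: dict = field(default_factory=dict)
    eligible: bool = True  # stands in for business and compliance checks


def match_messages(config, catalogue, k=2):
    """Return up to k eligible messages whose attributes best agree with config."""

    def overlap(message):
        # Count the dimensions on which the message agrees with the configuration.
        return sum(
            message.attributes.get(dim) == value for dim, value in config.items()
        )

    candidates = [m for m in catalogue if m.eligible]
    # sorted() is stable, so ties keep their catalogue order.
    return sorted(candidates, key=overlap, reverse=True)[:k]


if __name__ == "__main__":
    catalogue = [
        Message("m1", {"channel": "push", "tone": "playful", "emoji": True}, eligible=False),
        Message("m2", {"channel": "push", "tone": "playful", "emoji": False}),
        Message("m3", {"channel": "email", "tone": "neutral", "emoji": False}),
        Message("m4", {"channel": "push", "tone": "urgent", "emoji": True}),
    ]
    config = {"channel": "push", "tone": "playful", "emoji": True}
    print([m.message_id for m in match_messages(config, catalogue)])
```

Filtering on eligibility before ranking means a perfectly matching but ineligible message (such as `m1` in the example) is never returned, which is one simple way to keep constraints outside the learned policy.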

Empirical Results and Deployment

A large-scale randomised controlled field experiment in a multi-service application validates the impact of the proposed framework. Key findings:

Across four product features, statistically significant uplifts are observed in both intent (e.g., page views, add-to-cart rates) and conversion metrics, with 99% confidence intervals reflecting absolute increases ranging from +0.70% to +2.45% for intent signals, and up to +0.48% for conversions.
Gross Merchandise Value (GMV) shows substantial relative increases: +14.12% to +43.39%, depending on the product feature.

The methodology enables a substantial expansion in the number of message variants marketers can test, driving operational efficiency and business value. The framework is now deployed across 150 million users, with agentic orchestration supporting both established and emerging product marketing strategies.

Empirically, no degradation in lower-funnel events is observed; instead, improvements are consistent across all layers of the engagement funnel, substantiating the incremental nature of the measured effects.

Implications and Future Directions

The theoretical contribution lies in effectively integrating techniques from econometrics, causal inference, and contextual bandits into a practical, production-grade CRM orchestration engine. The Difference-in-Differences ITE estimator, adjusted for temporal/seasonal confounding and high-dimensional event streams, offers a template for incremental effect modelling in other user-behaviour domains.
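For readers unfamiliar with the estimator, the basic Difference-in-Differences idea with nearest-neighbour control selection can be sketched as follows. This is the textbook two-period form, not the paper's adapted estimator: it omits the sparse Interrupted Time Series component and any seasonal adjustment. The function names and the choice to match on the pre-period outcome alone are assumptions.

```python
from statistics import mean


def nearest_controls(treated_pre, pool, k):
    """Pick the k untreated users whose pre-period outcome is closest to the treated user's.

    pool is a list of (pre, post) outcome pairs for untreated users.
    """
    return sorted(pool, key=lambda pair: abs(pair[0] - treated_pre))[:k]


def did_effect(treated_pre, treated_post, controls):
    """Two-period Difference-in-Differences estimate for one treated user.

    The treated user's change is compared with the average change of the
    matched controls, which stands in for the unobserved counterfactual.
    """
    treated_change = treated_post - treated_pre
    control_change = mean([post - pre for pre, post in controls])
    return treated_change - control_change


if __name__ == "__main__":
    # Made-up outcome values, for illustration only.
    pool = [(2.0, 3.0), (3.0, 4.0), (9.0, 9.5), (6.0, 6.0)]
    controls = nearest_controls(treated_pre=2.0, pool=pool, k=2)
    print(did_effect(treated_pre=2.0, treated_post=5.0, controls=controls))
```

The choice of `k` illustrates the bias-variance tension in control selection: a small `k` keeps controls similar to the treated user but gives a noisy average, while a large `k` smooths the average at the cost of including less comparable users.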

Practically, this work demonstrates that agentic personalisation is feasible and beneficial at true industrial scale. By modularising action spaces and removing marketers from the critical path for per-user decisions, the system substantially reduces operational bottlenecks and enables large-scale personalisation along dimensions that were previously infeasible.

Open avenues for research include:

Extending the framework to capture long-term effects and delayed outcomes through reinforcement learning or multi-objective policy learning.
Investigating robustness to dynamic user preference drift and feedback loops, possibly through adaptive contextual bandits or continual learning paradigms.
Benchmarking variants of causal effect estimators for proxy/reward construction, including potential sensitivity to unmeasured confounding.
Exploring the integration of generative models within modular action frameworks, while maintaining business/brand control and mitigating risks of hallucinated or non-compliant outputs.

In conclusion, this agentic approach represents a concrete step forward in scalable, effective personalisation for cross-channel marketing, with strong empirical validation and architectural innovations applicable to broader classes of sequential decision-making problems in consumer applications.