
ClickX Models: Predicting User Clicks

Updated 13 November 2025
  • ClickX Models are predictive frameworks that forecast user clicks on ranked lists by modeling user-item interactions and position bias.
  • They integrate probabilistic models, neural architectures, and graph-based techniques to capture sequential dependencies and non-monotonic behaviors.
  • These models enable robust offline evaluation, system optimization, and counterfactual estimation across search, advertising, and recommendation systems.

ClickX Models are a broad class of models that predict user clicks on ranked lists, slates, or recommendation outputs in web search, advertising, and recommender systems. Their primary focus is understanding and modeling the mechanisms underlying user-item interaction and position bias, thereby enabling principled offline evaluation, system optimization, and robust counterfactual estimation. The term "ClickX" has been adopted as a shorthand encompassing diverse model variants, including probabilistic graphical models, neural architectures, explicit-feature interaction frameworks, and graph-based enhancements.

1. Foundations: Probabilistic Click Models and User Behavior

Classic click models are formulated as probabilistic graphical models (PGMs) over observed click indicators $c_{i,j} \in \{0,1\}$ and latent behavioral variables, most notably 'examination' $e_{i,j}$ and 'attractiveness' $a_{i,j}$. The central 'examination hypothesis' (Richardson, 2007) asserts that a user clicks an item if and only if she both examines it and finds it attractive: $c_{i,j}=1 \iff e_{i,j}=1 \land a_{i,j}=1$. Accordingly, the marginal click probability factorizes as $P(c_{i,j}=1)=P(e_{i,j}=1)\times P(a_{i,j}=1)$. This abstraction allows many concrete instantiations:

  • Position-Based Model (PBM): $P(e_{i,j}=1)$ depends solely on rank.
  • Cascade/Dependent Click Model (CM/DCM/DBN): Models richer dependencies, e.g., satisfaction-dependent stopping or continued examination.
  • Document/Item Models (DCTR/IP): Model click probability as functions of item identity, position, and possibly context.

These models form a nested hierarchy of expressiveness, RCM ⊆ RCTR ⊆ DCTR ⊆ PBM ⊆ IP, where RCM is the simplest, modeling every click with a single constant probability.
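As a concrete illustration of the examination hypothesis, the PBM factorization can be sketched in a few lines. The examination and attractiveness values below are illustrative placeholders, not fitted parameters:

```python
import numpy as np

# Position-Based Model (PBM) sketch: P(click on item i at rank j)
# = P(examine rank j) * P(item i is attractive).
examination = np.array([0.9, 0.6, 0.3])                      # gamma_j per rank
attractiveness = {"doc_a": 0.8, "doc_b": 0.5, "doc_c": 0.5}  # alpha_i per item

def pbm_click_prob(item, rank):
    """Marginal click probability under the examination hypothesis."""
    return examination[rank] * attractiveness[item]

ranking = ["doc_a", "doc_b", "doc_c"]
probs = [pbm_click_prob(item, j) for j, item in enumerate(ranking)]
# doc_b and doc_c are equally attractive, yet doc_c at rank 3 receives
# fewer clicks: that gap is position bias.
```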

2. Modern Extensions: Neural and Graph-Structured ClickX Models

Neural click models and graph-enhanced models address the limitations of basic PGMs. The "ClickX" family (Shirokikh et al., 2024) encapsulates a variety of architectures:

  • ClickX–GRU (NCM): Sequentially models click dependencies in slates using a recurrent (GRU) network, concatenating each item embedding $e^{(j)}_i$ with the embedding of the previously clicked item. This enables the model to learn short-range dependencies in user behavior.
  • ClickX–Adv (AdvNCM): Adds an adversarial discriminator atop ClickX–GRU, combining cross-entropy loss with an adversarial loss that encourages generated click sequences to be indistinguishable from ground-truth, employing Gumbel-Softmax relaxation for differentiability.
  • ClickX–RANCM: Allows non-monotonic (random-access) exploration of slates, using a recurrent state to output a query vector that 'attends' over all items plus a halt token, modeling more realistic user paths beyond left-to-right scans.
  • ClickX–SCOT: Reduces quadratic attention cost by restricting Transformer's self-attention to only the set of previously clicked items, aggregating sparse implicit feedback at the session level while maintaining low computational complexity.
  • ClickX–TGRU: Employs a hierarchical scheme (Transformer at the slate level, GRU across sessions), enabling the capture of both intra-slate and inter-slate dependencies efficiently.

These variants are trained by maximum likelihood, typically using binary cross-entropy, with adversarial and auxiliary losses introduced where appropriate.
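To make the ClickX–GRU idea concrete, the sketch below runs a small, untrained GRU over a slate, feeding each item embedding concatenated with the last clicked item's embedding. All dimensions, the random parameters, and the greedy simulated-click rule are illustrative assumptions, not the published architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d_item, d_h = 4, 8
d_in = 2 * d_item  # item embedding concatenated with previous-clicked embedding

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Randomly initialised GRU parameters (illustrative, untrained).
Wz, Wr, Wh = (rng.normal(0, 0.1, (d_h, d_in + d_h)) for _ in range(3))
w_out = rng.normal(0, 0.1, d_h)

def gru_step(x, h):
    xh = np.concatenate([x, h])
    z = sigmoid(Wz @ xh)                               # update gate
    r = sigmoid(Wr @ xh)                               # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([x, r * h]))
    return (1 - z) * h + z * h_tilde

def predict_clicks(slate_embs):
    """Sequential click probabilities over a slate, ClickX-GRU style."""
    h = np.zeros(d_h)
    prev_clicked = np.zeros(d_item)                    # last clicked item's embedding
    out = []
    for e in slate_embs:
        h = gru_step(np.concatenate([e, prev_clicked]), h)
        p = sigmoid(w_out @ h)
        out.append(p)
        if p > 0.5:                                    # greedy simulated click
            prev_clicked = e
    return np.array(out)

slate = rng.normal(size=(5, d_item))
probs = predict_clicks(slate)
```

In training, these per-position probabilities would be scored against logged clicks with binary cross-entropy, as described above.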

3. Feature-Interaction Models and Unified Frameworks

Beyond user browsing and attention biases, click prediction for advertising and recommendation (CTR prediction) is governed by the structure of feature interactions. The IPA framework (Kang et al., 2024) unifies explicit feature-interaction models into three components:

| Component | Function | Examples |
| --- | --- | --- |
| Interaction Function | Computes the interaction between two embeddings $t_i$, $t_j$ | Inner product, outer product, Hadamard product, MLP |
| Layer Pooling | Aggregates interactions over multiple fields/layers | Field-wise, global pooling |
| Layer Aggregator | Combines outputs of all layers for the final prediction | Direct sum, weighted sum, MLP |

This framework encompasses classic models:

  • FM, NFM, PNN, DeepFM, DCN V2, xDeepFM, FwFM, FiBiNet: Can all be cast as choices within the IPA abstraction, varying in interaction, pooling, and aggregation strategy.

Empirical results demonstrate that models employing projected-product interactions, field-wise pooling, and layer-wise aggregation (e.g., PFL) outperform diagonal or identity-based interaction models and exhibit robust performance in both offline and online settings, including statistically significant GMV lifts in large-scale production (Kang et al., 2024).
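The three IPA components can be made concrete with the classic FM configuration: inner-product interaction, global sum pooling over all field pairs, and a trivial single-layer aggregator. This is a minimal sketch under those assumptions, not the IPA reference implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n_fields, d = 4, 3
embeddings = rng.normal(size=(n_fields, d))  # one embedding t_i per feature field

# 1) Interaction function: inner product of two field embeddings.
def interact(t_i, t_j):
    return float(t_i @ t_j)

# 2) Layer pooling: aggregate interactions over all field pairs (global sum).
def pool(embs):
    return sum(interact(embs[i], embs[j])
               for i in range(len(embs)) for j in range(i + 1, len(embs)))

# 3) Layer aggregator: with a single layer, the pooled value is the logit.
logit = pool(embeddings)
ctr = 1.0 / (1.0 + np.exp(-logit))           # predicted click-through rate
```

Swapping `interact` for a Hadamard product or MLP, or `pool` for field-wise pooling, recovers other members of the family in the same skeleton.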

4. State-of-the-art Architectures: Graphs, Cross-Networks, and Boosted DNNs

Graph-Enhanced Click Model (GraphCM) (Lin et al., 2022): Targets data sparsity and cold-start limitations by constructing homogeneous graphs over queries ($G_q$) and documents ($G_d$), leveraging both intra-session (e.g., reformulation, SERP similarity) and inter-session (collaborative) edges. Graph attention networks (GAT) encode node embeddings; a neighbor interaction module propagates auxiliary signals. GraphCM predicts attractiveness (document-level) and examination (session-level) probabilities, combining them via learned functions ("expmul" yields optimal accuracy). Experimental results show GraphCM achieves the best log-likelihood and lowest perplexity on three major benchmarks, especially excelling in extreme sparsity and cold-start splits. Component ablations underscore the necessity of graph-based propagation for robust performance.
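A single-head graph attention layer of the kind used to encode query/document nodes can be sketched as follows; the toy fully connected graph, dimensions, and tanh output nonlinearity are illustrative assumptions rather than GraphCM's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(2)

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gat_layer(H, adj, W, a):
    """Single-head graph attention layer over node features H."""
    Z = H @ W.T                                  # projected node features W h_i
    n = len(H)
    out = np.zeros_like(Z)
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i, j]]
        # attention logits: e_ij = LeakyReLU(a^T [W h_i || W h_j])
        logits = np.array([leaky_relu(a @ np.concatenate([Z[i], Z[j]]))
                           for j in nbrs])
        alpha = softmax(logits)                  # normalise over neighbours
        out[i] = np.tanh(sum(w * Z[j] for w, j in zip(alpha, nbrs)))
    return out

n_nodes, d_in, d_out = 5, 4, 6
H = rng.normal(size=(n_nodes, d_in))
adj = np.ones((n_nodes, n_nodes), dtype=bool)    # toy graph with self-loops
W = rng.normal(0, 0.1, (d_out, d_in))
a = rng.normal(0, 0.1, 2 * d_out)
H_new = gat_layer(H, adj, W, a)
```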

Fusing Cross Network (FCN/DCNv3) (Li et al., 2024): Advances explicit feature-crossing architectures by fusing a Linear Cross Network (LCN, linear order growth) and Exponential Cross Network (ECN, exponential order growth), trained with Tri-BCE loss (separate, adaptively-upweighted supervision). The Self-Mask module strictly filters noise, pruning 50% of cross-network parameters. FCN achieves higher AUC and lower logloss than explicit-only and hybrid models (DeepFM, DCNv2, xDeepFM, etc.) on six CTR benchmarks, with ECN alone outperforming prior explicit baselines and the full FCN setting new SOTA.
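One reading of the linear-versus-exponential order growth can be sketched with DCN-style cross layers: crossing each layer's state with the raw input raises the maximum interaction order by one per layer, while crossing the state with itself roughly doubles it. The exact FCN layer definitions differ (the Self-Mask module and Tri-BCE supervision are omitted here), so treat this as an assumption-laden illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 6
x0 = rng.normal(size=d)

def cross_layer(x_base, x_l, W, b):
    """DCN-style cross: x_{l+1} = x_base * (W x_l + b) + x_l (elementwise *)."""
    return x_base * (W @ x_l + b) + x_l

W1, W2 = rng.normal(0, 0.1, (d, d)), rng.normal(0, 0.1, (d, d))
b1, b2 = np.zeros(d), np.zeros(d)

# Linear Cross Network (LCN) flavour: always cross with the raw input x0,
# so the maximum feature-interaction order grows by one per layer.
x_lcn = cross_layer(x0, cross_layer(x0, x0, W1, b1), W2, b2)

# Exponential Cross Network (ECN) flavour: cross the current state with
# itself, so the maximum order roughly doubles per layer.
x1 = cross_layer(x0, x0, W1, b1)
x_ecn = cross_layer(x1, x1, W2, b2)
```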

XDBoost (Iterative Boosting DNN) (Livne et al., 2020): Iteratively augments learned DNN classifiers with stage-wise error-correcting residual features regressed by dedicated subnets. Rather than summing outputs à la GBDT, XDBoost re-injects learned residuals as new features, retraining the classifier. This design yields significant gains in data-scarce and cold-start settings, with up to +6% AUC over standard DNNs at low data rates, and remains statistically superior to tree-based boosters (CatBoost, XGBoost) in large-scale offline tests.
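The XDBoost loop can be caricatured with linear stand-ins for the DNN classifier and residual subnet. On a toy task whose label depends on a product interaction, regressing the first classifier's residual and re-injecting the prediction as a new feature lets the retrained classifier capture structure the base model misses. All model choices below are simplifications, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(4)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, lr=0.5, steps=300):
    """Logistic regression by gradient descent (stand-in for the DNN)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def fit_linreg(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]   # "residual subnet" stand-in

# Toy data: the label depends on an interaction a linear model cannot see.
X = rng.normal(size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)

w0 = fit_logreg(X, y)
residual = y - sigmoid(X @ w0)                    # stage-wise error signal

# Regress the residual, then re-inject its prediction as a new feature and
# retrain the classifier on the augmented input (XDBoost-style, rather than
# summing stage outputs as in GBDT).
Xr = np.column_stack([X, X[:, 0] * X[:, 1]])      # subnet features (illustrative)
wr = fit_linreg(Xr, residual)
X_aug = np.column_stack([X, Xr @ wr])
w1 = fit_logreg(X_aug, y)

acc0 = np.mean((sigmoid(X @ w0) > 0.5) == y)      # base classifier accuracy
acc1 = np.mean((sigmoid(X_aug @ w1) > 0.5) == y)  # after residual re-injection
```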

5. Optimization, Scalability, and Open-Source Libraries

Traditional EM-based estimation of click PGMs imposes scaling and convergence bottlenecks. Recent advances adopt modern gradient-based optimization:

  • CLAX (Click Models in JAX) (Hager et al., 5 Nov 2025): Implements classic PGMs (PBM, CM, UBM, DCM, CCM, DBN, mixtures) as differentiable computation graphs in JAX/Flax. This enables end-to-end mini-batch training, GPU or TPU acceleration, and joint parameter learning for core components (embeddings, neural networks, custom modules). Numerically stable log-domain arithmetic ensures robustness. Empirically, CLAX trains all classic click PGMs in under two hours on billion-scale data (Baidu-ULTR), outperforming or matching EM solvers while supporting model extension and integration with neural architectures.

Best practices for implementation include minibatch SGD or AdamW, log-space computation for underflow/overflow prevention, and modular interfaces supporting composable model structures. Sparse embedding support and distributed training are current areas for further development.
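A minimal sketch of gradient-based (non-EM) PBM fitting on synthetic logs, in the spirit of CLAX but not using its actual API; for brevity, clipping stands in for full log-domain arithmetic, and plain full-batch gradient descent replaces mini-batch AdamW:

```python
import numpy as np

rng = np.random.default_rng(5)
n_items, n_ranks, n_sessions = 20, 5, 2000

# Ground-truth parameters used only to simulate synthetic click logs.
true_gamma = np.linspace(0.95, 0.3, n_ranks)       # examination prob. per rank
true_alpha = rng.uniform(0.1, 0.9, n_items)        # attractiveness per item
items = rng.integers(0, n_items, (n_sessions, n_ranks))
clicks = rng.random((n_sessions, n_ranks)) < true_gamma * true_alpha[items]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Learn logits g (per rank) and a (per item) by gradient descent on binary
# cross-entropy; sigmoids keep both probabilities in (0, 1) without EM.
g, a = np.zeros(n_ranks), np.zeros(n_items)
lr = 5.0
for _ in range(2000):
    gamma, alpha = sigmoid(g), sigmoid(a)
    p = gamma[None, :] * alpha[items]              # P(click) per impression
    dL_dp = (p - clicks) / np.clip(p * (1 - p), 1e-9, None) / clicks.size
    grad_g = (dL_dp * alpha[items]).sum(axis=0) * gamma * (1 - gamma)
    grad_a = np.zeros(n_items)
    np.add.at(grad_a, items, dL_dp * gamma[None, :])  # scatter-add per item
    grad_a *= alpha * (1 - alpha)
    g -= lr * grad_g
    a -= lr * grad_a

pred = sigmoid(g)[None, :] * sigmoid(a)[items]
err = np.abs(pred - true_gamma * true_alpha[items]).mean()
```

Note that only the product of examination and attractiveness is identifiable from clicks alone, which is why the check below compares predicted click probabilities rather than individual parameters.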

6. Applications: Evaluation, Policy Optimization, and Auctions

ClickX models are foundational for:

  • Offline Evaluation via Importance Weighting (Li et al., 2018): Model-based estimators (item-position, PBM, DCTR) enable unbiased and lower-variance counterfactual click estimation, critical for ranking policy evaluation under static logs. Hierarchy of models balances variance reduction and assumption robustness; in large-scale or rich-context settings, structured but flexible models (IP/PBM) provide superior RMSE and theoretical guarantees.
  • Position Auctions with Externalities and Brand Effects (Hummel et al., 2014): When click probabilities depend on the identities of other shown ads (externalities) or brand effects modulate slot sensitivity, axiomatic extensions of the standard separable model are needed. Binary-search algorithms solve for optimal slotting under negative externalities, and explicit combinatorial assignment (enumeration/pruning) is necessary under brand effects. Naive greedy algorithms can suffer up to 50% welfare loss in extreme brand-effect scenarios.
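Under a fitted PBM, model-based counterfactual evaluation of a new ranking reduces to summing examination-weighted attractiveness over positions. The parameters below are assumed to have been estimated from logged data and are purely illustrative:

```python
import numpy as np

# Expected clicks for a ranking pi under a fitted PBM:
# E[clicks] = sum_j gamma_j * alpha_{pi(j)}.
gamma = np.array([0.9, 0.6, 0.3])        # examination probability per rank
alpha = {"a": 0.8, "b": 0.4, "c": 0.2}   # attractiveness per item

def expected_clicks(ranking):
    return sum(g * alpha[item] for g, item in zip(gamma, ranking))

logged = ["b", "a", "c"]   # production ranking observed in the logs
target = ["a", "b", "c"]   # candidate policy to evaluate offline
lift = expected_clicks(target) - expected_clicks(logged)
# Moving the most attractive item to the most-examined slot is predicted
# to increase expected clicks, without deploying the new policy online.
```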

7. Outlook, Limitations, and Best Practices

Best practices converge on:

  • Employing full projected-product interactions and field-wise pooling for robust, expressive feature modeling.
  • Leveraging graph topology or neural backbones (GAT, GRU, Transformer) matched to domain structure and data regime.
  • Setting model depth larger than the presumed optimum, since layer-weighted aggregation automatically selects the effective interaction orders.
  • Prioritizing numerically stable, GPU-optimized gradients for both model expressiveness and industrial-scale efficiency.

While ClickX models are SOTA in leveraging implicit user feedback, key limitations remain: sensitivity to latent-variable assumptions, reliance on high-quality embeddings, challenges in cold-start for rare fields/entities without collaborative signals, and ongoing tuning requirements for large production settings. Incorporating richer user signals (dwell, hover), jointly modeling position and examination, and achieving tractable sparsity remain open challenges and directions for future research.
