CROSS-JEM: Joint Modeling for Ranking & UHE Neutrinos
- In neural ranking, CROSS-JEM jointly encodes a query with all of its candidate short texts in a single Transformer pass, significantly reducing inference latency.
- It combines a token-union input construction with a novel listwise Ranking Probability Loss to achieve state-of-the-art accuracy on public and proprietary benchmarks.
- In astroparticle physics, “CROSS-JEM” denotes the use of JEM-EUSO quasi-horizontal air shower data to constrain warped extra-dimensional gravity models via ultra-high-energy (UHE) cosmic neutrinos.
CROSS-JEM refers to two technically distinct concepts within the research literature: (1) Cross-Encoders with Joint Efficient Modeling in the context of neural short-text ranking for search and recommendation (Paliwal et al., 2024), and (2) the use of cross-section measurements at JEM-EUSO to probe warped extra-dimensional gravity models via ultra-high-energy cosmic neutrinos (Mladenov et al., 2015). While unrelated in immediate subject matter, both leverage “cross-” or “joint” modeling to advance state-of-the-art performance or sensitivity within their domains.
1. Cross-Encoders with Joint Efficient Modeling (CROSS-JEM) for Short-Text Ranking
CROSS-JEM is a Transformer-based neural ranking architecture designed to efficiently and accurately rank sets of short text items (such as ad keywords, web page titles, tag phrases) based on query relevance. Unlike standard cross-encoders, which process each query-item pair independently and thus incur high computational cost and ignore listwise interactions, CROSS-JEM jointly encodes an entire candidate set in a single Transformer pass by exploiting token redundancies across items. This joint modeling significantly reduces inference latency and enables direct listwise optimization (Paliwal et al., 2024).
2. Design and Methodology of CROSS-JEM
2.1. Token Union and Joint Encoding
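For each query $q$ and its associated short-text item set $S = \{s_1, \dots, s_k\}$:
- Tokenize $q$ to $T_q$.
- Tokenize each item $s_i$ to $T_{s_i}$, then construct the set-union: $T_S = \bigcup_{i=1}^{k} T_{s_i}$.
- Form a single concatenated input sequence: $X = [\mathrm{CLS}] \oplus T_q \oplus [\mathrm{SEP}] \oplus T_S$.
- Encode jointly: $H = \mathrm{Transformer}(X) \in \mathbb{R}^{L \times d}$,
  where $L = |X|$ is the joint sequence length and $d$ is the hidden size.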
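The token-union construction can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. A whitespace tokenizer (`whitespace_tokenize`, hypothetical) stands in for a real subword tokenizer. The exact `[CLS] T_q [SEP] T_S [SEP]` layout, the first-occurrence ordering of the union, and the helper `build_joint_input` are all assumptions.

```python
from typing import Dict, List, Tuple


def whitespace_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in for a real subword tokenizer such as WordPiece."""
    return text.lower().split()


def build_joint_input(query: str, items: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Build one joint sequence [CLS] T_q [SEP] T_S [SEP] (assumed layout).

    T_S is the union of all item tokens, each kept once in first-occurrence order.
    Also returns, for every item, the positions of its tokens inside the joint sequence.
    """
    t_q = whitespace_tokenize(query)
    union: List[str] = []
    seen = set()
    for item in items:
        for tok in whitespace_tokenize(item):
            if tok not in seen:  # set-union: a token shared by several items is stored once
                seen.add(tok)
                union.append(tok)

    seq = ["[CLS]"] + t_q + ["[SEP]"] + union + ["[SEP]"]
    offset = len(t_q) + 2  # union tokens start after [CLS], T_q and the first [SEP]
    pos_of: Dict[str, int] = {tok: offset + i for i, tok in enumerate(union)}
    # dict.fromkeys drops repeated tokens within one item while preserving their order
    item_positions = [
        [pos_of[tok] for tok in dict.fromkeys(whitespace_tokenize(item))] for item in items
    ]
    return seq, item_positions


seq, item_positions = build_joint_input(
    "red running shoes", ["running shoes", "red shoes", "trail running shoes"]
)
print(seq)             # one joint sequence of 10 tokens
print(item_positions)  # [[5, 6], [7, 6], [8, 5, 6]]
```

The three items contain seven item tokens in total, but only four distinct ones. Each item is therefore recovered as a list of positions into the shared union rather than as its own copy of the tokens.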
2.2. Selective Pooling and Joint Scoring
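For each item $s_i$, the model identifies the token positions $P_i$ corresponding to $q$, to $s_i$ (as realized in $T_S$), and to the separating token [SEP], and pools their encoded vectors:
Pooled vector for item $s_i$: $e_i = \mathrm{Pool}\big(\{H_j : j \in P_i\}\big) \in \mathbb{R}^{d}$
All item logits are then computed in a single matrix multiplication: $z = E\,W \in \mathbb{R}^{k \times 1}$, where $E = [e_1; \dots; e_k] \in \mathbb{R}^{k \times d}$ stacks the pooled vectors
and $W \in \mathbb{R}^{d \times 1}$ is a shared linear projection.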
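A plain-Python sketch of selective pooling and joint scoring follows. It rests on three assumptions:
- pooling is mean pooling;
- the projection is a bias-free vector `w`;
- the encoder outputs `hidden` are precomputed.

The function names are hypothetical, and the toy hidden states merely stand in for real Transformer outputs.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(hidden: Sequence[Vector], positions: Sequence[int]) -> Vector:
    """Average the encoder outputs at the given token positions (mean pooling assumed)."""
    dim = len(hidden[0])
    pooled = [0.0] * dim
    for p in positions:
        for j in range(dim):
            pooled[j] += hidden[p][j]
    return [v / len(positions) for v in pooled]


def joint_scores(
    hidden: Sequence[Vector],
    query_pos: Sequence[int],
    sep_pos: int,
    item_pos: Sequence[Sequence[int]],
    w: Vector,
) -> List[float]:
    """Pool query + item + [SEP] positions per item, then project each pooled vector with shared w."""
    pooled = [mean_pool(hidden, list(query_pos) + list(pos) + [sep_pos]) for pos in item_pos]
    # Stacking the pooled vectors gives E (k x d); E @ w yields all k logits at once.
    return [sum(e_j * w_j for e_j, w_j in zip(e, w)) for e in pooled]


# Toy encoder output: position i holds the vector [i, 1.0].
hidden = [[float(i), 1.0] for i in range(6)]
scores = joint_scores(hidden, query_pos=[1], sep_pos=2, item_pos=[[3], [4, 5]], w=[1.0, 0.0])
print(scores)  # [2.0, 3.0]
```

Every item shares the same query and [SEP] vectors from the single encoder pass. Only the item-specific positions differ between pooled vectors.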
2.3. Training Objective: Ranking Probability Loss
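CROSS-JEM introduces the listwise Ranking Probability Loss (RPL) to optimize over the correct global ranking order. Let $y_{qi}$ denote the ground-truth relevance for query $q$ and item $s_i$, and $z_{qi}$ the corresponding predicted logit. Let $\pi$ order the $k$ items by decreasing $y_{qi}$. For position $j$, define:
$$P_j = \frac{\exp\big(z_{q\pi(j)}\big)}{\sum_{l=j}^{k} \exp\big(z_{q\pi(l)}\big)}$$
Then,
$$\mathcal{L}_{\mathrm{RPL}} = -\sum_{q} \sum_{j=1}^{k} \log P_j$$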
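RPL is related in spirit to the ListNet top-1 objective, which minimizes the KL divergence to the ground-truth top-1 distribution, but it is not equivalent to it. RPL scores every position of the ground-truth ordering rather than only the top-1 distribution. The ablations in Section 4 also show the two losses yield different accuracy (35.45% vs. 30.27% MRR@10).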
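To make the position-wise structure concrete, the sketch below assumes RPL takes the Plackett–Luce form $P_j = \exp(z_{q\pi(j)}) / \sum_{l \ge j} \exp(z_{q\pi(l)})$, with $\pi$ sorting items by decreasing relevance. This is the ListMLE likelihood; whether it matches the paper's exact definition is an assumption. The ListNet top-1 loss is included for contrast, and the function names are hypothetical.

```python
import math
from typing import Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ranking_probability_nll(logits: Sequence[float], relevance: Sequence[float]) -> float:
    """Assumed Plackett-Luce form: -sum_j log P_j over the ground-truth order pi.

    P_j = exp(z_pi(j)) / sum_{l >= j} exp(z_pi(l)); ties in relevance keep input order.
    """
    order = sorted(range(len(logits)), key=lambda i: -relevance[i])
    z = [logits[i] for i in order]
    return -sum(z[j] - logsumexp(z[j:]) for j in range(len(z)))


def listnet_top1(logits: Sequence[float], relevance: Sequence[float]) -> float:
    """ListNet top-1: cross-entropy between softmax(relevance) and softmax(logits)."""
    lse_y, lse_z = logsumexp(relevance), logsumexp(logits)
    return -sum(math.exp(y - lse_y) * (z - lse_z) for y, z in zip(relevance, logits))


relevance = [2.0, 1.0, 0.0]
print(ranking_probability_nll([3.0, 2.0, 1.0], relevance))  # correctly ordered: lower loss
print(ranking_probability_nll([1.0, 2.0, 3.0], relevance))  # reversed order: higher loss
print(listnet_top1([3.0, 2.0, 1.0], relevance))
```

The Plackett–Luce term at position $j$ only competes against items ranked at or below $j$. ListNet top-1, by contrast, compares two softmax distributions over the whole list in one shot.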
3. Computational Efficiency via Token Redundancy
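Empirical studies reveal that candidate sets of short-text items typically exhibit a 5–10× overlap in subword tokens, so $|T_S| \ll \sum_{i=1}^{k} |T_{s_i}|$. As a result, the encoder workload, measured in tokens processed, drops from $\sum_{i=1}^{k} \big(|T_q| + |T_{s_i}|\big)$ to $|T_q| + |T_S|$. Standard cross-encoders pay the larger cost because they re-encode the query once per item, while CROSS-JEM encodes each shared token once. In practice this yields roughly 4× lower latency: 9.8 ms versus 41.3 ms for monoBERT when ranking 700 items on an A100.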
| Model | Inference Latency (700 items, A100) | Throughput (pairs/sec) |
|---|---|---|
| monoBERT | 41.3 ms | 3.35K |
| CROSS-JEM | 9.8 ms | 17.2K |
| CPU Sparse Models | ~300 ms | — |
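The token accounting behind this argument can be checked with a few lines of Python. The toy query and items are hypothetical, and special tokens are ignored. Token count is used only as a rough proxy for encoder cost, since it ignores the quadratic attention term. The final line simply recomputes the latency ratio from the table.

```python
from typing import Sequence, Tuple


def tokens_processed(query: Sequence[str], items: Sequence[Sequence[str]]) -> Tuple[int, int]:
    """Token counts for k pairwise passes vs. one joint pass (special tokens ignored)."""
    pairwise = sum(len(query) + len(item) for item in items)  # query re-encoded per item
    union = {tok for item in items for tok in item}           # T_S: each shared token once
    joint = len(query) + len(union)
    return pairwise, joint


query = ["red", "running", "shoes"]
items = [["running", "shoes"], ["red", "shoes"], ["trail", "running", "shoes"]]
pairwise, joint = tokens_processed(query, items)
print(pairwise, joint)  # 16 7

# Latency ratio implied by the table above (monoBERT vs. CROSS-JEM, 700 items).
print(round(41.3 / 9.8, 1))  # 4.2
```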
4. Empirical Performance and Evaluation
CROSS-JEM demonstrates state-of-the-art accuracy and substantial efficiency gains on both public and proprietary datasets.
- Public Benchmarks (N=10):
  - SODQ: MAP@5 = 52.40% (CROSS-JEM) vs. 48.31% (ANCE), 46.79% (monoBERT)
  - MS MARCO-Titles: MRR@10 = 35.45% (CROSS-JEM) vs. 32.47% (monoBERT), 30.55% (INSTRUCTOR)
- Sponsored Search (N=700, proprietary):
  - MAP@100 = 97.48% (CROSS-JEM) vs. 84.38% (MEB), 78.39% (ANCE)
  - Negative accuracy (while retaining 80% of positives): 99.45%
  - Live A/B test: quick-back rate reduced by 1.8%, judged relevance improved by 10.2%
- Ablations show RPL achieves superior MRR@10 (35.45%) compared to BCE (31.46%), CE-listwise (32.03%), or vanilla ListNet (30.27%).
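For readers reproducing these numbers, MRR@10 and MAP@k can be computed as below. This is a generic sketch over binary relevance labels. MAP@k has several normalization conventions in the literature, and the `min(#relevant, k)` denominator used here is an assumption rather than the paper's stated choice.

```python
from typing import Sequence


def mrr_at_k(ranked_relevance: Sequence[Sequence[int]], k: int = 10) -> float:
    """Mean over queries of 1/rank of the first relevant item in the top k (0 if none)."""
    total = 0.0
    for rels in ranked_relevance:
        for rank, rel in enumerate(rels[:k], start=1):
            if rel > 0:
                total += 1.0 / rank
                break
    return total / len(ranked_relevance)


def average_precision_at_k(rels: Sequence[int], k: int) -> float:
    """AP@k: precision at each relevant rank <= k, divided by min(#relevant, k)."""
    hits, score = 0, 0.0
    for rank, rel in enumerate(rels[:k], start=1):
        if rel > 0:
            hits += 1
            score += hits / rank
    n_relevant = sum(1 for r in rels if r > 0)
    return score / min(n_relevant, k) if n_relevant else 0.0


def map_at_k(ranked_relevance: Sequence[Sequence[int]], k: int) -> float:
    """Mean of AP@k over queries."""
    return sum(average_precision_at_k(r, k) for r in ranked_relevance) / len(ranked_relevance)


# Binary relevance of each ranked list, best-scored item first.
runs = [[0, 1, 0], [1, 0, 1]]
print(mrr_at_k(runs, k=10))  # 0.75
print(map_at_k(runs, k=5))
```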
5. Implications for Production Systems
CROSS-JEM’s joint encoding and listwise training admit several properties critical for high-throughput, real-time ranking scenarios:
- One-pass encoding of all candidates eliminates the per-pair repetition of standard cross-encoders, whose cost grows multiplicatively with the number of candidates $k$.
- Retains full query–item cross-attention and list-level calibration of item scores, both absent from dual-encoder and late-interaction methods.
- Achieves sub-10 ms latency for hundreds of candidates on a single A100 GPU (9.8 ms for 700 items).
- Compatible with small BERT-base backbones, obviating the need for LLM-scale parameter counts.
- Results in direct improvements to user engagement and advertiser ROI in commercial applications due to higher accuracy and reduced quick-back rates (Paliwal et al., 2024).
6. CROSS-JEM in Astroparticle Physics: JEM-EUSO and Model-Dependent Cross-Section Enhancement
In a separate context, “CROSS-JEM” serves as a narrative label for using JEM-EUSO’s quasi-horizontal air shower data to constrain or discover signatures of warped extra-dimensional gravity. The label is short for “cross-section at JEM-EUSO” and is an editorial coinage rather than the authors’ own term. The gravity model in question is the Randall–Sundrum (RS) model (Mladenov et al., 2015).
- Theoretical background:
  - The RS model with small 5D curvature ($\kappa \ll \bar{M}_5$) predicts an almost continuous spectrum of light reggeized Kaluza–Klein gravitons.
  - Ultra-high-energy neutrino–nucleon ($\nu N$) interactions at trans-Planckian energies ($\sqrt{s} \gg \bar{M}_5$) are dominated by $t$-channel gravi-Reggeon exchange, resulting in strongly enhanced cross-sections at $E_\nu \gtrsim \dots$ eV.
- Event rate prediction at JEM-EUSO:
  - For $\bar{M}_5 = \dots$ TeV and $\kappa = \dots$ GeV, the predicted number of quasi-horizontal air showers exceeds a few events per year, well above the SM expectation of 0.06 yr$^{-1}$.
  - Example event rate table (one-year exposure, $\kappa = \dots$ GeV):
| $\bar{M}_5$ (TeV) | Expected events (yr$^{-1}$) |
|---|---|
| 3 | 6.7 |
| 4 | 1.2 |
| 5 | 0.31 |
  - A null result ($N_{\mathrm{obs}} = 0$) would set a lower bound $\bar{M}_5 > \dots$ TeV (at … CL), while any significant upward deviation from the SM would signal new trans-Planckian physics.
- Sensitivity and systematics:
  - Statistical uncertainty is Poissonian; a 5-year exposure improves the reach over ground arrays by an order of magnitude.
  - Dominant systematics: flux normalization (factor of 2), exposure uncertainty (15%), and model uncertainties in high-… gravity (20%).
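The logic of the null-result bound can be illustrated with Poisson statistics. This is a rough sketch, not the analysis of Mladenov et al.:
- it uses only the three tabulated rates;
- it assumes the rates fall log-linearly between tabulated points;
- it ignores the SM background (0.06 per year) and all systematics.

The helper names are hypothetical. With zero observed events and no background, a model is excluded at confidence level CL when its expected count exceeds -ln(1 - CL). At 95% CL this is about 3.0 events.

```python
import math

# Expected events per year vs. M5 (TeV), copied from the example table above.
RATES = {3.0: 6.7, 4.0: 1.2, 5.0: 0.31}


def poisson_zero_prob(mu: float) -> float:
    """Probability of observing zero events when mu events are expected."""
    return math.exp(-mu)


def lower_bound_m5(exposure_years: float = 1.0, cl: float = 0.95) -> float:
    """Smallest M5 still allowed by a null result (log-linear interpolation, assumed).

    A model is excluded if P(0 | mu) < 1 - cl, i.e. if mu > -ln(1 - cl).
    Background and systematics are ignored in this sketch.
    """
    mu_max = -math.log(1.0 - cl)  # about 3.0 expected events at 95% CL
    points = sorted(RATES.items())
    for (m_lo, r_lo), (m_hi, r_hi) in zip(points, points[1:]):
        n_lo, n_hi = r_lo * exposure_years, r_hi * exposure_years
        if n_lo >= mu_max >= n_hi:
            frac = (math.log(n_lo) - math.log(mu_max)) / (math.log(n_lo) - math.log(n_hi))
            return m_lo + frac * (m_hi - m_lo)
    raise ValueError("bound lies outside the tabulated M5 range")


print(poisson_zero_prob(6.7))  # chance of seeing nothing if M5 = 3 TeV (one year)
print(lower_bound_m5(1.0))     # roughly 3.5 TeV
print(lower_bound_m5(5.0))     # roughly 4.5 TeV for a 5-year exposure
```

Because the expected count scales linearly with exposure, a longer mission pushes the excluded region toward higher $\bar{M}_5$.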
A plausible implication is that the “CROSS-JEM” analysis exemplifies the synergy of collider-inspired BSM theory and space-based cosmic ray observatories for probing fundamental quantum gravity effects at unprecedented energies (Mladenov et al., 2015).
7. Summary and Outlook
CROSS-JEM, in the neural ranking setting, bridges the gap between the high accuracy of cross-encoders and the low latency required for practical deployment in short-text ranking by (1) jointly encoding all items in one Transformer pass using a token union, (2) selectively pooling per-item embeddings, and (3) training with a novel listwise ranking loss. In the context of astroparticle physics, "CROSS-JEM" describes a methodology for extracting strong constraints on extra-dimensional gravity by linking reggeized graviton exchange modifications in cross-sections to observed air shower rates at JEM-EUSO. Both research lines demonstrate the power of joint, cross-item modeling to surpass prior limitations, either in computational efficiency, system accuracy, or new physics sensitivity (Paliwal et al., 2024, Mladenov et al., 2015).