FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction (2310.19453v4)
Abstract: Click-through rate (CTR) prediction serves as a core functional module in various personalized online services. Traditional ID-based models for CTR prediction take as input one-hot encoded ID features of the tabular modality and capture collaborative signals via feature interaction modeling. However, one-hot encoding discards the semantic information contained in the textual features. Recently, the emergence of Pretrained Language Models (PLMs) has given rise to another paradigm, which takes as input sentences of the textual modality obtained via hard prompt templates and adopts PLMs to extract semantic knowledge. However, PLMs often face challenges in capturing field-wise collaborative signals and distinguishing features with subtle textual differences. In this paper, to leverage the benefits of both paradigms while overcoming their limitations, we propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction. Unlike most methods that solely rely on global views through instance-level contrastive learning, we design a novel joint masked tabular/language modeling task to learn fine-grained alignment between tabular IDs and word tokens. Specifically, the masked data of one modality (IDs or tokens) has to be recovered with the help of the other modality, which establishes feature-level interaction and alignment via sufficient mutual information extraction between the two modalities. Moreover, we propose to jointly finetune the ID-based model and the PLM by adaptively combining the outputs of both models, thus achieving superior performance in downstream CTR prediction tasks. Extensive experiments on three real-world datasets demonstrate that FLIP outperforms SOTA baselines and is highly compatible with various ID-based models and PLMs. The code is available at \url{https://github.com/justarter/FLIP}.
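To make the joint finetuning idea concrete, below is a minimal PyTorch sketch of one way the outputs of an ID-based model and a PLM could be adaptively combined for CTR prediction. The class name `AdaptiveFusionHead`, the sigmoid gating network, and all dimensions are illustrative assumptions for this sketch, not the paper's exact fusion layer.

```python
import torch
import torch.nn as nn


class AdaptiveFusionHead(nn.Module):
    """Hypothetical sketch: fuse the CTR logits of an ID-based model and a PLM
    with an instance-wise learnable weight (not the paper's exact design)."""

    def __init__(self, id_dim: int, plm_dim: int):
        super().__init__()
        # Gate produces alpha in (0, 1): how much weight to put on the ID-based model.
        self.gate = nn.Sequential(
            nn.Linear(id_dim + plm_dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, id_repr, plm_repr, id_logit, plm_logit):
        # Compute a per-sample mixing weight from both representations,
        # then blend the two logits and map to a click probability.
        alpha = self.gate(torch.cat([id_repr, plm_repr], dim=-1))
        logit = alpha * id_logit + (1.0 - alpha) * plm_logit
        return torch.sigmoid(logit)


# Example usage with dummy tensors (dimensions are arbitrary assumptions).
fusion = AdaptiveFusionHead(id_dim=64, plm_dim=312)
id_repr, plm_repr = torch.randn(8, 64), torch.randn(8, 312)
id_logit, plm_logit = torch.randn(8, 1), torch.randn(8, 1)
ctr = fusion(id_repr, plm_repr, id_logit, plm_logit)  # shape (8, 1), values in (0, 1)
```

An instance-wise gate of this kind lets the combined model lean on collaborative signals for well-observed ID features and on semantic signals from the PLM for sparse or cold-start features, which is the intuition behind combining the two paradigms.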
Authors: Hangyu Wang, Jianghao Lin, Xiangyang Li, Bo Chen, Chenxu Zhu, Ruiming Tang, Weinan Zhang, Yong Yu