Multi-view user representation learning for user matching without personal information (2312.14533v1)

Published 22 Dec 2023 in cs.IR, cs.AI, and cs.LG

Abstract: As the digitization of the travel industry accelerates, analyzing and understanding travelers' behavior becomes increasingly important. However, traveler data are frequently sparse because users interact with travel providers relatively rarely. Compounding this, travelers browse travel products across multiple devices, accounts, and platforms, which disperses their data further. Probabilistic traveler matching can address these challenges. Most existing user matching solutions are not suitable for traveler matching, because a traveler's browsing history is typically short and URLs in the travel industry are highly heterogeneous and contain many tokens. We therefore propose similarity-based multi-view information fusion, which learns a better user representation by treating URLs as multi-view data. The experimental results show that the proposed multi-view user representation learning takes advantage of complementary information from different views, highlights the key information in URLs, and performs significantly better than other representation learning solutions on the user matching task.
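
To make the idea of treating URLs as multi-view data concrete, here is a minimal sketch. It is not the paper's method: the abstract does not say which views are used or how they are fused, so the three views (host, path, query keys), the Jaccard similarity, and the unweighted average are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import re
from urllib.parse import urlsplit, parse_qsl

VIEW_NAMES = ("host", "path", "query")  # assumed views, not taken from the paper


def url_views(url):
    """Split one URL into token sets, one set per assumed view."""
    parts = urlsplit(url)
    host = {t for t in parts.netloc.lower().split(".") if t}
    path = {t for t in re.split(r"[^a-z0-9]+", parts.path.lower()) if t}
    query = {key.lower() for key, _ in parse_qsl(parts.query)}
    return {"host": host, "path": path, "query": query}


def user_views(urls):
    """Merge the per-URL token sets of a browsing history, view by view."""
    merged = {name: set() for name in VIEW_NAMES}
    for url in urls:
        for name, tokens in url_views(url).items():
            merged[name] |= tokens
    return merged


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fused_similarity(urls_a, urls_b):
    """Average the per-view similarities (a simple stand-in for learned fusion)."""
    views_a, views_b = user_views(urls_a), user_views(urls_b)
    sims = [jaccard(views_a[name], views_b[name]) for name in VIEW_NAMES]
    return sum(sims) / len(sims)
```

Calling `fused_similarity(history_a, history_b)` on two short browsing histories returns a score in [0, 1]. Because each view is scored separately before averaging, a match in one view (for example, the same destination token in the path) still contributes when another view (for example, the host) differs entirely.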
