Locally Differentially Private Embedding Models in Distributed Fraud Prevention Systems (2401.02450v1)
Abstract: Global financial crime activity is driving demand for machine learning solutions in fraud prevention. However, prevention systems are commonly provided to financial institutions in isolation, and few provisions exist for data sharing because of fears of unintentional leaks and adversarial attacks. Collaborative learning advances in finance are rare, and real-world insights derived from privacy-preserving data processing systems are hard to find. In this paper, we present a collaborative deep learning framework for fraud prevention, designed from a privacy standpoint and awarded at the recent PETs Prize Challenges. We combine latent embedded representations of variable-length transaction sequences with local differential privacy to construct a data release mechanism that can securely inform externally hosted fraud and anomaly detection models. We assess our contribution on two distributed data sets donated by large payment networks, and demonstrate robustness to popular inference-time attacks, along with utility-privacy trade-offs analogous to those reported in published work in other application domains.
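The abstract does not specify the release mechanism in detail, so the following is only a minimal sketch of the general idea, under stated assumptions: a variable-length transaction sequence is mapped to a fixed-size bounded embedding, and Laplace noise calibrated to the embedding's L1 sensitivity is added locally before release. The names `encode_sequence` and `release_embedding`, the mean-pooling encoder, and the parameters `epsilon` and `bound` are hypothetical illustrations, not the authors' implementation.

```python
import math
import random
from typing import List, Optional, Sequence


def encode_sequence(transactions: Sequence[Sequence[float]]) -> List[float]:
    """Toy stand-in for a learned sequence encoder (an assumption).

    Mean-pools each feature over the sequence and squashes it with tanh,
    so sequences of any length map to a fixed-size vector in (-1, 1).
    """
    n = len(transactions)
    dim = len(transactions[0])
    return [math.tanh(sum(t[j] for t in transactions) / n) for j in range(dim)]


def release_embedding(
    embedding: Sequence[float],
    epsilon: float,
    bound: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Release an embedding under epsilon-local differential privacy.

    Each coordinate is clipped to [-bound, bound], so any two embeddings
    differ by at most 2 * bound * dim in L1 norm. Adding independent
    Laplace noise with scale sensitivity / epsilon per coordinate is the
    standard Laplace mechanism for that sensitivity.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    clipped = [max(-bound, min(bound, x)) for x in embedding]
    sensitivity = 2.0 * bound * len(clipped)
    scale = sensitivity / epsilon
    # A Laplace(0, scale) sample is the difference of two independent
    # exponential samples with mean `scale` (rate 1 / scale).
    rate = 1.0 / scale
    return [x + rng.expovariate(rate) - rng.expovariate(rate) for x in clipped]


if __name__ == "__main__":
    rng = random.Random(0)
    short_sequence = [[0.2, -1.0, 3.5], [0.1, 0.4, -2.0]]
    long_sequence = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(12)]
    for seq in (short_sequence, long_sequence):
        noisy = release_embedding(encode_sequence(seq), epsilon=4.0, rng=rng)
        print(len(seq), [round(v, 3) for v in noisy])
```

Because the noise scale grows with the embedding dimension and shrinks with `epsilon`, this sketch makes the utility-privacy trade-off visible directly: smaller `epsilon` or wider embeddings produce noisier released vectors for the downstream model.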