Privacy-Preserved Neural Graph Databases (2312.15591v5)
Abstract: In the era of LLMs, efficient and accurate data retrieval has become increasingly crucial for using domain-specific or private data in retrieval-augmented generation (RAG). Neural graph databases (NGDBs) have emerged as a powerful paradigm that combines the strengths of graph databases (GDBs) and neural networks to enable efficient storage, retrieval, and analysis of graph-structured data, and they can be adaptively trained with LLMs. Neural embedding storage and Complex neural logical Query Answering (CQA) give NGDBs the ability to generalize. When the graph is incomplete, NGDBs can fill gaps in the graph structure by extracting latent patterns and representations, revealing hidden relationships and enabling accurate query answering. This capability comes with an inherent trade-off, however: it introduces additional privacy risks for domain-specific or private databases. Malicious attackers can infer sensitive information in the database using well-designed queries. For example, the answer set of a query asking where Turing Award winners born after 1940 and before 1950 lived probably exposes the places where Turing Award winner Hinton has lived, even though those facts may have been deleted at the training stage due to privacy concerns. In this work, we propose a privacy-preserved neural graph database (P-NGDB) framework to alleviate the risk of privacy leakage in NGDBs. We introduce adversarial training techniques in the training stage to force NGDBs to generate indistinguishable answers when queried with private information, making it harder to infer sensitive information through combinations of multiple innocuous queries.
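The leakage scenario in the abstract can be illustrated with a small symbolic sketch. This is a toy under stated assumptions, not the P-NGDB method: every entity, relation name, birth year, and function below is a hypothetical placeholder, and the store is a plain set of triples rather than a neural model. In an actual NGDB the private facts would be recovered by the generalization of learned embeddings rather than read from stored triples, but the shape of the attack is the same: a direct question about the private fact is refused, while a seemingly innocuous complex query returns it.

```python
# Toy triple store; all entities, relations, and years are made-up placeholders.
TRIPLES = {
    ("winner_a", "won", "turing_award"),
    ("winner_b", "won", "turing_award"),
    ("winner_a", "lived_in", "city_x"),
    ("winner_a", "lived_in", "city_y"),
    ("winner_b", "lived_in", "city_z"),
}
BIRTH_YEAR = {"winner_a": 1947, "winner_b": 1960}

# (head, relation) pairs the data owner treats as private.
PRIVATE = {("winner_a", "lived_in")}


def tails(head, relation):
    """All tail entities t such that (head, relation, t) is stored."""
    return {t for h, r, t in TRIPLES if h == head and r == relation}


def heads(relation, tail):
    """All head entities h such that (h, relation, tail) is stored."""
    return {h for h, r, t in TRIPLES if r == relation and t == tail}


def direct_lookup(head, relation):
    """Naive guard: refuse one-hop queries that name a private pair."""
    if (head, relation) in PRIVATE:
        return set()
    return tails(head, relation)


def complex_query(award, born_after, born_before):
    """Where did winners of `award` born strictly between the two years live?"""
    winners = {
        w for w in heads("won", award)
        if born_after < BIRTH_YEAR[w] < born_before
    }
    answers = set()
    for w in winners:
        # The guard is never consulted on this multi-hop path.
        answers |= tails(w, "lived_in")
    return answers


blocked = direct_lookup("winner_a", "lived_in")
leaked = complex_query("turing_award", 1940, 1950)
print(blocked)         # empty: the direct question is refused
print(sorted(leaked))  # the private locations appear in the answer set
```

Because only one placeholder winner falls inside the birth-year window, the answer set of the complex query is exactly that entity's private locations. This is the kind of inference through innocuous-looking queries that the abstract says adversarial training is meant to make harder.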