
From Zero to Hero: Detecting Leaked Data through Synthetic Data Injection and Model Querying (2310.04145v2)

Published 6 Oct 2023 in cs.LG and cs.DB

Abstract: Safeguarding the Intellectual Property (IP) of data has become critically important as machine learning applications continue to proliferate, and their success heavily relies on the quality of training data. While various mechanisms exist to secure data during storage, transmission, and consumption, few studies address detecting whether data have already been leaked and used for model training without authorization. This issue is particularly challenging due to the absence of information about, and control over, the training process conducted by potential attackers. In this paper, we concentrate on the domain of tabular data and introduce a novel methodology, Local Distribution Shifting Synthesis (LDSS), to detect leaked data that are used to train classification models. The core concept behind LDSS involves injecting a small volume of synthetic data, characterized by local shifts in class distribution, into the owner's dataset. This enables the effective identification of models trained on leaked data through model querying alone, as the synthetic data injection results in a pronounced disparity in the predictions of models trained on leaked and modified datasets. LDSS is model-oblivious and hence compatible with a diverse range of classification models. We have conducted extensive experiments on seven types of classification models across five real-world datasets. The comprehensive results affirm the reliability, robustness, fidelity, security, and efficiency of LDSS. Extending LDSS to regression tasks further highlights its versatility and efficacy compared with baseline methods.
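The detection idea in the abstract — inject a few locally label-shifted synthetic rows, then compare how suspect and independent models answer queries at those rows — can be illustrated with a minimal sketch. This is not the authors' implementation: the dataset, the injection region, and the 1-nearest-neighbour stand-in for an arbitrary classifier are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Owner's two-class tabular dataset (toy stand-in).
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Inject a small volume of synthetic rows in a locally sparse region,
# labelled against the surrounding class (a hypothetical stand-in for
# the paper's local-distribution-shifting synthesis).
X_syn = np.array([3.0, 3.0, 0.0, 0.0]) + 0.1 * rng.normal(size=(25, 4))
y_syn = np.zeros(25, dtype=int)          # surrounding region is class 1
X_pub = np.vstack([X, X_syn])            # modified dataset the owner releases
y_pub = np.concatenate([y, y_syn])

def predict_1nn(X_train, y_train, Q):
    """1-nearest-neighbour classifier standing in for any trained model."""
    d = ((Q[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    return y_train[d.argmin(axis=1)]

# Detection by querying alone: a model trained on the leaked (modified)
# data reproduces the injected labels; a model trained without it
# follows the true local class distribution instead.
leaked_hits = (predict_1nn(X_pub, y_pub, X_syn) == y_syn).mean()
clean_hits = (predict_1nn(X, y, X_syn) == y_syn).mean()
print(leaked_hits, clean_hits)   # a large gap flags the leak
```

Because the check only needs predictions at the injected points, it is model-oblivious in the sense the abstract describes: any classifier the suspect exposes for querying can be tested the same way.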

