HuRef: HUman-REadable Fingerprint for Large Language Models (2312.04828v4)

Published 8 Dec 2023 in cs.CL and cs.AI

Abstract: Protecting the copyright of LLMs has become crucial due to their resource-intensive training and accompanying carefully designed licenses. However, identifying the original base model of an LLM is challenging due to potential parameter alterations. In this study, we introduce HuRef, a human-readable fingerprint for LLMs that uniquely identifies the base model without interfering with training or exposing model parameters to the public. We first observe that the vector direction of an LLM's parameters remains stable after the model has converged during pretraining, with negligible perturbation through subsequent training steps, including continued pretraining, supervised fine-tuning, and RLHF; this makes the direction a sufficient condition for identifying the base model. Its necessity is validated by continuing to train an LLM with an extra loss term that drives the parameter direction away, which damages the model. However, this direction is vulnerable to simple attacks such as dimension permutation or matrix rotation, which change it significantly without affecting performance. To address this, leveraging the Transformer structure, we systematically analyze potential attacks and define three invariant terms that identify an LLM's base model. Because publishing the invariant terms directly risks information leakage, we instead map them to a Gaussian vector using an encoder, convert that vector into a natural image using StyleGAN2, and publish the image. In our black-box setting, all fingerprinting steps are conducted internally by the LLM's owner. To ensure the published fingerprints are honestly generated, we introduce Zero-Knowledge Proofs (ZKP). Experimental results across various LLMs demonstrate the effectiveness of our method. The code is available at https://github.com/LUMIA-Group/HuRef.
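
The method described in the abstract rests on two observations that a short sketch can make concrete. Below is a minimal, self-contained PyTorch illustration under stated assumptions, not the authors' implementation: the tensors are synthetic stand-ins for real checkpoints, and the product W_q @ W_k.T is only an example of the cancellation idea behind the paper's three invariant terms, not their exact definition. Part 1 shows why cosine similarity between flattened parameter vectors can reveal a shared base model; part 2 shows why a rotation attack destroys raw weight directions while leaving such a product untouched.

```python
import torch

torch.manual_seed(0)
cos = torch.nn.functional.cosine_similarity

# --- Part 1: parameter-direction stability ---------------------------------
# Fine-tuning a converged base model perturbs its weights only slightly, so
# the flattened parameter vectors of base and derivative stay nearly
# parallel, while an independently trained model of the same shape does not.
# These tensors are synthetic stand-ins for real model checkpoints.
base = torch.randn(1_000_000)                      # flattened base weights
derivative = base + 0.01 * torch.randn_like(base)  # small fine-tuning drift
independent = torch.randn_like(base)               # unrelated model

print(cos(base, derivative, dim=0).item())   # ~1.0: same lineage
print(cos(base, independent, dim=0).item())  # ~0.0: different lineage

# --- Part 2: why raw directions are attackable -----------------------------
# Rotating the hidden dimension (W -> W @ R for orthogonal R), applied
# jointly to the query and key projections, changes each weight's direction
# without changing the model's function. A product in which the rotation
# cancels, W_q @ W_k.T, survives the attack; the paper's actual invariant
# terms are derived from the full Transformer structure, and this product
# only illustrates the cancellation idea.
d = 64
W_q, W_k = torch.randn(d, d), torch.randn(d, d)
R, _ = torch.linalg.qr(torch.randn(d, d))  # Q factor: random orthogonal R

W_q_att, W_k_att = W_q @ R, W_k @ R        # rotation "attack" on both weights
print(cos(W_q.flatten(), W_q_att.flatten(), dim=0).item())        # direction destroyed
print(torch.allclose(W_q @ W_k.T, W_q_att @ W_k_att.T, atol=1e-4))  # True: invariant
```

The rotation in part 2 is function-preserving because queries and keys rotate together: (h W_q R)(h W_k R)^T = h W_q (R R^T) W_k^T h^T = h W_q W_k^T h^T. This is why HuRef fingerprints attack-invariant quantities rather than the weight directions themselves.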

Authors (8)
  1. Boyi Zeng (4 papers)
  2. Chenghu Zhou (55 papers)
  3. Xinbing Wang (98 papers)
  4. Zhouhan Lin (57 papers)
  5. Lizheng Wang (7 papers)
  6. Yuncong Hu (3 papers)
  7. Yi Xu (302 papers)
  8. Yu Yu (88 papers)