Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation (2312.00645v2)
Abstract: There is a growing need to gain insight into LLM capabilities that relate to sensitive topics, such as bioterrorism or cyberwarfare. However, traditional open-source benchmarks are ill-suited to the task because they customarily publish the correct answers in human-readable form. At the same time, enforcing mandatory closed-quarters evaluations might stifle development and erode trust. In this context, we propose hashmarking, a protocol for evaluating LLMs in the open without having to disclose the correct answers. In its simplest form, a hashmark is a benchmark whose reference solutions have been cryptographically hashed prior to publication. Following an overview of the proposed evaluation protocol, we assess its resilience against traditional attack vectors (e.g., rainbow table attacks), as well as against failure modes unique to increasingly capable generative models.
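The simplest form of the protocol can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The hash function (PBKDF2-HMAC-SHA256 from the standard library), the iteration count, the 16-byte per-item salt, and the `normalize`, `publish`, and `score` helpers are all hypothetical choices made for this example. A random salt per item is one standard defense against the rainbow table attacks mentioned in the abstract, because any precomputed table would have to be rebuilt for every salt.

```python
import hashlib
import hmac
import os
import re

# Assumed work factor; the paper's actual hash function and parameters
# are not specified here. Higher values slow down brute-force guessing.
ITERATIONS = 100_000


def normalize(answer: str) -> str:
    """Hypothetical canonicalization: trim, lowercase, collapse whitespace."""
    return re.sub(r"\s+", " ", answer.strip().lower())


def hash_answer(answer: str, salt: bytes) -> bytes:
    """Key-stretched hash of a normalized answer under a given salt."""
    return hashlib.pbkdf2_hmac("sha256", normalize(answer).encode("utf-8"), salt, ITERATIONS)


def publish(items):
    """Turn (question, reference_answer) pairs into a publishable hashmark.

    Only the question, the salt and the digest are released; the salt is
    public, and the plaintext reference answer never leaves this function.
    """
    hashmark = []
    for question, answer in items:
        salt = os.urandom(16)  # fresh salt per item defeats shared precomputed tables
        hashmark.append({
            "question": question,
            "salt": salt.hex(),
            "digest": hash_answer(answer, salt).hex(),
        })
    return hashmark


def score(hashmark, model_answers):
    """Fraction of items whose model answer hashes to the published digest."""
    if len(hashmark) != len(model_answers):
        raise ValueError("expected one model answer per hashmark item")
    if not hashmark:
        return 0.0
    correct = 0
    for item, candidate in zip(hashmark, model_answers):
        salt = bytes.fromhex(item["salt"])
        expected = bytes.fromhex(item["digest"])
        # Constant-time comparison of the recomputed and published digests.
        if hmac.compare_digest(hash_answer(candidate, salt), expected):
            correct += 1
    return correct / len(hashmark)


if __name__ == "__main__":
    hm = publish([("What is the capital of France?", "Paris"), ("2 + 2 = ?", "4")])
    print(score(hm, ["  paris ", "5"]))  # 0.5
```

Because scoring is an exact match on the normalized string, this sketch only suits questions with short, canonical answers (a number, a name, a canonical identifier). Salting and key stretching raise the cost of guessing but cannot protect an answer drawn from a tiny space, such as a multiple-choice letter, which an attacker can simply enumerate.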