AMUSE: Adaptive Multi-Segment Encoding for Dataset Watermarking (2403.05628v2)
Abstract: Curating the high-quality datasets that drive new AI applications requires considerable time, money, and computational resources, so effective ownership protection of datasets is becoming critical. Recently, imperceptible watermarking techniques have been used to protect the ownership of image datasets by storing ownership information (i.e., a watermark) in the individual image samples. Embedding the entire watermark into every sample introduces significant redundancy in the embedded information, which degrades both the quality of the watermarked dataset and the extraction accuracy. In this paper, a multi-segment encoding-decoding method for dataset watermarking (called AMUSE) is proposed to adaptively map the original watermark into a set of shorter sub-messages and vice versa. Our message encoder is adaptive: it adjusts the length of the sub-messages according to the protection requirements of the target dataset. Existing image watermarking methods are then employed to embed the sub-messages into the original images in the dataset and to extract them from the watermarked images. Our decoder then reconstructs the original message from the extracted sub-messages. The proposed encoder and decoder are plug-and-play modules that can easily be added to any watermarking method. Extensive experiments are performed with multiple watermarking solutions, which show that applying AMUSE improves the overall message extraction accuracy by up to 28% for the same dataset quality. Furthermore, the image dataset quality is enhanced by a PSNR of $\approx$2 dB on average, while improving the extraction accuracy for one of the tested image watermarking methods.
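The abstract does not specify how the watermark is mapped to sub-messages, so the following is only a minimal sketch of the general idea, not the AMUSE encoder or decoder itself. It assumes a simple scheme in which each sub-message is a segment index followed by a fixed-length slice of the watermark, and in which the decoder groups extracted sub-messages by index and takes a bitwise majority vote. The function names `encode_segments` and `decode_segments`, the parameter `payload_len`, and the round-robin assignment of sub-messages to images are all hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict
from math import ceil


def encode_segments(watermark: str, payload_len: int) -> list[str]:
    """Split a bit string into sub-messages of the form <index bits><payload bits>.

    Assumed scheme for illustration only; not the encoder from the paper.
    """
    if not watermark or set(watermark) - {"0", "1"} or payload_len < 1:
        raise ValueError("watermark must be a non-empty bit string, payload_len >= 1")
    n_seg = ceil(len(watermark) / payload_len)
    idx_bits = max(1, (n_seg - 1).bit_length())  # bits needed to number the segments
    padded = watermark.ljust(n_seg * payload_len, "0")  # zero-pad the last segment
    return [
        format(i, f"0{idx_bits}b") + padded[i * payload_len:(i + 1) * payload_len]
        for i in range(n_seg)
    ]


def decode_segments(extracted: list[str], total_len: int, payload_len: int) -> str:
    """Rebuild the watermark by majority vote over the copies of each segment."""
    n_seg = ceil(total_len / payload_len)
    idx_bits = max(1, (n_seg - 1).bit_length())
    groups = defaultdict(list)
    for sub in extracted:
        if len(sub) != idx_bits + payload_len:
            continue  # skip malformed extractions
        idx = int(sub[:idx_bits], 2)
        if idx < n_seg:
            groups[idx].append(sub[idx_bits:])
    bits = []
    for i in range(n_seg):
        copies = groups.get(i)
        if not copies:
            bits.append("?" * payload_len)  # segment was never recovered
            continue
        for pos in range(payload_len):
            votes = Counter(copy[pos] for copy in copies)
            bits.append(votes.most_common(1)[0][0])
    return "".join(bits)[:total_len]


if __name__ == "__main__":
    wm = "1011001110001111"
    subs = encode_segments(wm, payload_len=4)
    # Hypothetical dataset of 12 images, sub-messages assigned round-robin.
    per_image = [subs[i % len(subs)] for i in range(12)]
    print(decode_segments(per_image, len(wm), payload_len=4) == wm)
```

In this sketch each image carries 6 bits instead of the full 16, which illustrates the trade-off the abstract describes: shorter sub-messages reduce the payload per image, while repeating each segment across several images leaves room to correct extraction errors. A smaller `payload_len` stands in for a stricter protection requirement here. The sketch does not protect the index bits, so a corrupted index would misroute that copy.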