
Collision Aware Data Allocation In Multi-tube DNA Storage (2403.14732v1)

Published 21 Mar 2024 in cs.ET

Abstract: DNA storage is a promising archival data storage solution to today's big data problem. A DNA storage system encodes digital data into synthetic DNA sequences for storage and decodes the DNA sequences back into digital data via sequencing. For efficient retrieval of target data, existing Polymerase Chain Reaction (PCR) based DNA storage systems use primers as specific identifiers to tag different sets of DNA strands. However, if a primer collides with any payload in the same DNA tube, the primer cannot safely serve as an identifier and must be disabled in that tube. In a DNA storage system with multiple DNA tubes, primer-payload collisions can spread across all tubes, repeatedly disable many primers, and cause a significant reduction in overall capacity. This paper proposes a collision-aware data allocation scheme that allocates data with different collisions into different tubes, so that a primer banned in one tube because of a primer-payload collision can be reused in other tubes. This allocation increases the number of usable primers across all tubes, thereby enhancing the overall storage capacity. The execution time of the scheme is $O(n^2)$, where $n$ is the number of digital data chunks. The scheme serves as a pre-processing method for any DNA storage system. Evaluation with a state-of-the-art encoding scheme shows that the scheme can increase overall storage capacity by 20% to 25%.
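The effect of collision-aware allocation can be illustrated with a toy model. The sketch below is a hypothetical illustration, not the paper's algorithm. It assumes that a primer collides with a payload when some window of the payload is within `max_mismatch` Hamming distance of the primer, that every tube holds at most `capacity` chunks, and that a tube's usable primers are all primers not banned by any chunk stored in it. The function names (`collides`, `banned_primers`, `allocate`, `usable_primers`) and the greedy placement rule are assumptions chosen for illustration; the greedy rule is a swapped-in heuristic, not the paper's $O(n^2)$ scheme.

```python
from typing import List, Set

# Hypothetical toy model: a primer "collides" with a payload if some window of
# the payload is within max_mismatch Hamming distance of the primer.


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def collides(primer: str, payload: str, max_mismatch: int = 0) -> bool:
    """True if any primer-length window of the payload is close to the primer."""
    k = len(primer)
    return any(
        hamming(primer, payload[i:i + k]) <= max_mismatch
        for i in range(len(payload) - k + 1)
    )


def banned_primers(payload: str, primers: List[str], max_mismatch: int = 0) -> Set[int]:
    """Indices of primers that cannot be used in a tube holding this payload."""
    return {j for j, p in enumerate(primers) if collides(p, payload, max_mismatch)}


def allocate(chunks: List[str], primers: List[str], num_tubes: int,
             capacity: int, max_mismatch: int = 0) -> List[List[int]]:
    """Greedily assign chunk indices to tubes so each placement bans few new primers.

    Assumed heuristic: chunks with the most collisions are placed first, each
    into the open tube where it adds the fewest not-yet-banned primers.
    """
    bans = [banned_primers(c, primers, max_mismatch) for c in chunks]
    tubes: List[List[int]] = [[] for _ in range(num_tubes)]
    tube_bans: List[Set[int]] = [set() for _ in range(num_tubes)]
    order = sorted(range(len(chunks)), key=lambda c: -len(bans[c]))
    for i in order:
        open_tubes = [t for t in range(num_tubes) if len(tubes[t]) < capacity]
        if not open_tubes:
            raise ValueError("not enough tube capacity for all chunks")
        # Prefer the tube whose banned set already covers this chunk's collisions.
        best = min(open_tubes, key=lambda t: len(bans[i] - tube_bans[t]))
        tubes[best].append(i)
        tube_bans[best] |= bans[i]
    return tubes


def usable_primers(tubes: List[List[int]], chunks: List[str],
                   primers: List[str], max_mismatch: int = 0) -> int:
    """Total number of usable primers, summed over all tubes."""
    total = 0
    for tube in tubes:
        banned: Set[int] = set()
        for i in tube:
            banned |= banned_primers(chunks[i], primers, max_mismatch)
        total += len(primers) - len(banned)
    return total
```

As a usage example, take primers `["ACGT", "TTTT", "GGCC"]` and chunks `["AAACGTAA", "CCACGTCC", "AGGCCA", "CAGGCCAC"]` with two tubes of capacity 2. `allocate` returns `[[0, 1], [2, 3]]`, grouping the two chunks that contain `ACGT` in one tube and the two that contain `GGCC` in the other. This leaves 4 usable primers in total, whereas a round-robin placement `[[0, 2], [1, 3]]` bans both `ACGT` and `GGCC` in every tube and leaves only 2. Because each tube's usable count is the primer count minus the size of the union of its chunks' banned sets, grouping chunks with overlapping collisions is what lets a banned primer stay usable elsewhere.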

