Efficiently Enabling Block Semantics and Data Updates in DNA Storage (2212.13447v2)
Abstract: We propose a novel and flexible DNA-storage architecture, which divides the storage space into fixed-size units (blocks) that can be independently and efficiently accessed at random for both read and write operations, and further allows efficient sequential access to consecutive data blocks. In contrast to prior work, in our architecture a pair of random-access PCR primers of length 20 does not define a single object, but an independent storage partition, which is internally blocked and managed independently of other partitions. We expose the flexibility and constraints with which the internal address space of each partition can be managed, and incorporate them into our design to provide rich and functional storage semantics, such as block-storage organization, efficient implementation of data updates, and sequential access. To leverage the full power of the prefix-based nature of PCR addressing, we define a methodology for transforming the internal addressing scheme of a partition into an equivalent that is PCR-compatible. This allows us to run PCR with primers that can be variably elongated to include a desired part of the internal address, and thus narrow down the scope of the reaction to retrieve a specific block or a range of blocks within the partition with sufficiently high accuracy. Our wetlab evaluation demonstrates the practicality of the proposed ideas and a 140x reduction in sequencing cost and latency for retrieval of individual blocks within the partition.
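The prefix-based retrieval described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the paper's method: the 20-nt `PRIMER`, the 3-nt address length, the plain base-4 address encoding, and the helper names are all hypothetical, and exact string-prefix matching stands in for primer binding, which in a real reaction tolerates mismatches. The plain base-4 encoding also ignores primer-design constraints such as GC balance and homopolymer runs, which is the gap the paper's PCR-compatible address transformation is meant to close.

```python
BASES = "ACGT"
PRIMER = "ACGTTGCAAGCTTGACCTGA"  # hypothetical 20-nt primer naming one partition
ADDR_LEN = 3                     # assumed: 3 nucleotides -> 4**3 = 64 blocks


def block_address(index: int, length: int = ADDR_LEN) -> str:
    """Encode a block index as a fixed-length base-4 nucleotide string."""
    digits = []
    for _ in range(length):
        index, d = divmod(index, 4)
        digits.append(BASES[d])
    return "".join(reversed(digits))


def build_partition(n_blocks: int) -> dict:
    """Map block index -> strand (primer + internal address + dummy payload)."""
    return {i: PRIMER + block_address(i) + "TTTT" for i in range(n_blocks)}


def select_blocks(partition: dict, primer: str) -> list:
    """Idealized PCR: keep only strands that start with the given primer."""
    return sorted(i for i, s in partition.items() if s.startswith(primer))


partition = build_partition(64)
target = block_address(27)

# Primer elongated by the full address: a single block.
one_block = select_blocks(partition, PRIMER + target)
# Primer elongated by a 2-nt address prefix: an aligned range of 4 blocks.
block_range = select_blocks(partition, PRIMER + target[:2])
# Unelongated primer: the whole partition.
whole_partition = select_blocks(partition, PRIMER)
```

In this toy model, each nucleotide added to the primer narrows the selection by a factor of four, so a longer elongation trades range size for specificity.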