Unified coding schemes combining address constraints with error robustness

Develop coding schemes for DNA storage that simultaneously enforce address-sequence design constraints (GC-prefix balance, large mutual Hamming distance, mutual uncorrelatedness, and absence of secondary structure) and provide robustness to both synthesis and sequencing errors.
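The Python sketch below checks three of the four address constraints for a candidate address set. It assumes a specific GC-prefix criterion: every prefix of length at least `min_len` must have GC content within `tol` of 50%. That criterion and the default parameter values are illustrative assumptions, not thresholds taken from the source. Secondary-structure avoidance is deliberately not checked, since a meaningful test normally requires a folding model. All function names are hypothetical.

```python
from itertools import combinations


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in seq."""
    return sum(base in "GC" for base in seq) / len(seq)


def is_gc_prefix_balanced(seq: str, min_len: int = 4, tol: float = 0.25) -> bool:
    """Assumed criterion: every prefix of length >= min_len has GC content within tol of 0.5."""
    return all(abs(gc_fraction(seq[:k]) - 0.5) <= tol
               for k in range(min_len, len(seq) + 1))


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length addresses differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def mutually_uncorrelated(codes: list[str]) -> bool:
    """True if no proper prefix of any address equals a suffix of any address (itself included)."""
    return not any(a[:k] == b[-k:]
                   for a in codes for b in codes
                   for k in range(1, min(len(a), len(b))))


def satisfies_address_constraints(codes: list[str], min_distance: int,
                                  min_len: int = 4, tol: float = 0.25) -> bool:
    """Check GC-prefix balance, pairwise Hamming distance and mutual uncorrelatedness.
    Secondary structure is NOT checked by this sketch."""
    balanced = all(is_gc_prefix_balanced(c, min_len, tol) for c in codes)
    distant = all(hamming(a, b) >= min_distance for a, b in combinations(codes, 2))
    return balanced and distant and mutually_uncorrelated(codes)
```

For example, `satisfies_address_constraints(["AACCGGTT", "AAGGCCTT"], min_distance=4)` returns `True`, while `min_distance=5` makes it fail because the two addresses differ in only four positions. Note that a self-correlated address such as `ACGTACGT` (prefix `ACGT` equals suffix `ACGT`) is rejected regardless of the other constraints.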

Background

DNA synthesis and sequencing introduce diverse errors, including substitutions, occasional insertions and deletions (indels), and coverage-related issues such as strands that are under-represented or missing among the reads. Existing approaches each target only some of these error types (e.g., Reed–Solomon and LDPC codes, profile codes, and single-deletion-correcting codes), but none jointly integrates stringent address-design constraints with comprehensive error resilience.
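Single-deletion correction can be illustrated with the classical binary Varshamov–Tenengolts (VT) code and Levenshtein's decoding rule. VT_a(n) is the set of binary words x of length n with sum over i of i·x_i ≡ a (mod n+1), using positions i = 1..n. This is a standard textbook construction shown for intuition, not the scheme of any particular DNA storage system. It operates on bits, so applying it directly to nucleotides would need a quaternary variant such as Tenengolts' nonbinary single-deletion code, which is not shown here. The function names are hypothetical.

```python
from itertools import product


def vt_syndrome(bits) -> int:
    """Sum of i * x_i with 1-based positions i."""
    return sum(i * b for i, b in enumerate(bits, start=1))


def vt_codewords(n: int, a: int = 0) -> list[list[int]]:
    """All binary words of length n in VT_a(n) (brute force, small n only)."""
    return [list(w) for w in product((0, 1), repeat=n)
            if vt_syndrome(w) % (n + 1) == a]


def vt_correct_deletion(y, n: int, a: int = 0) -> list[int]:
    """Recover x in VT_a(n) from y, which is x with exactly one bit deleted."""
    y = list(y)
    w = sum(y)                                  # number of ones in y
    delta = (a - vt_syndrome(y)) % (n + 1)      # checksum deficiency
    if delta <= w:
        # A 0 was deleted: reinsert it so that exactly `delta` ones lie to its right.
        for p in range(len(y), -1, -1):
            if sum(y[p:]) == delta:
                return y[:p] + [0] + y[p:]
    else:
        # A 1 was deleted: reinsert it so that exactly `delta - w - 1` zeros lie to its left.
        zeros_left = delta - w - 1
        for p in range(len(y) + 1):
            if p - sum(y[:p]) == zeros_left:
                return y[:p] + [1] + y[p:]
    raise ValueError("y is not a single-deletion of a VT_a(n) codeword")
```

For example, `1001` lies in VT_0(4), and `vt_correct_deletion([1, 0, 1], 4)` restores it to `[1, 0, 0, 1]`. Deleting any position of any codeword in `vt_codewords(6)` and decoding returns the original word, which is exactly the single-deletion-correcting property.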

The authors emphasize the need for integrated codes that respect the address constraints essential for random access and selection while also correcting errors typical of synthesis and sequencing workflows.
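A naive way to see how the address constraints interact is to filter random candidates greedily. A candidate is kept only if it is GC-prefix balanced, self-uncorrelated, uncorrelated in both directions with every address already kept, and at Hamming distance at least `min_distance` from each of them. This sketch assumes the same illustrative GC-prefix criterion as a hypothetical choice: every prefix of length at least 4 must have GC content between 25% and 75%. It ignores secondary structure and adds no error-correcting redundancy. It is a hypothetical baseline, not a construction from the source, and it does not solve the joint design problem described here. All names are hypothetical.

```python
import random


def is_gc_prefix_balanced(seq: str, min_len: int = 4, tol: float = 0.25) -> bool:
    """Assumed criterion: every prefix of length >= min_len has GC content within tol of 0.5."""
    return all(abs(sum(b in "GC" for b in seq[:k]) / k - 0.5) <= tol
               for k in range(min_len, len(seq) + 1))


def hamming(a: str, b: str) -> int:
    """Positions at which two equal-length addresses differ."""
    return sum(x != y for x, y in zip(a, b))


def correlated(a: str, b: str) -> bool:
    """True if some proper prefix of a equals a suffix of b."""
    return any(a[:k] == b[-k:] for k in range(1, min(len(a), len(b))))


def greedy_addresses(length: int, count: int, min_distance: int,
                     seed: int = 0, max_tries: int = 100_000) -> list[str]:
    """Greedily collect up to `count` addresses satisfying the (assumed) constraints."""
    rng = random.Random(seed)
    chosen: list[str] = []
    for _ in range(max_tries):
        if len(chosen) == count:
            break
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        # Per-address checks: GC-prefix balance and self-uncorrelatedness.
        if not is_gc_prefix_balanced(cand) or correlated(cand, cand):
            continue
        # Pairwise checks against every address kept so far.
        if any(hamming(cand, c) < min_distance or correlated(cand, c) or correlated(c, cand)
               for c in chosen):
            continue
        chosen.append(cand)
    return chosen
```

One consequence this sketch makes visible: mutual uncorrelatedness at prefix length 1 already forces the set of first bases and the set of last bases of the address set to be disjoint, so each constraint shrinks the space left for the others. Adding error-correcting redundancy on top without violating these properties is the part this greedy filter does not attempt.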

References

It remains an open problem to design codes that efficiently combine all the constraints imposed by address design considerations and at the same time provide robustness to both synthesis and sequencing errors.

DNA-Based Storage: Trends and Methods (1507.01611 - Yazdi et al., 2015) in Section “Error-Control Coding for DNA Storage”