Constrained coding upper bounds via Goulden-Jackson cluster theorem (2407.16449v1)
Abstract: Motivated by applications in DNA-based data storage, constrained codes have attracted considerable attention from both academia and industry. We study the maximum cardinality of constrained codes whose constraints can be characterized by a set of forbidden substrings, where a substring means a run of consecutive coordinates in a string. For finite-type constrained codes (those for which the set of forbidden substrings is finite), one can compute the capacity (code rate) by the "spectral method", i.e., by applying the Perron-Frobenius theorem to the de Bruijn graph defined by the code. However, no systematic method was known for computing the exact cardinality of these codes. We show that there is a surprisingly powerful method arising from enumerative combinatorics, based on the Goulden-Jackson cluster theorem (previously not known to the coding community), that can be used to compute not only the capacity but also an exact formula for the cardinality of these codes, for each fixed code length. Moreover, this can be done by solving a system of linear equations whose size equals the number of constraints. We also show that the spectral method and the cluster method are inherently related, by establishing a direct connection between the spectral radius of the de Bruijn graph used in the first method and the radius of convergence of the generating function used in the second. Lastly, to demonstrate the flexibility of the new method, we use it to give an explicit upper bound on the maximum cardinality of variable-length non-overlapping codes, a class of constrained codes defined by an infinite number of forbidden substrings.
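The two methods named in the abstract can be illustrated on a toy constraint. The following is a minimal sketch, not the paper's implementation: `count_avoiding` counts walks in a de Bruijn-style graph (the combinatorial side of the spectral method), and `count_avoiding_word` expands the generating function c(x) / (x^k + (1 - qx) c(x)), the classical single-word special case of the cluster theorem, where c(x) is the autocorrelation polynomial of the forbidden word. Both function names and the restriction to one forbidden word in the second function are assumptions made for illustration; the general multi-word case requires the linear system described in the abstract.

```python
from itertools import product
from math import log2


def count_avoiding(forbidden, n, alphabet="01"):
    """Count length-n strings with no substring in `forbidden` (walk counting)."""
    m = max(len(w) for w in forbidden)

    def ok(s):
        return not any(w in s for w in forbidden)

    if n < m:  # short strings: enumerate directly
        return sum(ok("".join(p)) for p in product(alphabet, repeat=n))
    # Vertices are the allowed strings of length m - 1; each starts one walk.
    walks = {"".join(p): 1
             for p in product(alphabet, repeat=m - 1) if ok("".join(p))}
    for _ in range(n - (m - 1)):
        nxt = {}
        for u, cnt in walks.items():
            for a in alphabet:
                if ok(u + a):  # edge u -> (u + a)[1:] is allowed
                    v = (u + a)[1:]
                    nxt[v] = nxt.get(v, 0) + cnt
        walks = nxt
    return sum(walks.values())


def autocorrelation(w):
    """c[i] = 1 when w overlaps itself at shift i, i.e. w[i:] is a prefix of w."""
    return [1 if w[i:] == w[:len(w) - i] else 0 for i in range(len(w))]


def count_avoiding_word(w, n, q=2):
    """Coefficient of x^n in c(x) / (x^k + (1 - q x) c(x)) for one word w."""
    k = len(w)
    c = autocorrelation(w)
    d = [0] * (k + 1)  # denominator coefficients
    d[k] += 1
    for i, ci in enumerate(c):
        d[i] += ci
        d[i + 1] -= q * ci
    s = []  # power-series coefficients; d[0] == 1, so no division is needed
    for j in range(n + 1):
        cj = c[j] if j < k else 0
        s.append(cj - sum(d[i] * s[j - i] for i in range(1, min(j, k) + 1)))
    return s[n]


if __name__ == "__main__":
    # Binary strings avoiding "11": both counts agree (144 for n = 10).
    print(count_avoiding({"11"}, 10), count_avoiding_word("11", 10))
    # Growth-rate estimate of the capacity from consecutive counts.
    print(log2(count_avoiding({"11"}, 31) / count_avoiding({"11"}, 30)))
```

For the forbidden word 11 the counts are Fibonacci numbers, and the ratio of consecutive counts tends to the golden ratio, so the estimated capacity is about 0.6942 bits per symbol. This matches the abstract's claim that the spectral radius and the radius of convergence of the generating function carry the same information.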