
NP-Completeness for the Space-Optimality of Double-Array Tries (2403.04951v1)

Published 7 Mar 2024 in cs.DS

Abstract: Indexing a set of strings for prefix search or membership queries is a fundamental task with many applications, such as information retrieval and database systems. A classic abstract data type for modelling such an index is the trie. Because the problem is so fundamental, it has attracted much interest, leading to a variety of trie implementations with different characteristics. One implementation that is widely used in practice is the double-array (trie), which consists of merely two integer arrays. While a traversal takes constant time per node visit, the space consumption in computer words can be as large as the product of the number of nodes and the alphabet size. Although several heuristics have been proposed for lowering the space requirements, we are unaware of any theoretical guarantees. In this paper, we study the decision problem of whether there exists a double-array of a given size. To this end, we first draw a connection to the sparse matrix compression problem, which makes our problem NP-complete for alphabet sizes linear in the number of nodes. We further propose a reduction from the restricted directed Hamiltonian path problem, leading to NP-completeness even for logarithmic-sized alphabets.
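
To make the two-array layout concrete, the following is a minimal Python sketch of a double-array built with a simple first-fit placement. It is an illustration under stated assumptions, not the construction studied in the paper: the array names `base` and `check` follow the usual convention for this structure, while the lowercase-only alphabet, the `END` marker, and the helpers `code`, `build_double_array`, and `contains` are hypothetical choices made for the example. A transition from slot `s` on character code `c` goes to slot `t = base[s] + c` and is valid only if `check[t] == s`, which is why each node visit costs constant time.

```python
from collections import deque

END = 1  # assumed code for a hypothetical end-of-word marker


def code(ch):
    """Map 'a'..'z' to codes 2..27 (assumption: lowercase keys only)."""
    return ord(ch) - ord("a") + 2


def build_double_array(words):
    """Build (base, check) with first-fit placement, a simple heuristic."""
    # Step 1: ordinary pointer trie; children[node] maps code -> child node.
    children = [{}]
    for w in words:
        node = 0
        for c in [code(ch) for ch in w] + [END]:
            if c not in children[node]:
                children[node][c] = len(children)
                children.append({})
            node = children[node][c]

    # Step 2: place nodes in BFS order. The root lives in slot 0; every
    # code is >= 1, so slot 0 is never chosen as the target of a transition.
    base, check = [0], [-1]          # check[t] == -1 means slot t is unused
    slot_of = {0: 0}
    queue = deque([0])
    while queue:
        node = queue.popleft()
        labels = children[node]
        if not labels:
            continue                 # leaf: no outgoing transitions
        b = 0
        # First fit: smallest offset b whose target slots are all unused.
        while any(b + c < len(check) and check[b + c] != -1 for c in labels):
            b += 1
        while len(check) < b + max(labels) + 1:
            base.append(0)
            check.append(-1)
        s = slot_of[node]
        base[s] = b
        for c, child in labels.items():
            check[b + c] = s         # slot b + c now belongs to a child of s
            slot_of[child] = b + c
            queue.append(child)
    return base, check


def contains(base, check, word):
    """Membership query: one array lookup and one comparison per character."""
    s = 0
    for c in [code(ch) for ch in word] + [END]:
        t = base[s] + c
        if not 0 < t < len(check) or check[t] != s:
            return False
        s = t
    return True


words = ["bachelor", "jar", "badge", "baby"]
base, check = build_double_array(words)
print(len(check), "slots;", contains(base, check, "badge"))
```

The length of `check` is the size that the decision problem in the abstract asks about. First fit gives no guarantee that this length is minimal; it only shows how the choice of offsets determines the space used.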

