Higher-order Derivatives of Weighted Finite-state Machines
Abstract: Weighted finite-state machines are a fundamental building block of NLP systems. They have withstood the test of time -- from their early use in noisy channel models in the 1990s up to modern-day neurally parameterized conditional random fields. This work examines the computation of higher-order derivatives with respect to the normalization constant for weighted finite-state machines. We provide a general algorithm for evaluating derivatives of all orders, which has not been previously described in the literature. In the case of second-order derivatives, our scheme runs in the optimal $\mathcal{O}(A^2 N^4)$ time where $A$ is the alphabet size and $N$ is the number of states. Our algorithm is significantly faster than prior algorithms. Additionally, our approach leads to a significantly faster algorithm for computing second-order expectations, such as covariance matrices and gradients of first-order expectations.
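To make the second-order quantity concrete, here is a minimal sketch; it is not the paper's algorithm. It assumes a real-weighted machine with initial weights `alpha`, final weights `omega`, and one $N \times N$ transition matrix per symbol, stacked in `W`, whose sum has spectral radius below 1, so that the normalization constant has the closed form $Z = \alpha^\top (I - \sum_a W^{(a)})^{-1} \omega$. The function names `normalizer` and `normalizer_hessian` are hypothetical. The Hessian over all transition weights has $A^2 N^4$ entries, so any method that writes every entry needs at least that many steps.

```python
import numpy as np


def normalizer(alpha, W, omega):
    """Z = alpha^T (I - sum_a W[a])^{-1} omega.

    Assumption: the summed transition matrix has spectral radius < 1,
    so the sum over all paths converges to this closed form.
    """
    M = W.sum(axis=0)                          # (N, N) symbol-summed weights
    K = np.linalg.inv(np.eye(M.shape[0]) - M)  # total weight of all paths i -> j
    return alpha @ K @ omega


def normalizer_hessian(alpha, W, omega):
    """Second derivatives of Z with respect to every pair of transition weights.

    Returns H with shape (A, N, N, A, N, N), where
    H[a, i, j, c, k, l] = d^2 Z / (dW[a, i, j] dW[c, k, l]).
    """
    A, N, _ = W.shape
    M = W.sum(axis=0)
    K = np.linalg.inv(np.eye(N) - M)
    f = alpha @ K   # forward weights: f[i] is the total weight of reaching state i
    b = K @ omega   # backward weights: b[j] is the total weight of finishing from j
    # Differentiating Z = alpha^T K omega twice, using dK = K dM K, gives
    #   d^2 Z / (dM[i, j] dM[k, l]) = f[i] K[j, k] b[l] + f[k] K[l, i] b[j].
    H_M = (np.einsum("i,jk,l->ijkl", f, K, b)
           + np.einsum("k,li,j->ijkl", f, K, b))
    # Z depends on W only through the sum M, so each entry is shared
    # across all symbol pairs (a, c).
    return np.broadcast_to(
        H_M[None, :, :, None, :, :], (A, N, N, A, N, N)
    ).copy()
```

The two `einsum` calls fill $N^4$ entries and the broadcast copies them across the $A^2$ symbol pairs, which matches the $A^2 N^4$ output size. For machines whose weights are functions of shared parameters, this dense tensor would still need to be contracted with the parameter Jacobians, which the sketch leaves out.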