Papers
Topics
Authors
Recent
Search
2000 character limit reached

Efficient Dynamic WFST Decoding for Personalized Language Models

Published 23 Oct 2019 in cs.CL and cs.LG | (1910.10670v1)

Abstract: We propose a two-layer cache mechanism to speed up dynamic WFST decoding with personalized LLMs. The first layer is a public cache that stores most of the static part of the graph. This is shared globally among all users. A second layer is a private cache that caches the graph that represents the personalized LLM, which is only shared by the utterances from a particular user. We also propose two simple yet effective pre-initialization methods, one based on breadth-first search, and another based on a data-driven exploration of decoder states using previous utterances. Experiments with a calling speech recognition task using a personalized contact list demonstrate that the proposed public cache reduces decoding time by factor of three compared to decoding without pre-initialization. Using the private cache provides additional efficiency gains, reducing the decoding time by a factor of five.

Citations (6)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.