Combining Language Models For Specialized Domains: A Colorful Approach (2310.19708v3)

Published 30 Oct 2023 in cs.CL and cs.LG

Abstract: General-purpose language models (LMs) encounter difficulties when processing domain-specific jargon and terminology, which are frequently used in specialized fields such as medicine or industrial settings. Moreover, they often find it challenging to interpret mixed speech that blends general language with specialized jargon. This poses a challenge for automatic speech recognition systems operating within these specific domains. In this work, we introduce a novel approach that integrates a domain-specific or secondary LM into a general-purpose LM. This strategy involves labeling, or "coloring", each word to indicate its association with either the general or the domain-specific LM. We develop an optimized algorithm that extends beam search to effectively handle inference involving colored words. Our evaluations indicate that this approach is highly effective in integrating jargon into language tasks. Notably, our method substantially lowers the error rate for domain-specific words without compromising performance in the general domain.
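To make the coloring idea more concrete, the sketch below shows one way a word-level "color" could be tracked inside a toy beam search that interpolates two LMs. This is only an illustration under simplifying assumptions: the function `colored_beam_search`, the `domain_vocab` membership test, and the color-switch penalty `lam` are hypothetical names and mechanisms, not the paper's actual algorithm or API.

```python
def colored_beam_search(general_lm, domain_lm, domain_vocab, vocab,
                        start=("<s>",), beam_width=4, max_len=10, lam=0.3):
    """Toy beam search in which each candidate word is 'colored'.

    Words in `domain_vocab` are scored by the domain-specific LM (color 'D');
    all other words are scored by the general-purpose LM (color 'G').
    `general_lm` and `domain_lm` are assumed to be callables mapping
    (history_tuple, word) -> log-probability. `lam` is an assumed penalty
    applied whenever the color changes between consecutive words.
    """
    # Each hypothesis is (cumulative log score, [(word, color), ...]).
    beams = [(0.0, [(w, "G") for w in start])]
    for _ in range(max_len):
        candidates = []
        for score, hyp in beams:
            history = tuple(w for w, _ in hyp)
            for word in vocab:
                if word in domain_vocab:
                    logp, color = domain_lm(history, word), "D"
                else:
                    logp, color = general_lm(history, word), "G"
                # Penalize switching colors, a stand-in for handling
                # mixed general/domain speech within one utterance.
                switch_pen = lam if hyp and hyp[-1][1] != color else 0.0
                candidates.append((score + logp - switch_pen,
                                   hyp + [(word, color)]))
        # Keep the top-scoring hypotheses for the next step.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beams
```

In this sketch the color of each word simply records which LM scored it, so the combined model can defer to the domain LM for jargon while leaving general-domain scoring untouched; the paper's optimized beam-search integration is more involved than this toy version.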

Authors (10)
  1. Daniel Eitan
  2. Menachem Pirchi
  3. Neta Glazer
  4. Shai Meital
  5. Gil Ayach
  6. Gidon Krendel
  7. Aviv Shamsian
  8. Aviv Navon
  9. Gil Hetz
  10. Joseph Keshet
Citations (1)
