Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 75 tok/s
Gemini 2.5 Pro 48 tok/s Pro
GPT-5 Medium 39 tok/s Pro
GPT-5 High 35 tok/s Pro
GPT-4o 131 tok/s Pro
Kimi K2 168 tok/s Pro
GPT OSS 120B 440 tok/s Pro
Claude Sonnet 4.5 36 tok/s Pro
2000 character limit reached

Cross-Domain Content Generation with Domain-Specific Small Language Models (2409.17171v2)

Published 19 Sep 2024 in cs.CL and cs.AI

Abstract: Generating domain-specific content using small LLMs poses challenges, especially when dealing with multiple distinct datasets with minimal overlap. In this study, we explore methods to enable a small LLM to produce coherent and relevant outputs for two different domains: stories (Dataset A) and recipes (Dataset B). Our initial experiments show that training individual models on each dataset yields satisfactory results, with each model generating appropriate content within its domain. We find that utilizing custom tokenizers tailored to each dataset significantly enhances generation quality compared to using a generic tokenizer. Attempts to adapt a single model to both domains using Low-Rank Adaptation (LoRA) or standard fine-tuning do not yield substantial results, often failing to produce meaningful outputs. Moreover, full fine-tuning without freezing the model's existing weights leads to catastrophic forgetting, where the model loses previously learned information and only retains knowledge from the new data. To overcome these challenges, we employ a knowledge expansion strategy: training only with additional parameters. This approach enables the model to generate both stories and recipes upon request, effectively handling multiple domains without suffering from catastrophic forgetting. Our findings demonstrate that knowledge expansion with frozen layers is an effective method for small LLMs to generate domain-specific content across distinct datasets. This work contributes to the development of efficient multi-domain LLMs and provides insights into managing catastrophic forgetting in small-scale architectures.

Summary

We haven't generated a summary for this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets

This paper has been mentioned in 2 posts and received 1 like.