
AF Adapter: Continual Pretraining for Building Chinese Biomedical Language Model (2211.11363v2)

Published 21 Nov 2022 in cs.CL and cs.LG

Abstract: Continual pretraining is a popular way of building a domain-specific pretrained language model from a general-domain one. Despite its high efficiency, continual pretraining suffers from catastrophic forgetting, which may harm the model's performance on downstream tasks. To alleviate this issue, in this paper we propose a continual pretraining method for BERT-based models, named Attention-FFN Adapter. Its main idea is to introduce a small number of additional attention heads and hidden units inside each self-attention layer and feed-forward network. Furthermore, we train a domain-specific language model, AF Adapter-based RoBERTa, for the Chinese biomedical domain. In experiments, the models are evaluated on downstream tasks. The results demonstrate that, with only about 17% of the model parameters trained, AF Adapter achieves average performance gains of 0.6% and 2% compared to strong baselines. Further experimental results show that our method alleviates the catastrophic forgetting problem by 11% compared to the fine-tuning method.
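The abstract only sketches the architecture, so here is a minimal PyTorch sketch of the feed-forward half of the idea: the pretrained FFN weights stay frozen while a small block of newly added hidden units is trained, which (up to an extra bias term) is equivalent to widening the FFN's intermediate layer. The attention half follows the same pattern, adding a few new trainable heads alongside the frozen pretrained ones. Class and parameter names (`AFAdapterFFN`, `extra_units`) and the BERT-base sizes are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class AFAdapterFFN(nn.Module):
    """Sketch of the FFN part of an Attention-FFN Adapter (assumed naming):
    a frozen pretrained feed-forward block widened by a small trainable branch."""

    def __init__(self, hidden_size: int = 768,
                 ffn_size: int = 3072,
                 extra_units: int = 256):
        super().__init__()
        # Original pretrained FFN projections: frozen during continual pretraining.
        self.up = nn.Linear(hidden_size, ffn_size)
        self.down = nn.Linear(ffn_size, hidden_size)
        for p in (*self.up.parameters(), *self.down.parameters()):
            p.requires_grad = False

        # Newly introduced hidden units: the only trainable FFN parameters.
        self.up_new = nn.Linear(hidden_size, extra_units)
        self.down_new = nn.Linear(extra_units, hidden_size)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Summing the frozen branch and the new branch behaves like a wider FFN.
        return self.down(self.act(self.up(x))) + self.down_new(self.act(self.up_new(x)))


if __name__ == "__main__":
    layer = AFAdapterFFN()
    out = layer(torch.randn(2, 16, 768))  # (batch, seq_len, hidden)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(out.shape, f"{trainable / total:.1%} of FFN parameters trainable")
```

In this toy configuration only a few percent of the FFN parameters are trainable, illustrating how the method can keep the trained fraction of the full model small (about 17% in the paper) while leaving the pretrained weights untouched.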

Authors (6)
  1. Yongyu Yan (1 paper)
  2. Kui Xue (10 papers)
  3. Xiaoming Shi (40 papers)
  4. Qi Ye (67 papers)
  5. Jingping Liu (18 papers)
  6. Tong Ruan (22 papers)