BayesPrompt: Prompting Large-Scale Pre-Trained Language Models on Few-shot Inference via Debiased Domain Abstraction (2401.14166v3)
Abstract: As a novel and effective fine-tuning paradigm based on large-scale pre-trained language models (PLMs), prompt-tuning aims to reduce the gap between downstream tasks and pre-training objectives. While prompt-tuning has yielded continuous advances across various tasks, a persistent defect remains: prompt-tuning methods fail to generalize to specific few-shot patterns. From the perspective of distribution analyses, we disclose that the intrinsic issues behind this phenomenon are the over-abundant conceptual knowledge contained in PLMs and the abridged knowledge of target downstream domains, which jointly cause PLMs to mis-locate the knowledge distributions of the target domains in the universal knowledge embedding space. To this end, we explore approximating the unabridged target domains of downstream tasks in a debiased manner, and then abstracting such domains to generate discriminative prompts, thereby providing unambiguous guidance for PLMs. Guided by this intuition, we propose a simple yet effective approach, BayesPrompt, which learns prompts that carry domain-discriminative information while resisting interference from domain-irrelevant knowledge. BayesPrompt first leverages known distributions to approximate the debiased factual distributions of target domains, and then uniformly samples representative features from the approximated distributions to generate the ultimate prompts for PLMs. We provide theoretical insights connecting this construction to domain adaptation. Empirically, our method achieves state-of-the-art performance on benchmark datasets.
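The abstract describes a two-step recipe: approximate the target downstream domain with a known distribution, then sample representative features from that approximation to build prompts for the PLM. Below is a minimal sketch of that recipe, assuming a frozen HuggingFace-style encoder, a Gaussian mixture as the "known distribution", and a hypothetical `embed_support_set` helper; this is not the authors' implementation, and all hyperparameters are illustrative.

```python
"""Illustrative sketch of the BayesPrompt idea from the abstract:
approximate the (debiased) target-domain distribution with a known
parametric family, then sample representative features from it to
serve as soft-prompt embeddings. Assumptions: Gaussian mixture as the
known distribution, mean-pooled encoder features, made-up sizes."""
import numpy as np
import torch
from sklearn.mixture import GaussianMixture


def embed_support_set(encoder, tokenizer, support_texts, device="cpu"):
    """Hypothetical helper: encode the few-shot support set of the target
    domain into fixed-size feature vectors using a frozen PLM encoder."""
    feats = []
    with torch.no_grad():
        for text in support_texts:
            inputs = tokenizer(text, return_tensors="pt", truncation=True).to(device)
            hidden = encoder(**inputs).last_hidden_state      # (1, seq_len, dim)
            feats.append(hidden.mean(dim=1).squeeze(0).cpu().numpy())  # mean-pool
    return np.stack(feats)                                     # (num_examples, dim)


def approximate_domain_distribution(features, n_components=4, seed=0):
    """Fit a known distribution (here: a diagonal Gaussian mixture) to the
    few-shot domain features, standing in for the debiased target domain."""
    n_components = min(n_components, len(features))            # few-shot safety
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=seed)
    gmm.fit(features)
    return gmm


def sample_prompt_embeddings(gmm, n_prompt_tokens=8, seed=0):
    """Uniformly sample representative features from the approximated
    distribution and return them as soft-prompt token embeddings."""
    rng = np.random.default_rng(seed)
    # Sample components uniformly rather than by mixture weight, so every
    # captured mode of the domain contributes to the prompt.
    comps = rng.integers(0, gmm.n_components, size=n_prompt_tokens)
    samples = [rng.normal(gmm.means_[c], np.sqrt(gmm.covariances_[c]))
               for c in comps]
    return torch.tensor(np.stack(samples), dtype=torch.float32)  # (n_prompt_tokens, dim)
```

In a full pipeline, the sampled vectors would be prepended to the input embeddings as soft prompt tokens and possibly tuned further; the paper's theoretical connection to domain adaptation is not reflected in this sketch.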
Authors: Jiangmeng Li, Fei Song, Yifan Jin, Wenwen Qiang, Changwen Zheng, Fuchun Sun, Hui Xiong