Pushing the Limits of ChatGPT on NLP Tasks (2306.09719v2)
Abstract: Despite the success of ChatGPT, its performance on most NLP tasks is still well below supervised baselines. In this work, we investigate the causes and find that its subpar performance stems from the following factors: (1) the token limit of the prompt prevents full utilization of supervised datasets; (2) the mismatch between ChatGPT's generative nature and many NLP tasks; (3) intrinsic pitfalls of LLMs, e.g., hallucination and overemphasis on certain keywords. We propose a collection of general modules to address these issues, in an attempt to push the limits of ChatGPT on NLP tasks. The proposed modules include (1) a one-input-multiple-prompts strategy that employs multiple prompts for one input to accommodate more demonstrations; (2) using fine-tuned models for better demonstration retrieval; (3) transforming tasks into formats better suited to generation; (4) employing reasoning strategies tailored to task-specific complexity; (5) a self-verification strategy to mitigate hallucination; (6) a paraphrase strategy to improve the robustness of model predictions. We conduct experiments on 21 datasets covering 10 representative NLP tasks: question answering, commonsense reasoning, natural language inference, sentiment analysis, named entity recognition, entity-relation extraction, event extraction, dependency parsing, semantic role labeling, and part-of-speech tagging. Using the proposed assembly of techniques, we significantly boost the performance of ChatGPT on the selected NLP tasks, achieving results comparable to or better than supervised baselines, and in some cases surpassing existing SOTA performance.
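Of the six modules, the one-input-multiple-prompts strategy is the most mechanical, and the sketch below illustrates one plausible reading of it: demonstrations that do not fit into a single context window are split across several prompts, each prompt is answered independently, and the answers are aggregated. This is a minimal sketch under stated assumptions, not the paper's implementation: `chat` is a hypothetical callable wrapping the ChatGPT API, `demos_per_prompt` is an arbitrary split size, and majority voting is used as a stand-in aggregation rule (the abstract does not specify how per-prompt answers are combined).

```python
# Minimal sketch of the one-input-multiple-prompts strategy: a demonstration
# pool too large for a single context window is split across several prompts,
# each prompt is answered independently, and the answers are aggregated.
from collections import Counter
from typing import Callable, List


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split the demonstration pool into prompt-sized groups."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def one_input_multiple_prompts(chat: Callable[[str], str],
                               task_instruction: str,
                               demonstrations: List[str],
                               test_input: str,
                               demos_per_prompt: int = 8) -> str:
    # Fall back to a single zero-shot prompt if no demonstrations are given.
    groups = chunk(demonstrations, demos_per_prompt) or [[]]
    answers = []
    for group in groups:
        prompt = "\n\n".join([task_instruction, *group,
                              f"Input: {test_input}\nAnswer:"])
        answers.append(chat(prompt).strip())
    # Aggregate per-prompt predictions; majority voting is an assumed rule,
    # not necessarily the aggregation used in the paper.
    return Counter(answers).most_common(1)[0][0]
```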
Authors: Xiaofei Sun, Linfeng Dong, Xiaoya Li, Zhen Wan, Shuhe Wang, Tianwei Zhang, Jiwei Li, Fei Cheng, Lingjuan Lyu, Fei Wu, Guoyin Wang