SEME at SemEval-2024 Task 2: Comparing Masked and Generative Language Models on Natural Language Inference for Clinical Trials (2404.03977v1)
Abstract: This paper describes our submission to Task 2 of SemEval-2024: Safe Biomedical Natural Language Inference for Clinical Trials. The Multi-evidence Natural Language Inference for Clinical Trial Data (NLI4CT) task is a Textual Entailment (TE) task focused on evaluating the consistency and faithfulness of Natural Language Inference (NLI) models applied to Clinical Trial Reports (CTRs). We test two distinct approaches: one based on finetuning and ensembling Masked Language Models, and the other based on prompting Large Language Models with templates, in particular Chain-of-Thought and Contrastive Chain-of-Thought prompting. Prompting Flan-T5-large in a 2-shot setting yields our best system, which achieves an F1 score of 0.57, a Faithfulness of 0.64, and a Consistency of 0.56.
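To make the prompting setup concrete, below is a minimal sketch of 2-shot Chain-of-Thought prompting with Flan-T5-large, using the Hugging Face transformers library. The prompt wording, the two in-context demonstrations, and the `classify` helper are illustrative assumptions, not the authors' actual templates or pipeline.

```python
# Minimal sketch (not the authors' exact prompts): 2-shot Chain-of-Thought
# prompting of Flan-T5-large for entailment vs. contradiction on a clinical
# trial premise/statement pair, via the Hugging Face transformers library.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Two hypothetical in-context demonstrations, each with a short reasoning chain.
demonstrations = (
    "Premise: The primary outcome was overall survival at 12 months.\n"
    "Statement: The trial measured survival after one year.\n"
    "Reasoning: Overall survival at 12 months is survival after one year, "
    "so the statement is supported.\nAnswer: Entailment\n\n"
    "Premise: Patients received 50 mg of the drug daily.\n"
    "Statement: Patients received the drug weekly.\n"
    "Reasoning: Daily dosing contradicts weekly dosing, so the statement is "
    "not supported.\nAnswer: Contradiction\n\n"
)

def classify(premise: str, statement: str) -> str:
    """Return the model's reasoning chain and Entailment/Contradiction answer."""
    prompt = (
        demonstrations
        + f"Premise: {premise}\nStatement: {statement}\n"
        + "Reasoning:"
    )
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(classify(
    "Adverse events were reported in 12% of the intervention arm.",
    "No adverse events occurred in the intervention arm.",
))
```

A Contrastive Chain-of-Thought variant would follow the same pattern, additionally including demonstrations with deliberately incorrect reasoning chains so the model can contrast valid and invalid rationales.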
Authors: Mathilde Aguiar, Pierre Zweigenbaum, Nona Naderi