Text Simplification of Scientific Texts for Non-Expert Readers (2307.03569v1)
Abstract: Reading levels are highly individual and can depend on a text's language, a person's cognitive abilities, or their knowledge of a topic. Text simplification is the task of rephrasing a text to better cater to the abilities of a specific target reader group. Simplifying scientific abstracts helps non-experts access the core information by bypassing formulations that require domain or expert knowledge. This is especially relevant for, e.g., cancer patients reading about novel treatment options. The SimpleText lab hosts Task 3, the simplification of scientific abstracts for non-experts, to advance this field. We contribute three runs employing out-of-the-box summarization models (two based on T5, one based on PEGASUS) and one run using ChatGPT with complex phrase identification.
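Since the abstract names the models but shows no code, here is a minimal sketch of how such out-of-the-box runs can be produced with the Hugging Face `transformers` library. The checkpoints (`t5-base`, `google/pegasus-xsum`), the sample input, and the generation parameters are illustrative assumptions, not the lab's exact configuration.

```python
from transformers import pipeline

# Hypothetical input; the actual SimpleText Task 3 source abstracts differ.
abstract = (
    "Reading levels are highly individual and can depend on a text's "
    "language, a person's cognitive abilities, or their knowledge of a topic."
)

# Two off-the-shelf abstractive summarization checkpoints, used without any
# task-specific fine-tuning (the pipeline prepends T5's "summarize: " task
# prefix automatically).
for model_name in ("t5-base", "google/pegasus-xsum"):
    summarizer = pipeline("summarization", model=model_name)
    result = summarizer(abstract, max_length=60, min_length=10, do_sample=False)
    print(f"{model_name}: {result[0]['summary_text']}")
```

Treating simplification as summarization in this way shortens the text rather than rewriting it sentence by sentence; the ChatGPT run instead couples prompting with complex phrase identification, as described in the abstract.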