A Library for Automatic Natural Language Generation of Spanish Texts (2405.17280v1)
Abstract: In this article we present a novel system for natural language generation (NLG) of Spanish sentences from a minimum set of meaningful words (such as nouns, verbs and adjectives) which, unlike other state-of-the-art solutions, performs the NLG task in a fully automatic way, exploiting both knowledge-based and statistical approaches. Relying on its linguistic knowledge of vocabulary and grammar, the system is able to generate complete, coherent and correctly spelled sentences from the sets of main words supplied by the user. The system, which was designed to be integrable, portable and efficient, is designed to be easily adapted to other languages and can feasibly be integrated in a wide range of digital devices. During its development we also created a supplementary lexicon for Spanish, aLexiS, with wide coverage and high precision, as well as syntactic trees from a freely available definite-clause grammar. The resulting NLG library has been evaluated both automatically and manually (through human annotation). The system can potentially be used in different application domains such as augmentative communication and automatic generation of administrative reports or news.