EaSyGuide : ESG Issue Identification Framework leveraging Abilities of Generative Large Language Models (2306.06662v2)
Abstract: This paper presents our participation in the FinNLP-2023 shared task on multi-lingual environmental, social, and corporate governance issue identification (ML-ESG). The task's objective is to classify news articles based on the 35 ESG key issues defined by the MSCI ESG rating guidelines. Our approach focuses on the English and French subtasks, employing the Cerebras-GPT, OPT, and Pythia models, along with zero-shot and GPT3Mix augmentation techniques. We utilize various encoder models, such as RoBERTa, DeBERTa, and FinBERT, subjecting them to knowledge distillation and additional training. Our approach yielded strong results, securing first place in the English text subtask with an F1-score of 0.69 and second place in the French text subtask with an F1-score of 0.78. These outcomes underscore the effectiveness of our methodology in identifying ESG issues in news articles across different languages. Our findings contribute to the exploration of ESG topics and highlight the potential of leveraging advanced LLMs for ESG issue identification.
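The abstract mentions subjecting encoder models such as RoBERTa and FinBERT to knowledge distillation. As a minimal, framework-free sketch of the standard soft-label distillation objective (the temperature, example logits, and function names below are illustrative assumptions, not the paper's actual settings):

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of raw logits.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) between temperature-softened distributions,
    # scaled by T^2 as in the classic soft-label distillation formulation.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Hypothetical 3-class example: a student close to the teacher yields a
# small loss, an identical student yields zero.
teacher = [2.0, 0.5, -1.0]
student = [1.8, 0.6, -0.9]
print(distillation_loss(teacher, student))
```

In practice the teacher would be a larger fine-tuned model (or an ensemble) producing soft labels over the 35 ESG key issues, and this loss would be combined with the usual cross-entropy on the gold labels.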
- Dogu Araci. 2019. FinBERT: Financial sentiment analysis with pre-trained language models.
- Stella Biderman et al. 2023. Pythia: A suite for analyzing large language models across training and scaling.
- Multi-lingual ESG issue identification. In Proceedings of the Fifth Workshop on Financial Technology and Natural Language Processing (FinNLP) and the Second Multimodal AI For Financial Forecasting (Muffin).
- Nolan Dey et al. 2023. Cerebras-GPT: Open compute-optimal language models trained on the Cerebras Wafer-Scale Cluster.
- The social origins of ESG? An analysis of Innovest and KLD. SSRN Electronic Journal, 40:1–40. Last revised: September 14, 2019.
- Naman Goyal et al. 2021. Larger-scale transformers for multilingual masked language modeling.
- Gunnar Friede, Timo Busch, and Alexander Bassen. 2015. ESG and financial performance: aggregated evidence from more than 2000 empirical studies. Journal of Sustainable Finance & Investment, pages 210–233.
- Pengcheng He, Jianfeng Gao, and Weizhu Chen. 2021. DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing.
- Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. 2021. DeBERTa: Decoding-enhanced BERT with disentangled attention.
- Juyeon Kang and Ismail El Maarouf. 2022. FinSim4-ESG shared task: Learning semantic similarities for the financial domain. extended edition to ESG insights. In Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP), pages 211–217, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
- Yinhan Liu et al. 2019. RoBERTa: A robustly optimized BERT pretraining approach.
- Ilya Loshchilov and Frank Hutter. 2019. Decoupled weight decay regularization.
- Dymitr Ruta and Bogdan Gabrys. 2005. Classifier selection for majority voting. Information Fusion, 6(1):63–81. Diversity in Multiple Classifier Systems.
- Yongqin Xian, Christoph H. Lampert, Bernt Schiele, and Zeynep Akata. 2018. Zero-shot learning – a comprehensive evaluation of the good, the bad and the ugly.
- Kang Min Yoo et al. 2021. GPT3Mix: Leveraging large-scale language models for text augmentation. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2225–2239, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Susan Zhang et al. 2022. OPT: Open pre-trained transformer language models.