An Exploratory Literature Study on Sharing and Energy Use of Language Models for Source Code (2307.02443v1)
Abstract: Large language models (LLMs) trained on source code can support a variety of software development tasks, such as code recommendation and program repair. Large amounts of training data benefit model performance, but the size of the data and models results in long training times and high energy consumption. While publishing source code enables replicability, users must repeat the expensive training process if trained models are not shared. The main goal of this study is to investigate whether publications that train LLMs for software engineering (SE) tasks share their source code and trained artifacts. A second goal is to analyze how transparently training energy usage is reported. We perform a snowballing-based literature search to find publications on LLMs for source code and analyze their reusability from a sustainability standpoint. From 494 unique publications, we identify 293 relevant publications that use LLMs to address code-related tasks. Among them, 27% (79 out of 293) make artifacts available for reuse, either as tools or IDE plugins designed for specific tasks or as task-agnostic models that can be fine-tuned for a variety of downstream tasks. Moreover, we collect insights on the hardware used for model training and on training time, which together determine the energy consumption of model development. We find deficiencies in the sharing of information and artifacts in current studies on source code models for SE tasks, with 40% of the surveyed papers sharing neither source code nor trained artifacts. We recommend sharing source code as well as trained artifacts to enable sustainable reproducibility. Moreover, comprehensive information on training times and hardware configurations should be shared for transparency regarding a model's carbon footprint.
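The abstract argues that hardware configuration and training time together determine a model's training energy consumption and, in turn, its carbon footprint. As a rough illustration of why reporting both matters, the following minimal Python sketch estimates energy and CO2-equivalent emissions from assumed GPU power draw, training duration, data-center overhead (PUE), and grid carbon intensity. It is not taken from the paper; all function names and numeric values are illustrative assumptions in the spirit of the carbon-estimation approaches the study discusses.

```python
# Illustrative sketch (assumptions only): estimating training energy and CO2e
# from hardware power draw and training time.

def training_energy_kwh(gpu_count: int, gpu_power_watts: float,
                        training_hours: float, pue: float = 1.5) -> float:
    """Energy drawn by the GPUs over the full training run, scaled by the
    data-center Power Usage Effectiveness (PUE) to include cooling overhead."""
    return gpu_count * gpu_power_watts * training_hours / 1000.0 * pue

def carbon_footprint_kgco2e(energy_kwh: float,
                            grid_intensity_kg_per_kwh: float = 0.475) -> float:
    """CO2-equivalent emissions for the consumed energy; the grid carbon
    intensity used here is an assumed average and varies strongly by region."""
    return energy_kwh * grid_intensity_kg_per_kwh

if __name__ == "__main__":
    # Hypothetical setup: 8 GPUs drawing ~300 W each, trained for 72 hours.
    energy = training_energy_kwh(gpu_count=8, gpu_power_watts=300, training_hours=72)
    print(f"Estimated energy: {energy:.1f} kWh")
    print(f"Estimated footprint: {carbon_footprint_kgco2e(energy):.1f} kg CO2e")
```

Even this simple estimate requires the GPU model (for its power draw), the number of devices, and the total training time, which is why the study recommends that papers report these details alongside released artifacts.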
Authors: Max Hort, Anastasiia Grishina, Leon Moonen