Teaching Machines to Code: Smart Contract Translation with LLMs (2403.09740v1)
Abstract: The advent of LLMs marks a significant milestone in artificial intelligence, with their capabilities often matching or surpassing human expertise across domains. Among these achievements, their adeptness at translation stands out, closely mimicking the intricate, preparatory processes human translators undertake to ensure the fidelity and quality of the translated content. Despite advances in using LLMs to translate programming code between languages, smart contract translation, particularly into languages the LLM has never encountered, remains largely unexplored. In our research, we present SolMover, a pioneering approach that harnesses the synergy of two distinct LLMs within a unified framework. The framework is designed to grasp coding principles and apply that understanding when translating code into an unfamiliar language. Our study examines the capacity of LLMs to mimic human learning processes, offering an in-depth evaluation of our methodology for converting smart contracts written in Solidity into Move, a language with limited resources. The framework employs one LLM to distill the coding conventions of the new language into a blueprint for the second LLM, which possesses coding expertise but lacks planning abilities. Empirical evidence from our experiments suggests that SolMover substantially improves performance over gpt-3.5-turbo-1106 and achieves superior results against competitors such as Palm2 and Mixtral-8x7B-Instruct. Additionally, our analysis highlights the efficacy of our bug-mitigation strategy in elevating code quality across all models, even outside the SolMover framework.
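The abstract describes a planner–coder division of labor plus a compiler-driven bug-mitigation loop. As a rough illustration of how such a pipeline might be wired together, here is a minimal Python sketch; `translate_contract`, `plan_llm`, `code_llm`, and `compile_move` are hypothetical placeholders of our own, not APIs from the paper.

```python
from typing import Callable, Tuple

# Hypothetical sketch of a SolMover-style two-LLM pipeline (names are
# ours, not the paper's): a "planner" LLM distills Move coding
# conventions into a blueprint, a "coder" LLM writes Move code from
# that blueprint, and compiler diagnostics drive an iterative
# bug-mitigation loop.

LLM = Callable[[str], str]                    # prompt -> completion
Compiler = Callable[[str], Tuple[bool, str]]  # source -> (ok, diagnostics)

def translate_contract(
    plan_llm: LLM,
    code_llm: LLM,
    compile_move: Compiler,
    solidity_src: str,
    move_conventions: str,
    max_repairs: int = 3,
) -> str:
    # Step 1: the planner LLM turns the Solidity contract plus retrieved
    # Move documentation into a step-by-step translation blueprint.
    plan = plan_llm(
        "Using these Move conventions:\n" + move_conventions +
        "\n\nWrite a step-by-step plan to port this Solidity contract:\n" +
        solidity_src
    )

    # Step 2: the coder LLM, which can write code but cannot plan,
    # follows the blueprint to emit Move source.
    move_src = code_llm(
        "Follow this plan to translate the contract into Move:\n" + plan +
        "\n\nSolidity source:\n" + solidity_src
    )

    # Step 3: bug mitigation -- feed compiler errors back to the coder
    # until the code compiles or the repair budget is exhausted.
    for _ in range(max_repairs):
        ok, diagnostics = compile_move(move_src)
        if ok:
            break
        move_src = code_llm(
            "This Move code fails to compile:\n" + move_src +
            "\n\nCompiler output:\n" + diagnostics +
            "\n\nReturn a corrected version."
        )
    return move_src
```

Any concrete backend, for example a gpt-3.5-turbo-1106 client for the coder and the Move compiler invoked as a subprocess, could be passed in as the callables, keeping the loop itself model-agnostic.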
Authors: Rabimba Karanjai, Lei Xu, Weidong Shi