AstBERT: Enabling Language Model for Financial Code Understanding with Abstract Syntax Trees (2201.07984v4)

Published 20 Jan 2022 in cs.AI and cs.LG

Abstract: Using pre-trained language models to understand source code has attracted increasing attention from financial institutions owing to their great potential to uncover financial risks. However, there are several challenges in applying these models directly to programming-language-related problems. For instance, the domain shift between natural language (NL) and programming language (PL) requires understanding the semantic and syntactic information in the data from different perspectives. To this end, we propose AstBERT, a pre-trained PL model that aims to better understand financial code using the abstract syntax tree (AST). Specifically, we collect a large volume of source code (both Java and Python) from the Alipay code repository and incorporate both syntactic and semantic code knowledge into our model with the help of code parsers, through which AST information of the source code can be interpreted and integrated. We evaluate the proposed model on three tasks: code question answering, code clone detection, and code refinement. Experimental results show that AstBERT achieves promising performance on all three downstream tasks.
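The abstract describes extracting AST information from source code with a parser so it can be integrated into the model's input. As an illustration only (not the authors' pipeline, which uses its own Java and Python parsers), a minimal sketch with Python's built-in `ast` module shows the kind of syntactic structure such a parser exposes:

```python
import ast

def ast_node_types(source: str) -> list[str]:
    """Parse Python source and return AST node type names in walk order.

    A stand-in for the "code parser" step in the abstract: the syntactic
    structure of the code is made explicit so it can be consumed by a
    model alongside the raw tokens.
    """
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]

snippet = "def add(a, b):\n    return a + b\n"
print(ast_node_types(snippet))
```

Node types such as `FunctionDef`, `Return`, and `BinOp` carry syntactic information that plain token sequences do not, which is the signal an AST-aware model like AstBERT is designed to exploit.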

Authors (6)
  1. Rong Liang (2 papers)
  2. Tiehua Zhang (27 papers)
  3. Yujie Lu (42 papers)
  4. Yuze Liu (11 papers)
  5. Zhen Huang (114 papers)
  6. Xin Chen (457 papers)
Citations (2)
