Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

CAST: Enhancing Code Summarization with Hierarchical Splitting and Reconstruction of Abstract Syntax Trees (2108.12987v2)

Published 30 Aug 2021 in cs.SE

Abstract: Code summarization aims to generate concise natural language descriptions of source code, which can help improve program comprehension and maintenance. Recent studies show that syntactic and structural information extracted from abstract syntax trees (ASTs) is conducive to summary generation. However, existing approaches fail to fully capture the rich information in ASTs because of the large size/depth of ASTs. In this paper, we propose a novel model CAST that hierarchically splits and reconstructs ASTs. First, we hierarchically split a large AST into a set of subtrees and utilize a recursive neural network to encode the subtrees. Then, we aggregate the embeddings of subtrees by reconstructing the split ASTs to get the representation of the complete AST. Finally, AST representation, together with source code embedding obtained by a vanilla code token encoder, is used for code summarization. Extensive experiments, including the ablation study and the human evaluation, on benchmarks have demonstrated the power of CAST. To facilitate reproducibility, our code and data are available at https://anonymous.4open.science/r/CAST/.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (7)
  1. Ensheng Shi (16 papers)
  2. Yanlin Wang (76 papers)
  3. Lun Du (50 papers)
  4. Hongyu Zhang (147 papers)
  5. Shi Han (74 papers)
  6. Dongmei Zhang (193 papers)
  7. Hongbin Sun (119 papers)
Citations (55)

Summary

We haven't generated a summary for this paper yet.