Design and Implementation of Code Completion System Based on LLM and CodeBERT Hybrid Subsystem (2509.08215v1)

Published 10 Sep 2025 in cs.DC

Abstract: In the rapidly evolving industry of software development, coding efficiency and accuracy play significant roles in delivering high-quality software. Various code suggestion and completion tools, such as CodeBERT from Microsoft and GPT-3.5 from OpenAI, have been developed using deep learning techniques and integrated into IDEs to assist software engineers' development. Research has shown that CodeBERT has outstanding performance in code summarization and capturing code semantics, while GPT-3.5 demonstrated adept capability at code generation. This study focuses on implementing a hybrid model that integrates CodeBERT and GPT-3.5 models to accomplish code suggestion and autocomplete tasks, leveraging the context-aware effectiveness of CodeBERT and taking advantage of the advanced code generation abilities of GPT-3.5. Evaluated on three main metrics: accuracy, quality of generated code, and performance efficiency with various software and hardware, the hybrid model outperforms benchmarks, demonstrating its feasibility and effectiveness. Robustness testing further confirms the reliability and stability of the hybrid model. This study not only emphasizes the importance of deep learning in the software development industry, but also reveals the potential of synthesizing complementary deep learning models to fully exploit the strengths of each model.

Summary

  • The paper demonstrates that fusing CodeBERT and GPT-3.5 yields a 13.75% improvement in F1-score for code token prediction.
  • It leverages a dynamic, attention-based fusion layer to efficiently balance contextual encoding with generative synthesis.
  • Empirical results confirm superior code executability and robustness, supporting its practical integration into modern IDEs.

Hybrid Code Completion via CodeBERT and GPT-3.5: System Design and Empirical Evaluation

Introduction

The paper presents a hybrid code completion system that integrates CodeBERT and GPT-3.5, targeting enhanced code suggestion and autocompletion in software development environments. The motivation stems from the complementary strengths of CodeBERT—contextual code understanding—and GPT-3.5—generative code synthesis. The system is evaluated on the CodeXGLUE Python code completion benchmark, with a focus on accuracy, code generation quality, efficiency, and robustness. The results demonstrate that the hybrid approach yields superior performance compared to standalone models.

System Architecture

The proposed architecture consists of two primary subsystems:

  • CodeBERT Encoder: Responsible for contextual encoding and semantic feature extraction from code snippets. CodeBERT is pre-trained on large-scale code corpora and fine-tuned for code completion, utilizing a 12-layer transformer encoder (hidden size 768, 12 attention heads).
  • GPT-3.5 Generator: Functions as the backend generative model, performing autoregressive code generation conditioned on features from CodeBERT.

A feature fusion layer is introduced to combine the outputs of both models. The fusion mechanism employs a learnable parameter a ∈ [0, 1] to dynamically weight the contributions of CodeBERT and GPT-3.5 features, adapting to the input context. The fusion is further enhanced by a dynamic attention mechanism, allowing the system to modulate feature importance based on code context.
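
The paper does not release an implementation, but the fusion mechanism described above can be sketched in PyTorch as follows. The module name `GatedFusion`, the sigmoid parameterization of the weight a, and the use of the public `microsoft/codebert-base` checkpoint are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class GatedFusion(nn.Module):
    """Weights CodeBERT and generator features with a learnable gate a in [0, 1],
    then re-weights positions with a context-dependent attention step."""

    def __init__(self, hidden_size: int = 768, num_heads: int = 12):
        super().__init__()
        self.gate_logit = nn.Parameter(torch.zeros(1))    # a = sigmoid(gate_logit)
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, codebert_feats, generator_feats):
        a = torch.sigmoid(self.gate_logit)                # learnable weight in [0, 1]
        fused = a * codebert_feats + (1.0 - a) * generator_feats
        attn_out, _ = self.attn(fused, fused, fused)      # dynamic attention over code context
        return self.norm(fused + attn_out)

# Contextual encoding with CodeBERT (12-layer encoder, hidden size 768)
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
encoder = AutoModel.from_pretrained("microsoft/codebert-base")
inputs = tokenizer("def add(a, b):", return_tensors="pt")
codebert_feats = encoder(**inputs).last_hidden_state      # shape (1, seq_len, 768)

# Placeholder for features derived from the generative (GPT-3.5) side.
generator_feats = torch.randn_like(codebert_feats)
fused_state = GatedFusion()(codebert_feats, generator_feats)
```

Parameterizing a through a sigmoid keeps the weight in [0, 1] while remaining differentiable, so it can be learned jointly with the attention layer rather than tuned by hand.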

The training pipeline is staged: CodeBERT is first fine-tuned, followed by training the fusion layer, and finally joint optimization of the entire system. Cross-entropy loss is used for optimization, and inference is performed via conditional probability over the next token, leveraging the fused hidden state.
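
Continuing the sketch above, the staged training and next-token inference might look roughly as follows. The vocabulary size, learning rate, and linear language-modeling head are assumptions, and the feature tensors are placeholders for real CodeBERT and GPT-3.5 outputs over CodeXGLUE samples.

```python
import torch
import torch.nn as nn

vocab_size, hidden_size, seq_len = 50265, 768, 16
fusion = GatedFusion(hidden_size)                 # module from the previous sketch
lm_head = nn.Linear(hidden_size, vocab_size)      # fused hidden state -> next-token logits
loss_fn = nn.CrossEntropyLoss()

# Placeholder feature streams and targets; in the real pipeline these come from
# CodeBERT and the generative backend over CodeXGLUE Python samples.
codebert_feats = torch.randn(1, seq_len, hidden_size)
generator_feats = torch.randn(1, seq_len, hidden_size)
target_ids = torch.randint(0, vocab_size, (1, seq_len))

# Stage 2 of the staged pipeline: train the fusion layer and head
# (stage 1, fine-tuning CodeBERT alone, is assumed to have already run).
optimizer = torch.optim.AdamW(list(fusion.parameters()) + list(lm_head.parameters()), lr=1e-4)
logits = lm_head(fusion(codebert_feats, generator_feats))
loss = loss_fn(logits[:, :-1].reshape(-1, vocab_size), target_ids[:, 1:].reshape(-1))
loss.backward()
optimizer.step()

# Stage 3 would unfreeze all components for joint optimization.
# Inference: conditional probability of the next token given the fused hidden state.
next_token = torch.softmax(logits[:, -1], dim=-1).argmax(dim=-1)
```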

Experimental Setup

  • Dataset: The CodeXGLUE Python code completion dataset, filtered and preprocessed for quality and diversity.
  • Hardware: NVIDIA A40 GPU, ensuring sufficient computational resources for large-scale model training and inference.
  • Software: Stable versions of deep learning frameworks and dependencies to guarantee reproducibility.

Evaluation Metrics

The system is evaluated using a multi-dimensional metric suite:

  • Accuracy: Standard classification accuracy for code token prediction.
  • BLEU Score: Measures n-gram overlap between generated and reference code, assessing generation quality.
  • Code Executability: Fraction of generated code that executes without errors (a minimal check is sketched after this list).
  • Semantic Consistency: Degree to which generated code preserves intended semantics.
  • Average Response Time (ART): Real-time inference latency.
  • Inference Speed: Tokens generated per second.
  • Memory Usage: GPU memory footprint during inference.
  • Robustness: Performance under normal, noisy, incomplete, and abnormal input scenarios.
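
The paper does not specify its executability harness; one minimal way to approximate the code executability metric is to run each generated snippet in a subprocess and count clean exits, as in the sketch below (function name and timeout are illustrative).

```python
import subprocess
import sys
import tempfile

def executes_cleanly(code: str, timeout: float = 5.0) -> bool:
    """Return True if a generated snippet runs to completion without errors."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

generated = ["print(sum(range(10)))", "def broken(:\n    pass"]
executability = sum(executes_cleanly(s) for s in generated) / len(generated)
print(f"code executability: {executability:.2f}")   # 0.50 for this toy pair
```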

Empirical Results

Accuracy and Generation Quality

The hybrid model achieves an F1-score of 0.91, representing a 13.75% improvement over the baseline. BLEU score reaches 0.8, with code executability at 0.92 and semantic consistency at 0.88. These results indicate that the hybrid system not only predicts the correct tokens with high precision and recall but also generates code that is both syntactically valid and semantically faithful.

Efficiency and Resource Utilization

The hybrid model maintains an average response time of 68ms and an inference speed of 213 tokens/s. While memory usage increases compared to single-model baselines, the system remains suitable for practical deployment in development environments where low latency is critical.
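
Average response time and inference speed of this kind can be measured with a simple timing harness such as the sketch below; `generate_completion` is a hypothetical stand-in for the hybrid model's inference call, not an API from the paper.

```python
import time

def generate_completion(prompt: str) -> list:
    """Hypothetical stand-in for the hybrid model's inference call; returns tokens."""
    return prompt.split()

prompts = ["def fibonacci(n):", "class Stack:", "for i in range(10):"]
latencies, total_tokens = [], 0
for prompt in prompts:
    start = time.perf_counter()
    tokens = generate_completion(prompt)
    latencies.append(time.perf_counter() - start)
    total_tokens += len(tokens)

art_ms = 1000 * sum(latencies) / len(latencies)   # average response time (ART)
tokens_per_s = total_tokens / sum(latencies)      # inference speed
print(f"ART: {art_ms:.2f} ms, speed: {tokens_per_s:.0f} tok/s")
```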

Robustness

Robustness testing demonstrates that the hybrid model sustains high accuracy and stability across diverse input conditions, including noisy and incomplete code. The system's recovery ability and stability index remain above 0.84 in all scenarios, confirming its generalization and operational reliability.
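
The construction of the noisy, incomplete, and abnormal scenarios is not detailed in the paper; the sketch below shows one plausible way such perturbations could be generated for a robustness harness (all perturbation choices are assumptions).

```python
import random

def perturb(code: str, mode: str, seed: int = 0) -> str:
    """Produce noisy, incomplete, or abnormal variants of an input snippet."""
    rng = random.Random(seed)
    if mode == "noisy":        # inject a stray character to simulate typos
        pos = rng.randrange(len(code))
        return code[:pos] + rng.choice("@#$%") + code[pos:]
    if mode == "incomplete":   # truncate the snippet midway
        return code[: len(code) // 2]
    if mode == "abnormal":     # prepend syntactically invalid tokens
        return ")(" + code
    return code                # "normal" scenario: unchanged input

snippet = "def area(r):\n    return 3.14159 * r ** 2"
for mode in ("normal", "noisy", "incomplete", "abnormal"):
    print(mode, repr(perturb(snippet, mode)))
```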

Implementation Considerations

  • Model Fusion: The learnable fusion parameter a and dynamic attention mechanism are critical for balancing the contributions of CodeBERT and GPT-3.5. Careful tuning of a is necessary to avoid overfitting to either model's biases.
  • Resource Requirements: The hybrid approach incurs higher memory usage due to the simultaneous operation of two large models. Deployment in resource-constrained environments may require model distillation or pruning.
  • Scalability: The system is designed for GPU acceleration, but further optimization (e.g., quantization, mixed-precision inference) may be required for large-scale IDE integration; a mixed-precision sketch follows this list.
  • Extensibility: The architecture is modular, allowing for substitution of backend models (e.g., replacing GPT-3.5 with more recent LLMs) or integration of additional modalities (e.g., AST-based features).
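
As one concrete example of the mixed-precision option mentioned above, the encoder side can be run under `torch.autocast` so activations are computed in half precision on GPU. This is a generic PyTorch technique offered as an assumption, not the paper's deployment recipe.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
encoder = AutoModel.from_pretrained("microsoft/codebert-base").eval()

device = "cuda" if torch.cuda.is_available() else "cpu"
encoder = encoder.to(device)
inputs = tokenizer("def area(r): return 3.14 * r * r", return_tensors="pt").to(device)

# Activations inside the autocast region run in reduced precision, cutting GPU memory
# at inference time; fall back to bfloat16 on CPU, where fp16 autocast is unsupported.
dtype = torch.float16 if device == "cuda" else torch.bfloat16
with torch.inference_mode(), torch.autocast(device_type=device, dtype=dtype):
    features = encoder(**inputs).last_hidden_state
print(features.dtype, features.shape)
```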

Implications and Future Directions

The results substantiate the claim that hybridizing context-aware and generative models yields measurable gains in code completion tasks. This approach is particularly effective in scenarios where both semantic understanding and generative flexibility are required. The demonstrated robustness and efficiency suggest practical viability for integration into modern IDEs and developer tools.

Future research may explore:

  • Adaptive Model Selection: Dynamically routing inputs to the most suitable subsystem based on code context or complexity.
  • Model Compression: Reducing the computational footprint of the hybrid system for edge deployment.
  • Cross-Language Generalization: Extending the approach to multi-language codebases and evaluating transferability.
  • Human-in-the-Loop Feedback: Incorporating developer feedback to further refine model outputs and adapt to evolving coding standards.

Conclusion

The hybrid code completion system combining CodeBERT and GPT-3.5 demonstrates superior performance across accuracy, code quality, efficiency, and robustness metrics on the CodeXGLUE benchmark. The feature fusion and dynamic attention mechanisms are instrumental in leveraging the complementary strengths of both models. The findings highlight the practical and theoretical value of model fusion in code intelligence, setting a foundation for further advancements in AI-assisted software development.
