Priority Sampling of Large Language Models for Compilers (2402.18734v1)

Published 28 Feb 2024 in cs.LG, cs.CL, and cs.PF

Abstract: LLMs show great potential in generating and optimizing code. Widely used sampling methods such as Nucleus Sampling increase the diversity of generation but often produce repeated samples at low temperatures and incoherent samples at high temperatures. Furthermore, the temperature coefficient has to be tuned for each task, limiting its usability. We present Priority Sampling, a simple and deterministic sampling technique that produces unique samples ordered by the model's confidence. Each new sample expands the unexpanded token with the highest probability in the augmented search tree. Additionally, Priority Sampling supports generation based on regular expressions, which provides a controllable and structured exploration process. Priority Sampling outperforms Nucleus Sampling for any number of samples, boosting the improvement of the original model over -Oz from 2.87% to 5%. Moreover, it outperforms the autotuner used to generate the labels for training the original model in just 30 samples.

Authors (4)
  1. Dejan Grubisic (6 papers)
  2. Chris Cummins (23 papers)
  3. Volker Seeker (6 papers)
  4. Hugh Leather (23 papers)
Citations (3)

Summary

Priority Sampling of LLMs for Compilers: An Overview

The paper, "Priority Sampling of LLMs for Compilers," introduces a novel deterministic sampling technique called Priority Sampling for enhancing the performance of LLMs in code generation and optimization tasks. LLMs have demonstrated substantial efficiency across various software engineering applications, such as code generation, translation, bug detection, and documentation. However, the efficacy of such models is often dependent on the sampling techniques employed during code generation. Traditional methods like Nucleus Sampling have limitations, including the necessity to tune temperature coefficients for specific contexts, leading to repetitive or incoherent samples. These limitations motivate the need for more refined sampling techniques to boost performance and sample uniqueness, which this paper addresses through Priority Sampling.

Priority Sampling is designed to produce unique samples ordered by the model's confidence. The method is deterministic: it explores an augmented search tree, at each step expanding the unexpanded token with the highest probability. It requires no temperature tuning, making the sampling process simpler and more predictable. The method also supports regular-expression constraints, ensuring structured and well-formed outputs, which is particularly useful for tasks like compiler optimization.
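
One way to picture the regular-expression support is as a per-step mask over the vocabulary: a token is admissible only if the text generated so far, extended by that token, can still be completed into a string matching the pattern. The sketch below is a deliberately naive, character-level illustration of that idea rather than the paper's implementation (which builds on efficient guided-generation machinery); the allowed_next_tokens helper, the toy vocabulary, and the pattern are invented for the example, and the prefix test relies on the third-party regex package's partial-match support.

```python
import regex  # third-party `regex` package; supports partial (prefix) matching

def allowed_next_tokens(pattern: str, generated: str, vocab: dict) -> set:
    """Token ids whose text keeps the output a viable prefix of `pattern`.

    Naive O(|vocab|) illustration of regex-constrained decoding: every
    candidate token is checked with a partial regex match, and tokens that
    can no longer lead to a full match are masked out.
    """
    allowed = set()
    for tok_id, tok_text in vocab.items():
        candidate = generated + tok_text
        # partial=True also accepts strings that could still become a full match
        if regex.fullmatch(pattern, candidate, partial=True):
            allowed.add(tok_id)
    return allowed

# Toy example: constrain output to a comma-separated list of lowercase flags.
toy_vocab = {0: "-", 1: "O", 2: "z", 3: ",", 4: "licm", 5: "!"}
flag_list = r"(-[a-z0-9\-]+)(,-[a-z0-9\-]+)*"
print(allowed_next_tokens(flag_list, "-licm", toy_vocab))  # {0, 2, 3, 4}
```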

Key Findings and Results

The empirical results validate the advantage of Priority Sampling over traditional techniques. The evaluation task is LLVM pass-ordering optimization, in which the model predicts optimization sequences matching those found by a long-running autotuner. Priority Sampling consistently outperformed Nucleus Sampling at every sample count, raising the original model's improvement over the -Oz baseline from 2.87% to 5%. Moreover, within just 30 samples it surpassed the autotuner whose output was used to generate the training labels, showing that it can discover novel and efficient optimization sequences.

A noteworthy aspect of the experimental evaluation is Priority Sampling's sample efficiency: the technique reaches 91% of the autotuner's improvement with only five samples. This result is striking, as the autotuner must explore a vast space of candidate optimization passes, whereas Priority Sampling attains comparable performance from a handful of generations.

Algorithmic Insights

The Priority Sampling algorithm relies on constructing a search tree where paths are expanded based on the model's confidence in the token sequences. By maintaining a priority queue, the algorithm judiciously selects which node to expand next, effectively balancing exploration with exploitation. This approach contrasts sharply with stochastic sampling methods that rely on probabilistic token selection, thereby avoiding redundancy and ensuring diversity. The control offered by regular expressions provides an additional layer of verifiability to the generated code, which is critical in constrained domains like compiler optimization.
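
A minimal sketch of how this expansion can be organized follows (our reading of the procedure, not the authors' implementation): a priority queue holds unexpanded branches keyed by cumulative log-probability; each new sample pops the most probable branch, rolls it out greedily, and pushes the runner-up tokens encountered along the way back onto the queue. The next_topk callback is a hypothetical stand-in for whatever model interface supplies top-k next-token log-probabilities.

```python
import heapq
import itertools

def priority_sampling(prompt, next_topk, num_samples, k=5, eos=None, max_len=64):
    """Return up to `num_samples` unique completions, ordered by confidence.

    Sketch of the idea: a priority queue of unexpanded branches keyed by
    cumulative log-probability. Each sample pops the most probable branch,
    rolls it out greedily, and records the alternative top-k tokens at every
    step so later samples can explore them. `next_topk(seq, k)` is assumed to
    return the k best (log_prob, token) continuations, best first.
    """
    tie = itertools.count()  # tie-breaker so heapq never compares token lists
    frontier = [(0.0, next(tie), list(prompt))]  # (-cumulative logprob, tie, tokens)
    samples = []

    while frontier and len(samples) < num_samples:
        neg_logp, _, seq = heapq.heappop(frontier)
        logp = -neg_logp
        while len(seq) < max_len and (not seq or seq[-1] != eos):
            candidates = next_topk(seq, k)            # [(logprob, token), ...]
            best_lp, best_tok = candidates[0]
            for alt_lp, alt_tok in candidates[1:]:    # defer the runner-up branches
                heapq.heappush(frontier, (-(logp + alt_lp), next(tie), seq + [alt_tok]))
            seq = seq + [best_tok]
            logp += best_lp
        samples.append((logp, seq))                   # appended in confidence order

    return samples
```

Because every queued branch differs from the greedy path in at least one token, the completions are unique by construction, and no temperature parameter is involved.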

The algorithm has computational complexity aligned with typical sampling methods, i.e., O(T · (inference + K log V)), making it competitive in terms of efficiency while offering benefits in sample diversity and determinism. The memory overhead is minimized by maintaining a constant-sized priority queue, enhancing its applicability in real-world scenarios where resource constraints may be a consideration.
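
For readability, the same bound can be written as a display; the symbol glosses below (T generated tokens per sample, V the vocabulary size, K the number of top tokens handled per decoding step) are our reading of the notation rather than definitions quoted from the paper:

```latex
\text{cost per sample} \;=\; \mathcal{O}\!\left( T \cdot \left( C_{\text{inference}} + K \log V \right) \right)
```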

Implications and Future Directions

The Priority Sampling approach has several practical and theoretical implications. Practically, its application in compiler optimization indicates significant potential for improving the generalization abilities of LLMs in performance-sensitive domains. Theoretically, the technique raises interesting questions about the nature and structure of knowledge encoded in LLMs. The results suggest that comprehensive exploration techniques like Priority Sampling can unlock and leverage latent knowledge within these models, which previously required extensive fine-tuning to harness.

Future research could explore the integration of Priority Sampling with other structured generation techniques to enhance its utility across different domains of AI. Additionally, examining the effect of different model architectures and configurations on Priority Sampling's efficacy could offer further insights. Lastly, approaches to parallelize or otherwise accelerate the algorithm without sacrificing the determinism and uniqueness guarantees remain a compelling area for further investigation.

In conclusion, Priority Sampling provides a compelling alternative to traditional sampling methods, with distinct advantages in producing structured, diverse, and performant outputs from LLMs. Its implementation and successes in the domain of compiler optimization highlight an avenue for broader application and exploration in AI-driven code generation tasks.
