- The paper presents an innovative DSL that streamlines mapper generation by abstracting complex C++ APIs and reducing low-level code complexity.
- LLM-driven optimization iteratively refines mapping strategies, achieving speedups of up to 1.34x on scientific applications and up to 1.31x on parallel matrix multiplication.
- Experimental results confirm that the automated mappers consistently match or surpass expert-designed solutions, underscoring the potential for broader AI-driven system design.
The paper presents a novel approach to optimizing parallel program performance by automating the generation of mappers through a domain-specific language (DSL) and LLM optimizers. This work addresses the complexities associated with task-based programming, where efficient mapping of computations to processors and data to memory is crucial yet labor-intensive.
Key Contributions
- Domain-Specific Language (DSL) for Mapper Generation: The paper introduces a DSL that abstracts away intricate low-level C++ APIs, simplifying code generation for LLMs. The DSL substantially reduces code complexity, enables high-level specification of mapping decisions, and structures the search space that the LLM explores.
- LLM-Driven Optimization: By framing mapper generation as a discrete optimization problem, the authors leverage LLMs to explore the vast search space of potential mappings. The generated mappers are optimized iteratively using feedback from system performance evaluations.
- Performance Improvements: Experimental results demonstrate that LLM-generated mappers can outperform expert-designed mappers, achieving up to 1.34x speedup in scientific applications and up to 1.31x in parallel matrix multiplication algorithms.
Methodology
The paper highlights two main challenges: generating syntactically correct mapper code and finding optimal mapping strategies. The DSL is designed to enable LLMs to effectively generate mapping code by encapsulating critical mapping decisions. These include processor selection, memory placement, data layout, and index mapping.
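To make the four decision kinds concrete, the following is a hypothetical, minimal sketch of what a declarative mapper specification might look like. The class and field names (`TaskMapping`, `render`, the task names, and the processor/memory labels) are illustrative assumptions, not the paper's actual DSL syntax:

```python
from dataclasses import dataclass

# Hypothetical mini-DSL covering the four decision kinds a mapper
# encapsulates: processor selection, memory placement, data layout,
# and index mapping. Names are illustrative, not the paper's syntax.
@dataclass
class TaskMapping:
    task: str
    processor: str            # processor selection, e.g. "GPU" or "CPU"
    memory: str               # memory placement, e.g. "fbmem" or "sysmem"
    layout: str = "SOA"       # data layout: struct-of-arrays vs array-of-structs
    index_map: str = "block"  # how index-space points are distributed

def render(mappings):
    """Flatten a list of mappings into a compact per-task policy table,
    the kind of tens-of-lines artifact a generated mapper expresses."""
    return {m.task: (m.processor, m.memory, m.layout, m.index_map)
            for m in mappings}

policy = render([
    TaskMapping("calc_new_currents", processor="GPU", memory="fbmem"),
    TaskMapping("update_voltages", processor="GPU", memory="fbmem",
                layout="AOS"),
])
```

A declarative form like this is far easier for an LLM to emit correctly than the equivalent C++ mapper callbacks, since each decision is a constrained field rather than free-form imperative code.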
The paper uses an agent-based system implemented with Trace, allowing feedback-driven iterative refinement of DSL mappers. This system feeds performance metrics back to the LLM optimizer, enabling the refinement of mappers under a structured search space defined by the DSL.
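The refinement loop can be sketched as follows. This is a hedged stand-in, not the paper's Trace-based implementation: `propose` substitutes a random mutation for the LLM optimizer, and `measure` substitutes a toy cost model for a real profiling run; all names are assumptions:

```python
import random

# Search space over two of the DSL's mapping decisions (illustrative).
SEARCH_SPACE = {"processor": ["CPU", "GPU"], "memory": ["sysmem", "fbmem"]}

def measure(candidate):
    # Stand-in for executing the program and collecting performance
    # feedback; here, GPU + framebuffer memory is pretended fastest.
    cost = 1.0
    if candidate["processor"] == "GPU":
        cost -= 0.3
    if candidate["memory"] == "fbmem":
        cost -= 0.2
    return cost

def propose(best, feedback):
    # An LLM optimizer would read `feedback` (runtimes, profiles) and
    # emit a revised DSL mapper; here we mutate one decision at random.
    cand = dict(best)
    key = random.choice(list(SEARCH_SPACE))
    cand[key] = random.choice(SEARCH_SPACE[key])
    return cand

best = {"processor": "CPU", "memory": "sysmem"}
best_cost = measure(best)
for _ in range(50):
    cand = propose(best, feedback=best_cost)
    cost = measure(cand)
    if cost < best_cost:  # keep only mappers the feedback says improved
        best, best_cost = cand, cost
```

The structure mirrors the paper's setup: the DSL bounds what `propose` may emit, and performance feedback steers the search toward better mappers.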
Experimental Validation
Experiments involved multiple applications, including circuit simulation and matrix multiplication algorithms. The DSL reduced the lines of code required from hundreds to tens, proving advantageous over traditional C++ implementations. The generated mappers consistently matched or surpassed the performance of expert mappers.
The authors also conducted an ablation study on feedback types, reinforcing that high-quality feedback significantly guides LLM optimizers toward efficient mappers.
Implications and Future Work
The paper suggests that LLM-based optimization using a DSL can significantly reduce the workload of performance engineers while achieving substantial performance gains. This opens avenues for broader application of LLM optimizers in complex system design challenges beyond parallel programming, such as automatic tuning in various software systems.
Future directions could include refining the DSL for broader applicability, exploring more optimization techniques within LLM frameworks, and extending the approach to other domains that require software optimization.
Conclusion
This work advances the field by integrating DSL and LLM technologies to automate and optimize parallel program performance, demonstrating practical effectiveness on complex computational tasks. The approach not only alleviates the tedium of manual mapper coding but also demonstrates the potential of LLMs to automate sophisticated system-level code generation, setting the stage for further innovations in AI-driven system design.